Recently I’ve been working on a fairly large archival cleanup project. This project involves overhauling the arrangement of a collection, updating physical processing to meet our internal standards, and improving online metadata to make the material more accessible. This type of cleanup work can be hugely time consuming, and it often reminds me of diving down a rabbit hole.
Cleanup Scope Creep
Cleanup projects can be sneaky in their size. Often what looks like a quick fix of a few records grows into something much larger. For example, you’re working on a research request and you find a few things in an online description that need updating. You start to fix those, then you realize that a bunch of other things could be improved as well, and before you know it the project has snowballed and grown in scope.
Sometimes you need to be realistic about what you can fix and what is worth cleaning up. It would be wonderful if all museum and archival collections were perfectly described and consistently followed description standards. But description work is time consuming, standards and procedures change, and people make mistakes. I’m sure every heritage professional has at least one collection they have personally processed that they look back on and wish they had processed differently. Sometimes you have the ability to make those changes and sometimes you don’t.
I have a few general rules around taking on cleanup projects:
Make sure you understand the complete scope of the project. This might mean limiting yourself to only doing a certain level of cleanup initially.
Consider if the cleanup is really necessary. Is the collection accessible the way it is? How will accessibility be improved by the cleanup? Is this a collection that is frequently accessed by researchers? Does the time required to undertake the work match the level of accessibility improvement that will result?
Keep an ongoing list of collections that need cleanup attention. This list can be used to allocate resources based on time and staffing levels. Having this ongoing list also makes me feel a lot better about the fact that things need fixing but we don’t simply have the time/staff to do so currently. Instead of simply ignoring the cleanup that needs to be done the list helps document the work and schedule it for later.
Determine what type of skills are needed to do the work. Is it something that needs to be done by a staff member, or is it something a student/intern can manage?
Document how you are going to tackle the project – I often use spreadsheets to keep track of progress on larger projects, particularly if I know I’m going to be doing the work intermittently.
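That tracking document can be as simple as a spreadsheet or CSV file you update each work session. As a minimal sketch of the idea (the column names, status values, and sample collections here are my own illustrative assumptions, not a standard), a few lines of Python can keep such a list sortable by priority:

```python
# Hypothetical cleanup-project tracking sheet kept as CSV rows.
# Columns and status values are illustrative assumptions, not a standard.
import csv
import io

FIELDS = ["collection", "task", "priority", "status"]

rows = [
    {"collection": "Smith fonds", "task": "update finding aid",
     "priority": "high", "status": "in progress"},
    {"collection": "Jones papers", "task": "rehouse photographs",
     "priority": "low", "status": "not started"},
]

def to_csv(tracking_rows):
    """Serialize the tracking rows so progress survives between work sessions."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(tracking_rows)
    return buf.getvalue()

def outstanding(tracking_rows):
    """Return unfinished tasks, highest priority first."""
    order = {"high": 0, "medium": 1, "low": 2}
    todo = [r for r in tracking_rows if r["status"] != "done"]
    return sorted(todo, key=lambda r: order.get(r["priority"], 3))

# Highest-priority open task comes first in the list.
print(outstanding(rows)[0]["collection"])
```

The point is not the tooling; any spreadsheet works. What matters is that the list of collections, tasks, and statuses exists somewhere outside your head, so intermittent work can be picked up where it left off.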
How institutions prioritize this type of cleanup work is going to vary widely. I’m lucky in that there is virtually no backlog at the archives where I work. This lack of backlog means we have substantially more time to spend on accessibility, digitization, and outreach projects. That being said, accessioning new donations, educational programming, and reference take priority over cleanup – so even though there’s no backlog, I often find cleanup projects being pushed to the back burner.
The Province of Ontario has announced that it is in the process of making government data open by default. This is part of Ontario’s larger Open Government initiative that focuses on open data, open engagement, and open government more generally.
Since November 2012 the Ontario government has been publishing statistics in the open data catalogue. So far 170 data sets have been placed online. This includes statistics on marriage registrations, farmers’ markets, water wells, flu shot clinics, woodland caribou, and a wide range of other interesting topics. The data already online is a huge boon to researchers and is available in a variety of formats, depending on the type of data and the original collection method.
In addition to the open data catalogue, Ontario has created a data inventory that describes more than 1,000 data sets. The inventory is designed to allow the public to vote on which data sets are the most popular, as a means of prioritizing the order in which data sets are made accessible.
Hurray for access! January 1st was Public Domain Day. On that day, unpublished works by authors who died prior to 1942 became part of the public domain in the United States. This includes works by notable authors such as:
A more complete list can be seen here. Lists of authors whose works entered the public domain in 2011 or 2010 are also available.
However, there are some conditions around the entrance of works into the public domain. This legislation applies only to works which have not been previously published and which were not made in the course of employment; separate copyright legislation applies to those works.
Check out my latest post at the ActiveHistory.ca site. The post talks about options for cultural heritage organizations looking to share photograph collections online through free or low-cost image hosting and image sharing sites.
Recently, while at a friend’s house, I picked up a local history book that was sitting on their coffee table. The book, written by a local history enthusiast, focused on the history of Espanola, Ontario. In the introduction, the author stated that he had not made an effort to record any sources; however, if readers were curious they could contact him and he might be able to point them to where his information came from. Instantly, the academic historian in me cringed and I began to lament the state of local history writing.
However, upon later reflection I began to think about the larger question of citations in popular publishing, local history works, and public history writing. Footnotes or endnotes are standard practice in academic writing. But, they are rarely used in more popular publishing. In my mind good public history writing should find a way to cite information without being intrusive.
Digitally published information can include hyperlinks as a means of providing supplemental and source information without the formality of a footnote. Print publishing faces the slightly more arduous task of integrating sources into the flow of writing. But even without intrusive methods of citing information, good writers can seamlessly note where material derives from within the context of their writing. I think it is crucial that academic historians who want to be accessible to a popular audience consider how to maintain historical credibility while appealing to the reading sensibilities of the public at large.
The Canadian history magazine The Beaver recently announced that the publication is changing its name. This name change is based on the desire to be more accessible to the online community: currently the magazine’s title is often caught in spam filters due to the title’s possible sexual connotation. The change is just one of many examples of the importance of naming with machines in mind.
One of my personal favourite examples of names in a machine readable world is the band Live. Googling ‘live’ or ‘live+band’ in an attempt to find more than a Wikipedia entry on the band is an exercise in futility. On the other hand, such a common name makes it difficult to download the band’s music. Depending on which side you’re on, the inability to easily download their music could be considered a great thing or a horrible thing.
Both of these examples highlight the importance of naming schemes being machine readable. Names can no longer merely be catchy, they need to also be searchable. I’m just waiting until children’s names are picked with machines in mind….
Fotopedia is a collaborative open source photo encyclopedia. The site is an interesting blend of the knowledge of Wikipedia and the expansive image collections of Flickr. The emphasis leans toward the photos, but each collection of photos is accompanied by a brief encyclopedia article. The number and quality of photos for each entry depends entirely on what has been uploaded.
One of the more valuable features of Fotopedia is that the site is easily searchable by category. These categories allow users who are interested in a particular type of photo to easily find the images they desire. The category feature can be particularly useful for anyone researching a specific topic. The site is also keyword searchable; however, the keyword search results are not always as neatly organized as the rest of the site.
Fotopedia also hosts a “Fotopedia Community” designed to allow interaction between users. This social media feature lets photos be commented on and voted on, highlights the best contributors, and offers a variety of other interactive features. The site has great potential for sharing photos, geocaching, and providing context to photos that might otherwise be merely pictures.
I have been in favour of the Google Books project for some time, mainly because the project allows for greater accessibility of scholarship. This past week Google announced a new facet to Google Books. Now, more than 2 million books, which are currently featured on Google Books, can be turned into “instant paperbacks.”
Google has signed an agreement with On Demand Books, the owner of The Espresso Book Machine. The Espresso Book Machine (EBM) can print and bind a book in the same amount of time it takes to brew an espresso. Espresso Book Machines are currently located in bookstores in the US, Australia, Britain, Egypt, and Canada. The few Canadian EBMs are found only in university bookstores. This is great for the impoverished student, but it somewhat limits the audience the EBM currently reaches.
This agreement addresses one of the most common complaints of Google Books users: many people simply do not enjoy reading a 300-page book online. A retail price has not been set for these instant paperbacks, but estimates have been around the eight dollar mark. Overall it sounds like a cost effective way to make public domain books available. That being said, various governments, privacy groups, Amazon, and Microsoft have already filed objections to this new agreement.
My love for Google Books has grown once again. Earlier this month Google released new features for Google Books, including the ability to embed books or book previews in HTML, better searching within book text, a page turn feature, and an improved book overview page.
For historians the improved searching within book text is one of the most valuable new features. Search results now appear with context surrounding the searched word, and can be clicked on directly to easily examine relevant content. This is a huge improvement and has the potential to help researchers easily locate relevant information. The ability to embed books in blogs or websites with a simple line of HTML is also valuable. It allows users who are not overly web savvy to easily share pages of works, which has the potential to enhance interactivity and accessibility.
Lists of what is most popular, and the most popular searches conducted aren’t anything new. However, Google has expanded on people’s interests in trends and created Google Trends. This search feature allows users to search anything their heart desires, and receive a chart which highlights current and past trends on the topic.
This feature is also closely related to Google’s move to make searching public data such as population more accessible. Currently, if you go to Google.com and type in [unemployment rate] or [population] followed by a place in the U.S., you will see the most recent estimates and an interactive chart. The information used for these charts is from the U.S. Census Bureau’s Population Division. Most importantly, this is a huge step towards making census information far more searchable and accessible to the general public. This newly organized data has the potential to be valuable to historians attempting to gauge population changes, the movement of people, employment, and numerous other facets of history.