I Take Exception to This

Power Searching With Google.

Google has a course that will make you a “power searcher.” Or so they say. I do not take exception to this because I feel like Google is moving in on my library territory, or because I believe technology is the root of all evil.

I take exception to this course because it reinforces the fallacy that all information can be found online, and that all online sources are readily accessible via Google.

If this comes as a great shock, put down that Kool-Aid, and we’ll talk.

First and foremost, not all information is digitized. The United States alone is full to the brim with archives that are absolutely jam-packed with letters, photographs, manuscripts, newspapers, and government documents that have never been digitized. They exist only on paper. Similarly, the finding aids, the lists that tell us what exists in all those paper collections, are frequently not digitized either. So there’s no way of learning that an undigitized item exists by looking on Google.

Libraries have gone a long way towards ensuring that their records of what books they own are available online. Just check out WorldCat. It is a union catalog, meaning a catalog that collects littler catalogs and mushes them all together. Consequently, when you search WorldCat, you are searching a big chunk of the world’s library catalogs. But that only works if the catalog records have been digitized, and the searching happens on WorldCat, not on Google.

Being able to find that a book exists is lovely, but that is not the same as actually finding the contents of the book. If the book you want was published in 1923 or later, it is probably still under copyright (thank you, Copyright Term Extension Act), so it is not legal to show you the entire book online, for free. Google Books and Amazon might digitize snippets to show you, if they feel it’s worth their time and effort to do so, but then again, they might not.

But you’re talking about online articles. Every undergraduate knows that there are mountains and mountains of them available online. Well yes, mountains and mountains of scholarly articles have been digitized. And if you do a Google search for them, you’ll probably find them. But you can’t access them via Google. You would have to be affiliated with a college or university that subscribes to the database that offers access to that content, and to reach that content, you would have to search using the database’s proprietary search interface. Not that searching using the proprietary interface would be that bad: most of them are extremely powerful and tailored to find articles more efficiently, with less effort than one would expend tailoring a Google search.

So I don’t take exception to Google offering us a way to get better at using their search interface correctly. But by suggesting that all information, or even a record of all information, can be found online, via the Google search interface, they are doing the world a grave disservice.

Banned Books 2012

The American Library Association hosts many “weeks,” but Banned Books Week is my favorite.  I am vehemently against book-banning, and welcome every opportunity to increase awareness of the practice and of the titles that are challenged or banned.

I do not believe that everyone needs to like or agree with every book that’s written.  There are many books I find offensive, but here’s the thing: I leave it at that.  Sometimes, I will read things I expect to disagree with, just to see if I really do disagree.  Sometimes I change my mind, and other times I come away from the reading with a better understanding of my beliefs and why I disagree with what I just read.  This experience is both valuable and necessary: if I cannot defend what I believe to be right and good, how do I know it is right and good?

To celebrate my right to read whatever I please, darnit, I read a banned book every year.  Last year, it was Matilda; the year before that, it was To Kill a Mockingbird.  This year, it’s The Great Gatsby.  Wouldn’t you know, it is the most-challenged classic, according to the American Library Association’s Office for Intellectual Freedom.

I’m interested to hear others’ experiences with banned books.  Have you read a banned book, and did you know it was banned at the time? Would you try to prevent others from reading the book too?

Unit 13: End-of-Semester Reflection

After thirteen weeks of trials and travails, this class is winding down, and it is time for reflection.

For me, this class was not about new concepts.  I did learn how to complete small tasks, but understanding the big things–metadata, repositories, management, and virtual machines–happened in my earlier DigIn courses.  This course was where we finally took all those concepts and put them into practice.  While not as rigorous as creating a production repository for a museum or university, actually carrying out tasks like selecting a metadata schema and controlled vocabulary for a repository instead of theorizing on how I might do such a thing required me to think, plan, and deal with details in a way that I never had before.

The other thing I learned in this class was that the quality of a repository cannot be determined from a single perspective–end user, depositor, or site administrator–nor can quality be determined by a single attribute–number of schemata supported, ability to organize along organizational structure, kinds of customization available.  Every aspect of a repository matters, from how easy (or difficult) it is to install to how easy it is to migrate to a new repository.  Easy installation was important for me, but I also needed comprehensive and easily understandable software documentation.  My collection also needed a repository with easily modifiable input forms, an attractive display, and tools that would allow users to contribute metadata to the collection.  As I attempted to make my collection “fit” with various repositories, I learned that finding a package of mostly desirable attributes with only a few undesirable attributes was quite difficult.

I am definitely not done learning about repositories; I doubt that I will ever be done.  But looking back to where I was at the beginning of the semester, I am pleased to realize my knowledge of repositories has made a big leap from where it was before.

Unit 12: Benefits or Drawbacks of Pre-Installed Virtual Machines

This week winds up our work building and configuring virtual repositories, and we were asked to consider the possibility of creating collections in a pre-installed virtual machine instead of building the machine and the repository.  My own preference for learning is to do everything hands on.  My persistence (or stubbornness) when it comes to troubleshooting and my desire to know _why_ came in handy when building repositories from scratch, but sometimes my desire to understand was held back by my middling computer skills.

These preferences aside, there are several reasons that I would argue that using a pre-installed virtual machine in order to allow students to focus on building collections would be less beneficial, not more beneficial, when the end goal is to give students a strong understanding of repositories.

First, students lose out on a more holistic understanding of how a repository is built when they work only with pre-installed virtual machines.  Assuming that most DigIn students come from a library background, a weaker understanding of the technical aspects of repositories would negatively impact the ability of the student to work with IT professionals in building repositories, which runs counter to the goals of the DigIn program.

Second, troubleshooting a VM that I had installed myself, and in which I consequently had some rudimentary idea of where to look for issues, was difficult and time-consuming enough.  If troubleshooting a virtual machine with the repository software pre-installed turns out to be even more difficult, then the time savings–the stated purpose of using a virtual machine with pre-installed software–are lost.

Third, by installing and configuring the repository software myself, I learned that the repository and the collections it holds cannot be considered separately.  With each installation, I had to determine whether my custom metadata fields and controlled vocabularies could be added.  If some of the repositories had been production repositories, I would’ve had to add even more metadata myself, because some repositories did not allow users to add metadata.

Finally, the more opportunity students have to interact with repositories, the better equipped they are to critique repositories and make well-informed decisions about what repository is “best” for any given collection.

Unit 11: Repository Home Sites

Here we are in week 11, on to another repository software package.  At this point, we were asked to review the homepages for the various repositories and other kinds of software we have used this semester.  There is a lot of variability in the appearance, organization, and content of the websites, as with the software packages themselves.  However, this correlation was not always exact: JHOVE was relatively easy to use, but its website made the software seem more daunting and difficult to use than it was.

Consequently, judging a repository software exclusively by its website is not an entirely accurate way to determine the quality and usability of the software.  The software’s website is the portal to software help and documentation, however, and as such, it should be welcoming, usable, and contain the content that the software’s users will need to answer their questions, be they new to repository technology or old hands experimenting with advanced features.


This website is well organized, with an appealing layout, and clearly explains the repository.  The help documentation is also well-organized and covers the main topics new users will likely need to know.

OAI PKP Harvester

The website has a clean layout and well-organized information, but users must navigate several levels down in the website to actually get at any of the documentation.


The layout is decently clean, and content is reasonably well-organized.  The repository help is stored in a wiki, which normally I appreciate.  In this instance, however, the wiki’s appearance of helpfulness contrasts with the actual experience: the wiki is actually difficult to navigate and find information in.


DSpace

The layout of the DSpace homepage is clean and appealing, and content on the website is logically organized.  Unfortunately, there is so much information on this website that it is overwhelming to search, and there is a gap between the basic, high-level “getting started” material and the extremely technical content on the wiki.


Drupal

The Drupal website has a tidy, bright, and well-organized homepage.  While the site contains tons of information, the basic information pages contain links to related advanced topics, facilitating navigation.  User comments on the content of pages are useful as well.


JHOVE

The homepage and entire website are clearly organized but text-heavy and aesthetically lacking.  Worse, the website does not make the JHOVE validator easy to understand.  Another turn-off for me was that the “Community” is a mailing list.  I really dislike mailing lists.

Unit 10: OAI Service Providers

Week 10 is here, and we are careening into OAI harvesting.  After turning our own virtual machines into little OAI service providers using the PKP application, we checked out other OAI service providers.
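For anyone curious what harvesting looks like underneath the PKP application, here is a minimal Python sketch of the kind of request a service provider sends and how it might read the reply.  It is my own illustration, not the PKP Harvester’s code.  The `ListRecords` verb, the `oai_dc` metadata prefix, and the two namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the function names, and the sample response are made up, and the sketch ignores resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL; every real data provider publishes its own.
BASE_URL = "http://repository.example.org/oai"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL for an OAI-PMH ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# A trimmed, made-up response so the sketch runs without a network connection.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample cardigan record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and keep requesting until the provider stops returning a resumption token; the sketch only shows the shape of one request and one reply.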

Scirus, for scientific information only (http://www.scirus.com/srsapp/)

I found this search site very well done.  The interface isn’t pretty, but it has a lot of functionality.  Users can go to the “About” page and see where the service gets its records from (the web, Medline, MIT OpenCourseWare, PubMed Central, among other sources).  Before beginning a search, users can also set preferences, which include the ability to display partner links.  I chose the American Museum of Natural History, and a link to their catalog displayed next to every journal article that the AMNH library owns.  Users can also sort results by relevance or date, and limit searches to journal articles, websites, etc.

In the results list, the record’s source appears underneath the title and summary.  I appreciated that I didn’t have to click away from the results list to find where a particular record was coming from.

Sheet Music Consortium (http://digital2.library.ucla.edu/sheetmusic/)

This service provider offers a beautiful interface and functionality too!  From the search box on the homepage, users can limit their search to names, subjects, place, publisher, and digitized sheet music.  The site also has an advanced search which, among other things, allows users to search for items from a specific collection.  I was quite taken with the “virtual collections” feature, which allows users to drag and drop records from search results into a little floating menu, and from there, e-mail the records or (if one logs in) save the virtual collection.  Virtual collections can be made public, so other users can visit the Consortium website and view the collections others have made.  This would be a very useful feature for music teachers.

OAIster (http://oaister.worldcat.org/)

OAIster is basically WorldCat’s interface: clean, with lots of options to focus the search (format, author, language, year, etc.).  The results list does not show the institution that the record came from, but this information has its own section on the individual item records.  For me, the roomy, ordered results list makes up for the slight inconvenience of not having the contributing institution listed there.

Unit 09: Cataloging and Metadata Consistency

My collection only addresses consistency in the controlled vocabularies.  My sweater style terms are drawn from the Getty’s AAT, and the names I use to describe synthetic fibers come from “A Quick Guide to Manufactured Fibers,” produced by the American Fiber Manufacturers Association.  However, the metadata fields I chose to use for my collection are not drawn directly from standard Dublin Core, and I am reminded of how much my collection’s metadata diverges from the “standard” every time I try to fit my collection to a new repository software package.
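To make “consistency in the controlled vocabularies” a little more concrete, here is a rough Python sketch of the kind of check I have in mind.  Everything in it is an assumption for illustration: the field names `style` and `fiber`, the function name, and the short term lists are placeholders, not my actual schema or the real AAT and fiber-guide vocabularies.

```python
# Illustrative placeholder vocabularies; NOT the actual AAT or fiber-guide term lists.
STYLE_TERMS = {"cardigans", "pullovers", "vests"}
FIBER_TERMS = {"acrylic", "nylon", "polyester"}

# Hypothetical field names mapped to the vocabulary that governs each one.
CONTROLLED_FIELDS = {"style": STYLE_TERMS, "fiber": FIBER_TERMS}


def find_uncontrolled_terms(record):
    """Return (field, value) pairs whose value is missing from the field's vocabulary."""
    problems = []
    for field, vocabulary in CONTROLLED_FIELDS.items():
        for value in record.get(field, []):
            # Compare case-insensitively and ignore stray whitespace.
            if value.strip().lower() not in vocabulary:
                problems.append((field, value))
    return problems


# A made-up record; "title" is uncontrolled, so it is never checked.
record = {
    "title": ["Blue zip-front sweater"],
    "style": ["Cardigans"],
    "fiber": ["acrylic", "orlon"],
}

print(find_uncontrolled_terms(record))
```

A check like this only catches terms that fall outside the lists; it says nothing about whether a cataloger picked the right term from within them.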

Part of my intention in choosing the metadata fields that I did was to provide a core body of information that, in a production environment, would allow information professionals to track objects and identify them reliably without becoming cumbersome.  A collection of sweaters could be relevant to museums with fashion collections, students of fashion design, and knitters.  My hope would be that, in a production environment, users from these diverse backgrounds would contribute descriptive terms that would encompass the diverse terminology from each of these fields.  From my experience as a crafter, there is a certain amount of variability in the terms that are used to describe garment styles, and the rapid pace of the fashion world ensures that new styles with new names are constantly appearing.  With users contributing descriptive metadata, I think the metadata would better reflect the diversity of terminology used by the different fields that a collection of garments would be useful to.

Of course, tapping user knowledge does not mean that the librarians curating a collection like mine in a production environment would be able to step back from the collection after the core metadata was added: building a dedicated user group would require an investment of repository staff’s time and also funding for advertising.