Wednesday, July 05, 2006

Keyword v. Controlled Vocabulary - Why Google is NOT the answer

I've been composing this post in my head for several weeks now. It was partly inspired by the discussions in ALA Executive Board conference calls [for some reason, only the agenda is here] about the Library of Congress announcing its discontinuance of series authority work. The current issue (on my desk) of Library Journal (6/15) quotes from the EB statement: "Keyword search is not an adequate substitute for authority-controlled series access, especially over time as variants and name changes proliferate and as errors enter even the best databases."

Later in that same issue of LJ is an article about e-resources. One line there caught my attention:

One day she came over to show me a wonderful new online tool she'd discovered that let her search through thousands of scholarly articles and print out the full text. She was referring to Google Scholar. "Isn't this great?" she asked happily, as the titles of thousands of articles scrolled across her screen. I pointed out that almost none of the pages she'd retrieved actually provided the full text for free, that she couldn't search by subject terms or in the article abstracts, and that she could search by author but not sort articles by author or date. She was undeterred: "But this covers so many sources! Where else could I find this much in one place?" she exclaimed. I showed her the hundreds of online sources available at the Yale library web site, including an African American newspapers database and historical databases for national newspapers. She had never seen or used any of these before.

How these two items resonate with me! I will be the first to admit that I was part of the EB letter. However, it was catalogers on the Exec Board who pushed the issue. [That would be Janet Swan Hill and Michael Gorman.] But the rest of us certainly concurred.

Even in the days of the text web, I remember my frustrations with the search engines. Remember Alta Vista?

I learned to search in the days when time was literally money. With the old TI-745 (earmuffs and thermal paper), searching the New York Times cost something like $300/hour PLUS connect-time charges of up to $1/minute. When you were receiving at 120 baud (and most of us can read faster than that), creating a search structure and planning out your search were critical. Searching and revising on the fly just did not make it. It was great when, by the mid-1980s, you could type your search into the PC and run it as a script, capturing the results as they streamed back, at first at 1200 baud, then 3600 baud, and finally 9600 baud. It was cool, and search costs came down.

With a thesaurus and controlled vocabulary, searching becomes very efficient, even when you are not searching for a known item. I guess that is what frustrates me most when using Google or Ask (I used to use Teoma, now part of Ask). While the results are "ranked for relevancy," when I am searching for a known item I often do not find it in the first couple of pages. Theoretically the items are de-duplicated, but that has not been my experience.
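To make the difference concrete, here is a minimal sketch of keyword matching versus authority-controlled lookup. Everything in it is an assumption for illustration: the tiny `AUTHORITY` table, the three `RECORDS`, and both function names are made up, and a real authority file is far richer than a dictionary.

```python
# Hypothetical authority file: every variant form of a name points to
# one authorized heading. Real authority files hold far more than this.
AUTHORITY = {
    "mark twain": "Twain, Mark, 1835-1910",
    "samuel clemens": "Twain, Mark, 1835-1910",
    "samuel langhorne clemens": "Twain, Mark, 1835-1910",
}

# Made-up catalog records, each entered under a different form of the name.
RECORDS = [
    {"title": "Adventures of Huckleberry Finn", "author": "Mark Twain"},
    {"title": "Life on the Mississippi", "author": "Samuel Clemens"},
    {"title": "Roughing It", "author": "Samuel Langhorne Clemens"},
]


def keyword_search(term, records):
    """Return titles whose author field literally contains the search term."""
    term = term.lower()
    return [r["title"] for r in records if term in r["author"].lower()]


def authority_search(term, records, authority):
    """Return titles whose author resolves to the same heading as the term."""
    heading = authority.get(term.lower())
    if heading is None:
        return []  # the term is not a known variant
    return [
        r["title"]
        for r in records
        if authority.get(r["author"].lower()) == heading
    ]


if __name__ == "__main__":
    print(keyword_search("Mark Twain", RECORDS))
    print(authority_search("Mark Twain", RECORDS, AUTHORITY))
```

In this toy data the keyword search for "Mark Twain" finds only the one record entered under that exact form, while the authority lookup gathers all three, whichever variant the searcher happens to type.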

I admit, I don't work a reference desk any more, and have not in about a decade. I think I would quickly become frustrated. Perhaps it is time for me to hit the floor again and find out how to do reference in the 21st century... or not.
