Controlled vocabulary: the key to using a library catalog

Search engines and online library catalogs differ fundamentally. The catalog is a database, and a very special kind of database: all names, all subjects, and even some titles have been selected from a controlled vocabulary, or list of authorized headings. You can search a catalog using keywords, but unlike with a search engine, that’s not the only way to search.

Have you ever used a search engine and wondered what on earth the results have to do with what you were looking for? I remember very well the day (some time before Google) I was cataloging a book and had a question about it. Whatever keywords I thought of, my search returned only four results. I was afraid to look at any of them. They appeared to be porn sites.

I don’t remember what search engine I was using, but I’m pretty sure Google eventually put it out of business. When I use Google Images, I still have to wonder what most of the pictures have to do with my search term, but in other searches I nearly always get something plausibly related to my terms. I have become a lot more sophisticated in choosing keywords to search, but even my most successful searches turn up many more results than I can ever look at, and there’s nearly always something completely irrelevant on the first page of results. A search engine doesn’t give the option of using a controlled vocabulary.

So what is a controlled vocabulary? It is the concept that every name (personal, corporate, legal jurisdiction, or anything else capable of authorship), every subject heading, and the title of every book or other work must be expressed in one and only one way throughout the catalog. In the US, every library uses the same controlled vocabulary, which is devised and controlled by the Library of Congress.

screen from the Library of Congress Name Authority File

I have written a special article on finding names in a catalog. Basically, however many Frank Smiths there are, each one of them ought to have his own unique identifier:

  • Smith, Frank
  • Smith, Frank, 1847-1892
  • Smith, Frank, 1859-1921
  • Smith, Frank R.
  • Smith, Frank R., 1944-
  • Smith, Frank R (Robert)
  • Smith, Frank Robert
  • Smith, Frank Robert, June 16, 1948
  • . . . and so on

Those are not intended to be real, authorized headings. I made them up. All of the real headings have their own separate records in the Library of Congress Name Authority File. In the case of Frank Smith, there is screen after screen of records. Each record contains unauthorized forms that might turn up on various title pages, etc. That way, if anyone uses them in a search, the catalog will return the authorized heading. In practice, it is not always possible to distinguish every single Frank Smith, so there are actually records that combine all of the undifferentiated Frank Smiths with their varying middle names or initials. After all, someone might eventually be able to distinguish one of them and give him his own record.
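
For readers who like to see the idea as data, here is a minimal sketch of how an authority record can steer an unauthorized form to the authorized heading. Everything in it is an assumption made for illustration: the headings and variant forms are invented (like the list above), and the names `AUTHORITY_RECORDS`, `normalize`, and `authorized_heading` are hypothetical. It does not reproduce how the Library of Congress or any catalog software actually stores its records.

```python
# Invented authority records: each authorized heading lists the
# unauthorized (variant) forms that should lead a searcher to it.
AUTHORITY_RECORDS = {
    "Smith, Frank R., 1944-": ["Smith, F. R.", "Frank R. Smith"],
    "Smith, Frank, 1847-1892": ["Smith, Francis", "Frank Smith of Boston"],
}


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so trivial differences do not matter."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_index(records: dict) -> dict:
    """Map every known form, authorized or not, to its authorized heading."""
    index = {}
    for authorized, variants in records.items():
        for form in [authorized, *variants]:
            index[normalize(form)] = authorized
    return index


INDEX = build_index(AUTHORITY_RECORDS)


def authorized_heading(query: str):
    """Return the authorized heading for a form, or None if it is unknown."""
    return INDEX.get(normalize(query))


print(authorized_heading("Frank R. Smith"))
print(authorized_heading("smith, francis"))
print(authorized_heading("Smith, Zebulon"))
```

The point of the design is that the searcher never needs to know which form is the authorized one; any form recorded in the authority record leads to the same heading.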

When someone’s name changes, the authorized heading changes, and the former heading becomes one of the unauthorized forms in the record. The same thing happens when countries change their names. What was called Belgian Congo became Congo at independence. Long-time dictator Mobutu changed it to Zaire, and his successor changed it back to Congo, unfortunately. I say unfortunately because there is another country called Congo just across the river. The Library of Congress Name Authority File has sorted it all out, with proper dates and documentation.

Another kind of complication arises whenever a work is known by more than one title. That happens any time something is translated into another language. It also happens any time something, a musical composition perhaps, has a generic title like “sonata” or “symphony.” If three people translate one Russian novel into English, each translation is likely to have a different English title. I don’t even want to think about how many ways Beethoven’s “Moonlight” Sonata can be expressed in English alone. The word “moonlight” doesn’t necessarily appear at all!

The preferred title for every work is in the name authority file and not a title authority file simply because more often than not, works are associated with a particular author, and therefore follow the author’s authorized heading.

There is a separate Library of Congress Subject Authority File. Most academic libraries use it. Most public libraries use something else. The National Library of Medicine and the National Agricultural Library also have their own subject vocabularies. Children’s libraries use a different subject file from any of those used for adults. The English language is rich in synonyms, and every controlled vocabulary chooses one and only one term for each concept. Where most subject files use “Cancer,” the National Library of Medicine uses “Neoplasms.”

So how on earth is an ordinary person supposed to use controlled vocabulary? There seems to be no way to figure it out! Fortunately, there is an easy way.

  1. Start with a keyword search. If you know an author’s name and a word in the title, that’s good. Unlike Google, though, the catalog probably requires you to put “and” between the terms. As for a title word, the less common the better.
  2. Select something from the resulting screen, assuming, of course, that the results contain the right kind of thing. Otherwise, start over.
  3. When you see the resulting record, you will notice that the names and subjects are hot links. That’s the controlled vocabulary for the library you’re in.
  4. Click on any link to find everything the library has that’s associated with that name or subject.
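
The four steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the three records are invented, the functions `keyword_search` and `follow_link` are hypothetical names, and a real catalog indexes far more fields than title and author.

```python
import re

# Invented records; the titles, names, and headings are made up.
RECORDS = [
    {"title": "Lighthouses of the Northern Coast",
     "author": "Doe, Jane, 1950-",
     "subjects": ["Lighthouses--History"]},
    {"title": "Keepers of the Light",
     "author": "Doe, Jane, 1950-",
     "subjects": ["Lighthouses--History"]},
    {"title": "Coastal Birds",
     "author": "Roe, Richard",
     "subjects": ["Birds"]},
]


def words(text):
    """Split text into a set of lowercase words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(records, query):
    """Step 1: keep records whose title or author contains every term.

    The word "and" is treated as the operator joining terms, not as a term.
    """
    terms = words(query) - {"and"}
    return [r for r in records
            if terms <= words(r["title"] + " " + r["author"])]


def headings(record):
    """Step 3: the controlled-vocabulary 'hot links' shown on a record."""
    return [record["author"], *record["subjects"]]


def follow_link(records, heading):
    """Step 4: everything in the catalog that carries exactly this heading."""
    return [r for r in records if heading in headings(r)]


hits = keyword_search(RECORDS, "doe and northern")     # step 1
chosen = hits[0]                                       # step 2
related = follow_link(RECORDS, chosen["subjects"][0])  # steps 3 and 4
print([r["title"] for r in related])
```

The keyword search finds only the one book whose words match, but following its subject link also turns up the second book, which shares the heading without sharing the keywords.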

Some names will not be hot links if no authority record has been written yet. Authorized name/title entries may or may not be links, depending on the capabilities of the library’s software. Subject headings present a special opportunity. Many of them have at least one subheading, and maybe as many as four. In well-designed catalog software, the main heading and all of the subheadings should be underlined separately. If you click on the main heading, you will get a list of all the subheadings along with it. If you are only interested in the last subheading, click on that; you won’t get any of the stuff you don’t want. If there are multiple subheadings, the farther to the right you click, the narrower your results will be. As I say, that only works for well-designed software, and unfortunately, there are plenty of dreadful catalogs.
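
Here is a small sketch of why clicking farther to the right narrows the results. It rests on assumptions: the headings and titles are invented, a double hyphen is taken as the separator between a heading and its subheadings, and `clicked_heading` and `matching_titles` are hypothetical names, not functions from any real catalog software.

```python
SEPARATOR = "--"

# Invented catalog: each title carries one or more subject headings.
CATALOG = {
    "Book A": ["Cancer--Treatment--History"],
    "Book B": ["Cancer--Treatment"],
    "Book C": ["Cancer--Research"],
    "Book D": ["Gardening"],
}


def clicked_heading(heading, position):
    """Return the heading up to and including the clicked part.

    Position 0 is the main heading; higher positions are subheadings.
    """
    parts = heading.split(SEPARATOR)
    return SEPARATOR.join(parts[: position + 1])


def matching_titles(catalog, heading, position):
    """List titles with a subject that begins with the clicked heading."""
    target = clicked_heading(heading, position).split(SEPARATOR)
    return [
        title
        for title, subjects in catalog.items()
        if any(s.split(SEPARATOR)[: len(target)] == target for s in subjects)
    ]


full = "Cancer--Treatment--History"
for position in range(3):
    print(clicked_heading(full, position),
          matching_titles(CATALOG, full, position))
```

Clicking the main heading matches three of the invented books, the first subheading matches two, and the last subheading matches only one.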

A record for printed music in WorldCat

Using that four-step search strategy, you will find everything that your library has related to the hot link you select. (By the way, the call number will also be a hot link. You can browse the “shelves” from the computer.) Or, you can use WorldCat to find out what’s available all over the world. Whether you are using a small public library, a university library, or WorldCat, the catalogs all work on the same principles of controlled vocabulary even if they use different subject lists. The only difference is that you will have greater need to narrow your searches in WorldCat.

Take a look at the screen shot of the Newberry Library’s catalog. The tab for “quick search” is open, and there is also a tab for “advanced search.” It is unusual to have so many choices for a quick search. Too often, a quick search screen will have only a single search box, based on the misguided notion that making the catalog look more like a search engine will make it easier to use. Except for the initial keyword search, you’ll be better off with whatever option the catalog provides for an advanced search, guided search, or whatever it may be called. But your first task is always to find the controlled vocabulary. Once you learn to use that, it gives you much more precise results than any search engine can, as well as manageable numbers of hits.
