Search engines, online library catalogs: how they work

Most people begin to search for information using Google or other search engines. They turn to library catalogs later, if at all. When they get to the catalog, they have trouble using it if they expect it to work anything like Google.

Some library and information technology professionals have drawn entirely wrong conclusions from that fact. One faction says it demonstrates that online library catalogs are obsolete, that the software that runs them is old-fashioned and difficult to program in, and therefore that we need to abandon the catalog. Another declares that people would use the catalog more if the user interface were more familiar, and that catalogs ought to have the look and feel of Google.

Meanwhile, library patrons have needed help learning to use the catalog since long before the days of online catalogs, even before the invention of the card catalog. Google seems much easier and more intuitive only because it is a completely different way to search for information. Google is a search engine. The library catalog is a database. Actually, librarians spend as much time explaining how a search engine works as they do explaining the catalog.

Search engines

When a user fills in the search box, the search engine looks in an index it has made of the full content of an unimaginably large number of web pages. It makes the index by “crawling” through all those pages to discover, for example, how many times particular words or phrases occur. If the terms typed into the search box occur a lot on a particular page, the search engine will include that page in the results. Then it tries to put the results in order so that the most helpful pages turn up on the first or second page of results.
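To make the idea of an index concrete, here is a toy sketch in Python. It is not how Google or any real search engine is built; the sample pages, the function names, and the rule of simply adding up word counts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """'Crawl' every page and map each word to {page_id: occurrences}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][page_id] = count
    return index


def search(index, query):
    """Return ids of pages containing every query word, most occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    # Keep only the pages that contain all of the query words.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))

    def score(page_id):
        # Assumed scoring rule: total occurrences of the query words.
        return sum(index[w][page_id] for w in words)

    return sorted(candidates, key=lambda p: (-score(p), p))


# Hypothetical sample pages, invented for this example.
PAGES = {
    "a": "Search engine basics: how a search engine builds an index",
    "b": "Library catalog tips for new patrons",
    "c": "An engine repair manual",
}
INDEX = build_index(PAGES)
```

Calling `search(INDEX, "engine")` looks only at the index, never at the pages themselves, which is why a real search engine can answer so quickly.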

If the words in the search box occur in the title or headings of a page, or if they are in italics or bold in the text, that page will rank higher. The search engine also counts the number of links from other sites to that page. More inbound links means a higher page rank.
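Those two ranking signals can be sketched in a few lines of Python. The weights below are numbers I made up, and the field names (`title`, `emphasized`, `body`) are hypothetical; real search engines keep their formulas secret, so treat this only as an illustration of the principle.

```python
import math

# Assumed weights, chosen only for illustration.
TITLE_WEIGHT = 3.0      # word appears in the title or a heading
EMPHASIS_WEIGHT = 2.0   # word appears in bold or italics
BODY_WEIGHT = 1.0       # word appears in ordinary text
LINK_WEIGHT = 0.5       # bonus that grows with inbound links


def count_term(term, text):
    """Count occurrences of a single word (simple whitespace split)."""
    return text.lower().split().count(term.lower())


def score_page(page, term, inbound_links):
    """Score one page for one search word under the assumed weights."""
    text_score = (
        TITLE_WEIGHT * count_term(term, page["title"])
        + EMPHASIS_WEIGHT * count_term(term, page["emphasized"])
        + BODY_WEIGHT * count_term(term, page["body"])
    )
    if text_score == 0:
        # A page that never mentions the word is not a result at all.
        return 0.0
    # log1p keeps a page with thousands of links from swamping everything.
    return text_score + LINK_WEIGHT * math.log1p(inbound_links)


# Hypothetical pages: same body text, different titles.
PAGE_1 = {"title": "Catalog help", "emphasized": "", "body": "use the catalog"}
PAGE_2 = {"title": "Library help", "emphasized": "", "body": "use the catalog"}
```

With no inbound links, `PAGE_1` outscores `PAGE_2` for the word "catalog" purely because the word is in its title. Give `PAGE_2` enough inbound links and its score climbs as well.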

These are only two parts of a search algorithm that Google and other search engines keep secret and change regularly. In principle, the pages with the highest rank ought to be the most important results among the possibly thousands of pages in the entire list. Are they?

I have had to learn a lot about how a search engine works since I started blogging. Everyone who writes online content wants people to read it. To read it, they must find it. Writers who know about search engines know that they have to choose the right words, so they find out what terms people search on and look at statistics on them. When they choose a term, they carefully put it in the title and use it over and over in the body of the article.

Notice that I have used “search engine” frequently in this article. It’s also in the title. Almost 2 million people in the US search for that term every month. I notice that more than a million people mistype it and search for “serach engine” or “searcg engine” instead.

Hmm. I wonder if using those two misspellings some more will help boost this post: serach engine, searcg engine, serach engine, searcg engine. (That’s called keyword stuffing. If you ever wonder why you get results that use a keyword so often the article is totally unreadable and useless, that could be why. Fortunately, the search engines catch on pretty quickly. A real turkey won’t be at the top of the results for long.)

Then the authors write other articles, visit forums, comment on blogs, sign up for lots of bookmarking sites, and use social media like Facebook and Twitter, all for the sake of linking back to the original article. In other words, whatever pages appear on the first page of search engine results are there because of hard work that someone has done quite beyond researching and writing in the first place.

Someone who doesn’t know all of these techniques might well write a better, more useful article than the ones that appear on the first page, but without playing the rest of the game, it might wind up on the tenth page of the search results. Hardly any searchers routinely look past the first two. So here’s a hint: if you want to make sure that you have found all the most useful results (as defined by you and not a computer algorithm), look through at least 15 or 20 pages of results!

Library catalogs

If finding really good information with a search engine is not as easy as it appears at first glance, neither is an online library catalog as difficult as it seems. When you look at a catalog record, you will probably see a labeled list, and if you pay too much attention to the labels you might find some of them a little odd.

But beneath the hood, the computer has divided the entire record into separate indexes and looks at each one a little differently. Instead of reading the full text of a book or other library item, the library catalog looks at all these indexes of metadata, that is, data about data.

There is an author index, in which each author’s name always appears in the same form. That form must be different from the form used for any other author, even if the name is as common as Jack Smith. There is a subject index, in which each subject is drawn from an official list of terms. There is a title index, where, again, there is one preferred form of the title no matter how many different ways it might appear in various publications or lists. Other indexes include a keyword index, which searches all of these fields and more.
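Here is a rough Python sketch of what separate indexes look like. The three records are invented, the two Jack Smiths and their dates are hypothetical, and a real catalog system stores far more fields than this; the point is only that each index is built from one part of the record, while the keyword index draws on all of them.

```python
from collections import defaultdict

# Invented sample records. Each author heading is unique, so the two
# hypothetical Jack Smiths are told apart by their dates.
RECORDS = [
    {"id": 1, "author": "Smith, Jack, 1901-1980",
     "title": "Gardening in small spaces", "subjects": ["Gardening"]},
    {"id": 2, "author": "Smith, Jack, 1950-",
     "title": "Bridges of the world", "subjects": ["Bridges", "Civil engineering"]},
    {"id": 3, "author": "Smith, Jack, 1901-1980",
     "title": "Roses", "subjects": ["Gardening", "Roses"]},
]


def build_catalog_indexes(records):
    """Build one index per field, plus a keyword index across all fields."""
    names = ("author", "title", "subject", "keyword")
    indexes = {name: defaultdict(set) for name in names}
    for rec in records:
        # Author, title, and subject are indexed by their exact official form.
        indexes["author"][rec["author"]].add(rec["id"])
        indexes["title"][rec["title"]].add(rec["id"])
        for subject in rec["subjects"]:
            indexes["subject"][subject].add(rec["id"])
        # The keyword index breaks every field into individual words.
        all_text = " ".join([rec["author"], rec["title"], *rec["subjects"]])
        for word in all_text.lower().replace(",", " ").split():
            indexes["keyword"][word].add(rec["id"])
    return indexes


def lookup(indexes, index_name, term):
    """Look a term up in one named index and return matching record ids."""
    key = term.lower() if index_name == "keyword" else term
    return sorted(indexes[index_name].get(key, set()))


CATALOG = build_catalog_indexes(RECORDS)
```

A keyword lookup for "smith" returns all three records, but an author lookup for "Smith, Jack, 1950-" returns only the one book by that particular Jack Smith.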

Plenty of earlier posts in this blog deal with how to search an online library catalog. It’s past time for me to update them and put up new versions, just as many popular magazines run some story ideas at least once every year. For now, let me just say that since the keyword index in a library catalog is not searching full text, as a search engine does, don’t be dismayed if you don’t get good results with the single search box! If only one item looks like what you want, you can use it to find lots more.

Look for an option to use “advanced search” or a similar term. Advanced search is almost always easier to use than basic search in any database. That’s where you get to decide which index to search. You won’t be able to guess at the special vocabulary used in the online library catalog, so you ought to start with some kind of keyword search. But the first time you find something that looks useful, all of the official terms will appear as hot links. Follow them, and you’re well on your way.
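That two-step strategy, a keyword search first and then a jump to the official terms, can be sketched in Python. The records and function names are invented for illustration, and no real catalog works exactly this way behind the scenes.

```python
# Invented sample records; only the first title contains the word "WWII".
RECORDS = [
    {"id": 1, "title": "A WWII memoir", "subjects": ["World War, 1939-1945"]},
    {"id": 2, "title": "Normandy landings", "subjects": ["World War, 1939-1945"]},
    {"id": 3, "title": "Cooking for two", "subjects": ["Cooking"]},
]


def keyword_search(records, word):
    """Step 1: find records whose title contains the word the patron typed."""
    word = word.lower()
    return [r for r in records if word in r["title"].lower().split()]


def follow_subject(records, heading):
    """Step 2: 'click' an official subject heading to get everything under it."""
    return [r for r in records if heading in r["subjects"]]


def ids(records):
    """Helper for reading results: just the record ids."""
    return [r["id"] for r in records]
```

Searching for "wwii" finds only record 1. Following that record's subject heading then turns up record 2 as well, even though its title never uses the word the patron typed.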

Photo credit: Search engine marketing. Some rights reserved by Danard Vincente

