Digitizing old books

One of the oldest of printed books, New York Public Library’s Gutenberg Bible

Not everything is available for free on the Internet.

Once upon a time, the list of material not available for free on the Internet included almost every book ever published. Unless a book or other printed work was either currently in print or held by multiple libraries, it wasn’t conveniently available to much of anyone.

Now, many libraries and archives are digitizing their collections. Not only old books, but old pamphlets, sheet music, maps, manuscripts, etc. have become more accessible than ever before.

I have been writing a series of posts on Civil War music for my blog Musicology for Everyone. I have relied heavily on the Library of Congress’ collection of digitized Civil War sheet music. Before it went online, I and anyone else interested in viewing the collection would have had to travel to Washington.

Besides the efforts of large and small libraries and archives, at least two major digitization projects are underway.

Project Gutenberg

The oldest predates the Internet. Michael Hart, a student at the University of Illinois, obtained an account on the university’s mainframe computer with nearly unlimited time. To give back, he decided to make 10,000 or so heavily consulted books publicly available, either free or at very low cost, by the end of the century. When he digitized his copy of the Declaration of Independence in 1971, Project Gutenberg was born.

The University of Illinois computer was an original part of what eventually became the Internet, but that was way in the future. Personal computers, if they existed at all, were made from kits by hobbyists. The first pre-assembled small computers (Apple II, PET 2001, TRS-80) all first appeared in 1977.

And yet Hart was convinced that the general public would one day have access to computers, and he wanted to make useful content available on them. The only method available at the time was to type the content by hand, so Hart began to recruit volunteers to help him. Usable image scanners and optical character recognition software did not become available until 1986.

By now, Project Gutenberg has far outstripped Hart’s original goal. It has digitized some 38,000 books. All of them are in the public domain. That means most of them were originally published before 1923.

You can read Project Gutenberg books online or download them to any ebook reader. The books are free, but the project solicits donations.

Google Books

Just a few of the old books at the British Library

In partnership with several large, mostly academic libraries, Google began to scan both public domain books and works still under copyright in December 2004. More libraries joined the project. Beginning in 2006, these began to include libraries in non-English-speaking countries.

By now, Google Books has scanned more than 20,000,000 books and magazines. In August 2010 it made an inventory of all known extant books worldwide and determined there are just under 130 million. It plans to digitize all of them by 2020.

Google Books makes several views available, depending on the copyright status of the book. If a book is in the public domain, “full view” enables readers to download it for free.

If a publisher has given Google permission to scan a book under copyright, the “preview” gives readers limited access to portions of the book. If a publisher has denied Google permission to issue a “preview,” Google provides only “snippets,” just two or three lines of text. Google has also scanned numerous books for which it provides nothing but the title.

Problems with Google Books

The project, originally called the Google Print Library Project, immediately faced lawsuits from publishers who claimed copyright violation and failure to give adequate compensation to authors and publishers. Most of these suits were settled by 2008, but in 2009 a French court forced Google to stop digitizing copyrighted books in France. A suit by visual artists (including photographers and illustrators) has resulted in a federal judge rejecting the original settlement of American suits.

Unlike Project Gutenberg, Google does not have its scans proofread before putting them online. As a result, pages may be in the wrong order, scanned upside down, or simply unreadable. Google doesn’t make it easy for readers to report these and other problems, either.

Possible futures

Not everything is available for free on the Internet.

But Project Gutenberg and Google Books, among other projects, have already made a tremendous amount of content freely available. If Google succeeds in its stated intention to digitize every extant book in the entire world, an enormous share of the world’s information will be freely available.

And not only on the Internet, by the way. You can download these books to your ebook readers, too.

Of course, material under copyright and other proprietary material will continue to cost a lot of money if it’s available on the Internet at all. The latest US copyright law put off by 20 years the time when anything published in 1923 or later goes into the public domain.

If publishers get the copyright term extended again before those 20 years are over, maybe nothing will ever enter the public domain again. In that case, much of what Google has digitized will likely never be available at all.

Photo credits.
Gutenberg Bible. Public domain, from Wikimedia
British Library stacks. Some rights reserved by Steve Cadman.


Digitizing old books — 2 Comments

  1. Dear David,
    Thank you for your article. I have a question. Why digitise the books? Why not simply scan them in as they are and let the readers read them as in the original books? Quicker, simpler and so on. Is there a reason that I am not spotting?
    Thank you and best wishes, Luke Wiseman

    • The Google Books project I described works by scanning the books to create digital files. I expect nearly every other comparable project also begins by scanning the book. Project Gutenberg started out by typing the text, but only because usable scanners didn’t exist yet.

      It’s possible to read these digital files on various electronic devices. I suppose if someone wanted to, they could print the file, too.

      Thanks for your question, and thanks for reading Reading, Writing, Research.
