This talk will examine book digitization at scale and some of the interesting technical details of how an operation like this actually works. For example: why is the page-turning not automated? What are the building blocks of such a system? What were some of the most significant (and unexpected) issues encountered while scaling the Internet Archive's book digitization platform to over one million books a year?
Why do this in the first place, one may ask? In short, because accessibility drives preservation and, for an increasing number of use cases, a book that is not easily accessible online might as well not exist. Moreover, digital artifacts have properties that are the mirror image of those of physical ones: they are easy to distribute (and easy to censor!), which means that once the expensive task of creating one is done, the remaining problem is one of access control. There is a lively policy discussion about what these access controls can and should be, but the argument here is that it is important not only to invest in creating the digital artifacts, but also to have them maintained by some type of lender of last resort.
This talk will discuss how people can make digital libraries part of their lives, and how these libraries can improve those lives. Digital books are often misunderstood as an alternative to physical ones. In fact, they are a complement, and the two work together to give us better knowledge. Digital books support features such as full-text search, direct linking, and embedded digital media. This talk will also discuss a few of these use cases, along with examples of tools that are available to enrich one's reading and learning experience.