InfoconDB

Presented at A New HOPE (2022), July 22, 2022, 2 p.m. (50 minutes).

This talk will examine book digitization at scale and some interesting technical details of how an operation like this actually works. For example: why is the page-turning not automated? What are the building blocks of such a system? What were some of the most significant (and unexpected) issues encountered while scaling this system to digitize over one million books a year on the Internet Archive books digitization platform?

Why do this in the first place, one may ask? In short, because accessibility drives preservation and, for an increasing number of use cases, if a book is not easily accessible online, it might as well not exist. Moreover, digital artifacts have properties that mirror those of physical ones, in that they are easy to distribute (and easy to censor!), which means that once the expensive task of creating one is done, the problem is only one of access control. There is a lively policy discussion about what these access controls can and should be, but the argument here is that it is important not only that we invest in creating the digital artifacts, but also that these are maintained by some type of lender of last resort.

This talk will discuss how people can make digital libraries part of their lives, and how these libraries can improve those lives. Digital books are often misunderstood as an alternative to physical ones. In fact, they are a complement, working together to give us better knowledge. Digital books allow full-text search and direct linking, and can support embedded digital media. This talk will also discuss a few of these use cases, as well as examples of tools that are available to enrich one's reading and learning experience.

Presenters:

**Davide Semenzin (@lluvt)** develops and maintains the Internet Archive books digitization systems and scan centers, from the book scanning software to the backend services, scan center deployment, and automation. He studied computer science in Italy and got his MSc from Utrecht University in 2013. He co-founded and ran a cloud computing business in roles ranging from services and network engineering to DevOps and orchestration. An avid Borges reader, Davide has been interested in libraries since long before joining the Archive: notably, he has been involved in digital humanities and digital libraries through the Berkeley Prosopography Services project at UC Berkeley since 2012 as a core developer. When his free time doesn't look like work (computers and books), he likes to fly airplanes, fly on airplanes, and play with lasers.

Why Building Digital Libraries Matters
