Open Book Genome Project

What is the Open Book Genome Project?

The Open Book Genome Project is an open, community effort to enable a deeper, more holistic understanding of a book's unique characteristics.
Learn more

How it works

The community collaboratively defines the Book Genome
An open-source Sequencer is designed to extract a book's genome
A Sequencer pipeline is run against a batch of books
Book genome insights are made freely available to the public
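
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual OBGP Sequencer API: the names `word_count`, `extract_urls`, `sequence`, and `sequence_batch` are hypothetical, and each "module" is assumed to be a plain function that maps a book's text to one genome field.

```python
import re
from typing import Callable, Dict, List

# Hypothetical: a module turns a book's plain text into one genome field.
Module = Callable[[str], object]


def word_count(text: str) -> int:
    """Count runs of letters and apostrophes as words."""
    return len(re.findall(r"[A-Za-z']+", text))


def extract_urls(text: str) -> List[str]:
    """Return the distinct http(s) URLs found in the text, sorted."""
    return sorted(set(re.findall(r"https?://[^\s<>\"')\]]+", text)))


def sequence(text: str, modules: Dict[str, Module]) -> Dict[str, object]:
    """Run every module against one book and collect the results as its genome."""
    return {name: module(text) for name, module in modules.items()}


def sequence_batch(
    books: Dict[str, str], modules: Dict[str, Module]
) -> Dict[str, Dict[str, object]]:
    """Run the same set of modules against a batch of books keyed by identifier."""
    return {book_id: sequence(text, modules) for book_id, text in books.items()}


if __name__ == "__main__":
    modules = {"word_count": word_count, "urls": extract_urls}
    books = {"example-book": "Read more at https://example.org/about today"}
    print(sequence_batch(books, modules))
```

Keeping each module as an independent function means the community can agree on a new genome field and add it without touching the rest of the pipeline.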

Try it

Anyone can try sequencing an open access book using the OBGP Sequencer™ Google Colab Notebook. This interactive notebook runs directly in the browser, with no installation required.

What's the point?

Once one has a book genome, there's a lot one can do. One can calculate the reading level of books. One can also extract book identifiers; perform Named-Entity Recognition (NER) of people, places, and subjects; identify themes; produce summaries; train language models (like word2vec); extract URLs and citations; identify important pages, such as glossaries and tables of contents; and even talk to books.
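
As one concrete example, a reading level can be estimated from a book's text with the Flesch-Kincaid grade level formula. The sketch below is an assumption about how such a measure might be computed, not a description of what the OBGP Sequencer does: the function names are hypothetical, and syllables are approximated by counting vowel groups, which is a crude heuristic.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (heuristic, at least 1 per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the Flesch-Kincaid grade level of a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade level coefficients.
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    print(flesch_kincaid_grade("The cat sat. The dog ran."))
```

Short sentences built from short words score low, while long sentences full of polysyllabic words score high, so the number is best read as a rough comparison between books and not as an exact school grade.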

Contributing

Contribute to our Sequencer library on GitHub
Request an invitation to join our community Slack

Steering Committee

Mek <mek@archive.org>	Internet Archive / OpenLibrary.org
Nolan Windham	Student at Claremont McKenna College
Drini Cami	Internet Archive / OpenLibrary.org