Open Book Genome Project

What is the Open Book Genome Project?

The Open Book Genome Project is an open, community-driven effort to enable a deeper, more holistic understanding of each book's unique characteristics.

How it works

  1. The community collaboratively defines the Book Genome
  2. An open-source Sequencer is designed to extract a book's genome
  3. A Sequencer pipeline is run against a batch of books
  4. Book genome insights are made freely available to the public
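The steps above can be pictured as a small pipeline: a sequencer bundles a set of analysis tasks, runs them over each book in a batch, and collects the results as that book's genome. The sketch below is a hypothetical illustration only. The `Sequencer` class, its `add_task`/`sequence` methods, `run_pipeline`, and the two toy tasks are invented for this example and are not the OBGP Sequencer's actual API; it also assumes a genome can be modeled as a dictionary of named task outputs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a "task" takes a book's full text and returns one genome trait.
Task = Callable[[str], object]


@dataclass
class Sequencer:
    """Hypothetical sequencer (not the real OBGP API): runs named tasks over a text."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add_task(self, name: str, task: Task) -> None:
        self.tasks[name] = task

    def sequence(self, text: str) -> Dict[str, object]:
        # The genome is simply the collected output of every registered task.
        return {name: task(text) for name, task in self.tasks.items()}


def run_pipeline(sequencer: Sequencer, books: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Sequence a batch of books, keyed by some book identifier."""
    return {book_id: sequencer.sequence(text) for book_id, text in books.items()}


def word_count(text: str) -> int:
    # Toy task: number of whitespace-separated tokens.
    return len(text.split())


def top_terms(text: str, n: int = 3) -> List[str]:
    # Toy task: most frequent words longer than three letters.
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [w for w, _ in counts.most_common(n)]


if __name__ == "__main__":
    seq = Sequencer()
    seq.add_task("word_count", word_count)
    seq.add_task("top_terms", top_terms)
    genomes = run_pipeline(seq, {"book1": "The whale, the whale! Call me Ishmael."})
    print(genomes)
    # prints {'book1': {'word_count': 7, 'top_terms': ['whale', 'call', 'ishmael']}}
```

Keeping each task a plain function makes it easy for the community to add or swap traits without touching the pipeline that runs them over a batch.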

Try it

Anyone can try sequencing an open-access book using the OBGP Sequencer™ Google Colab notebook. The interactive notebook runs directly in the browser, so no installation is required.

What's the point?

Once a book has been sequenced, its genome supports many kinds of analysis. For example, one can calculate a book's reading level; extract book identifiers; perform Named-Entity Recognition (NER) to find people, places, and subjects; identify themes; produce summaries; train language models (such as word2vec); extract URLs and citations; identify important pages, like glossaries and tables of contents; and even talk to books.
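A few of the analyses listed above can be sketched with the Python standard library alone. This is a hedged illustration, not the OBGP Sequencer's implementation: it assumes reading level is measured with the Flesch-Kincaid grade level formula (one common metric among several), counts syllables with a rough vowel-group heuristic, and extracts URLs and ISBN-13 identifiers with regular expressions, keeping only ISBNs whose check digit is valid. All function names here are hypothetical.

```python
import re
from typing import List


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent "e" (as in "make") as non-syllabic, but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


URL_RE = re.compile(r"https?://[^\s)>\]]+")


def extract_urls(text: str) -> List[str]:
    # Trim trailing punctuation that usually ends a sentence rather than a URL.
    return [u.rstrip(".,;:") for u in URL_RE.findall(text)]


# ISBN-13: prefix 978 or 979, then 10 more digits, optionally separated by hyphens/spaces.
ISBN13_RE = re.compile(r"\b97[89][-\s]?(?:\d[-\s]?){9}\d\b")


def is_valid_isbn13(digits: str) -> bool:
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def extract_isbn13s(text: str) -> List[str]:
    found = []
    for match in ISBN13_RE.findall(text):
        digits = "".join(c for c in match if c.isdigit())
        if is_valid_isbn13(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "The cat sat on the mat. See https://archive.org/details/foo. ISBN 978-0-306-40615-7."
    print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))  # -1.45
    print(extract_urls(sample))     # ['https://archive.org/details/foo']
    print(extract_isbn13s(sample))  # ['9780306406157']
```

Heuristics like these are cheap to run over a whole batch of books but will misfire on edge cases such as abbreviations that end sentences early, ISBN-10s, or unusual URL formats. Tasks like NER, theme detection, and summarization generally need trained models rather than regular expressions.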

Contributing

Steering Committee

Mek <mek@archive.org>, Internet Archive / OpenLibrary.org
Nolan Windham, student at Claremont McKenna College
Drini Cami, Internet Archive / OpenLibrary.org