Technology

Kadaxis combines several machine learning algorithms with data capture and processing systems to derive insights and metadata from a manuscript. We have processed and analyzed tens of thousands of books from multiple sources, including Project Gutenberg (see our classification of Gutenberg books) and self-published books offered for free. This raw book content is then combined with book metadata, bestseller lists, and reader sentiment from several online sources and APIs. After parsing, cleaning, and sorting this dataset, we ran hundreds of machine learning experiments to create models that can read a new book and then recommend the best keywords to use, determine which BISAC categories it belongs to, find similar books, and identify the locations where the book is set, who the characters are, and more.
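
To give a feel for what "recommending keywords" can mean in practice, here is a minimal sketch in Python that ranks a manuscript's words by TF-IDF against a small reference corpus. It is an illustration only and not Kadaxis's actual models: the function names, the tiny stop-word list, and the sample texts are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this example only.
STOPWORDS = {"the", "a", "an", "in", "while", "through", "across"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def suggest_keywords(manuscript, corpus, top_n=3):
    """Rank the manuscript's words by TF-IDF against a reference corpus."""
    docs = [set(tokenize(doc)) for doc in corpus]
    counts = Counter(tokenize(manuscript))
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in docs if word in doc)
        # Smoothed inverse document frequency: rarer words score higher.
        idf = math.log((1 + len(docs)) / (1 + doc_freq)) + 1.0
        scores[word] = (count / total) * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]

# Invented sample texts standing in for a real book corpus.
corpus = [
    "the detective followed the suspect through the city",
    "the captain sailed the ship across the sea",
    "the wizard studied the ancient spell in the tower",
]
manuscript = "the dragon guarded the tower while the dragon slept"
keywords = suggest_keywords(manuscript, corpus, top_n=2)
print(keywords)
```

Words that are frequent in the manuscript but rare in the reference corpus rise to the top, which is why "tower" (also present in the corpus) ranks below words unique to the manuscript.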

Stack

Kadaxis uses the following technologies:

  • Scala programming language
  • MongoDB database
  • Hadoop
  • Machine learning libraries: Weka, Stanford NLP, Mahout

Technology Showcase

Gutenberg Library Organized By Location and BISAC Category

We've classified thousands of books from Project Gutenberg into potential BISAC categories and extracted locations and character names from each book.

View Kadaxis' Gutenberg library index.

Find Out A Manuscript's BISAC Code

In seconds, using our advanced classification algorithms.

Major publishing houses use experts to assign BISAC codes to works for maximum marketability (the codes affect shelf placement in bookstores, online placement, and discoverability in search engines). Assigning codes is a crucial part of book publication, but because over 3,000 codes exist, it is very difficult for a first-time author, or any non-expert, to identify the best category for a book.

The BISAC Classifier has (machine) learned this expertise from publicly available data (initially focusing only on the fiction and juvenile fiction genres) and can predict the most appropriate BISAC code, giving self-published and unsigned authors access to the same expertise as industry pros.
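
As a rough illustration of how a text classifier can learn category labels from examples, the sketch below trains a small multinomial Naive Bayes model with Laplace smoothing on a handful of labelled blurbs. Naive Bayes is our choice for the example and is not necessarily the algorithm the BISAC Classifier uses. The class name and the training sentences are invented. The two labels are the BISAC codes for Fantasy / General (FIC009000) and Mystery & Detective / General (FIC022000).

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesBisacClassifier:
    """Hypothetical toy classifier: multinomial Naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> word frequencies
        self.doc_counts = Counter()              # code -> number of examples
        self.vocabulary = set()

    def train(self, labelled_texts):
        for text, code in labelled_texts:
            tokens = tokenize(text)
            self.doc_counts[code] += 1
            self.word_counts[code].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code in sorted(self.doc_counts):
            # Log prior: how common this code is in the training data.
            score = math.log(self.doc_counts[code] / total_docs)
            total_words = sum(self.word_counts[code].values())
            for token in tokens:
                # Laplace smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[code][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_code, best_score = code, score
        return best_code

# Invented training examples; a real system would need far more data.
training_data = [
    ("A detective investigates a murder in a quiet village", "FIC022000"),
    ("The inspector hunts a killer and uncovers a hidden clue", "FIC022000"),
    ("A young wizard battles a dragon to save the kingdom", "FIC009000"),
    ("An elf queen casts a spell to protect the enchanted forest", "FIC009000"),
]
classifier = NaiveBayesBisacClassifier()
classifier.train(training_data)
fantasy_guess = classifier.predict("A wizard casts a spell on a sleeping dragon")
mystery_guess = classifier.predict("The inspector finds a clue about the killer")
print(fantasy_guess, mystery_guess)
```

With only four training sentences the model is a toy, but the same idea scales: the more labelled books it sees for each code, the better its word statistics separate one category from another.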

Please reach out to us if you'd like to trial this service.