Have you ever wondered whether the genres authors most enjoy writing match the genres readers most enjoy reading? Before self-publishing, every new book offered for sale had been filtered by agents and publishers, who acquired and developed the books they thought would sell well. If there was an oversupply of manuscripts in a particular genre, the competition to be chosen and published in that genre would be higher too. Enter self-publishing: now any writer can publish, without a filter, in any genre they like. Given the resulting influx of new books across genres, does the proportion of books in each genre match readers' demand?
(We focused on fiction for this experiment.)
Methodology (or How to Speed Read 3,000 Books in 3 Hours)
To answer our question, we needed a way to read and understand a good-sized sample of self-published books to determine their genres. You might ask why we couldn't simply use the categories or tags authors apply to their own books. The reason is accuracy and consistency: most indie authors don't have years of experience categorizing books across many titles. Even traditionally published books are categorized inconsistently from book to book and from publisher to publisher. That isn't because publishers are poor at the job, but because standardizing the process would require centralizing the categorization effort. (We've worked with data feeds from all the major publishers and have seen this firsthand.) The only way to derive accurate and consistent categories is to read a large sample of books, understand how each book relates to each category, and assign categories accordingly, while keeping the assignments consistent across the sample. One of our systems does just this.
We gave our categorization system over 3,000 self-published novels to read and understand (all offered free by their authors). For each book, the system identified the topics the novel covered, then used that topical knowledge to assign the book to one or more categories and genres. In total, it read over 260 million words and derived all the genres, categories and topics in the data below in a few hours.
What writers write
The top genres (by count) detected by our system were Romance, Fantasy and Science Fiction.
Romance was the single most common genre, with 24.4% of books tagged as Romance. If we combine Science Fiction and Fantasy, though, the pair accounts for 32.1% of books, which suggests writers enjoy writing in this combined genre more than any other. Literary and Mystery & Detective each came in at around 6%. How does this compare to what readers read?
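Because our system can assign a book to more than one genre, there are two ways to combine Science Fiction and Fantasy: add their per-genre shares, or count the books tagged with either one. The two differ whenever a book carries both tags. Here is a minimal Python sketch of both calculations; the five-book sample and its tags are hypothetical, and this is an illustration rather than our system's code.

```python
from collections import Counter

# Hypothetical sample (made-up titles and tags): each book maps to the set
# of genres it was tagged with. A book may carry more than one genre.
books = {
    "book_a": {"Romance"},
    "book_b": {"Fantasy"},
    "book_c": {"Science Fiction", "Fantasy"},
    "book_d": {"Mystery & Detective"},
    "book_e": {"Romance", "Fantasy"},
}

def genre_shares(tagged_books):
    """Percentage of books tagged with each genre."""
    counts = Counter(genre for genres in tagged_books.values() for genre in genres)
    total = len(tagged_books)
    return {genre: 100 * n / total for genre, n in counts.items()}

def combined_share(tagged_books, merged):
    """Percentage of books tagged with at least one genre in `merged`.

    Counting the union means a book tagged with both genres is counted once,
    whereas adding the per-genre shares would count it twice.
    """
    hits = sum(1 for genres in tagged_books.values() if genres & merged)
    return 100 * hits / len(tagged_books)

shares = genre_shares(books)
print(shares["Fantasy"])  # 60.0
print(shares["Science Fiction"])  # 20.0
print(combined_share(books, {"Science Fiction", "Fantasy"}))  # 60.0
```

With this sample, adding the two per-genre shares gives 80%, while the union gives 60%, because book_c is counted once rather than twice.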
To understand which genres readers enjoy most, we looked at revenue data. Revenue doesn't capture units purchased or read, or ratings, but in aggregate it is a good proxy for reader enjoyment.
The highest-selling fiction genres were Romance/Erotica, Crime/Mystery, and Science Fiction and Fantasy. Romance ranked high in both charts, but broadly, we can extrapolate that there's a potentially underserved market for Crime and Mystery & Detective books and an oversupply of Science Fiction and Fantasy books (when the two genres are combined, as in our first chart).
The comparison isn't perfect, of course: our sample is small, we're not accounting for units sold versus price, and the revenue data relies on less consistent human classification of books. We also assume the novelists in our sample wrote their books for the joy of it and didn't choose their genres purely for commercial potential. These caveats aside, the difference in genre proportions between the two charts is interesting.
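The comparison above boils down to setting each genre's share of books (supply) against its share of revenue (demand). Here is a minimal Python sketch of that ratio, using placeholder genre names and made-up percentages rather than the figures in our charts; the 10% tolerance band is an arbitrary choice for illustration.

```python
# Made-up shares in percent, for illustration only (NOT our chart figures).
# supply: share of books written in each genre
# demand: share of fiction revenue earned by each genre
supply = {"Genre A": 25.0, "Genre B": 32.0, "Genre C": 6.0}
demand = {"Genre A": 30.0, "Genre B": 10.0, "Genre C": 20.0}

def supply_demand_ratios(supply, demand):
    """Supply share divided by demand share, for genres with nonzero demand."""
    return {
        genre: supply[genre] / demand[genre]
        for genre in supply
        if demand.get(genre, 0) > 0
    }

def label(ratio, tolerance=0.1):
    """Classify a ratio; the tolerance band around 1 is arbitrary.

    Below 1: the genre earns more revenue share than book share (underserved).
    Above 1: it has more book share than revenue share (oversupplied).
    """
    if ratio < 1 - tolerance:
        return "underserved"
    if ratio > 1 + tolerance:
        return "oversupplied"
    return "roughly balanced"

ratios = supply_demand_ratios(supply, demand)
for genre, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{genre}: {ratio:.2f} ({label(ratio)})")
```

Given the caveats above (small sample, revenue rather than units, human-classified revenue data), a ratio like this is a rough signal, not a precise measure of unmet demand.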
We also wanted to understand how each genre splits into categories, so we dug deeper and analyzed the individual BISAC categories that make up each genre. The chart below is measured by category composition, which estimates how much of each book belongs to each category. For example, instead of simply tagging a book as Romance and Fantasy, our system tells us the book is 30% Romance and 70% Fantasy.
(BISAC is the US publishing industry's system for categorizing books by subject. You can read all about it here: https://www.bisg.org/tutorial-and-faq)
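Category composition can be rolled up into chart-level figures by averaging each category's fraction over all books, then summing categories into genres. Here is a minimal Python sketch, assuming each book's fractions sum to 1 and (a simplification for illustration) that a category's genre is the part of its name before the first " / ", so Romance / Science Fiction counts toward Romance. The books and fractions are hypothetical, and this is not our system's code.

```python
import math
from collections import defaultdict

# Hypothetical per-book category compositions: the fraction of each book that
# belongs to each category. Names follow BISAC style; the numbers are made up.
compositions = [
    {"Romance / General": 0.3, "Fantasy / General": 0.7},
    {"Science Fiction / General": 1.0},
    {"Romance / Science Fiction": 0.5, "Science Fiction / General": 0.5},
]

def category_shares(books):
    """Average each category's fraction across all books in the sample."""
    totals = defaultdict(float)
    for composition in books:
        # Assumption: each book's fractions describe the whole book.
        if not math.isclose(sum(composition.values()), 1.0):
            raise ValueError(f"fractions do not sum to 1: {composition}")
        for category, fraction in composition.items():
            totals[category] += fraction
    return {category: total / len(books) for category, total in totals.items()}

def rollup_to_genres(shares):
    """Sum category shares into genres (assumption: genre = text before ' / ')."""
    genres = defaultdict(float)
    for category, share in shares.items():
        genres[category.split(" / ")[0]] += share
    return dict(genres)

by_category = category_shares(compositions)
by_genre = rollup_to_genres(by_category)
for genre, share in sorted(by_genre.items(), key=lambda kv: -kv[1]):
    print(f"{genre}: {share:.1%}")
```

Unlike counting tags, this approach never double-counts a book: because every book contributes fractions that sum to 1, the category shares (and the genre shares) for the whole sample also sum to 1.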
This chart closely matches our genre chart, but it tells us that Romance books are typically split across more granular categories than Science Fiction and Fantasy books are. This is partly reflected in the number of BISAC subcategories for each genre: Romance has almost 50% more subcategories than Science Fiction and Fantasy combined. It also points to a difference in how distinct the categories are: our system could more easily split Romance titles into clearly distinct categories, whereas most Sci Fi and Fantasy content was generalized to Fantasy / General or Science Fiction / General.
(We've also classified tens of thousands of freely available Gutenberg books, which you can browse here. Many of these books were published before BISAC was invented.)
Next, we went deeper still and looked at the topic composition of our sample, measuring how much of each book was made up of each topic. The topics listed below aren't an industry standard; our team created them. Topics allow us to quickly and programmatically understand what a book is about at a more granular level.
Given the strong bias toward Science Fiction and Fantasy, the top few topics aren't particularly surprising. One observation we can make from this data is that some genres have a higher proportion of genre-specific content than others. For example, a Romance novel will have many romantic scenes and much romantic dialogue, and will be romance-themed, but the story will often revolve around another topic (western, military, etc.). A Science Fiction or Fantasy novel, by contrast, will usually contain a high proportion of genre-specific content, because the whole world of the story relates to the genre. Books in these genres are also likely to include elements of other genres. Therein lies the categorization challenge we discussed earlier: should a novel be FIC027130 (Romance / Science Fiction), FIC028000 (Science Fiction / General), or both? Are the romance elements strong enough for the book to be categorized as 'Romance'? Publishers, of course, use their knowledge of the book along with strategic category selection to influence where their books are placed on bookstore shelves.
A few notes on the topics above. 'Existence' covers concepts such as consciousness, the universe, humanity and realms, elements often found in Sci Fi and Fantasy. 'Vampires' have their own topic (instead of being part of 'Creatures & Monsters'), which reflects how much more prominent vampires have been than other monsters in recent fiction. 'Erotica' is less represented as a topic for the same reasons we discussed above for Romance.
We speculate that writers write more Sci Fi and Fantasy books because it's simply a lot of fun to create entire worlds with their own rules, creatures and customs. Mystery & Detective and Crime novels, while also fun to write, are often set in the real world and typically require technical or specialized knowledge, with details that need to be fact-checked and accurate. Many authors in these genres have prior experience in the field, or have spent significant effort researching their topics. These books often teach the reader something, which is part of their appeal.
As a writer, should you switch to Crime and Mystery to improve your odds of landing an agent or selling more self-published books? We don't think so. Write in the genre that fits you best, because that fit will show in your published work.
We hope you enjoyed this glimpse into what self-published authors are writing. Please let us know how you interpreted our results in the comments below.
If you'd like to see this data for your book, sign up for the Author Checkpoint beta.