Do Writers Write What Readers Want To Read?

Have you ever wondered whether the genres authors most enjoy writing in match the genres readers most enjoy reading? Before self-publishing, all new books for sale were filtered by agents and publishers, who acquired and worked on books they thought would sell well. If there was an oversupply of manuscripts in a particular genre, the competition to be chosen and published within that genre was higher too. Enter self-publishing: now any writer can publish, without filter, into any genre they desire. Given the influx of new books across genres, does the proportion of books in each genre match readers' demand?

(We focused on fiction for this experiment.)

Methodology (or How to Speed Read 3000 books in 3 hours)

To answer our question, we needed a way to read and understand a good-sized sample of self-published books, to determine their genre. You might ask why we couldn't simply use the categories or tags authors themselves apply to their books. The reason is accuracy and consistency - most indie authors don't have years of book categorization experience, working across a number of titles. Even traditionally published books are categorized inconsistently from book to book and from publisher to publisher. The inconsistency is not because publishers are poor at the job, but because standardizing the process would require centralizing the categorization effort. (We've worked with data feeds from all major publishers and have experienced this phenomenon firsthand.) The only way to derive accurate and consistent categorization is to read a large sample of books, understand how each book relates to each category, and assign it accordingly, while ensuring consistency across the sample. One of our systems does just this.

We gave our categorization system over 3000 self-published novels to read and understand (these were books offered free by the author). For each book, our system identified all the topics the novel was about, then used this topical knowledge to assign the book to one or more categories and genres. Overall, our system read over 260 million words and figured out all the genres, categories and topics in the data below, in a few hours.

What writers write

Writers’ Genres

The top genres (by count) detected by our system were Romance, Fantasy and Science Fiction.

Romance was the most popular single genre, with 24.4% of books tagged. Combining Science Fiction and Fantasy, however, gives a total of 32.1%, which suggests that writers enjoy writing in this combined genre more than any other. Literary and Mystery & Detective both came in at around 6%. How does this compare to what readers read?

Readers’ Genres

To understand the genres readers enjoy reading the most, we looked at revenue data. This doesn't incorporate units purchased or read, or ratings, but in aggregate, revenue is a good proxy indicator for reader enjoyment.

Source: Leading book genres worldwide as of January 2014, by revenue (in million U.S. dollars)

The highest selling fiction genres were Romance/Erotica, Crime/Mystery and Science Fiction and Fantasy. Romance was high in both charts, but we can broadly extrapolate that there’s a potentially underserved market for Crime and Mystery & Detective and an oversupply for Science Fiction and Fantasy books (when combining the two genres in our first chart).

The correlation isn’t perfect of course, as our sample size is small, we're not considering units sold vs. price, and the revenue data is based on the less consistent human classification of books. We also assume the novelists in our sample wrote their books for the joy of it, and didn’t select their categories purely for commercial potential. These points aside, for the purposes of this post, the proportional difference in genres across the two charts is interesting.

BISAC Categories

We also wanted to understand the categorical split of each genre, so we dove deeper and analyzed the individual BISAC categories that made up each genre. The chart below is measured by category composition - which analyzes how much of each book belongs to a category. For example, instead of tagging a book as Romance and Fantasy, our systems tell us the book is 30% Romance and 70% Fantasy.

(BISAC is the US publishing industry’s system for categorizing books. You can read all about it here - https://www.bisg.org/tutorial-and-faq)
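To make the difference between counting tags and measuring category composition concrete, here is a minimal sketch. The book names and scores are made up for illustration, and the function names are hypothetical; this is not our production system, only one way the two measures could be computed.

```python
from collections import Counter, defaultdict

# Hypothetical per-book category scores (illustrative values, not study data).
books = {
    "book_a": {"Romance": 3.0, "Fantasy": 7.0},
    "book_b": {"Romance": 5.0},
    "book_c": {"Fantasy": 2.0, "Science Fiction": 6.0},
}


def composition(scores):
    """Normalize raw category scores so they sum to 1.0 for a single book."""
    total = sum(scores.values())
    return {category: score / total for category, score in scores.items()}


def share_by_count(books):
    """Share of books tagged with each category (a book may carry several tags)."""
    counts = Counter(category for scores in books.values() for category in scores)
    return {category: n / len(books) for category, n in counts.items()}


def share_by_composition(books):
    """Average fraction of each book that belongs to each category."""
    totals = defaultdict(float)
    for scores in books.values():
        for category, fraction in composition(scores).items():
            totals[category] += fraction
    return {category: total / len(books) for category, total in totals.items()}


if __name__ == "__main__":
    print(composition(books["book_a"]))
    print(share_by_count(books))
    print(share_by_composition(books))
```

In this toy sample, Romance and Fantasy tie when counting tags, but Romance leads by composition because one book is entirely Romance. Composition shares also sum to 100% across categories, which tag counts do not.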

This chart closely matches our genre chart, but tells us that Romance books typically consist of more granular categories than Science Fiction and Fantasy books. This is somewhat reflected in the number of different BISAC sub-categories for the genres - Romance has almost 50% more sub-categories than Science Fiction and Fantasy combined. It also points to a level of variance in the categories - our system was more easily able to split Romance titles into clearly distinct categories, but for Sci Fi and Fantasy, most content was generalized to Fantasy / General or Science Fiction / General.

(We've also classified tens of thousands of freely available Gutenberg books, which you can browse here. Many of these books were published before BISAC was invented.)

Topics

Next, we dove even deeper to look at the topic composition of our sample of books, and analyzed how much each book was made up of each topic. The topics listed below aren’t industry standard, and were created by our team. Topics allow us to quickly and programmatically understand, at a more granular level, what a book is about.

Given the strong bias for Science Fiction and Fantasy, the top few topics aren't particularly surprising. One observation we can make from this data is that some genres have a higher proportion of genre-specific content than others. For example, a Romance novel will have many romantic scenes and dialogue, and be romance-themed, but the story will often revolve around another topic (western, military, etc.). A Science Fiction or Fantasy novel will usually contain a high proportion of genre-specific content - the whole world of the story will usually relate to the genre. Books in these genres are also likely to encompass elements of other genres. Therein lies the categorization challenge we discussed earlier - should a novel be FIC027130 (Romance / Science Fiction) or FIC028000 (Science Fiction / General) or both? Are the romance elements strong enough for a book to be categorized as a 'Romance' book? Publishers, of course, use knowledge of the book as well as strategic category selection to influence placement of their books on bookstore bookshelves.

A few notes on the topics above. 'Existence' covers concepts such as consciousness, the universe, humanity and realms - elements often found in Sci Fi / Fantasy. 'Vampires' have their own topic (instead of being part of 'Creatures & Monsters'), which reflects the more prominent showing of vampires compared to other monsters in recent fiction. 'Erotica' as a topic is smaller in representation for the reasons we discussed above for 'Romance'.

Conclusion

We speculate that writers write more Sci Fi and Fantasy books because it's simply a lot of fun to create entire worlds with their own rules, creatures and customs. Mystery & Detective or Crime novels, while also fun to write, are often set in our reality, and typically require some technical or specialized knowledge - details that need to be fact-checked and accurate. Many authors in these genres have had prior experience in the field, or have spent significant effort researching their topics. These books will often teach the reader something, which is appealing to readers.

As a writer, should you switch to Crime and Mystery in order to increase your odds of landing an agent or selling more self-published books? We don't think so. Write in the genre that is the best fit for you, as doing so will be reflected in your published work.

We hope you enjoyed this glimpse into what self-published authors are writing. Please let us know how you interpreted our results in the comments below.

If you'd like to see this data for your book, sign up for Author Checkpoint beta.

How Book Categories Have Changed This Year

Amazon continually updates its browse node categories for books, to cater to the shifting needs of the market. In the last six months, 563 new categories were added, and 122 were removed. There were also a number of category name refinements. Categories serve the purpose of helping readers find similar books. As the number of books allocated to a category fluctuates, the granularity of the categories needs to change too. If categories remained static, they'd become unbalanced, with too few or too many books, which would make browsing and searching for books a challenge.

We've summarized the changes to book categories (browse nodes) that have occurred over the past six months, looking at the impact on various top-level categories.

 

Genres with new categories added

[Chart: categories with new browse nodes added]

There were 345 new 'Teen & Young Adult' categories added in the past six months, which is likely a reflection of the huge increase in YA sales over the past year. 

Only one 'New Adult' category was added (Science Fiction & Fantasy/Fantasy/New Adult & College), to take the total to two categories (the other is: Romance/New Adult & College). Rather than expand this newer category, the breadth of 'Teen & Young Adult' has been expanded to accommodate the influx of titles in this area.

Additions to 'Computers & Technology' were dominated by 'Software' (Adobe, Enterprise Applications and Business) and 'Web Development & Design' (Programming and Web Design) sub-categories. 

The largest increase to the 'Religion & Spirituality' genre was in the 'Religious Studies' sub-category, comprising 24 of the 52 additions.

Teen & Young Adult Category Additions

Digging deeper into the 'Teen & Young Adult' genre, we see that of the 345 additions made, 152 were in fiction and 193 were in non-fiction. These were broken up as follows:

[Chart: Teen & Young Adult category additions]

[Chart: genres with categories removed]

The 'Computers & Technology' genre had the most categories removed (88), but had an almost equal number of categories added (87). Technology experiences rapid changes, so a commensurate shift in categorization of the subject matter is likely to occur.

The second largest genre with categories removed was 'Crafts, Hobbies & Home', and within that genre, almost half were in the 'Home Improvement & Design' sub-category.

For a full list of the removed categories, click here.

Conclusion

Changes in browse node categories reflect shifts in the type of books available for sale. An increase in categories for a genre is likely driven by a combination of additional supply of books in the genre, and of increased effort to improve the searchability of those books. One may speculate how these two factors reflect increased demand for books in a genre.

 

9 Common Keyword Mistakes

Coming up with relevant and effective keywords is hard! Keeping them up-to-date and optimized for the number of sales your book is currently making is even harder. Here are common mistakes we see authors make when implementing their keyword strategy on Amazon:

1. Choosing keywords that are too broad

2. Not validating that a keyword is commonly used by customers

3. Choosing keywords without much traffic

4. Not monitoring keyword progress (checking search rank for a book)

5. Leaving keywords unchanged for a month or longer

6. Choosing keywords that are too competitive for their book

7. Repeating terms across keywords

8. Not aligning keyword strategy with external marketing activities (to capitalize on sales rank increases)

9. Not having a keyword strategy!
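Some of these checks are easy to automate. As one example, mistake 7 (repeating terms across keywords) can be spotted with a few lines of code. This is a minimal sketch: the function name and the sample keywords are hypothetical, and it only splits on whitespace rather than handling plurals or punctuation.

```python
from collections import defaultdict


def repeated_terms(keywords):
    """Map each term that appears in more than one keyword phrase to those phrases."""
    seen = defaultdict(list)
    for phrase in keywords:
        # Use a set so a term repeated inside one phrase is only counted once.
        for term in set(phrase.lower().split()):
            seen[term].append(phrase)
    return {term: phrases for term, phrases in seen.items() if len(phrases) > 1}


if __name__ == "__main__":
    sample = ["space opera romance", "alien romance", "military science fiction"]
    print(repeated_terms(sample))
```

Here 'romance' is flagged because it appears in two of the three phrases, so one of those slots could be used for a different term.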

How Copyediting Could Be Disrupted

A human copyeditor is unlikely to be completely displaced by a machine, but a significant portion of common copy edits to manuscripts could be automated. A primitive tool to assist with copyediting exists (AutoCrit), which suggests changes to text based on readability and other metrics. An advanced tool could be created to capture micro edits across multiple manuscripts, compare these edits and then automatically apply the changes where confidence in the change is high.

Publishing houses are best placed to create these specialized copyediting knowledge bases. They could start by installing software on editors' machines to capture each line-edit and log it to a central database. A copyediting rules engine would then analyze the before and after text changes using part-of-speech (POS) tagging to disambiguate word-categories. After collecting enough examples of similar edits, a rule could be learned by the system, and applied to similar occurrences in new text. These rules would be saved as templates that understand POS tagging. A rules-based library already exists that could easily be adapted to support this system.
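The idea of learning a rule from repeated before-and-after edits can be sketched in a few lines. Everything here is an assumption made for illustration: the tiny `TOY_POS` lookup stands in for a real POS tagger, the rule shape (removed words, inserted words, POS tag of the following word) is one possible template among many, and `min_support` is an arbitrary threshold. It is not the rules-based library mentioned above.

```python
import difflib
from collections import Counter

# Toy part-of-speech lookup, a stand-in for a real POS tagger (assumption).
TOY_POS = {
    "the": "DET", "report": "NOUN", "plan": "NOUN", "results": "NOUN",
    "was": "VERB", "is": "VERB", "very": "ADV",
    "good": "ADJ", "clear": "ADJ", "long": "ADJ",
}


def tag(tokens):
    """Return one POS tag per token; unknown words get the tag 'X'."""
    return [TOY_POS.get(token.lower(), "X") for token in tokens]


def extract_edits(before, after):
    """Yield (removed, inserted, POS of the next word) for each changed span."""
    b, a = before.split(), after.split()
    tags = tag(b)
    matcher = difflib.SequenceMatcher(a=b, b=a, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        next_pos = tags[i2] if i2 < len(b) else "END"
        removed = tuple(word.lower() for word in b[i1:i2])
        inserted = tuple(word.lower() for word in a[j1:j2])
        yield (removed, inserted, next_pos)


def learn_rules(edit_pairs, min_support=2):
    """Keep only edits seen at least min_support times across all logged pairs."""
    counts = Counter(
        edit for before, after in edit_pairs for edit in extract_edits(before, after)
    )
    return {edit for edit, n in counts.items() if n >= min_support}


def apply_rules(text, rules):
    """Apply learned replace/delete rules where the words and POS context match."""
    tokens = text.split()
    tags = tag(tokens)
    out, i = [], 0
    while i < len(tokens):
        for removed, inserted, next_pos in rules:
            end = i + len(removed)
            window = tuple(token.lower() for token in tokens[i:end])
            following = tags[end] if end < len(tokens) else "END"
            if removed and window == removed and following == next_pos:
                out.extend(inserted)
                i = end
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    logged_edits = [
        ("The report was very good", "The report was good"),
        ("The plan is very clear", "The plan is clear"),
        ("The results was long", "The results were long"),
    ]
    rules = learn_rules(logged_edits)
    print(rules)
    print(apply_rules("The report is very long", rules))
```

In this toy log, deleting 'very' before an adjective appears twice and becomes a rule, while the single 'was' to 'were' correction does not yet have enough support. Pure insertions are learned but not applied in this sketch, to keep the matching logic short.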

The new copyedit system will undoubtedly suggest suboptimal changes, or multiple text alternatives. In this scenario, a human would verify the change. The system would learn which changes were preferable, under which circumstances, until it has enough knowledge and confidence to apply edits automatically. The review process could be extended to include feedback from book reviewers, to rate the most effective changes.
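One simple way to model this review loop is to track how often editors accept each rule's suggestions and only automate a rule once it has been reviewed enough times with a high acceptance rate. The sketch below is an assumption-laden illustration: the class and function names, the smoothed acceptance rate, and the threshold values are all hypothetical choices, not a description of an existing system.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Running tally of editor decisions for one learned rule."""
    accepted: int = 0
    rejected: int = 0

    def record(self, was_accepted: bool) -> None:
        """Log one editor decision on a suggestion made by this rule."""
        if was_accepted:
            self.accepted += 1
        else:
            self.rejected += 1

    @property
    def confidence(self) -> float:
        # Smoothed acceptance rate: starts at 0.5 with no reviews, so a rule
        # with one lucky acceptance is not treated as certain.
        return (self.accepted + 1) / (self.accepted + self.rejected + 2)


def decide(stats: RuleStats, auto_threshold: float = 0.9, min_reviews: int = 10) -> str:
    """Return 'auto-apply' for well-reviewed, trusted rules, else 'ask-editor'."""
    reviews = stats.accepted + stats.rejected
    if reviews >= min_reviews and stats.confidence >= auto_threshold:
        return "auto-apply"
    return "ask-editor"


if __name__ == "__main__":
    stats = RuleStats()
    print(decide(stats))  # a new rule always goes to a human first
    for _ in range(20):
        stats.record(True)
    print(stats.confidence, decide(stats))
```

Requiring both a minimum number of reviews and a high acceptance rate keeps a human in the loop for new or contested rules, while rules that editors consistently accept graduate to automatic application.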

It's unlikely the system could turn good writing into great writing, but at the very least, it could learn enough Strunk-like style suggestions to improve poor writing via rule-based templates, for example to 'use the active voice', 'omit needless words' and 'put statements in positive form'.

 

Slush Filter beta version released!

Kadaxis is pleased to announce the beta release of Slush Filter, a tool for literary agents and publishers. Slush Filter accepts fiction manuscripts of 40,000 words or more, in doc, docx, txt and ePub formats, and provides a machine-generated report in seconds. Each report contains:

  • A recommendation on whether to review the manuscript (based on potential marketability)
  • BISAC Codes
  • Comp Titles
  • Locations and character names

Please email [email protected] for an unlimited trial license.