This post takes a deep dive into how Amazon's search engine works. We'll touch on information-theory concepts, and aim to use as little technical jargon as possible. If it's all a bit much, skip to the end to read the summary of takeaways to help you choose and optimize keywords for your book.
Note: 'keyword' in this post refers to the seven keywords authors can enter into KDP, or the keywords publishers send to Amazon in their ONIX files. 'Keywords' covers both single terms and key phrases (multiple terms).
Finding New Keywords
As you've no doubt guessed, Amazon logs every search request a person makes, and stores this information. Every so often, this data is analyzed to find new keywords to use in the search engine. This way, when a lot of people start using a new keyword (e.g. because of a trending topic in the news) books matching these new terms will show up in search results.
It pays to regularly review your keywords to incorporate the latest trending keywords. Doing so means you'll probably have less competition for the new terms, since most authors and publishers set their keywords once and leave them. Less competition and increased search volume mean it's easier to rank highly for these valuable terms. Keeping up-to-date with trending keywords related to your book will give you a distinct edge.
A search query isn't a keyword
Most people who write about Amazon keywords believe a search query (which is entered by a person searching for books on Amazon) is the same thing as a keyword. The advice given to authors is that they should search for terms they think people might use to find their book, then use those exact terms as keywords on Amazon. As we'll see, this simplistic strategy is not optimal in every case.
While direct matches from search query to book do form part of the matching algorithm, much more processing occurs to extract summary data from both the search query and metadata. In most cases, this summary data is used to match a search query to a book.
Adding more terms in each keyword slot can often get you more search coverage *
* For broad queries, such as category names. There's an exception to this for exact phrases, as we'll see below.
Simplifying your words
One process of extracting data from search queries and keywords involves changing the keyword/query to lowercase, removing very common terms, and lemmatising the remaining terms. Lemmatisation is a fancy word, but is a simple concept. It means variations of a word are reduced to a common base form. Here are some examples:
• walked, walking, walk → walk
• paid, paying, pays → pay
• is, was, am, being → be
Variations of keyword terms often don't matter (due to lemmatisation). So don't waste keyword slots on variations. E.g. 'romancing' will be interpreted as 'romance'.
(You can test keyword variations to see their base form here: http://textanalysisonline.com/nltk-wordnet-lemmatizer).
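The simplification steps above can be sketched in a few lines of Python. This is only a toy illustration: the list of common words and the base-form table are hand-written assumptions, not Amazon's actual data, and a real system would use a full lemmatiser.

```python
# Toy illustration only: the stop-word list and lemma table below are
# hand-written assumptions, not Amazon's actual data.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "for"}

LEMMAS = {
    "walked": "walk", "walking": "walk",
    "paid": "pay", "paying": "pay",
    "is": "be", "was": "be", "am": "be", "being": "be",
    "romancing": "romance", "books": "book",
}

def normalise(text):
    """Lowercase, drop very common terms, and reduce each term to a base form."""
    terms = text.lower().split()
    return [LEMMAS.get(term, term) for term in terms if term not in STOP_WORDS]

print(normalise("Romancing the Duke"))        # ['romance', 'duke']
print(normalise("Walking in the Highlands"))  # ['walk', 'highlands']
```

Because 'Romancing' and 'romance' end up as the same term, spending a second keyword slot on the variation adds nothing.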
Keywords within keywords
Another process extracts combinations of words from keywords/search queries. For example, the keyword "new adult contemporary romance" will have the following phrases extracted to match search queries to books:
• new adult
• adult contemporary
• contemporary romance
• new adult contemporary
• adult contemporary romance
• new adult contemporary romance
You often get more mileage from your Amazon keywords by including multiple keywords in one slot.
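As a rough sketch of how phrases like these could be pulled out of a single keyword, the function below (extract_phrases is a name made up for this example) lists every run of two or more consecutive terms. How Amazon actually does this isn't public, so treat it as an illustration of the idea only.

```python
def extract_phrases(keyword, min_len=2):
    """Return every run of consecutive terms that is at least min_len terms long."""
    terms = keyword.lower().split()
    phrases = []
    for size in range(min_len, len(terms) + 1):
        for start in range(len(terms) - size + 1):
            phrases.append(" ".join(terms[start:start + size]))
    return phrases

for phrase in extract_phrases("new adult contemporary romance"):
    print(phrase)
```

Running this prints the same six phrases listed above, from 'new adult' through to the full 'new adult contemporary romance'.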
The whole is greater than the sum
Important phrases are extracted from keywords and search queries, and given more value. We often use two or more words together that, when combined, produce a different meaning than the words on their own - in linguistics, these phrases are called collocations. Here are some examples:
• "fast food"
• "united states"
• "southern culture"
• "beach read"
The search engine identifies these groups of words as having special meaning when combined. The more statistically significant a phrase is, the more important it is to the search engine (and to us). Phrases such as "volume 1" and "a book about" aren't statistically significant, so they aren't strong collocations.
If you're interested in learning more about this topic, search online for: 'collocations' and 'pointwise mutual information'.
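For the curious, here is a small worked example of pointwise mutual information (PMI), one common way to measure how strongly two words belong together. It compares how often a pair appears with how often you'd expect it to appear if the two words were unrelated. All the counts below are invented for illustration, and there's no confirmation that Amazon uses this exact measure.

```python
import math

def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information (in bits) of two adjacent words."""
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    return math.log2(p_pair / (p_first * p_second))

TOTAL = 1_000_000  # hypothetical corpus size, in words

# Invented counts: 'beach' and 'read' are fairly rare, but often appear together.
beach_read = pmi(pair_count=200, first_count=500, second_count=2_000, total=TOTAL)

# Invented counts: 'a' is everywhere, so 'a book' is not much of a surprise.
a_book = pmi(pair_count=600, first_count=30_000, second_count=5_000, total=TOTAL)

print(f"beach read: {beach_read:.2f}")  # 7.64
print(f"a book: {a_book:.2f}")          # 2.00
```

Even though 'a book' appears more often in this made-up data, 'beach read' scores far higher, because its words turn up together much more than chance would predict.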
The order of terms, especially for phrases with special meaning, is important.
Amazon's search engine understands when a phrase is important.
Comparing Apples with Apples
When your book is added to Amazon's search index, the search engine first tries to understand what your book is about, by seeing what browse node categories and topics it fits into. It chooses browse node categories by reading the categories you've chosen for the book, and also by extracting terms from the book's keywords (using the processes above) and from other data, such as the book's title. To be included in certain Amazon categories, your keywords need to include certain words. A list of categories with keyword requirements can be found here.
Ensure you review the keyword requirements for special Amazon browse node categories, to increase the potential exposure of your book.
A separate process reads all the book metadata for all the books Amazon has, and looks for groups of words that occur together. These clusters of words are called 'topics'. This sounds complicated, but think of a topic as a word cloud of all the words related to one subject. Words related to the topic 'World War II' might be: germany, invasion, nazi, auschwitz and europe. Words related to the topic 'Vampires' might be: slayer, fang, stake, undead and bite. (For the data scientists: extracted terms are assigned to topics via LDA, Latent Dirichlet Allocation.)
For each topic and browse node category, a score is assigned for your book, which tells the search engine how relevant your book is to each topic or category.
When a person performs a search, this same process of topic and category assignment is also applied to the person's search query. The search engine takes all the matching topics and categories from the search query, and finds all the books that also match these topics and categories. This list of books is then ranked and returned as a search result to the person.
It's important that the keywords you assign to your book return books in categories similar to your book's. The search engine is pretty good at guessing the correct category a person is looking for when they type in a search query. So if your book doesn't match the categories of the other books in a search result, it'll be less likely the person will click through to your book page.
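To make the matching step concrete, here is a heavily simplified sketch. The two topics are hand-picked word sets standing in for what a topic model would learn, the two books are hypothetical, and the scoring rule (the share of terms that fall in a topic) is an assumption chosen for simplicity rather than Amazon's method.

```python
# Toy topics: hand-picked word sets standing in for clusters that a topic
# model such as LDA would learn from real metadata.
TOPICS = {
    "world war ii": {"germany", "invasion", "nazi", "auschwitz", "europe"},
    "vampires": {"slayer", "fang", "stake", "undead", "bite"},
}

def topic_scores(text):
    """Share of the text's terms that belong to each topic (0.0 to 1.0)."""
    terms = text.lower().split()
    scores = {}
    for topic, words in TOPICS.items():
        hits = sum(1 for term in terms if term in words)
        if hits:
            scores[topic] = hits / len(terms)
    return scores

# Hypothetical books, described here only by their keywords.
BOOKS = {
    "Night of the Fang": "undead slayer romance",
    "Europe in Flames": "nazi invasion europe history",
}

def search(query):
    """Return books sharing a topic with the query, best match first."""
    query_topics = topic_scores(query)
    results = []
    for title, keywords in BOOKS.items():
        book_topics = topic_scores(keywords)
        score = sum(weight * book_topics.get(topic, 0.0)
                    for topic, weight in query_topics.items())
        if score > 0:
            results.append((title, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return [title for title, _ in results]

print(search("undead bite stories"))  # ['Night of the Fang']
```

Note that the query and the matching book share no exact phrase. They are matched because both land in the 'vampires' topic.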
Reading your mind
The search engine takes a best guess at what you're thinking when you type in a search query, and tries to figure out what list of books will best answer your query. In other words, it tries to understand the intent behind your search. The search engine applies different algorithms, depending on what it thinks you're looking for.
On the surface, you might think that a book search is a book search, but consider the differences in search intent behind these queries:
• a book called a time to kill
• harper lee's new book (try this query, it works well)
• books by stephen king
• pulitzer prize winner 2015
• zombie books
• loveswept romance story
• books in different languages
• a quote from a book
Each different user intent requires a different strategy to match a search query to books. The search engine needs to work out the best book results to match your intent, or even to satisfy multiple potential user intents for a search query. The logic to do this effectively is complex, and is a combination of query classification and running multiple search query operations, then combining the results.
Search queries with different intents are ranked differently.
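As a very rough illustration of query classification, the sketch below sorts a query into an intent using a handful of hand-written patterns. The patterns and intent names are hypothetical, and a production search engine would rely on far richer signals than this.

```python
import re

# Hypothetical patterns, checked in order; real query classification
# would be far more sophisticated.
INTENT_PATTERNS = [
    ("author", re.compile(r"^books? by (?P<value>.+)$")),
    ("title", re.compile(r"^(a )?book (called|titled) (?P<value>.+)$")),
    ("award", re.compile(r"(?P<value>.*\b(prize|award) winners?\b.*)")),
]

def classify(query):
    """Return (intent, extracted value); fall back to a general topic search."""
    text = query.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.group("value")
    return "topic", text

print(classify("books by stephen king"))         # ('author', 'stephen king')
print(classify("a book called a time to kill"))  # ('title', 'a time to kill')
print(classify("zombie books"))                  # ('topic', 'zombie books')
```

Once the intent is known, each type of query can be handed to a different matching strategy, such as an author lookup, a title lookup or a broad topic search.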
As we mentioned above, books are matched to a search query based on how relevant they are to the query. While the order of search results is underpinned by relevance, the overriding goal of Amazon's book search (to help sell more books) and search intent heavily influence search ranking.
Depending on the type of search intent, search results have their ranking influenced by 'non-relevance' factors, which include all the (seen and unseen) data attached to your book. Examples of non-relevance factors are number of reviews, average rating, number of clicks and sales rank.
Non-relevance factors are weighted differently depending on the type of search query. For example, a search for a specific quote in a book is likely to be a long tail query. The best result for a person searching for the quote is likely to be the book that the quote appears in, however well or poorly that book is selling, so non-relevance factors carry little weight. The same applies for searches for an exact book title or books by an author.
As a rule of thumb, the more nebulous a query, the more short tail it is. Short tail queries usually match a large number of books, and typically receive a lot more search traffic (which means many more people will see the books returned in the results). Since the searcher has been less specific about what they're looking for, the search engine assumes that by returning relevant books many people have purchased, it has a higher chance of satisfying the person's search query (and also showing them books they have a higher chance of buying).
For short tail queries, non-relevance factors are much more influential in ranking the search results. Chief of the non-relevance factors is sales rank, which is an indicator of how well a book is selling. A book with a lower (better) sales rank has sold well, so logically, it has a higher chance of selling when placed in front of a potential customer who has indicated a broad preference of potential books they might be interested in.
Unless you have a high number of recent book sales, don't target all high volume, short tail keywords, as your book won't show up in the first couple of pages of search results, and won't receive much search traffic.
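The effect of query type on ranking can be sketched with a simple weighted blend. Everything numeric here is an assumption made for illustration: the weights, the way sales rank is turned into a popularity score, and the two hypothetical books.

```python
import math

# Assumed weights: how much popularity counts for each kind of query.
POPULARITY_WEIGHT = {"long_tail": 0.1, "short_tail": 0.7}

def popularity(sales_rank):
    """Map a sales rank (1 is best) to a score; a lower rank scores higher."""
    return 1.0 / (1.0 + math.log10(sales_rank))

def ranking_score(relevance, sales_rank, query_type):
    """Blend relevance (0 to 1) with popularity, weighted by query type."""
    weight = POPULARITY_WEIGHT[query_type]
    return (1 - weight) * relevance + weight * popularity(sales_rank)

# Two hypothetical books: a niche title that matches the query well, and a
# bestseller that matches it only loosely.
niche = {"relevance": 0.9, "sales_rank": 250_000}
bestseller = {"relevance": 0.6, "sales_rank": 50}

for query_type in ("long_tail", "short_tail"):
    n = ranking_score(niche["relevance"], niche["sales_rank"], query_type)
    b = ranking_score(bestseller["relevance"], bestseller["sales_rank"], query_type)
    winner = "niche" if n > b else "bestseller"
    print(f"{query_type}: niche={n:.2f} bestseller={b:.2f} -> {winner}")
```

With these made-up numbers, the niche book comes out on top for the long tail query, while the bestseller overtakes it for the short tail query.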
Big Brother Search refinement
Search algorithms aren't perfect. Even with all the complicated algorithms working to deliver the best book search results, the search engine isn't going to hit the mark every time. Along with recording every search query you make, every click you perform on the site is also recorded. Every time you click on a book in a search result, your click data is recorded and used to improve the ranking of future searches. Other actions also factor into this calculation (such as purchasing a book).
Earlier we discussed how each book's relevance score was associated with the topics and categories it was matched to, and how non-relevance factors influence its ranking score. Each action a person performs on a book in a search result contributes to this non-relevance score. If a book is continually clicked on when shown in a list of search results, it is a positive indicator that the search engine is doing a good job. Likewise, if a book never receives any interaction from a person, then perhaps it isn't the best fit for the query. The ranking of the books in the search results is adjusted based on this type of user behavior.
It's difficult to game the search engine by using unrelated, high volume keywords - if your book shows up in search results, and other books are continually clicked or purchased more often than yours, your book will end up with a lower score for those keywords, and your search ranking will drop. Along with ensuring that your book is a good match to the keywords you've chosen, a good cover and catchy description will also help to convert to a sale and increase your chances of entering the positive ranking validation loop.
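One simple way to picture this feedback loop is a click-through rate tracked for each query and book pairing. The sketch below is an assumption about how such a score could work, not a description of Amazon's system. The class name and the starting values (treating a new pairing as if it had 1 click in 10 views) are invented for the example.

```python
from collections import Counter

class InteractionTracker:
    """Track a smoothed click-through rate per (query, book) pair."""

    def __init__(self, prior_clicks=1, prior_views=10):
        # Assumed prior: a new pairing starts as if it had 1 click in 10 views.
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views
        self.views = Counter()
        self.clicks = Counter()

    def record(self, query, book, clicked):
        """Log one appearance of a book in the results for a query."""
        self.views[(query, book)] += 1
        if clicked:
            self.clicks[(query, book)] += 1

    def score(self, query, book):
        """Smoothed share of appearances that led to a click."""
        key = (query, book)
        return (self.clicks[key] + self.prior_clicks) / (self.views[key] + self.prior_views)

tracker = InteractionTracker()
for i in range(100):
    tracker.record("zombie books", "Book A", clicked=i < 30)  # clicked 30 times
    tracker.record("zombie books", "Book B", clicked=i < 2)   # clicked twice

print(round(tracker.score("zombie books", "Book A"), 2))  # 0.28
print(round(tracker.score("zombie books", "Book B"), 2))  # 0.03
print(round(tracker.score("zombie books", "Book C"), 2))  # 0.1 (never shown yet)
```

In this sketch, a book that is shown often but rarely clicked ends up scoring below a book that has never been shown for that query at all.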
Keep your metadata fresh!
As we discussed in the beginning, indexed keywords are continually updated based on what people are actually searching for. Since recency is a factor, using newer terms can give you an advantage.
It pays to keep abreast of trends and modify your book metadata (especially keywords), to use recently popular or seasonal keywords.
Google Trends is great for checking potential keywords. Here's an example for 'new adult romance', which has had a steady increase since 2013 and is enjoying a spike right now: https://www.google.com/trends/explore#q=new%20adult%20romance. Here's another example, 'beach reads', which peaks in popularity during the summer months: https://www.google.com/trends/explore#q=beach%20reads.
Continually refresh and test your keywords!
If you've chosen a poor set of keywords, the user interaction score for the book-keyword combination will be low. This means your book may have shown up in search results for relevant search phrases, but had a low number of people clicking through to your book page, or purchasing a book. This may result in your book showing up further down the search results list. Changing your keywords to new keywords will mean starting with a fresh keyword-book-user interaction score.
If you're not achieving good rankings and attracting interest from people searching, you should keep changing your book's metadata (keywords, title, description, cover, categories, etc.) until you do.
These are a few ways your data is processed in Amazon search. There's much more to the search engine, but this post should give you an insight into some of the complexity involved. The best approach to success is to conduct research for high value keywords that are relevant to your book, and appropriate for the number of sales you're currently making. It's difficult to game the system in a sustainable, long-term manner.
Summary of keyword action points
• Adding more terms in each keyword slot can often get you more search coverage.
• Variations of keyword terms often don't matter (due to lemmatisation). So don't waste keyword allocations on variations.
• You can often get more mileage from your Amazon keywords by including multiple terms in one keyword slot.
• The order of terms, especially for phrases with special meaning, is important.
• The search engine understands when a phrase is (statistically) significant.
• Ensure you review the keyword requirements for special Amazon browse node categories, to increase the potential exposure of your book.
• Unless you have a high number of recent book sales, don't target all high volume, short tail keywords, as your book won't show up in the first couple of pages of search results, and won't receive much search traffic.
• It pays to keep abreast of trends and modify your book metadata (especially keywords), to use recently popular or seasonally popular keywords.
• Continually refresh and test your keywords!