How do Amazon and Google use my book metadata in search?

May 11, 2015

This post explains some basic concepts of how search engines work and index your book metadata, and the differences between Amazon and Google search engines.

Before a search engine can show you results for a query you type into a search box, it has to process and understand all the raw data those results are drawn from. This process is called ‘indexing’.

Amazon's and Google's search engines differ in the types of data they index, and in how they index it. Both index structured data (e.g. Amazon keyword fields, book categories and microdata) and unstructured data (e.g. web pages, or book excerpt text).

Amazon indexes book data by reading the information you enter into KDP, or by reading ONIX files (if you’re a larger publisher with many books). Google indexes book data by crawling the web and consuming any data it finds about your book, for example by reading your book or author web page, your Amazon book page, or social discussions about your book.

The process of indexing includes a step called ‘feature extraction’, which pulls a subset of data out of the raw data (e.g. a book web page). These extracted features, rather than the raw data itself, are what actually go into the search index. Search algorithms use the features to match your search query to a book page on Amazon, or a web page on Google. The features are also used to determine which search results are most relevant, and the order in which to display them to you (also known as ranking).
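To make feature extraction, indexing and ranking a little more concrete, here is a minimal Python sketch. It is a toy illustration only, not how Amazon or Google actually work: the book records, field names and the scoring rule (summing term counts) are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical book records; the field names are illustrative only,
# not Amazon's or Google's real schema.
BOOKS = {
    "book-1": {"title": "Sourdough Baking Basics", "keywords": "bread baking sourdough"},
    "book-2": {"title": "Weeknight Baking", "keywords": "cakes cookies baking"},
    "book-3": {"title": "Garden Planning", "keywords": "vegetables garden soil"},
}


def extract_features(record):
    """Feature extraction: reduce a raw record to lowercase terms and their counts."""
    text = " ".join(record.values()).lower()
    counts = defaultdict(int)
    for term in re.findall(r"[a-z]+", text):
        counts[term] += 1
    return dict(counts)


def build_index(books):
    """Indexing: map each term to the books that contain it, with a count per book."""
    index = defaultdict(dict)
    for book_id, record in books.items():
        for term, count in extract_features(record).items():
            index[term][book_id] = count
    return index


def search(index, query):
    """Matching and ranking: score each book by the summed counts of matching query terms."""
    scores = defaultdict(int)
    for term in re.findall(r"[a-z]+", query.lower()):
        for book_id, count in index.get(term, {}).items():
            scores[book_id] += count
    # Highest score first; ties are broken by book id so the order is stable.
    return sorted(scores, key=lambda book_id: (-scores[book_id], book_id))


INDEX = build_index(BOOKS)
print(search(INDEX, "sourdough baking"))  # ['book-1', 'book-2']
```

Notice that the search function never looks at the original records, only at the index built from the extracted features. Real search engines use far richer features and ranking signals, but the shape of the pipeline is similar.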

Amazon book search has a much easier job than Google, as almost all of the data that it indexes is structured (meaning it knows what each piece of data means) and it is generally answering a much narrower question - such as “find me books related to my search query”. Compare this with Google, which needs to index and understand vastly more data, most of it unstructured, and then answer far broader questions from users’ search queries.
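The difference between structured and unstructured data can be sketched in a few lines of Python. This is an illustration under stated assumptions: the record, the sample sentence and the cue-word heuristic are all hypothetical, and neither search engine is claimed to work this way.

```python
import re

# Structured: every value is labeled, so its meaning is known up front.
# (Hypothetical record; field names are made up for this example.)
structured_record = {
    "title": "Sourdough Baking Basics",
    "author": "Jane Example",
    "category": "Cooking",
}

# Unstructured: the same facts are buried in free text.
unstructured_page = (
    "Jane Example's new book, Sourdough Baking Basics, "
    "is a cooking guide for beginners."
)


def author_from_structured(record):
    """With structured data, finding the author is a direct field lookup."""
    return record["author"]


def looks_like_book_page(text):
    """With unstructured data, meaning has to be inferred.

    Crude heuristic, assumed purely for illustration: the text counts as
    book-related if it contains at least one cue word.
    """
    cues = {"book", "author", "paperback", "chapter", "isbn"}
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & cues) >= 1


print(author_from_structured(structured_record))
print(looks_like_book_page(unstructured_page))
```

The lookup always returns the right answer because the field is labeled. The heuristic can only guess, and a real system would need far more sophisticated inference to guess reliably.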

There is some crossover in the book data each search engine indexes. Basically, any data that you can see on your Amazon book page, Google also sees and will index. Google will also know that your Amazon book page is about a book, whereas it will need to infer that knowledge when analyzing your own web page.

Armed with a broad understanding of the different problems Amazon and Google’s search engines are solving, and how they differ in indexing book metadata, we can start to analyze the impact changes to your book metadata will have on how you rank in search. This we’ll leave for our next post.