Keyword portfolio ordering

We have written separately about portfolio membership: which keywords belong in a book's keyword field, and why a set of individually strong keywords often makes a weak portfolio. Membership is the first optimization. Ordering is the second, and it runs on the same logic applied one position at a time.

Most systems treat ordering as sorting. Score every keyword, sort by score, and the order falls out: highest scorer first, next second, down to the budget limit.

This gives the wrong order. Most keywords fail to add value because they repeat what the book's visible metadata already supplies, a point we have made before. The same logic governs the sequence. The value of a keyword at any position is the value it adds over everything ranked above it, starting with the book's own visible metadata.

So the portfolio is an ordered coverage system. Each position is judged against a coverage set that grows as the list is built.

Diagram showing visible metadata followed by keyword slots, with each slot adding to the coverage set. — Ordering is a sequence of marginal decisions. Each accepted keyword changes the coverage set for the next slot.

The coverage set starts as the book's own metadata

The starting coverage set is the book's visible metadata: title, subtitle, author, categories, and description. These already work for the book. They already index it for the queries they contain. A keyword's job is to open a door the visible metadata leaves shut. A door is a route by which a reader's search reaches the book: a query, or a family of related queries, that brings the book into the results. The title, subtitle, and categories already open the doors their own words describe. A useful keyword opens a different one.

So the first keyword is measured against the visible metadata alone. For position one, the question is plain: among the candidates that belong in the portfolio, which opens the most valuable door the title, subtitle, author, categories, and description leave shut?

Once that keyword is in, its concepts join the coverage set. The second keyword is measured against the visible metadata plus the first. The third, against the visible metadata plus the first two. The coverage set accumulates as the list grows.

coverage_0 = title + subtitle + author + categories + description
k_1        = candidate that adds the most value over coverage_0
coverage_1 = coverage_0 + concepts(k_1)
k_2        = candidate that adds the most value over coverage_1
coverage_2 = coverage_1 + concepts(k_2)
...

Every position is a marginal-value decision made against everything already covered.

What "best first" means

The best first keyword adds the most over the book's visible metadata. Often that keyword sits some way down the raw-score list.

To take the first position, a keyword has to clear a set of checks:

It is a query real readers type.
It fits the book clearly.
It has demand.
The book is rankable for it.
It is supported by evidence: how many results the query returns, what the search box's autocomplete suggests, how the results page behaves.
It is absent from the title, subtitle, author, and category.
It is neither a competitor's title nor a repeat of the book's own title or author.
It opens a valuable door the visible metadata leaves shut.

The last check decides most cases. A keyword can be true, in demand, and rankable, and still be a weak first pick when the visible metadata already covers its door.

So a lower standalone score can beat a higher one. Take two candidates. The first scores well and mostly restates the subtitle. The second scores lower and opens a reader-outcome door that nothing in the visible metadata touches. The second adds more to the portfolio, so it ranks above the first, despite the lower score in isolation.

Base value and marginal value

Every candidate carries two distinct quantities.

Base value answers a question about the keyword alone: is it good in isolation? It combines semantic fit, reader-search realism, demand evidence, rankability, phrase quality, evidence completeness, support from the book's facts, query specificity, and commercial intent. Base value is what a ranking-based system computes, and where it stops.

Marginal value answers a question about the keyword in context: what does it add, given everything already covered? It turns on how much the keyword overlaps with the visible metadata and the keywords already selected, whether it targets the same reader intent, the same set of search results, the same lane, or the same book identity those earlier choices already cover. A lane is one commercially distinct way readers search for a book: by genre, by reader outcome, by audience, by format, by the problem the book solves.

Useful marginal value comes from a new reader intent, a new use case, a new audience, a new category, a new problem or outcome, a new search behavior, a new long-tail phrasing, or a new rankable subtopic.

Low marginal value looks like this:

wealth habits
wealth building habits
millionaire wealth habits
habits of wealthy people

All four may be true and on-topic. They compete for the same door. Once the first is in, the marginal value of the other three collapses, and they sink down the order or fall out of it, whatever their individual scores.

The ordering score

A slot-level ordering score looks like this:

\begin{aligned} \operatorname{SlotScore}(c \mid \operatorname{Cov}) ={}& \operatorname{Base}(c) \times \operatorname{Marginal}(c \mid \operatorname{Cov}) \times w_{\operatorname{slot}} \\ &+ \operatorname{Bonus} - \operatorname{Penalty} \end{aligned}

The bonus terms cover lane gaps and rankability. The penalty terms cover overlap with the visible metadata, overlap with the already-selected set, opening a door the portfolio already opened, competitor-title risk, dilution from terms that are too broad, awkward phrasing, and unsupported claims.

The multiplication does the real work. A high base value times a near-zero marginal value still comes out low. A keyword cannot buy a top position with base value alone; it has to be good on its own and add something the portfolio lacks. The weights, the slot-weight curve, and the penalty functions are proprietary. The shape is the lesson: base value and marginal value multiply rather than add, which keeps the top positions clear of high-scoring duplicates.

Membership and ordering are separate optimizations

Two questions have to stay apart:

membership:  which keywords belong in the portfolio?
ordering:    in what sequence should they appear?

The right sequence of operations runs:

Choose the final portfolio membership.
Rerank the whole selected set by marginal contribution, position by position.
Confirm that each position adds value over every position above it.
Run a final pass for duplication, lane crowding, title risk, and phrasing.

The rerank is necessary because membership selection and final ordering use the coverage set differently. During selection, candidates compete for inclusion against a large pool. Once the members are fixed, the order has to be rebuilt against the actual selected set, because the marginal value of each member depends on which other members made the cut.

Skipping the rerank is a common error. A system that picks fifteen good keywords and then lists them by base score has done the membership work and skipped the ordering work. The published field then leads with whatever scored highest in isolation, which is often a near-duplicate of the title.

Lane crowding shows up in the order

Ordering interacts with lanes through diminishing returns. We have written about lanes and portfolio allocation at length elsewhere.

The first strong keyword in a valuable lane counts for full credit. The second in the same lane counts for less. The third counts for much less, unless it carries distinct demand or intent. As the list grows and the coverage set fills, more keywords in an already-covered lane lose marginal value and sink.

That decay keeps a portfolio from piling into one lane. Hard lane quotas are clumsy. Diminishing returns by lane, applied through the marginal-value calculation, produce balanced coverage without forcing it.

Title-like keywords

Some phrases are both a book title and a reasonable generic search phrase. The system has to separate title-driven demand from generic search demand.

Block by default when the phrase is an exact or near-exact known book title, when its demand comes from a competitor title, when it mostly routes buyers to another commercial book, or when its demand exists because of brand, series, or identity.

Allow or review when the phrase is genuinely generic, when the result page shows broad non-title usage, when it fits the book on its own, and when it adds distinct value.

The distinction looks like this:

secrets of the millionaire mind is a specific known title. Its demand belongs to that book. Block.

frugal millionaire habits is generic, on-book, and descriptive of content. A possible keeper.

millionaire mindset is ambiguous. It may be generic, or it may be brand-adjacent to a specific property. It needs evidence before it goes in.

Getting this wrong costs more than a slot. A competitor-title keyword aims the book's own metadata effort at a query the book cannot win and should not chase.

Edit before use

A candidate's underlying concept can be good while its exact query string is unfit for production. The phrasing might be awkward or machine-generated, too long, or unnatural as search language. It might overlap too heavily with an existing keyword, or carry low demand precisely because the phrasing is strange.

Take this candidate:

long term wealth habits from millionaire research

The concept is sound. The phrase is awkward, and no reader types it. Cleaned alternatives might be:

long term wealth habits
research based wealth habits
habits of self made millionaires

The rule here is strict. A cleaned phrase has to be rescored from scratch. It cannot be rewritten by hand and dropped into the portfolio at the original candidate's slot. The new phrasing carries different demand, different competition, and different marginal value. It goes back into the process as a new candidate, judged on its own evidence.

A system that rewrites awkward phrases without rescoring them is publishing keywords on assumed demand. The rewrite changes the query, and the query's value has to be measured again.

Bottom line

Order each position by the highest marginal, evidence-backed, rankable, non-duplicative value, measured against the book's visible metadata and every keyword already ranked above it.

Two habits work against this. Sorting the portfolio by raw score fills the top positions with high-scoring duplicates of the visible metadata. Listing existing keyword rows first and new additions last ignores value altogether and preserves whatever order history left behind.

A portfolio's worth is the set of new doors it opens beyond the ones the book's own metadata already opens. The order is how those doors get opened, one marginal decision at a time. Membership decides what the book could cover. Ordering decides what it covers first, and under a finite budget, what it covers at all.