When AI Calls a Specialist a Generalist

January 22, 2026

The first warning sign is rarely a harsh answer. More often the system is polite, almost complimentary, while quietly moving the firm from a narrow commercial shelf to a crowded one.

At 6:20 one February morning in Halifax, I had three answers open beside the ledger: one from ChatGPT, one from Perplexity, and one from a smaller answer engine I was testing because it had begun showing up in buyer conversations. All three named a four-person regulatory advisory consultancy in Ontario. That sounds like a good morning. The firm was visible. The language was respectful. No wild hallucination had burned the house down.

Then I read the phrasing again. “Business advisory.” “Startup support.” “Documentation help.” One answer even placed the firm beside grant consultants, which was not quite wrong in a dictionary sense, but commercially wrong enough to matter. This was a composite scenario, assembled from several runs with health-adjacent service firms. One answer also got a program name slightly wrong, which is the kind of small bruise I like to keep in the notes. Perfect examples make poor field evidence. Real answers wobble.

The polite version of being misfiled

Most owners first notice misclassification when the answer sounds obviously false. A clinic becomes a spa. A technical studio becomes a retailer. A litigation-adjacent consultant becomes a general coach. Those cases are easy to dislike, and sometimes easy to repair, because the mismatch is loud.

The more useful case is quieter. The answer gets the broad subject right while missing the commercial shape of the work. It keeps the name and loses the shelf. In the ledger, I call this category thinning: the compression of a specific expert service into a broader service class, which happens when the answer system preserves topic relevance while dropping the evidence that justifies a narrower buying decision.

That definition matters because a lot of firms mistake the problem for poor sentiment. They read a warm description and assume the answer has understood them. The system says the firm “helps startups with compliance and documentation.” It may even describe the team as experienced. But the buyer who needed market-entry risk language for a medical-device launch has heard something softer, cheaper, and less specific.

An answer can praise your expertise while placing it in the wrong purchasing category.

I have seen this in advisory, technical, editorial, legal-adjacent, and health-adjacent services. The answer often knows the field. What it does not always preserve is the business reason someone would pay for judgment instead of task completion. That is where high-ticket work gets sanded down. The system does not need to insult the firm. It only needs to choose the wrong neighbouring words.

How the shelf changes without a dramatic error

Imagine a consultant whose service page says “we help founders prepare for launch,” “we support documentation,” and “we guide teams through readiness.” A human buyer might infer the specialist context from surrounding detail: regulated product categories, risk language, review protocols, messy evidence from past projects. A generative system often works with a shorter mouthful. It chews the page into labels.

In a simplified teaching example, the answer has several possible shelves. It could file the firm under regulatory strategy, market-entry advisory, compliance documentation, startup coaching, business consulting, or grant-readiness support. Some of those shelves are near each other in language. They are far apart in buyer expectation.

The system may not be choosing “wrong” in a moral sense. It is compressing. If the page gives it ten signals and six of them are broad, the broad signals may become the easiest summary. This is why one prompt screenshot rarely tells the truth. A single answer can be the odd card in the drawer. A pattern across prompt families shows whether the shelf keeps changing in the same direction.

In the Ontario advisory composite, the same thinning appeared across several buyer situations. When the prompt asked for help with “health startup documentation,” the firm appeared beside documentation freelancers. When the prompt asked for “medical-device market-entry readiness,” it moved closer to regulatory consultants. When the prompt asked for “founder support before launch,” the firm slid toward coaches and grant advisers. The firm was not absent. It was unstable.

That instability is the evidence.
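
The instability can be put on paper by tallying which shelf each prompt family produced. What follows is a minimal sketch, assuming the observations were labelled by hand after reading each answer. The family names and shelf labels echo the composite above, and the function names are hypothetical conveniences rather than part of any tool.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labelled observations: (prompt family, shelf the answer assigned).
observations = [
    ("health startup documentation", "documentation freelancer"),
    ("medical-device market-entry readiness", "regulatory consultant"),
    ("founder support before launch", "startup coach"),
    ("founder support before launch", "grant adviser"),
]

def shelves_by_family(rows):
    """Count which shelves each prompt family produced."""
    tally = defaultdict(Counter)
    for family, shelf in rows:
        tally[family][shelf] += 1
    return tally

def distinct_shelves(rows):
    """All shelves seen across every family; more than one hints at instability."""
    return {shelf for _, shelf in rows}

tally = shelves_by_family(observations)
for family, counts in tally.items():
    print(family, dict(counts))
print("distinct shelves:", len(distinct_shelves(observations)))
```

A firm that lands on one shelf regardless of family is stable, whether or not the shelf is the right one. A firm that collects a new shelf with each family is the case described here.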

The words that invite a cheaper reading

There is a kind of service language that feels safe because it is familiar. “We help.” “We support.” “We guide.” “We partner with.” The phrases are not bad. I use some of them myself when they fit. The problem appears when broad verbs carry too much of the explanation.

In my runs, broad verbs tend to survive compression better than narrow nouns. If a page says, “we help early-stage teams prepare documentation,” the answer may retain “help,” “early-stage,” and “documentation,” while losing the specific regulated context that made the service expensive. A specialist may have written the page for human nuance, but the answer system reads it like a sorting clerk with a wet thumb.

This does not mean every service page should become stiff or overstuffed with technical phrasing. Heavy language can cause a different failure: the answer recites credentials and still misses the buyer problem. That is a neighbouring issue I treat separately under proof that survives compression. Here, the core mechanism is simpler. Repeated broad phrasing creates broad category gravity.

Category gravity is my term for the pull created when repeated words make one service class easier to summarize than another. A firm can have precise evidence lower on the page and still be pulled toward the broader class by the softer phrases that appear in headings, intros, navigation labels, and bio summaries.
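
A rough way to see category gravity on a single page is to count broad and narrow terms and weight them by where they sit. This is only a sketch under loud assumptions: the two term lists, the position weights, and the sample page are all hypothetical, and nothing here claims to model how an answer system actually reads a page.

```python
import re

# Hypothetical term lists; a real audit would build these per market.
BROAD = ["help", "support", "guide", "partner with"]
NARROW = ["regulatory", "medical-device", "market-entry", "risk"]

# Assumed weights: entrance positions count more than body text.
WEIGHTS = {"heading": 3, "intro": 2, "body": 1}

def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )

def gravity(page):
    """Return weighted (broad, narrow) totals for a page given as {position: text}."""
    broad = sum(WEIGHTS[pos] * count_terms(txt, BROAD) for pos, txt in page.items())
    narrow = sum(WEIGHTS[pos] * count_terms(txt, NARROW) for pos, txt in page.items())
    return broad, narrow

# Hypothetical sample page, loosely built from the phrases quoted earlier.
page = {
    "heading": "We help founders prepare for launch",
    "intro": "We support documentation and guide teams through readiness.",
    "body": "Past projects covered regulatory review for medical-device launches.",
}
print(gravity(page))
```

When the broad total outweighs the narrow one mainly because of the heading and intro, the page is a candidate for the small entrance-level changes described next, not a full rewrite.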

The cure is usually smaller than owners expect. I do not begin by rewriting a site into a block of rigid keywords. I look for the few phrases that sit at the entrance of the page. The first sentence of a service description. The label in the menu. The phrase under the founder’s name. The line that says who the work is for and what kind of decision it supports. These are small hinges. They swing heavy doors.

Visibility without category fit is a weak win

A named appearance in an AI answer feels like progress. I understand that feeling. Owners have been trained by search dashboards to see presence as the first prize. If you appear, you exist. If you rank, you are in the game.

Generative answers complicate that old comfort. Presence can carry the wrong commercial framing. A buyer looking for a specialist sees your firm, then sees you described in terms that belong to a cheaper provider. The answer has transferred you into another price climate.

In the ledger, I separate visibility, accuracy, category fit, and commercial usefulness. These fail differently. A firm can be visible and inaccurate. It can be accurate and commercially weak. It can have category fit in a general prompt and lose that fit when the prompt becomes more purchase-oriented. Owners often want one score because one score feels decisive. I find the split more honest.

For the advisory composite, the visibility score would have looked encouraging. The firm appeared often enough to notice. Accuracy was mixed but passable. Category fit was the weak point. Commercial usefulness dropped whenever the answer framed the work as ordinary startup help. A founder reading those answers might still inquire, but the expectation would be lower, and the comparison set would be messier.
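
The split can be kept honest by recording the four readings as separate fields and refusing to average them. Below is a small sketch, assuming a three-step scale of my own choosing. The class name, the scale, and the example values are illustrative, with the values loosely matching the composite just described.

```python
from dataclasses import dataclass

LEVELS = ("weak", "mixed", "strong")  # assumed three-step scale

@dataclass(frozen=True)
class AnswerReading:
    visibility: str
    accuracy: str
    category_fit: str
    commercial_usefulness: str

    def __post_init__(self):
        # Reject anything outside the assumed scale so the ledger stays comparable.
        for name, value in vars(self).items():
            if value not in LEVELS:
                raise ValueError(f"{name} must be one of {LEVELS}, got {value!r}")

    def weakest(self):
        """Names of the dimensions sitting at the lowest level in this reading."""
        lowest = min(vars(self).values(), key=LEVELS.index)
        return [name for name, value in vars(self).items() if value == lowest]

# Illustrative values only, not measured data.
reading = AnswerReading(
    visibility="strong",
    accuracy="mixed",
    category_fit="weak",
    commercial_usefulness="weak",
)
print(reading.weakest())
```

Keeping the fields apart means a strong visibility reading cannot hide a weak category fit, which is the failure a single blended score invites.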

That is the uncomfortable part. Misclassification can reduce perceived value before the buyer reaches the website.

Measuring the thinning before touching the copy

The first measurement step is plain: run prompt families that vary the buyer’s situation. I do not trust a single prompt asking “who is the best consultant for X.” It is too theatrical. It invites a list. Better to observe several prompt families: problem-aware prompts, category-aware prompts, local prompts, comparison prompts, and buying-intent prompts. The exact language depends on the market.

Then I record the shelves. I do not only ask whether the firm appeared. I ask what class of provider the answer seemed to assign. Did it describe the firm as a specialist adviser, a general consultant, an agency, a vendor, a coach, a freelancer, a clinic, a platform, or something else? Did that class change when the problem became more specific? Did the same substitute words return?

One rough run is not enough. A model can cough. A tool can over-weight a stray page. A local query can pull in directory language that distorts the answer. I want repeated behaviour across enough prompts to see a groove forming. The groove is the thing worth acting on.
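
One way to decide whether a groove has formed is to require both a minimum number of runs and a minimum share for the dominant shelf. The sketch below assumes the shelves from repeated runs of one prompt family have already been labelled. The function name and both thresholds are hypothetical starting points, not calibrated values.

```python
from collections import Counter

def find_groove(shelves, min_runs=5, min_share=0.6):
    """Return the dominant shelf if it recurs often enough, else None.

    min_runs and min_share are assumed thresholds, not calibrated values.
    """
    if len(shelves) < min_runs:
        return None  # too few runs to call anything a pattern
    shelf, count = Counter(shelves).most_common(1)[0]
    return shelf if count / len(shelves) >= min_share else None

# Hypothetical labels from six runs of one prompt family.
runs = [
    "startup coach", "startup coach", "grant adviser",
    "startup coach", "startup coach", "regulatory consultant",
]
print(find_groove(runs))      # dominant shelf across six runs
print(find_groove(runs[:2]))  # two runs are not enough to judge
```

The run-count floor is what guards against acting on a cough. Where to set it depends on how noisy the market and the tools are.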

Only after that do I touch copy. The best fixes are often evidence clarifiers: a sharper service label, a more precise first sentence, a case note tied to a buying decision, a comparison phrase that says what the firm is not competing with without sounding defensive. I prefer small changes because small changes can be measured. Broad rewrites turn the ledger into fog.

The question I ask before recommending changes

When a firm is misclassified, I ask one question before I recommend any wording change: what did the answer need to preserve for a buyer to understand the buying decision?

For a regulatory advisory firm, it may need to preserve risk language, launch context, regulated product categories, and the distinction between judgment and paperwork. For a studio, it may need to preserve design intent, specification responsibility, technical performance, and collaboration with architects or developers. For a consultant, it may need to preserve the decision being improved, not merely the task being performed.

This question keeps the work from becoming cosmetic. We are not trying to make the answer “sound better” in a vague way. We are trying to stop the system from filing expensive expertise under ordinary help. The difference is small in language and large in money.

Sometimes the answer system keeps the firm on the right shelf after a few clarifying changes. Sometimes it improves only in certain prompt families. Sometimes it still drifts because the surrounding web describes the market poorly. The data is subtler than owners would like. Still, a measured misclassification is better than a praised misunderstanding. At least then we know where the name is being rubbed thin.

Ledger Mark

The answer named the firm, then shelved it with broader help. The risk is a warm description that lowers the buyer’s expected price and precision before inquiry. Next cue: track whether specialist terms survive when prompts move from general discovery to purchase intent. Marked: a visible firm can still be commercially misfiled when the answer keeps the name and loses the shelf.