I was asked to do something new at work: given a data dump of unstructured text data, give us a detailed PDF report of insights about what customers are saying about our products this quarter.
So I wrote a clear prompt. Gave Claude a detailed set of instructions. Fed it the dataset. It gave me an output. I delivered it.
But when the stakeholder and I reviewed the deliverable in depth, we noticed some increasingly unsettling things.
Claude was confidently wrong.
Not wrong wrong, like hallucinating facts from nowhere. More like… overconfident wrong. It would generate a quarterly insight report and say something like:
“Negative sentiment in the Dresses department increased 23% this quarter, indicating a significant shift in customer satisfaction that warrants immediate attention from the product team.”
Sounds great. Except that spike was driven almost entirely by a single popular item that launched mid-quarter with a known sizing defect. One product. Not the whole department.
Claude had no idea. And my prompt didn’t tell it to care.

A Quarterly Customer Review Report Skill
I’m going to walk you through a Claude skill I built that generates a quarterly customer sentiment report from unstructured product review text, delivered as a PDF to stakeholders.
Obviously, I won’t be sharing the actual dataset I analyzed at work. The dataset I’m using is the Women’s E-Commerce Clothing Reviews dataset from Kaggle (CC0 license). It contains 23,000 real, anonymized customer reviews across clothing departments (Tops, Dresses, Bottoms, Jackets, and more) with text, star ratings, and product metadata. References to the company in the reviews have been replaced with “retailer.”
The skill should:
- Read a filtered slice of reviews for the current quarter
- Group them by department
- Identify trends & concerns
- Write a professional summary PDF for the product leadership team
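The first two bullets are plain data preparation and can happen before Claude sees anything. Here is a minimal sketch using only the standard library. It assumes each review is a dict with hypothetical `quarter`, `department`, `rating`, and `text` keys; the `quarter` field in particular is an assumption you would have to derive from your own data, and the sample rows are made up.

```python
from collections import defaultdict

def group_reviews_by_department(reviews, quarter):
    """Keep only reviews from `quarter` and bucket them by department."""
    grouped = defaultdict(list)
    for review in reviews:
        if review["quarter"] == quarter:
            grouped[review["department"]].append(review)
    return dict(grouped)

# Tiny made-up sample, only to show the shape of the data.
sample_reviews = [
    {"quarter": "Q2", "department": "Tops", "rating": 5, "text": "Love the fabric."},
    {"quarter": "Q2", "department": "Bottoms", "rating": 2, "text": "Runs small."},
    {"quarter": "Q1", "department": "Tops", "rating": 4, "text": "Nice fit."},
]

by_department = group_reviews_by_department(sample_reviews, "Q2")
for department, items in by_department.items():
    print(department, len(items))
```

Doing the filtering and grouping in code keeps that part deterministic, so the model only has to handle the reading and summarizing.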
Here’s the original prompt:
You are a data analyst producing a quarterly customer sentiment report for a women’s clothing e-commerce retailer. Given this quarter’s customer reviews (including review text, star ratings, and department), write a professional stakeholder report that includes:
– An overall sentiment summary for the quarter
– Key themes by department (Tops, Dresses, Bottoms, Jackets)
– 2-3 standout insights from the review text
– A brief recommendation for the product team
Be professional and clear.
When you’re done with this task, please create a skill titled reviews-analysis and save your instructions in there.
What “Confidently Wrong” Actually Looks Like
Here’s an example of what Claude produced with the naive skill above, on a quarter where the Dresses department had an influx of negative reviews:
“Negative sentiment in the Dresses department increased significantly this quarter, with customers frequently citing fit and sizing issues. This suggests the retailer’s sizing standards may be drifting from customer expectations, a trend that, if unaddressed, could erode brand loyalty in this key category.”
The real explanation? One dress (a single SKU) launched in Week 7 with a batch quality issue. The reviews were almost entirely about that one item. The rest of the Dresses department was performing fine.
Claude didn’t necessarily invent anything. It just had no context for why the pattern existed. And without that context, it did what LLMs do: it filled the gap with the most plausible-sounding narrative.
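When a product identifier is available, this kind of concentration can be checked with a few lines of code before any narrative gets written. The sketch below is one possible check, not the method used for the report above. It assumes each review dict carries a hypothetical `product_id` key and treats ratings of 2 stars or lower as negative; both are assumptions, and the sample rows are made up.

```python
from collections import Counter

def top_product_share(reviews, negative_max_rating=2):
    """Return (product_id, share) for the product with the most negative reviews.

    `share` is that product's fraction of all negative reviews in `reviews`.
    Returns (None, 0.0) when there are no negative reviews.
    """
    negative_ids = [r["product_id"] for r in reviews if r["rating"] <= negative_max_rating]
    if not negative_ids:
        return None, 0.0
    product_id, count = Counter(negative_ids).most_common(1)[0]
    return product_id, count / len(negative_ids)

# Made-up reviews for one department: four of the five negative ones hit product 1078.
department_reviews = [
    {"product_id": 1078, "rating": 1},
    {"product_id": 1078, "rating": 2},
    {"product_id": 1078, "rating": 1},
    {"product_id": 1078, "rating": 2},
    {"product_id": 862, "rating": 2},
    {"product_id": 862, "rating": 5},
    {"product_id": 1094, "rating": 4},
]

product_id, share = top_product_share(department_reviews)
print(product_id, share)
```

A high share for a single product is a hint that a department-level claim deserves a second look before it goes to stakeholders.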

The Fix: 4 Lines You MUST Include
Line 1: Tell Claude What Context It’s Missing
You do NOT have access to product launch calendars, inventory records, promotional campaigns, or individual SKU-level history. Do NOT attribute department-level trends to brand-wide causes. Report patterns you observe in the text; do not explain why they exist unless the reviews themselves make it unambiguous.
This single instruction eliminates a huge class of confident wrongness. Without it, Claude will always reach for a strategic narrative because that’s what a good analyst does, and Claude is trying to be a good analyst.
The problem is that a good analyst also knows what they don’t know. They say “We’re seeing elevated sizing complaints in Dresses this quarter. This may be isolated to a recent launch but we’d need SKU-level data to confirm.” Claude won’t say that unless you tell it to.
Line 2: Define What “Significant” Actually Means
Claude loves the word significant. It uses it all the time. And it almost never defines it.
Only flag a sentiment shift as “significant” if it represents a change of more than 15 percentage points in positive/negative ratio compared to the prior quarter, OR if a theme appears in more than 20% of reviews in a given department. For smaller signals, use language like “slight uptick” or “minor increase.” Do not use the word “notable” or “significant” for anything below these thresholds. Always report the exact numeric value for the shift alongside your claim.
You can adjust the 15-point and 20% thresholds to whatever makes sense for your data. The point is to anchor Claude’s language to something real.
Without this, Claude will call both a 3-review spike in complaints and a genuine 30-point sentiment drop “significant”. Your stakeholders will start to tune out. And when something actually significant happens, they won’t know it.
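The same rule can be written down as code, which is handy for spot-checking whether a report’s wording matches the numbers. This is a minimal sketch under one interpretation: it measures the shift as the change in the share of negative reviews, expressed in percentage points. The function name, parameters, and that interpretation are all assumptions.

```python
def describe_shift(prior_negative_share, current_negative_share, theme_share,
                   shift_threshold_pts=15.0, theme_threshold=0.20):
    """Apply the two thresholds and return (label, shift in percentage points).

    Shares are fractions between 0 and 1. A shift counts as "significant" if it
    moves by more than `shift_threshold_pts` points in either direction, or if
    the theme appears in more than `theme_threshold` of the department's reviews.
    """
    shift_pts = (current_negative_share - prior_negative_share) * 100
    is_significant = abs(shift_pts) > shift_threshold_pts or theme_share > theme_threshold
    label = "significant" if is_significant else "minor"
    return label, round(shift_pts, 1)

# A 3-point uptick with a rare theme versus a 30-point jump.
print(describe_shift(0.10, 0.13, theme_share=0.05))
print(describe_shift(0.10, 0.40, theme_share=0.05))
```

Returning the numeric shift alongside the label mirrors the last sentence of the instruction: the word never travels without its number.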
Line 3: Force a Confidence Qualifier on Every Insight
Before each insight, include a confidence label in brackets: [Data-Supported], [Possible], or [Speculative].
Use [Data-Supported] only when the insight follows directly from the review text provided. Use [Possible] when the insight is a reasonable inference from the text. Use [Speculative] when you are making assumptions about causes or context that are not present in the reviews themselves.
When I first added this line, I was expecting mostly [Data-Supported] tags. What I actually got was a mix of all three, which told me exactly how much Claude had been filling in gaps in my previous reports without me realizing it.
An example of what the output looks like after adding this line:

Now your stakeholders can see exactly what’s solid and what’s a guess. That’s a much more honest report.
Line 4: Require Claude to State the Limits of the Analysis
At the end of the report, include a section called “What This Report Cannot Tell You.” List 2-3 things that would be needed to draw stronger conclusions, for example, SKU-level review breakdowns, return rates, or repeat purchase data.
This line forces Claude to acknowledge the edges of its own analysis. And it gives your stakeholders a clear roadmap for what questions to investigate further, which is actually the most useful thing an analyst can do.
Here’s the output:
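If you want to track the mix of labels from one report to the next, a small script can tally them and catch insights that arrived without one. This is a sketch that assumes each insight sits on its own line and starts with its bracketed label; the sample insights are made up for illustration.

```python
import re
from collections import Counter

ALLOWED_LABELS = {"Data-Supported", "Possible", "Speculative"}
LABEL_PATTERN = re.compile(r"^\[([^\]]+)\]")

def count_confidence_labels(insight_lines):
    """Count each allowed label and collect lines whose label is missing or unknown."""
    counts = Counter()
    unlabeled = []
    for line in insight_lines:
        match = LABEL_PATTERN.match(line.strip())
        if match and match.group(1) in ALLOWED_LABELS:
            counts[match.group(1)] += 1
        else:
            unlabeled.append(line)
    return counts, unlabeled

# Made-up insight lines, only to show the expected format.
insights = [
    "[Data-Supported] Sizing complaints appear in many Tops reviews.",
    "[Speculative] The complaints may relate to a recent launch.",
    "Customers are unhappy with fabric quality.",
]

counts, unlabeled = count_confidence_labels(insights)
print(dict(counts))
print(unlabeled)
```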

How to Use Claude to Refine the Skill
Writing a skill once isn’t enough. You need to test it and improve it the same way you’d iterate on a model.
Step 1: Run the skill on known examples.
Filter the dataset to a time window where you already know what happened. (A quarter with a product recall, a seasonal promotion, a period with unusually high return rates, and so on.) See what Claude says. Does it use the word “significant” correctly? Does it state facts/statistics where it should?
Step 2: Feed Claude its own output and ask it to audit.
Claude is good at catching its own overconfidence when you explicitly ask it to look for it.
Here’s a quarterly buyer sentiment report generated by an AI analyst. Overview each perception on this report and flag any that:
– Make causal claims with out direct proof within the overview textual content
– Use phrases like “important” or “notable” with out justification
– Attribute particular person product points to brand-wide developments
– Assume context not current within the dataset (launch calendars,
stock, buy historical past)
For every flagged merchandise, recommend a revised model that’s extra appropriately hedged.
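The most mechanical item on that list can also be pre-screened locally before the report goes back to Claude. The sketch below flags sentences that use one of the trigger words but contain no number at all. It is a rough heuristic with a naive sentence splitter, and it complements the audit prompt rather than replacing it. The word list and sample text are assumptions.

```python
import re

FLAG_WORDS = re.compile(r"\b(significant|significantly|notable|notably)\b", re.IGNORECASE)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

def unjustified_claims(report_text):
    """Return sentences that use a flagged word but contain no digit."""
    flagged = []
    for sentence in SENTENCE_SPLIT.split(report_text.strip()):
        if FLAG_WORDS.search(sentence) and not re.search(r"\d", sentence):
            flagged.append(sentence)
    return flagged

# Made-up report excerpt.
report = (
    "Negative sentiment in Tops increased significantly this quarter. "
    "Sizing complaints appeared in 24% of Tops reviews, a significant share. "
    "Jackets reviews were mostly positive."
)

for sentence in unjustified_claims(report):
    print(sentence)
```

Only the first sentence is flagged here, because the second one backs its claim with a figure and the third makes no such claim.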
Step 3: Add a clause for each failure you find.
Every time Claude produces a report with a clearly wrong or overconfident insight, ask it to add a new constraint to your skill. Over time, your skill pretty much becomes a record of everything Claude gets wrong.
A Word of Caution
Adding constraints to your skill can sometimes make Claude produce an output where every single sentence ends with “…though more data would be needed to confirm this.”
That’s not useful either.
The goal is calibrated confidence where the strength of Claude’s language matches the strength of the evidence. If you find Claude becoming overly wishy-washy, you can add a counterbalancing constraint:
Do not over-qualify every statement. If a pattern appears clearly and consistently across many reviews, state it plainly and include references to the data behind the pattern. Reserve qualifiers for genuinely uncertain or speculative claims.
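One rough way to notice over-hedging is to measure how many sentences in a report carry a qualifier. The sketch below computes that fraction. The phrase list is a hypothetical starting point that you would tune to your own reports, and the sample text is made up.

```python
import re

# Hypothetical list of hedging phrases; extend it to match your own reports.
HEDGE_PATTERN = re.compile(
    r"\b(may|might|could|possibly|perhaps|would be needed|cannot confirm)\b",
    re.IGNORECASE,
)

def hedge_rate(report_text):
    """Fraction of sentences that contain at least one hedging phrase."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", report_text.strip()) if s]
    if not sentences:
        return 0.0
    hedged = sum(1 for s in sentences if HEDGE_PATTERN.search(s))
    return hedged / len(sentences)

# Made-up excerpt in which three of four sentences are hedged.
sample = (
    "Sizing complaints may be rising. "
    "Fabric quality could be an issue. "
    "Customers like the colors. "
    "More data would be needed to confirm this."
)

print(hedge_rate(sample))
```

There is no correct value for this number. What it gives you is a way to compare runs: if the rate jumps after you add a constraint, the report has probably tipped from calibrated to wishy-washy.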
Conclusion
Claude is impressive at producing professional-looking reports, which can sometimes be the problem.
The polish hides the overconfidence. Your stakeholders see clean formatting and authoritative language, and they assume the insights are solid even when they’re not.
The 4 lines I’ve walked through here don’t make Claude less capable. They make it more honest. And in a reporting context, honest is more useful than impressive.
Read more about other use cases Claude is good for, including building dashboards, debugging, and writing documentation:
→ 3 Claude Skills Every Data Scientist Needs in 2026