A popular saying in American culture is the following:
“You can’t have your cake and eat it too.”
I find this sentence extremely poetic but also very practical and useful. The message of this saying is simple: everything you accomplish is achieved through a tradeoff, as everything has a price.
The philosophical discussion is out of scope for this article, but the practical consequences of these considerations are very much in line with data science and software engineering in general. Let me explain.
In software engineering and data science, there is no such thing as the “perfect design” per se. The same algorithm that is fantastic for a given application fails miserably in others.
Think of the computation versus memory tradeoffs in the following cases:
It makes a lot of sense to precompute the distance between two cities and store it in a dataset, and it doesn’t make sense to compute it on the fly. This is because you expect the dataset to be fairly low maintenance (cities don’t move around often), and it would be silly to compute the distance between New York and San Francisco every fraction of a second. [Case A]
However, it would be equally silly (and probably impossible) for a chatbot to memorize all the possible questions a human can ask and pull the answer to that question every time it’s asked. This is because the nature of the problem is much more dynamic, and it requires an “on the fly” computation. [Case B]
In Case A, we are sacrificing memory and getting extremely quick computation. In Case B, we are spending more computational time, but we are not using any “query” memory.
Can you get no computational time and no memory? Not really, because you can’t have your cake and eat it too 🙂
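To make Case A and Case B concrete, here is a minimal sketch of the same distance question answered both ways. The city coordinates are approximate values I am assuming for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Approximate city-center coordinates (assumed for illustration).
CITIES = {
    "New York": (40.7128, -74.0060),
    "San Francisco": (37.7749, -122.4194),
    "Chicago": (41.8781, -87.6298),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Case A: pay memory once, then every query is a dictionary lookup.
DISTANCE_TABLE = {
    frozenset(pair): haversine_km(CITIES[pair[0]], CITIES[pair[1]])
    for pair in combinations(CITIES, 2)
}


def lookup_distance(city_a, city_b):
    """Memory-heavy, computation-free answer."""
    return DISTANCE_TABLE[frozenset((city_a, city_b))]


def compute_distance(city_a, city_b):
    """Case B style: nothing stored, the answer is recomputed on every call."""
    return haversine_km(CITIES[city_a], CITIES[city_b])
```

Both functions return the same number. The only difference is where the cost is paid: the table grows with the number of city pairs, while the on-the-fly version repeats the trigonometry on every request.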
But let’s take a less obvious and more “modern” example. Let’s talk about Large Language Models (LLMs).
LLMs are among the most powerful AI models we have, and they are trained on an enormous share of the knowledge available to the world. They are also huge. They are actually so large that we rarely have them in-house, and we usually invoke them through APIs. However, API call = tokens = cost.
Now imagine you want to use an intelligent system to pick the best restaurant for tonight. You’ll ask ChatGPT something like: “Can you provide me with a good Italian restaurant that’s not super expensive but romantic and in a good location?”
Now, imagine if the GPT model had to explore all the restaurants in the universe and decide if they are Italian, not expensive, in a good location, and close to your home. Best-case scenario: you’d spend millions in tokens, and you’d already be in bed by the time the computation is done.
However, we also don’t want to completely give up all the juicy, natural-language interpretative, and information-retrieving power of the LLMs. The key is that, in order to use the LLM and get practical information, we can’t use the most intelligent part of the pipeline all the time (that would be like having your cake and eating it too).
In this article, I’m going to give you a recipe for these practical, LLM-improved recommendation systems, using the restaurant recommendation example as a use case.
The input of this system will be the user’s description of their ideal restaurant in a specific city, and the output will be a set of recommended restaurants.
Let’s get started!
1. System Design
The cake saying we discussed is also known in engineering as the Accuracy-Scale-Time triangle:
- You can make something accurate and on a large dataset, but it will be slow
- You can make something accurate and fast, but it won’t scale well on a large dataset
- You can make something fast that scales well, but it won’t be that accurate.

Of course, we want our results to be ultimately accurate, so option 3 alone won’t cut it. However, we can refine option 3 with a more accurate model on top of the first one. Option 3 can give us a good list of candidates with a small computational time, and we can then select the most accurate list of recommendations using a Large Language Model.
In other words, the design looks like this:
- A quick and simple search will find the top K closest restaurants (rule-based, high recall, low precision)
- A slow, very intelligent Large Language Model will help us choose, among the top K, the best ones based on the query (AI-based, high precision)
By doing this, we are not wasting time and money on the slow LLM, but we are still getting its smartness by using it on a specific list of candidates.
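The two-stage design above can be sketched in a few lines. This is a generic skeleton under my own assumptions, not the code from the repository: `two_stage_funnel`, `cheap_key` and `expensive_score` are hypothetical names, and the “LLM” in the demo is a stand-in function that only counts how often it is called.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def two_stage_funnel(
    items: Iterable[T],
    cheap_key: Callable[[T], float],
    expensive_score: Callable[[T], float],
    k: int,
    n_final: int,
) -> List[Tuple[T, float]]:
    """Stage 1 keeps the k items with the smallest cheap_key.
    Stage 2 scores only those k items and returns the best n_final."""
    shortlist = sorted(items, key=cheap_key)[:k]
    scored = [(item, expensive_score(item)) for item in shortlist]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_final]


def demo():
    """Run the funnel on 10,000 integers and count the expensive calls."""
    calls = []  # every item the expensive scorer was asked about

    def fake_llm_score(x):
        calls.append(x)
        return -abs(x - 42)  # pretend the "ideal" item is 42

    top = two_stage_funnel(
        range(10_000),
        cheap_key=lambda x: abs(x - 40),  # rough proxy for closeness
        expensive_score=fake_llm_score,
        k=50,
        n_final=3,
    )
    return top, len(calls)
```

Calling `demo()` shows the point of the design: the cheap key touches all 10,000 items, but the expensive scorer runs only 50 times and still finds the best item in the shortlist.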
Enough yapping. Let’s start coding!
2. The Script
2.1 The Setup
I did the dirty work behind the scenes for you 🙂
Everything is written in an object-oriented programming (OOP) fashion, with scripts and a pipeline that can handle the whole process. The GitHub folder is this one, and in order to generate the rest of the code, you can clone it and use this import block here:
2.2 Data Generation
Before we can recommend anything, we need something to recommend. In a real system, we would use a restaurant database in an S3 location. For this article, we generate a synthetic one so the whole thing is fully reproducible and free to run.
This is the job of the RestaurantDataGenerator class inside datagenerator.py. It builds a reproducible table of ~10,000 restaurants scattered across eight cities (New York, San Francisco, Chicago, Austin, Seattle, Boston, Miami, and Denver). Each restaurant gets:
– a randomly assembled name,
– a city and a latitude/longitude sampled around that city’s center (within ~13 km),
– a cuisine style (Italian, Japanese, Mexican, Thai, French, …),
– a dietary profile (omnivore/vegetarian/vegan),
– an average rating,
– a number of votes,
– a price range (10 / 100 / 1000, an order-of-magnitude average ticket per person).
This generator is meant to run once. Generating the data is as simple as:
That single call writes the table to data/restaurants.csv, which looks like this:
Perfect, now that we have our restaurants, let’s see how we can recommend them.
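If you want a feel for what such a generator does without cloning the repository, here is a minimal, standard-library sketch. It is not the RestaurantDataGenerator class itself: the function names, the name fragments, the city coordinates, the rating scale and the vote range are all assumptions of mine, and the position sampling uses a flat-earth approximation that is only reasonable for small radii.

```python
import csv
import math
import random

# Approximate city centers (assumed values, for illustration only).
CITY_CENTERS = {
    "New York": (40.7128, -74.0060),
    "San Francisco": (37.7749, -122.4194),
    "Chicago": (41.8781, -87.6298),
    "Austin": (30.2672, -97.7431),
    "Seattle": (47.6062, -122.3321),
    "Boston": (42.3601, -71.0589),
    "Miami": (25.7617, -80.1918),
    "Denver": (39.7392, -104.9903),
}
CITY_NAMES = list(CITY_CENTERS)
CUISINES = ["Italian", "Japanese", "Mexican", "Thai", "French"]
DIETS = ["omnivore", "vegetarian", "vegan"]
PRICE_RANGES = [10, 100, 1000]
NAME_FIRST = ["Golden", "Royal", "Green", "Blue"]   # placeholder fragments
NAME_LAST = ["Spoon", "Fork", "Tavern", "Kitchen"]  # placeholder fragments
KM_PER_DEGREE = 111.0  # rough length of one degree of latitude


def generate_restaurants(n=10_000, max_radius_km=13.0, seed=42):
    """Return n synthetic restaurant rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for restaurant_id in range(n):
        city = rng.choice(CITY_NAMES)
        lat0, lon0 = CITY_CENTERS[city]
        # sqrt makes the points uniform over the disc instead of clustered.
        radius = max_radius_km * math.sqrt(rng.random())
        bearing = rng.uniform(0, 2 * math.pi)
        lat = lat0 + radius * math.cos(bearing) / KM_PER_DEGREE
        lon = lon0 + radius * math.sin(bearing) / (
            KM_PER_DEGREE * math.cos(math.radians(lat0)))
        rows.append({
            "restaurant_id": restaurant_id,
            "name": f"{rng.choice(NAME_FIRST)} {rng.choice(NAME_LAST)}",
            "city": city,
            "lat": round(lat, 6),
            "lon": round(lon, 6),
            "cuisine": rng.choice(CUISINES),
            "diet": rng.choice(DIETS),
            "rating": round(rng.uniform(1.0, 5.0), 1),  # assumed 1-5 scale
            "votes": rng.randint(1, 5000),               # assumed range
            "price_range": rng.choice(PRICE_RANGES),
        })
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Fixing the seed is what makes the table reproducible: two runs with the same seed produce identical rows, so every later stage can be rerun against exactly the same data.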
2.3 Generating the Candidates
This is Stage 1 of the funnel: the cheap, quick, rule-based list of candidates. The user tells us which city they are in, and we keep only the geographically closest restaurants. The code filters the table down to the city, computes the great-circle distance from the user to every restaurant, and keeps the N_DISTANCE_CANDIDATES closest ones (50 by default).
This stage is deliberately high recall, low precision. With this approach, we can run over the whole table (10k restaurants) without a single API call or token cost. Sure, we don’t do anything particularly smart or fancy here, but we are filtering out all the data that is not a plausible candidate for the user. That alone is a big deal.
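As a rough illustration of Stage 1, the snippet below filters by city, computes the haversine (great-circle) distance, and keeps the closest N. It is a sketch on plain dictionaries and not the repository code: `distance_candidates` and the row keys (`city`, `lat`, `lon`) are names I am assuming, while N_DISTANCE_CANDIDATES and its default of 50 come from the description above.

```python
import heapq
import math

N_DISTANCE_CANDIDATES = 50
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_candidates(restaurants, city, user_lat, user_lon,
                        n=N_DISTANCE_CANDIDATES):
    """Return the n restaurants in `city` closest to the user, nearest first.

    Each returned row is a copy of the input row with an extra
    'distance_km' field; the input rows are left untouched.
    """
    in_city = (r for r in restaurants if r["city"] == city)
    with_distance = (
        {**r, "distance_km": haversine_km(user_lat, user_lon,
                                          r["lat"], r["lon"])}
        for r in in_city
    )
    return heapq.nsmallest(n, with_distance, key=lambda r: r["distance_km"])
```

Using `heapq.nsmallest` avoids sorting the whole city when only the top 50 are needed, which is a small optimization that fits the spirit of a cheap first stage.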
For example, let’s try a real request to the search:
“cheap vegan tacos with a lively atmosphere” in multiple cities
This is the output:
Notice how the shortlist has no idea about “vegan”, “cheap” or “tacos”: it only knows about distance. However, this is okay, as the goal of this stage is to create an in-the-right-city starting point that the LLM will rerank in Stage 2.
Let’s get ready for the LLM!
2.4 Selecting the Candidates
This is Stage 2, the slow, intelligent, LLM-driven, high-precision end of the funnel. This builds directly on top of the 50-restaurant shortlist from 2.3. The LLM never sees the full 10,000-row table; it only ever sees the small, already-relevant slice that the distance filter passed it.
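A quick back-of-the-envelope calculation shows why this matters. Every number below except the row counts (10,000 and 50) is an assumption made up for illustration: the tokens per restaurant row, the prompt overhead, and the price per million tokens are placeholders, so plug in your own figures.

```python
def prompt_tokens(n_rows, tokens_per_row, overhead_tokens):
    """Rough input size of one request: fixed instructions plus one chunk per row."""
    return overhead_tokens + n_rows * tokens_per_row


def request_cost_usd(n_tokens, usd_per_million_tokens):
    """Input cost of one request at a given price per million tokens."""
    return n_tokens / 1_000_000 * usd_per_million_tokens


TOKENS_PER_ROW = 40            # assumed: one serialized restaurant row
OVERHEAD_TOKENS = 200          # assumed: system prompt plus user query
USD_PER_MILLION_TOKENS = 1.0   # assumed placeholder price, not a real quote

full_table = prompt_tokens(10_000, TOKENS_PER_ROW, OVERHEAD_TOKENS)  # 400,200
shortlist = prompt_tokens(50, TOKENS_PER_ROW, OVERHEAD_TOKENS)       # 2,200

ratio = full_table / shortlist  # about 182x fewer input tokens per request
```

Under these assumptions the shortlist request is roughly 180 times smaller than sending the full table, and that ratio holds whatever the actual price per token is, since the price cancels out.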
We talk to the model through a small OpenAI client. The key is read from OPENAI_API_KEY (stored in the environment). The recommender, defined as RestaurantRecommender, runs on the query and on the city through RestaurantRecommender.recommender(query, city):
A couple of things are worth calling out:
- Precision goes up. Stage 1 was high recall, low precision: it returned the 50 closest restaurants regardless of the request. Stage 2 actually reads the query (“cheap vegan tacos with a lively atmosphere”), discards everything that doesn’t fit, and returns only the best 5 to 10 with an honest fit_score.
- Structured output with Pydantic. We never parse free-form text. The model is forced to answer in the shape of a Pydantic model (via OpenAI structured outputs), so every response is guaranteed to match the schema.
The output schema carries the restaurant_id and name (from the candidates), a fit_score with a value between 0 and 100, and a short reason. The response is also wrapped with a friendly summary. Running the call for our three cities gives, for example:
If you notice, this is much better than the raw distance shortlists from 2.3. There, the closest restaurant in each city was an essentially random match (Korean, Lebanese, Mexican-but-vegetarian). Here, the model has reordered the same 50 candidates around what we actually asked for: vegan and Mexican places float to the top with high fit_scores, and the model is honest when nothing is a perfect match, marking partial matches down and explaining why in the reason. That’s the precision the LLM buys us, applied to a shortlist small enough to stay cheap at scale.
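To show what “matching the schema” means in practice, here is a dependency-free sketch that checks a JSON response against the fields described above. It deliberately uses standard-library dataclasses as a stand-in for the Pydantic model and does not call the OpenAI API. The class names, the `parse_response` helper, and the field names name, reason and summary are my assumptions; restaurant_id and fit_score are the names used in the text.

```python
import json
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Recommendation:
    restaurant_id: int
    name: str
    fit_score: int  # expected to be between 0 and 100
    reason: str


@dataclass(frozen=True)
class RecommendationResponse:
    summary: str
    recommendations: List[Recommendation]


def parse_response(raw_json: str, candidate_ids: Set[int]) -> RecommendationResponse:
    """Validate a model response against the shortlist it was given.

    Raises ValueError if a fit_score is out of range or if the model
    returned a restaurant that was not among the Stage 1 candidates.
    """
    payload = json.loads(raw_json)
    recs = []
    for item in payload["recommendations"]:
        rec = Recommendation(
            restaurant_id=int(item["restaurant_id"]),
            name=str(item["name"]),
            fit_score=int(item["fit_score"]),
            reason=str(item["reason"]),
        )
        if rec.restaurant_id not in candidate_ids:
            raise ValueError(f"unknown restaurant_id: {rec.restaurant_id}")
        if not 0 <= rec.fit_score <= 100:
            raise ValueError(f"fit_score out of range: {rec.fit_score}")
        recs.append(rec)
    # Best matches first, so callers can simply take the head of the list.
    recs.sort(key=lambda r: r.fit_score, reverse=True)
    return RecommendationResponse(summary=str(payload["summary"]),
                                  recommendations=recs)
```

The check against `candidate_ids` is worth keeping even with structured outputs: a schema guarantees the shape of the answer, but it cannot guarantee that the model only picked restaurants from the shortlist it was shown.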
3. Results
Let’s step back and look at what the two-stage funnel actually bought us, using the same request across three cities: “cheap vegan tacos with a lively atmosphere”.
- Stage 1 gives us the list of candidates. The distance shortlists from 2.3 were high recall and low precision by design.
- Stage 2 identifies the true recommendations. Feeding the 50 candidates from Stage 1 to the LLM reorders them around what was actually requested.
Here are the final picks the model returned for each city:
- New York: Golden Spoon (vegan, 4.9) and Maison Fork (Mexican, in budget) rise to the top with fit scores of 90 and 85.
- Miami: Royal Tavern & Co. (vegan, Mexican, affordable) leads at 85.
- Boston: City Spoon and Little Home, both budget Mexican spots, take the top two slots at 90 and 85.
In every city, the model promoted the candidates that matched the vegan, cheap and Mexican/tacos intent, and it was honest about imperfect matches: places that nailed the diet but not the cuisine (or vice versa) were kept as backups with visibly lower fit_scores.
4. Conclusions
Thank you for spending time with me, it means a lot. ❤️ Here’s what we’ve done together:
– Built a two-stage recommendation funnel that is both scalable and intelligent.
– Used a cheap, rule-based distance filter (Stage 1) to cut 10,000 restaurants down to the closest 50.
– Used an LLM rerank (Stage 2) to turn those 50 candidates into the best 5 to 10, with an honest score and reason for each.
In many real projects, a funnel like the one we built here is a very popular choice. These systems are scalable, because the LLM is called only on a small shortlist, and intelligent, because the final ranking comes from a model that understands the context of the query very well.
5. Before you head out!
Thank you again for your time. It means a lot. My name is Piero Paialunga.

I’m originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to know more about machine learning and follow my studies, you can:
A. Follow me on LinkedIn, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at piero.paialunga@hotmail