7 MIN READ · Pedro Thomaz

A case for not using AI — and using ML instead

Every app shipped in 2026 bolted a chat box onto something. Here's the case for the boring alternative: ranked suggestions from a small model, cheap enough to run on every screen load, explainable enough to defend in a clinic.

Every product meeting in 2026 ends with the same line: "and then GPT could suggest…". We've been on the other side of that sentence enough times to write down what we actually do instead — and why, for the kind of products we build, it almost always wins.

The LLM tax nobody itemises

An LLM call is not a feature. It's a recurring cost, a latency budget, a regulatory exposure, and an evaluation problem — all rolled into one API key. On a daily-active wellness screen with 30 000 users, even a small model becomes the line item your CFO learns to spell. And that's before you count the streaming infrastructure, the prompt caching, the jailbreak monitoring, and the "as an AI language model" embarrassments your support inbox forwards back to you.
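To see how the bill scales, here is a back-of-envelope sketch in Python. Every number in it (calls per user, token counts, per-million-token prices) is a hypothetical placeholder, not a quote from any provider. Swap in your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmUsage:
    daily_active_users: int
    calls_per_user_per_day: float
    input_tokens_per_call: int
    output_tokens_per_call: int
    usd_per_million_input: float   # hypothetical price
    usd_per_million_output: float  # hypothetical price


def monthly_cost_usd(u: LlmUsage, days: int = 30) -> float:
    """Token spend only: no streaming, caching or monitoring infrastructure."""
    calls = u.daily_active_users * u.calls_per_user_per_day * days
    cost_per_call = (
        u.input_tokens_per_call * u.usd_per_million_input
        + u.output_tokens_per_call * u.usd_per_million_output
    ) / 1_000_000
    return calls * cost_per_call


# Placeholder numbers: 30 000 daily users, 3 calls each, ~1.5k tokens in, 300 out.
usage = LlmUsage(30_000, 3, 1_500, 300, 0.50, 1.50)
print(f"${monthly_cost_usd(usage):,.0f} per month")
```

With those placeholders the token spend alone comes to about $3,240 a month, and it grows linearly with every extra call per user. None of the infrastructure costs listed above are included.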

None of which is an argument against LLMs in general. It's an argument against using them for ranked suggestion lists — which is what most "AI features" actually are once you peel the wrapper off.

What a "suggestion" usually needs

Take a step back. A suggestion in a product is almost always answering the same three questions:

  1. Which candidates exist? The candidate set is finite and yours — recipes, exercises, articles, supplements, sounds, products. You wrote them.
  2. How relevant is each one for this user, right now?
  3. What's the cost of being wrong?

Notice what isn't on the list: generating new text. Most of the time you don't need novel prose — you need to rank known things well.

The boring model that beats the chatbot

For Jofit — our fitness and wellness app — we ship a small ranker that scores every candidate intervention (a supplement, a meal swap, a sleep cue, a training tweak) on three axes:

Multiply, sort, take the top. Anything in the URGENT clinical tier bypasses the ranker entirely. The whole thing runs in under a millisecond and fits in a single file.
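Here is a minimal Python sketch of that multiply, sort, take-the-top loop, including the URGENT bypass. The axis names `relevance`, `evidence` and `ease` (one minus adherence cost) are hypothetical stand-ins, not Jofit's actual axes, and each is assumed to be normalised to [0, 1]. Treating the weights as exponents, so a weight of 1.0 is a plain multiply, and placing URGENT items first without scoring them are also our assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ROUTINE = "routine"
    URGENT = "urgent"


@dataclass(frozen=True)
class Candidate:
    name: str
    tier: Tier
    # Hypothetical axes, each normalised to [0, 1] where higher is better.
    relevance: float
    evidence: float
    ease: float  # 1 - adherence cost, so easier-to-follow scores higher


Weights = tuple[float, float, float]


def score(c: Candidate, w: Weights = (1.0, 1.0, 1.0)) -> float:
    # Weighted product: each weight is an exponent, so 1.0 is a plain multiply.
    return (c.relevance ** w[0]) * (c.evidence ** w[1]) * (c.ease ** w[2])


def rank(candidates: list[Candidate], k: int = 3,
         w: Weights = (1.0, 1.0, 1.0)) -> list[Candidate]:
    # URGENT items skip scoring entirely and always come first.
    urgent = [c for c in candidates if c.tier is Tier.URGENT]
    routine = [c for c in candidates if c.tier is not Tier.URGENT]
    routine.sort(key=lambda c: score(c, w), reverse=True)
    return urgent + routine[:k]


# Made-up candidates for illustration.
pool = [
    Candidate("evening_magnesium", Tier.ROUTINE, 0.8, 0.7, 0.9),
    Candidate("greek_yogurt_swap", Tier.ROUTINE, 0.9, 0.8, 0.5),
    Candidate("earlier_lights_out", Tier.ROUTINE, 0.6, 0.9, 0.8),
    Candidate("resting_hr_alert", Tier.URGENT, 0.0, 0.0, 0.0),
]
print([c.name for c in rank(pool, k=2)])
```

Under this scheme, doubling the weight on adherence cost is a one-argument change: `rank(pool, w=(1.0, 1.0, 2.0))`.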

Why a chatbot would do this worse

An LLM could generate the same recommendation. But:

The ranker has none of those problems. Every score is a function of inputs we control. We can replay last week's data against this week's model and quantify the lift. We can ask "what would happen if we doubled the weight on adherence cost?" and answer in an afternoon.
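Replaying logs is what makes the "doubled the weight" question cheap to answer. This sketch assumes a hypothetical log format in which each session stores the axis values of every eligible candidate plus the one the user acted on. It re-ranks each session under a given weight vector and counts how often the acted-on suggestion lands in the top k. The weighted product, with weights as exponents, is an assumption about how the axes combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedSession:
    # Hypothetical log format: axis values (relevance, evidence, ease) for each
    # eligible candidate, keyed by name, plus the one the user acted on.
    candidates: dict[str, tuple[float, float, float]]
    acted_on: str


def score(axes: tuple[float, ...], weights: tuple[float, ...]) -> float:
    # Weighted product: each axis value raised to its weight, then multiplied.
    s = 1.0
    for value, w in zip(axes, weights):
        s *= value ** w
    return s


def hit_rate_at_k(sessions: list[LoggedSession],
                  weights: tuple[float, ...], k: int = 1) -> float:
    """Share of sessions where the acted-on suggestion lands in the top k."""
    hits = 0
    for session in sessions:
        ranked = sorted(session.candidates,
                        key=lambda name: score(session.candidates[name], weights),
                        reverse=True)
        if session.acted_on in ranked[:k]:
            hits += 1
    return hits / len(sessions)


# Two toy sessions; the numbers are invented.
logs = [
    LoggedSession({"a": (0.9, 0.9, 0.3), "b": (0.7, 0.8, 0.9)}, acted_on="b"),
    LoggedSession({"c": (0.9, 0.9, 0.6), "d": (0.7, 0.8, 0.8)}, acted_on="d"),
]
baseline = hit_rate_at_k(logs, (1.0, 1.0, 1.0))
doubled_ease = hit_rate_at_k(logs, (1.0, 1.0, 2.0))
print(f"hit@1: {baseline:.2f} -> {doubled_ease:.2f}")
```

On these two toy sessions, doubling the exponent on `ease` lifts hit@1 from 0.5 to 1.0. On real logs, keep in mind that users can only act on suggestions the old ranker chose to show, so the replay is biased toward the current policy.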

The toolbox, ordered by reach-for first

  1. Weighted scoring with hand-tuned coefficients. Boring. Effective. Two weekends to ship.
  2. Gradient-boosted trees (XGBoost / LightGBM) on engagement labels. The single best ROI in applied ML over the past decade. Trains on a laptop.
  3. Matrix factorisation for collaborative recommendations once you have a user-item history. Same family as Netflix circa 2009. Still excellent.
  4. Contextual bandits when you want to explore as well as exploit. Vowpal Wabbit will run them on the edge.
  5. A small transformer or embedding model for semantic similarity. Sentence-transformers ships compact models (around 90 MB) that handle most semantic-retrieval work well, at a fraction of an LLM call's cost and latency.
  6. LLM — only when the output is genuinely novel text the user needs to read, and you have a budget line for it.
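For item 4, the explore/exploit idea fits in a few lines. This is a tabular epsilon-greedy bandit in plain Python, a deliberately simpler technique than what Vowpal Wabbit offers: VW learns a model over context features, whereas this sketch keeps one running mean reward per (context, arm) pair. The contexts, arms and rewards in the simulation are made up.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Tabular contextual bandit: one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: random arm
        # Exploit: best running mean for this context (ties go to the first arm).
        return max(self.arms, key=lambda arm: self.means[(context, arm)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental mean: m += (r - m) / n
        self.means[key] += (reward - self.means[key]) / self.counts[key]


# Made-up simulation: stretching works in the morning, a wind-down cue at night.
best = {"morning": "stretch", "evening": "wind_down"}
bandit = EpsilonGreedyBandit(["stretch", "wind_down", "walk"], epsilon=0.1, seed=0)
for step in range(2_000):
    ctx = "morning" if step % 2 == 0 else "evening"
    arm = bandit.choose(ctx)
    bandit.update(ctx, arm, reward=1.0 if arm == best[ctx] else 0.0)

bandit.epsilon = 0.0  # pure exploitation for the final check
print(bandit.choose("morning"), bandit.choose("evening"))
```

The epsilon share of traffic is the price of exploration: it is how the bandit discovers that the evening context prefers a different arm from the one it exploited first.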

When to actually reach for an LLM

Three legitimate cases:

The shipping order

If you're staring at a feature spec that says "AI suggestions", do this:

  1. List the candidates. If you can't, you don't have a recommendation problem — you have a content problem. Fix that first.
  2. Pick three scoring axes. Hand-tune the weights for a week.
  3. Ship it. Log every suggestion shown and every one acted on.
  4. After 4–6 weeks of data, train a gradient-boosted model on the logs. Replace the hand-tuned weights.
  5. Only then, if you still need novel prose, add an LLM — to explain, not to decide.
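Steps 3 and 4 hinge on turning "shown" and "acted on" logs into labelled rows. This sketch assumes hypothetical `Impression` and `Action` records and a 24-hour attribution window; both are our choices, not part of any library. An impression gets label 1 if the same user acted on the same suggestion within the window after it was shown.

```python
from dataclasses import dataclass

ATTRIBUTION_WINDOW_S = 24 * 3600  # assumption: acting within a day counts


@dataclass(frozen=True)
class Impression:
    user_id: str
    suggestion: str
    features: tuple[float, ...]  # e.g. the axis scores at display time
    shown_at: float  # unix seconds


@dataclass(frozen=True)
class Action:
    user_id: str
    suggestion: str
    acted_at: float  # unix seconds


def build_training_rows(impressions, actions, window_s=ATTRIBUTION_WINDOW_S):
    """Label each impression 1 if the same user acted on the same suggestion
    within window_s seconds after it was shown, else 0."""
    acted_times = {}
    for a in actions:
        acted_times.setdefault((a.user_id, a.suggestion), []).append(a.acted_at)
    X, y = [], []
    for imp in impressions:
        times = acted_times.get((imp.user_id, imp.suggestion), [])
        hit = any(imp.shown_at <= t <= imp.shown_at + window_s for t in times)
        X.append(list(imp.features))
        y.append(int(hit))
    return X, y


# Invented log lines.
impressions = [
    Impression("u1", "evening_magnesium", (0.8, 0.7, 0.9), shown_at=0.0),
    Impression("u1", "greek_yogurt_swap", (0.9, 0.8, 0.5), shown_at=0.0),
    Impression("u2", "evening_magnesium", (0.7, 0.7, 0.9), shown_at=0.0),
]
actions = [
    Action("u1", "evening_magnesium", acted_at=3_600.0),
    Action("u2", "evening_magnesium", acted_at=100_000.0),  # outside the window
]
X, y = build_training_rows(impressions, actions)
print(y)
```

The resulting `X, y` lists are the shape gradient-boosted libraries expect. For example, LightGBM's `LGBMClassifier` and XGBoost's `XGBClassifier` both accept them through `.fit(X, y)`.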

You'll ship faster, spend less, sleep better, and — most importantly — be able to answer the one question that matters when a user asks why they got that suggestion: here are the three numbers, and here is what they mean.
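Because the score is just a product of three numbers, the explanation falls out of the model. This is a minimal sketch with hypothetical axis names (`relevance`, `evidence`, `ease`) and the assumption that weights act as exponents. It returns the final score followed by the three numbers behind it.

```python
import math

AXES = ("relevance", "evidence", "ease")  # hypothetical axis names


def explain(values: dict[str, float], weights: dict[str, float]) -> list[str]:
    """Return the final score followed by the three numbers behind it."""
    total = math.prod(values[a] ** weights[a] for a in AXES)
    lines = [f"score = {total:.3f}"]
    for a in AXES:
        lines.append(f"{a}: {values[a]:.2f} (weight {weights[a]:g})")
    return lines


for line in explain({"relevance": 0.8, "evidence": 0.7, "ease": 0.9},
                    {"relevance": 1.0, "evidence": 1.0, "ease": 1.0}):
    print(line)
```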