The Best AI Solution Is Often Not the Obvious One
First-principles thinking for enterprise AI: a field story from CPG.

There's a trap that most enterprise AI teams fall into: we have a problem, we have frontier LLMs, so we have a solution. It's an easy reflex. The models are genuinely good. But this reflex is how you end up with expensive, brittle systems that don't survive real-world conditions.
Here's a story from CPG field ops that makes this concrete.
The problem
Field reps for a large CPG company conduct in-store shelf audits covering product placement, pricing, promotions, out-of-stock detection, planogram compliance, and competitive shelf activity. The volume is huge: thousands of store visits, hundreds of SKUs, and data that needs to be captured and acted on fast.
The obvious approaches and why they don't work
First instinct: build a dedicated computer vision (CV) model and train it on product images. Except CPG portfolios don't sit still. New products launch. Packaging refreshes. Regional variants multiply. A trained CV model starts going stale the moment the catalogue changes, and retraining at the pace of CPG innovation is not operationally viable. It doesn't scale.
Second instinct: route shelf images to a frontier multimodal model such as GPT-5o or Gemini 3.5. These models can describe product images well. But the unit economics fall apart fast. At the query volumes of a national field force, per-call API costs stack up into serious OpEx. And frontier models carry cold-start and latency overhead that makes them a poor fit for high-volume, time-sensitive field workflows.
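To see how per-call costs stack up, it helps to write the arithmetic down. The sketch below is only a back-of-the-envelope calculator: the function name and every input value are hypothetical placeholders, not real pricing or real field volumes, and should be replaced with your own figures.

```python
def monthly_api_cost(visits_per_month, images_per_visit, cost_per_call_usd):
    """Estimate monthly spend if every shelf image triggers one paid API call."""
    calls = visits_per_month * images_per_visit
    return calls * cost_per_call_usd


# Placeholder inputs for illustration only (assumptions, not measured data).
PLACEHOLDER_VISITS = 10_000      # store visits per month
PLACEHOLDER_IMAGES = 20          # shelf images captured per visit
PLACEHOLDER_COST = 0.01          # USD per model call

estimate = monthly_api_cost(PLACEHOLDER_VISITS, PLACEHOLDER_IMAGES, PLACEHOLDER_COST)
print(f"Estimated monthly API spend: ${estimate:,.2f}")
```

The point is the shape of the formula rather than the numbers: cost grows linearly with both visit volume and images per visit, so any growth in field coverage multiplies straight into OpEx.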
The right approach
The Affine team asked a more basic question: what does this problem actually need?
It needs to match an image to a known catalogue. That's a similarity problem, not a generation problem. And similarity problems have clean, scalable solutions that don't require an LLM at all.
Here's what the solution looks like. Embed the full product catalogue using a vision-language embedding model. Every SKU, every variant, every packaging version stored as a vector in an index. At inference time, the shelf image goes through a lightweight detection pass that crops individual products. Each crop gets embedded and matched against the catalogue index using cosine similarity. The result is fast, accurate product identification without the cost or complexity of a frontier model call.
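The matching step described above can be sketched in a few lines. This is a minimal illustration, not the production system: the tiny three-dimensional vectors stand in for real embeddings from a vision-language model, the SKU names are invented, and the `min_score` threshold of 0.8 is an assumed value that would need tuning. A real deployment would likely use a vector index library rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_crop(crop_vec, catalogue, min_score=0.8):
    """Return (sku_id, score) for the nearest catalogue entry.

    sku_id is None when the best score falls below min_score, which
    flags the crop as an unknown or competitor product.
    """
    best_sku, best_score = None, -1.0
    for sku_id, sku_vec in catalogue.items():
        score = cosine(crop_vec, sku_vec)
        if score > best_score:
            best_sku, best_score = sku_id, score
    if best_score < min_score:
        return None, best_score
    return best_sku, best_score


# Hypothetical catalogue: in practice each vector comes from the embedding model.
catalogue = {
    "SKU-A": [1.0, 0.0, 0.0],
    "SKU-B": [0.0, 1.0, 0.0],
    "SKU-C": [0.0, 0.0, 1.0],
}

# Hypothetical embeddings of two crops from one shelf image.
known_crop = [0.9, 0.1, 0.0]     # close to SKU-A
unknown_crop = [0.5, 0.5, 0.5]   # not close to anything in the catalogue

known_match = match_crop(known_crop, catalogue)
unknown_match = match_crop(unknown_crop, catalogue)
print(known_match, unknown_match)
```

The threshold matters in practice: without it, every crop is forced onto some SKU, including competitor products that are not in the catalogue at all.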
Accuracy is high because the model is never asked to understand the product, only to find its nearest neighbor in embedding space. Cost is a fraction of a frontier API approach. And when new SKUs hit the catalogue, updating the index is just an embedding operation, not a retraining cycle.
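The "update the index, don't retrain" point is easy to show in code. The sketch below uses a hypothetical in-memory `CatalogueIndex` class with toy vectors in place of real embeddings; the class, its methods, and the SKU names are illustrative assumptions, not the actual implementation.

```python
import math


class CatalogueIndex:
    """Toy in-memory index of unit-length SKU embeddings."""

    def __init__(self):
        self._entries = []  # list of (sku_id, unit_vector)

    @staticmethod
    def _unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vec]

    def add(self, sku_id, vec):
        """Register a SKU: one append, no model weights change."""
        self._entries.append((sku_id, self._unit(vec)))

    def nearest(self, vec):
        """Return (sku_id, cosine_score) of the closest indexed SKU."""
        if not self._entries:
            raise ValueError("index is empty")
        query = self._unit(vec)
        scored = (
            (sku_id, sum(q * v for q, v in zip(query, unit_vec)))
            for sku_id, unit_vec in self._entries
        )
        return max(scored, key=lambda pair: pair[1])


index = CatalogueIndex()
index.add("SKU-A", [1.0, 0.0, 0.0])
index.add("SKU-B", [0.0, 1.0, 0.0])

# A crop of a newly launched product that is not in the catalogue yet.
new_product_crop = [0.1, 0.2, 0.9]
before = index.nearest(new_product_crop)   # weak match to an existing SKU

# The new SKU launches: embed its reference image and add it to the index.
index.add("SKU-C-REFRESH", [0.0, 0.0, 1.0])
after = index.nearest(new_product_crop)    # now resolves to the new SKU
print(before, after)
```

Before the update, the crop's best match is weak, which a score threshold would catch; after a single `add` call it resolves cleanly to the new SKU.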
What this is really about
The most effective enterprise AI work is not about using the most powerful tools. It's about decomposing the problem correctly before picking a tool.
LLMs are the right call for tasks that need language understanding, reasoning, synthesis, or generation. They are the wrong call, and expensive overkill, for tasks that are fundamentally retrieval or matching problems, or that can be solved with conventional ML, which remains a very robust technology.
This is Affine.
