The Best AI Solution Is Often Not the Obvious One

First-principles thinking for enterprise AI: a field story from CPG.

Vineet Kumar

Chief Executive Officer, Affine

June 5, 2026 | 4 min read

There's a trap that most enterprise AI teams fall into: we have a problem, we have frontier LLMs, so we have a solution. It's an easy reflex. The models are genuinely good. But this reflex is how you end up with expensive, brittle systems that don't survive real-world conditions.

Here's a story from consumer packaged goods (CPG) field ops that makes this concrete.

The problem

Field reps for a large CPG company conduct in-store shelf audits. They check product placement, pricing, promotions, out-of-stock detection, planogram compliance, and competitive shelf activity. The volume is huge: thousands of store visits, hundreds of SKUs, and data that needs to be captured and acted on fast.

The obvious approaches and why they don't work

First instinct: build a dedicated computer vision (CV) model. Train it on product images. Except CPG portfolios don't sit still. New products launch. Packaging refreshes. Regional variants multiply. A trained CV model goes stale the moment the catalogue changes, and retraining at the pace of CPG innovation is not operationally viable. It doesn't scale.

Second instinct: route shelf images to a frontier multimodal model such as GPT-5o or Gemini 3.5. These models can describe product images well. But the unit economics fall apart fast. At the query volumes of a national field force, per-call API costs stack up into serious OpEx. And frontier models carry cold-start and latency overhead that makes them a poor fit for high-volume, time-sensitive field workflows.

The right approach

The Affine team asked a more basic question: what does this problem actually need?

It needs to match an image to a known catalogue. That's a similarity problem, not a generation problem. And similarity problems have clean, scalable solutions that don't require an LLM at all.

Here's what the solution looks like. Embed the full product catalogue using a vision-language embedding model. Every SKU, every variant, every packaging version stored as a vector in an index. At inference time, the shelf image goes through a lightweight detection pass that crops individual products. Each crop gets embedded and matched against the catalogue index using cosine similarity. The result is fast, accurate product identification without the cost or complexity of a frontier model call.
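The matching step above can be sketched in a few lines of Python. This is a minimal illustration, not the production system: the three-dimensional vectors stand in for real embedding-model output, the SKU names are made up, and the `min_score` threshold for rejecting weak matches is an assumption added here rather than something described in the story. A real deployment would likely use a vector index instead of a linear scan.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def match_crop(crop_embedding, catalogue_index, min_score=0.8):
    """Return (sku, score) for the nearest catalogue entry.

    Returns (None, score) when the best score is below min_score
    (hypothetical threshold), so unknown products are not force-matched.
    """
    best_sku, best_score = None, -1.0
    for sku, vec in catalogue_index.items():
        score = cosine_similarity(crop_embedding, vec)
        if score > best_score:
            best_sku, best_score = sku, score
    if best_score < min_score:
        return None, best_score
    return best_sku, best_score


# Toy vectors standing in for embeddings of catalogue images (hypothetical SKUs).
catalogue_index = {
    "SKU-COLA-330": [0.9, 0.1, 0.0],
    "SKU-COLA-ZERO-330": [0.7, 0.7, 0.1],
    "SKU-CHIPS-SALT": [0.0, 0.2, 0.9],
}

# Toy vector standing in for the embedding of one cropped shelf product.
crop_embedding = [0.85, 0.15, 0.05]
sku, score = match_crop(crop_embedding, catalogue_index)
print(sku, round(score, 3))
```

The point of the sketch is that identification reduces to a nearest-neighbor lookup: no text is generated, and the only model call per crop is the embedding itself.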

Accuracy is high because the model is never asked to understand the product, only to find its nearest neighbor in embedding space. Cost is a fraction of a frontier API approach. And when new SKUs hit the catalogue, updating the index is just an embedding operation, not a retraining cycle.
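To make the "embedding operation, not a retraining cycle" point concrete, here is a small sketch of adding a new SKU to an in-memory index. The `CatalogueIndex` class, its `upsert` and `nearest` methods, the SKU names, and the toy vectors are all hypothetical; in practice the vectors would come from the embedding model and the index would be a proper vector store.

```python
import math


def _unit(vec):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class CatalogueIndex:
    """Hypothetical in-memory index mapping SKU -> unit-length embedding."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, sku, embedding):
        # Adding or refreshing a SKU only stores a vector; no model is retrained.
        self._vectors[sku] = _unit(embedding)

    def nearest(self, embedding):
        """Return (sku, cosine_score) of the closest entry, or None if empty."""
        if not self._vectors:
            return None
        query = _unit(embedding)
        scored = (
            (sku, sum(q * v for q, v in zip(query, vec)))
            for sku, vec in self._vectors.items()
        )
        return max(scored, key=lambda pair: pair[1])


index = CatalogueIndex()
index.upsert("SKU-COLA-330", [0.9, 0.1, 0.0])
index.upsert("SKU-CHIPS-SALT", [0.0, 0.2, 0.9])

# A crop of a product that launched after the index was built.
new_product_crop = [0.1, 0.9, 0.2]
before = index.nearest(new_product_crop)  # weak match to an unrelated SKU

# Catalogue update: embed the new pack shot and store it.
index.upsert("SKU-LEMONADE-500", [0.1, 1.0, 0.2])
after = index.nearest(new_product_crop)   # now matches the new SKU

print(before, after)
```

Before the update, the best available match is a low-scoring unrelated product. After a single `upsert`, the same crop resolves to the new SKU with a high score, and nothing else in the pipeline changes.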

What this is really about

The most effective enterprise AI work is not about using the most powerful tools. It's about decomposing the problem correctly before picking a tool.

LLMs are the right call for tasks that need language understanding, reasoning, synthesis, or generation. They are the wrong call, and expensive overkill, for tasks that are fundamentally retrieval or matching problems, or that can be solved with conventional ML, which remains a very robust technology.

This is the kind of problem Affine was built to solve. The kind where the right answer requires stepping back before stepping in. Where first-principles thinking beats default instincts. And where the difference between a proof of concept and a production system is not which model you picked, but whether you asked the right question first.
