Choosing the Right Model for the Right Job
The decision framework most developers never build
The idea of a “best model” is comforting.
It is also wrong.
There is no best model.
There is only the best fit for a specific job.
Once you accept that, model selection stops being emotional and starts being engineering.
Why model choice actually matters
Picking the wrong model shows up immediately:
- unnecessary cost
- slow responses
- inconsistent outputs
- hallucinations in edge cases
- infrastructure that is bigger than the problem
Most failures blamed on “LLMs” are actually selection failures.

What model selection should be based on
A model should be chosen using clear constraints, not vibes.
At minimum, you need to evaluate:
- Task type: code, summarization, extraction, classification, planning
- Context window requirements: how much input the model needs to see at once
- Accuracy expectations: good enough vs must-be-correct
- Latency constraints: interactive vs background processing
- Cost budget: per request, per user, per month
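One way to make "constraints, not vibes" concrete is to write the checklist down as data before evaluating anything. A minimal sketch in Python; the field names and example values are illustrative assumptions, not a real API:

```python
# A minimal sketch of the selection checklist as explicit constraints.
# Field names and example values are illustrative, not a real API.
from dataclasses import dataclass

@dataclass
class ModelConstraints:
    task_type: str                # "extraction", "summarization", "code", ...
    max_context_tokens: int       # largest input the model must see at once
    must_be_correct: bool         # hard accuracy requirement vs "good enough"
    latency_budget_ms: int        # interactive vs background processing
    cost_per_1k_requests: float   # budget ceiling, in dollars

# Example: an interactive extraction task with a tight budget.
invoice_extraction = ModelConstraints(
    task_type="extraction",
    max_context_tokens=8_000,
    must_be_correct=True,
    latency_budget_ms=1_500,
    cost_per_1k_requests=2.00,
)
```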
Example rules that work in practice:
- use Llama or Qwen for structured, repeatable tasks
- use Claude when large input understanding and coherence matter
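Rules like these translate directly into a small routing function. The sketch below only illustrates the pattern; the model labels and the 32k threshold are assumptions, not benchmark results:

```python
# Illustrative routing based on the two rules above.
# Model labels and the 32k threshold are assumptions for this sketch.
STRUCTURED_TASKS = {"extraction", "classification", "tagging", "routing"}

def pick_model(task_type: str, context_tokens: int) -> str:
    # Large inputs where coherence matters -> a long-context model.
    if context_tokens > 32_000:
        return "claude"
    # Structured, repeatable tasks -> a smaller open-weight model.
    if task_type in STRUCTURED_TASKS:
        return "llama-or-qwen"
    # Everything else: decide explicitly, not by default.
    return "needs-evaluation"

print(pick_model("classification", 4_000))   # llama-or-qwen
print(pick_model("summarization", 120_000))  # claude
```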

Why one model is never enough
Production systems need more than a single benchmark score.
Real selection requires:
- benchmarking on your own data
- A/B testing outputs
- latency profiling under load
- consistency checks across retries
- long-context stress testing
One model rarely performs best across all dimensions.
Accepting that early saves months of refactoring later.
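None of these checks require heavy tooling. A rough sketch of latency profiling and retry-consistency checks, assuming a placeholder call_model() wrapper around whatever client you actually use:

```python
# Rough harness for latency profiling and retry-consistency checks.
# call_model() is a placeholder for your real client; the rest is stdlib.
import time
import statistics

def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError("wrap your actual API client here")

def profile(model: str, prompts: list[str], retries: int = 3) -> dict:
    latencies, consistent = [], 0
    for prompt in prompts:
        outputs = []
        for _ in range(retries):
            start = time.perf_counter()
            outputs.append(call_model(model, prompt))
            latencies.append(time.perf_counter() - start)
        # Consistency: did every retry produce the same output?
        consistent += len(set(outputs)) == 1
    return {
        "p50_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "consistency_rate": consistent / len(prompts),
    }
```

Run it against the same prompt set for each candidate model, on your own data, and compare the numbers instead of the marketing pages.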

How I think about model tiers
This mental model keeps systems predictable:
- small models: extraction, classification, tagging, routing
- medium models: coding, reasoning, transformations
- large models: strategy, planning, multi-step logic, synthesis
This is not about power.
It is about alignment between task and capability.
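In code, this mental model is just a lookup from task category to tier. The assignments below mirror the tiers above and are a starting point, not a fixed taxonomy:

```python
# Task-to-tier mapping mirroring the tiers above.
# The assignments are a starting point, not a fixed taxonomy.
TIER_FOR_TASK = {
    "extraction": "small", "classification": "small",
    "tagging": "small", "routing": "small",
    "coding": "medium", "reasoning": "medium", "transformation": "medium",
    "strategy": "large", "planning": "large",
    "multi_step_logic": "large", "synthesis": "large",
}

def tier_for(task: str) -> str:
    # Unknown tasks should force an explicit decision, not a silent default.
    return TIER_FOR_TASK.get(task, "unknown: evaluate before routing")

print(tier_for("tagging"))   # small
print(tier_for("planning"))  # large
```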
The real takeaway
Model selection is an architectural decision.
Treating it like a preference is how systems drift into chaos.
When models are chosen intentionally, AI systems become boring.
And boring is exactly what production needs.

Closing
This post is part of InsideTheStack, focused on building AI systems that behave predictably under real constraints.
Follow along for more.
#InsideTheStack #ModelSelection