RAG vs fine-tuning for small businesses: the honest comparison
Fine-tuning sounds technical and impressive, so consultants pitch it. Retrieval-augmented generation (RAG) sounds boring, so it doesn't get the slide deck. But for 90% of small-business AI projects, RAG is the right answer and fine-tuning is overkill. Here's how to tell which one you actually need.
The one-paragraph version
RAG = the AI looks things up in your documents at query time. Fine-tuning = the model's weights are updated on examples of your data, so the behavior is baked into the model itself. RAG is cheaper, faster to iterate, easier to debug, and lets you change the source data without retraining. Fine-tuning is worth it when you need the AI to mimic a specific style, follow a specific format, or behave consistently in ways prompt engineering can't reliably enforce. Most small businesses need the first; consultants pitch the second.
When RAG is the right answer
- You want the AI to answer questions over your knowledge base, SOPs, product catalog, or customer history.
- Your source content changes regularly (weekly product updates, evolving policies, fresh CRM data).
- You need to cite the source of every answer. RAG can point at the exact document it retrieved; a fine-tuned model can't tell you where an answer came from.
- You want to add or remove documents without a retraining cycle.
- Your team is small and you'd rather iterate on prompts and chunking than on training pipelines.
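To make the "look things up and cite the source" idea concrete, here is a minimal sketch of the retrieval step. It assumes each chunk already has an embedding; the vectors, file names, and the Chunk, retrieve, and build_prompt names are made up for illustration, and a real system would get embeddings from an embedding API and store them in a vector DB.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str             # file the chunk came from, kept so the answer can cite it
    embedding: list[float]  # hand-written toy vectors here, not real embeddings


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Number each retrieved chunk and show its source so the model can cite it."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them by number.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Hypothetical documents with toy 2-dimensional embeddings.
chunks = [
    Chunk("Returns are accepted within 30 days.", "returns-policy.pdf", [1.0, 0.0]),
    Chunk("Support hours are 9am to 5pm.", "support-sop.docx", [0.0, 1.0]),
    Chunk("Refunds go back to the original card.", "refunds-faq.md", [0.9, 0.1]),
]
hits = retrieve([1.0, 0.0], chunks)
prompt = build_prompt("How do refunds work?", hits)
```

Because every chunk carries its source, adding or removing a document is just adding or removing rows; nothing gets retrained.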
When fine-tuning is worth the cost
- You need the AI to consistently produce output in a very specific format that prompts can't reliably enforce (rare in practice — most format problems are solvable with structured outputs or function calling).
- You're hitting context-window limits even with smart retrieval, and the base model genuinely needs to embed the knowledge.
- You're working in a domain with a specific vocabulary that the base model struggles with (medical, legal, technical specs).
- Your usage volume is high enough that the per-call cost of retrieval (the vector search plus the extra context tokens in every prompt) starts to exceed the cost of training.
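The volume argument in the last bullet is a break-even calculation, and it is worth running before anyone signs a contract. A rough sketch, where every dollar figure is a hypothetical placeholder rather than a real price, and break_even_calls is a name invented for this example:

```python
import math
from typing import Optional


def break_even_calls(
    training_cost: float,
    rag_cost_per_1k_calls: float,
    ft_cost_per_1k_calls: float,
) -> Optional[int]:
    """Number of calls after which fine-tuning has paid back its training cost.

    Returns None when the fine-tuned model is not cheaper per call,
    because then training cost is never recovered.
    """
    saving_per_1k = rag_cost_per_1k_calls - ft_cost_per_1k_calls
    if saving_per_1k <= 0:
        return None
    return math.ceil(training_cost / saving_per_1k * 1000)


# Hypothetical inputs: $30,000 to build the fine-tune, $12 per 1,000 RAG calls
# (retrieval plus extra context tokens), $8 per 1,000 fine-tuned calls.
calls_needed = break_even_calls(30_000, 12, 8)

# If the fine-tuned model costs more per call, there is no break-even point.
never = break_even_calls(30_000, 8, 12)
```

With those made-up numbers the break-even sits in the millions of calls, which is why this bullet rarely applies to a small business.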
What RAG actually costs to build right
A production-ready RAG pipeline isn't "embed your documents and call it done." The real components: ingestion that handles your actual document formats (PDF scans, Word with embedded tables, Notion exports), chunking that respects semantic boundaries (not just "every 1000 characters"), embedding generation, vector DB storage (we typically use Supabase pgvector for cost and operational simplicity), retrieval that combines vector similarity with metadata filters, reranking to push the most relevant chunks into context, and prompt engineering that handles the edge cases gracefully.
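As an illustration of chunking that respects semantic boundaries, here is a minimal sketch that packs whole paragraphs into chunks instead of cutting every N characters. It assumes paragraphs are separated by blank lines, which real PDFs and Word exports often won't give you for free; chunk_by_paragraph is a hypothetical helper, not part of any library.

```python
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars.

    A paragraph is never split. A single paragraph longer than max_chars
    is kept whole as its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Either it still fits, or this is the first paragraph of a new chunk.
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs of 600, 600, and 100 characters.
sample = "A" * 600 + "\n\n" + "B" * 600 + "\n\n" + "C" * 100
chunks = chunk_by_paragraph(sample, max_chars=1000)
```

The first two paragraphs can't share a chunk, so the second starts a new one and the short third paragraph rides along with it. A production chunker would also look at headings, tables, and list boundaries.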
All-in, expect $10k-$35k for a properly built small-business RAG system, plus $50-$300 a month for the vector DB and embedding API depending on your document volume.
What fine-tuning actually costs
Fine-tuning is where the surprise bills live. The OpenAI fine-tuning API charges per training token, and to get a meaningful result you typically need 500-5000 high-quality input/output pairs, which means a labelled dataset someone has to create. Then there's ongoing inference cost: fine-tuned models cost more per call than the base model. Then the gotcha: when OpenAI deprecates the base model (which they do regularly), you re-tune. A serious fine-tuning project is $25k-$80k of build, plus a multiple of base inference cost forever after, plus a re-tuning cycle at every model deprecation.
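To see how those three cost lines stack up over time, here is a back-of-envelope sketch. The build ranges and the RAG monthly range are the figures quoted in this article; the monthly inference premium, the re-tune cost, and the 12-month deprecation cycle are assumptions picked purely for illustration, so swap in your own quotes.

```python
def rag_total(build: float, monthly_infra: float, months: int) -> float:
    """Build cost plus vector DB and embedding API spend over the horizon."""
    return build + monthly_infra * months


def fine_tune_total(
    build: float,
    months: int,
    monthly_inference_premium: float,  # assumed extra spend vs the base model
    retune_cost: float,                # assumed cost of one re-tuning cycle
    months_between_retunes: int,       # assumed deprecation cadence
) -> float:
    """Build cost plus inference premium plus re-tunes inside the horizon."""
    # Count only re-tunes that fall strictly before the end of the horizon.
    retunes = (months - 1) // months_between_retunes
    return build + monthly_inference_premium * months + retune_cost * retunes


HORIZON = 36  # months

# RAG range from this article: $10k-$35k build, $50-$300 a month.
rag_low = rag_total(10_000, 50, HORIZON)
rag_high = rag_total(35_000, 300, HORIZON)

# Fine-tuning low end: $25k build from this article; everything else is hypothetical.
ft_low = fine_tune_total(
    build=25_000,
    months=HORIZON,
    monthly_inference_premium=200,
    retune_cost=8_000,
    months_between_retunes=12,
)
```

Under these assumptions the cheapest fine-tuning scenario over three years lands above the most expensive RAG scenario. Your numbers will differ; the point is to put the re-tune line in the spreadsheet at all.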
The hybrid that actually wins
Almost every "we need fine-tuning" conversation lands on RAG + structured outputs + a careful system prompt as the right answer. You get the consistency people want from fine-tuning, the freshness and citability of RAG, and the ability to swap models when better ones ship. We've used this pattern for both of our own AI products and for client work. The only fine-tuning cases we've actually shipped are domain-specific classifiers where the model needed to behave consistently in ways prompting couldn't lock down.
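Part of what makes the hybrid work is refusing to accept a response that doesn't match the agreed shape. A minimal client-side sketch follows, assuming the system prompt asks the model for a JSON object with an "answer" string and a "sources" list; that schema and the parse_answer helper are illustrative choices, and providers' own structured-output features can enforce a schema on their side as well.

```python
import json

# Assumed response shape for this sketch: {"answer": "...", "sources": ["..."]}
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_answer(raw: str) -> dict:
    """Parse a model response and reject anything that breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if not data["sources"]:
        raise ValueError("answer has no cited sources")
    return data


# A well-formed response passes through unchanged.
good = parse_answer('{"answer": "30 days.", "sources": ["returns-policy.pdf"]}')

# A response without citations is rejected, so the caller can retry or escalate.
try:
    parse_answer('{"answer": "30 days.", "sources": []}')
    rejected = False
except ValueError:
    rejected = True
```

The check is model-agnostic, so it keeps working when you swap the underlying model.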
Red flags in fine-tuning pitches
- The vendor talks about fine-tuning without first asking what your source data looks like.
- There's no discussion of how the fine-tuned model gets re-tuned when the base model is deprecated.
- The vendor can't articulate when RAG would have been the correct choice.
- The pitch leans heavily on "the AI learns your business" — fine-tuning doesn't really do that; it adjusts response style, not knowledge depth.
- There's no discussion of evals to detect when the fine-tuned model regresses.
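On that last point, an eval doesn't have to be elaborate to be useful. Here is a minimal sketch of a regression check: a fixed set of prompts with a phrase each answer must contain, scored against the old and new model. The test cases, the stub models, and the 2-point tolerance are all made up for illustration; in practice the model argument would wrap your real API call.

```python
from typing import Callable

# Each case: (prompt, phrase the answer must contain). Hypothetical examples.
CASES = [
    ("What is the return window?", "30 days"),
    ("When is support open?", "9am"),
    ("Where do refunds go?", "original card"),
    ("Do you ship internationally?", "yes"),
]


def pass_rate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer contains the expected phrase."""
    passed = sum(
        1 for prompt, must_contain in cases
        if must_contain.lower() in model(prompt).lower()
    )
    return passed / len(cases)


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate scores meaningfully below the baseline."""
    return candidate < baseline - tolerance


# Stub models standing in for the current and the re-tuned model.
def old_model(prompt: str) -> str:
    answers = dict(CASES)
    return f"Sure: {answers[prompt]}."


def new_model(prompt: str) -> str:
    if "ship" in prompt:
        return "I'm not sure."  # the re-tuned model lost this answer
    return old_model(prompt)


baseline_rate = pass_rate(old_model, CASES)
candidate_rate = pass_rate(new_model, CASES)
regressed = has_regressed(baseline_rate, candidate_rate)
```

A vendor who plans to re-tune at every deprecation should be able to show you something like this, with real cases, before and after each cycle.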