
AI integration in 2026: the eleven questions to scope before you hire

Published May 14, 2026 · By Komlan Kouhiko

Every week a business owner asks us to "add AI" to a product. Half the time the right answer is yes; the other half it's a calculator, a workflow tweak, or a search box dressed up in a chatbot UI. Here's the actual scoping checklist we run before quoting an AI integration.

Before scope: what are you actually replacing?

Almost every "add AI" request maps to one of three patterns: replace a human reading something (triage, classification, summarisation), replace a human writing something (drafting, formatting, follow-up), or replace a human deciding something (routing, prioritising, suggesting next actions). The pattern dictates everything that follows — accuracy targets, guardrail strictness, audit needs, human-in-the-loop placement.

If you can't tell me which of those three you're replacing, the project isn't ready to scope. That's a discovery conversation, not a quote.

The eleven questions

1. What's the cost of being wrong?

An AI that misroutes a support ticket costs an extra reply. An AI that misclassifies a legal document costs a lawsuit. The cost-of-wrong dictates whether the AI gets to act autonomously, suggest with one-click confirm, or only draft for human review. We've never seen a project where this question was clearly answered upfront and the build went sideways later.

2. What's the input quality you actually have?

Most AI projects fail not on the model but on the data feeding it. PDFs that are scans-of-scans, unstructured notes typed years ago by different staff, a CRM with three different spellings for every customer name — all of these will torpedo a model that demoed beautifully on clean test data. Scoping has to start with a sample of the real inputs, not a curated demo set.

3. What does "correct" look like, measurably?

If you can't write an eval — a hundred input/expected-output pairs — you can't ship production AI. The eval is the regression suite that catches when GPT-5 silently changes behaviour, or when your prompt edit breaks the case you forgot about. No eval means you're flying blind every time the model is updated.
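A minimal sketch of what that regression suite can look like, assuming a classification-style task where exact match (ignoring case and whitespace) counts as correct. The names `EvalCase`, `run_eval`, and `fake_predict` are ours for illustration; `fake_predict` stands in for whatever real model call you end up with.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    input_text: str
    expected: str


def run_eval(
    cases: list[EvalCase], predict: Callable[[str], str]
) -> tuple[float, list[EvalCase]]:
    """Return (pass rate, failing cases) using case-insensitive exact match."""
    failures = [
        case
        for case in cases
        if predict(case.input_text).strip().lower() != case.expected.strip().lower()
    ]
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for a model call: naive keyword triage."""
    return "billing" if "invoice" in text.lower() else "general"


# Three cases for brevity; a real suite would hold a hundred or more.
CASES = [
    EvalCase("Where is my invoice?", "billing"),
    EvalCase("How do I reset my password?", "general"),
    EvalCase("I was charged twice", "billing"),
]

pass_rate, failures = run_eval(CASES, fake_predict)
```

Run it on every prompt edit and every model upgrade. The list of failing cases matters as much as the pass rate, because it tells you which behaviour regressed.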

4. Will it run autonomously, ever?

Autonomous AI (no human gate before it acts) is the highest-stakes shape. We build it sometimes, but only when there's a confidence threshold, a rollback path, and audit logs that let you reconstruct every decision. Most projects shouldn't start autonomous — start with human-in-the-loop, measure the confidence distribution, then graduate the high-confidence cases to autonomous later.
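The graduation path can be sketched as a simple gate. This assumes the system produces a confidence score between 0 and 1 that is reasonably calibrated, which is itself something to verify. The `Decision` type, the function names, and the 0.95 default are illustrative, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def route(decision: Decision, autonomous_threshold: float = 0.95) -> str:
    """Return 'auto' if the decision may run without a human gate, else 'review'."""
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return "auto" if decision.confidence >= autonomous_threshold else "review"


def autonomous_share(confidences: list[float], threshold: float) -> float:
    """Fraction of past decisions that would have cleared the threshold."""
    if not confidences:
        return 0.0
    return sum(c >= threshold for c in confidences) / len(confidences)
```

During the human-in-the-loop phase, log every confidence score next to the reviewer's verdict. `autonomous_share` then shows how much volume a given threshold would release before you flip anything to autonomous.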

5. Which model, and what's the swap plan?

GPT-4, GPT-4o, Claude Sonnet, Claude Opus, Gemini, open-weight models on your own infra: each has tradeoffs around cost, latency, context window, structured output, and behaviour under adversarial input. The right answer is almost never "the latest model." It's "the model that meets the accuracy threshold at the lowest cost-per-call." Plan to swap; never hard-code a single provider.
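One way to keep the swap cheap is to have application code depend on a small interface rather than a vendor SDK. The sketch below is an assumption about structure, not any provider's API: `TextModel`, `EchoModel`, and `REGISTRY` are hypothetical names, and `EchoModel` only echoes its prompt so the example runs without network access.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only surface application code is allowed to depend on."""

    name: str

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in adapter; a real one would wrap a vendor SDK call here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Swapping providers means changing this mapping (or the config behind it),
# not the call sites.
REGISTRY: dict[str, TextModel] = {
    "primary": EchoModel("model-a"),
    "fallback": EchoModel("model-b"),
}


def summarise(text: str, model_key: str = "primary") -> str:
    model = REGISTRY[model_key]
    return model.complete(f"Summarise: {text}")
```

With this shape, running the eval set against a candidate model is a one-line registry change, which is what makes picking on accuracy and cost-per-call practical.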

6. Where does the context come from?

Naive prompting ("here's the user's question, answer it") only works for trivial tasks. Production AI needs context: the user's recent activity, the relevant documents, the company policy, the prior conversation. That context comes from either a long system prompt (cheap, brittle), retrieval-augmented generation against a vector DB (richer, more moving parts), or tool calls into your live data (most powerful, most complex). Scope determines which.

7. What's the latency budget?

An AI answer that takes eight seconds is fine in an email draft, dead in a chat UI. If the surface is real-time, you're picking faster models (Haiku, GPT-4o-mini) and streaming tokens. If it's batch, you can use the most capable model and accept ten-second waits. Scoping needs to ask which.

8. What's the audit and rollback story?

When the AI does something a customer complains about, can you reconstruct exactly what input it saw, what tools it used, what response it generated, and which model version produced it? If not, the build isn't production-ready. We instrument every AI call into Sentry + a dedicated audit table from day one; it isn't something you bolt on later.
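As a rough illustration of the audit-table half, here is a sketch using an in-memory SQLite database from Python's standard library. The table name, columns, and helper functions are assumptions for the example rather than our production schema, and the Sentry side is left out.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # a real deployment would use a durable database
conn.execute(
    """
    CREATE TABLE ai_audit (
        id INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL,
        model_version TEXT NOT NULL,
        input_text TEXT NOT NULL,
        tool_calls TEXT NOT NULL,  -- JSON-encoded list of tool invocations
        output_text TEXT NOT NULL
    )
    """
)


def record_call(
    model_version: str, input_text: str, tool_calls: list[dict], output_text: str
) -> int:
    """Persist one AI call and return its audit id."""
    cur = conn.execute(
        "INSERT INTO ai_audit "
        "(created_at, model_version, input_text, tool_calls, output_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            model_version,
            input_text,
            json.dumps(tool_calls),
            output_text,
        ),
    )
    conn.commit()
    return cur.lastrowid


def reconstruct(call_id: int) -> dict:
    """Answer: what did the model see, which tools ran, what did it say?"""
    row = conn.execute(
        "SELECT model_version, input_text, tool_calls, output_text "
        "FROM ai_audit WHERE id = ?",
        (call_id,),
    ).fetchone()
    if row is None:
        raise KeyError(call_id)
    return {
        "model_version": row[0],
        "input": row[1],
        "tool_calls": json.loads(row[2]),
        "output": row[3],
    }
```

The record stores the exact model version string, not just the provider name, because "which model version produced it" is the question you cannot answer after the fact.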

9. Cost ceiling per user per month?

A copilot that costs 30 cents per active user per month is sustainable. One that costs 30 dollars only works if you're charging hundreds. We model the unit cost during scoping — token count per call × call frequency × model price × buffer for retries — and price the engagement against that ceiling.
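The unit-cost formula above, written out as a sketch. The figures in the usage line are made up to show the arithmetic. The single blended per-token price is a simplification, since most providers price input and output tokens separately.

```python
def monthly_cost_per_user(
    tokens_per_call: int,
    calls_per_month: int,
    price_per_million_tokens: float,
    retry_buffer: float = 1.2,
) -> float:
    """Token count per call x call frequency x model price x retry buffer."""
    if retry_buffer < 1.0:
        raise ValueError("retry_buffer is a multiplier and must be >= 1.0")
    price_per_token = price_per_million_tokens / 1_000_000
    return tokens_per_call * calls_per_month * price_per_token * retry_buffer


# Illustrative numbers only: 2,000 tokens per call, 100 calls a month,
# a hypothetical $1.25 per million tokens, and a 20% buffer for retries.
cost = monthly_cost_per_user(2_000, 100, 1.25)
```

With those made-up inputs the result is about $0.30 per user per month, the sustainable end of the range. Raise the token count or call frequency tenfold and the same formula shows how quickly the ceiling is breached.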

10. Who owns the prompts and the data?

We don't keep client prompts, evals, or training data. Everything we build for you is your IP, in your repo, exportable to a different vendor on day one if you decide to swap us out. If a vendor wants to gate your prompts behind their UI or hold your fine-tuned weights, that's a vendor lock-in trap.

11. What does "done" look like?

AI projects can balloon if the success criteria are fuzzy. Define done as: a specific eval set passes at a specific threshold, the cost-per-call is below a specific number, the p95 latency is below a specific number, and a real user has completed a real task through the new flow without help. Four checkable lines, agreed in writing before the build starts.
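Those criteria can be written down as a check that either passes or doesn't. This is a sketch with hypothetical names (`DoneCriteria`, `is_done`). It assumes the nearest-rank definition of p95, which is one of several reasonable percentile conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoneCriteria:
    min_pass_rate: float
    max_cost_per_call: float
    max_p95_latency_s: float


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def is_done(
    criteria: DoneCriteria,
    pass_rate: float,
    cost_per_call: float,
    latencies_s: list[float],
    user_task_completed: bool,
) -> dict[str, bool]:
    """Report each agreed criterion separately so a failure is unambiguous."""
    return {
        "eval": pass_rate >= criteria.min_pass_rate,
        "cost": cost_per_call <= criteria.max_cost_per_call,
        "latency": p95(latencies_s) <= criteria.max_p95_latency_s,
        "user_task": user_task_completed,
    }
```

The build is done when every value in the returned dict is true. Anything else is a conversation about which number to change, in writing.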

What we won't quote against

We won't quote an AI build where the buyer expects 100% accuracy on an unbounded input space. We won't quote one that has no human gate and no audit trail. We won't quote one where the dataset is sensitive but there's no privacy review in scope. Honest scoping kills these projects on day one — it's cheaper than killing them in week six.