Engineering case study / Proveo

Building a photo trust system where AI identifies the evidence but never decides the verdict.

Service contractors sell with before/after photos. The problem is that anyone can fake them. Proveo needed a trust layer that was verifiable by design, not just a policy.

Next.js + Supabase / GPT-4o Vision / Provenance engine / Multi-model architecture

The problem with before/after photos in home services

Before/after photos are the #1 sales tool for service contractors: detailers, painters, pressure washers, landscapers. They're also the easiest thing to fake. A contractor can screenshot work from someone else's portfolio, claim it as their own, and build an entirely fabricated track record. Clients viewing a portfolio can't tell. There's no metadata they can inspect, no timestamp they can trust, and no way to verify the work actually happened.

The naive answer is a watermark or a timestamp overlay. Neither proves anything: watermarks can be cropped, and timestamps can be Photoshopped in. The problem isn't labeling; it's evidence.

Layer 1 — Visual detection with GPT-4o

The first AI layer solves a UX problem: contractors uploading two photos shouldn't have to label manually which is "before" and which is "after." GPT-4o Vision analyzes both images together and identifies which shows the untreated, pre-work state and which shows the finished result. The model also detects the service category (pressure washing, painting, landscaping, detailing, concrete, etc.) and generates a suggested job title, localized to the contractor's preferred language: if the user's locale is set to French, the suggested title comes back in French.

Rate limiting is scoped to the workspace: 30 requests per hour, shared across all team members. A per-member limit would multiply with team size, letting AI costs grow without bound as members are added; a shared budget keeps them capped. If the OpenAI key isn't configured, the endpoint returns a safe default instead of an error, so the upload still works and the badge engine still runs.
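A minimal TypeScript sketch of the two safeguards described here: a workspace-keyed rate limit and a safe default when no key is configured. It assumes an in-memory sliding-window log; a real deployment would more likely keep the counts in a shared store so every server instance sees the same budget. `WorkspaceRateLimiter`, `analyzeUpload`, `PairAnalysis`, `SAFE_DEFAULT` and the injected `analyze` callback (a stand-in for the actual GPT-4o request) are hypothetical names.

```typescript
// Hypothetical shape of what the vision step returns for a before/after pair.
interface PairAnalysis {
  beforeIndex: 0 | 1; // which of the two uploaded images shows the pre-work state
  serviceCategory: string; // e.g. "pressure_washing", "painting"
  suggestedTitle: string; // localized to the contractor's locale
}

// Returned when no OpenAI key is configured: keep upload order, no AI suggestions.
const SAFE_DEFAULT: PairAnalysis = { beforeIndex: 0, serviceCategory: "other", suggestedTitle: "" };

// Sliding-window limiter keyed by workspace, so all team members draw from one shared budget.
class WorkspaceRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 30, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(workspaceId: string, now: number = Date.now()): boolean {
    // Keep only the requests that are still inside the window.
    const recent = (this.hits.get(workspaceId) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(workspaceId, recent);
    return allowed;
  }
}

// Stand-in for the real GPT-4o Vision request (not shown here).
type AnalyzePair = (imageUrls: [string, string], locale: string) => Promise<PairAnalysis>;

type AnalyzeResult = { status: 200; body: PairAnalysis } | { status: 429 };

async function analyzeUpload(
  workspaceId: string,
  imageUrls: [string, string],
  locale: string,
  apiKey: string | undefined,
  limiter: WorkspaceRateLimiter,
  analyze: AnalyzePair,
): Promise<AnalyzeResult> {
  // No key configured: degrade gracefully instead of returning an error.
  if (!apiKey) return { status: 200, body: SAFE_DEFAULT };
  // Shared workspace budget exhausted: reject this AI request.
  if (!limiter.tryAcquire(workspaceId)) return { status: 429 };
  return { status: 200, body: await analyze(imageUrls, locale) };
}
```

Checking the key before the limiter means the fallback path never consumes the shared budget.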

Layer 2 — The proof state system

Every enhancement preset a contractor can apply to their photos is classified server-side into one of three trust tiers. The classification is deterministic, not AI-driven — the model can't be asked to reclassify a photo's processing level:

Trust tier — Raw

"original" preset

No processing applied. The image that viewers see is exactly what came off the camera. Eligible for both trust badges.

Trust tier — Corrected

bright_clean, cool_professional, job_site_clean, high_def, studio_light, definition

Exposure, sharpness, or white-balance corrections that don't alter the scene or add artistic treatment. These improve readability without misrepresenting the work. Not eligible for either trust badge.

Trust tier — Styled

warm, dramatic, daylight_pop, soft_glow, vivid, vintage, and 10+ others

Aesthetic treatments that change the mood or appearance in ways that go beyond correction. Trust badge eligibility is removed.

The key design choice: unknown presets default to 'styled'. This is a deliberate safety bias, so new presets cannot accidentally inherit badge eligibility. The trust tier must be explicitly granted in the server-side registry, never assumed.
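One way to express such a registry, sketched in TypeScript. The tier names and preset keys come from the lists above (the remaining styled presets are omitted and simply fall through to the default); `PROOF_STATE_REGISTRY`, `proofStateFor` and `isBadgeEligible` are hypothetical names, and treating only `raw` as badge-eligible reflects the tier descriptions here rather than confirmed code.

```typescript
type ProofState = "raw" | "corrected" | "styled";

// Explicit, server-side grants. A Map (not a plain object) means keys like "toString"
// can't accidentally resolve to something inherited from Object.prototype.
const PROOF_STATE_REGISTRY: ReadonlyMap<string, ProofState> = new Map<string, ProofState>([
  ["original", "raw"],
  ["bright_clean", "corrected"],
  ["cool_professional", "corrected"],
  ["job_site_clean", "corrected"],
  ["high_def", "corrected"],
  ["studio_light", "corrected"],
  ["definition", "corrected"],
  ["warm", "styled"],
  ["dramatic", "styled"],
  ["daylight_pop", "styled"],
  ["soft_glow", "styled"],
  ["vivid", "styled"],
  ["vintage", "styled"],
]);

// Unknown presets fall through to the least-trusted tier.
function proofStateFor(preset: string): ProofState {
  return PROOF_STATE_REGISTRY.get(preset) ?? "styled";
}

// Only untouched originals can carry a trust badge.
function isBadgeEligible(state: ProofState): boolean {
  return state === "raw";
}
```

Adding a preset without a registry entry therefore changes nothing about badges until someone explicitly grants it a tier.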

Layer 3 — Client-side provenance (the architectural constraint)

Photos upload directly from the browser to Cloudinary. The Proveo server never receives the file bytes; it only gets the Cloudinary URL after the upload completes. This is standard practice for performance and cost, but it creates a provenance problem: if the server never sees the bytes, how does it know what was uploaded?

The answer is to compute the evidence before the upload. Before each photo leaves the browser, the client records:

File hash: SHA-256 of the raw file bytes, computed with the Web Crypto API (a native browser API, not a dependency)
Capture method: 'guided' (taken with Proveo's in-app camera), 'upload' (selected from the camera roll), or 'import'
Capture timestamp: EXIF DateTimeOriginal when available, falling back to File.lastModified (reliable on iOS and Android even when EXIF is stripped)
Device signal: a truncated user-agent string (not PII on its own; it helps detect whether both shots came from the same device)
Guided capture session ID: a random per-session token from the parent component, issued only for guided captures

This signed provenance payload is sent to the API alongside the comparison data. The server validates every field and downgrades anything malformed to the weakest method ('upload'), so a misbehaving client never accidentally earns the top tier.
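A browser-and-server sketch of that flow in TypeScript, under stated assumptions: EXIF parsing is left to a separate reader, the 64-character user-agent truncation and the field names in `Provenance` are illustrative, and `sha256Hex`, `buildProvenance` and `sanitizeProvenance` are hypothetical names. `crypto.subtle.digest` is the real Web Crypto call (also exposed as a global in recent Node.js versions); the sanitizer illustrates the "downgrade anything malformed to 'upload'" rule.

```typescript
type CaptureMethod = "guided" | "upload" | "import";

interface Provenance {
  sha256: string | null; // lowercase hex digest of the raw file bytes
  captureMethod: CaptureMethod;
  capturedAt: number | null; // epoch milliseconds
  deviceSignal: string; // truncated user-agent
  guidedSessionId: string | null; // only present for guided captures
}

const DEVICE_SIGNAL_MAX = 64; // truncation length is an assumption

// Client: SHA-256 via the native Web Crypto API, encoded as lowercase hex.
async function sha256Hex(bytes: ArrayBuffer): Promise<string> {
  const digest = await globalThis.crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Anything with these two members works, including a browser File.
interface FileLike {
  arrayBuffer(): Promise<ArrayBuffer>;
  lastModified: number;
}

// Client: compute the evidence before the bytes leave for Cloudinary.
async function buildProvenance(
  file: FileLike,
  method: CaptureMethod,
  exifDateTimeOriginal: Date | null, // parsed by an EXIF reader, not shown
  userAgent: string,
  guidedSessionId: string | null,
): Promise<Provenance> {
  const exifMs = exifDateTimeOriginal?.getTime();
  return {
    sha256: await sha256Hex(await file.arrayBuffer()),
    captureMethod: method,
    // Prefer EXIF DateTimeOriginal; fall back to File.lastModified.
    capturedAt: exifMs !== undefined && Number.isFinite(exifMs) ? exifMs : file.lastModified,
    deviceSignal: userAgent.slice(0, DEVICE_SIGNAL_MAX),
    guidedSessionId: method === "guided" ? guidedSessionId : null,
  };
}

const HEX_SHA256 = /^[0-9a-f]{64}$/;

// Server: never trust the payload; any malformed field drops the method to "upload".
function sanitizeProvenance(input: unknown): Provenance {
  const p = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const sha256 = typeof p.sha256 === "string" && HEX_SHA256.test(p.sha256) ? p.sha256 : null;
  const capturedAt =
    typeof p.capturedAt === "number" && Number.isFinite(p.capturedAt) && p.capturedAt > 0 ? p.capturedAt : null;
  const sessionId = typeof p.guidedSessionId === "string" && p.guidedSessionId !== "" ? p.guidedSessionId : null;
  const method = p.captureMethod;
  const knownMethod = method === "guided" || method === "upload" || method === "import";
  const wellFormed =
    knownMethod && sha256 !== null && capturedAt !== null && (method !== "guided" || sessionId !== null);
  const captureMethod: CaptureMethod = wellFormed ? (method as CaptureMethod) : "upload";
  return {
    sha256,
    captureMethod,
    capturedAt,
    deviceSignal: typeof p.deviceSignal === "string" ? p.deviceSignal.slice(0, DEVICE_SIGNAL_MAX) : "",
    guidedSessionId: captureMethod === "guided" ? sessionId : null,
  };
}
```

A guided claim without a session token, an unknown method, or an unparseable hash all land in the same place: 'upload', which cannot satisfy the guided-capture check.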

Layer 4 — The badge algorithm

With proof state and provenance in hand, the badge resolver runs five checks in sequence. All five must pass for the top tier — "Verified Authentic":

Guided capture: both photos taken in-app, not from the camera roll
Raw proof state: no enhancement applied (original preset)
Minimum time gap: more than 2 seconds between before and after
Maximum time gap: less than 12 hours between before and after
SHA-256 uniqueness: before and after bytes are not identical

If the proof state is raw but any other check fails, the result drops to "Verified Unfiltered" (raw proof state, any capture method). Styled or corrected output gets no badge at all. The timestamp window (2 seconds minimum, 12 hours maximum) is designed around real job cadence: a pressure wash takes minutes, a paint job takes hours. Two photos captured within the same second are almost certainly duplicates. Two photos 14 hours apart are plausibly from different jobs.

The badge decision is entirely deterministic. AI identifies the evidence; the algorithm decides the tier.
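The resolver logic can be sketched as a pure TypeScript function. This is an illustration, not Proveo's code: `resolveBadge`, the check names and the `PhotoEvidence` shape are hypothetical, and it assumes provenance was already sanitized server-side. It evaluates all five checks rather than stopping at the first failure, so the full list of failures is available to explain the result afterwards.

```typescript
type ProofState = "raw" | "corrected" | "styled";
type Badge = "verified_authentic" | "verified_unfiltered" | null;

interface PhotoEvidence {
  captureMethod: "guided" | "upload" | "import";
  capturedAt: number | null; // epoch ms, already sanitized server-side
  sha256: string | null;
}

const MIN_GAP_MS = 2_000; // strictly more than 2 seconds
const MAX_GAP_MS = 12 * 60 * 60 * 1_000; // strictly less than 12 hours

function resolveBadge(
  proofState: ProofState,
  before: PhotoEvidence,
  after: PhotoEvidence,
): { badge: Badge; failed: string[] } {
  // Assumption: the gap is measured as after minus before, so an "after" timestamped
  // earlier than its "before" fails the minimum-gap check.
  const gap =
    before.capturedAt !== null && after.capturedAt !== null ? after.capturedAt - before.capturedAt : null;

  const checks: Array<[string, boolean]> = [
    ["guided_capture", before.captureMethod === "guided" && after.captureMethod === "guided"],
    ["raw_proof_state", proofState === "raw"],
    ["min_time_gap", gap !== null && gap > MIN_GAP_MS],
    ["max_time_gap", gap !== null && gap < MAX_GAP_MS],
    ["distinct_sha256", before.sha256 !== null && after.sha256 !== null && before.sha256 !== after.sha256],
  ];
  const failed = checks.filter(([, passed]) => !passed).map(([name]) => name);

  // Corrected or styled output never earns a badge, whatever else passed.
  if (proofState !== "raw") return { badge: null, failed };
  // Raw output: all five pass -> top tier; anything else -> Verified Unfiltered.
  return { badge: failed.length === 0 ? "verified_authentic" : "verified_unfiltered", failed };
}
```

No model output is an input to this function, which is what makes the tier reproducible from stored evidence alone.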

The multi-model design

Proveo uses two models deliberately, for different reasons.

GPT-4o handles visual reasoning: identifying before vs. after, detecting the service category, and generating captions from the composite image. OpenAI's vision model has strong spatial and compositional reasoning for image pairs.

Claude handles all language and reasoning tasks: trust audit explanations, the operations copilot, help center answers, and onboarding guidance. Claude follows complex multi-constraint instructions reliably and produces contractor-readable prose rather than generic AI text.

The trust audit agent (Claude) is the clearest example of the separation. It reads the computed badge tier and provenance signals, then explains the result to the contractor in plain language, including what they'd need to do differently to earn the badge next time. Claude interprets the evidence. It does not decide it.
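A sketch of how that separation could be wired, in TypeScript, with the model calls themselves left out. The task names, `MODEL_FOR_TASK`, `REMEDIATION_HINTS` and `buildAuditContext` are hypothetical, as is the hint wording. What it illustrates is that the trust audit agent receives the badge tier and failed checks as finished, read-only input, and that the "what to do next time" guidance can be derived deterministically from the failed checks before any model is asked to phrase it.

```typescript
// Hypothetical task names; the split mirrors the paragraphs above.
type Task =
  | "before_after_orientation"
  | "service_category"
  | "composite_caption"
  | "trust_audit_explanation"
  | "operations_copilot"
  | "help_center_answer"
  | "onboarding_guidance";

type ModelFamily = "gpt-4o" | "claude";

// Vision work goes to GPT-4o; language and reasoning work goes to Claude.
const MODEL_FOR_TASK: Record<Task, ModelFamily> = {
  before_after_orientation: "gpt-4o",
  service_category: "gpt-4o",
  composite_caption: "gpt-4o",
  trust_audit_explanation: "claude",
  operations_copilot: "claude",
  help_center_answer: "claude",
  onboarding_guidance: "claude",
};

// Deterministic remediation hints keyed by failed check (hypothetical wording).
const REMEDIATION_HINTS: Record<string, string> = {
  guided_capture: "Take both photos with the in-app camera instead of choosing them from the camera roll.",
  raw_proof_state: "Publish with the original preset; enhancement presets remove badge eligibility.",
  min_time_gap: "The photos were taken less than 2 seconds apart; take the after photo once the work is done.",
  max_time_gap: "Take the before and after photos within 12 hours of each other.",
  distinct_sha256: "The before and after files were identical; upload two different photos.",
};

interface AuditContext {
  readonly badge: string | null; // computed by the badge resolver, never by the model
  readonly failedChecks: readonly string[];
  readonly hints: readonly string[];
}

// Freeze the computed verdict so it is read-only input for the explanation step.
function buildAuditContext(badge: string | null, failedChecks: readonly string[]): AuditContext {
  return Object.freeze({
    badge,
    failedChecks: Object.freeze([...failedChecks]),
    hints: Object.freeze(failedChecks.map((c) => REMEDIATION_HINTS[c] ?? "Check the photo's capture details.")),
  });
}
```

The explanation model would then be prompted with this context and asked only to phrase it for the contractor.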

What shipped

GPT-4o identifies before/after orientation automatically — no manual labeling from the contractor
Deterministic proof state registry maps every enhancement preset to a trust tier; unknown presets default to 'styled' and never inherit badge eligibility
Client-side SHA-256 hashing captures file identity before upload — the server can verify evidence it never directly received
Two-tier badge system (Verified Authentic, Verified Unfiltered) earned through a chain of evidence no model can override
GPT-4o for visual tasks, Claude for reasoning and explanation — each model doing what it does best
Trust audit agent explains badge status in plain language to contractors and tells them specifically how to improve

AI identifies what's in the photos. Deterministic code decides whether those photos are trustworthy. We never mixed those two responsibilities.

Building something where trust is a product feature, not just a policy?

This is the kind of architecture work we do — systems where AI and deterministic logic are each used where they're appropriate, not interchangeably.
