For founders · AI engineering

AI agents that ship to production.

You have an AI product idea, or an automation that needs to run reliably. Prompts in notebooks are not the answer. Agent pipelines that handle real workloads, with output QC, observability, and a real backend underneath, are.

I build that layer, hand it over with a runbook, and walk away. This is for two- to three-person teams shipping AI products or AI-augmented backends.

Section 01

What I build

Stack

Anthropic SDK, Python (FastAPI) or TypeScript, Supabase, Postgres, Stripe, Sentry. React or Next.js on top when an operator dashboard is in scope.

"Our agent works in the demo. In production it is flaky and we do not know why. I want someone who has shipped this layer before."

Six pieces. Not all of them apply to every project; the discovery call decides which ones do. What I will not ship is an agent system that punts on output validation, treats Stripe webhooks as fire-and-forget, or leaves observability for "later."

  • Multi-agent pipelines

    Orchestrator, chain runner with goal propagation, per-client memory, append-only audit log, tool-use loop (web search, web fetch, site crawl). The shape that turns a roster of prompts into a coordinated system.

  • Output QC for LLM systems

    JSON schema validation plus an LLM-as-judge with retry-and-feedback. The reliability layer that turns a flaky prototype into something you can put in front of customers.

  • Production backend underneath

    Supabase, Postgres, RLS for tenancy, Stripe Checkout with a signed and idempotent webhook ledger, edge functions for the business logic that cannot live in the database.

  • Observability and bounded cost

    Sentry on hot paths, structured logging, daily token-budget guard, audit-logged state changes. Failures visible, costs capped, retries explicit.

  • EU compliance when relevant

    GDPR Article 17 erasure and Article 20 portability endpoints, retention and orphan cleanup jobs, audit-loggable deletion. Built in, not bolted on.

  • Handover

    Migrations runbook, .env template, walkthrough. You can deploy, run migrations, invoke each agent, and verify Stripe end to end without my hand-holding.
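To make the output QC piece concrete, here is a minimal Python sketch of validation plus a judge with retry-and-feedback. It is an illustration under stated assumptions, not the code from any client project: `REQUIRED_FIELDS`, `validate`, and `run_with_qc` are hypothetical names, the `generate` and `judge` callables stand in for real model calls, and the hand-rolled field check stands in for a full JSON Schema validator.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for illustration: required field name -> expected type.
REQUIRED_FIELDS = {"subject": str, "body": str}


def validate(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            return None, f"field '{name}' must be a {expected.__name__}"
    return data, None


def run_with_qc(
    generate: Callable[[str, Optional[str]], str],  # stand-in for the model call
    judge: Callable[[dict], Optional[str]],         # stand-in judge: None means accept
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Generate, validate, judge; feed each rejection reason into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = generate(task, feedback)
        data, error = validate(raw)
        if error is None:
            # Only structurally valid output is worth a judge call.
            error = judge(data)
        if error is None:
            return data
        feedback = error
    # Bounded retries: fail loudly instead of looping or shipping bad output.
    raise RuntimeError(f"output failed QC after {max_attempts} attempts: {feedback}")
```

The cheap structural check runs before the judge, so malformed output never costs a second model call, and the attempt cap keeps retries explicit and bounded.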

Section 02

How it works

Duration
Typically 3 to 6 weeks for a focused agent system
Investment
Scoped on the call. Larger and smaller engagements both fit.
  1. Step 01 · Scope

    On the discovery call, we map the agents, the tenancy boundary, the integrations, and the deadline. You get a scoped statement of work before any code is written.

  2. Step 02 · Build

    Schema and RLS first, then agents and the QC layer, then payments and observability. Preview deployments along the way; nothing gets shipped to production unwitnessed.

  3. Step 03 · Handover

    Runbook, env template, walkthrough of the system. The agents and the backend are yours to operate, extend, and rerun against new data.

  4. Step 04 · Two weeks of fixes

    Post-delivery fixes are included. Anything I missed gets fixed; small adjustments that surface in real production traffic get made.
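For the payments part of the build, a signed and idempotent webhook ledger can be sketched as follows. This is a simplified illustration, not the delivered implementation: it uses SQLite in place of Postgres, a plain HMAC-SHA256 check in place of the payment provider's own signature verification (which would normally go through the official SDK), and hypothetical names throughout (`webhook_events`, `open_ledger`, `handle_webhook`, `apply_event`).

```python
import hashlib
import hmac
import json
import sqlite3
from typing import Callable


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Create the ledger table; the primary key is what enforces idempotency."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webhook_events ("
        "event_id TEXT PRIMARY KEY, "
        "event_type TEXT NOT NULL, "
        "received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    return conn


def signature_ok(payload: bytes, signature: str, secret: bytes) -> bool:
    # Generic HMAC check (an assumption); constant-time comparison avoids timing leaks.
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_webhook(
    conn: sqlite3.Connection,
    payload: bytes,
    signature: str,
    secret: bytes,
    apply_event: Callable[[dict], None],  # hypothetical business-logic hook
) -> str:
    if not signature_ok(payload, signature, secret):
        return "rejected"
    event = json.loads(payload)
    # One transaction: if apply_event raises, the ledger row is rolled back,
    # so a later redelivery of the same event is processed again.
    with conn:
        try:
            conn.execute(
                "INSERT INTO webhook_events (event_id, event_type) VALUES (?, ?)",
                (event["id"], event["type"]),
            )
        except sqlite3.IntegrityError:
            return "duplicate"  # already recorded; skip side effects
        apply_event(event)
    return "processed"
```

Recording the event ID before running side effects means a redelivered event is acknowledged without being applied twice, which is the opposite of treating webhooks as fire-and-forget.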

Section 03

Where this runs

One flagship case study is below; the rest are on the case studies page.

Case study

AI agents SaaS for cold outreach

Cold outreach is broken at scale. Bad targeting plus generic pitches push reply rates below one percent. Mail merge personalization fools nobody. The client wanted real personalization without paying a human team to do it manually.

The system eliminates 40+ hours per week of manual research and outreach work for the client. It is currently running in production. It was built and delivered as a one-time engagement; the client owns and operates it.

Read the full case study

Next step

Tell me what you are building. Twenty minutes, fit check, not a pitch.

If what you need is not a fit for how I work, I will tell you on the call.

Book a 20-min discovery call