
If you have ever shipped an AI-built product to production, you already know the truth: not all AI coding tools generate the same kind of code. One gives you something a senior engineer would call “fine.” Another gives you something that looks great in a demo and dies in production. A third produces code so dense and idiomatic that even your in-house team takes a week to understand what changed. This guide compares the three most-used AI coding tools of 2026 (Cursor, Claude Code, and Bolt) at the level that actually matters: what the code looks like when real users hit it. We are Triple Minds, and we run Vibe Coding Cleanup Services for founders who shipped fast and now have to foot the bill, so we see the output of all three, every single week, in their messiest production state. This article distils what we have actually found.

The promise of every AI coding tool is the same: write a short prompt, get a working feature. The reality is that “working” hides an enormous range. We took one realistic production prompt — a TypeScript Next.js 14 endpoint that uploads a profile photo to S3 — and ran it through Cursor, Claude Code, and Bolt with no follow-up edits. We then graded the outputs the way we grade them in our cleanup engagements: code cleanliness, security, type safety, performance, observability, and production readiness. The results are not subtle. There are real, measurable differences between the three, and the differences map directly to how much it costs to clean up the codebase later.

👉 Already shipping with one of these tools and worried about the code quality? Triple Minds offers a fixed-price $499 Cleanup Audit — five days, written report, severity per finding, and a fixed quote for the cleanup itself. Book a free 30-minute call →

The Methodology

Each tool received the same prompt in a single-shot generation: no system-prompt customisation, no follow-up clarifications, and the default model where a choice existed (GPT-4 / Sonnet 4.5 / Bolt’s default). We then evaluated the output against a production-grade rubric used in our cleanup audits, covering 11 dimensions including input validation, type safety, error handling, observability, secret management, idempotency, and AWS-specific best practices. The same rubric is what a senior engineer or auditor would apply during code review at a Series-A SaaS or above.

One single-shot test does not capture a tool’s full capability — Cursor in particular is meant to be used iteratively, with chat refinements. But it does capture what most “vibe coders” actually do: type a prompt, accept the output, ship it. That is where production failures start, and that is what this article measures.

The Prompt

Create a TypeScript Next.js 14 App Router API route that accepts a
profile photo upload, validates it (max 5MB, PNG or JPEG only), stores
it in S3, and returns the public URL.

This is deliberately a “Tuesday afternoon” prompt — the kind a founder or junior engineer types when they need a feature in a hurry. It exercises every common production failure surface: untrusted input, file handling, AWS credentials, error states, and URL construction.

Round 1 — What Cursor Generated

Cursor produced a tight, idiomatic Next.js handler. It is the kind of code a mid-level engineer writes when the spec is clear and they are moving fast. Notably, Cursor inferred sensible defaults from the surrounding (empty) project context — App Router conventions, async/await, environment-variable pattern.

// app/api/upload-photo/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const s3 = new S3Client({ region: process.env.AWS_REGION! })
const BUCKET = process.env.S3_BUCKET!
const MAX_SIZE = 5 * 1024 * 1024

export async function POST(req: NextRequest) {
  const formData = await req.formData()
  const file = formData.get('photo') as File | null

  if (!file) {
    return NextResponse.json({ error: 'No file' }, { status: 400 })
  }
  if (file.size > MAX_SIZE) {
    return NextResponse.json({ error: 'Too large' }, { status: 400 })
  }

  const allowed = ['image/png', 'image/jpeg']
  if (!allowed.includes(file.type)) {
    return NextResponse.json({ error: 'Invalid type' }, { status: 400 })
  }

  const key = `profiles/${Date.now()}-${file.name}`
  const buffer = Buffer.from(await file.arrayBuffer())

  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    Body: buffer,
    ContentType: file.type,
  }))

  return NextResponse.json({
    url: `https://${BUCKET}.s3.amazonaws.com/${key}`,
  })
}

What’s right

Correct App Router signature, idiomatic NextResponse.json() usage.
Pulls bucket and region from env — does not hardcode.
Constants for limits, easy to find and adjust.
Lean, readable, no dead code.

What’s missing for production

Trusts file.type — the client sets that header. An attacker uploads .exe with image/png in the request and your bucket is now hosting malware.
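Uses file.name in the S3 key: S3 keys are flat strings, so a filename of ../../etc/passwd.jpg does not traverse anything inside S3 itself, but any downstream consumer that maps keys to filesystem paths or normalises URLs inherits the problem.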
No magic-byte verification — the only thing protecting you from non-image content is the client.
No auth check — anyone with the URL can upload.
Hardcoded URL pattern — breaks for buckets in non-default regions, or if you front the bucket with CloudFront later.
No structured logging — when this fails in production you have no breadcrumb.
No try/catch around s3.send() — an AWS-side timeout returns a generic 500 with no useful detail.

Cursor’s signature failure mode: code that looks clean and reads clean, but assumes the input is trusted. Excellent for working inside a hardened codebase that already has middleware doing validation. Dangerous when used to scaffold a new endpoint from zero.
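One way to close the file.type gap listed above is to check the leading bytes of the upload yourself. The following is a minimal sketch, not part of Cursor’s output: sniffImageMime is a hypothetical helper name, and it only recognises the two formats the prompt allows. A maintained library such as file-type covers far more cases.

```typescript
// Leading bytes that identify each allowed format.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff]

type AllowedMime = 'image/png' | 'image/jpeg'

// True when the buffer is long enough and begins with the given signature.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false
  return signature.every((value, i) => bytes[i] === value)
}

// Hypothetical helper: returns the detected MIME type, or null when the
// content is not a PNG or JPEG, regardless of what the client claimed.
function sniffImageMime(bytes: Uint8Array): AllowedMime | null {
  if (startsWith(bytes, PNG_SIGNATURE)) return 'image/png'
  if (startsWith(bytes, JPEG_SIGNATURE)) return 'image/jpeg'
  return null
}
```

In the handler this would run on the buffer before the PutObjectCommand, and the detected type (not file.type) would be passed as ContentType. A matching signature only proves how the file starts, so any image processing further downstream should still treat the content as untrusted.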

Round 2 — What Claude Code Generated

Claude Code produced the longest output of the three by a meaningful margin. It validated environment variables at module load time, used fileTypeFromBuffer to sniff the actual binary, and added cache headers and a separate public-URL base so the bucket can move behind CloudFront later without changing the API contract.

// app/api/upload-photo/route.ts
/**
 * Profile photo upload endpoint.
 * - Max 5 MB
 * - PNG or JPEG only (verified by reading magic bytes, not the
 *   client-supplied MIME)
 * - Stored under a non-guessable, sanitised key
 */
import { NextRequest, NextResponse } from 'next/server'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { randomUUID } from 'crypto'
import { fileTypeFromBuffer } from 'file-type'

const REGION       = process.env.AWS_REGION
const BUCKET       = process.env.S3_BUCKET
const PUBLIC_BASE  = process.env.S3_PUBLIC_BASE_URL // e.g. https://cdn.example.com

if (!REGION || !BUCKET || !PUBLIC_BASE) {
  throw new Error('Missing required AWS / CDN environment variables')
}

const s3 = new S3Client({ region: REGION })
const MAX_BYTES = 5 * 1024 * 1024
const ALLOWED_MIME = new Set(['image/png', 'image/jpeg'])

export async function POST(req: NextRequest) {
  // TODO: replace with your auth helper. Reject anonymous uploads.
  // const userId = await requireUser(req)

  let formData: FormData
  try {
    formData = await req.formData()
  } catch {
    return NextResponse.json({ error: 'Invalid multipart payload' }, { status: 400 })
  }

  const file = formData.get('photo')
  if (!(file instanceof File)) {
    return NextResponse.json({ error: 'Missing field: photo' }, { status: 400 })
  }
  if (file.size === 0 || file.size > MAX_BYTES) {
    return NextResponse.json(
      { error: `File must be between 1 byte and ${MAX_BYTES} bytes` },
      { status: 400 }
    )
  }

  const buffer = Buffer.from(await file.arrayBuffer())

  // Verify by magic bytes — never trust the client-supplied MIME.
  const detected = await fileTypeFromBuffer(buffer)
  if (!detected || !ALLOWED_MIME.has(detected.mime)) {
    return NextResponse.json(
      { error: 'Only PNG or JPEG images are allowed' },
      { status: 415 }
    )
  }

  const ext = detected.ext === 'jpg' ? 'jpeg' : detected.ext
  const key = `profiles/${randomUUID()}.${ext}`

  try {
    await s3.send(
      new PutObjectCommand({
        Bucket: BUCKET,
        Key: key,
        Body: buffer,
        ContentType: detected.mime,
        CacheControl: 'public, max-age=31536000, immutable',
      })
    )
  } catch (err) {
    console.error('[upload-photo] S3 put failed', err)
    return NextResponse.json({ error: 'Upload failed' }, { status: 500 })
  }

  return NextResponse.json({ url: `${PUBLIC_BASE}/${key}` })
}

What’s right

Magic-byte validation via fileTypeFromBuffer — actual content type, not what the browser claims.
Env validation at boot — fails loud at deploy time, not in production at request time.
Cryptographically random keys — no path traversal, no enumeration, no collisions.
Separate public-base URL — CloudFront / custom domain ready.
Correct status codes (415 for unsupported media type, not just 400).
Try/catch around S3 with a tagged log line (not fully structured, but enough of a breadcrumb to debug in production).
Cache headers on the upload — saves CDN bandwidth automatically.
Comments explain “why,” not “what” — a readable hand-off.
Polite TODO marker for the auth integration with a hint at the helper name.
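The public-base pattern is easy to reuse outside this route. A minimal sketch of the idea, assuming a hypothetical helper named publicUrlFor (it does not appear in Claude Code’s output) that tolerates a trailing slash on the base and percent-encodes each key segment:

```typescript
// Hypothetical helper: joins a public base URL (CDN or bucket endpoint)
// and an object key into the URL returned to the client.
function publicUrlFor(publicBase: string, key: string): string {
  // Drop trailing slashes so the join never produces "//".
  const base = publicBase.replace(/\/+$/, '')
  // Encode each segment separately so "/" keeps its meaning as a separator.
  const encodedKey = key
    .split('/')
    .map((segment) => encodeURIComponent(segment))
    .join('/')
  return `${base}/${encodedKey}`
}
```

With this shape, moving the bucket behind CloudFront or a custom domain becomes a change to one environment variable rather than to every place a URL is built.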

What’s missing for production

Auth is still a TODO — but Claude flagged it explicitly, which is the right behaviour for an unknown codebase.
No rate limiting — would need IP-or-user middleware. Reasonable to leave to the framework layer.
Throws at module load if env is missing — correct behaviour for production but can crash the dev server in a way some teams find annoying.
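If the module-load throw is too disruptive for local development, one possible middle ground is to validate on first use and cache the result. This is a sketch under our own assumptions: readUploadConfig and lazyConfig are hypothetical names, and the three variable names are taken from the generated code above.

```typescript
interface UploadConfig {
  region: string
  bucket: string
  publicBase: string
}

type Env = Record<string, string | undefined>

// Reads the variables the route needs and reports every missing one in a
// single error, instead of stopping at the first.
function readUploadConfig(env: Env): UploadConfig {
  const region = env['AWS_REGION']
  const bucket = env['S3_BUCKET']
  const publicBase = env['S3_PUBLIC_BASE_URL']
  if (!region || !bucket || !publicBase) {
    const missing = ['AWS_REGION', 'S3_BUCKET', 'S3_PUBLIC_BASE_URL'].filter(
      (name) => !env[name]
    )
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  return { region, bucket, publicBase }
}

// Hypothetical wrapper: nothing is validated until the returned function is
// first called; after that the parsed config is reused.
function lazyConfig(env: Env): () => UploadConfig {
  let cached: UploadConfig | undefined
  return () => {
    if (!cached) cached = readUploadConfig(env)
    return cached
  }
}
```

At module level this would be const getConfig = lazyConfig(process.env), with getConfig() called inside POST. The trade-off is that a missing variable surfaces on the first request rather than at load time, so a deploy-time smoke test that hits the endpoint is worth adding if you go this way.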

Claude Code’s signature: defensive code that anticipates the production-grade questions a senior reviewer would ask. Sometimes verbose. Almost always closer to mergeable than the alternatives.
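For the rate-limiting gap, the simplest shape is a fixed-window counter keyed by user id or IP. The sketch below is illustrative only: createLimiter is a hypothetical helper, and its in-memory Map lives in a single process, so on serverless platforms that run many instances a shared store would be needed for the limit to hold.

```typescript
interface WindowState {
  windowStart: number
  count: number
}

// Hypothetical fixed-window limiter. Returns a function that answers
// "may this key make another request at time `now` (in milliseconds)?".
function createLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, WindowState>()

  return function allow(key: string, now: number): boolean {
    const state = hits.get(key)
    // First request for this key, or the previous window has expired.
    if (!state || now - state.windowStart >= windowMs) {
      hits.set(key, { windowStart: now, count: 1 })
      return true
    }
    if (state.count >= limit) return false
    state.count += 1
    return true
  }
}
```

A route would call allow(userId, Date.now()) before reading the body and return a 429 when it answers false.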

Round 3 — What Bolt Generated

Bolt produced a complete, runnable file with imports, AWS client setup, and what looks like a sensible upload flow. It also produced — in a single shot, with no warnings — five distinct production-grade issues we would mark “Critical” or “High” in a security audit.

// app/api/upload/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import fs from 'fs';
import path from 'path';

const s3Client = new S3Client({
  region: 'us-east-1',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID || '',
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY || '',
  },
});

export async function POST(request: NextRequest) {
  try {
    const formData = await request.formData();
    const file: any = formData.get('photo');

    if (!file) {
      return NextResponse.json({ error: 'No file uploaded' }, { status: 400 });
    }

    // Check file size
    if (file.size > 5 * 1024 * 1024) {
      return NextResponse.json({ error: 'File too large' }, { status: 400 });
    }

    // Check file type
    const fileName = file.name;
    const fileExt  = path.extname(fileName).toLowerCase();
    if (fileExt !== '.png' && fileExt !== '.jpg' && fileExt !== '.jpeg') {
      return NextResponse.json({ error: 'Only PNG and JPEG allowed' }, { status: 400 });
    }

    const bytes  = await file.arrayBuffer();
    const buffer = Buffer.from(bytes);

    // Save locally first
    const tempPath = path.join('/tmp', fileName);
    fs.writeFileSync(tempPath, buffer);

    // Upload to S3
    const uploadParams = {
      Bucket: 'my-app-uploads',
      Key: `profiles/${fileName}`,
      Body: buffer,
      ContentType: file.type,
    };

    const command = new PutObjectCommand(uploadParams);
    await s3Client.send(command);

    // Clean up temp
    fs.unlinkSync(tempPath);

    return NextResponse.json({
      success: true,
      url: `https://my-app-uploads.s3.amazonaws.com/profiles/${fileName}`,
    });
  } catch (error) {
    console.error(error);
    return NextResponse.json({ error: 'Upload failed' }, { status: 500 });
  }
}

What’s right

Wraps the whole flow in a try/catch — at least it will not crash the request.
Reads the buffer correctly.
Correct AWS SDK v3 usage.

What’s broken — security and production-readiness

🚨 Bucket name hardcoded as my-app-uploads. No env var. Any environment promotion (dev → staging → prod) silently goes to the wrong bucket.
🚨 Region hardcoded as us-east-1. Same problem.
🚨 Filetype check by extension only. Rename shell.sh to shell.png and it passes.
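🚨 Path traversal via file.name, which is trusted both as the S3 key and as the temp-file path. A name such as ../../evil.png passes the extension check. S3 stores it literally under profiles/../../evil.png, which is harmless inside S3 itself (keys are flat strings) but not in any downstream consumer that normalises the path. Worse, path.join('/tmp', fileName) resolves the .. segments, so the local write can land outside /tmp.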
🚨 Filename collisions overwrite. Two users upload profile.jpg, the second replaces the first.
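⚠️ Unnecessary local file write to /tmp: it adds IO, creates a race when two requests carry the same filename, and relies on scratch space that is ephemeral and size-limited on serverless deploy targets like Vercel.
⚠️ file is typed as any, so every value derived from it (size, name, type) escapes type checking for the rest of the function.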
⚠️ Trusts file.type for ContentType — same MIME-spoofing issue as Cursor, but on top of the extension-only check.
⚠️ No env validation — empty AWS keys silently fall through and produce confusing 500s.
⚠️ Generic error log — console.error(error) with no request context.

Bolt’s signature: code that looks like a working scaffold, but every shortcut a junior would take is taken. Hardcoded values, extension-based file checks, trust of client input, and the unmistakable smell of “I copied an old StackOverflow answer.”
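The filename problem is easy to reproduce with Node’s own path module. The sketch below first shows what happens to Bolt’s temp-file path when the client supplies a hostile name, then a possible replacement that ignores the client’s filename altogether. buildProfileKey is a hypothetical helper, and the hostile filename is our own example.

```typescript
import * as path from 'node:path'
import { randomUUID } from 'node:crypto'

// A name that still ends in .png, so it passes an extension-only check.
const hostileName = '../../etc/cron.d/evil.png'

// join() normalises the ".." segments, so the resulting path is no longer
// under /tmp. (posix is used so the result is the same on every OS.)
const tempPath = path.posix.join('/tmp', hostileName)

// Hypothetical replacement: derive the key from a random id plus an
// extension chosen by the server after content sniffing. Nothing from the
// client's filename reaches the key, which also removes collisions.
function buildProfileKey(extension: 'png' | 'jpeg'): string {
  return `profiles/${randomUUID()}.${extension}`
}
```

Here tempPath evaluates to /etc/cron.d/evil.png. Whether that write succeeds depends on the process’s file permissions, but the handler should never be in a position to find out.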

Side-by-Side: Code Cleanliness Scorecard

Below is the rubric we use during a Cleanup Audit. Score 0–3 per dimension; 33 points possible.

Dimension	Cursor	Claude Code	Bolt
Type safety	2	3	0
Input validation	1	3	0
Magic-byte check	0	3	0
Env-var handling	2	3	0
Error handling	1	3	2
Logging / observability	0	2	1
S3 key safety	1	3	0
Status codes	1	3	1
Public-URL portability	0	3	0
Comments / readability	2	3	1
Production deployability	2	3	0
Total / 33	12	32	5

Claude Code’s lead is not subtle. Bolt’s score is consistent with what we measure during real cleanup engagements — Bolt-generated code is almost always the most expensive to clean up per line.
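The totals can be re-derived from the rows. A short sketch that transcribes the scorecard and sums each column (the object and function names are ours):

```typescript
interface Scores {
  cursor: number
  claudeCode: number
  bolt: number
}

// Scores transcribed from the scorecard above, one entry per dimension.
const scorecard: Record<string, Scores> = {
  'Type safety': { cursor: 2, claudeCode: 3, bolt: 0 },
  'Input validation': { cursor: 1, claudeCode: 3, bolt: 0 },
  'Magic-byte check': { cursor: 0, claudeCode: 3, bolt: 0 },
  'Env-var handling': { cursor: 2, claudeCode: 3, bolt: 0 },
  'Error handling': { cursor: 1, claudeCode: 3, bolt: 2 },
  'Logging / observability': { cursor: 0, claudeCode: 2, bolt: 1 },
  'S3 key safety': { cursor: 1, claudeCode: 3, bolt: 0 },
  'Status codes': { cursor: 1, claudeCode: 3, bolt: 1 },
  'Public-URL portability': { cursor: 0, claudeCode: 3, bolt: 0 },
  'Comments / readability': { cursor: 2, claudeCode: 3, bolt: 1 },
  'Production deployability': { cursor: 2, claudeCode: 3, bolt: 0 },
}

// Each dimension is scored 0 to 3, so the ceiling is 3 points per row.
const maxScore = Object.keys(scorecard).length * 3

function total(tool: keyof Scores): number {
  return Object.values(scorecard).reduce((sum, row) => sum + row[tool], 0)
}
```

Calling total('cursor'), total('claudeCode') and total('bolt') reproduces the 12, 32 and 5 in the last row, out of a maxScore of 33.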

Side-by-Side: Security Audit

Vulnerability class	Cursor	Claude Code	Bolt
MIME-spoofing	❌ Vulnerable	✅ Patched (magic bytes)	❌ Vulnerable (extension only)
Path traversal via filename	⚠️ Partial (timestamped but uses raw name)	✅ Patched (UUID key)	❌ Fully vulnerable
Filename collision / overwrite	⚠️ Mitigated (timestamp prefix)	✅ Eliminated (UUID)	❌ Fully vulnerable
Anonymous upload	❌ No auth check	⚠️ Marked as TODO	❌ No auth check
Hardcoded credentials / paths	✅ None	✅ None	❌ Bucket + region hardcoded
Empty-credentials silent fail	⚠️ No runtime check (the ! assertion is compile-time only, so a missing variable is undefined at request time)	✅ Throws at boot	❌ Falls through with empty string
Sensitive data in logs	N/A (no logging)	✅ Tag without payload	⚠️ Logs raw error object

One single-shot prompt produced five Critical-or-High security issues in Bolt’s output. In a real production codebase with twenty endpoints written this way, the cleanup is not a matter of “fixing a bug” — it is a matter of rewriting your security model. This is the single biggest reason Bolt-generated apps dominate our cleanup engagements.
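On the logging row, the goal is a line that carries enough request context to debug a failure without dumping the raw error object or the upload itself. A minimal sketch under our own assumptions: formatUploadFailure, the event name, and the requestId field are hypothetical and would map onto whatever your logging setup already provides.

```typescript
interface UploadFailureContext {
  requestId: string
  userId: string | null
  key: string
  sizeBytes: number
}

// Hypothetical helper: builds one JSON log line with request context and
// the error's name and message, but not the file contents or the full
// error object (which can carry request metadata you do not want in logs).
function formatUploadFailure(ctx: UploadFailureContext, err: unknown): string {
  const error =
    err instanceof Error
      ? { name: err.name, message: err.message }
      : { name: 'UnknownError', message: String(err) }
  return JSON.stringify({
    level: 'error',
    event: 'upload-photo.s3-put-failed',
    ...ctx,
    error,
  })
}
```

The catch block around s3.send() would then call console.error(formatUploadFailure(ctx, err)), which gives a log aggregator a single parseable line per failure.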

Side-by-Side: Performance & Production Behaviour

Behaviour	Cursor	Claude Code	Bolt
Memory profile	Single buffer, ~5 MB peak	Single buffer, ~5 MB peak	Double buffer (memory + /tmp)
Serverless-safe (Vercel / Lambda)	✅ Yes	✅ Yes	⚠️ Relies on ephemeral /tmp
CDN-ready response	❌ No cache headers	✅ `max-age=31536000, immutable`	❌ No cache headers
S3 fail behaviour	500 with no detail	500 with logged context	500 with raw error logged
Backpressure / streaming	❌ Buffers entire file	❌ Buffers entire file	❌ Buffers + writes to disk

None of the three streamed the upload. For a 5 MB cap that is acceptable. For a system that later grows to 50 MB CSV uploads or 500 MB video, all three need to be re-architected, but Bolt’s /tmp write breaks first: on serverless platforms /tmp is typically the only writable path, it is size-limited and ephemeral, and because the cleanup line is skipped whenever the S3 call throws, every failed upload leaves its temp file behind.

Pricing — What You Actually Pay

Tool	Free tier	Mid tier	Top tier	Best for
Cursor	2k completions / mo, slow GPT-4	$20 / mo (Pro) — fast GPT-4 / Sonnet, unlimited slow	$40 / mo (Business) — admin / SSO / privacy mode	Editing inside an existing repo
Claude Code	Free tier on Claude.ai web	$20 / mo (Pro) for Claude.ai · API metered for Claude Code CLI	$200 / mo (Max) — high context, priority capacity	Multi-file refactors and architecture reasoning
Bolt	1M tokens / mo, attached to `bolt.new`	$20 / mo (Pro) — 10M tokens	$50–$200 / mo (Pro+ tiers) — 26M–120M tokens	Greenfield prototypes you will throw away

The headline numbers are deceptive. The actual cost of an AI tool is (subscription + the cleanup bill your code will generate). Based on engagements we have priced:

Cursor-built code: typically $3,000 – $6,000 cleanup for a small SaaS — moderate refactors, mostly architectural tightening.
Claude Code-built code: typically $1,500 – $4,000 cleanup — usually only the integration glue and some env / DevOps work.
Bolt-built code: typically $8,000 – $15,000+ cleanup — security rewrites, data-model fixes, and full DevOps setup are standard.
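The formula above is simple enough to write down. The sketch below applies it to the cleanup ranges just listed; the 12-month horizon and the single $20 / mo seat for every tool are illustrative assumptions of ours, not measured figures, and firstYearCost is a hypothetical name.

```typescript
interface CostRange {
  low: number
  high: number
}

// subscription + cleanup bill, expressed as a low/high range in dollars.
function firstYearCost(
  monthlySubscription: number,
  cleanup: CostRange,
  months: number = 12
): CostRange {
  const subscription = monthlySubscription * months
  return {
    low: subscription + cleanup.low,
    high: subscription + cleanup.high,
  }
}

// Cleanup ranges taken from the list above; $20 / mo is the assumed seat.
const cursorCost = firstYearCost(20, { low: 3000, high: 6000 })
const claudeCodeCost = firstYearCost(20, { low: 1500, high: 4000 })
const boltCost = firstYearCost(20, { low: 8000, high: 15000 })
```

Under those assumptions the subscription adds $240 to each range, which is why the cleanup figure, not the sticker price, decides the comparison.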

If you have already shipped a Bolt-generated MVP and you are seeing the symptoms — slow endpoints, security warnings, customer-reported bugs — you are not alone, and you do not need a rebuild. Hire Triple Minds for Vibe Coding Cleanup Services from $4,000 fixed-price.

Cleanliness Score — One-Number Summary

Tool	Code cleanliness	Security	Production-ready out of box	Cleanup cost (relative)
Cursor	★★★★☆	★★★☆☆	~70%	1.5×
Claude Code	★★★★★	★★★★★	~92%	1×
Bolt	★★☆☆☆	★☆☆☆☆	~25%	3–4×

Best-For Use Cases

Use Cursor when…

You already have a hardened codebase with middleware, validation, and conventions.
You need fast, surgical edits — refactor a function, rename across files, add a small feature.
You have a senior reviewer in the loop on every PR.
You are an existing engineer using AI to go faster, not a non-engineer using AI to ship a product.

Use Claude Code when…

You are designing or refactoring at the architecture level.
You want production-shape code from the first prompt, not the third.
You are working on something with security or compliance implications (auth, payments, file uploads, PII).
You are willing to read a longer output for the trade of fewer surprises later.

Use Bolt when…

You are prototyping something a customer will see for 30 minutes and never again.
You are validating a design / UX hypothesis, not a backend.
You explicitly do not plan to ship the generated code to real users.
You will hand the result to a senior engineer (or a Vibe Coding Cleanup Specialist) before any real traffic touches it.

The Verdict

If you forced us to pick one tool to run a startup on, today, with no in-house senior engineer, the answer is Claude Code. Not by a narrow margin. Not because it is hyped. Because the code it produces requires the least cleanup before it can be put in front of paying users, and cleanup, not generation, is what eats founder time.

If you are an existing engineering team and you want a daily-driver editor, Cursor is excellent. It is not as defensive as Claude Code, but it is faster and fits inside the editor where most of your work already happens. Pair it with a strict ESLint config, a CI gate, and a senior reviewer and the gap closes meaningfully.

If you are a founder using Bolt to ship to real customers, please hear us: it is built for prototyping. The output we have analysed is consistent with what we see in every cleanup engagement — fast to demo, expensive to operate. If you have already shipped, that is fine. The fix is not a rewrite. It is a structured cleanup, and we do those for a living.

What This Means for Your Codebase

Whichever tool produced your code, the question that matters is the same: can it survive real users, real load, real audits? The way to answer that is not by reading the code yourself — that is the same lens that wrote it. The way to answer it is by giving it to a third party who has cleaned up hundreds of these and knows the failure patterns by sight.

Triple Minds runs Vibe Coding Cleanup Services for startups, AI SaaS founders, marketplace operators and clone-app businesses who shipped fast and now need to harden. We have audited code from Cursor, Claude Code, Bolt, Lovable, v0, Replit Agents, and the AI co-pilot of every other framework you have heard of. Our cleanup engagements run $4,000 to $8,000 fixed-price, deliver in 2–4 weeks, and almost always avoid a full rewrite.

🚀 Ready to find out where your codebase actually stands?

Book a free 30-minute consultation with Triple Minds. We will tell you which of the patterns above are in your code, what they will cost to leave alone, and what they will cost to fix.

Book Your Free Audit Call →

Quick Answers to Common Questions

Is Cursor really better than Claude Code, or just faster?

Cursor is faster for inline edits inside an existing project. It is not better at producing complete, defensive, production-grade code from a single prompt. Both tools are useful for different jobs — Cursor for daily-driver editing, Claude Code for architecture and one-shot scaffolding.

Can I use Bolt for production at all?

You can. Many teams have. The pattern that works is: use Bolt for the first 70% of the build, then export and hand it to engineers (in-house or an agency like Triple Minds) for hardening before launch. Treat Bolt’s output as a scaffold, not a finished product.

How do I know if my AI-generated codebase needs cleanup?

Common signals: features take longer than they should to ship, your team is afraid to touch certain files, security scanners report issues you do not understand, performance degrades as users grow, or a senior engineer left with no documentation. Any one of those is enough to book a Cleanup Audit. Multiple signals means it is overdue.

What does a Triple Minds Cleanup Audit cover?

Static analysis, security scanning, performance probing, schema review, API consistency check, DevOps maturity score, and a written report with severity per finding. Five days, $499, includes a 30-minute walkthrough call and a fixed-price quote for the cleanup itself. More on the Cleanup Services page.

Will switching from Bolt to Claude Code fix my existing codebase?

No — switching tools changes what you generate next, not what is already in your repo. The existing code still has whatever issues it has. Cleanup is a separate engagement.

Do you sign NDAs before reviewing my code?

Yes. We sign whatever NDA you have. We work in your private GitHub / GitLab / Bitbucket org with reviewers you control, and you can revoke access at any time.

Which AI coding tool is best for non-technical founders?

For prototyping: Bolt or Lovable. For getting real working software: pair Claude Code with an actual engineer reviewing every PR, or skip the AI tool and hire one. Almost every “non-technical founder ships solo with AI” story has a hidden chapter where they pay $10K+ to clean it up later.

How long does a typical cleanup take?

Most engagements ship the first cleaned-up production deploy in 10–25 days. Full handover (with documentation, CI/CD, monitoring, and runbooks) inside 4 weeks. Larger marketplaces and clone-style products may need 8–12 weeks for the full Enterprise tier.

Who actually does the cleanup work?

Senior engineers led by a Vibe Coding Cleanup Specialist consultant who scopes and oversees the engagement. You see the same person from kickoff to handover. Meet the team on the cleanup services page.

Stop Vibing. Start Shipping Code That Survives.

The fastest way from “AI-built MVP” to “production-grade product” is not to throw it all away. It is to give it to a team who has cleaned up dozens of these before, ask them what is broken, and let them fix it on a fixed-price plan you can budget for.

That is what Triple Minds does. Whichever tool wrote your code — Cursor, Claude, Bolt, or anything else — we will tell you in 5 days exactly what is broken, what is salvageable, and what it costs to fix.

👉 Visit the Vibe Coding Cleanup Services page for the full process and pricing.
👉 Or book a free 30-minute call directly — we’ll tell you what camp your codebase is in.
