Rafter - OWASP Top 10:2025 — A06 Insecure Design for AI-Generated Code

Your /api/refund endpoint works. It checks the session, validates the order ID, calls Stripe, returns 200. The LLM that wrote it never thought to ask: "should one user be able to refund another user's order?" Nothing in the diff is wrong in the SQL-injection sense. The whole feature is wrong in the business-logic sense. That's A06. You can't grep for it, your linter won't catch it, and the model that generated it is statistically biased to ship the happy path and stop.

What A06 actually is

Insecure Design is the bug that exists before a single line of code. The architecture allows the bad thing to happen — there is no place in the system where you could even add a check, because the design never asked the question.

Concrete example: a "forgot password" flow that lets you submit a username and email, and emails the reset link to whatever email you typed. The code does exactly what it was asked to do. The design forgot to ask "what stops me from resetting someone else's password to my email?" No amount of input validation fixes this. You redesign the flow (send to the email on file, or nowhere) or you stay broken.
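
A minimal sketch of the redesigned flow, assuming a hypothetical in-memory user store and an `outbox` array standing in for a mail service (none of these names come from a real library). The point it illustrates is that the submitted email is only ever compared against the email on file and is never used as a destination.

```typescript
// Hypothetical user store; a real app would query its database instead.
type User = { username: string; email: string };

const users = new Map<string, User>([
  ["alice", { username: "alice", email: "alice@example.com" }],
]);

// Stand-in for a mail service: records the addresses reset links go to.
const outbox: string[] = [];

// Always returns the same message, so the response does not reveal
// whether the account exists or whether the email matched.
function requestPasswordReset(username: string, submittedEmail: string): string {
  const user = users.get(username);
  if (
    user !== undefined &&
    user.email.toLowerCase() === submittedEmail.trim().toLowerCase()
  ) {
    // The destination is the email on file, never the submitted value.
    outbox.push(user.email);
  }
  return "If that account exists, a reset link has been sent.";
}
```

Calling `requestPasswordReset("alice", "attacker@evil.example")` returns the usual message and sends nothing, because the only address that can ever receive a link is the one stored for the account.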

Why AI-generated code trips on it

LLMs are trained on code that compiles and tutorials that demo. Tutorials don't model adversaries. So the model gives you:

Endpoints with no rate limits. Login, signup, password reset, OTP send, "contact us" — anything that could be brute-forced or abused as a free SMS pump. Nothing in the prompt said "limit this," so nothing limits it.
Object access by ID with no ownership check. GET /api/invoice/:id looks up the invoice, returns it. Whose invoice? The model didn't ask. (This is also where A06 hands off to A01 Broken Access Control.)
State machines with missing edges. "Mark order as paid" doesn't check the order is still pending. You can re-pay a refunded order, cancel a shipped one, etc.
Cost-uncapped abuse vectors. AI features especially: a public endpoint that calls an LLM with the user's prompt, no auth, no budget cap, no per-IP limit. One curl loop drains your OpenAI account by morning.
"Magic" platform helpers. Next.js server actions, Supabase RLS-off-by-default, Firebase rules set to true during prototyping. The framework makes the unsafe thing one line shorter than the safe thing, and the model picks the shorter one.

The pattern: the model implements the feature you described and nothing else. Design holes are invisible to it because they're not in the prompt.
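
Two of the gaps listed above, the missing ownership check and the state machine with missing edges, can be sketched as explicit checks. This is an illustration under assumptions: the `Invoice` shape, the order statuses, and the transition table are hypothetical, and your own business rules decide which edges are legal.

```typescript
// All names and data here are hypothetical, for illustration only.
type Invoice = { id: string; ownerId: string; amountCents: number };
type InvoiceResult =
  | { status: 200; invoice: Invoice }
  | { status: 401 | 403 | 404 };

const invoices = new Map<string, Invoice>([
  ["inv_1", { id: "inv_1", ownerId: "alice", amountCents: 4200 }],
]);

// Ownership check: a valid session is not enough, the invoice must
// belong to the caller.
function getInvoice(sessionUserId: string | null, invoiceId: string): InvoiceResult {
  if (sessionUserId === null) return { status: 401 };
  const invoice = invoices.get(invoiceId);
  if (invoice === undefined) return { status: 404 };
  if (invoice.ownerId !== sessionUserId) return { status: 403 };
  return { status: 200, invoice };
}

type OrderStatus = "pending" | "paid" | "shipped" | "refunded" | "cancelled";

// Every legal edge is written down; anything not listed is rejected.
const allowedTransitions: Record<OrderStatus, readonly OrderStatus[]> = {
  pending: ["paid", "cancelled"],
  paid: ["shipped", "refunded"],
  shipped: ["refunded"],
  refunded: [],
  cancelled: [],
};

// Returns the new status, or throws if the edge is not in the table.
function transition(current: OrderStatus, next: OrderStatus): OrderStatus {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`illegal transition: ${current} -> ${next}`);
  }
  return next;
}
```

With this table, `transition("refunded", "paid")` and `transition("shipped", "cancelled")` both throw. Returning 403 for a non-owner matches the acceptance test in the next section; some teams prefer 404 in that case so that valid IDs cannot be enumerated.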

The fix on agentic CLIs (Claude Code, Codex)

Don't ask the CLI to "add security." Ask it to enumerate misuse, then close the gaps. A prompt that works:


For each route in src/api/, list:
1. Who is allowed to call it (auth required? role? ownership of the resource?).
2. What rate limit applies, and where it's enforced.
3. What happens if the action is replayed, or called out of order.
4. What it costs us in $ if an attacker hits it 1M times in an hour.

Then produce a diff that closes every gap, plus an integration test that
asserts a non-owner gets 403 and the 11th request in a minute gets 429.

That second paragraph is the one most people skip. "Find the bugs" produces a list. "Find the bugs and write the test that proves the fix" produces a working patch.
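
For a sense of what "the 11th request in a minute gets 429" means in code, here is a minimal fixed-window limiter. It is a sketch, not a production design: `createLimiter` is a hypothetical helper, state lives in one process's memory, and the caller passes the clock in so the behavior can be tested deterministically.

```typescript
// Fixed-window rate limiter: at most `limit` requests per `windowMs` per key.
// The key would typically be a user ID or client IP (an assumption here).
function createLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  // Returns the HTTP status to use: 200 to proceed, 429 to reject.
  return function check(key: string, nowMs: number): 200 | 429 {
    const current = windows.get(key);
    // No window yet, or the old one has expired: start a fresh window.
    if (current === undefined || nowMs - current.start >= windowMs) {
      windows.set(key, { start: nowMs, count: 1 });
      return 200;
    }
    if (current.count >= limit) return 429;
    current.count += 1;
    return 200;
  };
}
```

With `createLimiter(10, 60_000)`, ten calls for the same key inside one minute return 200 and the eleventh returns 429, which is the assertion the prompt asks the CLI to write. A deployment with more than one server process would need shared storage for the counters, which this sketch does not attempt.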

When Rafter flags an A06 finding (missing rate limit, missing ownership check, abusable webhook), the Copy-for-AI button hands your CLI a prompt that already includes the file, the line, the threat, and the acceptance criteria. Paste, review the diff, run the test. Don't accept the diff without reading it — the same model that wrote the bug is the one proposing the fix.

The fix on opinionated platforms (base44, Greta, OpenClaw, Replit-style)

You usually can't open middleware.ts and add a rate limiter. You have to make the platform's agent do it for you, in the platform's vocabulary. What works:

Name the misuse, not the mitigation. "Add rate limiting" gets ignored or done cosmetically. "A user should not be able to call this endpoint more than 10 times per minute, and should not be able to read or modify another user's records — show me where this is enforced, or add it" gets traction.
Demand the enforcement location. Ask the platform to tell you which layer enforces the check — DB row-level security, API middleware, generated function. If the answer is "the frontend hides the button," that's not enforcement.
Test from outside the platform. Use curl or any HTTP client against the deployed URL with a second account's session, or no session at all. The platform's preview pane is not a security test: it misrepresents who the request came from.
Treat AI features as cost surface. If your app exposes an LLM call, ask the platform how per-user quotas, prompt-size limits, and monthly caps are enforced. If it can't answer, assume zero of them exist.
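
To make the three controls in the last item concrete, here is a rough sketch of a guard that runs before any LLM call. Everything in it is an assumption for illustration: the `Quota` fields, the flat `costPerCallCents`, and the `createBudgetGuard` helper are hypothetical, and the daily and monthly resets are left out to keep it short.

```typescript
// Hypothetical limits; real values depend on your pricing and traffic.
type Quota = {
  maxPromptChars: number;
  maxCallsPerUserPerDay: number;
  monthlyBudgetCents: number;
};

type Decision = "ok" | "prompt_too_large" | "user_quota_exceeded" | "budget_exhausted";

// Builds a guard to call before each LLM request. Counters are kept in
// memory and never reset here; a real system would reset them on a schedule.
function createBudgetGuard(quota: Quota, costPerCallCents: number) {
  const callsByUser = new Map<string, number>();
  let spentCents = 0;

  return function admit(userId: string, prompt: string): Decision {
    if (prompt.length > quota.maxPromptChars) return "prompt_too_large";
    const used = callsByUser.get(userId) ?? 0;
    if (used >= quota.maxCallsPerUserPerDay) return "user_quota_exceeded";
    if (spentCents + costPerCallCents > quota.monthlyBudgetCents) {
      return "budget_exhausted";
    }
    // Only admitted calls count against the user and the budget.
    callsByUser.set(userId, used + 1);
    spentCents += costPerCallCents;
    return "ok";
  };
}
```

The useful question to put to a platform is which of these three rejections it can actually produce, and at which layer. If none of them can happen, the worst-case bill is simply the number of requests an attacker sends multiplied by the cost of one call.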

You don't need to read the underlying code. You do need to refuse vague reassurance. "It's secure by default" is not an answer; "RLS is enabled on table X with policy Y, here's the SQL" is.
