The statistic circulates in two halves, and people reliably quote the comforting one: LLMs produce code that compiles successfully about 90% of the time. The other half is that roughly 45% of that code contains vulnerabilities from the OWASP Top 10, the industry’s standard list of the most common serious security flaws, such as bypassable login checks and injection bugs. It’s worth being precise about what that does and doesn’t mean before it either panics you or gets dismissed.
What the number is not
It’s not a claim that 45% of AI-built apps get hacked, or that generated code is uniquely bad; human developers write vulnerable code too, which is why the OWASP list exists. And it’s not an argument that AI coding tools are useless, since a developer who reviews the output catches much of this in the normal course of work.
What it does mean: each generation carries roughly coin-flip odds of containing a flaw from the industry’s best-known catalogue of serious vulnerabilities, while still compiling and demoing perfectly. The flaw produces no visible symptom. That’s the trap.
Why working code isn’t safe code
AI models optimize for quick visual success, because that’s what the prompt asked for and what the builder can verify by looking. Security is precisely the property you can’t see from the preview window, and the documented failure modes follow that gradient. Access control gets implemented in the browser rather than on the server, so a user can bypass the check by editing the code running in their own browser or by calling the backend directly; the app looks identical either way. Database permissions get configured too broadly so things work on the first try, leaving data exposed if any other part of the app is compromised. Secrets get hardcoded into source files, because managing environment variables is exactly the kind of invisible chore that models and beginners both skip, and those credentials can end up pushed to public GitHub repos. And the utility flows that carry real security weight (password recovery, OTP verification, session management) simply don’t get built, because nobody demos them.
The builder’s trust position makes it worse. A non-technical operator can’t read the generated code to verify any of this, and manual testing exercises the happy path, not the bypass. The app that “works perfectly” and the app that leaks every customer’s records can be the same app.
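To make the browser-versus-server distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Invoice` record, the in-memory `INVOICES` table, and both handler functions stand in for whatever framework and database a real app uses. What matters is where the check lives. The first handler returns whatever it is asked for and relies on the UI to hide other users’ records. The second decides on the server, using the user id from the authenticated session.

```python
from dataclasses import dataclass


# Hypothetical record and in-memory "table" standing in for a real database.
@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    amount_cents: int


INVOICES = {
    1: Invoice(id=1, owner_id=100, amount_cents=4_200),
    2: Invoice(id=2, owner_id=200, amount_cents=9_900),
}


class Forbidden(Exception):
    """Raised when the caller may not see the requested record."""


def get_invoice_client_trusting(invoice_id: int) -> Invoice:
    # Anti-pattern: hands back any invoice and trusts the browser to hide
    # records that belong to other users. Editing the client defeats this.
    return INVOICES[invoice_id]


def get_invoice(session_user_id: int, invoice_id: int) -> Invoice:
    # Server-side enforcement: ownership is checked here, against the user id
    # taken from the authenticated session, not a value the client supplies.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != session_user_id:
        # Same error for "missing" and "not yours", so ids can't be probed.
        raise Forbidden(f"invoice {invoice_id} not available")
    return invoice
```

Both handlers look identical in a demo where user 100 only ever clicks on invoice 1. The difference appears only when someone asks for invoice 2 directly. That request is exactly what the second-user test below is meant to exercise.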
What to check before real users log in
A minimum audit, in plain terms. First, server-side enforcement: confirm that what a user can see is decided on the server or in database rules, not by hiding buttons in the UI. Second, database posture: rules should deny by default and grant narrowly; on Supabase-backed tools like Lovable, that means actually reading the row-level security (RLS) policies rather than trusting that the prompt produced them. Third, secrets hygiene: search the repo for hardcoded keys and connection strings, especially if the project ever synced to a public repository. Fourth, the forgotten flows: confirm that password reset, session expiry, and signup restrictions exist and behave correctly. Fifth, the second-user test: create two accounts and try to reach one user’s data from the other’s session, including by editing URLs directly. None of this is exotic; all of it is exactly what generation skips.
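For the secrets check, a crude first pass is a pattern search over the checkout. The Python sketch below is illustrative only. The rule names and regexes are assumptions chosen to show the idea: an AWS-style access key id, a Stripe live secret key prefix, a PEM private key header, a connection string with an embedded password, and a quoted value assigned to a name containing `api_key`, `secret`, `password`, or `token`. Dedicated scanners such as gitleaks, or GitHub’s own secret scanning, cover far more formats.

```python
import re
from pathlib import Path

# Illustrative rules only (assumed, not exhaustive); real scanners ship many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_secret_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{16,}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # scheme://user:password@host, e.g. a Postgres connection string
    "url_with_password": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@"),
    # NAME = "long quoted value" where NAME mentions a secret-ish word
    "assigned_secret": re.compile(
        r"(?i)(api[_-]?key|secret|password|token)\w*['\"]?\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Scan every UTF-8 text file under root, skipping the .git directory."""
    findings = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, name in scan_text(text):
            findings.append((str(path), lineno, name))
    return findings
```

Running `scan_repo(".")` only reads the files as they are now. A key that was committed and later deleted still lives in the repository’s history, so if the project was ever public, rotate any key you find rather than just deleting the line.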
And note the maintenance clause: this isn’t a one-time check. Every regeneration is a fresh roll of the same dice, so each pass through the fix loop on an auth-adjacent feature means re-verifying a layer you already verified, and paying for every round of that re-verification. That burden is one face of the Day Two problem: the audit you run today is the audit you run again after the next change.
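The dice metaphor can be put into rough numbers. The sketch below makes a strong simplifying assumption that the statistic itself does not establish: each regeneration is an independent draw with the same 45% chance of introducing a flaw. Real rounds are correlated, so treat the output as intuition, not measurement. Under that assumption, the chance that at least one of n rounds is flawed is 1 − 0.55ⁿ.

```python
# Illustrative arithmetic only. ASSUMPTION: every regeneration independently
# has the same 45% chance of containing an OWASP Top 10 class flaw.
P_FLAWED = 0.45


def chance_any_round_flawed(rounds: int, p: float = P_FLAWED) -> float:
    """Probability that at least one of `rounds` generations is flawed."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Complement of "every round came out clean".
    return 1 - (1 - p) ** rounds


for n in (1, 3, 5):
    print(f"{n} round(s): {chance_any_round_flawed(n):.0%}")
```

This prints 45%, 83%, and 95% for one, three, and five rounds. It shows why checking once and assuming the layer stays verified does not hold up.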
The honest fork
That audit list is the real cost of the 45% number, and there are exactly two ways to pay it. Own the review: you or a developer you trust reads the generated code, every round, treating the AI as a fast junior whose security work always gets checked. That’s a reasonable position for developers and a fiction for everyone else. Or take the layer off the table: for business apps like portals and internal tools, build on a platform such as Softr, where auth, permissions, and data access rules are visually configured platform infrastructure rather than per-project generated code, so the 45% lottery never applies to the layer that matters most. What’s not a position is the popular third option: generating, glancing at the demo, and shipping. The statistic matters because that’s what most people do.