TL;DR

A May 2026 Google whitepaper says software teams are moving from writing code line by line toward expressing intent and supervising AI-generated implementation. Its central claim is that the model is only a small part of agent performance, while tests, tools, prompts, context and oversight carry most of the weight.

Google has published a new whitepaper, The New SDLC With Vibe Coding, arguing that software engineering is shifting from writing code directly to expressing intent and verifying machine-generated output. The authors say the change is already visible in the widespread use of AI coding agents by professional developers.

The paper, written by Addy Osmani, Shubham Saboo and Sokratis Kartakis, says 85% of professional developers regularly use AI coding agents, 51% use them daily and about 41% of new code is AI-generated. Those figures are presented in the paper as evidence that AI assistance has moved from experiment to routine engineering practice.

The authors draw a distinction between casual “vibe coding” and more disciplined “agentic engineering.” In their framing, vibe coding means using loose prompts, accepting output with light review and relying on repeated fixes when the result fails. Agentic engineering, by contrast, places AI-generated work inside formal specifications, automated tests, evaluation systems, CI gates, tooling and human architectural oversight.

The paper’s most pointed claim is that the model itself is only a small part of agent performance. It describes an agent as the combination of a model and a surrounding harness: prompts, tools, context, hooks, sandboxes, observability and review systems. The authors cite examples in which benchmark performance improved after teams changed the harness while keeping the same model.

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

Model: ~10%

Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability). This is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR); verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Verification Becomes The New Work

The paper matters because it shifts attention away from model choice alone and toward the engineering systems that govern AI output. If the authors are right, software teams that treat AI coding as a prompt-and-paste workflow may face higher maintenance, security and review costs than teams that invest in tests, evals and controlled deployment paths.

For engineering leaders, the argument also changes the spending question. The paper frames casual AI coding as low upfront cost but high long-term cost, because errors, token-heavy repair loops and unclear ownership can accumulate. It frames agentic engineering as higher upfront investment in specifications, context and evaluation, with the possible payoff of cheaper and more reliable delivery over time.
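
The trade-off can be made concrete with a small break-even calculation. The sketch below is illustrative only: the dollar figures, the function names and the flat per-feature cost model are assumptions made for this example, not numbers or methods from the paper. The hypothetical per-feature ratio (3×) sits at the low end of the 3–10× range quoted in the infographic above.

```python
import math
from typing import Optional


def total_cost(upfront: float, per_feature: float, features: int) -> float:
    """Cumulative cost after shipping `features` features (flat cost model)."""
    return upfront + per_feature * features


def break_even_features(upfront_a: float, per_feature_a: float,
                        upfront_b: float, per_feature_b: float) -> Optional[int]:
    """Smallest feature count at which approach B costs no more than approach A.

    Returns None if B never catches up because its per-feature cost is not lower.
    """
    if upfront_b <= upfront_a and per_feature_b <= per_feature_a:
        return 0  # B is never more expensive
    if per_feature_b >= per_feature_a:
        return None
    return math.ceil((upfront_b - upfront_a) / (per_feature_a - per_feature_b))


# Hypothetical numbers, chosen only to show the shape of the trade-off.
VIBE_UPFRONT, VIBE_PER_FEATURE = 0.0, 3000.0            # low CapEx, high OpEx
AGENTIC_UPFRONT, AGENTIC_PER_FEATURE = 20000.0, 1000.0  # high CapEx, low OpEx

crossover = break_even_features(VIBE_UPFRONT, VIBE_PER_FEATURE,
                                AGENTIC_UPFRONT, AGENTIC_PER_FEATURE)
print("Break-even after", crossover, "features")
```

Under these assumed numbers the two approaches cost the same at ten features, and the disciplined workflow is cheaper from the eleventh onward. Real costs are unlikely to be flat per feature, so the output is a way to frame the question rather than a forecast.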

From Vibe Coding To Agents

The phrase “vibe coding” was popularized by Andrej Karpathy in 2025 to describe a loose style of building software by prompting an AI system and accepting its outputs with limited inspection. The Google paper argues that the term has since been stretched too broadly and should be understood as one end of a spectrum.

At the other end, the authors place agentic engineering: AI systems that work within explicit constraints and are checked by automated tests, evals and human judgment. The paper says tests are suited to deterministic checks, while evals are needed for less predictable agent behavior, including whether an agent chose appropriate tools or met a quality bar.
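
The difference between a test and an eval can be sketched in a few lines. This is a minimal illustration, not the paper's tooling: the `slugify` helper, the `AgentRun` record, the tool names and the 0.8 pass threshold are all hypothetical. The test asserts one exact answer for deterministic code, while the eval scores a batch of agent runs and compares the result with a quality bar.

```python
from dataclasses import dataclass
from typing import List


def slugify(title: str) -> str:
    """Deterministic helper: the same input always yields the same output."""
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # A test checks a single exact, repeatable answer.
    assert slugify("Hello Agent World") == "hello-agent-world"


@dataclass
class AgentRun:
    """Hypothetical record of one agent run and the tool a reviewer expected."""
    task: str
    tools_used: List[str]
    expected_tool: str


def tool_choice_score(runs: List[AgentRun]) -> float:
    """Fraction of runs in which the agent used the expected tool."""
    if not runs:
        return 0.0
    hits = sum(1 for run in runs if run.expected_tool in run.tools_used)
    return hits / len(runs)


def eval_passes(runs: List[AgentRun], threshold: float = 0.8) -> bool:
    """An eval passes when the aggregate score meets the quality bar."""
    return tool_choice_score(runs) >= threshold


# Made-up runs for illustration; three of the four use the expected tool.
RUNS = [
    AgentRun("find failing test", ["run_tests"], "run_tests"),
    AgentRun("rename symbol", ["grep", "edit_file"], "edit_file"),
    AgentRun("check disk usage", ["edit_file"], "shell"),
    AgentRun("read config", ["read_file"], "read_file"),
]

test_slugify()
print("tool-choice score:", tool_choice_score(RUNS))
print("eval passes:", eval_passes(RUNS))
```

With the sample runs, the score is 0.75, so the eval fails against the assumed 0.8 bar even though no single run raised an error. That is the kind of gap a deterministic test alone would not surface.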

The source analysis also characterizes the paper as partly strategic for Google: the concepts are broadly tool-agnostic, while the commercial paths point toward Google’s Gemini, Jules and Agent Development Kit products.

“Generation is solved; verification, judgment, and direction are the new craft.”
— Osmani, Saboo and Kartakis, according to the Google whitepaper

Benchmark Gains Need Scrutiny

Several claims depend on how the paper’s cited studies and benchmarks were measured. The paper reports that one team moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing only the harness, and that a LangChain experiment improved performance by adjusting prompts, tools and middleware. Those examples support the paper’s thesis, but they do not prove that the same gains will generalize across all codebases, teams or production environments.

It is also not yet clear how many companies have the test coverage, observability, security review and process discipline needed to make agentic engineering work at scale. The paper presents a direction of travel, not a settled industry standard.

Teams Face Build Choices

The next step for software teams is likely to be practical rather than theoretical: deciding how much of the AI coding harness to build, buy or standardize internally. That includes rules for model routing, context management, test generation, eval design, security scanning and human approval.
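
One of those rules, model routing, can be sketched as a plain function. Everything here is an assumption made for illustration: the `Task` fields, the two placeholder model names and the file-count heuristic do not come from the paper or from any vendor's API. The sketch only shows the general idea of sending easy work to a cheaper model and reserving human approval for architectural changes.

```python
from dataclasses import dataclass

# Placeholder identifiers, not real model names.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a unit of work handed to an agent."""
    description: str
    files_touched: int
    needs_architecture_review: bool = False


def route(task: Task, max_easy_files: int = 2) -> str:
    """Pick a model: the strong one for wide or architectural changes, else the cheap one."""
    if task.needs_architecture_review or task.files_touched > max_easy_files:
        return STRONG_MODEL
    return CHEAP_MODEL


def requires_human_approval(task: Task) -> bool:
    """Assumed policy: architectural changes always need a human sign-off."""
    return task.needs_architecture_review


for task in (
    Task("fix typo in README", files_touched=1),
    Task("refactor auth module", files_touched=7),
    Task("change database schema", files_touched=1, needs_architecture_review=True),
):
    print(task.description, "->", route(task), "| approval:", requires_human_approval(task))
```

A real router would likely use richer signals, such as past eval scores per task type, but keeping the policy in ordinary, testable code is what makes it part of the harness rather than an ad hoc habit.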

The paper also points to a vendor contest over the new software development stack. Google’s framing gives teams a way to evaluate AI coding systems beyond model quality alone, but it also directs attention toward Google’s own tools. Readers should watch whether future benchmarks and enterprise case studies support the paper’s claim that disciplined harness design, rather than model upgrades alone, determines production results.

Key Questions

What is the actual news in this story?

Google published a May 2026 whitepaper arguing that AI is changing the software development life cycle by moving developers from direct code writing toward intent-setting, supervision and verification.

What does “the model is only 10%” mean?

It means the paper claims agent behavior depends heavily on the surrounding harness: prompts, tools, context, tests, evals, sandboxes and observability. The exact percentage is a framing device from the paper, not an independently settled industry measurement.

How is agentic engineering different from vibe coding?

Vibe coding refers to casual prompting with limited review. Agentic engineering uses AI inside a controlled process with specifications, automated checks, evals, CI gates and human oversight.

Why should engineering leaders care?

If the paper’s argument holds, the cost and reliability of AI-assisted development will depend less on picking one model and more on building strong verification systems around whichever models teams use.

What is still uncertain?

It remains unclear how widely the reported benchmark gains will apply in production codebases and whether most organizations can afford the upfront investment needed for disciplined agentic workflows.

Source: Thorsten Meyer AI

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

The Earlier Stuff Team
