May 30, 2026

How AI Is Changing the Software Development Lifecycle

A phase-by-phase look at the AI SDLC: where agents insert across plan, design, code, test, review, deploy, and operate, what changes, and what stays human.

We The Flywheel Research & Analysis

Published May 30, 2026

Key Takeaways

The phases survive; the labor inside them moves. Plan, design, code, test, review, deploy, operate all remain. What changes is that agents produce the first draft and humans shift to specifying intent and approving outcomes.
The gains are real and uneven. McKinsey measured AI roughly halving the time to write new code, while the 2024 DORA report estimated a 7.2% drop in delivery stability as adoption rose. Both findings are true at once.
Implementation and testing lead; planning and operations lag. Maturity falls off sharply toward the edges of the lifecycle, where context quality and controls are thinnest.
The delegation gap is the headline. Anthropic found developers use AI in about 60% of work but fully delegate only 0–20% of tasks. The gap between those two figures, at least 40 percentage points, is where judgment lives.

AI is changing the software development lifecycle by moving the work inside each phase from typing to specifying, while leaving the phases themselves intact. Plan, design, code, test, review, deploy, and operate still describe how software ships. What is different in 2026 is that an AI agent now produces the first draft of the work in most of those phases, and the engineer's job moves toward stating intent, judging the result, and owning the decision. The lifecycle did not get shorter or simpler. The labor inside it relocated.

The data underneath that shift points in two directions at once, which is the only honest way to read it. McKinsey's 2023 study "Unleashing developer productivity with generative AI" measured AI cutting the time to write new code by nearly half and documentation time in half. The 2024 DORA report from Google Cloud, published a year later, estimated a 1.5% decline in delivery throughput and a 7.2% decline in delivery stability as AI adoption rose. Faster authoring and weaker delivery are not a contradiction; they are what happens when a phase gets cheaper to produce but no easier to verify.

~50% faster to write new code with generative AI (McKinsey, 2023)

60% of developer work now uses AI, but only 0–20% of tasks are fully delegated (Anthropic, 2026 Agentic Coding Trends Report)

−7.2% estimated change in delivery stability as AI adoption rose (DORA, 2024)

The Frame

What the AI SDLC Actually Is

The AI SDLC is the same seven-phase lifecycle every engineering organization already runs, with AI agents inserted as participants inside each phase rather than bolted onto the coding step alone. An agent, here, is a system that takes a goal, acts across files and tools, reads its own results, and iterates, distinct from an autocomplete assistant that suggests the next token. The companion definition of agentic software development develops that distinction in full. The point for the lifecycle is narrower: an agent can hold a task across a phase, so it can do real work in a phase, not just inside a single keystroke.
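The loop in that definition (take a goal, act, read the result, iterate) can be sketched in a few lines. This is a minimal illustration, not any vendor's agent API: `run_agent`, `propose`, `check`, and the toy stand-ins are hypothetical names, with `propose` playing the role of a model call and `check` the role of a build or test run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRun:
    """Record of one agent session: what was tried and how it ended."""
    goal: str
    attempts: list[str] = field(default_factory=list)
    succeeded: bool = False


def run_agent(
    goal: str,
    propose: Callable[[str, list[str]], str],
    check: Callable[[str], tuple[bool, str]],
    max_steps: int = 5,
) -> AgentRun:
    """Act, read the result, and iterate until the check passes or steps run out.

    `propose` and `check` are hypothetical placeholders for a model call and
    a build/test run. They are assumptions made for this sketch.
    """
    run = AgentRun(goal=goal)
    feedback: list[str] = []
    for _ in range(max_steps):
        candidate = propose(goal, feedback)   # act
        run.attempts.append(candidate)
        ok, message = check(candidate)        # read its own result
        if ok:
            run.succeeded = True
            break
        feedback.append(message)              # iterate with what it learned
    return run


# Toy stand-ins so the loop can be exercised without a model or a build system.
def toy_propose(goal: str, feedback: list[str]) -> str:
    # Each piece of feedback produces the next revision.
    return f"{goal} v{len(feedback) + 1}"


def toy_check(candidate: str) -> tuple[bool, str]:
    # Pretend the third revision is the first one that builds.
    if candidate.endswith("v3"):
        return True, "build passed"
    return False, f"build failed for {candidate}"


result = run_agent("rename config loader", toy_propose, toy_check)
print(result.succeeded, len(result.attempts))
```

An autocomplete assistant corresponds to a single `propose` call with no `check` and no feedback; the step budget (`max_steps`) is one simple way to keep the loop bounded.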

Microsoft published a concrete reference for this in February 2026: an end-to-end agentic SDLC built on Azure and GitHub, walking an AI-generated application from specification through generation, testing, deployment, and observation. The supporting tool in that workflow is GitHub's open-source Spec Kit, released in 2025, which puts a written specification at the center of the process and uses it to steer agents through implementation, checklists, and task breakdowns. The pattern there is worth naming: the specification is the human's primary artifact, and the code is increasingly the agent's. That inversion is the lifecycle change, stated plainly.

The phases that follow are walked in lifecycle order. Each notes where AI inserts, what changes, what stays human, and how mature the insertion actually is today. Maturity is not uniform, and pretending it is leads directly to the project cancellations Gartner forecasts, covered later in this piece.

Plan & Design

Planning: Drafting the Work

Planning is where AI inserts earliest in the lifecycle and matures slowest. An agent can take a feature brief and draft tickets, decompose a goal into tasks, and surface the files and dependencies a change will touch. GitHub's Spec Kit formalizes this by making the specification the steering document an agent reads before it writes anything. The work that gets faster is the translation from a fuzzy intent into a structured, machine-readable plan.

What stays human is the intent itself. An agent decomposes the goal it is given; it does not decide that the goal is the right one, that the deadline is realistic, or that the feature should exist. Anthropic's 2026 report frames the broader pattern as a move from writing code toward coordinating agents: architecture, system design, and strategic direction become the engineer's primary output. Planning with AI is fast at the mechanical layer and untouched at the judgment layer.

Maturity: low. Ticket drafting works; autonomous prioritization does not, and should not be trusted.

Architecture and Design: A Sounding Board, Not an Architect

In design, AI is most useful as a fast, well-read collaborator. An agent can sketch an API surface, compare two data models, enumerate failure modes, or generate a first-pass schema from a description. It compresses the research that used to precede a design decision: what patterns exist, what the tradeoffs are, what a reference implementation looks like.

The decision stays human, and for a specific reason. Architecture is the phase where a wrong choice is most expensive to reverse and least visible at the time it is made. An agent optimizes against the constraints it was told about; it does not know the organizational debt, the team's operational maturity, or the three-year roadmap that should bound the choice. McKinsey's research is blunt that generative AI tools introduced incorrect recommendations into engineering work, which is survivable in a function body and serious in a system boundary. Use the agent to widen the option set and pressure-test a design. Do not let it pick the binding constraint.

Maturity: moderate as an assistant, low as a decision-maker. The value is in the breadth of options surfaced, not in the choice made.

"The agent moved the bottleneck. It is no longer how fast you can write the code; it is how clearly you can specify what correct looks like, and how fast you can verify the agent got there."

Thomas Prommer, Group CEO & CTAiO, We The Flywheel

Build & Verify

Implementation: The Most Mature Insertion Point

Coding is where the AI SDLC is genuinely load-bearing. An agent takes a ticket, locates the relevant files, writes the change across all of them, and runs the build: the multi-file implementation loop that defines agentic coding. McKinsey measured the authoring gains directly: new code in roughly half the time, refactoring in close to two-thirds of the time, documentation in half. Those are not projections; they are measured task-completion deltas.

The scale of what an agent can hold is climbing fast. Anthropic's 2026 report cites Rakuten running autonomous modifications across a 12.5-million-line codebase at 99.9% accuracy over seven hours: a single agentic session doing work that would have been a multi-week migration. Google reported on its Q3 2024 earnings call that more than 25% of new code at the company was already AI-generated. Implementation is the phase where the technology has stopped being a demo.

The catch sits in DORA's data. The same adoption that speeds authoring tends to enlarge change batches, and larger batches carry more delivery risk, which is the mechanism behind the measured throughput and stability declines. Faster code generation without a matching investment in review and testing does not improve delivery; it degrades it. What stays human is the discipline of small, reviewable changes, which the agent will happily abandon if no one enforces it.
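One way to enforce small, reviewable changes is a size check on each change before it reaches review. The sketch below is illustrative only: the function names and the limits (`MAX_FILES`, `MAX_LINES`) are assumptions, not values taken from DORA or any tool, and the diff parsing is a rough heuristic over unified-diff text.

```python
MAX_FILES = 10    # assumed limit; tune per team
MAX_LINES = 400   # assumed limit; tune per team


def diff_stats(unified_diff: str) -> tuple[int, int]:
    """Return (files_touched, lines_changed) for a unified diff.

    Heuristic: "+++ " headers count files, other "+"/"-" lines count changes.
    A changed line whose own content starts with "++ " or "-- " would be
    miscounted as a header; acceptable for a sketch.
    """
    files = 0
    lines = 0
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            files += 1
        elif line.startswith("--- "):
            continue
        elif line.startswith(("+", "-")):
            lines += 1
    return files, lines


def review_verdict(
    unified_diff: str,
    max_files: int = MAX_FILES,
    max_lines: int = MAX_LINES,
) -> str:
    """Say whether a change is small enough to review as one unit."""
    files, lines = diff_stats(unified_diff)
    if files > max_files or lines > max_lines:
        return f"split: {files} files, {lines} lines exceeds limit"
    return f"ok: {files} files, {lines} lines"


SAMPLE = """\
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import os
-DEBUG = True
+DEBUG = False
 print(os.name)
"""

print(review_verdict(SAMPLE))
```

Run as a pre-merge check, a guard like this makes batch size a rule rather than a habit, which matters most when the author of the change is an agent with no instinct to stop.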

Maturity: high. This is the rung the rest of the lifecycle is catching up to.

Testing: Generation Is Cheap, Judgment Is Not

Testing is the second-most-mature phase and the one most often misread as fully automatable. AI is strong here in a concrete way the 2024 DORA report documents: it accelerates testing by generating test cases, improving coverage, and auto-documenting test processes and results, which improves traceability. An agent that writes code can write the tests for that code and run them to close its own loop; the verification step is what separates an agent from a code generator.

The phase does not remove QA; it relocates it. Cheap test generation makes it trivial to produce a green suite that proves very little, because the agent tends to test the behavior it implemented rather than the behavior the requirement demanded. Deciding what must be tested, what an edge case actually is, and whether a passing suite constitutes proof of correctness stays human. The DORA stability decline is in large part a testing-discipline failure: more code, generated faster, batched larger, with verification that did not scale to match. QA in the AI SDLC is the design of the verification, not the typing of the assertions.
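The difference between testing the implementation and testing the requirement is easy to show on a toy case. Everything in this sketch is hypothetical: the requirement ("orders of 100.00 or more get 10% off"), the function names, and the deliberate boundary bug are assumptions made for illustration.

```python
def discounted_total_cents(amount_cents: int) -> int:
    """Hypothetical implementation with a boundary bug.

    Assumed requirement: orders of 100.00 or more get 10% off.
    Amounts are integer cents to keep the arithmetic exact.
    """
    if amount_cents > 10_000:            # bug: the requirement says >=
        return amount_cents * 90 // 100
    return amount_cents


def implementation_shaped_checks() -> bool:
    """Checks that mirror the code paths the author happened to write."""
    return (
        discounted_total_cents(15_000) == 13_500
        and discounted_total_cents(5_000) == 5_000
    )


def requirement_shaped_checks() -> bool:
    """Checks derived from the requirement, including its boundary."""
    return (
        discounted_total_cents(10_000) == 9_000
        and discounted_total_cents(9_999) == 9_999
    )


print("implementation-shaped suite passes:", implementation_shaped_checks())
print("requirement-shaped suite passes:", requirement_shaped_checks())
```

The first suite is green and proves little; the second fails at exactly the boundary the requirement names. Choosing which of these suites to write is the part of QA that stays human.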

Maturity: high for generation, unchanged for judgment. The leverage is real; the responsibility did not move.

The verification trap

An agent's confidence is uncorrelated with its correctness. A system that cannot independently verify its own output will produce convincing failures: plausible code with a plausible passing test that proves the wrong thing. This is why the 2024 DORA report's stability decline tracks AI adoption: generation outran verification. The phase to over-invest in is testing, not coding.

Code Review: AI as the First Pass

Review is the phase where AI is emerging rather than established. An agent reviews diffs for defects, proposes simplifications, and flags inconsistencies with existing patterns, a useful first pass that catches the mechanical issues before a human spends attention. DORA notes automated review and error detection among AI's positive contributions to the workflow, alongside its testing gains.

Two things keep review human. The first is accountability: someone has to own the decision to merge, and an agent cannot hold that. The second is the distrust the data already records: the 2024 DORA report found 39.2% of respondents expressed low trust in AI-generated code, which is a rational response to a reviewer that cannot be held responsible for a miss. The productive arrangement is an agent that does the first read and a human who does the last one. Review is where the AI SDLC's "agents generate, humans approve" structure is most visible.

Maturity: emerging. Valuable as a filter, not yet as a gate.

Ship & Run

Deployment: Bounded Automation

Deployment is the phase where AI inserts most cautiously, and where that caution is correct. Agents can generate and adjust CI/CD configuration, write deployment scripts, and assemble release notes from a diff. Microsoft's February 2026 reference SDLC carries its example application through to deployment and observation as part of the agent-driven loop, which shows the wiring is feasible.

What stays human is the release decision and the controls around it. Gartner's caution about inadequate risk controls lands hardest at this phase: an agent with deploy authority and weak guardrails is precisely the failure mode that gets a project cancelled. The 7.2% stability decline DORA measured is, downstream, a deployment-stage symptom of upstream batching and verification problems. Automate the mechanics of the release; keep the authority to ship, and the rollback judgment, with a human and a policy.
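The split between automated mechanics and human authority can be written down as an explicit policy check. This is a sketch under stated assumptions: `ReleaseRequest`, `may_ship`, and the specific rules are hypothetical and do not describe Microsoft's reference workflow or any CI/CD product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseRequest:
    """A release an agent or a person has prepared (hypothetical shape)."""
    prepared_by: str             # "agent" or a person's name
    checks_passed: bool          # build and test results
    approved_by: Optional[str]   # human approver, if any
    rollback_plan: bool          # a rollback path has been defined


def may_ship(req: ReleaseRequest) -> tuple[bool, str]:
    """Apply the release policy: mechanics can be automated, authority cannot."""
    if not req.checks_passed:
        return False, "checks failing"
    if not req.rollback_plan:
        return False, "no rollback plan"
    if req.approved_by is None:
        return False, "no human approval"
    if req.approved_by == req.prepared_by:
        return False, "approver prepared the release"
    return True, "approved"


print(may_ship(ReleaseRequest("agent", True, None, True)))
print(may_ship(ReleaseRequest("agent", True, "dana", True)))
```

The agent can fill in every field except `approved_by`. Keeping that field, and the rule that the approver is not the preparer, outside the agent's reach is what "a human and a policy" means in practice.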

Maturity: moderate for mechanics, deliberately low for authority.

Operations: The Least Settled Edge

Operations is the back edge of the lifecycle and, with planning, the least mature. Agents can triage incidents, summarize logs, correlate an alert to a recent change, and draft a remediation. The appeal is obvious: incident response is pattern-matching under time pressure, which agents do well. The constraint is equally clear: operations depends on live, accurate context, and an agent acting on stale or partial signals makes confident wrong calls in exactly the moment when a wrong call is most costly.

This is where Gartner's project-cancellation forecast concentrates. The June 2025 prediction that more than 40% of agentic AI projects will be cancelled by the end of 2027 cites escalating costs, unclear business value, and inadequate risk controls, and operations is the phase where all three converge. The value is hardest to pin down, the controls are thinnest, and the cost of an autonomous mistake is highest. Treat operational agents as advisory until the context and governance around them are demonstrably solid.
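Treating an operational agent as advisory, and refusing even advice when its context is stale or partial, can be expressed as a small pre-check. The sketch below is illustrative: the signal names, the five-minute freshness threshold, and `triage_mode` are assumptions, not part of any monitoring product.

```python
from datetime import datetime, timedelta, timezone

MAX_SIGNAL_AGE = timedelta(minutes=5)                        # assumed threshold
REQUIRED_SIGNALS = {"alerts", "recent_deploys", "error_logs"}  # assumed inputs


def triage_mode(signals: dict[str, datetime], now: datetime) -> str:
    """Decide whether an agent's remediation draft should be shown at all.

    `signals` maps a signal name to the time it was last refreshed.
    The agent never gets more than advisory status in this sketch.
    """
    missing = REQUIRED_SIGNALS - set(signals)
    if missing:
        return "withhold: missing " + ", ".join(sorted(missing))
    stale = sorted(
        name for name in REQUIRED_SIGNALS
        if now - signals[name] > MAX_SIGNAL_AGE
    )
    if stale:
        return "withhold: stale " + ", ".join(stale)
    return "advisory"


now = datetime(2026, 5, 30, 12, 0, tzinfo=timezone.utc)
fresh = {name: now - timedelta(minutes=1) for name in REQUIRED_SIGNALS}
stale_logs = dict(fresh, error_logs=now - timedelta(minutes=30))

print(triage_mode(fresh, now))
print(triage_mode(stale_logs, now))
```

The check is deliberately conservative: with complete, fresh context the agent's output is still only a draft for the on-call engineer, and with anything less it is suppressed rather than presented with false confidence.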

Maturity: low. Real promise, thin controls, highest stakes: the combination Gartner is warning about.

Adoption & Limits

What the Adoption Data Says

The demand and the caution arrive in the same datasets, and reading only one half produces a wrong strategy. Gartner projects that 75% of enterprise software engineers will use AI code assistants by 2028, up from less than 10% in early 2023, and that 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. Anthropic's usage data shows the behavioral counterpart already in place: developers use AI in roughly 60% of their work.

The same sources draw the line. Those developers report being able to fully delegate only 0–20% of tasks. That is the delegation gap in Anthropic's data: the distance between assisting with most work and owning any of it end to end. Gartner's forecast that more than 40% of agentic projects will be cancelled by the end of 2027 names the reasons: cost, unclear value, weak controls. High adoption and high cancellation are not in tension. They describe a category where interest has outrun the engineering and governance needed to make it durable.

Three limits hold across every phase above. First, verification: an agent cannot reliably judge its own correctness, so any phase that generates output needs a verification step the agent does not own. Second, context: an agent acting on incomplete information makes well-formed wrong decisions, which is why the scaffolding around the agent matters more than the model inside it. Third, judgment: agents optimize the task assigned, not the task that should have been assigned. Deciding what to build, and whether the result is right, does not delegate.

The verdict: AI is changing the software development lifecycle by relocating labor inside each phase, not by collapsing the phases. The gains are largest and safest at implementation and test generation, real but supervised at review, and deliberately bounded at design, deployment, and operations. The teams getting durable value run agents as fast, tireless, supervised participants and over-invest in the verification, context, and review scaffolding that keeps the output honest. For the underlying definition of the agents doing this work, the companion piece on agentic software development is the place to start; the broader argument lives on the future of software development pillar.

How is AI changing the software development lifecycle?

AI is shifting from a coding aid to a participant across the whole lifecycle. Agents now draft specifications, generate implementations across multiple files, write and run tests, review diffs, and triage incidents, work that previously sat only with engineers. McKinsey measured AI cutting time to write new code by roughly half and documentation time in half, while DORA found that the same adoption can reduce delivery stability if teams skip the basics. The lifecycle stays the same; the labor inside each phase moves from typing toward specifying and reviewing.

What is the AI SDLC?

The AI SDLC is the conventional software development lifecycle (plan, design, code, test, review, deploy, operate) with AI agents inserted as active participants in each phase rather than only at the coding step. The phases and their gates do not change. What changes is who produces the first draft of the work and who reviews it: agents generate, humans specify intent and approve outcomes. Microsoft's February 2026 end-to-end agentic SDLC walkthrough on Azure and GitHub is one reference implementation of the pattern.

Which SDLC phases are most affected by AI?

Implementation and testing are the most affected and most mature. Agents write code across files and generate test cases, improve coverage, and auto-document results, per the 2024 DORA report. Code review is next, with AI as a first-pass reviewer under human supervision. Planning and operations are the least mature: agents can draft tickets and triage incidents, but value there depends heavily on context quality, which is where Gartner concentrates its caution about cancelled agentic projects.

Does AI replace QA and testing?

No. AI changes how testing is done rather than removing the discipline. The 2024 DORA report found AI accelerates testing by generating test cases, improving coverage, and auto-documenting results, but it also found a 7.2% estimated decline in delivery stability as adoption rose, largely from larger, less-reviewed change batches. Test generation becomes cheap; deciding what to test, and judging whether a passing suite actually proves correctness, stays a human responsibility. QA shifts toward designing the verification, not writing every assertion.

What stays human in an AI-driven SDLC?

Judgment about what to build, architecture decisions, and the final review of whether output is correct. Anthropic's 2026 report found developers use AI in about 60% of their work but fully delegate only 0–20% of tasks; the gap is where ambiguity, risk, and design tradeoffs live. Agents optimize the task they are given, not the task that should have been given. Specifying intent, arbitrating tradeoffs, owning accountability, and approving production changes remain human work.
