Engineering Judgment in the Age of the Agentic SDLC

Leading Engineering Organizations When Agents Take Over Execution

A dual-lens operating model (Dreyfus × SFIA) for talent, teams, and accountability in the agentic SDLC

Agentic AI is moving entire segments of the software lifecycle from human-executed to autonomously executed. The scarce resource is no longer the ability to write code; it is the judgment to supervise, redesign, and remain accountable for systems of humans and agents.

This briefing offers leaders a durable operating lens that pairs two enduring frameworks, Dreyfus (which measures the maturity of judgment) and SFIA (which sets the level of responsibility), to decide what to delegate, what to keep human, and how to restructure teams. Adoption is now near-universal; scaled value is rare. The winners will treat AI skilling as leadership-led transformation, not training spend.

01 The Strategic Shift: Value Migrates from Execution to Judgment

Agentic AI is not better autocomplete. It is sustained, multi-step execution: systems that reason across long-running workflows, invoke tools, interpret results, and iterate. These systems are moving whole segments of the software lifecycle from human-executed to autonomously executed. The trajectory is steep. McKinsey reports that the length of tasks AI can reliably complete has been doubling roughly every four months since 2024, with AI plausibly capable of four days of unsupervised work by 2027.
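The doubling rate is easy to underestimate, so it helps to write out the compounding it implies. This is a back-of-envelope sketch that assumes a clean exponential with a constant four-month doubling period. L(t) and the baseline task length L_0 are illustrative symbols, not figures from the McKinsey report:

```latex
L(t) = L_0 \cdot 2^{t/4}, \qquad t \text{ in months since the baseline}

\frac{L(12)}{L_0} = 2^{12/4} = 2^{3} = 8, \qquad
\frac{L(36)}{L_0} = 2^{36/4} = 2^{9} = 512
```

On that assumption, reliable task length grows roughly 8× in a year and roughly 512× over three years.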

This collapses the value of shallow execution. BCG is blunt: leaders chasing 10–20% productivity gains are “missing the opportunity by a mile.” Top engineers running parallel agents are seeing 20×, 50×, even 100× increases in feature output. But raw output is not where durable value sits. Tokens spent reasoning over a company’s private data are worth far more than generic tokens — a premium that is a function of data access, trust, and institutional integration, not model horsepower.

The strategic consequence: software engineering now reaches upstream into problem framing, prioritization, context engineering, workflow decomposition, risk interpretation, and continuous learning loops. Human judgment is not being removed; it is being pushed higher into the value chain. Microsoft’s 2026 Work Trend Index confirms the behavior is already here: 49% of Copilot chats now support cognitive work, and 86% of AI users treat AI output as a starting point, not an answer. The executive reframe: the defining capability of the agentic era is not using AI. It is judging AI.

02 The Dual Lens: Judgment and Responsibility, Designed Together

Traditional competency ladders assume stable tasks, stable roles, and human-only workflows. That logic breaks when agents change tasks faster than role architectures can be rewritten. Leaders must think in skills and systems rather than in isolated roles, because value is increasingly created by systems of humans and agents.

Two enduring frameworks, used together, give executives a language that survives every model release:

– SFIA (Skills Framework for the Information Age) defines seven levels of responsibility, from Follow and Assist through to Set Strategy, applied consistently from engineer to CIO. It answers: what level of responsibility can this person safely hold?

– The Dreyfus model defines five stages of skill, from Novice to Expert. Its radical insight is that the highest performance emerges when rules fall away. Experts perceive whole situations; novices follow checklists. It answers: how mature is this person’s judgment?

Neither alone is sufficient. SFIA tells you someone works at Level 5, but not whether that work is rule-driven or intuitive. Dreyfus tells you someone perceives situations holistically, but not what they can be accountable for.

Together they answer the two questions that matter in agentic delivery. First, what do we delegate to an agent, and when? Second, at what stage of judgment must the supervising human operate? The table below maps each Dreyfus stage (with its SFIA band) to what agents do more of and what the human increasingly owns:

– Novice (SFIA L1–2). Agents: boilerplate, first-pass drafts, simple test generation. Human: following guardrails; verifying obvious errors.
– Advanced Beginner (SFIA L2–3). Agents: multi-step assistance across coding, debugging, and docs. Human: choosing among known approaches; escalating uncertainty.
– Competent (SFIA L3–4). Agents: repeatable workflow segments; routine decomposition. Human: workflow planning, acceptance criteria, quality review.
– Proficient (SFIA L4–5). Agents: larger spans across build, test, review, and operate. Human: cross-functional trade-offs; exception handling; guardrail tuning.
– Expert (SFIA L5–7). Agents: large-scale automation under policy and shared patterns. Human: operating-model design, governance, architecture, ethics, risk acceptance.
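One way to put the dual-lens table to work is to encode it as data. Delegation tooling could then check whether a proposed supervisor can oversee a given task. The sketch below is a minimal illustration and is not part of either framework. The `DelegationBand` type, the `band_for` and `can_supervise` helpers, and the supervision rule are hypothetical assumptions. Under that rule, the supervisor must meet the task row's Dreyfus stage and its minimum SFIA level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Dreyfus(IntEnum):
    """Dreyfus stages of skill, ordered from Novice to Expert."""
    NOVICE = 1
    ADVANCED_BEGINNER = 2
    COMPETENT = 3
    PROFICIENT = 4
    EXPERT = 5


@dataclass(frozen=True)
class DelegationBand:
    stage: Dreyfus
    sfia_min: int  # lowest SFIA level in the band
    sfia_max: int  # highest SFIA level in the band
    agents_do: str
    human_owns: str


# The dual-lens table, encoded row by row.
BANDS = [
    DelegationBand(Dreyfus.NOVICE, 1, 2,
                   "Boilerplate, first-pass drafts, simple test generation",
                   "Following guardrails; verifying obvious errors"),
    DelegationBand(Dreyfus.ADVANCED_BEGINNER, 2, 3,
                   "Multi-step assistance across coding, debugging, docs",
                   "Choosing among known approaches; escalating uncertainty"),
    DelegationBand(Dreyfus.COMPETENT, 3, 4,
                   "Repeatable workflow segments; routine decomposition",
                   "Workflow planning, acceptance criteria, quality review"),
    DelegationBand(Dreyfus.PROFICIENT, 4, 5,
                   "Larger spans across build, test, review, operate",
                   "Cross-functional trade-offs; exception handling; guardrail tuning"),
    DelegationBand(Dreyfus.EXPERT, 5, 7,
                   "Large-scale automation under policy and shared patterns",
                   "Operating-model design, governance, architecture, ethics, risk acceptance"),
]


def band_for(stage: Dreyfus) -> DelegationBand:
    """Return the table row for a given Dreyfus stage."""
    return next(b for b in BANDS if b.stage == stage)


def can_supervise(person_stage: Dreyfus, person_sfia: int, task_stage: Dreyfus) -> bool:
    """Hypothetical rule: the supervisor must match or exceed both the
    Dreyfus stage and the minimum SFIA level of the task's row."""
    band = band_for(task_stage)
    return person_stage >= band.stage and person_sfia >= band.sfia_min


print(can_supervise(Dreyfus.COMPETENT, 4, Dreyfus.PROFICIENT))  # False
print(can_supervise(Dreyfus.EXPERT, 6, Dreyfus.PROFICIENT))     # True
```

The first check fails even though SFIA Level 4 sits inside the Proficient band. Under this rule, responsibility alone does not substitute for maturity of judgment, which is the point of pairing the two lenses.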

The implication is decisive: a comb-shaped engineer at novice level is still a novice. Multidimensional breadth creates advantage only when paired with mature situational judgment and appropriately scaled responsibility.

Judgment must now be encoded, not only exercised.

Dreyfus locates the maturity of a person’s judgment, and SFIA the responsibility they can safely hold. But an expert’s situational intuition creates little leverage while it lives only in one head.

The scaling move is to externalize that premium judgment into governed, reusable agent primitives: the instructions, skills, and agents that steer how work is delegated. Daniel Meppiel’s Agentic SDLC handbook names the discipline that makes this reliable — the PROSE framework: Progressive Disclosure, Reduced Scope, Orchestrated Composition, Safety Boundaries, and Explicit Hierarchy — five constraints that turn probabilistic agent output into something verifiable, maintainable, and repeatable. Read against the dual lens, PROSE is where judgment becomes architecture.

The expert encodes what an agent may touch and what it must escalate (Safety Boundaries), which rules apply where (Explicit Hierarchy), and how context is sized and revealed (Reduced Scope, Progressive Disclosure). All of this is assembled from small, chainable units rather than monolithic prompts (Orchestrated Composition). Treated this way, primitives become code: versioned, lockfile-pinned, and auditable. They run on a defined runtime of model, harness, and agent source.
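As an illustration of how Safety Boundaries and Explicit Hierarchy can become executable rather than advisory, the sketch below checks an agent's proposed file edit against layered rules, with the most specific scope winning. It is a hypothetical example. The `Rule` type, the scope depths, the glob patterns, and the allow/escalate/deny vocabulary are assumptions for illustration. None of them comes from the PROSE handbook or from any agent framework.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Literal

Decision = Literal["allow", "escalate", "deny"]

# Higher number = more restrictive; used to break ties toward safety.
SEVERITY = {"allow": 0, "escalate": 1, "deny": 2}


@dataclass(frozen=True)
class Rule:
    scope_depth: int    # 0 = organization, 1 = repository, 2 = directory (hypothetical levels)
    pattern: str        # fnmatch-style glob over file paths; "*" also matches "/"
    decision: Decision


# Hypothetical layered policy: broad defaults, then narrower overrides.
RULES: tuple[Rule, ...] = (
    Rule(0, "*", "escalate"),               # org default: a human reviews anything unlisted
    Rule(1, "src/*", "allow"),              # repo: agents may edit application code
    Rule(1, "infra/*", "deny"),             # repo: infrastructure is off-limits to agents
    Rule(2, "src/payments/*", "escalate"),  # directory: payment code needs a human
)


def decide(path: str, rules: tuple[Rule, ...] = RULES) -> Decision:
    """Resolve an agent's proposed edit to `path`.

    The deepest (most specific) matching scope wins (Explicit Hierarchy).
    Ties at that depth go to the most restrictive decision, and a path that
    no rule covers is denied (Safety Boundaries).
    """
    matching = [r for r in rules if fnmatchcase(path, r.pattern)]
    if not matching:
        return "deny"  # fail closed
    deepest = max(r.scope_depth for r in matching)
    candidates = [r.decision for r in matching if r.scope_depth == deepest]
    return max(candidates, key=lambda d: SEVERITY[d])


for p in ("src/app/main.py", "src/payments/charge.py", "infra/main.tf", "README.md"):
    print(p, "->", decide(p))
```

Two choices here are deliberate: the policy fails closed, and ties resolve toward the stricter decision. Both keep an ambiguous or incomplete policy from silently widening what an agent may touch.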

That is what converts the intuition of a single Expert (SFIA Level 7) into a repeatable, governed pattern at scale. It also adds a third question to the two the dual lens already asks: how well can this person encode their judgment into primitives the organization can reuse and trust?
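"Lockfile-pinned" can be made concrete with a content-hash check. Package managers use the same idea for dependency lockfiles: a primitive is trusted only if its exact text matches the digest recorded when it was approved. This is a minimal sketch under stated assumptions. The lockfile layout (a JSON map from primitive path to `version` and `sha256`), the file names, and the `verify` helper are all hypothetical. They are not the format of any specific agent tool.

```python
import hashlib
import json


def digest(text: str) -> str:
    """SHA-256 hex digest of a primitive's exact text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(primitives: dict[str, str], lockfile_json: str) -> list[str]:
    """Return names of primitives that are unpinned or whose content has drifted."""
    lock = json.loads(lockfile_json)
    problems = []
    for name, text in sorted(primitives.items()):
        entry = lock.get(name)
        if entry is None or entry.get("sha256") != digest(text):
            problems.append(name)
    return problems


# Hypothetical primitives: an instruction file and a review skill.
primitives = {
    "instructions/security.md": "Never edit files under infra/. Escalate payment changes.",
    "skills/review.md": "Check acceptance criteria before approving.",
}

# Hypothetical lockfile recorded when the primitives were last approved;
# the review skill has since been edited, so its digest no longer matches.
lockfile = json.dumps({
    "instructions/security.md": {
        "version": "1.2.0",
        "sha256": digest(primitives["instructions/security.md"]),
    },
    "skills/review.md": {
        "version": "0.9.1",
        "sha256": digest("an older, previously approved draft"),
    },
})

print(verify(primitives, lockfile))  # ['skills/review.md']
```

Running a check like this in CI turns unreviewed edits to instructions or skills into a visible, auditable failure rather than silent drift.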

03 The Reality Check: Adoption Is Wide, Scaled Value Is Rare

The gap between experimentation and impact is the central executive problem. McKinsey’s State of AI 2025 finds that nearly nine in ten organizations use AI regularly, but only about a third have begun scaling it. About 62% are at least experimenting with AI agents, yet only about 23% have scaled even one agentic system, and no business function exceeds 10% scaled adoption. Only about 39% report any enterprise-level EBIT impact, and most of those attribute less than 5% of EBIT to AI.

The separation is structural, not technological. AI “high performers” (roughly 6% of organizations) are 3× more likely to be redesigning workflows and 3× more likely to have senior leaders demonstrably owning AI initiatives. The differentiator is operating-model courage, not tooling. Leaders should assume these bottlenecks are already present in their organization. Each is listed with what it looks like in practice and the executive unlock:

– Review queues. In practice: more and bigger PRs, slower reviews, brittle quality. Unlock: redesign review, testing, and release as one system.
– Fragmented context. In practice: great demos, weak production outcomes. Unlock: build a shared context layer and retrieval patterns.
– Weak governance. In practice: inconsistent controls, policy drift, ambiguity. Unlock: put governance in the loop so it is visible, auditable, and enforceable.
– Role-siloed design. In practice: teams optimize locally; handoffs persist. Unlock: organize around product workflows with shared services.
– Managerial lag. In practice: pressure to adopt, no safety to redesign work. Unlock: train and incent managers as transformation multipliers.
– Activity-based metrics. In practice: seat and PR growth, but flat time-to-value. Unlock: measure system performance, quality, risk, and innovation ratio.

The pattern is consistent: AI raises output faster than the surrounding system can absorb it. That system spans review, context, governance, org design, management, and metrics. The unlock is never more tokens; it is a stronger system around the tokens.

04 Talent and Team Architecture: From Skill Shapes to Agentic Pods

The classic talent archetypes no longer hold as defaults. I-shaped and T-shaped talent models were built for stable role boundaries and human-only coordination. Both weaken in multi-agent workflows. The valuable professional is no longer the deep specialist with a thin collaborative veneer. It is the one who combines real depth with enough adjacent capability (product, data, architecture, security, policy) to orchestrate outcomes across human-agent workflows.

The shapes evolve from I to T to Pi and comb. McKinsey’s three emerging profiles reflect the same shift. They are M-shaped supervisors (broad generalists orchestrating agents across domains), T-shaped experts (deep specialists who redesign workflows and safeguard quality), and AI-augmented frontline workers (who spend less time on systems and more time with people). Team structures shift accordingly, from functional silos to flat networks of outcome-aligned agentic teams.

Org charts give way to “work charts”: flows of tasks and outcomes between humans and agents. McKinsey’s field observation is striking: a human team of two to five can already supervise an “agent factory” of 50 to 100 specialized agents running an end-to-end process such as customer onboarding or closing the books.
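The span of control implied by that observation is worth making explicit. The bounds below take the quoted ranges at face value and assume agents are spread evenly across the team, which is an illustrative simplification:

```latex
\frac{50\ \text{agents}}{5\ \text{humans}} = 10
\;\le\; \text{agents per human} \;\le\;
\frac{100\ \text{agents}}{2\ \text{humans}} = 50
```

Even at the conservative end, each person is accountable for the output of ten agents.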

The better structure is layered, not centralized.

The load is carried by distinct layers, each with a clear owner:

– Horizontal enablement (owned by a shared capability team): reusable agent patterns, context standards, reliability, governance, FinOps/SecOps controls.
– Development pods (owned by product / value-chain teams): business outcomes, engineering execution, validation, and operations.
– Agents (the digital workforce): backlog, coding, review, testing, analysis, deployment, monitoring, self-healing.
– Humans (decision owners): intent, judgment, prioritization, exceptions, governance, final accountability.

The non-negotiable condition: smaller teams deliver more only when the system carries more of the load, through context convergence, shared guardrails, and reusable workflows.

Headcount shrinks only if the platform, context, and governance layers get stronger.

05 The Executive Mandate: Skilling Is Transformation, Not Training

McKinsey’s Rewired 2.0 names four talent levers leaders must pull:
• Build the “tech muscle” of business leaders, especially two and three levels down.
• Flip IT ratios toward small, highly competent product-engineering pods (70% in-house, 70% engineers) and away from novice-heavy pyramids.
• Create an environment where talent can learn and innovate without bureaucratic drag.
• Prepare explicitly for human-agent collaboration, with managers orchestrating “guardrails, handoffs, and judgment at the edge.”

The deepest reframe is this: in the agentic era, software engineering is no longer the act of writing code. It is the discipline of designing and governing systems of human and agent intelligence to deliver outcomes.

That discipline now reaches into:

• Strategy: which value chains to automate, which to keep human, and which to redesign.

• Decision-making under uncertainty: where AI confidence is high but correctness is unverifiable.

• Socio-technical design: trust and accountability across hybrid teams.

• Ethics and judgment: the “untrainable corner” where proprietary data, institutional trust, and irreversible consequences live.

THE EXECUTIVE AGENDA

• Put a senior leader visibly in charge of AI — with workflow redesign, not tool rollout, as the mandate.

• Pair Dreyfus (maturity of judgment) with SFIA (level of responsibility) as a model-proof operating lens.

• Redesign review, testing, and release as a single system before scaling agent output.

• Stand up the governance backbone (policy-as-code, trust registers, AI FinOps, and versioned agent primitives-as-code) as a precondition for scale.

• Reshape teams into small, judgment-rich pods supported by a shared agentic platform.

• Replace activity metrics with measures of system performance, quality, risk, and innovation.

• I- and T-shaped talent are no longer enough; default to Pi- and comb-shaped profiles paired with mature judgment.

THE BOTTOM LINE

The technology will keep accelerating; the frameworks endure. The companies that will outperform are those that adopt this dual lens early. They will treat AI skilling as continuous, leadership-led talent transformation that pushes humans upward into judgment, strategy, ethics, and socio-technical design. The strategic choice belongs to leadership.

Three Questions for the Boardroom

1. Where in our delivery system — review, context, governance, or metrics — will rising AI output break first, and what is our explicit plan to absorb it before it erodes quality or trust?

2. Are we still hiring and promoting for execution depth, or are we deliberately building Pi- and comb-shaped talent with the judgment to supervise human-agent workflows at the right level of accountability?

3. Is AI skilling funded and governed as a leadership-led transformation with EBIT accountability — or is it still a training line item measured by completion rates?
