How Agentic Engineering Unlocked 10x More Execution

Feb 2026

As a CTO, I had maybe 10% of my time for hands-on code. Here’s how I turned that into 10x leverage, and made it repeatable across the entire engineering team.

Last month, I pushed 28 PRs in a single afternoon.

Across 13 different services. EKS 1.32 to 1.35. All controllers updated. ExternalSecrets migrated. Both staging and production. Zero downtime.

What used to take a sprint, done in an afternoon.

This isn’t a post about tools. It’s about the four principles that let me multiply my limited coding time by 10x—and how the same principles transformed my entire team.

The CTO Trap

In Q1 2025, I wanted to stay technical. At an early-stage company, a CTO who codes isn’t being indulgent. It’s essential. You’re setting technical direction that will compound for years. You need to feel the friction in your own systems: the slow builds, the brittle tests, the deployment anxiety. That feedback loop is how you make technical bets worth making. Architectural decisions made at a whiteboard can fail in production. You have to stay close to the actual cost of your decisions.

But strategy sessions. Hiring loops. Customer calls. Board prep. My calendar was 90% full before I opened my laptop. When I did code, it was focused—deep dives into our observability collector, careful PRs when I could carve out time.

I was averaging 26 PRs a month. Mostly in one or two repos where I could maintain context.

Then Q3 hit hard.

Our infrastructure was becoming a bottleneck. ArgoCD needed attention. Deployments were piling up. Platform work was blocked—on me. The team needed their CTO hands-on, and my calendar disagreed.

I had two options: accept that CTOs don't code, or find leverage.

I found leverage.

Week One: Learning to Let Go

October 2025. I started experimenting with agentic coding tools: not Copilot-style autocomplete, but actual agents that could execute. Run shell commands. Write files. Open PRs.

The first week was humbling.

I asked it to refactor a critical auth module. It confidently deleted the password hashing salt. I didn’t notice until CI failed. I felt like I’d handed the keys to a toddler.

I kept reaching for the keyboard. The agent would start working, and I’d interrupt it. “No, that’s not right—let me just…” Three keystrokes later, I’d broken its flow.

I was faster than the agent at typing. But I was fighting the wrong battle.

By day four, something clicked. I realized my job wasn’t to type faster. It was to think more clearly. Define intent precisely. Specify constraints explicitly. Then get out of the way.

The agent didn’t need my keystrokes. It needed my judgment.

The System: Orchestration Flow

I stopped being a coder and became an orchestrator. Here is the actual flow I use for every task:

Before: I’d see a problem, open the file, start typing. Context lived in my head. Quality came from my fingers.

After: I see a problem, describe it precisely, specify what success looks like, and delegate. Context lives in the prompt. Quality comes from verification.
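One way to picture "context lives in the prompt" is a small task brief that gets rendered into prompt text. The following is a minimal sketch under stated assumptions: `buildPrompt` and the field names `intent`, `constraints`, and `doneWhen` are hypothetical and do not come from any particular agent tool, and the example task is made up for illustration.

```javascript
// Hypothetical task brief renderer. Field names are illustrative only.
function buildPrompt(brief) {
  const constraints = brief.constraints || [];
  const doneWhen = brief.doneWhen || [];
  // Refuse to delegate without a stated intent and a way to verify success.
  if (!brief.intent || doneWhen.length === 0) {
    throw new Error("A brief needs an intent and at least one success check");
  }
  const lines = [brief.intent];
  if (constraints.length > 0) {
    lines.push("", "MUST:");
    constraints.forEach((c, i) => lines.push(`${i + 1}. ${c}`));
  }
  lines.push("", "DONE WHEN:");
  doneWhen.forEach((d) => lines.push(`- ${d}`));
  return lines.join("\n");
}

// Example brief (made-up task, not from the article).
const prompt = buildPrompt({
  intent: "Bump the logging library to the latest patch release.",
  constraints: [
    "Touch only the dependency manifest and lockfile",
    "Do not change application code",
  ],
  doneWhen: ["The existing test suite passes in CI"],
});
console.log(prompt);
```

The guard at the top encodes the shift described above: a task with no success check is rejected before it ever reaches an agent.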

The practical shift:

- 60% of my “coding” time is now writing prompts and reviewing diffs
- 30% is designing systems and decomposing work
- 10% is actual typing: edge cases the agent can’t handle

That 10% is more valuable than my old 100%. It’s the judgment calls, the architectural decisions, the “this doesn’t feel right” moments that need a human.

The January Afternoon

Let me show you what orchestration looks like in practice.

January 2026. Our EKS clusters were three minor versions behind. Every controller needed updating. The ExternalSecrets CRD had a breaking API change—v1beta1 to v1—that touched 13 service repos.

Old me would have blocked out a week. Created tickets. Context-switched between repos. Probably taken two sprints.

New me did it in an afternoon.

Hour 1: Planning

I started by asking an AI advisor (read-only, can’t modify code) to analyze the upgrade path. What are the breaking changes in AWS Load Balancer Controller 3.0? What order should components upgrade? What’s the rollback strategy?

Ten minutes later, I had a dependency graph and a checklist.
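Turning a dependency graph into an upgrade order is a topological sort. The sketch below uses a layered variant of Kahn's algorithm. The component names and the dependencies between them are assumptions made for illustration, not the actual graph from this upgrade.

```javascript
// Returns an order in which every component comes after everything it
// depends on. Throws on unknown components or dependency cycles.
function upgradeOrder(dependsOn) {
  const nodes = Object.keys(dependsOn);
  for (const node of nodes) {
    for (const dep of dependsOn[node]) {
      if (!Object.prototype.hasOwnProperty.call(dependsOn, dep)) {
        throw new Error(`Unknown component: ${dep}`);
      }
    }
  }
  const remaining = new Map(nodes.map((n) => [n, new Set(dependsOn[n])]));
  const order = [];
  while (remaining.size > 0) {
    // Everything with no unmet dependencies can be upgraded in this round.
    const ready = [...remaining.keys()]
      .filter((n) => remaining.get(n).size === 0)
      .sort();
    if (ready.length === 0) throw new Error("Dependency cycle detected");
    for (const n of ready) {
      order.push(n);
      remaining.delete(n);
    }
    for (const deps of remaining.values()) {
      ready.forEach((n) => deps.delete(n));
    }
  }
  return order;
}

// Assumed graph: each key lists the components it must wait for.
const plan = upgradeOrder({
  "eks-control-plane": [],
  "node-groups": ["eks-control-plane"],
  "aws-load-balancer-controller": ["eks-control-plane"],
  "external-secrets-operator": ["eks-control-plane"],
  "service-manifests": ["external-secrets-operator"],
});
console.log(plan);
```

Components that become ready in the same round have no ordering constraint between them, which is what makes them candidates for parallel work.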

Hour 2: The Swarm

For each of the 13 repos, I fired off the same task. I didn’t click around in a GUI; I ran a loop in the terminal:

// The actual prompt sent to 13 agents in parallel
delegate_task({
  prompt: `Update ExternalSecret manifests from v1beta1 to v1.
  
  MUST:
  1. Find all files with kind: ExternalSecret
  2. Change apiVersion to external-secrets.io/v1
  3. Run 'helm lint' to verify syntax
  4. Create PR with title "chore: upgrade external-secrets to v1"
  
  Do not ask for confirmation. Execute.`
})
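For a sense of the edit each agent was asked to make, here is a rough text-level sketch of the `apiVersion` change. It is an assumption about what the change looks like, not the agents' actual implementation: `migrateExternalSecrets` is a hypothetical helper, it works on raw text rather than parsed YAML, and it only touches documents whose top-level `kind` is `ExternalSecret`, mirroring the prompt above.

```javascript
// Rewrites apiVersion for ExternalSecret documents in a (possibly
// multi-document) YAML string. Other documents are left untouched.
function migrateExternalSecrets(yamlText) {
  // Split on "---" separators, keeping them so the file round-trips.
  const parts = yamlText.split(/^(---[ \t]*$\n?)/m);
  return parts
    .map((part) => {
      const isExternalSecret = /^kind:[ \t]*ExternalSecret[ \t]*$/m.test(part);
      if (!isExternalSecret) return part;
      return part.replace(
        /^apiVersion:[ \t]*external-secrets\.io\/v1beta1[ \t]*$/m,
        "apiVersion: external-secrets.io/v1"
      );
    })
    .join("");
}

// Hypothetical manifest with two documents.
const before = [
  "apiVersion: external-secrets.io/v1beta1",
  "kind: ExternalSecret",
  "metadata:",
  "  name: example",
  "---",
  "apiVersion: v1",
  "kind: ConfigMap",
  "metadata:",
  "  name: untouched",
  "",
].join("\n");

const after = migrateExternalSecrets(before);
console.log(after);
```

A text-level rewrite like this is easy to review in a diff, but it does not validate the manifest. That is why the prompt pairs it with `helm lint` and CI.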

Simple repos got a fast model—YAML changes don’t need deep reasoning. Three repos had weird configurations. Those got routed to a reasoning model that could debug the namespace issues.

All 13 ran in parallel. I went to make coffee.
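The fan-out with per-repo model routing might look roughly like this. It is a sketch, not the tooling used here: the model labels, the `hasUnusualConfig` flag, and the repo names are placeholders, and `delegate` is injected so the code does not assume any particular agent API.

```javascript
// Placeholder model labels. Substitute whatever your tooling exposes.
const FAST_MODEL = "fast-model";
const REASONING_MODEL = "reasoning-model";

// Decide which model handles each repo before anything runs.
function planSwarm(repos, prompt) {
  return repos.map((repo) => ({
    repo: repo.name,
    model: repo.hasUnusualConfig ? REASONING_MODEL : FAST_MODEL,
    prompt,
  }));
}

// Start every task at once and collect successes and failures together.
// The async wrapper turns a synchronous throw in `delegate` into a rejection.
async function runSwarm(tasks, delegate) {
  const settled = await Promise.allSettled(tasks.map(async (t) => delegate(t)));
  return settled.map((result, i) => ({
    repo: tasks[i].repo,
    ok: result.status === "fulfilled",
    detail: result.status === "fulfilled" ? result.value : String(result.reason),
  }));
}

// Hypothetical repos and a stand-in delegate that only reports what it would do.
const tasks = planSwarm(
  [
    { name: "service-a", hasUnusualConfig: false },
    { name: "service-b", hasUnusualConfig: true },
  ],
  "Update ExternalSecret manifests from v1beta1 to v1."
);
runSwarm(tasks, async (t) => `would open PR for ${t.repo} using ${t.model}`)
  .then((results) => console.log(results));
```

`Promise.allSettled` is used instead of `Promise.all` so that one failing repo does not hide the results of the others.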

Hour 3: Review

I came back to 13 PRs. Twelve were clean—green CI, sensible diffs. One had a tricky cross-namespace SecretStore reference. I spent 15 minutes reviewing that one manually.

Merged staging. Waited 30 minutes. Verified. Merged production.

Total hands-on time: Maybe 45 minutes of actual attention. The rest was waiting for CI.

What this replaced: Easily 30-40 hours of manual work across two weeks, plus the cognitive overhead of context-switching between 13 repos.

The Stack

I didn’t stick to one tool. I built a system composed of:

Orchestration: Claude Opus (via OpenCode) for planning and complex logic.
Execution: Haiku 4.5 for fast, simple tasks (like the YAML updates above).
Reasoning: GPT-o1/Codex for architectural debugging and complex refactors.
Interface: Terminal-first. No IDE plugins. I need to run shells, builds, and tests.

The Results

By January 2026, my output looked like this:

Metric	Before (Q1 2025)	After (Jan 2026)
PRs / month	26	178
Repos touched	2–3	34
Share of team output	~2%	~35%

But the bigger story? The team transformed too.

One engineer went from 5 PRs/month to 207. That’s not a typo. He was building new internal tooling while learning the agentic workflow—and his output exploded.

Our infrastructure engineer shipped 245,000 lines of code in December. He migrated our entire OpenTelemetry collector stack.

That wasn’t just cleanup. It unblocked our new “Smart Cost Control” product line two months early, so the engineering velocity showed up directly on the bottom line.

New hires were shipping 20+ PRs in their first month. The leverage wasn’t just mine—it was contagious.

The Four Principles

After 14 months of production use, these are the principles that matter:

1. Tests Are Your Guardrails

Agents are fast and confidently wrong.

Every early failure I had came from the same place: the agent said “Done!” and I deployed. Production broke. The agent had completed the task as described—but the feature didn’t actually work.

The fix: integration tests become the specification. Don’t say “add rate limiting.” Say “add rate limiting, and write a test that makes 100 requests in one second and asserts that requests 51-100 return 429. The task isn’t done until the test passes.”

If you can’t verify correctness automatically, don’t delegate.
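The rate-limiting specification above can be written as an executable check. This sketch assumes a fixed-window limiter with a limit of 50 requests per second (implied by "requests 51-100 return 429") and uses an in-process stand-in instead of a real HTTP service. `createRateLimiter` and `checkRateLimit` are hypothetical names.

```javascript
// Fixed-window limiter: at most `limit` requests per `windowMs`.
// The timestamp is passed in so the check does not depend on wall-clock time.
function createRateLimiter(limit, windowMs) {
  let windowStart = null;
  let count = 0;
  return function handle(now) {
    if (windowStart === null || now - windowStart >= windowMs) {
      windowStart = now;
      count = 0;
    }
    count += 1;
    return count <= limit ? 200 : 429;
  };
}

// 100 requests inside one second: the first 50 must pass,
// requests 51-100 must be rejected with 429.
function checkRateLimit() {
  const handle = createRateLimiter(50, 1000);
  const statuses = [];
  for (let i = 0; i < 100; i++) {
    statuses.push(handle(i * 10)); // timestamps 0..990 ms
  }
  const firstFiftyOk = statuses.slice(0, 50).every((s) => s === 200);
  const restRejected = statuses.slice(50).every((s) => s === 429);
  return firstFiftyOk && restRejected;
}
```

In a real setup the same assertion would run against the deployed endpoint, and the task would not count as done until `checkRateLimit()` returns `true`.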

2. Context is Precious

Agents have finite attention, just like humans. Dump your entire codebase into the context window and they’ll drown in irrelevant information.

Curate aggressively. “Here’s the auth middleware. Here’s the failing test. Here’s the last 50 lines of logs.” Three files, not three hundred.

I use lightweight search agents to find the right files before loading them into the worker agent. Think of it as research before writing.
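A very small stand-in for that search step is sketched below. It assumes plain keyword matching, which is far cruder than a search agent. `selectContext`, `tailLines`, and the example file paths are hypothetical.

```javascript
// Count how many of the keywords appear in a file's path or content.
function scoreFile(file, keywords) {
  const text = (file.path + "\n" + file.content).toLowerCase();
  return keywords.reduce(
    (score, kw) => score + (text.includes(kw.toLowerCase()) ? 1 : 0),
    0
  );
}

// Keep only the few most relevant files: three files, not three hundred.
function selectContext(files, keywords, maxFiles = 3) {
  return files
    .map((file) => ({ file, score: scoreFile(file, keywords) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxFiles)
    .map((entry) => entry.file.path);
}

// Keep only the last `n` lines of a log.
function tailLines(log, n = 50) {
  return log.split("\n").slice(-n).join("\n");
}

// Hypothetical repository snapshot.
const contextFiles = selectContext(
  [
    { path: "src/auth/middleware.js", content: "function authMiddleware(req) { /* checks session token */ }" },
    { path: "test/auth.test.js", content: "test('rejects expired session token', () => {})" },
    { path: "src/billing/invoice.js", content: "function renderInvoice() {}" },
  ],
  ["auth", "session"]
);
console.log(contextFiles);
```

The billing file never reaches the worker agent because it matches none of the keywords, which is the point of curating before loading.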

3. Decomposition is Everything

“Implement user authentication” fails. The agent gets lost, makes assumptions, builds the wrong thing.

“Create the User model. Then add password hashing. Then build the registration endpoint. Then write the test.” Each step is small enough to succeed, and each step verifies before the next begins.

Big tasks fail. Small tasks compound.
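The "each step verifies before the next begins" rule can be sketched as a gated loop. This is an illustration under stated assumptions: `runPlan` is a hypothetical helper, `run` stands in for delegating a step to an agent, and `verify` stands in for running that step's tests. Real agent calls would be asynchronous.

```javascript
// Run steps in order and stop at the first one whose verification fails.
function runPlan(steps) {
  const completed = [];
  for (const step of steps) {
    step.run();
    if (!step.verify()) {
      return { ok: false, completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { ok: true, completed, failedAt: null };
}

// The four steps from the text, with the third one simulated as failing.
const built = [];
const result = runPlan([
  { name: "User model", run: () => built.push("model"), verify: () => built.includes("model") },
  { name: "Password hashing", run: () => built.push("hashing"), verify: () => built.includes("hashing") },
  { name: "Registration endpoint", run: () => built.push("endpoint"), verify: () => false },
  { name: "Test", run: () => built.push("test"), verify: () => true },
]);
console.log(result);
// { ok: false, completed: ["User model", "Password hashing"], failedAt: "Registration endpoint" }
```

Because the loop returns at the failed step, the final step never runs against a broken endpoint. A failure stays local to one small task.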

4. Know When NOT to Delegate

Some work should stay human:

Delegate	Don't Delegate
Config changes	Security architecture
Version migrations	Novel algorithms
Test writing	Incident response
Documentation	Code without test coverage

The rule: if failure is expensive and verification is hard, keep it human. Everything else is fair game.
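Written as a predicate, the rule is a single condition. How a given task scores on each axis is a judgment call. The classifications in this sketch are assumptions for illustration, and `shouldDelegate` is a hypothetical name.

```javascript
// Keep a task human only when failure is expensive AND verification is hard.
function shouldDelegate(task) {
  const keepHuman = task.failureIsExpensive && task.verificationIsHard;
  return !keepHuman;
}

// Assumed classifications for two rows of the table above.
const configChange = { failureIsExpensive: false, verificationIsHard: false };
const untestedCode = { failureIsExpensive: true, verificationIsHard: true };

console.log(shouldDelegate(configChange)); // true
console.log(shouldDelegate(untestedCode)); // false
```

Note that an expensive failure alone does not block delegation under this rule. A production version migration can be delegated when tests and staging make verification cheap.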

What This Means for You

I’m not telling you to adopt my specific setup. The tools will keep evolving—what I use today didn’t exist 12 months ago, and won’t be the same 12 months from now.

But the pattern is stable:

Orchestration over typing. Your job is defining intent, not pressing keys.
Verification over trust. Agents lie confidently. Tests don’t.
Leverage over hours. Same time investment, 10x the output.

As a CTO, my calendar is never going to open up. Meetings, strategy, hiring, customer calls—that’s the job. The question was never “how do I find more coding time?” It was “how do I multiply the coding time I have?”

Same 10% of my calendar. 10x the impact.

Start This Week

Don’t start with “rewrite the auth system.” Start with:

- A bug that has a failing test
- A dependency version bump
- A config file change

One contained task. Let the agent handle the full lifecycle. Review the diff carefully. Resist the urge to “just fix it yourself.”

By week two, something will click.

This week: Ship one PR using an agentic workflow.

This month: Make it your default. And teach your team to make it their own default.

Think more. Type less.

Amir Jakoby is CTO and Co-Founder of Sawmills, the smart telemetry management platform. Follow him at @amir_jakoby on X.