An AI Agent Debugged Our Website Like a Senior Engineer
A performance bug sat at the intersection of four browser subsystems. An AI agent found the root cause in 45 minutes. Here's exactly what happened.
We had just finished building the hero section of this website. A full-screen ASCII video with interactive canvas overlays, gradient layers, and animated text. It looked exactly right. Then someone tried to select text.
Click-dragging over the headline caused severe jank. Frames dropped from 120fps to 13fps. The page felt like it was fighting you. Normal mouse movement was perfectly smooth. The lag only appeared during text selection, and only over the hero section.
This is the kind of bug that lives at the intersection of CSS compositing, video decode, canvas rendering, and text selection paint. A senior web engineer with deep Chromium internals knowledge could probably diagnose it in a day. Most teams would accept the jank and ship anyway.
We opened Cursor's debug mode and pointed an AI agent at it. Forty-five minutes later, the bug was gone. Zero jank frames. Locked 120Hz. This is what the session looked like.
The Debugging Session
The Architecture
To understand the bug, you need to understand what the agent was looking at. The hero section is a six-layer compositing stack:
- z-10: Text content (h1, subtitle, CTAs)
- z-4: hero-text-backdrop (radial gradient for contrast)
- z-3: GlowOverlay (canvas, mix-blend-mode: color-dodge)
- z-2: hero-vignette (radial gradient)
- z-1: video element (pre-rendered ASCII art, playing at 3x speed)
- z-0: MatrixRain (canvas fallback while video loads)
Six layers, all absolutely positioned, stacked on top of each other inside a single section element. A video decoding 90 frames per second. A canvas running a blend mode that forces slow-path compositing. And on top of all that, text that users can select.
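The stack above can be sketched in CSS. This is a reconstruction for illustration only: the selectors mirror the layer names in the list, but the exact class names and markup are assumptions, not the site's actual code.

```css
/* Hypothetical sketch of the hero stacking context; selectors are illustrative */
.hero { position: relative; }
.hero > * { position: absolute; inset: 0; }

.matrix-rain        { z-index: 0; }  /* canvas fallback while video loads */
.hero-video         { z-index: 1; }  /* pre-rendered ASCII video at 3x speed */
.hero-vignette      { z-index: 2; }  /* radial gradient */
.glow-overlay       { z-index: 3; mix-blend-mode: color-dodge; }  /* forces slow-path compositing */
.hero-text-backdrop { z-index: 4; }  /* radial gradient for text contrast */
.hero-text          { z-index: 10; } /* h1, subtitle, CTAs */
```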
Hypothesis 1: The Blend Mode
The agent's first move was researching mix-blend-mode: color-dodge. This is a well-documented Chrome performance pitfall. Any non-normal blend mode forces the browser into a slow compositing path. Stack Overflow threads describe frame rates dropping to 5fps from a single blended element. Chromium's issue tracker has open bugs about rendering failures with blend modes and compositor layers.
The agent wrapped the background layers (video, canvases, vignette) in an isolation container with isolation: isolate and contain: strict, creating a compositing boundary to prevent the blend chain from cascading into the parent stacking context.
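A minimal sketch of that boundary, assuming a single wrapper element around the background layers (the class name is illustrative):

```css
/* Hypothetical isolation wrapper around the video, canvases, and vignette */
.hero-background {
  isolation: isolate; /* new stacking context: the blend chain stops here */
  contain: strict;    /* size + layout + paint containment; requires an explicit size */
}
```

Note that contain: strict implies size containment, so a wrapper like this needs explicit dimensions (here, absolutely positioned to fill the hero).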
Result: partial improvement. Some text selections became smooth. Others still janked at 8–18%. The inconsistency correlated with whether the glow overlay canvas had an active mouse trail. Better, but not solved.
Hypothesis 2: The Video
The video element plays at 3x speed, which means the browser decodes roughly 90 frames per second. Keyframe decodes are expensive. The agent hypothesized that video decode spikes were coinciding with text selection repaints, creating compound pressure on the compositor.
The fix was direct: pause the video on mousedown, resume on mouseup.
Result: no improvement. Still 74–84ms frame spikes. This was a critical moment. The agent had just ruled out the most obvious suspect. The video was not the problem.
The Breakthrough
We made an observation during testing: selecting text on the navigation bar was perfectly smooth, even though the nav sits visually on top of the exact same hero video. But selecting the hero headline directly below it was janky.
Same visual position. Same underlying layers. Different behavior. The agent immediately identified the difference:
- Nav bar: position: fixed, which Chrome automatically promotes to its own compositor layer.
- Hero text: position: relative, which shares the hero section's stacking context. No layer promotion. Every selection repaint cascades through the entire compositing stack.
The agent added will-change: transform to the text content div and the gradient backdrop, forcing Chrome to promote each to independent compositor layers. Frame times dropped from 84ms to 50–67ms. Better, but still not clean.
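The promotion itself is two declarations. The class names below are assumptions; the source only says the fix was applied to the text content div and the gradient backdrop:

```css
/* Force each element onto its own compositor layer so selection
   repaints no longer cascade through the full hero stack */
.hero-text,
.hero-text-backdrop {
  will-change: transform;
}
```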
Root Cause
Then the agent found it. The hero text had a CSS text shadow:
```css
text-shadow:
  0 2px 30px rgba(0, 0, 0, 0.9),
  0 0px 60px rgba(0, 0, 0, 0.6);
```
On a Retina display at 2x device pixel ratio, that 60px blur becomes 120 device pixels of Gaussian blur computation. Every selection change during a drag triggers a main-thread repaint of the text, which includes recalculating both blur layers across the entire text block. During a smooth drag at 120Hz, that computation runs roughly 120 times per second.
The fix: remove the text shadow. A radial gradient backdrop behind the text already provided sufficient contrast against the video background.
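The resulting change is small. The backdrop gradient values below are invented for illustration; the source only says a radial gradient backdrop already provided the contrast:

```css
/* Shadow removed: no per-frame Gaussian blur on the main thread */
.hero-text { text-shadow: none; }

/* Contrast now comes entirely from the backdrop layer behind the text
   (illustrative values; the actual gradient is not specified in the source) */
.hero-text-backdrop {
  background: radial-gradient(ellipse at center, rgba(0, 0, 0, 0.7) 0%, transparent 70%);
}
```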
The Result
| Metric | Before | After |
|---|---|---|
| Avg frame time | 13–18ms | 8.3ms |
| Max frame time | 75–84ms | 9–11ms |
| Jank frames per drag | 8–30 | 0 |
| Jank rate | 8–18% | 0% |
| Effective FPS during drag | ~60–70 with drops to ~13 | Locked 120Hz |
Three changes. No canvas throttling needed. The glow overlay with its color-dodge blend mode runs at full speed with an active mouse trail during text selection. Zero jank. The isolation wrapper fully contains the compositing chain, the compositor layer promotions isolate text repaints, and removing the text shadow eliminates the expensive main-thread blur recalculation.
The entire process took 45 minutes. The agent systematically isolated variables, tested each hypothesis with a code change and a measurement, eliminated candidates, followed a user observation to its logical conclusion, and arrived at a root cause that sits at the intersection of CSS compositing, browser rendering internals, and display hardware. This is not autocomplete. This is engineering.
What Agents Can Do Today
The debugging session above is not an outlier. The capabilities of AI coding agents have moved faster than most people realize. If you tested them a year ago, tried a few prompts, found the limits, and moved on, you missed the inflection point.
Today, a well-directed agent can scaffold a project from a blank directory, implement components that follow framework conventions, configure build tools, write responsive layouts, handle form validation, and iterate on design feedback. It can maintain consistency across files, diagnose build failures, and fix its own mistakes. And as we just demonstrated, it can debug performance issues that require deep knowledge of browser internals.
But this capability comes with a significant constraint that most people discover only after they try to apply it to real work.
The Constraint No One Talks About
Agents are remarkably effective on greenfield projects. Hand them a blank repository and a clear spec, and they will build something that works. But try to point an agent at a large, existing codebase and the results degrade quickly.
Complex dependency graphs, implicit conventions, cross-cutting concerns, legacy patterns that evolved over years of human decision-making: these overwhelm context windows and break agent reasoning. The agent cannot hold enough of the system in memory to make safe, coherent changes. It makes edits that look correct in isolation but violate assumptions three directories away.
The core technical problem is structural. LLM context windows function like memory with malloc() but no free(). You can load context in, but you cannot selectively release it. As an agent works longer on a problem, the context fills with stale information, failed attempts, and mixed concerns. Eventually, the model degrades. It starts referencing bad context, repeating approaches that already failed, and going in circles. Practitioners call this the gutter: once the context is polluted, there is no recovering from within the same session.
This is the gap Agent Operations Lab exists to close. Not to make agents marginally better at writing code, but to build the infrastructure that makes long-running, multi-step agent work durable and reliable on real systems.
The Loop Pattern
We built this site using an iterative agent loop pattern that sidesteps the context pollution problem entirely. The approach is simple: read context, pick a task from a checklist, implement it, commit, exit. Then restart with fresh context and repeat.
Each iteration operates in what we call the smart zone: roughly 40–60% context utilization, where frontier models perform best. Instead of accumulating pollution across a long session, each cycle starts clean. The agent reads its spec, reads its checklist, finds the next unchecked task, builds it, tests it, and checks it off with proof. Then it exits. The loop restarts and the next agent picks up exactly where the last one left off.
State persists in files and git, not in the model's memory. The checklist on disk acts as shared state between otherwise isolated executions. Each agent writes code, commits to the repository, and updates the checklist. The next agent reads that same checklist and continues. No sophisticated orchestration is required. Just a loop and good file hygiene.
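A hypothetical checklist file in this scheme might look like the following. The task names are invented for illustration; the mechanism is just a markdown file that each iteration reads and updates:

```markdown
<!-- checklist.md: shared state between otherwise isolated agent runs -->
- [x] Scaffold hero section layout
- [x] Add pre-rendered ASCII video background
- [ ] Implement glow overlay mouse trail
- [ ] Add matrix rain loading fallback
```

Each fresh agent scans for the first unchecked item, does that one task, checks it off with proof, commits, and exits.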
```sh
# The entire orchestration layer
while :; do
  agent --prompt < prompt.md
done
```
This pattern has a counterintuitive property: deliberately discarding context makes the overall system more reliable, not less. Each iteration is focused, accurate, and unburdened by the failures of previous attempts. The project converges through many small, correct steps rather than one long, degrading session.
Creating the Hero
The hero section that surfaced the debugging story above is itself an interesting piece of agent-built work. The background is not rendered in the browser. It is a pre-rendered ASCII video: each frame of the source footage was converted into ASCII art offline using a custom pipeline, then reassembled into a standard H.264 video file. The browser plays it with a normal <video> tag. Zero runtime rendering cost. The aesthetic is entirely baked into the pixels.
On top of the video, a glow overlay renders a cell-aligned mouse trail that lights up nearby ASCII characters as the cursor moves over the hero. A matrix rain canvas runs underneath as a loading fallback. Both layers are demand-driven: zero CPU when idle, zero animation frames when off-screen.
Our Research and What Comes Next
Building this site was instructive, but it was the easy case. A greenfield project with a clear spec, no legacy code, and no external dependencies beyond a framework. The hard case is what happens when you point agents at a production system with tens of thousands of files, years of accumulated conventions, and real users depending on correctness.
That is the problem we are studying. Our research focuses on three areas:
- Long-running agent systems. Architectures and patterns for agents that persist, learn, and coordinate over extended timeframes in production. The loop pattern described above is a starting point, not a destination. We are investigating what comes after: how agents can maintain coherent state across sessions, recover from failures autonomously, and work on tasks that span hours or days rather than minutes.
- Efficiency-first infrastructure. Cost, latency, and reliability improvements that make agents viable for enterprise deployment at scale. Today, running an agent loop is expensive and unpredictable. We believe efficiency is the lever that drives adoption: make agents ten times cheaper and ten times more reliable, and they stop being experiments and start being infrastructure.
- Agents beyond software. Expanding multi-agent systems into healthcare, logistics, finance, and other domains where the stakes are higher and the tolerance for error is lower. The workflows that work for building websites will need to be fundamentally rethought for environments where failures have real-world consequences.
There is a wave happening in software engineering right now. Engineers are treating agents as programmable workers, not autocomplete. Real products are shipping through agent workflows. Open source projects with hundreds of thousands of stars are being maintained this way. But the infrastructure to support this shift barely exists. The tooling is primitive, the patterns are ad hoc, and the failure modes are poorly understood.
That is the work. Not building more demos, but building the infrastructure that makes agents reliable enough to trust with real systems.
Join Us
We are building in the open. If this work resonates, we would like to hear from you. We share progress, trade notes on agent infrastructure, and coordinate through our community.