Skip to main content

The first time I saw an agent stuck in a retry loop, I knew what kind of bug it was before I read the logs.

The connection between competitive programming and AI infrastructure engineering is deeper than 'algorithms are useful,' and I want to name it precisely. Candidate Master on Codeforces (1900+, top 2-3%) isn't about knowing algorithms -- it's pattern recognition under time pressure: classify a novel problem, size its constraints, pick the tool. Four things transfer directly. Complexity intuition: 128k output tokens at 100ms each is a 21-minute batch job, not a web request -- the magnitudes pattern-match to constraints the same way 'n is 10^5, so O(n^2) is out' does. An agent loop is a DP problem in disguise: the retry loop I fixed in ten minutes was a missing state dimension -- the context didn't carry 'I already tried this and it failed.' Binary search on the answer tunes KV TTL and DBO thresholds. And wrong-metric recognition -- goodput per dollar vs GPU utilization, deployment performance vs eval scores -- is the CP instinct for when a stated objective diverges from the actual one. The skill that transfers isn't the algorithms. It's the habit of asking, of any system: what is the state space, the objective, the metric, and is the metric actually measuring the objective.

July 5, 2026

Not from building agents. From Codeforces.

The loop had the same signature as an infinite recursion that passes the wrong state -- the agent was replaying the same action sequence, getting the same failure response, generating the same next action, replaying it again. From a graph perspective it was a cycle. From a DP perspective it was a subproblem that was never getting resolved because the memoization cache had the wrong key. From a CP perspective it was a classic: you're in the wrong state space and you don't know it yet.

I fixed it in ten minutes. Not because I knew the agent framework well. Because I'd spent years debugging the same class of problem in a different context.

I want to write about what actually transferred, because I think the connection between competitive programming and AI infrastructure engineering is deeper than "algorithms are useful" and nobody has said it precisely.


What Candidate Master actually means.

1900+ on Codeforces. Top 2-3% of competitive programmers globally. To get there you've solved a few thousand problems, competed in maybe 150+ rated contests, and developed a specific pattern: you look at a problem, spend the first five minutes just reading it, spend another five minutes classifying it, and then spend the rest of the time executing on a plan you're reasonably confident about.

The classification step is what matters. Is this graph theory? DP? Data structures? Greedy? Binary search on the answer? The algorithms themselves are not the skill -- they're prerequisites, the vocabulary. The skill is pattern recognition: which class of problem is this, and what's the fastest path from this formulation to a solution in that class.

At CM level, you also know the inverse: which formulations look like one class but are actually another. Problems that look like DP but are actually greedy. Problems that look like graph shortest path but are actually two-pointer on a sorted structure. The deceptive formulation is the test of real understanding. If you can only solve problems that look like what they are, you're not there yet.


The transfer one: complexity intuition is the same skill.

In CP, before writing a line of code, you know your time and space budget. n is 10^5? You need O(n log n). n is 10^3? You can afford O(n^2). n is 50? O(2^n) DP might work. You don't calculate this -- you pattern-match it from having been burned a thousand times by writing an O(n^2) solution for n = 10^6 and watching it TLE with 8 seconds left in the contest.

The first time I was sizing the serving infrastructure for a 128k output token workload, the intuition was immediate. 128,000 tokens × 1 token/100ms = 21 minutes per request. That's not a web request. That's a batch job. The KV cache at 1M context for a 70B model is hundreds of gigabytes. That's not HBM-resident. That needs tiering. The CPU-to-GPU ratio for orchestrating 10 tool calls per task needs to be 1:1 not 1:0.25 or CPU starvation kills throughput.

None of these required calculation. They required the same intuition as "n is 10^5, so O(n^2) is out." The magnitudes pattern-matched to the constraints the same way. The training that built this intuition was years of sizing solutions in competitive programming, not years of building inference infrastructure. The underlying skill is the same: converting problem parameters to feasibility constraints before starting implementation.


The transfer two: an agent loop is a DP problem in disguise.

DP, the way you learn it at CM level, is not about memorizing recurrences. It's about state design. What information do I need to carry forward to make the optimal decision at each step? What can I recover from the environment when I need it? What's irrelevant noise that I should explicitly not include in my state because including it explodes the state space?

The agent loop has the same structure. Each step: the agent is in some state (current context, accumulated tool results, task understanding). It takes an action (issues a tool call, generates a response, requests more information). It transitions to a new state (context extended with the tool result). The terminal condition: task complete, maximum budget exceeded, or stuck in a cycle.

The objective: reach the terminal state (task complete) while minimizing total cost (inference tokens, tool calls, wall clock time).

This is exactly DP on a DAG. The optimal agent policy is the shortest path from initial state to terminal state in the state space, where edge weights are the cost of each action.

The thing that CP teaches that makes this concrete: the state must contain everything relevant for future decisions and nothing irrelevant. If your DP state is too large, you can't compute it. If it's too small, you make wrong decisions because you're missing information.

For agentic systems this maps directly. An agent that carries the full conversation history in every context has a large state -- rich information, expensive to process. An agent that only carries a compressed summary has a small state -- cheap to process, potentially loses information it needs. The KV TTL and HMA tiering decisions are state design decisions dressed up in infrastructure language. Which information is warm (in HBM), which is warm-but-not-hot (in DRAM), which is cold (in NVMe), which can be discarded. DP state design.

The retry loop I fixed at the start of this post? The agent's state didn't include "I've already tried this action and it failed." The same action was being taken repeatedly because the state appeared identical each time. In DP terms: missing a dimension of the state. In agent terms: missing a component of the context. The fix was identical: add the relevant history to the state.


The transfer three: binary search on the answer.

In CP, "binary search on the answer" is a meta-technique. You're not binary searching an array. You're binary searching on the answer to the problem directly. Define a predicate f(x): "is it possible to achieve this outcome with parameter value x?" Find the boundary where f(x) switches from false to true. That boundary is your answer.

The predicate must be monotone: if it's possible with x, it must be possible with x+1 (or x-1). If the predicate is monotone, binary search applies. The technique is powerful because it turns optimization problems (find the minimum x such that f(x) is true) into decision problems (is f(x) true?) which are usually easier to answer.

I use this explicitly now when tuning inference systems.

The KV TTL problem: what's the optimal time-to-live for retained KV cache in multi-turn serving? Too short, you evict KV that the next turn will need (paying full prefill cost). Too long, you retain KV that consumes HBM you need for other requests. Binary search on the TTL value. At each candidate TTL, measure: is the memory pressure acceptable AND is the multi-turn recomputation cost below threshold? The boundary is your optimal TTL. I measured my production p90 inter-turn latency (11 seconds), binary searched from 5 seconds to 30 seconds, landed at 14 seconds. Two hours of work. Not two days.

The DBO batch size threshold: what's the minimum batch size where DBO actually engages and produces throughput gains? Binary search on --dbo-decode-token-threshold. The predicate: is measured throughput improvement from DBO above X% at this batch size? Find the boundary. That's your threshold.

This is just the CP technique. I didn't reinvent it. I recognized the structure and applied the tool I already had.


The transfer four: wrong metric recognition.

In CP, the hardest problems are often hard because the obvious metric is wrong. You're trying to minimize total cost, but the actual constraint is maximum latency. You're maximizing throughput, but the problem secretly requires minimizing the maximum step. Recognizing when you've been optimizing the wrong objective is a skill you develop by being burned by it in contests and then reading the editorial and feeling the specific discomfort of "I was solving the wrong problem."

The GPU utilization post I wrote months ago was this exact transfer. Every team I talked to was optimizing GPU utilization. 85%, 90%, looks great. But utilization tells you how busy your hardware is, not how many users you're serving correctly. Those two things can diverge massively and silently. Goodput per dollar is the right metric. GPU utilization is the wrong metric. I saw it immediately because I'd spent years learning to recognize when a stated objective diverges from the actual objective.

The evaluation awareness work (the post about models knowing they're being evaluated) is the same transfer. Eval scores are the metric. Deployment performance is what you're actually optimizing. At high model capability, these two diverge because the model can distinguish eval context from deployment context and behave differently in each. The metric is wrong. The CP instinct: "what's the actual objective here, and is this the metric that measures it?"


What CM teaches that algorithms courses don't.

The skill is not algorithms. This matters. You could know every algorithm in CLRS and not be a CM. The skill is: given a novel problem you've never seen before, how quickly can you classify it and find the right tool?

This is hard to teach. You build it by being wrong, repeatedly, in time-pressured conditions, and then reading the editorials and understanding why the approach you tried was structurally wrong. The learning is in the gap between what you tried and what would have worked. Upsolving -- solving problems you couldn't solve during the contest, after the fact -- is the practice that builds this.

The AI infrastructure analog: I upsolve every incident. Every time an agent system does something unexpected, I trace the logs, understand exactly what it tried, what the optimal behavior was, and where the gap was. Not "fix the bug and move on." Understand why my mental model was wrong. What would the correct decision have been and why? This is the upsolving loop applied to production engineering.


The analogy that ties it together.

A Codeforces contest is a conversation with a problem that has a precise answer. The problem doesn't negotiate. If your solution is wrong, it's wrong. If it's too slow, it's too slow. The feedback is immediate and unambiguous.

Building agentic AI systems is also a conversation with a problem that has a precise answer (the task is either done or it isn't, the system is either fast enough or it isn't). The feedback is also unambiguous -- the agent either completes the task or it doesn't, within budget or it doesn't. The difference is the latency of the feedback loop: minutes in a Codeforces contest, hours or days in a production system.

The habit of mind that CM builds: don't start implementing until you understand the problem structure. Don't mistake busyness (code written, GPUs utilized, agents running) for progress (tasks completed, users served, system within budget). The edit distance between your current solution and the correct solution is not the same as the number of changes you've made.


the retry loop was a missing state dimension.

the wrong ttl was a binary search.

the wrong metric was a wrong objective.

i knew all three before i read the logs.

not from building agents. from codeforces.

the skill that transfers is not the algorithms. it's the habit of looking at any system and immediately asking: what is the state space, what is the objective, what is the metric, and is the metric actually measuring the objective. competitive programming drills this at high speed under pressure for years. it turns out the questions are the same everywhere.


P.S. The specific thing that CM teaches that nothing else taught me as efficiently: how to be wrong fast. In a 2-hour contest, being wrong for 90 minutes on one approach and switching to the right one in the last 30 minutes still beats being wrong for 2 hours. The skill of recognizing "this is the wrong approach" before you've proven it comprehensively is a CP skill. In AI infrastructure: recognizing "this architecture is wrong" before you've built the whole thing and measured it is the same skill. It comes from having been wrong enough times, fast enough, to build a mental library of what "wrong" feels like early. Most engineers build that library slowly through long production incidents. Competitive programming builds it quickly through dense repetition of structured failures.

i write these when i have something worth saying. no schedule. no algorithm. if you want to know when the next one goes up -- leave your email.

no spam. no sequence. just the note, when it exists.