Uber's 2026 AI budget was $3.4 billion. By April, it was gone.

The company deployed Claude Code to 5,000 engineers in December 2025. Adoption moved fast - from 32% to 84% of the engineering organization in three months. About 11% of live backend code updates were being written entirely by AI agents. When COO Andrew Macdonald spoke on the earnings call, the year's entire AI allocation had been consumed in four months. His explanation was honest in a way most executives avoid: "It's very hard to draw a line between one of those stats and 25% more useful consumer features."

That is not a productivity problem. It is a measurement problem. And the same measurement gap is running quietly inside every organization that has deployed AI agents at scale - including ones with a fraction of Uber's resources.


The Bill You're Tracking Is Not the Real Bill

Every AI business case has the same structure. On the cost side: LLM API tokens, GPU compute, inference endpoints. On the benefit side: developer hours saved, features shipped faster, support tickets deflected. The model closes. The CFO approves.

What the model does not have is a line item for the second bill - the downstream infrastructure consumption that AI changes not because you use it more, but because of how it works.

The mechanism is specific. A human analyst asks a question, issues one SQL query to Snowflake, and gets one answer. An AI agent receives the same question and decomposes it: check the churn cohorts, join to usage events, compare to the prior quarter, fetch support tickets, correlate with feature adoption. That is 6 to 12 sub-queries where a human issued one, and each of them consumes warehouse compute. Snowflake also bills a 60-second minimum every time a warehouse resumes from suspension. If agent queries arrive far enough apart for the warehouse to auto-suspend between them, each 2-second agent query still costs at least 60 seconds of compute.
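The sketch below makes the minimum-billing effect concrete. It is a simplified model, not Snowflake's billing engine: it assumes a single warehouse, ignores concurrency and warehouse size, and applies the documented rule of per-second billing with a 60-second minimum per resume. The `idle_tail_seconds` parameter is a hypothetical stand-in for the auto-suspend delay, during which a running warehouse keeps billing.

```python
MIN_BILLED_SECONDS = 60.0  # Snowflake's minimum charge each time a warehouse resumes


def billed_seconds(
    query_seconds: list[float],
    resume_per_query: bool,
    idle_tail_seconds: float = 0.0,
) -> float:
    """Estimate billed warehouse seconds for a sequence of queries.

    resume_per_query=True: queries arrive far enough apart that the warehouse
    suspends between them, so every query opens its own billing window.
    resume_per_query=False: queries run back-to-back in a single window.
    idle_tail_seconds: hypothetical idle time billed before auto-suspend.
    """
    if not query_seconds:
        return 0.0
    windows = [[s] for s in query_seconds] if resume_per_query else [query_seconds]
    return sum(
        max(sum(window) + idle_tail_seconds, MIN_BILLED_SECONDS) for window in windows
    )


human = billed_seconds([2.0], resume_per_query=True)
agent_spread = billed_seconds([2.0] * 12, resume_per_query=True)
agent_batched = billed_seconds([2.0] * 12, resume_per_query=False)
print(human, agent_spread, agent_batched)  # 60.0 720.0 60.0
```

The last two numbers are the point. The same twelve 2-second queries cost 12 times as much compute when they trickle in as when they run back-to-back. How an agent schedules its queries matters as much as how many it issues.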

The output is sometimes equivalent, sometimes richer because the agent explored scenarios the human would not have checked. Richer output should cost more. The problem is not the cost itself - it is that this cost never appears in the AI budget. It lands on the Snowflake invoice, attributed to "increased query volume," with no connection to the agent that issued the queries.

Illustrative split based on reported infrastructure variance patterns. Direct API costs are the visible minority; downstream infrastructure changes represent the larger, untracked share.

The visible slice: API tokens, GPU, inference endpoints. Below the waterline: warehouse scan overruns driven by agent query volume, N+1 database patterns in AI-generated code, agent loop incidents, and shadow AI tooling that bypassed procurement entirely. These costs have no attribution trail. They arrive as noise inside existing line items with no label connecting them to the AI decision that caused them.


Three Layers That Compound the Problem

Every financial model for AI tooling adoption uses the same cost function: users multiplied by license fee. That model is correct for a product that scales with headcount. It is wrong for one that scales with task complexity and query intensity.

At enterprise scale - Uber, 5,000 engineers - that mismatch burns a $3.4 billion annual AI budget in four months. At startup scale - 12 engineers with AI agents running on production data - the same dynamic produces $8K/month in unmodelled infrastructure variance. That is still 20% of a typical early-stage infrastructure budget, and it grows with customer adoption, not headcount.

The cost compounds through three distinct layers.

Layer 1 - Query volume explosion

In March 2026, Hex published data from its analytics platform showing a milestone: on March 10, 2026, AI agents crossed the point of creating more notebook cells than human users. Messages per user more than doubled in three months, from 5.5 to 12 per week (a 118% increase). Each agent message triggers multiple queries, so the warehouse bill scales with message volume, not seat count.

Greybeam, whose multi-engine routing platform processes these queries, documented the ratio directly: AI agents issue 6 to 12 queries per user question, where a human issues one. Same question, often the same answer, and 6 to 12 times as many queries landing on the warehouse bill.

Source: Greybeam / DataOps Leadership Substack analysis, May 2026. AI agents decompose analytical questions into sub-queries across cohorts, joins, and comparisons. 6-12 is the observed range; exact count depends on question complexity.

Layer 2 - Agentic loops and context accumulation

Each step in an agentic workflow re-sends the full accumulated context to the language model: the system prompt, the conversation history, every previous tool result. You pay for the same tokens repeatedly, with each step adding more. The cost does not grow linearly with steps - it accelerates.
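A short worked model shows why the growth is faster than linear. The model assumes, hypothetically, a fixed starting prompt and a fixed number of tokens appended to the history at each step. Because every step re-sends everything before it, total input tokens grow with the square of the step count.

```python
def cumulative_input_tokens(base_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed when every step re-sends the full history.

    base_tokens: system prompt plus the user's question (hypothetical size).
    tokens_added_per_step: tool results and outputs appended after each step.
    """
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context                  # the whole accumulated context is sent again
        context += tokens_added_per_step  # this step's result joins the history
    return total


for steps in (10, 20, 40):
    print(steps, cumulative_input_tokens(2_000, 1_500, steps))
# 10 87500
# 20 325000
# 40 1250000
```

Doubling the workflow from 10 to 20 steps multiplies input tokens by roughly 3.7x in this model, and doubling again to 40 multiplies them by nearly 3.9x.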

Stanford Digital Economy Lab research on agentic AI cost patterns found that the cost of the same task can vary by up to 30x from one run to the next, depending on how the model behaves on that run. That is not a sizing problem. It is a forecasting problem: a variance band that wide does not fit inside a 5% to 10% CFO variance buffer.

Goldman Sachs analysts estimated that agentic AI may increase token demand by 24x compared to standard generative AI. Context accumulation is the mechanism, and step inefficiency amplifies it: even the highest-performing agents take 1.4 to 2.7 times as many steps as the human-determined optimal path for the same task.

Illustrative cost multiplier based on context accumulation mechanics in agentic workflows. Each step re-sends full conversation history, paying for the same tokens repeatedly. Goldman Sachs estimated agentic AI increases token demand 24x vs standard GenAI.

Layer 3 - AI-generated code inefficiencies

AI coding assistants write code that passes tests and satisfies the immediate requirement. What they routinely miss is the N+1 query pattern - where an application fetches a list of 100 items, then issues one additional database query per item to retrieve related data. The result is 101 database round-trips where 2 would do the job.

One engineering team reported their AWS RDS bill climbing from $8,000 to $35,000 per month. A code review traced the increase to a single AI-generated pattern running 847 times per page load. The code was syntactically correct and functionally accurate. The cost was not in the output - it was in the pattern the AI chose to produce it.

† Developer-reported incident, not independently verified. Included as an illustrative example of a well-documented anti-pattern in AI-generated code.


The Snowflake Case Study

Snowflake is the clearest example of the second bill, because the cost mechanism becomes transparent once you inspect it.

Seemore Data published an analysis in 2026 of a Snowflake Cortex AI deployment where a single query against 1.18 billion records generated a $5,000 charge. The question was reasonable. The problem was that Cortex AI applies token-based pricing - input tokens plus output tokens - on top of the warehouse compute cost. Passing an unfiltered large dataset to an AI function is equivalent to instructing an analyst to read every row in the table before answering the question. The answer is correct. The cost is not.

The more revealing finding was the observability gap. Seemore's daily Snowflake usage showed $45.50 in total charges. Itemising through the standard dashboards, they could account for roughly $15. Attributing the remaining $30 or so required manually querying a table most teams do not know exists:

SQL — Snowflake AI cost attribution
```sql
-- Snowflake's UI will not show you what AI actually cost.
-- This query breaks it down by day, model, and initiating user.
-- Column names are illustrative: check them against the actual schema
-- of SNOWFLAKE.LOCAL.AI_OBSERVABILITY_EVENTS in your account.
SELECT
    event_timestamp::DATE AS billing_date,
    model_name,
    initiating_user,
    SUM(input_tokens) AS total_input_tokens,
    SUM(output_tokens) AS total_output_tokens,
    ROUND(
        SUM(total_tokens * token_price_per_1k / 1000), 4
    ) AS estimated_cost_usd
FROM SNOWFLAKE.LOCAL.AI_OBSERVABILITY_EVENTS
WHERE event_timestamp >= DATEADD('day', -30, CURRENT_TIMESTAMP)
GROUP BY 1, 2, 3
ORDER BY estimated_cost_usd DESC;
```

No resource monitors exist for AI workloads in Snowflake the way they exist for warehouses. There is no automatic alert when a Cortex function scan exceeds a cost threshold. The governance must be built deliberately - and most teams have not built it.

The counterpoint is that this problem is solvable. A cannabis data analytics company called Headset ran 2,500 embedded analytics users through Snowflake and worked through the economics carefully. They found that 99% of their queries scanned under 100 GB, which is small enough to run efficiently on DuckDB, a lightweight in-process analytical engine. By routing those queries away from Snowflake and sending only large joins back to the warehouse, they reduced their Snowflake bill by 92%. The economics work because the query distribution has a long tail: nearly all queries are small, and only the 99.9th-percentile queries scan 300 GB or more. Yet the warehouse billed full compute for both.
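Before routing anything, it is worth checking whether your own query history has the same long-tail shape. The minimal sketch below assumes you can export per-query scan sizes in GB, for example from your warehouse's query history. The 100 GB threshold comes from the Headset case, and the function names are hypothetical.

```python
import math


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 1]."""
    ordered = sorted(values)
    # round() guards against float noise such as 0.999 * 1000 = 998.9999...
    rank = max(1, math.ceil(round(p * len(ordered), 9)))
    return ordered[rank - 1]


def scan_profile(scan_gb: list[float], threshold_gb: float = 100.0) -> dict[str, float]:
    """Summarise how much of the workload a small in-process engine could absorb."""
    if not scan_gb:
        raise ValueError("need at least one query")
    under = sum(1 for gb in scan_gb if gb < threshold_gb)
    return {
        "share_under_threshold": under / len(scan_gb),
        "p99_gb": nearest_rank_percentile(scan_gb, 0.99),
        "p999_gb": nearest_rank_percentile(scan_gb, 0.999),
    }


# Hypothetical sample: 998 small queries and two large joins.
sample = [4.0] * 998 + [300.0, 420.0]
print(scan_profile(sample))
# {'share_under_threshold': 0.998, 'p99_gb': 4.0, 'p999_gb': 300.0}
```

A high `share_under_threshold` combined with a p99.9 far above the threshold is the distribution that makes multi-engine routing pay off.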


Why FinOps Can't See This

FinOps tooling was built for a specific problem: optimizing infrastructure cost through reserved instances, right-sizing, commitment discounts, and idle resource elimination. It is good at that problem. The AI indirect cost problem has three structural properties that make it invisible to those tools.

The attribution gap. When an AI agent issues 12 Snowflake queries, the charges land in the Snowflake bill with no tag connecting them to the agent that issued them. The Anthropic tokens land in a separate invoice. The N+1 pattern the agent generated appears as an AWS cost variance. No monthly dashboard shows "AI-caused infrastructure variance: $X." The observability must be built deliberately - through Cortex observability tables, Unity Catalog tagging, or custom instrumentation. It does not arrive enabled by default. This is not negligence. It is a tooling gap that appeared faster than the platforms could respond to it.
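Snowflake's `QUERY_TAG` session parameter supports one form of custom instrumentation. Snowflake records the tag alongside each query in query history, so an agent runtime can stamp its queries before issuing them and costs can be rolled up per agent later. The sketch below rests on several assumptions. The JSON tag layout, the helper names, and the `(query_tag, credits)` row shape are hypothetical. Per-query credit figures would come from whatever attribution data your account exposes.

```python
import json
from collections import defaultdict
from typing import Iterable, Optional


def agent_query_tag_statement(agent_id: str, workflow: str) -> str:
    """SQL to run on the agent's session before it issues any queries."""
    tag = json.dumps({"source": "ai_agent", "agent_id": agent_id, "workflow": workflow})
    escaped = tag.replace("'", "''")  # escape single quotes for the SQL string literal
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


def credits_by_agent(rows: Iterable[tuple[Optional[str], float]]) -> dict[str, float]:
    """Roll up (query_tag, credits) rows per agent; untagged spend stays visible."""
    totals: dict[str, float] = defaultdict(float)
    for tag, credits in rows:
        agent = "untagged"
        if tag:
            try:
                parsed = json.loads(tag)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, dict) and "agent_id" in parsed:
                agent = str(parsed["agent_id"])
        totals[agent] += credits
    return dict(totals)
```

Keeping an explicit `untagged` bucket is deliberate. Spend that no agent claims is exactly the noise the attribution gap hides, and it should shrink as instrumentation spreads.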

Stochastic variance breaks forecasting. Stanford Digital Economy Lab's research showed the cost of the same agentic task varying by up to 30x between runs, depending on model behavior. A variance band that wide does not fit inside standard FinOps controls. The standard response to cost variance is a cost alert. The problem is that alerts fire after spend has occurred. Token budgets prevent spend rather than report it, because they are enforced at the API call level and block the next call before it exceeds a threshold.

Source: Stanford Digital Economy Lab research on agentic AI cost variability. Identical task, identical model, 10 separate runs. Minimum: $0.11. Maximum: $3.20. Range: 29x. Standard FinOps variance buffers of 5-10% do not accommodate this distribution.

The cuts trap. When headcount is reduced, the instinct is that infrastructure costs fall proportionally. With AI agents, that assumption is false. Agents run independently of the engineers who deployed them. A 20% headcount reduction with no corresponding reduction in agent activity produces a smaller team governing the same infrastructure cost footprint. In some cases it worsens: fewer engineers available to catch runaway agent behavior means more unchecked query proliferation. Cutting engineers without also auditing and reducing agent activity is not a cost reduction.

Shadow AI amplifies all three. Individual engineers expensing Cursor subscriptions, running personal Claude accounts against production databases, spinning up ad-hoc LangChain pipelines that bypass the data platform team - a significant share of indirect costs comes from tools that never went through procurement. No usage attribution. No budget line. No kill switch when someone leaves.

Microsoft's experience is the clearest enterprise-scale example. The company provided Claude Code licenses to thousands of engineers across the Experiences and Devices group - Windows, Office 365, Outlook, Teams, Surface - starting in December 2025. By June 2026, the licenses were canceled. The stated reason was not that the tool was ineffective. It was that token-based billing at agentic workflow scale was unsustainable.


The Incident Catalogue

The incidents below are the early empirical record of what happens when agentic AI meets pay-as-you-go infrastructure without a governance model. The pattern across all of them: the cost was invisible until the invoice arrived.

| Incident | Cost | Root cause | Source |
|---|---|---|---|
| Uber: Claude Code deployed to 5,000 engineers, December 2025 | $3.4B annual budget consumed in 4 months | No token consumption governor; adoption incentivized by internal leaderboards | Fortune, May 2026 ✓ |
| Microsoft: Claude Code licenses canceled, Experiences & Devices group | $500-$2,000/engineer/month across thousands of licenses | Token-based billing unsustainable at agentic workflow scale | The Next Web, 2026 ✓ |
| Snowflake Cortex AI: single query against 1.18B records | $5,000 for one query | Unfiltered dataset passed to AI function; no pre-filter, no resource monitor | Seemore Data, 2026 ✓ |
| LangChain 4-agent pipeline: infinite loop, 11 days undetected † | $47,000 | No step limit; no cost anomaly detection; loop ran over a long weekend | DEV Community, 2026 † |
| N+1 query pattern in AI-generated code † | AWS RDS rose from $8K/month to $35K/month | 847 redundant queries per page load; no query pattern review in code review process | Medium, 2026 † |

† Developer-reported, not independently verified. Included as illustrative patterns consistent with documented anti-patterns, not as confirmed incidents.


The ROI Calculation Is Missing Half the Denominator

The case for AI coding tools is genuinely strong, and it should be stated clearly before complicating it.

Forrester's Total Economic Impact study on GitHub Copilot found 376% ROI with payback under 6 months. GitHub's own data shows 3.6 hours per week saved per developer, roughly $967 per month in recovered time at a $130K annual salary. For a 12-person startup that deploys AI to operate at the pace of a 30-person team, the productivity gain can dwarf the direct API cost. This is why companies are buying, and they are right to.
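The recovered-time figure can be reproduced with simple arithmetic. This sketch assumes 2,080 working hours a year and 4.3 weeks a month. Both conventions are assumptions, chosen because they reproduce the article's number from the 3.6 hours/week and 130K salary inputs.

```python
def monthly_recovered_value(
    hours_saved_per_week: float,
    annual_salary: float,
    hours_per_year: float = 2_080,  # assumed: 52 weeks x 40 hours
    weeks_per_month: float = 4.3,   # assumed monthly conversion
) -> float:
    """Value of recovered developer time per month, in the salary's currency."""
    hourly_rate = annual_salary / hours_per_year
    return hours_saved_per_week * weeks_per_month * hourly_rate


print(f"{monthly_recovered_value(3.6, 130_000):.2f}")  # 967.50
```

This is the numerator only. It prices the time saved and says nothing about what the same agents add to the infrastructure bill.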

The problem is not the numerator. It is the denominator.

Standard AI ROI calculations set the cost as: LLM API license plus direct compute. Against that, they set the benefit: developer hours saved at hourly rate. The payback period looks like 6 to 9 months. What the calculation excludes:

  • The Snowflake, AWS, and GCP infrastructure delta driven by agent query volume - which lands on a different invoice with no AI attribution
  • Engineering time spent identifying and fixing AI-generated N+1 patterns - which itself consumed API budget to generate in the first place
  • Agent loop incidents (the $47K example above ran for 11 days before anyone noticed)
  • The stochastic variance buffer that any honest runway or budget model must carry for AI-driven infrastructure
Standard AI ROI models count the complete benefit (productivity gain) against an incomplete cost (direct API only). The full model requires the infrastructure delta - which is currently unmeasured at most organizations. Figures illustrative; per-engineer/month at scale.

The result: you are comparing a complete benefit against an incomplete cost. With the full denominator, the payback timeline is unknown - not because indirect costs necessarily exceed the benefits, but because they have never been measured.
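A toy payback model makes the point. The benefit and the direct cost stay the same, and only the infrastructure delta changes. All inputs below are hypothetical placeholders, not figures from the sources.

```python
import math


def payback_months(
    upfront_cost: float,
    monthly_benefit: float,
    monthly_direct_cost: float,
    monthly_infra_delta: float = 0.0,
) -> float:
    """Months to recover the upfront cost; inf if the monthly net is not positive."""
    monthly_net = monthly_benefit - monthly_direct_cost - monthly_infra_delta
    if monthly_net <= 0:
        return math.inf
    return upfront_cost / monthly_net


# Hypothetical per-engineer figures: $5,000 rollout, $1,000/month benefit, $200/month API.
for infra_delta in (0.0, 400.0, 800.0):
    print(infra_delta, payback_months(5_000, 1_000, 200, infra_delta))
# 0.0 6.25
# 400.0 12.5
# 800.0 inf
```

The first line is the model most ROI cases run. The other two differ only in the term that goes unmeasured, and that term alone moves payback from six months to never.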

Adjacent to this is a forecasting problem. A $50 million infrastructure budget with a 5% variance buffer is standard practice. Add AI agents that introduce up to 30x run-to-run variance on a growing component of that budget, and normal variance controls can no longer manage it. The unpredictability is the problem, independent of the absolute numbers.


Governance That Actually Works - By Stage

The threshold at which this becomes material: once AI-initiated queries exceed 15% of your total infrastructure query volume, you have an unmodelled cost driver. At a startup that has deployed an AI feature to production customers, that threshold is typically crossed within six months.

One architectural note before the stages. Multi-engine query routing - using DuckDB for sub-100GB queries and only sending large joins to Snowflake - is trivially cheap to implement with a 12-person engineering team. It requires something close to a full data stack refactor at 200 people with agents embedded in every production workflow. Implementing routing before you need it is one of the few places in AI tooling where acting early is definitively the right call.

| Stage | Measure | Enforce | Architect |
|---|---|---|---|
| Seed / Pre-A (under 100 customers) | Track AI-initiated query % from day one. | No warehouse minimums until 60 days of AI query data exists. No minimum commitment contracts before understanding your query multiplier. | Instrument COGS per customer. Does gross margin hold if AI usage per customer doubles? |
| Series A (100-500 customers) | Cost per customer by usage tier. AI-attributed infrastructure separate from shared infra. | Token budgets per agent - not alerts, enforcement before spend. Max 50 steps per workflow. | Semantic caching: Redis data shows over 40% of queries are near-paraphrases, 86% LLM cost reduction achievable. Separate AI cost as a P&L line. |
| Series B+ / Enterprise | Board-level AI cost attribution. Disclosure readiness for material infrastructure variance. | Rolling anomaly detector: alert at 3x the 7-day average for any AI-attributed line item. | Multi-engine routing (DuckDB for under 100GB, Snowflake for large joins). Full Cortex observability queried daily. |
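The semantic caching entry for Series A can be sketched as a minimal in-memory cache. It is an illustration of the idea, not Redis's implementation. It assumes you supply an `embed` function that maps text to a vector, and it uses cosine similarity with a hypothetical 0.9 threshold. Entries are scanned linearly, where a production cache would use a vector index.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Return a stored LLM response when a new prompt is close enough to an old one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # caller-supplied embedding function (assumption)
        self.threshold = threshold  # hypothetical similarity cut-off
        self.entries: list[tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:  # linear scan; use a vector index at scale
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def get_or_call(self, prompt: str, call_llm: Callable[[str], str]) -> str:
        cached = self.lookup(prompt)
        if cached is not None:
            return cached            # cache hit: no tokens billed
        response = call_llm(prompt)  # cache miss: pay for the LLM call once
        self.entries.append((self.embed(prompt), response))
        return response
```

The threshold is the design lever. Set it too low and near-paraphrases with different intent get the wrong cached answer. Set it too high and the cache rarely hits.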

Universal - all stages. Token budgets beat cost alerts: alerts fire after spend; budgets prevent it. Filter data before passing to AI functions - never run an AI query on an unfiltered large dataset. Require AI ROI sign-off to include the full model: direct cost plus infrastructure delta.
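The rolling anomaly detector listed for Series B+ is small enough to sketch directly. This version uses hypothetical names. It flags a day whose AI-attributed cost reaches 3x the trailing 7-day average, and it only starts flagging once it has a full week of history.

```python
from collections import deque


class RollingAnomalyDetector:
    """Flag a day whose AI-attributed cost reaches `multiplier` x the trailing average."""

    def __init__(self, window_days: int = 7, multiplier: float = 3.0):
        self.history: deque[float] = deque(maxlen=window_days)
        self.multiplier = multiplier

    def observe(self, daily_cost: float) -> bool:
        is_anomaly = False
        if len(self.history) == self.history.maxlen:  # wait for a full window
            baseline = sum(self.history) / len(self.history)
            is_anomaly = baseline > 0 and daily_cost >= self.multiplier * baseline
        self.history.append(daily_cost)  # anomalous days also enter the baseline
        return is_anomaly


detector = RollingAnomalyDetector()
costs = [120, 110, 130, 125, 118, 122, 127, 480]  # hypothetical daily USD
print([detector.observe(c) for c in costs])
# [False, False, False, False, False, False, False, True]
```

Letting flagged days enter the baseline means the detector adapts to genuine growth. It also means a sustained runaway slowly raises its own baseline, so a flag should page a person rather than only write a log line. Because this is still an alert, it complements token budgets rather than replacing them.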

Two implementation patterns worth having in every codebase that runs agents:

PYTHON — Token budget enforcement
```python
import logging

logger = logging.getLogger(__name__)

COST_PER_TOKEN = 0.000015  # placeholder blended USD price per token; use your model's rates


class BudgetExceeded(RuntimeError):
    """Raised when a call would push today's spend past the hard cap."""


# WRONG: alert fires after you've already spent the money
def alert_if_over(monthly_spend: float) -> None:
    if monthly_spend > 100:
        logger.error("Over budget")  # stand-in for send_alert(); the damage is done


# RIGHT: budget enforcement blocks the next call before overspend
class TokenBudgetGuard:
    def __init__(self, daily_soft_usd: float = 50, daily_hard_usd: float = 100):
        self.soft = daily_soft_usd
        self.hard = daily_hard_usd

    def check_before_call(self, today_spend: float, tokens_requested: int) -> None:
        estimated = tokens_requested * COST_PER_TOKEN
        projected = today_spend + estimated
        if projected > self.hard:
            raise BudgetExceeded(
                f"Blocked: ${projected:.2f} would exceed hard cap ${self.hard}"
            )
        if projected > self.soft:
            logger.warning(f"Soft cap approaching: ${projected:.2f} / ${self.hard}")

# Attach to every agent call site - BEFORE the API call, not after.
```
PYTHON — Agent step limit guard
```python
# The $47K incident: 4 agents in an infinite loop, 11 days, no step limit.
# One pattern prevents it.
MAX_STEPS = 50  # sensible default - raise per workflow with documented justification


class StepLimitExceeded(RuntimeError):
    """Raised when an agent exceeds its configured step budget."""


class StepLimitedAgent:
    def __init__(self, max_steps: int = MAX_STEPS):
        self.max_steps = max_steps
        self.steps = 0

    def step(self, action):
        self.steps += 1
        if self.steps > self.max_steps:
            raise StepLimitExceeded(
                f"Agent stopped at step {self.steps}. "
                "Raise limit explicitly if this workflow requires more."
            )
        return self._execute(action)

    def _execute(self, action):
        # Subclasses implement the actual tool call or LLM request here.
        raise NotImplementedError

# Without this, every agent workflow carries an open-ended cost exposure.
```

The Greybeam multi-engine architecture referenced in the Headset case shows how query routing works in practice: a semantic layer sits above multiple execution engines, routes queries based on scan size and complexity, and presents a single SQL interface to the agent. The agent issues queries. The router decides whether they run on DuckDB or Snowflake.
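Here is a minimal sketch of the routing decision itself. It assumes the router can estimate scan size before execution, for example from table statistics or a query plan. The engine labels, the 100 GB threshold taken from the Headset case, and the rule of falling back to the warehouse when no estimate exists are all assumptions. This is not Greybeam's implementation.

```python
from dataclasses import dataclass
from typing import Optional

DUCKDB_MAX_SCAN_GB = 100.0  # threshold from the Headset case; tune to your own data


@dataclass(frozen=True)
class Route:
    engine: str  # "duckdb" or "snowflake" (hypothetical labels)
    reason: str


def route_query(
    estimated_scan_gb: Optional[float], threshold_gb: float = DUCKDB_MAX_SCAN_GB
) -> Route:
    """Send small scans to an in-process engine, large ones to the warehouse."""
    if estimated_scan_gb is None:
        # No estimate: fall back to the engine that can handle anything.
        return Route("snowflake", "no scan estimate available")
    if estimated_scan_gb < threshold_gb:
        return Route("duckdb", f"estimated scan {estimated_scan_gb:g} GB < {threshold_gb:g} GB")
    return Route("snowflake", f"estimated scan {estimated_scan_gb:g} GB >= {threshold_gb:g} GB")


for gb in (3.5, 250.0, None):
    print(gb, route_query(gb))
```

Defaulting unknown queries to the warehouse trades some savings for safety. A misrouted large join on an in-process engine fails or stalls, while a misrouted small query only costs money.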


If You're Building This Into a Product

The indirect cost problem is not a large-company problem. It is a unit economics problem, and it compounds faster at startups because of how SaaS pricing works.

Per-seat pricing assumes COGS scales with customers. AI features break that assumption. As customers use AI features more heavily, they generate more agent queries, and infrastructure COGS grows with usage intensity - not headcount. Gross margin compresses silently, because the cost increase is buried in a shared cloud bill with no attribution to the feature driving it.
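A two-line margin model shows the compression. Under per-seat pricing, revenue per customer is fixed while AI-driven COGS scales with usage. All figures are hypothetical.

```python
def gross_margin(
    seat_price: float,
    base_cogs: float,
    ai_cost_per_query: float,
    ai_queries: int,
) -> float:
    """Gross margin per seat when AI COGS scales with usage, not seats."""
    cogs = base_cogs + ai_cost_per_query * ai_queries
    return (seat_price - cogs) / seat_price


# Hypothetical seat: $100/month price, $15 fixed COGS, $0.05 per agent query.
print(f"{gross_margin(100, 15, 0.05, 200):.0%}")  # 75%
print(f"{gross_margin(100, 15, 0.05, 400):.0%}")  # 65%
```

The customer and the seat price stay the same, yet doubling AI usage takes ten points off gross margin, and nothing in the seat count signals it.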

Three questions to answer before signing any cloud infrastructure contract with minimum commitments:

  1. What percentage of database queries are AI-initiated today - and what will that number be at 2x current customer count?
  2. Is the pricing model COGS-safe if AI query volume per customer doubles?
  3. What is the switching cost if the primary data warehouse turns out to be the wrong architecture for agent-scale query volume?

The answers do not need to exist at seed stage. The measurements do - because without the data, the answers cannot be produced when a Series A investor asks for them in diligence.

SQL — AI query share tracking (run before warehouse commitments)
```sql
-- Run this before signing any data warehouse minimum commitment.
-- If you don't have this data yet, instrument it before you sign.
-- Assumes a query log table with an initiating_service column
-- (for example, populated from query tags); adjust names to your schema.
SELECT
    DATE_TRUNC('week', query_start_time) AS week,
    COUNT(*) AS query_count,
    ROUND(SUM(bytes_scanned) / 1e9, 2) AS gb_scanned,
    ROUND(
        100.0 * COUNT_IF(initiating_service = 'ai_agent')
        / NULLIF(COUNT(*), 0), 1
    ) AS ai_query_pct
FROM query_history
WHERE query_start_time >= DATEADD('week', -8, CURRENT_DATE)
GROUP BY 1
ORDER BY week DESC;
-- If ai_query_pct exceeds 15% and is growing faster than customer count:
-- you have an unmodelled cost driver in your unit economics.
-- Do not sign a warehouse minimum until this trend is stable.
```
PYTHON — N+1 pattern: what AI generates vs what it should be
```python
import sqlite3

# Assumes db.row_factory = sqlite3.Row so rows can be converted with dict().


def _in_list(n: int) -> str:
    # sqlite3 binds one value per "?", so IN (...) needs one placeholder per id.
    return ", ".join("?" * n)


# What AI generates - the N+1 pattern
# This pattern turned one team's AWS RDS bill from $8K to $35K/month.
def get_authors_with_posts_n_plus_1(db: sqlite3.Connection, author_ids: list[int]) -> list[dict]:
    authors = [dict(r) for r in db.execute(
        f"SELECT * FROM authors WHERE id IN ({_in_list(len(author_ids))})", author_ids)]
    for author in authors:
        # One query per author, inside the loop.
        # 100 authors = 101 total queries.
        author["posts"] = [dict(r) for r in db.execute(
            "SELECT * FROM posts WHERE author_id = ?", (author["id"],))]
    return authors


# What it should be - always 2 queries, regardless of N
def get_authors_with_posts(db: sqlite3.Connection, author_ids: list[int]) -> list[dict]:
    authors = [dict(r) for r in db.execute(
        f"SELECT * FROM authors WHERE id IN ({_in_list(len(author_ids))})", author_ids)]
    ids = [a["id"] for a in authors]
    posts = db.execute(
        f"SELECT * FROM posts WHERE author_id IN ({_in_list(len(ids))})", ids)
    posts_by_author: dict[int, list[dict]] = {}
    for post in posts:
        posts_by_author.setdefault(post["author_id"], []).append(dict(post))
    for author in authors:
        author["posts"] = posts_by_author.get(author["id"], [])
    return authors
```

The Closing Argument

Uber disclosed AI spending on an earnings call. The moment AI-driven infrastructure variance becomes material at a public company, it stops being an engineering problem and becomes a disclosure question. Controllers and CFOs are now asking: what percentage of our infrastructure variance last quarter was attributable to AI agent activity? If they cannot answer, the next question is whether that unknown represents a material undisclosed risk.

No industry benchmark yet exists for normal AI-attributable infrastructure cost as a share of total infrastructure spend. A working heuristic: measure AI-initiated queries as a percentage of total query volume. If that share grows faster than headcount, there is an unmodelled cost driver in the business. At 20% AI query share, a governance model is required. At 40%, it belongs in the board pack.

This article is not a case against AI. The productivity gains are real. The infrastructure multiplier is also real. The companies that will win on AI ROI are not the ones that spend the least - they are the ones that can read the second bill. They find the 92% reductions. They price their products correctly. They do not get caught in contracts they cannot exit.

What percentage of your infrastructure variance last quarter was attributable to AI agent activity? If you cannot answer that, you do not have a cost problem yet. You have a measurement problem. And one leads to the other.


Sources

  • Fortune / Andrew Macdonald, Uber COO (May 2026) - Uber AI spending and Claude Code adoption data: fortune.com
  • The Next Web (2026) - Microsoft Claude Code license cancellation, Experiences and Devices group: thenextweb.com
  • Seemore Data (2026) - Snowflake Cortex AI hidden costs and single-query $5,000 charge: seemoredata.io
  • Seemore Data (2026) - Snowflake AI observability for Cortex agents: seemoredata.io
  • Stanford Digital Economy Lab (2026) - Agentic AI token consumption and cost variability research: digitaleconomy.stanford.edu
  • Kyle Cheung, Greybeam / Hugo Lu, DataOps Leadership Substack (May 2026) - Multi-query engine architecture and Headset 92% cost reduction case study: dataopsleadership.substack.com
  • Forrester Research - GitHub Copilot Total Economic Impact (376% ROI, payback under 6 months): github.com/features/copilot
  • Redis Blog (2026) - Prompt caching vs semantic caching: 86% LLM cost reduction: redis.io
