Most AI planning is built on a flawed assumption: that today's pricing is real. It isn't. What we're seeing right now is not a stable market for intelligence. It's a subsidized land grab. The companies that win won't be the ones optimizing for today's pricing. They'll be the ones already building for tomorrow's cost structure.

1. The Assumption Everyone's Operating On

Right now, the market for AI is sending a signal.

You can build AI-powered features for $0.001 per thousand tokens. You can subscribe to Claude, ChatGPT, or Gemini for $20 a month. You can deploy inference at scale without breaking your unit economics.

That signal looks like permission.

Permission to design systems around unlimited API access. Permission to call language models for every decision, every summarization, every classification in your product. Permission to treat inference like a free resource you simply tap whenever needed.

Companies are building their entire roadmaps around that signal.

But here's the problem: that signal is lying.

2. What's Actually Happening: A Subsidized Land Grab

This is not a stable market for intelligence.

It's a subsidized land grab.

The numbers make this obvious:

GPU supply is constrained. The H100 shortage didn't end; it just became accepted as baseline. Every major lab needs exponentially more compute just to train the next generation. Nvidia controls the supply. Lead times exist. Margins are real.

Inference costs are high. Serving a trillion-parameter model across billions of requests carries real computational expense. The margin economics on inference-as-a-service are thin without scale, and if you're not operating at Anthropic's, OpenAI's, or Google's scale, you're not getting there.

And major labs are burning cash to lock in developers. The "$20/month unlimited AI" model? That's not pricing based on cost. That's a subsidy. Anthropic, OpenAI, Google, Meta—they're all willing to lose money on individual API calls to win market share, lock in developer mindshare, and prevent competitors from establishing beachheads.

The goal is clear: make it so cheap and convenient that you design systems that only work with their API.

That's brilliant strategy.

But it's not economics. It's a land grab.

And land grabs end.

3. The Hidden Mistake: Designing for Today

Here's where companies are quietly going wrong.

They are designing systems, workflows, and ROI models around temporary pricing conditions.

I see this constantly:

A SaaS company launches an "AI-powered" feature. It calls an LLM for every user action: summarization, categorization, content generation, decision support. Cheap. Reliable. Fast to ship.

The feature ships. Adoption grows. Cost per request stays at $0.0001. Everything works.

But the company has now locked itself into a specific economic model: low-friction, high-frequency inference.

If inference costs 10x tomorrow, the unit economics shatter.

If inference costs 100x, the entire feature becomes a liability.

And the architecture can't adapt. You can't pull AI out of a system that was built assuming it was free. You've baked it into the user experience, the data model, the support structure, everything.
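
To make the fragility concrete, here's a back-of-the-envelope sketch. The per-call cost, call volume, and subscription price are purely illustrative assumptions, not real pricing:

```python
# Back-of-the-envelope unit economics for a high-frequency AI feature.
# All numbers below are illustrative assumptions, not real pricing.

COST_PER_CALL = 0.0001       # today's subsidized price per inference call
CALLS_PER_USER_MONTH = 3000  # every user action triggers a call
REVENUE_PER_USER = 20.00     # monthly subscription price

def monthly_margin(price_multiplier: float) -> float:
    """Gross margin per user if inference costs rise by the given factor."""
    inference_cost = COST_PER_CALL * price_multiplier * CALLS_PER_USER_MONTH
    return REVENUE_PER_USER - inference_cost

for multiplier in (1, 10, 100):
    print(f"{multiplier:>4}x inference cost -> ${monthly_margin(multiplier):8.2f}/user")
```

At these (hypothetical) numbers, a 10x price move erodes margin; a 100x move flips it negative. The feature goes from rounding error to liability without a single line of your code changing.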

As I wrote in "AI Unlocks Economics", the companies winning right now are the ones architecting for AI as a first-class cost lever. But most aren't. Most are just adding it where it's cheap.

That works until it doesn't.

4. What Real Costs Actually Look Like

Once the market stabilizes—and it will—AI pricing will converge toward actual input costs, not distribution subsidies.

That means:

  • Compute - GPU cycles (H100 prices, power, cooling, depreciation, replacement)
  • Memory bandwidth - RAM and VRAM constraints (the bottleneck nobody talks about)
  • Latency guarantees - SLA penalties, dedicated capacity, priority routing
  • Uptime and reliability overhead - redundancy, failover, monitoring, on-call costs
  • Model complexity per task - smaller models for simple tasks, larger for complex ones

Not subscriptions. Not "flat access." Not fantasy bundles where flat fees quietly subsidize billion-token monthly budgets.

Real infrastructure pricing. Real operational costs. Real trade-offs.

When that happens, the current "efficiency" of shipping AI everywhere becomes what it actually is: massively wasteful.

5. The Real Optimization Problem

And this is where most companies get it backwards.

They ask: "How do we use more AI?"

The companies that survive cost normalization will ask: "How do we design systems that minimize unnecessary inference while maximizing output quality?"

Those are completely different problems.

The first leads to: bloated inference pipelines, redundant API calls, wasteful reranking, over-engineered summarization, and deteriorating unit economics.

The second leads to: intelligent caching, hybrid architectures (AI for the 10% of decisions that matter, heuristics for the 90%), batch processing instead of real-time, smaller models for fast classification, larger models only for complex reasoning.
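
The second approach can be sketched in a few lines. Everything here is hypothetical (`call_llm` stands in for whatever inference client you use, and the keyword rules are crude placeholders), but the shape is the point: heuristics absorb the bulk of traffic, and only genuinely ambiguous inputs pay for inference.

```python
# Minimal sketch of a hybrid router for support-ticket classification.
# Heuristics handle the common cases for free; only ambiguous inputs
# reach the model. `call_llm` is a hypothetical stand-in.

URGENT_KEYWORDS = {"outage", "down", "urgent", "data loss", "security"}

def call_llm(text: str) -> str:
    # Placeholder for a real inference call -- the expensive path.
    raise NotImplementedError

def classify_ticket(text: str) -> str:
    lowered = text.lower()
    # Heuristic fast path: deterministic, free, covers most traffic.
    if any(kw in lowered for kw in URGENT_KEYWORDS):
        return "urgent"
    if "invoice" in lowered or "billing" in lowered:
        return "billing"
    if len(lowered) < 20:
        return "needs_more_info"
    # Only the ambiguous remainder pays for inference.
    return call_llm(text)
```

The rules are deliberately naive; in practice you'd tune them against real traffic. The design choice that matters is that the expensive path is the fallback, not the default.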

As I explored in "The Hidden Cost of AI-Generated Code", this applies even to code generation: the cost isn't the API call. The cost is the technical debt of maintaining globally fragile systems built by optimizing locally for speed. That's a cost that appears later, silently, in maintenance burden.

The same principle applies everywhere AI touches your system.

Low-frequency, high-value inference = defensible.

High-frequency, low-value inference = fragile.

Companies designing for the first architecture now will have pricing power tomorrow.

Companies designing for the second will have a restructuring problem.

6. Who Actually Wins When Pricing Normalizes

Here's the uncomfortable truth:

The companies that win won't be the ones optimizing for today's pricing.

They'll be the ones already building for tomorrow's cost structure.

That means:

  • Treating inference as a constrained resource, not a commodity
  • Building hybrid systems that use AI where it compounds (complex reasoning, pattern recognition, content generation) and leave heuristics everywhere else
  • Investing in edge inference and smaller models to reduce API dependency
  • Designing for cacheability - fewer novel problems, more cached solutions
  • Engineering for interrogation - systems that can explain why they called an LLM, and prove it was worth it

As I wrote in "The 5 Files You Must Still Review", the standard that separates builders from generators is: can you defend every decision your system made? For AI, that standard is even more brutal. If you can't defend why you paid for that inference, you've wasted it.

Companies building these standards now—while inference is cheap—will have architectural advantage when inference is expensive.

The ones who don't? They'll face a choice:

  1. Restructure the entire system to reduce inference (slow, painful, risky)
  2. Keep the bloated system and accept lower margins (slow death)
  3. Kill the feature entirely (fast death)

None of those are good options.

The time to fix it is now.

7. What This Means for Your Team

If you're building AI-powered features today, ask yourself:

Will this architecture still make sense when inference costs 10x what it costs today?

If the answer is "no," you're not building for the future. You're building for a subsidy.

And subsidies always end.

Start now:

  1. Measure inference like a cost line item. Every API call, every token, every model selection should be tracked like infrastructure spend. Make it visible. Make it defendable.
  2. Build hybrid systems. AI for the decisions that matter. Rules and heuristics for everything else. Your margins will be better. Your latency will be faster. Your system will be simpler.
  3. Invest in smaller models. Haiku is often faster and cheaper than Opus and good enough for the task; constraints force clarity. The same principle applies to task-specific models. Spend engineering time now to reduce model size and complexity later.
  4. Design for cacheability. If you're solving the same problem twice, cache it. If you're calling an LLM for identical inputs, cache the output. If you're doing redundant inference, stop.
  5. Treat inference reduction like a feature. Make it a sprint goal. Make it a roadmap item. Make it a metric you track. Not because it's fashionable, but because it's going to be necessary.
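
Step 1 can start as a thin accounting wrapper. This is a sketch with hypothetical model names and illustrative per-million-token prices; the key design choice is that every call site must state a reason, which is what makes the spend defendable later.

```python
# Sketch of step 1: track every inference call as a cost line item.
# Model names and per-million-token prices are illustrative assumptions;
# substitute your provider's actual rates.

from collections import defaultdict
from dataclasses import dataclass, field

PRICE_PER_M_TOKENS = {"small-model": 0.25, "large-model": 15.00}

@dataclass
class InferenceLedger:
    calls: dict = field(default_factory=lambda: defaultdict(int))
    tokens: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, model: str, n_tokens: int, reason: str) -> None:
        # `reason` forces every call site to defend why it paid for inference.
        key = (model, reason)
        self.calls[key] += 1
        self.tokens[key] += n_tokens

    def monthly_cost(self) -> float:
        return sum(
            n / 1_000_000 * PRICE_PER_M_TOKENS[model]
            for (model, _), n in self.tokens.items()
        )

ledger = InferenceLedger()
ledger.record("small-model", 800, reason="ticket classification")
ledger.record("large-model", 4_000, reason="contract summarization")
```

Once this exists, "which features are spending our inference budget, and why?" becomes a query instead of an archaeology project.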

The companies that do this while inference is cheap will have pricing power, architectural clarity, and customer defensibility when the market normalizes.

The ones that don't will have a rewrite in their future.

Working through the challenges in this post? I help engineering leaders and CTOs navigate complex technical decisions and scale high-performing teams. Schedule a consultation →