Every AI startup hits the same wall eventually. The product is working, users are growing, and then the infrastructure bill arrives and nothing makes sense anymore. The question is not which model to use or which framework to build on. The question is where your AI actually lives, who owns it, and what happens to your margins as you scale.
There are three positions available to you. You can build on hosted inference APIs, paying OpenAI or Anthropic or one of the cheaper alternatives per token. You can rent GPU compute by the hour from neocloud providers like Lambda, CoreWeave, or Crusoe. Or you can buy hardware and operate it yourself. Build, rent, buy. Three positions, very different economics at different scales.
Most founders treat this as a one-time decision.
It is not.
It is a continuous optimisation problem, and the right answer changes as your traffic grows, as GPU prices shift, and as your model strategy matures.
H100 cloud rental rates have fallen from $8 an hour in early 2023 to around $1.80 to $3.50 an hour in mid-2026. API prices have fallen too, but unevenly: there is now a 640-times price gap between the cheapest viable LLM API and the most expensive frontier option. That spread is enormous, and most founders are not actively managing it.
The decision is driven by three variables: your utilization rate, your workload predictability, and your engineering capacity.
High sustained utilization, predictable traffic, and a team that can operate infrastructure - when all three are true, owning your compute makes economic sense. When any one is missing, you want flexibility.
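To make the utilization point concrete, here is a minimal Python sketch. It assumes a simple model in which an idle GPU still costs the full hourly rate, so the cost of one hour of useful work is the rental rate divided by utilization. The function names and the 60 percent utilization threshold are illustrative assumptions, not figures from this series; only the $1.80 and $3.50 hourly rates come from the text above.

```python
def effective_hourly_cost(rental_rate: float, utilization: float) -> float:
    """Cost per hour of useful work when the GPU is busy only part of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return rental_rate / utilization


def prefer_fixed_capacity(
    utilization: float,
    predictable_traffic: bool,
    can_operate_infra: bool,
    min_utilization: float = 0.6,  # hypothetical threshold, tune for your own costs
) -> bool:
    """All three conditions must hold; if any one is missing, stay flexible."""
    return (
        utilization >= min_utilization
        and predictable_traffic
        and can_operate_infra
    )


if __name__ == "__main__":
    # Rental rates from the text; the utilization levels are made-up scenarios.
    for rate in (1.80, 3.50):
        for utilization in (0.25, 0.50, 0.90):
            cost = effective_hourly_cost(rate, utilization)
            print(f"${rate:.2f}/hr at {utilization:.0%} busy -> ${cost:.2f} per useful hour")
```

The same arithmetic applies to owned hardware once you convert purchase, power, and staffing into an hourly figure: low utilization inflates the real cost of every useful hour, which is why flexibility wins when traffic is thin or spiky.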
Here is the number that changes how you think about this.
Eighty percent of AI GPU spend is now inference, not training.
That means your infrastructure choice is being made primarily for production workloads, not for training runs. And for regulated sectors like aerospace, transport, healthcare, and financial services, where your data goes is not a preference. It is a legal requirement.
This series runs eight episodes. We cover inference API economics, GPU rental markets, the on-premises case, open source versus proprietary models, AI FinOps, sovereign AI and compliance, edge inference, and the Nvidia compute wars story.
Listen to the full episode here, in the Substack app, or on Apple, Spotify, or YouTube.
I’ll add a companion cost calculator and spreadsheet at intelligentfounder.ai soon.












