
NVIDIA's Record Earnings: The AI Infrastructure Boom Is Real

By Marc Molas·August 17, 2023·9 min read

The numbers speak for themselves. When NVIDIA reported its Q1 FY2024 earnings on May 24, 2023, total revenue hit $7.19 billion, with data center revenue surging to $4.28 billion -- up 18% quarter-over-quarter and 14% year-over-year. The stock jumped 25% in a single after-hours session, adding roughly $200 billion in market cap overnight. That's not a blip. That's a tectonic shift.

And this was just the appetizer. NVIDIA's guidance for Q2 FY2024 projected revenue of approximately $11 billion, crushing analyst expectations of $7.2 billion. As of this writing, the Q2 results are imminent, and every indicator suggests they'll be even more staggering. As Reuters and CNBC have extensively covered, the AI chip boom has turned NVIDIA into one of the most valuable companies on the planet, with its market cap briefly crossing the $1 trillion threshold in May.

This isn't just a story for Wall Street. If you're leading an engineering team, especially one building anything that touches machine learning, this AI infrastructure boom directly affects your technical decisions, your costs, and your hiring.

What's driving the surge

The demand is coming from everywhere, all at once.

Hyperscalers are in an arms race. Microsoft, Google, Amazon, and Meta are all aggressively expanding their AI compute capacity. Microsoft's partnership with OpenAI alone is driving massive GPU procurement. Google is training Gemini. Meta is training Llama. Each of these efforts requires tens of thousands of A100 and H100 GPUs. The hyperscalers are buying everything NVIDIA can produce, and they're placing orders years in advance.

Enterprise AI adoption is accelerating. Every Fortune 500 company is now running AI initiatives -- not as research projects but as core business strategy. They need inference capacity for production workloads: recommendation engines, fraud detection, natural language processing, computer vision. This is steady, recurring demand, not a one-time purchase.

The LLM training race continues. Training a frontier large language model like GPT-4 is estimated to require thousands of GPUs running for months. Every new entrant in the LLM space -- Anthropic, Cohere, Mistral, and others -- needs massive compute to train competitive models. And the models keep getting larger.

China is stockpiling. Despite export restrictions on the most advanced chips, Chinese companies have been buying every NVIDIA GPU they can legally acquire, and the demand for the compliant-but-still-powerful alternatives (A800, H800) is enormous.

What this means for GPU costs and availability

For engineering teams, the practical impact is straightforward: GPUs are expensive and hard to get.

Cloud GPU instances have not gotten cheaper. Despite the normal trend of cloud compute costs decreasing over time, GPU instances have held steady or increased in price. AWS's A100 instance (p4d.24xlarge, which bundles eight A100 GPUs) still runs $32.77/hour on-demand. H100 instances (p5.48xlarge) are even more expensive. Spot availability is unpredictable -- you might get a good deal, or you might wait hours for capacity.
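To put that hourly rate in budget terms, here is a back-of-the-envelope sketch. It uses the $32.77/hour figure quoted above and the eight GPUs that a p4d.24xlarge ships with. The 730-hour month and the `utilization` parameter are assumptions added for illustration, not AWS billing rules.

```python
P4D_HOURLY_USD = 32.77   # on-demand rate quoted in the article
GPUS_PER_P4D = 8         # a p4d.24xlarge ships with eight A100 GPUs
HOURS_PER_MONTH = 730    # assumption: 8,760 hours per year / 12


def per_gpu_hourly(instance_hourly: float, gpus: int) -> float:
    """Effective hourly price of a single GPU inside a multi-GPU instance."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return instance_hourly / gpus


def monthly_cost(instance_hourly: float, utilization: float = 1.0) -> float:
    """Cost of running the instance for a given fraction of a month."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return instance_hourly * HOURS_PER_MONTH * utilization


print(f"Per-GPU hourly: ${per_gpu_hourly(P4D_HOURLY_USD, GPUS_PER_P4D):.2f}")
print(f"Always-on monthly: ${monthly_cost(P4D_HOURLY_USD):,.2f}")
print(f"At 25% utilization: ${monthly_cost(P4D_HOURLY_USD, 0.25):,.2f}")
```

Run always-on, a single instance lands just under $24,000 a month, which is why utilization is the first number to pin down before committing to on-demand GPU capacity.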

On-premise GPU procurement has long lead times. If you wanted to buy H100 GPUs directly, the wait time as of mid-2023 was reportedly 36-52 weeks. Dell, Supermicro, and other server vendors are backordered. This isn't a supply chain issue that resolves in a quarter, and a new architecture won't relieve it soon either: NVIDIA's next-gen Blackwell architecture won't ship until 2024.

Alternative GPU providers are emerging. Companies like CoreWeave, Lambda Labs, and Together AI are building GPU clouds specifically for ML workloads, often at prices 30-50% below the hyperscalers. These are worth evaluating, especially for training jobs that don't need AWS's full ecosystem.

The build vs. API decision just got more important

For startups building AI-powered products, the infrastructure boom makes the build vs. buy decision sharper than ever. Here's how I think about it:

Use API calls (OpenAI, Anthropic, etc.) when:

  • You're in the experimentation phase. You don't know yet if the AI feature will work or if customers want it. Spending $50-500/month on API calls to validate the concept is far smarter than provisioning GPU infrastructure.
  • Your inference volume is low to moderate. If you're making fewer than 100,000 API calls per month, the unit economics of API calls usually beat the cost of running your own infrastructure.
  • You need frontier model capabilities. If your use case requires GPT-4-class reasoning or Claude's analytical capabilities, you literally can't replicate that with your own models yet. The API is your only option.
  • Your team doesn't have ML infrastructure expertise. Running GPU inference in production -- handling scaling, failover, model versioning, monitoring -- is a real operational burden. If your team is four engineers building a SaaS product, this isn't where you should be spending your time.

Invest in your own GPU infrastructure when:

  • Inference costs are a significant line item. If you're spending $10,000+/month on API calls and the volume is predictable, running your own models (especially open-source alternatives like Llama 2) can reduce costs by 60-80%.
  • Latency is critical. API calls add network latency and are subject to the provider's queue times. If you need sub-100ms inference for a real-time application, self-hosted models on dedicated GPUs give you control.
  • Data privacy requirements prohibit external APIs. If your data can't leave your infrastructure for regulatory or contractual reasons, you need to run models locally.
  • You need fine-tuned models. If the generic API doesn't perform well enough for your domain and you need fine-tuned models on your own data, you'll need GPU infrastructure for both training and inference.

The hybrid approach (what I recommend for most startups):

  • Use APIs for prototyping and initial launch. Get the product to market fast.
  • Measure your actual inference costs and volumes. Don't optimize prematurely.
  • When API costs hit $5,000-10,000/month and are still growing, evaluate self-hosting. Run the numbers: GPU cloud costs (not on-premise, not yet) vs. API costs at projected volumes. Include the engineering time to set up and maintain the infrastructure.
  • Start with managed GPU clouds, not hyperscalers. CoreWeave, Lambda, or Replicate give you GPU access without the complexity of provisioning EC2 instances and managing CUDA drivers.
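The "run the numbers" step can be sketched in a few lines. This is a minimal model, not a pricing tool: the GPU rate, GPU count, engineering hours, engineering rate, and the 30% savings threshold are all hypothetical placeholders to replace with your own quotes and estimates.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average hours in a month


@dataclass
class SelfHostEstimate:
    gpu_hourly_usd: float       # hypothetical GPU cloud rate per GPU
    gpu_count: int              # GPUs kept running for inference
    eng_hours_per_month: float  # setup and maintenance time, amortized
    eng_hourly_usd: float       # loaded engineering cost per hour

    def monthly_total(self) -> float:
        """GPU rental plus the engineering time the article says to include."""
        gpu_cost = self.gpu_hourly_usd * self.gpu_count * HOURS_PER_MONTH
        eng_cost = self.eng_hours_per_month * self.eng_hourly_usd
        return gpu_cost + eng_cost


def projected_savings(api_monthly_usd: float, estimate: SelfHostEstimate) -> float:
    """Fraction of the API bill saved by self-hosting (negative means it costs more)."""
    if api_monthly_usd <= 0:
        raise ValueError("api_monthly_usd must be positive")
    return 1.0 - estimate.monthly_total() / api_monthly_usd


def worth_evaluating(api_monthly_usd: float, estimate: SelfHostEstimate,
                     min_savings: float = 0.30) -> bool:
    """True when projected savings clear the (hypothetical) threshold."""
    return projected_savings(api_monthly_usd, estimate) >= min_savings


# Hypothetical inputs: two GPUs at $2.00/hour, 20 engineering hours at $100/hour.
estimate = SelfHostEstimate(gpu_hourly_usd=2.00, gpu_count=2,
                            eng_hours_per_month=20, eng_hourly_usd=100.0)
for api_bill in (5_000, 10_000):
    print(api_bill, f"{projected_savings(api_bill, estimate):.0%}",
          worth_evaluating(api_bill, estimate))
```

With these placeholder inputs, self-hosting roughly breaks even against a $5,000 API bill and saves about half of a $10,000 one, which is why the evaluation only starts to pay off in the upper part of that range.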

Cloud provider pricing implications

The GPU shortage is rippling through cloud pricing in ways that affect all engineering teams, not just ML teams:

General compute hasn't gotten cheaper either. Normally, cloud providers reduce prices annually as hardware costs decrease. The AI boom is consuming so much of the hyperscalers' CapEx that the usual price reduction cycle has slowed. AWS, GCP, and Azure have all been investing heavily in GPU capacity, and that investment comes at the expense of price reductions on other instance types.

Reserved instance economics are shifting. The usual advice of "buy reserved instances for predictable workloads" still holds, but the discount spreads have narrowed for GPU instances. Providers know GPU capacity is scarce and aren't incentivized to offer deep discounts.

Multi-cloud leverage matters more. When one cloud's GPU capacity is exhausted, having the ability to burst to another is valuable. Teams that have abstracted their infrastructure enough to be cloud-portable have an advantage.
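One way to picture that portability is a thin fallback layer over per-provider adapters. The sketch below is illustrative only: `CapacityError`, the `Provisioner` signature, and the two stub adapters are hypothetical names invented here, not any cloud SDK's API. A real adapter would wrap the provider's own client.

```python
from typing import Callable, List, Sequence, Tuple


class CapacityError(Exception):
    """Raised by an adapter when the provider has no GPU capacity available."""


# Hypothetical adapter signature: takes a GPU count, returns a job handle.
Provisioner = Callable[[int], str]


def provision_with_fallback(
    providers: Sequence[Tuple[str, Provisioner]], gpus: int
) -> Tuple[str, str]:
    """Try providers in preference order; burst to the next when one is full."""
    errors: List[str] = []
    for name, provision in providers:
        try:
            return name, provision(gpus)
        except CapacityError as exc:
            errors.append(f"{name}: {exc}")
    raise CapacityError("; ".join(errors) or "no providers configured")


# Stub adapters standing in for real cloud clients.
def primary_cloud(gpus: int) -> str:
    raise CapacityError("GPU capacity exhausted")


def secondary_cloud(gpus: int) -> str:
    return f"job-{gpus}gpu"


provider, handle = provision_with_fallback(
    [("primary", primary_cloud), ("secondary", secondary_cloud)], gpus=4
)
print(provider, handle)
```

The fallback logic itself is the easy part. The advantage comes from keeping container images, data access, and job definitions portable enough that the second adapter can actually run the workload.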

Implications for startups building AI products

If you're a startup founder or CTO thinking about AI product development in this environment, here's my practical advice:

  1. Don't build GPU infrastructure until you've proven the product. The biggest waste I've seen is startups investing six figures in GPU infrastructure before validating that customers will pay for the AI-powered feature. Use APIs. They're more expensive per inference but far cheaper than building infrastructure for a product that doesn't find market fit.

  2. Budget for inference costs explicitly. AI inference isn't free, and it doesn't scale like traditional compute. If your product makes 10 LLM calls per user session, model the unit economics now. What does it cost to serve one customer? Does that scale?

  3. Hire engineers who understand the trade-offs, not just the models. The most valuable ML engineers right now aren't the ones who can fine-tune a model -- they're the ones who can evaluate whether you should fine-tune a model or use an API, estimate the infrastructure costs of each approach, and architect a system that lets you switch later.

  4. Watch the open-source model ecosystem closely. Llama 2, Mistral, and the broader open-source LLM movement are rapidly closing the gap with proprietary APIs. Models that required $100K in compute to train a year ago can now be fine-tuned for $1,000. This trend directly reduces your dependency on expensive API calls.

  5. Plan for cost optimization in 12-18 months, not now. NVIDIA's supply will eventually catch up. New GPU architectures will launch. Competition from AMD and custom silicon (Google TPUs, Amazon Inferentia) will increase options. The infrastructure landscape in 2024-2025 will look very different from today. Don't over-invest in today's constraints.
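The unit-economics question in point 2 (what does it cost to serve one customer?) can be modeled in a few lines. This is a minimal sketch: the 10 calls per session come from the example above, while the tokens per call, the price per 1,000 tokens, the sessions per month, and the subscription price are hypothetical numbers chosen only to show the arithmetic.

```python
def cost_per_session(calls_per_session: int, tokens_per_call: int,
                     usd_per_1k_tokens: float) -> float:
    """Inference cost of one user session, priced per 1,000 tokens."""
    return calls_per_session * (tokens_per_call / 1000) * usd_per_1k_tokens


def monthly_cost_per_customer(sessions_per_month: int, session_cost: float) -> float:
    """Inference cost of serving one customer for a month."""
    return sessions_per_month * session_cost


def gross_margin(price_per_month: float, inference_cost_per_month: float) -> float:
    """Share of the subscription price left after inference costs."""
    if price_per_month <= 0:
        raise ValueError("price_per_month must be positive")
    return (price_per_month - inference_cost_per_month) / price_per_month


# 10 calls per session is the article's example; the rest is hypothetical.
session = cost_per_session(calls_per_session=10, tokens_per_call=1500,
                           usd_per_1k_tokens=0.002)
customer = monthly_cost_per_customer(sessions_per_month=20, session_cost=session)
print(f"Per session: ${session:.3f}")
print(f"Per customer per month: ${customer:.2f}")
print(f"Margin on a $20/month plan: {gross_margin(20.0, customer):.1%}")
```

Re-running the model with a pricier model tier or heavier usage shows quickly whether the per-customer cost still scales, which is the question to answer before launch rather than after the first invoice.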

At Conectia, we're seeing increased demand from startups that need engineers who can navigate these infrastructure decisions -- not just write ML models but architect the systems around them. Our senior LATAM engineers include backend and infrastructure specialists who've built AI-powered products and understand the build-vs-buy trade-offs firsthand.

The AI infrastructure boom is real, it's reshaping the economics of building software products, and it's not going away. The question for engineering leaders isn't whether to engage with it -- it's how to do so without burning through your runway on GPU bills.


Building an AI-powered product and need engineers who understand infrastructure trade-offs, not just models? Talk to a CTO -- our senior LATAM engineers help you ship AI features without over-investing in infrastructure.