AI Infra

As companies shift from experimenting with AI to integrating it across their core products and operations, we believe AI infrastructure has become one of the most important and strategic layers of the enterprise technology stack. The AI model lifecycle generally follows four key stages:

  1. Data preparation

  2. Model development (including pre-training or fine-tuning)

  3. Inference (running the model in real time)

  4. Monitoring and evaluation

In today’s market, the foundation model layer, home to major players like OpenAI, Anthropic, Google, and Meta, has become increasingly crowded and capital-intensive, with clear category leaders emerging. While these companies will remain critical to the AI ecosystem, we believe the more compelling investment opportunity now lies in the infrastructure one layer up: the platforms enabling teams to build on, adapt, and run AI models without having to train them from scratch.

We’ve structured our focus around three core parts of the AI infrastructure stack that are rapidly growing, less saturated, and increasingly mission-critical to enterprise AI adoption:

  1. Pre-training Platforms: Tools like Together.ai and Lightning AI help organizations pre-train models on their own datasets or continue training from open weights, offering more flexibility, better cost control, and faster experimentation. These platforms also abstract away the complexities of managing large-scale compute environments, making it easier for engineering teams to work directly with models.

  2. Fine-tuning Platforms: As enterprises seek to tailor models to their proprietary data and use cases, fine-tuning has become a key step. Companies like Lamini, OctoML, and Baseten let businesses fine-tune models securely and at scale, with minimal engineering effort.

  3. Hosting and Inference Platforms: Once a model is ready, it needs to be served in production, often at high volume and low latency, and sometimes across multiple clouds. Inference is now the costliest and most performance-sensitive part of AI deployment, and tools like Modal, Fireworks.ai, Replicate, and Anyscale are emerging as critical providers. These platforms offer developer-first APIs, automated scaling, and model observability while giving teams greater control over performance and cost. A minimal sketch of calling such a hosted endpoint follows this list.
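
To make the hosting and inference layer concrete, here is a minimal sketch of querying a hosted model through an OpenAI-compatible endpoint, which several of these platforms (Together.ai and Fireworks.ai among them) expose. The model slug and environment variable are illustrative, not a recommendation of a specific provider or model.

```python
# Minimal sketch: querying a hosted inference endpoint via an
# OpenAI-compatible API. Together.ai exposes one at api.together.xyz;
# Fireworks.ai offers a similar interface.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # provider's endpoint
    api_key=os.environ["TOGETHER_API_KEY"],  # illustrative env var
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",  # illustrative model slug
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same few lines work against any provider that speaks the OpenAI wire format, which is part of why switching costs at this layer are lower than at the hyperscaler layer.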

Market Thesis

What I like

  • Massive Enterprise Adoption: AI is now used by 72% of companies, with half deploying it across multiple departments. Large enterprises are increasing AI budgets faster than overall IT spend, driven by tangible results in areas like supply chain and customer service. As a result, hosted AI infrastructure is becoming a foundational layer across the enterprise stack.

  • AI Infrastructure Drives Tangible ROI and Defensibility: Leading AI infrastructure platforms are delivering real business impact by helping companies analyze vast amounts of data, automate repetitive tasks, and improve decision-making. The result: 20–30% reductions in operating costs and up to 50% gains in process efficiency across areas like supply chain, forecasting, and customer service. At the same time, these platforms are building strong competitive moats by embedding into core enterprise workflows, capturing proprietary data and feedback loops, and integrating seamlessly with existing systems.

  • Sticky Platforms through Lock-In: As enterprises begin integrating AI infrastructure into core workflows, switching costs are rising, estimated at 60–70% once platforms are fully adopted. This growing dependency leads to long-term retention, pricing power for vendors, and deeper platform entrenchment over time as more teams and systems rely on the tools.

  • AI Infrastructure Is Becoming Essential: Most enterprises lack the resources or in-house expertise to build and manage the infrastructure required to support large-scale AI systems. As adoption accelerates, companies are increasingly relying on hosted AI platforms to deliver the performance, compliance, and reliability needed for real-world deployment. AI infrastructure is quickly becoming a foundational layer of the modern enterprise stack.

  • Regulatory Push Toward AI Development: The U.S. AI Action Plan marks a major policy shift to accelerate AI leadership by driving innovation, expanding AI infrastructure, and strengthening international standing. With over 90 policy actions, the plan streamlines permits for data centers, reduces regulatory barriers to AI development, and encourages federal investment in high-performance, neutral AI systems. These moves signal a strong national mandate for public and private sectors to prioritize AI, making scalable infrastructure investment an urgent industry priority.

What keeps me up at night

  • Hard to Pick Winners: The AI infrastructure market is evolving rapidly, with new entrants emerging constantly. Long-term winners tend to have strong IP, technical depth, and clear customer traction. Without these advantages, early leads can fade quickly as competition intensifies and cloud incumbents expand.

    • Mitigant: Investing in companies with strong technical expertise, proprietary infrastructure, and clear customer traction can reduce the risk of being displaced by competitors. Backing startups that build essential infrastructure, such as foundation model platforms or vector-native storage, also helps protect against commoditization and supports long-term success.

  • Market Concentration and Lock-In Risk: AI infrastructure is dominated by AWS, Google, Microsoft, and Nvidia, who benefit from deep R&D, exclusive chip access, and tightly integrated ecosystems. Their control creates high switching costs and exposes startups to pricing shifts, access limits, and direct competition.

    • Mitigant: Emerging startups are challenging cloud lock-in by offering flexible, cloud-agnostic infrastructure. Platforms like CoreWeave, Together.ai, Modal, Fireworks.ai, and Replicate provide open, developer-first alternatives that give enterprises more control, better pricing, and cross-cloud compatibility. As demand grows for interoperability and independence, these platforms are gaining traction and reducing reliance on hyperscalers.

  • Capital Intensive to Build and Scale: Building and operating AI infrastructure is expensive. Companies must manage large volumes of compute, energy, and storage, especially for training and inference. These demands place a significant burden on unit economics, particularly for earlier-stage platforms that must compete on both performance and cost. Many AI startups are spending up to 50% of their revenue on compute costs compared to just 18% for typical SaaS companies.

    • Mitigant: The U.S. AI Action Plan is helping ease the high cost of building AI infrastructure by boosting government support for data centers, chip factories, and network buildouts. It speeds up the process by streamlining permits, opening access to government-owned land, and offering federal loans, grants, tax breaks, and guaranteed purchase agreements for approved projects.

  • Talent Shortage Blocking Deployment: AI has become the most in-demand skill in tech, and there is a global shortage of experienced AI researchers, infrastructure engineers, and DevOps talent, making it difficult for smaller firms to compete or even deliver on contract wins.

    • Mitigant: Prioritize companies that abstract away infrastructure complexity and reduce the need for deep in-house AI or DevOps talent. Startups like Modal and Replicate offer developer-friendly, serverless platforms and simple APIs that allow teams to deploy models without managing underlying infrastructure. These products scale adoption by enabling lean teams to ship production-ready AI with minimal engineering overhead.

Market Overview

Global AI Spending = Worldwide IT Spending (Software + Data Center Systems) × % of AI Spend

  • 2025 Global IT Spending for Data Center Systems: $474.8B

  • 2025 Global IT Spending for Software: $1,232B

  • Total ≈ $1,707B

  • $1.707T × 5.6% (AI share of spend) ≈ $95.6B (see the sketch below)
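
As a quick sanity check, here is the same top-down arithmetic in a few lines of Python; all inputs are the memo's own figures.

```python
# Top-down sizing: global AI spend = (software + data center systems) x AI share
data_center_systems = 474.8  # 2025 global IT spend on data center systems, $B
software = 1232.0            # 2025 global IT spend on software, $B
ai_share = 0.056             # assumed share of IT spend going to AI

total_it = data_center_systems + software  # 1,706.8 -> ~$1,707B
ai_spend = total_it * ai_share             # ~$95.6B
print(f"Total relevant IT spend: ${total_it:,.1f}B")
print(f"Implied global AI spend: ${ai_spend:,.1f}B")
```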

Incumbents

The incumbents (AWS, Microsoft, Google, Oracle, and NVIDIA) are rapidly consolidating power across the AI infrastructure stack by combining control of chips, cloud platforms, and foundation models. Their aggressive capital deployment into custom hardware, global data centers, and strategic partnerships ensures they capture the bulk of enterprise and government demand. This dominance is reflected in their scale, with billions spent annually on AI infrastructure buildouts, expanding ecosystems, and compliance capabilities. For startups, the incumbents’ vertical integration creates both a barrier to entry and an opportunity to differentiate through flexibility, cost efficiency, and cloud-agnostic alternatives.

Market Trends

Headwinds

  • Compute Scarcity and Rising Costs: Microsoft, Google, and Meta have bought over 60% of Nvidia’s GPUs for 2024 and 2025, leading to wait times as long as 50 weeks and making it hard for smaller companies to get access. Startups are now spending up to 50% of their budgets on GPUs and cloud services. To lower costs, big tech companies are building their own chips, securing power deals, and opening new factories. Shortages of key components like high-bandwidth memory are making infrastructure even more expensive, and overall compute spending is expected to nearly double by 2025.

  • Vendor Lock-In and API Fragmentation: Enterprises face high switching costs because they rely on proprietary APIs. Changing providers means rewriting code, retraining models, and paying steep fees. Many companies use over 100 different APIs with no standard format, which limits customization, slows innovation, and is drawing increased regulatory scrutiny.

  • Race to the Bottom: As hyperscalers escalate spending, a race to the bottom has begun. AI infrastructure providers face rising capital expenses but falling prices, pushing margins down to 50–60% or lower. Unlike SaaS, inference has ongoing compute costs, and some providers operate at breakeven or negative margins to grow.

  • Lack of Standardized AI Benchmarks: Without clear frameworks to measure output quality, bias, robustness, or safety, enterprises and regulators struggle to evaluate model risk, especially in sensitive sectors like healthcare or finance. This uncertainty creates legal, compliance, and reputational concerns, slowing adoption, increasing costs, and limiting trust across the ecosystem.

Tailwinds

  • Enterprise AI Adoption Acceleration: LLM adoption rose from 55% to 78% year over year, with enterprise spending surging to $13.8 billion. As companies move from testing to operational use, infrastructure becomes critical. Today, 75% of enterprises fine-tune models with internal data, a figure expected to reach 90% by 2030. This is increasing the need for ongoing compute power, machine learning engineers, and GPU-heavy workloads.

  • Inference Workloads Dominate AI Spending: Inference now makes up most of AI infrastructure spending, reaching $97 billion in 2024 and expected to surpass $250 billion by 2030. As token usage continues to grow, demand is rising for always-on, low-latency systems that can deliver fast and reliable results.

  • Sovereign AI Infrastructure Surge: Nations and companies are investing billions in sovereign AI infrastructure as 140+ countries now require data to be stored locally. The US administration has issued the 2025 federal AI Action Plan to speed up domestic AI development by making it easier to build data centers, supporting US-made AI technology, expanding chip and energy production, and promoting open, secure, and compliant AI. These efforts are driving strong demand for local data centers, national cloud platforms, and security-focused AI solutions, creating major growth opportunities for providers that can deliver regulated and sovereign infrastructure.

Model Value Chain

The Model Development and Hosting sector covers the full process of creating, training, and deploying AI models for real-world use. It is organized into three parts: fully trained models (the model providers), training, and hosting and inference. The lifecycle begins with pretraining, where models learn general capabilities from large datasets. These models are then fine-tuned on more specific data to perform well on particular tasks. Once trained, they are deployed through inference platforms that deliver fast, scalable access through APIs.

I. Fully Trained Models

AI model providers fall into two categories, open source and closed source, each with trade-offs across cost, flexibility, and ease of deployment (a sketch contrasting the two integration paths follows this list):

  • Open-Source Model Providers:
    Publicly shared models allow full control and self-hosting, with lower long-term costs but higher setup effort.
    → Examples: Llama 2, Falcon, RedPajama-INCITE

  • Closed-Source Model Providers:
    Proprietary models offered through APIs, with easy setup, higher long-term costs, and limited flexibility.
    → Examples: Anthropic, Cohere, AI21 Labs, OpenAI, DeepMind
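
The trade-off is easiest to see side by side. The sketch below contrasts self-hosting an open-weights model with calling a closed model behind a vendor API; the model names are illustrative, and both snippets assume the relevant accounts and hardware are in place.

```python
# Path 1: open source. Pull the weights and self-host.
# Full control and no per-token fees, but you own the infrastructure.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",  # open-weights model (illustrative)
)
print(generate("Explain vector databases in one sentence.", max_new_tokens=60))

# Path 2: closed source. Call a proprietary model through a vendor API.
# Minutes to set up, usage-based pricing, no control over the weights.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative proprietary model
    messages=[{"role": "user", "content": "Explain vector databases in one sentence."}],
)
print(reply.choices[0].message.content)
```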

II. Training

Support the model training lifecycle from foundation model pretraining to domain-specific fine-tuning across use cases and infrastructure types:

  • Pre-train
    Provide infrastructure or tools for training large-scale foundation models from scratch.
    → Examples: Together.ai, MosaicML, Lightning AI

  • Fine-tune
    Enable model adaptation through techniques like instruction tuning and parameter-efficient fine-tuning such as LoRA (a minimal LoRA sketch follows the examples below).
    → Examples: Together.ai, MosaicML, Hugging Face, OctoML, Stochastic, Lamini
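
For a sense of what these platforms automate, here is a minimal LoRA sketch using the open-source Hugging Face PEFT library. The base model and hyperparameters are illustrative; managed fine-tuning services wrap roughly this workflow behind an API.

```python
# Parameter-efficient fine-tuning with LoRA via Hugging Face PEFT.
# Only small adapter matrices are trained; the base weights stay frozen.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Gated on the Hub; requires accepting Meta's license first.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
# ...then train with a standard Trainer loop on domain-specific data...
```

Because only the adapters are trained, a 7B-parameter model can often be fine-tuned on a single GPU, which is the cost lever these platforms are selling.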

III. Hosting & Inference

Provide the infrastructure to deploy and run AI models, making them accessible and scalable through APIs and serverless tools:

  • Model Hosting Platforms
    Let developers run, scale, and serve AI models (often open-source) via APIs or simple deployment flows.
    → Examples: Replicate, Fireworks.ai, Hugging Face, Together.ai, Banana

  • Inference Optimization & Infra Orchestration
    Make model use faster, cheaper, and easier to run by improving performance and simplifying setup.
    → Examples: OctoML, Modular, MosaicML

  • Serverless / ML Infrastructure Platforms
    Run ML workloads with minimal setup and automatic scaling; no server management needed (see the sketch after this list).
    → Examples: Modal, Stochastic
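
The serverless pattern is easiest to see in code. Below is a sketch based on Modal's Python SDK (API details may differ across versions; the model and GPU type are illustrative): declare a function, and the platform provisions a GPU container on demand and scales it to zero when idle.

```python
# Serverless GPU inference sketch using Modal's Python SDK.
import modal

app = modal.App("inference-demo")
image = modal.Image.debian_slim().pip_install("transformers", "torch")

@app.function(image=image, gpu="A10G")  # GPU attached only while running
def generate(prompt: str) -> str:
    from transformers import pipeline  # imported inside the container
    pipe = pipeline("text-generation", model="distilgpt2")  # tiny demo model
    return pipe(prompt, max_new_tokens=40)[0]["generated_text"]

@app.local_entrypoint()
def main():
    # Executes remotely in Modal's cloud; run with: modal run this_file.py
    print(generate.remote("Serverless GPUs mean"))
```

No Dockerfiles, autoscalers, or idle instances: the developer pays only for the seconds the function runs, which is the core pitch of this category.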
