Why the Future of AI Belongs to Smaller Models: Affordable, Efficient, and Smarter

The AI landscape is evolving.

For years, the biggest headlines have focused on massive leaps: GPT-3, GPT-4, GPT-5, with ever more parameters, compute power, and billion-dollar training runs. OpenAI, Microsoft, SoftBank, and Google DeepMind are pouring staggering amounts into the next generation of “mega-models,” banking on scale as the key to progress.

But beneath that headline race, something subtler — and possibly more transformative — is emerging: the rise of smaller, more efficient AI models.

A recent Medium piece by S. Thomason argues that the next phase of AI growth won’t necessarily be defined by who has the largest supercomputer but by who can make AI smarter, leaner, and more accessible.

This post explores why smaller models matter, the risks big players like OpenAI, Microsoft, and SoftBank are taking by chasing size, and why a shift toward nimble, local, affordable AI could reshape not only the tech world but also education, creativity, and individual empowerment.


The Mega-Model Arms Race

Let’s start with what’s happening at the top.

We are living through an unprecedented AI infrastructure buildout:

  • Microsoft’s $10 billion+ investment into OpenAI (Reuters).
  • Rumors of the $100 billion Stargate project — a supercomputer aimed at enabling GPT-6 and beyond (The Information).
  • Google’s ambitious Gemini models.
  • SoftBank’s massive bets on AI hardware through its control of Arm.

The logic here is simple:
✅ Bigger models = better performance.
✅ More compute = more capability.
✅ Scale = moat.

But this approach comes with enormous risks:

  • Skyrocketing costs: Each new generation of large language models (LLMs) costs exponentially more to train and run.
  • Energy consumption: Training a single mega-model can consume as much energy as several thousand homes for a year (MIT Technology Review).
  • Diminishing returns: Bigger doesn’t always mean smarter. Recent research suggests that, past a certain point, larger models yield only marginal improvements for enormous cost.
  • Centralization of power: Only a handful of companies can even afford to play in this space, raising concerns about monopolies, data control, and governance.

And perhaps most critically:

  • Real-world use cases often don’t need massive general-purpose models.

The Case for Small and Efficient AI

This is where the story flips.

While mega-companies build gigantic models, researchers and developers are showing that small, purpose-built models can:
✅ Solve real-world problems faster
✅ Run locally, without cloud dependencies
✅ Use dramatically fewer resources
✅ Be fine-tuned for specific industries, languages, or tasks
✅ Empower smaller players, from startups to hobbyists

Examples include:

  • Phi-3: A Microsoft research model that punches far above its size class.
  • Gemma: Google’s lightweight open-weight model designed for edge deployment.
  • Mistral 7B: An impressively capable small open-weight model that has gained popularity in the open-source community.
  • Llama 3: Meta's contribution to efficient AI at scale.

These models are often “good enough” for tasks like:

  • Document summarization
  • Code assistance
  • Chatbots
  • Local language processing
  • Education tools

They don’t need cloud supercomputers. They can run on laptops, smartphones, or affordable servers — and they put AI capabilities directly into the hands of users.
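
As a rough sketch of what "run locally" looks like in practice, here is a minimal document-summarization example using the `ollama` Python package (the same Ollama tooling mentioned in the final section). The model name and file path are illustrative placeholders, and the snippet assumes Ollama is installed with a small model already pulled (for example, `ollama pull mistral`).

```python
# Minimal local summarization sketch. Assumes the Ollama daemon is running
# and a small model (here "mistral", as an illustrative choice) has been pulled.
# Install the client with: pip install ollama
import ollama

def summarize(path: str, model: str = "mistral") -> str:
    """Summarize a plain-text document with a small model running locally."""
    with open(path, encoding="utf-8") as f:
        text = f.read()

    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": "You are a concise summarizer."},
            {"role": "user", "content": f"Summarize this in 3 bullet points:\n\n{text}"},
        ],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    # "notes.txt" is a placeholder document path
    print(summarize("notes.txt"))
```

The point isn't the specific library; it's that the whole workflow (model, data, and output) stays on the user's own machine, with no cloud dependency in the loop.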


Stargate, SoftBank, and the Risks of Overreach

The push toward mega-models isn’t just a technical bet — it’s a financial gamble.

Take OpenAI’s rumored Stargate project. If the reports are accurate, it would represent one of the largest infrastructure bets in the history of computing: over $100 billion to build the next generation of training clusters (The Information).

Similarly, SoftBank’s investments in Arm and other hardware ventures aim to capture the hardware layer of the AI boom. But if the market shifts toward more efficient, less resource-intensive models, these bets could turn sour.

There’s a historical pattern here:

  • Mainframes were once seen as the future — until personal computers reshaped the market.
  • Centralized telecoms once dominated — until mobile phones decentralized communication.
  • Cloud computing reshaped software — but now edge computing is rising.

The AI world may be heading toward its own decentralization moment, and mega-scale bets could turn into cautionary tales.


Microsoft’s Quiet Rebalancing

Even among the giants, there are signs of recalibration.

While Microsoft is heavily invested in OpenAI, recent reports suggest it is also looking to reduce costs and diversify its AI portfolio. Massive GPU spending has raised internal concerns, and Microsoft has begun embedding smaller models directly into products like Windows, Office, and Teams (The Verge).

This shift reflects an important strategic insight:

  • Not every user needs GPT-6.
  • Not every task needs a cloud supercomputer.
  • Sometimes, embedding a small, efficient model locally provides a better, faster, cheaper experience.

If Microsoft begins pulling back from all-out mega-scale investments, it could signal the start of a broader rebalancing across the industry.


Why This Matters for Education

For schools, universities, and learners, the rise of smaller models is a game-changer.

Here’s why:

  • Affordability: Schools don’t need mega-cloud budgets to deploy useful AI tools.
  • Privacy: Local models reduce reliance on third-party servers, protecting student data.
  • Customisation: Teachers and institutions can fine-tune models to suit local needs — language, curriculum, culture — without waiting for big providers.
  • Skill-building: Students can experiment directly, gaining hands-on understanding of how AI works, rather than passively consuming outputs.

Imagine classrooms where:

  • Students deploy their own chatbots.
  • Teachers build lesson planners powered by fine-tuned local models.
  • Schools run AI systems on their own hardware, tailoring them to their community’s needs.

This isn’t science fiction. It’s already starting to happen — and small models are leading the way.
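
To make the classroom chatbot idea above concrete, here is a minimal terminal tutor-bot sketch, again using the `ollama` Python package against a small local model. The model name, system prompt, and tutoring framing are illustrative assumptions rather than a prescribed setup.

```python
# Minimal local classroom chatbot sketch. Assumes Ollama is running locally
# and a small model such as "gemma" has been pulled (e.g. `ollama pull gemma`).
# Install the client with: pip install ollama
import ollama

MODEL = "gemma"  # illustrative choice; any small local model works
history = [
    {"role": "system", "content": "You are a patient tutor for secondary-school students."}
]

while True:
    question = input("Student> ").strip()
    if question.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": question})
    reply = ollama.chat(model=MODEL, messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    print(f"Tutor> {answer}\n")
```

Keeping the running `history` list means follow-up questions have context, and because everything runs on school hardware, no student data needs to leave the building.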


A Warning About Overhype

It’s worth noting that the hype cycle is dangerous on both ends.

While big AI projects like Stargate risk overreach, there’s also a temptation to oversell small models. Not every problem can (or should) be solved locally. There will always be use cases — like cutting-edge research or large-scale language understanding — where big models matter.

But balance is key. The future isn’t about one winner; it’s about an ecosystem where small and large models coexist, each serving the needs they’re best suited for.


The Bigger Picture: Decentralization and Agency

Zooming out, this shift toward smaller models is part of a broader pattern:

  • Decentralization of computing.
  • Increased user agency.
  • Lower barriers to entry.

Whether it’s in creativity, education, research, or entrepreneurship, the rise of efficient AI models signals a world where more people can participate, experiment, and build.

That’s not just good for technology — it’s good for society.


Final Thought

As someone who has hands-on experience running local models like Gemma and Mistral through Ollama on my home server (Blackbox), I’ve already seen the empowering potential of small models.

Without relying on expensive cloud subscriptions or enormous datasets, I’ve been able to explore, build, and experiment on my own terms. It’s made me think differently about what “AI access” really means — and why decentralization matters.

While I’m realistic about the limits (no, I’m not running a GPT-4 competitor in my cupboard), I’m convinced that local-first, efficient AI opens the door for hobbyists, educators, researchers, and small businesses to innovate without needing Silicon Valley’s blessing.

The mega-model race will grab the headlines.
But the real revolution might be happening elsewhere:
In the classrooms, the home labs, the startup garages — wherever small, smart, efficient AI models empower people to do things they couldn’t do before.

In the end, it’s not about how big your model is.
It’s about what you can do with it.