technology · business · May 30, 2026

Tokenmaxxing: How Amazon's AI Leaderboard Became an Expensive Joke

📰 Reading Passage

In late 2025, Amazon shut down an internal leaderboard called Kirorank that had been quietly tracking how much each of its employees used the company's in-house AI coding platform, Kiro. The dashboard was supposed to celebrate enthusiastic adopters of artificial intelligence. Instead, it became an expensive cautionary tale.

Amazon, a roughly $2.9 trillion company, has pushed its engineers hard toward AI. Internal targets require more than 80% of developers to use AI tools each week, and the company is expected to spend around $200bn on capital projects this year, the vast majority of it on AI infrastructure and data centres. In that environment, Kirorank looked like a natural extension of the strategy: rank engineers by their AI activity, post the scoreboard, and watch adoption climb.

It did climb — just not in a useful way. According to people familiar with the matter, workers began pointing autonomous AI 'agents' (bots that can take multi-step actions on a user's behalf) at unnecessary tasks, running repeated, low-value calls simply to inflate their consumption of 'tokens,' the units of text that AI models process and that cloud providers bill for. Inside Amazon, the practice picked up a name: tokenmaxxing. The result was a leaderboard full of activity that looked productive but wasn't, and a noticeably larger compute bill for Amazon itself. Dave Treadwell, an Amazon senior vice-president, told staff the leaderboard had been built with 'good intentions' but was being deprecated, and asked employees, plainly, not to use AI just for the sake of using AI.

Here's the catch that makes this more than an in-house embarrassment. The economics of corporate AI have shifted. Major model providers — Anthropic, whose systems Amazon uses heavily, among them — have moved away from flat monthly subscriptions toward consumption-based pricing, in which every token processed has a price. Under the old flat-fee world, wasted tokens were essentially free. Under the new one, every gamed leaderboard score is a small but real transfer of money from Amazon to its AI suppliers. Multiply that across a workforce of hundreds of thousands of engineers and the numbers stop being small.
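The arithmetic behind that shift can be sketched in a few lines. This is a minimal illustration, not Amazon's or any provider's actual pricing: the flat fee, the per-token price, the head count and the token volumes are all hypothetical figures chosen to keep the numbers easy to follow.

```python
# Hypothetical figures, chosen only to make the arithmetic easy to follow.
FLAT_FEE_PER_SEAT_USD = 100.0        # assumed flat monthly subscription
PRICE_PER_MILLION_TOKENS_USD = 10.0  # assumed blended per-token price


def flat_monthly_bill(engineers: int) -> float:
    """Flat-fee billing: cost depends on head count only, not on usage."""
    return engineers * FLAT_FEE_PER_SEAT_USD


def consumption_monthly_bill(engineers: int, tokens_per_engineer: int) -> float:
    """Consumption billing: every token processed is charged."""
    total_tokens = engineers * tokens_per_engineer
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


if __name__ == "__main__":
    engineers = 1_000
    useful = 5_000_000     # assumed useful tokens per engineer per month
    padded = 50_000_000    # assumed usage once agents are run on busywork

    print(flat_monthly_bill(engineers))                 # unaffected by usage
    print(consumption_monthly_bill(engineers, useful))
    print(consumption_monthly_bill(engineers, padded))
```

With these made-up inputs, a tenfold jump in token use leaves the flat bill unchanged but multiplies the consumption bill by ten, which is why gamed usage only became expensive once pricing moved to per-token billing.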

Amazon is not alone. The Financial Times has reported that Meta employees have engaged in similar gaming on their own internal tables, suggesting tokenmaxxing is less an Amazon-specific glitch than a Big Tech pattern. The underlying dynamic is familiar to anyone who has studied incentives: when a measure becomes a target, people optimise for the target, not for the thing the measure was meant to track. A score meant to capture 'how seriously do you take AI?' instead captured 'how willing are you to waste compute?' — and the two answers turned out to be different.

The industry's response is already taking shape. Amazon has begun emphasising a metric it calls 'normalised deployments,' which tries to measure whether AI-assisted code actually shipped and produced value, rather than how much was generated. Expect more of that kind of pivot from rivals as investors start asking harder questions about whether the hundreds of billions flowing into AI data centres are producing real productivity gains — or just very expensive activity charts.

Source: https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6

📖 Explanation

Amazon built a scoreboard to celebrate employees who used its AI tools the most. Workers promptly figured out that the fastest way to win was to waste the company's money.

📖 What's Going On?

Amazon has quietly killed an internal leaderboard called Kirorank that ranked employees by how much they used the company's in-house AI coding tool. The tracker lived inside Amazon's Kiro developer platform, and its scores were supposed to celebrate engineers who were eagerly adopting AI in their workflows.

Instead, workers started gaming the system. Some pointed autonomous AI 'agents' at pointless busywork just to rack up activity, a practice Amazon insiders nicknamed 'tokenmaxxing.' Senior vice-president Dave Treadwell told staff the leaderboard had been built with 'good intentions' but was driving up the company's compute bill, so it was being deprecated. Meta engineers, the Financial Times reported, have been playing the same game on their own internal dashboards.

🎯 How To Think About It

This is a textbook case of Goodhart's Law: the moment you turn a measurement into a target, people stop optimising for the underlying thing you actually wanted and start optimising for the number itself. The leaderboard didn't measure good engineering — it measured tokens consumed.

Imagine a teacher who grades essays by word count. Students don't write better essays; they write longer, padded ones — and the teacher still has to read them all.
Or think of the often-told story of Soviet nail factories. Judged on tonnage, they made giant, useless nails. Judged on quantity, they made millions of tiny ones nobody could hammer. The metric, not the customer, ran the factory.
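The same effect can be shown with a toy leaderboard. This is a sketch under stated assumptions, not a reconstruction of Kirorank: the engineers, the token counts and the `changes_shipped` field are all invented, and "changes shipped" is only a stand-in for whatever useful output a company actually cares about.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    tokens_used: int       # raw AI usage, what a usage leaderboard counts
    changes_shipped: int   # hypothetical proxy for useful output


# Made-up data: "casey" points agents at busywork to inflate token usage.
team = [
    Engineer("alex", tokens_used=2_000_000, changes_shipped=9),
    Engineer("blake", tokens_used=5_000_000, changes_shipped=12),
    Engineer("casey", tokens_used=90_000_000, changes_shipped=1),
]


def rank(engineers, key):
    """Return names ordered from highest to lowest by the given attribute."""
    ordered = sorted(engineers, key=lambda e: getattr(e, key), reverse=True)
    return [e.name for e in ordered]


by_tokens = rank(team, "tokens_used")
by_output = rank(team, "changes_shipped")
print(by_tokens)
print(by_output)
```

In this invented team, the engineer who tops the token ranking is last on shipped work. Once the token ranking becomes the target, it stops saying anything about the output it was meant to encourage.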

💡 Key Things To Know

Amazon is a roughly $2.9 trillion company and plans to spend about $200bn in capital expenditure this year, the vast majority of it on AI and data centres.
A 'token' is a chunk of text an AI model processes — every token costs real money in GPU time, so inflated usage hits the bottom line directly.
Amazon has pushed targets requiring more than 80% of its developers to use AI tools each week, which created the pressure to perform usage in the first place.
AI providers like Anthropic (whose models Amazon uses heavily) have shifted from flat monthly fees to consumption-based pricing, meaning every wasted token is now billed.
The popular misread: this isn't lazy employees sabotaging Amazon. It's rational employees responding exactly the way the incentive system told them to.

🌟 Why It Matters

If you're heading into a STEM or business career, you'll spend a lot of time being measured — GPA, KPIs, OKRs, GitHub commits, LinkedIn posts. The Kirorank fiasco is a live demo of why smart people game dumb metrics, and why your future managers will obsess over 'productivity' numbers that may say almost nothing about whether real work is getting done. It also tells you something important about the AI hype cycle: companies are under so much pressure to look AI-native that they sometimes confuse using AI with creating value.

🔮 The Bigger Picture

Watch what happens to AI economics next. Now that providers charge by the token and hyperscalers are sinking hundreds of billions into data centres, every 'tokenmaxxing' spree shows up as wasted spending on the compute bill. Expect a swing in the opposite direction: leaderboards replaced by 'normalised deployments' and quality metrics, plus tighter scrutiny of whether AI adoption is actually producing useful code or just expensive theatre. The second-order effect: investors will start demanding evidence that AI spending translates to revenue, not just usage charts.

📚 Key Terms Glossary

Token

The basic unit of text an AI language model reads or writes — roughly a short word or piece of a word. Cloud AI services bill customers per token processed.

Tokenmaxxing

Insider slang for artificially inflating your AI-token usage to look productive on an internal dashboard, even when the underlying work is pointless.

AI agent

An autonomous AI program that can take multi-step actions on a user's behalf — sending emails, running code, calling other tools — rather than just answering one prompt.

Deprecated

Tech-industry term for officially retiring a tool or feature. It may still exist for a while, but it is no longer supported or recommended.

Capital expenditure (capex)

Money a company spends on long-lived physical assets like buildings, servers, or data centres — as opposed to day-to-day operating costs.

Consumption-based pricing

A billing model where you pay per unit used (per token, per API call) rather than a flat monthly fee. It rewards efficiency and punishes waste.

Goodhart's Law

The principle that 'when a measure becomes a target, it ceases to be a good measure' — because people start gaming the metric instead of pursuing what it was meant to track.

Normalised deployment

Amazon's preferred metric: a measure of AI tool use weighted by whether the code actually shipped and produced value, rather than just counting raw activity.
