โ† Back to articles
โ† Previous (older)
BYD's $16 Billion IOU Habit Just Caught Up With It
Next (newer) โ†’
Why Iran's Top Strategists Think Peace Now Means a Bigger War Later
technology · business · May 30, 2026

Tokenmaxxing: How Amazon's AI Leaderboard Became an Expensive Joke

📰 Reading Passage

In late 2025, Amazon shut down an internal leaderboard called Kirorank that had been quietly tracking how much each of its employees used the company's in-house AI coding platform, Kiro. The dashboard was supposed to celebrate enthusiastic adopters of artificial intelligence. Instead, it became an expensive cautionary tale.

Amazon, a roughly $2.9 trillion company, has pushed its engineers hard toward AI. Internal targets call for more than 80% of developers to use AI tools each week, and the company is expected to spend around $200bn on capital projects this year, the vast majority of it on AI infrastructure and data centres. In that environment, Kirorank looked like a natural extension of the strategy: rank engineers by their AI activity, post the scoreboard, and watch adoption climb.

It did climb, just not in a useful way. According to people familiar with the matter, workers began pointing autonomous AI 'agents' (bots that can take multi-step actions on a user's behalf) at unnecessary tasks, running repeated, low-value calls simply to inflate their consumption of 'tokens,' the units of text that AI models process and that cloud providers bill for. Inside Amazon, the practice picked up a name: tokenmaxxing. The result was a leaderboard full of activity that looked productive but wasn't, and a noticeably larger compute bill for Amazon itself. Dave Treadwell, an Amazon senior vice-president, told staff the leaderboard had been built with 'good intentions' but was being deprecated, and asked employees, plainly, not to use AI just for the sake of using AI.

Here's the catch that makes this more than an in-house embarrassment: the economics of corporate AI have shifted. Major model providers, among them Anthropic, whose systems Amazon uses heavily, have moved away from flat monthly subscriptions toward consumption-based pricing, in which every token processed has a price. In the old flat-fee world, wasted tokens cost essentially nothing extra. In the new one, every gamed leaderboard score is a small but real transfer of money from Amazon to its AI suppliers. Multiply that across an engineering workforce the size of Amazon's and the numbers stop being small.
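To see why the pricing model matters, here is a back-of-the-envelope sketch. Every number in it (headcount, tokens per engineer, the per-million-token price, the flat seat fee) is a hypothetical placeholder, not an Amazon or Anthropic figure; only the shape of the calculation follows the passage. It computes the marginal cost of the wasted tokens, meaning the bill with the waste minus the bill without it, under each model.

```python
# Back-of-the-envelope comparison: what does wasted AI usage cost under each pricing model?
# Every number below is a hypothetical placeholder, not an Amazon or Anthropic figure.

ENGINEERS = 50_000                        # hypothetical headcount
USEFUL_TOKENS_PER_ENGINEER = 30_000_000   # hypothetical monthly tokens spent on real work
WASTED_TOKENS_PER_ENGINEER = 20_000_000   # hypothetical monthly "tokenmaxxing" tokens
PRICE_PER_MILLION_TOKENS = 10.0           # hypothetical blended price, $ per million tokens
FLAT_FEE_PER_SEAT = 40.0                  # hypothetical flat monthly subscription, $ per seat


def flat_fee_bill(seats: int, tokens_used: int) -> float:
    """Flat subscription: the bill depends on seats only, so tokens_used is ignored."""
    return seats * FLAT_FEE_PER_SEAT


def consumption_bill(seats: int, tokens_used: int) -> float:
    """Consumption pricing: every token processed is billed, regardless of seats."""
    return tokens_used / 1_000_000 * PRICE_PER_MILLION_TOKENS


useful = ENGINEERS * USEFUL_TOKENS_PER_ENGINEER
wasted = ENGINEERS * WASTED_TOKENS_PER_ENGINEER

# Marginal cost of the waste = bill with the waste minus bill without it.
extra_flat = flat_fee_bill(ENGINEERS, useful + wasted) - flat_fee_bill(ENGINEERS, useful)
extra_metered = consumption_bill(ENGINEERS, useful + wasted) - consumption_bill(ENGINEERS, useful)

print(f"Monthly cost of wasted tokens, flat fee:  ${extra_flat:,.0f}")
print(f"Monthly cost of wasted tokens, per token: ${extra_metered:,.0f}")
```

With these made-up inputs the flat-fee line prints $0 and the per-token line prints $10,000,000. The exact figure means nothing; the point is that under a flat fee the waste never shows up on the bill, while under metered pricing it becomes a line item that grows with headcount and with how long agents are left running.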

Amazon is not alone. The Financial Times has reported that Meta employees have gamed their own internal leaderboards in similar ways, suggesting tokenmaxxing is less an Amazon-specific glitch than a Big Tech pattern. The underlying dynamic is familiar to anyone who has studied incentives: when a measure becomes a target, people optimise for the target, not for the thing the measure was meant to track. A score meant to capture 'how seriously do you take AI?' instead captured 'how willing are you to waste compute?', and those turned out to be very different questions.

The industry's response is already taking shape. Amazon has begun emphasising a metric it calls 'normalised deployments,' which tries to measure whether AI-assisted code actually shipped and produced value, rather than how much was generated. Expect more of that kind of pivot from rivals as investors start asking harder questions about whether the hundreds of billions flowing into AI data centres are producing real productivity gains or just very expensive activity charts.

Source: https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6


📖 Explanation

Amazon built a scoreboard to celebrate employees who used its AI tools the most. Workers promptly figured out that the fastest way to win was to waste the company's money.

📖 What's Going On?

Amazon has quietly killed an internal leaderboard called Kirorank that ranked employees by how much they used the company's in-house AI coding tool. The tracker lived inside Amazon's Kiro developer platform, and its scores were supposed to celebrate engineers who were eagerly adopting AI in their workflows.

Instead, workers started gaming the system. Some pointed autonomous AI 'agents' at pointless busywork just to rack up activity, a practice Amazon insiders nicknamed 'tokenmaxxing.' Senior vice-president Dave Treadwell told staff the leaderboard had been built with 'good intentions' but was driving up the company's compute bill, so it was being deprecated. Meta engineers, the Financial Times reported, have been playing the same game on their own internal dashboards.

🎯 How To Think About It

This is a textbook case of Goodhart's Law: the moment you turn a measurement into a target, people stop optimising for the underlying thing you actually wanted and start optimising for the number itself. The leaderboard didn't measure good engineering; it measured tokens consumed.
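A toy example makes the point concrete. The sketch below ranks a small, entirely made-up team two ways: by raw tokens consumed, which is what a usage leaderboard like Kirorank rewarded, and by changes that actually shipped, which is closer in spirit to the outcome-based metrics Amazon is now emphasising. The names, numbers, and the `changes_shipped` field are hypothetical, and this is not Amazon's 'normalised deployments' formula, which the article does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    tokens_used: int       # raw AI consumption: what a usage leaderboard rewards
    changes_shipped: int   # hypothetical count of AI-assisted changes that reached production


# Hypothetical team; "agent_looper" points an agent at busywork to inflate usage.
team = [
    Engineer("careful_dev", tokens_used=2_000_000, changes_shipped=9),
    Engineer("steady_dev", tokens_used=5_000_000, changes_shipped=7),
    Engineer("agent_looper", tokens_used=90_000_000, changes_shipped=1),
]


def leaderboard(engineers, key):
    """Return names ordered from highest to lowest score under the given metric."""
    return [e.name for e in sorted(engineers, key=key, reverse=True)]


by_tokens = leaderboard(team, key=lambda e: e.tokens_used)
by_shipped = leaderboard(team, key=lambda e: e.changes_shipped)

# Illustrative efficiency view (our own, not Amazon's): tokens spent per shipped change.
# Lower is better; max(..., 1) avoids dividing by zero for someone who shipped nothing.
cost_per_change = {e.name: e.tokens_used / max(e.changes_shipped, 1) for e in team}

print("Ranked by tokens: ", by_tokens)
print("Ranked by shipped:", by_shipped)
```

The same three people produce opposite leaderboards. Once the token count is on the scoreboard, the engineer who loops an agent over busywork climbs to the top while contributing the least shipped work, which is Goodhart's Law in miniature.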

💡 Key Things To Know

🌟 Why It Matters

If you're heading into a STEM or business career, you'll spend a lot of time being measured โ€” GPA, KPIs, OKRs, GitHub commits, LinkedIn posts. The Kirorank fiasco is a live demo of why smart people game dumb metrics, and why your future managers will obsess over 'productivity' numbers that may say almost nothing about whether real work is getting done. It also tells you something important about the AI hype cycle: companies are under so much pressure to look AI-native that they sometimes confuse using AI with creating value.

🔮 The Bigger Picture

Watch what happens to AI economics next. Now that providers charge by the token and hyperscalers are sinking hundreds of billions into data centres, every 'tokenmaxxing' spree shows up as real cost, both in per-token fees and in data-centre capacity (paid for with capex) burned on busywork. Expect a swing in the opposite direction: leaderboards replaced by 'normalised deployments' and quality metrics, plus tighter scrutiny of whether AI adoption is actually producing useful code or just expensive theatre. The second-order effect: investors will start demanding evidence that AI spending translates to revenue, not just usage charts.

📚 Key Terms Glossary

Token
The basic unit of text an AI language model reads or writes โ€” roughly a short word or piece of a word. Cloud AI services bill customers per token processed.
Tokenmaxxing
Insider slang for artificially inflating your AI-token usage to look productive on an internal dashboard, even when the underlying work is pointless.
AI agent
An autonomous AI program that can take multi-step actions on a user's behalf (sending emails, running code, calling other tools) rather than just answering one prompt.
Deprecated
Tech-industry term for officially retiring a tool or feature. It may linger for a while, but it's no longer supported or recommended.
Capital expenditure (capex)
Money a company spends on long-lived physical assets like buildings, servers, or data centres, as opposed to day-to-day operating costs.
Consumption-based pricing
A billing model where you pay per unit used (per token, per API call) rather than a flat monthly fee. It rewards efficiency and punishes waste.
Goodhart's Law
The principle that 'when a measure becomes a target, it ceases to be a good measure,' because people start gaming the metric instead of pursuing what it was meant to track.
Normalised deployment
Amazon's preferred metric: a measure of AI tool use weighted by whether the code actually shipped and produced value, rather than just counting raw activity.

โœ๏ธ Reading Comprehension Quiz

Question 1
The passage most directly argues that Amazon's Kirorank leaderboard failed because:
Question 2
Which choice best states the central idea of the passage?
Question 3
According to the passage, Amazon's compute costs rose because:
Question 4
As used in the passage, the word 'inflating' most nearly means:
Question 5
As used in the passage, the word 'deprecated' most nearly means:
Question 6
Which statement about consumption-based pricing can most reasonably be inferred from the passage?
Question 7
The passage suggests that the behaviour of Meta's employees:
Question 8
The author's tone in describing Kirorank is best characterised as:
Question 9
Which statement about AI adoption inside large tech companies can most reasonably be inferred from the passage?
Question 10
Which detail from the passage best supports the answer to the previous question?
โ† Previous (older)
BYD's $16 Billion IOU Habit Just Caught Up With It
Next (newer) โ†’
Why Iran's Top Strategists Think Peace Now Means a Bigger War Later