In late 2025, Amazon shut down an internal leaderboard called Kirorank that had been quietly tracking how much each of its employees used the company's in-house AI coding platform, Kiro. The dashboard was supposed to celebrate enthusiastic adopters of artificial intelligence. Instead, it became an expensive cautionary tale.
Amazon, a roughly $2.9 trillion company, has pushed its engineers hard toward AI. Internal targets require more than 80% of developers to use AI tools each week, and the company is expected to spend around $200bn on capital projects this year, the vast majority of it on AI infrastructure and data centres. In that environment, Kirorank looked like a natural extension of the strategy: rank engineers by their AI activity, post the scoreboard, and watch adoption climb.
It did climb, just not in a useful way. According to people familiar with the matter, workers began pointing autonomous AI 'agents' (bots that can take multi-step actions on a user's behalf) at unnecessary tasks, running repeated, low-value calls simply to inflate their consumption of 'tokens,' the units of text that AI models process and that cloud providers bill for. Inside Amazon, the practice picked up a name: tokenmaxxing. The result was a leaderboard full of activity that looked productive but wasn't, and a noticeably larger compute bill for Amazon itself. Dave Treadwell, an Amazon senior vice-president, told staff the leaderboard had been built with 'good intentions' but was being deprecated, and asked employees, plainly, not to use AI just for the sake of using AI.
Here's the catch that makes this more than an in-house embarrassment. The economics of corporate AI have shifted. Major model providers, among them Anthropic, whose systems Amazon uses heavily, have moved away from flat monthly subscriptions toward consumption-based pricing, in which every token processed has a price. Under the old flat-fee world, wasted tokens were essentially free. Under the new one, every gamed leaderboard score is a small but real transfer of money from Amazon to its AI suppliers. Multiply that across an engineering workforce of Amazon's size and the numbers stop being small.
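To make the pricing shift concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is invented for illustration: the flat fee, the per-token price and the token volumes are assumptions, not figures from Amazon, Anthropic or the reporting. It simply compares what one engineer's padded usage would cost under a flat subscription and under metered billing.

```python
# Hypothetical figures throughout: none of these prices or volumes
# come from the article or from any provider's price list.
FLAT_FEE_PER_SEAT = 200.00        # assumed flat monthly subscription, in dollars
PRICE_PER_MILLION_TOKENS = 10.00  # assumed blended per-token price, in dollars


def flat_cost(tokens_used: int) -> float:
    """Monthly cost under a flat subscription: independent of usage."""
    return FLAT_FEE_PER_SEAT


def metered_cost(tokens_used: int) -> float:
    """Monthly cost under consumption pricing: proportional to tokens."""
    return tokens_used / 1_000_000 * PRICE_PER_MILLION_TOKENS


useful_tokens = 5_000_000    # tokens spent on real work (assumed)
padding_tokens = 45_000_000  # tokens burned to climb a leaderboard (assumed)

for label, cost in (("flat", flat_cost), ("metered", metered_cost)):
    honest = cost(useful_tokens)
    gamed = cost(useful_tokens + padding_tokens)
    print(f"{label}: honest ${honest:.2f}, gamed ${gamed:.2f}, "
          f"waste ${gamed - honest:.2f}")
```

Under these made-up numbers the padding costs nothing on the flat plan and $450 a month per engineer on the metered one. The point is the shape of the two cost curves, not the specific dollar amounts.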
Amazon is not alone. The Financial Times has reported that Meta employees have engaged in similar gaming on their own internal tables, suggesting tokenmaxxing is less an Amazon-specific glitch than a Big Tech pattern. The underlying dynamic is familiar to anyone who has studied incentives: when a measure becomes a target, people optimise for the target, not for the thing the measure was meant to track. A score meant to capture 'how seriously do you take AI?' instead captured 'how willing are you to waste compute?', and the two answers turned out to be different.
The industry's response is already taking shape. Amazon has begun emphasising a metric it calls 'normalised deployments,' which tries to measure whether AI-assisted code actually shipped and produced value, rather than how much was generated. Expect more of that kind of pivot from rivals as investors start asking harder questions about whether the hundreds of billions flowing into AI data centres are producing real productivity gains or just very expensive activity charts.
Source: https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6
Amazon built a scoreboard to celebrate employees who used its AI tools the most. Workers promptly figured out that the fastest way to win was to waste the company's money.
Amazon has quietly killed an internal leaderboard called Kirorank that ranked employees by how much they used the company's in-house AI coding tool. The tracker lived inside Amazon's Kiro developer platform, and its scores were supposed to celebrate engineers who were eagerly adopting AI in their workflows.
Instead, workers started gaming the system. Some pointed autonomous AI 'agents' at pointless busywork just to rack up activity, a practice Amazon insiders nicknamed 'tokenmaxxing.' Senior vice-president Dave Treadwell told staff the leaderboard had been built with 'good intentions' but was driving up the company's compute bill, so it was being deprecated. Meta engineers, the Financial Times reported, have been playing the same game on their own internal dashboards.
This is a textbook case of Goodhart's Law: the moment you turn a measurement into a target, people stop optimising for the underlying thing you actually wanted and start optimising for the number itself. The leaderboard didn't measure good engineering; it measured tokens consumed.
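A toy example shows how far the two rankings can drift apart. The sketch below is purely illustrative: the engineers, the token counts and the 'changes shipped' figure are invented, and 'changes shipped' is only a stand-in for a useful outcome, not Amazon's 'normalised deployments' metric, whose definition the reporting does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    tokens_used: int      # what a usage leaderboard counts
    changes_shipped: int  # hypothetical stand-in for the outcome actually wanted


# Invented data: "bo" points agents at busywork; "ana" and "cy" work normally.
team = [
    Engineer("ana", tokens_used=4_000_000, changes_shipped=12),
    Engineer("bo", tokens_used=60_000_000, changes_shipped=3),
    Engineer("cy", tokens_used=7_000_000, changes_shipped=9),
]

# Rank the same people twice: once by the proxy, once by the outcome.
by_tokens = [e.name for e in sorted(team, key=lambda e: e.tokens_used, reverse=True)]
by_outcome = [e.name for e in sorted(team, key=lambda e: e.changes_shipped, reverse=True)]

print("ranked by tokens: ", by_tokens)
print("ranked by outcome:", by_outcome)
```

With this made-up data the token ranking puts 'bo' first and the outcome ranking puts 'bo' last: the proxy rewards exactly the behaviour the company did not want.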
If you're heading into a STEM or business career, you'll spend a lot of time being measured: GPA, KPIs, OKRs, GitHub commits, LinkedIn posts. The Kirorank fiasco is a live demo of why smart people game dumb metrics, and why your future managers will obsess over 'productivity' numbers that may say almost nothing about whether real work is getting done. It also tells you something important about the AI hype cycle: companies are under so much pressure to look AI-native that they sometimes confuse using AI with creating value.
Watch what happens to AI economics next. Now that providers charge by the token and hyperscalers are sinking hundreds of billions into data centres, every 'tokenmaxxing' spree shows up as real, wasted spending. Expect a swing in the opposite direction: leaderboards replaced by 'normalised deployments' and quality metrics, plus tighter scrutiny of whether AI adoption is actually producing useful code or just expensive theatre. The second-order effect: investors will start demanding evidence that AI spending translates to revenue, not just usage charts.