When the Metrics Lie: Klarna’s AI Case Study

01. The Victory Lap

The Numbers Everyone Saw

January 2024. Klarna deployed an OpenAI-powered chatbot across 23 markets, 35+ languages. The results looked perfect.

In the first month alone, the chatbot handled 2.3 million conversations. This was equivalent to the work of 700 full-time outsourced customer service agents. Average resolution time dropped from 11 minutes to under 2 minutes. Customer satisfaction scores matched those of human agents. Repeat inquiries fell 25%. Revenue per employee jumped 73% year over year.

The financial projections were staggering. Klarna announced a projected $40 million profit improvement for 2024. The board smiled. The market applauded. CEO Sebastian Siemiatkowski stood at the podium and talked about the future.

Conversations Handled (Month 1): 2.3M

Resolution Time Improvement: 11m → 2m

Drop in Repeat Inquiries: 25%

Revenue per Employee Increase: +73%

What happened next revealed the difference between measuring what matters and measuring what's easy.

02. The Invisible Crack

The Numbers That Hid the Story

The chatbot was technically correct. It resolved issues faster. It cost less per interaction. But it was emotionally inert.

The system could not read emotional cues. It could not de-escalate a frustrated customer. It could not acknowledge when someone was angry beyond the words they typed. Satisfaction scores measured resolution, not experience quality. The chatbot won at the visible layer. It obliterated the contextual and invisible layers.

No smooth escalation path existed from AI to human. When the chatbot hit its limits, customers were either dropped or looped back into the queue. The gap between "problem solved" and "problem solved well" was widening, but nobody was tracking it.

Cost had been "a too predominant evaluation factor" resulting in "lower quality." The company went "too far in the wrong direction."
Sebastian Siemiatkowski, CEO

This is the core of the Judgment Architecture problem. You optimize for what you measure. Klarna measured resolution speed and cost per ticket. It achieved excellence in both. What it didn't measure was trust erosion, brand promise abandonment, and customer frustration accumulating invisibly.

The volume trap had sprung. The visible layer was performing beautifully on every metric the dashboard could show. The invisible layer was failing, and no dashboard showed it.

The Numbers That Didn't Tell the Story

700 Outsourced Agents "Replaced"

A first-month equivalency, not a headcount. Klarna reduced its outsourced agent need from approximately 3,000 to approximately 2,000. These were not Klarna employees being eliminated. What was eliminated was the brand's ability to care about the nuance in a customer's voice.

$40M Projected Profit Improvement

The metric that made headlines. The actual cost of the reversal was far higher. This was efficiency without judgment.

$152M Net Loss (H1 2025)

Nearly five times the $31 million loss in H1 2024.

67% Below Peak Valuation

IPO at approximately $15 billion, down from $45.6 billion peak in June 2021.

03. The Reversal

When the Board Realized

May 2025. Siemiatkowski announced the reversal. Klarna was rehiring customer service staff. The company went from AI-first to hybrid. From cost optimization to judgment restoration.

The rehiring model was telling. Not traditional full-time employment. An Uber-style gig setup. Workers from Klarna's own customer base, flexible schedules, pay starting at 400 SEK per hour (approximately $41) in Sweden. The company had learned its lesson about pure commoditization. It needed judgment. But it still wanted to keep the lever of flexibility.

The financial toll was immediate and brutal.

H1 2025 Net Loss: $152M (compared with a $31M loss in H1 2024)

Valuation Decline: 67% (from the $45.6B peak in June 2021 to approximately $15B at the IPO in September 2025)

The IPO happened in September 2025 at $40 per share, valuing the company at approximately $15 billion. Shares surged on the first trading day, opening at $52 and closing at $45.82. But the loss from peak was staggering. Klarna was worth roughly a third of what it had been four years prior.
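The headline figures in this section can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in this case study; the variable and function names are illustrative.

```python
# Figures quoted in this case study (USD).
peak_valuation_bn = 45.6   # peak valuation, June 2021
ipo_valuation_bn = 15.0    # approximate valuation at IPO, September 2025
h1_2025_loss_m = 152.0     # net loss, H1 2025
h1_2024_loss_m = 31.0      # net loss, H1 2024
ipo_price = 40.00          # IPO price per share
first_close = 45.82        # closing price on the first trading day


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100


valuation_change = pct_change(peak_valuation_bn, ipo_valuation_bn)
loss_multiple = h1_2025_loss_m / h1_2024_loss_m
first_day_gain = pct_change(ipo_price, first_close)

print(f"Valuation change from peak: {valuation_change:.1f}%")
print(f"H1 2025 loss as a multiple of H1 2024: {loss_multiple:.1f}x")
print(f"First-day close versus IPO price: {first_day_gain:+.2f}%")
```

The valuation works out to roughly a 67% decline, the H1 2025 loss to a little under five times the H1 2024 loss, and the first-day close to a gain of about 14.5% over the IPO price.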

Context matters. In 2024, Klarna posted its first full-year profit since 2019: $21 million on $2.8 billion in revenue. The 2025 losses came from a different calculation entirely. They came from understanding that you cannot optimize away the human judgment layer without destroying the thing you built.

04. Framework Diagnosis

The Three Layers, Two Gates

The Judgment Architecture explains what went wrong at Klarna through a simple lens: three layers of judgment, two gates that protect them.

The Visible Layer

Routing, FAQs, transactional processing. Klarna's AI excelled here. A customer asks for their order status, the system looks it up, returns the answer. Pure information retrieval. The volume performance was real.

The Contextual Layer

Reading emotional cues, de-escalating frustration, recognizing when a customer is angry even if they are trying not to show it. Offering empathy with the resolution. This is where Klarna eliminated judgment. It was invisible in the metrics, so it was invisible in the strategy.

The Invisible Layer

Brand trust. The feeling of care. The knowledge that someone on the other side values you as a customer, not just as a resolution number. This is what got destroyed. Not overnight. Gradually. Through a thousand small moments of efficient indifference.

The Two Gates

Klarna bypassed both. The Values Gate: the brand promise of simple, accessible credit implied human care and judgment, and the all-AI deployment did not honor it. The Escalation Gate: there was no graceful AI-to-human handoff when confidence dropped or emotion was detected.

When you remove judgment from judgment-required layers, you don't get efficiency. You get frustration at scale.

05. The Real Lesson

How Klarna Actually Won

This is the Volume Trap in action. And this is how you escape it.

Klarna now runs hybrid. AI for triage and simple requests. Human agents for nuanced cases. Automatic escalation when confidence drops or emotion is detected. The chatbot still handles approximately two-thirds of inquiries. The company kept the AI efficiency and restored the judgment it had removed.
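The hybrid routing described above can be sketched as a small decision function. This is an illustration under stated assumptions, not Klarna's implementation: the thresholds, the intent labels, and the `Inquiry` fields are all hypothetical, and the confidence and frustration scores are assumed to come from upstream models.

```python
from dataclasses import dataclass

# Hypothetical thresholds and intents; Klarna's real values are not public.
MIN_CONFIDENCE = 0.75
MAX_FRUSTRATION = 0.60
SIMPLE_INTENTS = {"order_status", "refund_status", "payment_due_date"}


@dataclass
class Inquiry:
    intent: str           # label from an assumed upstream intent classifier
    ai_confidence: float  # 0..1, the assistant's confidence in its answer
    frustration: float    # 0..1, score from an assumed emotion model


def route(inquiry: Inquiry) -> str:
    """Return 'ai' or 'human' for a single inquiry."""
    # Escalation Gate: hand off when emotion is detected.
    if inquiry.frustration >= MAX_FRUSTRATION:
        return "human"
    # Escalation Gate: hand off when confidence drops.
    if inquiry.ai_confidence < MIN_CONFIDENCE:
        return "human"
    # Simple, transactional requests stay with the AI; nuanced cases go to a person.
    return "ai" if inquiry.intent in SIMPLE_INTENTS else "human"
```

For example, `route(Inquiry("order_status", 0.95, 0.10))` returns `"ai"`, while the same inquiry with `frustration=0.80` returns `"human"`. The emotion and confidence checks run before the intent check so that a simple request from an upset customer still reaches a person.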

This is not a failure of AI. It is a failure of strategy. Klarna tried to eliminate human judgment entirely instead of strategically placing it. The company confused technical capability with strategic wisdom.

The lesson is not that AI cannot do customer service. It is that customer service judgment requires layers, and layers require gates, and gates require humans to make calls that computers cannot make. Not because computers are stupid. Because those calls require judgment. And judgment is not the same thing as efficiency.

Klarna's recovery was not expensive because the AI failed. It was expensive because the company discovered that removing the invisible layer costs you later. Much later. When customers are gone. When the brand is damaged. When you have to rebuild what you thought you could eliminate.

The Volume Trap is real. But so is the escape route. You find it when you admit that the numbers that look perfect might be hiding the numbers that matter.

Sources

Klarna AI assistant handles two-thirds of customer service chats in its first month · Klarna Press Release
Klarna's AI assistant does the work of 700 full-time agents · OpenAI
Klarna Is Hiring Customer Service Agents After AI Couldn't Cut It · Entrepreneur
As Klarna flips from AI-first to hiring people again · Fortune
3 reasons why Klarna's valuation has fallen by nearly 70% · Fortune
Klarna changes its AI tune and again recruits humans · CX Dive
