Detecting "Drunk" Agents: Monitoring Reasoning Drift
AI agents don't crash; they get "drunk" and burn budget. Learn how to detect reasoning drift and infinite loops using Token Velocity and Semantic Analysis.
Detecting "Drunk" Agents: Monitoring Reasoning Drift

Traditional software failure is binary: it either works, or it crashes (a stack trace, a 500 error, a segfault). It fails loudly, quickly, and usually cheaply.

AI agents don't crash. They get "drunk."

A "drunk" agent is one that has lost the plot. It starts hallucinating, repeating itself, or obsessing over irrelevant details in the prompt. Crucially, it keeps running. It keeps calling the LLM API. It keeps returning "200 OK" statuses to your infrastructure. The user sees a spinning wheel or a bizarre answer, but your logs show "System Healthy." Meanwhile, the agent is burning through your budget at maximum speed.

A drunk agent is far more dangerous than a dead server, and you need new observability tools to detect it.

Metric 1: Token Velocity Monitoring

The first sign of intoxication is hyperactivity. Drunk agents tend to loop. Whether it's a "Tool Error Loop" (trying to search Google 50 times in a row) or a "Critique Loop" (refining a paragraph forever), the result is a massive spike in token consumption.

You must implement Token Velocity monitoring per session.

  • Normal Behavior: Bursts of activity (Agent thinks/acts) followed by long pauses (User reads/types).

  • Drunk Behavior: Sustained, high-velocity token consumption without user interaction. The curve looks like a vertical wall.

The Circuit Breaker Rule:

If tokens_per_minute > 5,000 for session_X AND user_input_count == 0:

BLOCK further requests immediately and alert Ops.

This simple rule catches 90% of infinite loops before they cost more than a few dollars.
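
As a concrete starting point, here is a minimal Python sketch of such a breaker. It keeps per-session state in memory; the 5,000 tokens/minute threshold comes from the rule above, while the class name and the choice to reset the window on user input are illustrative assumptions, not a prescribed implementation:

```python
import time
from collections import defaultdict, deque

TOKEN_LIMIT_PER_MIN = 5_000  # example threshold from the rule above
WINDOW_SECONDS = 60

class TokenVelocityBreaker:
    """Trips when a session burns tokens at high velocity with no
    intervening user input (the 'vertical wall' pattern)."""

    def __init__(self):
        # session_id -> deque of (timestamp, token_count) events
        self._events = defaultdict(deque)
        self._tripped = set()

    def record_user_input(self, session_id: str) -> None:
        # A real user turn resets the velocity window for this session.
        self._events[session_id].clear()

    def record_tokens(self, session_id: str, tokens: int) -> None:
        self._events[session_id].append((time.time(), tokens))

    def allow_request(self, session_id: str) -> bool:
        if session_id in self._tripped:
            return False
        events = self._events[session_id]
        cutoff = time.time() - WINDOW_SECONDS
        # Drop events older than the sliding one-minute window.
        while events and events[0][0] < cutoff:
            events.popleft()
        if sum(tokens for _, tokens in events) > TOKEN_LIMIT_PER_MIN:
            self._tripped.add(session_id)  # BLOCK: alert Ops here
            return False
        return True
```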

Metric 2: Semantic Drift (The "Lost" Agent)

Sometimes, an agent isn't looping; it's just wandering. It started with the goal "Book a flight to Paris," but after 10 turns of tool calling, it is currently "Browsing Wikipedia for facts about the Eiffel Tower's construction history." It has drifted.

To detect this, you need Semantic Analysis.

  1. Embed the Goal: Take the user's initial prompt ("Book flight to Paris") and create a vector embedding ($V_{goal}$).

  2. Embed the Thoughts: At every step, embed the agent's internal "Chain of Thought" or "Reasoning" trace ($V_{thought}$).

  3. Calculate Similarity: Measure the Cosine Similarity between $V_{goal}$ and $V_{thought}$.

If the similarity drops below a threshold (e.g., 0.4), the agent has likely forgotten the original mission. It is "drunk."
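
A minimal version of this check, assuming you already have an embedding function (the `embed()` call in the usage sketch below is a stand-in for whatever embedding model you run, not a real API):

```python
import numpy as np

DRIFT_THRESHOLD = 0.4  # example cutoff from above

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_drifting(v_goal: np.ndarray, v_thought: np.ndarray) -> bool:
    """True if the latest reasoning trace has drifted too far
    from the original goal embedding."""
    return cosine_similarity(v_goal, v_thought) < DRIFT_THRESHOLD

# Usage sketch (embed() and trigger_intervention() are hypothetical):
# v_goal = embed("Book flight to Paris")
# v_thought = embed(agent_step.reasoning_trace)
# if is_drifting(v_goal, v_thought):
#     trigger_intervention(session_id)  # see Remediation below
```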

Remediation: You can program the system to intervene by injecting a corrective message into the agent's context: "System Intervention: It seems you are drifting from the user's goal of 'Book flight'. Please refocus or ask the user for clarification."

Metric 3: Repetition Penalty (The "Broken Record")

Drunk agents often repeat the same tool call with the exact same parameters, hoping for a different result (the definition of insanity).

Implement a Sliding Window Hash check: keep a hash of each of the last 3 tool calls (tool name plus serialized parameters). If Hash(Call_N) == Hash(Call_N-1) == Hash(Call_N-2), your agent is stuck. Kill it.
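
A sketch of this check in Python (class and method names are illustrative; the fingerprint canonicalizes the parameters so key order doesn't change the hash):

```python
import hashlib
import json
from collections import deque

class RepetitionDetector:
    """Flags an agent that issues the same tool call with the same
    parameters three times in a row."""

    def __init__(self, window: int = 3):
        self._recent = deque(maxlen=window)

    @staticmethod
    def _fingerprint(tool_name: str, params: dict) -> str:
        # Canonical JSON so parameter key order doesn't change the hash.
        payload = json.dumps({"tool": tool_name, "params": params},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def is_stuck(self, tool_name: str, params: dict) -> bool:
        self._recent.append(self._fingerprint(tool_name, params))
        return (len(self._recent) == self._recent.maxlen
                and len(set(self._recent)) == 1)
```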

Summary: The "Kill Switch" Architecture

You wouldn’t let a drunk employee manage your bank account. Don’t let a drifting agent manage your API keys. Your infrastructure must include a "Kill Switch" layer that sits between the Agent and the LLM Provider. This layer is not an AI; it is a rigid, deterministic guardrail that says "No" when the behavior patterns match intoxication.
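
As a rough sketch of how the pieces compose (all names are illustrative; the three checks sketched above are passed in as plain callables):

```python
from typing import Callable

class KillSwitch:
    """Deterministic guardrail between the agent and the LLM provider.
    No model calls here: just rigid, ordered checks."""

    def __init__(self,
                 velocity_ok: Callable[[str], bool],
                 drift_detected: Callable[[str], bool],
                 repetition_detected: Callable[[str], bool]):
        self.velocity_ok = velocity_ok
        self.drift_detected = drift_detected
        self.repetition_detected = repetition_detected

    def authorize(self, session_id: str) -> bool:
        if not self.velocity_ok(session_id):
            return False  # vertical-wall token burn
        if self.drift_detected(session_id):
            return False  # wandered from the goal
        if self.repetition_detected(session_id):
            return False  # broken record
        return True       # sober enough to call the LLM
```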

Monitor, detect, and kill. This is the only way to sleep soundly while your agents work the night shift.
