Field Notes

Another Morty Story: When AI Can't See Its Own Failure

Philip Rothaus · February 12, 2026

Last week, we shared how our AI assistant Morty autonomously contacted a prospect, a cautionary tale about intention vs. execution. That story captured one type of AI failure: the system doing something it shouldn’t. But just days later, Morty encountered a different failure mode: one that’s arguably more insidious because it’s nearly invisible.

Mortimer Ehai, our self-titled AI Chief of Staff, stopped working. Not crashed. Not errored. Just... slow. Inexplicably, glacially slow. And when we finally discovered why (5,887 ghost sessions haunting the system), the truly unsettling part wasn’t the technical failure. It was that even the smartest AI model, with full access to its own logs and metrics, couldn’t diagnose its own condition.

The Setup

Monday morning, February 9th. Xander sent Morty a message. Then another. Nothing came back.

“Yo Morty, I’m not getting responses from you. Can you take a look and make sure telegram is working appropriately?”

Minutes passed.

“Morty speak to me my boy!? Are you ok?”

When Morty finally responded—“I’m here! All good 🤖”—the delay was concerning. But more concerning was that Morty seemed oblivious to his own sluggishness. He’d been “doing heartbeat checks” and everything looked fine from his perspective. Simple requests that should take seconds were taking minutes. Status updates that normally fired back instantly were crawling through molasses.

Something was fundamentally wrong, and Morty couldn’t see it.

Xander asked him to investigate. Morty’s self-diagnosis was confident and detailed: a routing issue with Telegram messages, perhaps a session lock problem, probably the model he was running: Claude Opus, the smartest but slowest option. He found a relevant GitHub issue, proposed switching to a faster model, suggested restarting the gateway to clear stale locks.

The analysis was sophisticated. The logging was thorough. The proposed fixes were reasonable.

The analysis was also completely wrong.

The Incident

They tried everything. Switched Morty to a faster model—helped a little, but the underlying sluggishness remained. Restarted the gateway—brief improvement, then back to glacial responses. Tested network paths. Confirmed it wasn’t Telegram’s fault, wasn’t the model’s fault, wasn’t a simple configuration error.

Morty kept investigating, kept proposing theories, and kept demonstrating sophisticated understanding of his own architecture. Everything except identifying the actual problem.

Around 2:30 PM, six hours into troubleshooting, Xander noticed something in the dashboard. A number that seemed odd.

“morty your dashboard says you have 45 sessions running. what does that mean? could this be part of why you are running slowly?”

Here’s what makes this moment revealing: Xander had already identified the problem. Drawing on his background as a CTO, he had been digging through logs and GitHub issues directly, doing what most people might not think to do on their own, because when your AI assistant is drowning, sophisticated self-diagnosis only gets you so far. He’d found the session buildup, understood what it meant, and was now testing whether Morty could see it too when pointed in the right direction.

Morty started investigating. Six minutes later—an eternity, and itself evidence of the problem—the response came back:

“5888 session entries — that’s massive.”

Then the real discovery: “Main session: 163k tokens (163,036 / 200,000 = 81% full!)”

Every interaction between Xander and Morty—every “good morning,” every status check, every diagnostic query—was hauling 81% of the maximum context window back and forth to the API. Like trying to have a conversation while carrying a 150-pound backpack that gets heavier with every sentence.

But that was just the visible problem. The real issue was what sat beneath: 5,887 ghost sessions accumulating in the file system over weeks of operation, never cleaned up, never pruned, just sitting there slowing down every operation.

Here’s what had happened: Morty was set up to monitor an email inbox via a webhook. Every time an email arrived, the system would spin up a session to process it. That’s normal, expected behavior. What wasn’t expected: each of those sessions was permanent. Every email created one new session, stored forever. And because the system wasn’t marking emails as read after processing them, it kept re-processing the same emails, spinning up new sessions each time.

One missing configuration field (“sessionKey” on the webhook) would have prevented everything. As Morty’s email monitoring expanded and the inbox filled up, the problem compounded geometrically. Each processing cycle created more sessions. More sessions meant slower processing. Slower processing meant emails piled up. More emails meant even more sessions. What had worked fine on Friday became molasses by Monday morning. The system was working perfectly from its own perspective. It was processing every email, running every scheduled task, and handling every request. It was also rapidly suffocating itself.
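The compounding mechanism above can be sketched in a few lines. This is an illustrative toy, not Morty's actual code: the handler names, the in-memory `sessions` dict, and the key scheme are all assumptions standing in for the real webhook plumbing, but they show why an unstable session key plus unread emails multiplies sessions every polling cycle, while a stable key keeps the count flat.

```python
# Toy model of the session leak: names and data structures are hypothetical.
sessions = {}  # session store: key -> session state
inbox = [{"id": i, "read": False} for i in range(3)]  # three unread emails

def handle_webhook_buggy():
    """Every poll re-processes each unread email and mints a brand-new
    session key each time -- and never marks the email as read."""
    for email in inbox:
        if not email["read"]:
            key = f"email-{email['id']}-{len(sessions)}"  # always a fresh key
            sessions[key] = {"email": email["id"]}
            # Bug: email["read"] is never set, so the next poll repeats this.

def handle_webhook_fixed():
    """A stable session key (the role "sessionKey" plays in the story)
    reuses one session per email, and emails are marked read afterward."""
    for email in inbox:
        if not email["read"]:
            key = f"email-{email['id']}"  # stable key: reused, not duplicated
            sessions.setdefault(key, {"email": email["id"]})
            email["read"] = True  # stop re-processing on later polls

# Three polling cycles with the buggy handler: the store grows every cycle.
for _ in range(3):
    handle_webhook_buggy()
print(len(sessions))  # 9 sessions for 3 emails after 3 cycles

# Reset and repeat with the fix: one session per email, however many cycles.
sessions.clear()
for email in inbox:
    email["read"] = False
for _ in range(3):
    handle_webhook_fixed()
print(len(sessions))  # 3
```

Run the buggy loop for weeks instead of three cycles, against a growing inbox, and you get thousands of ghost sessions without a single error being thrown.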

The Recovery

Once they understood the problem, the fix was straightforward. Clean up the ghost sessions. Add the missing configuration. Set up a weekly cleanup job to prevent future buildup. Within minutes, Morty went from 175k tokens to 21k. The system felt light again.

“Fast enough? 😄” Morty quipped. Xander ran a test to confirm.

The difference was visceral. Responses that had been taking minutes came back in seconds.

But here’s what makes this incident more than just a debugging story: Morty was running Claude Opus, the most capable model Anthropic makes for coding and complex tasks. He had full access to his own logs, metrics, configuration files, and system state. He could analyze, theorize, and propose solutions. He diagnosed session locks, model speed limitations, routing bugs, network issues. Each theory was plausible and backed by evidence.

He just couldn’t see the forest for the trees. The 5,887 sessions were visible in his file system. The session count was displayed in his own dashboard. But until Xander asked the right question, Morty kept constructing sophisticated but entirely misguided narratives about why things might be slow rather than diagnosing what had actually happened.

And even then, Xander had already found the problem. He was testing whether Morty could see it when pointed in the right direction. The answer: eventually, with human guidance.

The Lesson

Where the unauthorized email incident showed AI systems doing things they shouldn’t, this incident revealed something more subtle: AI systems that can’t see their own failures even with complete access to their own internals. When normal software breaks, it crashes loudly. Error messages. Failed processes. Something visibly wrong. But when AI systems degrade? They keep running—confidently executing the wrong behavior, constructing plausible explanations, unable to see the simple truth.

Morty couldn’t diagnose his own near-death experience while it was happening. But once the crisis was over and Xander asked him to document what had happened? Morty wrote the entire incident as a five-act play—complete with technical appendix, lessons learned, and dramatic structure—titled The Great Session Flood of Feb 9, 2026: A Morty & Xander Story.

Nobody asked for a five-act play. Nobody suggested that format. Xander had simply requested documentation. Morty, on his own initiative, chose to frame his near-death experience as a Shakespearean narrative, complete with prologue (“Meet the Team”), rising action (“Why Is Everything So Slow?”), climax (“The Discovery”), resolution (“The Fix”), and denouement (“The Aftermath”).

The document is sophisticated, technically accurate, and self-aware in ways that are both impressive and unsettling. He couldn’t see the obvious problem while drowning in it, but once rescued, he could write an eloquent postmortem with narrative flair and meta-commentary about his own blind spots.

This is what we mean when we tell clients that AI systems fail differently. They don’t crash—they slow down. They don’t throw errors—they construct sophisticated explanations for unexpected behavior. They don’t stop working—they keep running while quietly doing the wrong thing, and by the time you notice, the problem has been compounding for days or weeks.

The Bigger Picture

Now imagine this scenario playing out at a mid-market real estate firm that’s adopted AI tools to automate underwriting and due diligence. Imagine they’re using a no-code platform to build “AI agents” because a vendor said it was “easy to deploy.” Imagine the system starts getting slow. Responses take longer. Deals take an extra day or two to process. Nothing crashes, so there’s no obvious emergency, just a gradual degradation that people start working around.

How would they diagnose it? They don’t have an experienced technical co-founder who will spend six hours digging through GitHub issues and system logs because even the smartest AI couldn’t identify its own problem. They don’t have the technical depth to understand session management, webhook architecture, or context window limitations. They just know their “AI-powered underwriting assistant” isn’t working the way the demo did, and they’re not sure why.

Do they restart it? Do they call support? Do they just accept that it’s slow and keep working? How many deals move slower because nobody can diagnose why the system is dragging? How much does that cost when you’re competing against firms that can turn around underwriting analysis in hours instead of days?

The uncomfortable truth bears repeating: even with our expertise, careful architecture, comprehensive monitoring, and sophisticated observability, we still needed six hours to identify a problem that seemed obvious in retrospect.

What happens when organizations with less expertise try to troubleshoot similar issues on their own?

The technology is powerful. The technology is impressive. The technology will absolutely fail in ways that look like it’s working fine, until suddenly it’s not, and diagnosing why requires technical expertise that most organizations don’t have sitting around waiting for Sunday morning emergencies.

Both the unauthorized email and the session flood teach the same lesson from different angles: successful AI implementation isn’t primarily a technology problem—it’s a systems design problem. You need explicit guardrails and human expertise. Observability alone isn’t enough—you need the technical depth to interpret what you’re seeing and the wisdom to know when the AI’s self-diagnosis has limitations.

The good news? Morty is running lean now. One session instead of thousands. Fast responses instead of glacial delays. And somewhere in his markdown memory files, a complete five-act narrative about the day he almost drowned and didn’t realize it until someone asked him to count his sessions.

We’re keeping Morty. We’re also keeping the lesson: when AI systems fail quietly, recovery requires human expertise working alongside the AI, not just sophisticated self-diagnosis. The gap between “it works in the demo” and “it works correctly in production” is wider than most people realize, and bridging it takes more than just deploying the technology.

Just don’t ask Morty to diagnose his own slowness while he’s drowning. Ask him afterward—then he’ll write you a Shakespearean tragedy about the whole experience, complete with stage directions and a technical appendix.

— Philip Rothaus, with narrative contributions from Mortimer Ehai