AI · 7 min read · 04/29/2026

My worst Claude Code sessions – and what they taught me

Everyone talks about AI success stories. I talk about the hours I lost to AI mistakes – and what I learned from them.

Christopher Groß, Freelance Developer

Everyone talks about how much time AI saves. True. But nobody talks much about the hours you lose to AI mistakes.

I've been using Claude Code daily for months. My workflow has fundamentally changed: I'm faster, and I ship more consistent code. All of that is true. This article is the counterpart – the moments where I sat at my terminal thinking: "What on earth did that just do?"

Not to bash AI. But a realistic picture is more useful than endless hype.

The aggressive refactor

I had a Vue component spread across three files – historically grown, not pretty, but functional. My request was harmless enough: "Clean this up and consolidate the types."

Claude Code did that. And then some more. And then some more.

At the end I had an elegant, clean component – and four features that had quietly disappeared. Including an edge case for loading empty content that wasn't documented anywhere, but was silently expected by two other parts of the app. No test, no comment, no hint. The AI analysed the code, not the behaviour behind it.

What happened: Claude Code had no context about the overall application behaviour. The refactoring was technically correct – but it broke implicit behaviour that was never written down anywhere.

What I do differently now: For refactors, I explicitly state which behaviours must be preserved – not just which files should be touched. "Clean this up, but these three cases must still work afterwards: ..." That sounds obvious. But I forgot it regularly at the start.
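One way to make implicit behaviour explicit before a refactor is a small characterization test that pins it down. Everything here is hypothetical – `renderContent` and its empty-content rule stand in for the real component logic:

```typescript
// Hypothetical stand-in for the component logic being refactored.
// The empty-content edge case was never documented anywhere --
// this test is where it finally gets written down.
function renderContent(items: string[] | null): string {
  if (items === null || items.length === 0) {
    // Implicit behaviour other parts of the app silently rely on:
    // empty input must yield a placeholder, not an empty string.
    return "(no content)";
  }
  return items.join(", ");
}

// Characterization tests: pin current behaviour before any refactor.
console.assert(renderContent(null) === "(no content)");
console.assert(renderContent([]) === "(no content)");
console.assert(renderContent(["a", "b"]) === "a, b");
```

If these assertions exist before the AI touches the code, "four features quietly disappeared" becomes "three tests went red" – a far cheaper way to find out.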

The hallucinated API

This one is uncomfortable to tell, because the mistake was partly mine.

I asked Claude Code to write an integration with an external service. The response was fluent, confident, well structured. Methods that sounded plausible, parameters that made sense – everything there.

Only: two of the methods didn't exist. The API never had them. Either Claude's training data contained an older version of the documentation, or it simply invented them – I don't know which. What I do know: the code compiled, threw no type errors, and only crashed on the first real API call.

I could have caught this earlier. I should have checked the official docs before accepting the code. I didn't. I was in flow, the code looked good, and I trusted it.

That's what makes confidently wrong output dangerous. When AI hesitates or writes "I'm not sure", I'm cautious. When it responds in clear, well-structured prose, I lower my guard. That's exactly the wrong reflex.

What I do differently now: Everything touching external APIs or libraries gets verified against the original docs. Not because I fundamentally distrust the AI – but because the risk is high and the check usually takes two minutes.
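Besides checking the docs, a cheap second line of defence is to assert at startup that the generated integration only calls methods the SDK actually exposes. A minimal sketch – the `client` object and its method names are hypothetical stand-ins for a real SDK:

```typescript
// Hypothetical stand-in for a real API client SDK.
// In practice this would be the imported client instance.
const client: Record<string, unknown> = {
  listInvoices: () => [],
  createInvoice: (data: Record<string, unknown>) => ({ id: 1, ...data }),
};

// Fail fast if the AI invented a method the SDK never had,
// instead of crashing on the first real call in production.
function assertMethodsExist(obj: Record<string, unknown>, names: string[]): void {
  for (const name of names) {
    if (typeof obj[name] !== "function") {
      throw new Error(`Expected API method missing: ${name}`);
    }
  }
}

// Run once at startup, before any real traffic.
assertMethodsExist(client, ["listInvoices", "createInvoice"]);
```

This doesn't replace reading the docs – it only catches the case where a method name is flat-out wrong, which is exactly the failure mode described above.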

The missing context in CLAUDE.md

This one is entirely my fault – but it involves Claude Code, so it belongs here.

Anyone who has read "Context engineering: why your prompt is the smallest problem" knows: the CLAUDE.md is the foundation. But only when it's complete.

I had a new website section implemented. The builder agent did good work, and the section looked clean. Except: the section labels – small, monospace-formatted labels above headings – were written in plain text instead of CLI style.

That rule wasn't documented in my CLAUDE.md. I had introduced it at some point – all section labels must use terminal style, with a $ prefix or // comment syntax – but I'd forgotten to write it down.

The AI took its cues from what it saw: older sections I had built myself before introducing the style. It derived a pattern from them – a wrong one, because I had changed my own system over time without updating the documentation.

The result: I had to correct all the labels manually and finally write the rule into the CLAUDE.md.

What I do differently now: When I dislike an output and think "the AI should have known that" – first question: is it in the CLAUDE.md? Usually the answer is: no, not yet.
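My rule of thumb now is to turn the correction straight into documentation. A hypothetical CLAUDE.md entry for the label rule above might look like this (wording invented for illustration):

```markdown
## Section labels
- All section labels above headings use terminal style.
- Format: monospace, prefixed with `$` or `//` comment syntax.
- Example: `$ projects` or `// about`
- Never plain-text labels. Older sections that still use them are legacy,
  not a pattern to imitate.
```

The last line matters most: it tells the AI explicitly not to derive a pattern from the outdated sections it will inevitably find in the codebase.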

The logically wrong, syntactically correct code

This is the most subtle mistake – and the one I took longest to find.

I had a filter function that was supposed to filter posts by category and tag – combined, with an AND operator. Claude Code wrote the function, it compiled, and it passed the obvious test case.

What I only noticed days later: the function had implemented OR instead of AND. Not because the AI doesn't know the difference – but because my request hadn't defined the logic unambiguously. I had written "filter by category and tag". That can mean AND. But it can also mean OR if you read it as: "give me posts that match the category, and give me posts that match the tag".

The AI chose an interpretation. It didn't ask me. And I checked the output for syntactic correctness, not logical correctness.

What I do differently now: For logic involving boolean operators, I write the expected results explicitly into the request. "Filtered with AND: only posts that satisfy both conditions should appear." Two extra sentences. Prevents hours of debugging.
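The difference is easy to pin down in code. A minimal sketch with invented names – `Post`, `filterPosts`, and the sample data are all hypothetical:

```typescript
// Hypothetical post shape -- stands in for the real blog data model.
interface Post { category: string; tags: string[]; }

// AND semantics: a post must match BOTH the category and the tag.
function filterPosts(posts: Post[], category: string, tag: string): Post[] {
  return posts.filter(p => p.category === category && p.tags.includes(tag));
}

const posts: Post[] = [
  { category: "ai", tags: ["claude"] },   // matches both
  { category: "ai", tags: ["vue"] },      // category only
  { category: "web", tags: ["claude"] },  // tag only
];

// With AND, only the first post qualifies.
// An OR reading (|| instead of &&) would return all three.
console.assert(filterPosts(posts, "ai", "claude").length === 1);
```

Stating the expected counts in the request ("this sample data must yield exactly one post") would have surfaced the ambiguity before any code was written.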

What these failures have in common

When I look back at my worst Claude Code sessions, I see a pattern:

The AI didn't fail because it's bad. It failed because context was missing – either in my request, in my documentation, or in my review process.

Implicit knowledge about my project doesn't exist for the AI. What I find "obvious" because I've been working with this codebase daily for months is only visible to Claude Code if it's documented or directly derivable from the code.

Confident output is not a quality signal. That's perhaps the most important insight. The AI sounds roughly equally certain at all times. Whether it's doing a trivial formatting task or inventing an API – the tone stays similar. My skepticism must not depend on the tone.

My requests were imprecise. At least half of the described failures could have been prevented with a more precise prompt. "Clean this up" is not a specification. "Implement the filter" is not a specification. That's a wish. A specification defines what must be true at the end.

The developer stays responsible

I review Claude Code output differently than before. Not more superficially – more thoroughly, just along different dimensions.

Syntax? Usually correct. Types? Usually correct. Logic in non-trivial cases? I read more slowly. External dependencies? Cross-check against the docs. Implicit behaviour expectations? Formulate them explicitly before the AI starts.

The speed Claude Code gives me is real. The mistakes that can happen along the way are too. Knowing that and working with it saves time. Ignoring it and blindly accepting output will eventually cost you a night with a bug that a two-minute review would have prevented.

AI is not an autopilot. It's a very capable copilot that believes everything you tell it – including the things you forgot to say.

$ whoami

Christopher Groß

Fullstack Developer & AI Orchestrator from Hamburg

Christopher Groß has been building web applications for startups and agencies for over 20 years. His focus is on Vue.js, Nuxt, and AI-powered development. He believes in clean code, clear specs, and coffee in large quantities.

Want more?

Want to know how I work and what drives me? Reach out – 30 minutes, a real conversation about tech, AI and projects.

Book a call

Write to me directly: [email protected]

Response within 24h
Relaxed conversation
No commitment