Building a CLAUDE.md for a Legacy Rails App: A Field Report

What we learned modernising a 15-year-old courier system for the age of AI agents - with the exact file structure, conventions, and test results we used.

This is a practical follow-up to Coding For Agents: Why Contextual Readiness is the New Technical Debt, where we made the strategic case for in-repo metadata. Here we show the fieldwork.

The Repo That Taught Us This Lesson

Most articles about CLAUDE.md show you a clean example file and tell you to adapt it to your project. That works fine if your project is a greenfield Next.js app with six folders. It does not work so well when the repo you’re modernising has been in production since Rails 1.0, has survived many major framework upgrades, and contains a /lib/utils directory that everyone on your team is afraid to touch.

One of our long-standing clients operates a courier platform handling bookings, shipping updates, invoicing, and integrations with Shopify and Amazon. The codebase has seen a steady stream of upgrade projects over the years, and when the time came to bring it up to Rails 8, we used the opportunity to make it agent-ready.

What follows is the exact process we used, the structure we landed on, and the before-and-after results that convinced the rest of the team this was worth the investment.

CourierHub (not its real name) has been operational since 2004, with a codebase of 50,000+ lines spread across 60 models and 10 services.

Step 1: Audit Before You Write

The temptation is to open a blank file and start writing everything you know about the project. Resist. The most useful CLAUDE.md is built around the gaps in your project that an agent is actually likely to hit, not around all the product knowledge a senior developer has accrued.

We spent the first hour of our upgrade project doing nothing but auditing. Specifically, we walked the repo asking five questions:

Where would a new developer get lost? These are the directories, modules, or concepts that don’t self-explain from code alone.
What’s the difference between what exists and what we want? Which folders should be deprecated going forward? Which patterns should we migrate away from?
What naming conventions do we follow - and which do we break? Conventions evolve over time. Agents will follow patterns they can see, but they can’t see rules that aren’t written down.
Which side effects aren’t visible from the code? Background jobs, callbacks, external API calls, cross-module notifications - simple for a developer to follow across files, difficult for an agent to infer.
What are the forbidden zones? Places new code must never go, even if the agent thinks it’s a logical fit. For example, we didn’t want any new code going into /lib/utils.

In our courier system, the audit exposed three dominant pain points: the shipment lifecycle (which crossed multiple bounded contexts in ways no single file made obvious), the partner integration layer (where legacy direct-API-call patterns sat next to newer integration services), and a set of legacy /lib modules for generating printable invoices and shipping labels that we were actively migrating out.

Those three areas became the backbone of our CLAUDE.md. Everything else was secondary.
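Part of the fourth audit question (side effects that aren’t visible from the code) can be roughed out mechanically before you start reading. The sketch below is a hypothetical helper, not the tooling we used: the method name, the signal labels, and the regular expressions are all assumptions, and the patterns are crude heuristics that will miss some cases and over-match others.

```ruby
# Hypothetical audit helper. Labels and patterns are illustrative assumptions,
# not an exhaustive list of how side effects can be triggered in a Rails app.
SIDE_EFFECT_SIGNALS = {
  "ActiveRecord callback"  => /^\s*(?:before|after|around)_(?:save|create|update|destroy|commit)\b/,
  "background job enqueue" => /\.perform_later\b/,
  "direct HTTP client"     => /\b(?:Net::HTTP|Faraday|HTTParty)\b/
}.freeze

# Scans every .rb file under root and returns a hash of
# signal label => array of "relative/path.rb:line_number" strings.
def audit_side_effects(root)
  findings = Hash.new { |hash, label| hash[label] = [] }
  Dir.glob("**/*.rb", base: root).sort.each do |relative_path|
    File.foreach(File.join(root, relative_path)).with_index(1) do |line, number|
      SIDE_EFFECT_SIGNALS.each do |label, pattern|
        findings[label] << "#{relative_path}:#{number}" if line.match?(pattern)
      end
    end
  end
  findings
end

# Example use from the repo root:
#   audit_side_effects(Dir.pwd).each { |label, hits| puts "#{label}: #{hits.size}" }
```

The output is only a starting list. The judgement about which of those locations deserve a line in CLAUDE.md still has to come from someone who knows the system.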

Step 2: Structure the File Around Mental Models, Not Folders

The single most common mistake I see in CLAUDE.md files is leading with the directory tree. Agents can read folder structures on their own - that’s not where they need help. What they can’t infer is why the code is organised the way it is, which concepts map to which folders, and what unspoken rules there may be.

Here’s the structure we landed on for our primary metadata file, stripped to the essentials. Each section has a specific job:

# CourierHub - Agent Guide

## Mental Model
CourierHub manages the lifecycle of a shipment from booking
through delivery confirmation. The domain splits into three
bounded contexts: Booking, Transit, and Arrival.

## Where Things Live
- Booking logic: /app/services/booking/
- Transit tracking: /app/services/transit/
- Partner integrations: /app/integrations/ (Shopify, Amazon)
- DEPRECATED: /lib/utils/* - migrating to /app/*/

## Conventions
- Service objects end in _service.rb and expose a `call` method
- Formatters end in <type>_formatter and expose a `print` method
- Background jobs live in /app/jobs/ and are named <Action>Job
- All external API calls go through /app/integrations/, never direct

## Don't
- Don't add new code to /lib/utils/
- Don't call partner APIs directly from controllers
- Don't use ActiveRecord callbacks for cross-context side effects

A few notes on why each section exists:

The Mental Model comes first because it gives the agent the conceptual frame to interpret everything else. An agent that knows how CourierHub is contextually organised will place new code in the right context even when the file it’s editing doesn’t make that organisation explicit.
The Where Things Live section maps concepts to paths. The DEPRECATED line is doing a lot of work here - it tells the agent not just where each context lives, but what’s being phased out.
Conventions captures the patterns that live in the team’s head but not in the linter. Without these, agents will invent their own conventions and scatter them across the codebase.
Don’t is the highest-leverage section. These are the mistakes we watched our agents make repeatedly before we wrote them down. Each rule is a lesson we learnt alongside our agents.

In other projects we’ve used the Don’t block to avoid dependency conflicts, security gaps (“Don’t use html_safe”) and N+1 queries (quite literally: “Don’t perform N+1 queries” prompts the agent to consider .includes or .preload).
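To make the Conventions section concrete, here is a minimal sketch of what code following the first two rules might look like. Only the conventions themselves (a `_service.rb` file exposing `call`, a `<type>_formatter` exposing `print`) come from the guide. The class names, arguments, file paths in the comments, and return values are hypothetical and are written in plain Ruby so the example runs without Rails.

```ruby
# Hypothetical classes illustrating the naming conventions; not CourierHub code.

# Would live at app/services/booking/create_booking_service.rb
module Booking
  class CreateBookingService
    def initialize(sender:, recipient:)
      @sender = sender
      @recipient = recipient
    end

    # Single public entry point, per the service-object convention.
    def call
      { sender: @sender, recipient: @recipient, status: "booked" }
    end
  end
end

# Would live at app/formatters/invoice_formatter.rb
class InvoiceFormatter
  def initialize(booking)
    @booking = booking
  end

  # Single public entry point, per the formatter convention.
  def print
    "Invoice: #{@booking[:sender]} -> #{@booking[:recipient]} (#{@booking[:status]})"
  end
end

booking = Booking::CreateBookingService.new(sender: "Acme", recipient: "Globex").call
puts InvoiceFormatter.new(booking).print
# prints "Invoice: Acme -> Globex (booked)"
```

The value of writing the convention down is that an agent asked for a new formatter has one predictable shape to copy, rather than inferring a shape from whichever file it happened to open.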

Step 3: Test It With a Real Task, Measure the Difference

The only way to know if your CLAUDE.md is working is to use it on a real task and compare the output to what your agent delivered previously. We ran three representative tasks through our agent both with and without the metadata file present. The differences were considerable.

Task 1: Add a new partner integration for a third logistics provider. Without CLAUDE.md, the agent created the integration file in /lib/utils and called the partner API directly from the controller. With CLAUDE.md, it correctly placed the integration in /app/integrations/ and routed the call through the existing integration service.

Task 2: Modify the shipment status update flow to trigger a new customer notification. Without CLAUDE.md, the agent added an ActiveRecord callback to the Shipment model - exactly the pattern our Don’t rules now forbid. With CLAUDE.md, it created a dedicated notifier in the Arrival context and wired it through the existing background job.

Task 3: Refactor the PDF invoice generator. Without instructional metadata, the agent edited the deprecated /lib/utils file in place. With metadata, it recognised the deprecation note, scaffolded a new formatter in /app/formatters/, and followed our naming convention perfectly.

The PDF refactor was the starkest example. Before we added our agent guidelines, the generated code stayed bound to the existing deprecated context: it stuck with an outdated library and struggled to generate clean PDF output. In the second run the agent not only took note of our deprecation instructions, it also made a dependency jump to a newer library, which resulted in a significantly more future-proof solution.

The pattern was consistent: without the metadata, the agent made predictable mistakes that a senior developer would have to catch in review. With it, much of the first draft was close to ship-ready, and our review became a sanity check rather than a rewrite.
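For readers who want to picture the Task 2 outcome, the sketch below shows the general shape of an explicit side effect: a service in the Arrival context enqueues the notification itself instead of relying on a model callback. It is an illustration under stated assumptions, not the code the agent produced. `Shipment` is a plain Struct standing in for the ActiveRecord model, `NotifyCustomerJob` is a stand-in that only mimics the shape of ActiveJob’s `perform_later`, and every class name is hypothetical.

```ruby
# Stand-ins so the sketch runs without Rails; all names are hypothetical.
Shipment = Struct.new(:id, :status)

class NotifyCustomerJob
  # In-memory queue used in place of a real job backend.
  def self.queue
    @queue ||= []
  end

  # Mimics the shape of ActiveJob's perform_later by recording the arguments.
  def self.perform_later(shipment_id, status)
    queue << [shipment_id, status]
  end
end

module Arrival
  class UpdateShipmentStatusService
    def initialize(shipment, new_status)
      @shipment = shipment
      @new_status = new_status
    end

    # The side effect is enqueued here, in plain sight, rather than being
    # hidden in an after_save callback on the model.
    def call
      @shipment.status = @new_status
      NotifyCustomerJob.perform_later(@shipment.id, @new_status)
      @shipment
    end
  end
end

shipment = Shipment.new(42, "in_transit")
Arrival::UpdateShipmentStatusService.new(shipment, "delivered").call
p NotifyCustomerJob.queue
# prints [[42, "delivered"]]
```

Anyone reading the service can see that a status change triggers a notification. With the callback version, that fact lives on the model and is easy to miss from the call site.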

Step 4: Treat It As Living Infrastructure

A CLAUDE.md is not a set-and-forget file. Architecture evolves, conventions shift, and new failure modes emerge every time you use a more capable model. The teams that get the most out of their metadata treat it the same way they treat any other piece of infrastructure: versioned, reviewed, and maintained.

Three practices that have worked for us:

Review CLAUDE.md changes during PRs alongside the code changes that prompted them. If a new pattern is introduced, the metadata should be updated in the same pull request.
Add a “Don’t” rule every time an agent makes the same mistake twice. The first time may be a one-off. The second is a pattern, and patterns belong in the metadata.
Audit with every minor version bump. This tends to mean that roughly every 3–6 months, while working on the minor upgrade, we read the file end to end and ask whether it still reflects the architecture. Deprecated sections should be removed, not just annotated.
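Some “Don’t” rules are mechanical enough to double as an automated check, which keeps the written rule and the enforced rule from drifting apart. The following is a hypothetical sketch rather than part of our pipeline: the `DontRule` struct, the `dont_violations` method, and the idea of passing in a hash of changed files are all assumptions, and the matchers are deliberately naive.

```ruby
# Hypothetical check that mirrors two "Don't" rules from the guide.
# Each rule pairs the guide's wording with a naive matcher over a changed file.
DontRule = Struct.new(:message, :matcher)

DONT_RULES = [
  DontRule.new(
    "Don't add new code to /lib/utils/",
    ->(path, _source) { path.start_with?("lib/utils/") }
  ),
  DontRule.new(
    "Don't call partner APIs directly from controllers",
    ->(path, source) { path.start_with?("app/controllers/") && source.match?(/\bNet::HTTP\b/) }
  )
].freeze

# changed_files: hash of repo-relative path => file contents for a change set.
# Returns one "path: rule" string per violated rule.
def dont_violations(changed_files)
  changed_files.flat_map do |path, source|
    DONT_RULES.select { |rule| rule.matcher.call(path, source) }
              .map { |rule| "#{path}: #{rule.message}" }
  end
end

puts dont_violations(
  "lib/utils/label_helper.rb" => "def label; end",
  "app/services/booking/quote_service.rb" => "def call; end"
)
# prints "lib/utils/label_helper.rb: Don't add new code to /lib/utils/"
```

Adding a rule then means adding one entry to the list next to the sentence in CLAUDE.md. Rules that need judgement, such as the one about callbacks crossing contexts, are better left to human review.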

The Investment That Keeps Paying

A solid CLAUDE.md file takes an afternoon to implement. In return, every agent session on that repo becomes faster, more accurate, and cheaper in correction time. For a legacy Rails application, where the architecture can carry years of implicit knowledge, the ROI on that afternoon of effort is enormous.

Contextual readiness isn’t a luxury you add after the refactor.

It’s the file you write before you let an agent touch the code.

For the broader strategic case behind this work, see the companion piece: Coding For Agents: Why Contextual Readiness is the New Technical Debt.

See the original post on Matt's website: tiltedsky.net