Building Trafford
A Desktop Agent OS for Knowledge Work
The Setup
Background & hypothesis
8 years in product — Amazon, startups, co-founded an AI startup on LangGraph
Claude Code as my operating system for 6 months — glimpses of tool → teammate
7 weeks building, dogfooding, and testing
The hypothesis
The models are already capable enough. But building production agent systems carries a massive infrastructure tax — structured memory, tool orchestration, multi-model routing, agent coordination — that only well-resourced engineering teams can actually afford to pay. Non-technical knowledge workers and small teams are locked out entirely. The missing layer isn't better AI — it's the management infrastructure that makes agents actually useful for the people who need them most.
Five Questions
Before I started building, I wanted to clarify my goals. I wasn't aiming to build a set of features; I wanted to build and use a multi-agent system for myself, to learn more about five questions.
1. How do you create agents that are actually good?
Today the answer is binary: well-resourced engineering teams can build world-class agents, while non-technical users are limited to a bot that summarizes meeting notes into Notion.
2. How do you make agents remember like a team of infinite human specialists?
Most implementations recall conversations but lose what they learned from them. The hard part isn't forgetting — it's remembering the wrong things.
3. How do you mimic the magic of cross-functional collaboration with agents?
Different models seem to have genuinely different reasoning strengths — which makes me think there's something worth exploring in how they might complement each other.
4. How can AI make context switching seamless instead of painful?
Context switching isn't just friction — it's a tax that compounds through the day.
5. How do you build the right workspace for different types of knowledge work?
For most knowledge work that isn't code, the workspace shouldn't be just a terminal or a chat window.
How I Went About It
I'm not a developer by training. I'm a PM who has been building production systems with Claude Code and Codex for the better part of a year.
Before writing any code or requirements documentation, I spent a day setting up my product development system: five well-tested Claude Code skills; instructions for how a feature flows from idea to deployed code, including ticket types and structure in Linear; testing infrastructure; documentation templates; coding agent roles (Claude Code, Codex); and a requirements-gathering protocol so that Claude Code could ask me a holistic set of questions about what needed to be built and produce an implementation plan accordingly.
From there I worked in two-day sprints against a living strawman spec, with research spikes wherever uncertainty was high, revisiting the strawman at the end of each sprint based on what I'd learned, and moving towards target milestones with lots of intermediate testing.
What Trafford Is
Work with your agents across surfaces. Always in sync.
Specialist agents that get better the more you work together.
Each agent has
Iris creates agents from your conversation + platform capabilities. Agents share context automatically. You review. You approve. Not a black box.
What powers the team.
Backend Infrastructure
Three Deep Dives
Iris — Agent Architect
Trafford's agent architect. Describe what you need in conversation — system prompt, skills, tool access, personality, anti-patterns all come from that conversation, not a form.
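The pieces Iris derives from a conversation can be pictured as a small spec object. A minimal sketch, assuming a schema I made up for illustration (field and function names are hypothetical, not Trafford's actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """Hypothetical shape of what Iris extracts from a conversation."""
    name: str
    system_prompt: str
    skills: list[str] = field(default_factory=list)
    tool_access: list[str] = field(default_factory=list)
    personality: str = ""
    anti_patterns: list[str] = field(default_factory=list)  # behaviors to avoid
    approved: bool = False  # the user reviews and approves before activation

def approve(spec: AgentSpec) -> AgentSpec:
    # Nothing goes live without explicit review: not a black box.
    spec.approved = True
    return spec

# Example: a spec assembled from a conversation about prospect research.
researcher = AgentSpec(
    name="research-analyst",
    system_prompt="You research prospects and summarize findings with sources.",
    skills=["web-research"],
    tool_access=["search", "crm.read"],
    anti_patterns=["speculating beyond cited sources"],
)
assert not researcher.approved   # drafted, but not yet active
approve(researcher)
assert researcher.approved       # active only after review
```

The point of the sketch is the shape, not the fields: everything an agent is comes out of the conversation, and the approval flag is the user-facing gate.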
Memory Pipeline
5-stage pipeline adapted from Mastra (94.87% LongMemEval). Background processing on Qwen 3.5. Decisions supersede, observations propagate across agents.
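The two semantics called out above, decisions supersede while observations propagate, can be sketched in a few lines. This is my own illustrative toy, not Mastra's or Trafford's implementation; the `Memory` type and both functions are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    kind: str      # "decision" or "observation"
    topic: str
    content: str
    superseded: bool = False

def supersede(store: list[Memory], new: Memory) -> list[Memory]:
    """A newer decision on the same topic marks older decisions stale."""
    if new.kind == "decision":
        for m in store:
            if m.kind == "decision" and m.topic == new.topic:
                m.superseded = True
    return store + [new]

def propagate(memory: Memory, agents: dict[str, list[Memory]]) -> None:
    """Observations are shared into every agent's store."""
    if memory.kind == "observation":
        for agent_store in agents.values():
            agent_store.append(memory)

store: list[Memory] = []
store = supersede(store, Memory("decision", "pricing", "charge per seat"))
store = supersede(store, Memory("decision", "pricing", "charge per usage"))
active = [m for m in store if not m.superseded]
assert len(active) == 1 and active[0].content == "charge per usage"

agents = {"sales": [], "research": []}
propagate(Memory("observation", "prospect", "they just raised a Series B"), agents)
assert len(agents["sales"]) == len(agents["research"]) == 1
```

The distinction matters because the two failure modes differ: stale decisions should be retired, while observations are cheap to share and only lose value when siloed.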
Multi-Agent Reasoning
Three modes of collaboration — 1:1, group threads, and council sessions. Testing where multi-model diversity genuinely adds value vs. where a single strong model is enough.
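The council mode, and the question of when multi-model diversity earns its cost, can be sketched as a simple fan-out. The "models" here are stand-in lambdas, not real API calls, and the synthesis step is deliberately left as a placeholder:

```python
def council(question, models):
    """Fan a question out to several models and collect independent takes."""
    answers = {name: fn(question) for name, fn in models.items()}
    # In a real system a stronger model (or a vote) would synthesize the takes;
    # here we only surface whether the models actually disagreed.
    diverse = len(set(answers.values())) > 1
    return answers, diverse

# Stub models standing in for different providers.
models = {
    "model_a": lambda q: "build",
    "model_b": lambda q: "buy",
    "model_c": lambda q: "build",
}
answers, diverse = council("Should we build or buy this component?", models)
assert diverse  # disagreement is the signal that diversity may be adding value
```

One plausible heuristic falls out of the sketch: when the council converges immediately, a single strong model would likely have sufficed; sustained disagreement is where the extra model calls might pay for themselves.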
Architecture Explorer
Those three systems sit inside a larger architecture. Here's the full picture — click into any node to see the implementation detail.
How I Built It + What I Learned
Seven weeks, two interleaved pathways, and Claude Code with five custom skills.
The Build Process
The Toolkit
5 custom skills
Each skill has protocols for how it takes an idea to implementation-ready work. A lot of critical thinking happens before anyone writes code.
What I Learned
Vibe coding is a recipe for disaster.
The most instructive failure pattern: skipping the thinking pathway. Whenever I skipped architecture and planning, whenever acceptance criteria were loose and I hadn't thought through implications — that's where things broke. AI amplifies whatever you feed it. Feed it rigor, you get rigorous output. Feed it vibes, you get expensive garbage.
Building is cheap. Knowing what to build is expensive.
With AI agents, the cost of building a feature drops dramatically. What matters is vision discipline — knowing WHAT to build, having tested your assumptions, maintaining a spec that reflects real needs. Building the wrong thing fast is worse than building the right thing slowly. That inverts the traditional PM instinct to cut scope.
Infrastructure compounds.
Skills, wiki, ticket breakdown process, testing protocols — it all felt like overhead at week 1. Paid for itself by week 4. Essential by week 6. I spent about a day setting up skills and skill protocols. That day paid for itself many times over. Three days would have compounded even more.
Honest assessment
Trafford is not ready for external users yet. That's not the current priority. I'd rather be honest about where it is. If I rebuilt from scratch with everything I know now, the architecture would be tighter. But the point wasn't perfect code — it's validated decisions and a deep understanding of what matters.
A Glimpse Into How Trafford Works + Where It's Going
The ask
"Prepare outreach briefs for 15 people I want to reconnect with."
11 of 15 briefs included existing touchpoints. Four were flagged as former teammates I hadn't realized were in my network.
The ask
"Write a customer proposal for [prospect] based on our discovery call notes."
Final draft used the prospect's own terminology and referenced their recent company moves. One round of my edits, then ready to send.
The ask
"I found this open-source n8n workflow for competitive analysis. Can we integrate it?"
New competitive analysis skill running in under an hour. Existing skills tested and confirmed working.
What's occupying my mind
Voice AI — Newer interaction experiences beyond text — speech-to-text workflows, but also low-latency voice agents with strong contextual grounding. Eventually, something that plugs into Trafford natively.
Local models — Open-weight models running locally for speed, privacy, and cost.
Managed agents as MCP servers — Agents that expose their capabilities as tools for other agents — like SaaS tools, but for AI. Capabilities can be dynamic and might even improve as more people use the same agent.
SDK independence — Anthropic Agent SDK is the foundation now. Long-term: the architecture should be portable across SDK changes.
Cost as a design constraint — Cost-per-experiment determines how many hypotheses you can test. Model routing by complexity — not everything needs frontier models.
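Routing by complexity can be sketched as a tiered table: pick the cheapest model whose ceiling covers the estimated difficulty of the task. The tiers, thresholds, and model names below are illustrative assumptions, not Trafford's routing table:

```python
# Hypothetical tiers: (complexity ceiling, model tier), cheapest first.
ROUTES = [
    (0.3, "local-small"),  # trivial: classification, reformatting
    (0.7, "mid-tier"),     # routine drafting and summarization
    (1.0, "frontier"),     # multi-step reasoning, high-stakes output
]

def route(complexity: float) -> str:
    """Return the cheapest tier whose ceiling covers the estimated complexity."""
    for ceiling, model in ROUTES:
        if complexity <= ceiling:
            return model
    return ROUTES[-1][1]  # out-of-range estimates fall back to the top tier

assert route(0.1) == "local-small"
assert route(0.5) == "mid-tier"
assert route(0.95) == "frontier"
```

The design consequence is the one stated above: a lower cost per experiment directly raises the number of hypotheses you can afford to test.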
Autonomous lab — Trafford as a self-improving system — autonomously benchmarking new models, testing new skills, evaluating new agent configurations. A laboratory that runs experiments while I sleep.
Building Trafford has been a journey of discovery in more ways than one. I've learned a lot about agent building, about AI-native product development, and, frankly, about what constitutes knowledge work itself, something many of us have taken for granted for decades.