The system I couldn't build until AI let me
Reading time: 8 minutes
Tags: AI, Design Systems, Claude Design
The intuition
The idea wasn't original. I want to say that upfront, because the value of this case isn't in inventing something — it's in following an instinct long enough to do something about it.
Around 2023, working on a SaaS platform for OTA vehicle updates, I built a design system from scratch to 315 components (more on that here). It worked. It scaled. But every time I handed off a component to a developer, I felt the same friction: the Figma file said what the component was, not what it did. The state machine — hover, focus, loading, error, success — lived in a separate document, in a Loom recording, in a paragraph of release notes. Three sources, never in sync.
I started sketching what I wanted instead. A design system where every component carried its own behavior, its own motion, its own state transitions. Not a static reference — a living one. The component as a small, self-contained instrument that played itself when you opened the page.
In May 2025, Google announced Material 3 Expressive. Springy animations, motion physics as a first-class citizen, components that behaved instead of just appearing. I read the announcement and felt something I rarely feel reading product news: validation. I wasn't crazy. The intuition was right. Google had infinitely more resources than I did, and they were betting on the same idea.
The difference was that they could just build it. I couldn't — not yet.
The tools that almost worked
Between 2024 and 2025, I tested four directions. Each one got closer. None of them got there.
Lottie was the first stop. Brilliant for exporting animations from After Effects, mature ecosystem, lightweight runtime. But Lottie wasn't built for design systems — it was built for animations. A component isn't a loop; it's a state machine. Lottie can play "loading," but it can't model "loading transitions to error if the request fails." It animates; it doesn't behave. (Lottie has since added state machines, but the workflow still treats motion as the unit, not the component.)
Rive got the concept right. State machines as the primary abstraction, exactly the model a design system needs. But Rive's center of gravity is gaming and brand animation, not product UI. Setting up a button with realistic loading, focus, and error states felt like building a character rig for a 2D platformer. Powerful, but the tool wasn't asking the questions a product designer asks.
Storybook + React + Framer Motion is the textbook answer. It's what serious design systems use. Linear, Vercel, Stripe — they all live somewhere in this stack. But this stack assumes you can write React. Components, props, hooks, motion variants, build configs. I'm a designer. I can read code, sketch with it, modify it. I can't write a Storybook from scratch and maintain it in production. Every time I opened the docs I was learning a new language to describe components I already knew how to design. The barrier wasn't the system — it was the writing.
Claude Code + Cursor, then Claude Code + VS Code, then Claude Code + Figma MCP. This is where things got interesting. Vibe coding through prompts could generate scaffolding faster than I could write it. The Figma MCP roundtrip — designs flowing into prompts, code flowing back — was genuinely magical for short bursts. But I was still living inside an IDE. The interface assumed I was a developer who happened to use AI. It met me as a guest in someone else's house.
Every solution betrayed the same premise: it asked the designer to become a developer first, then design. I needed something that met me where I already was.
Building Amaca
The decisions I kept for myself — the ones I never delegated — were the ones a tool can't make:
The palette is magenta and obsidian. Magenta because it's the color most underused in serious B2B design systems, and I wanted Amaca to feel like neither Linear nor Stripe. Obsidian because dark surfaces let the magenta accent carry the entire signal load — one color shouting in a quiet room. I rejected three blue palettes Claude Design proposed before settling here. The tool was right that blue was safer; I needed it not to be safe.
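A palette like this boils down to a handful of custom properties. The token names and hex values below are my own illustration, not Amaca's actual tokens:

```css
/* Hypothetical sketch of a magenta-on-obsidian token layer.
   Names and values are illustrative, not Amaca's real tokens. */
:root {
  /* Obsidian surfaces: near-black steps, never pure #000 */
  --obsidian-900: #0b0b0f;
  --obsidian-800: #14141a;
  --obsidian-700: #1e1e26;

  /* One accent carries the entire signal load */
  --magenta-500: #e0218a;
  --magenta-400: #ea4fa3;

  /* Semantic aliases point at the raw scale */
  --surface: var(--obsidian-900);
  --surface-raised: var(--obsidian-800);
  --accent: var(--magenta-500);
}
```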
The typeface is Satoshi, used at every scale. Single typeface systems are a discipline — you give up some flexibility for unmistakable identity. Light through Black, with monospace as the only companion, reserved strictly for metadata. The decision was three lines in a brief. The execution was three days of token tuning.
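The token side of a single-typeface decision is small; the discipline is in reusing one family token everywhere. A minimal sketch, assuming Satoshi is loaded from Fontshare, with every name and value here being illustrative rather than Amaca's actual scale:

```css
/* Single-typeface sketch: one family, weight and size do the differentiating.
   Scale values are illustrative. */
:root {
  --font-sans: "Satoshi", system-ui, sans-serif;
  --font-mono: ui-monospace, monospace; /* reserved strictly for metadata */

  --text-sm: 0.875rem;
  --text-base: 1rem;
  --text-xl: 1.5rem;
  --text-display: 3rem;
}

body { font-family: var(--font-sans); font-weight: 400; }
h1   { font-size: var(--text-display); font-weight: 900; } /* Black */
code { font-family: var(--font-mono); font-size: var(--text-sm); }
```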
Motion runs through the whole system. Every component, every section, every transition follows the same rhythm. Ten years of motion design taught me that an interface with five different easings reads as five different products. Amaca's motion is consistent, curated, and intentional — a single instrument played at different durations.
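One way to enforce a single rhythm in vanilla CSS is to expose exactly one easing curve and a short menu of durations, so components vary in tempo but never in instrument. A hedged sketch with illustrative values:

```css
/* One easing curve system-wide; only duration varies per component.
   Curve and durations are illustrative, not Amaca's actual values. */
:root {
  --ease-system: cubic-bezier(0.22, 1, 0.36, 1);
  --duration-fast: 150ms;
  --duration-base: 300ms;
  --duration-slow: 500ms;
}

.button { transition: background-color var(--duration-fast) var(--ease-system); }
.card   { transition: transform var(--duration-base) var(--ease-system); }
```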
The decisions I delegated to Claude Design were the ones a designer's time is wasted on:
Token generation across spacing steps and obsidian shades. CSS variable consistency across the whole system. First drafts of component documentation. Refactor passes when I changed naming conventions halfway through. Tabular comparisons between similar variants. Every kind of work that has a right answer and a tedious path to it.
The workflow that emerged was a loop, and it came directly from how Anthropic teaches you to work with their models: give context, converse, refine. Not one-shot prompts.
I describe what I want for one section, with as much constraint as I can pre-load
Claude Design generates a first version — usually 70% there
I edit manually where the prompt over-shot, comment inline where the structure was off
I ask Claude Design to apply the corrections across the rest of the system
I verify consistency, validate against the rest of the system, then move to the next section
Granular, conversational, sectional. Each pass tightens a smaller surface than the last.
The system itself is deliberately minimal under the hood. No framework, no build step, no dependencies to audit. Static HTML, vanilla CSS, design tokens as custom properties, Satoshi from Fontshare, Motion One for orchestrated animations, Lottie for the brand mark. Hosted on Vercel, DNS via Cloudflare. The whole thing lives publicly on GitHub — every release tagged, every change documented. Two releases a week, on average.
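Under those constraints, theming is just custom properties read directly by component classes, with no preprocessor or build step in between. An illustrative fragment, assuming invented class and token names rather than Amaca's real ones:

```css
/* No-framework pattern: components consume tokens directly.
   Class names, token names, and fallback values are illustrative. */
.btn-primary {
  background: var(--accent, #e0218a); /* fallback keeps the component usable standalone */
  color: var(--surface, #0b0b0f);
  border-radius: 8px;
  padding: 0.625rem 1.25rem;
}

.btn-primary:hover { filter: brightness(1.1); }

.btn-primary:focus-visible {
  outline: 2px solid var(--accent, #e0218a);
  outline-offset: 2px;
}
```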
What AI did well, what it didn't
I want to be specific here, because most case studies about AI fall into one of two traps: hype or dismissal. Neither is useful.
Claude Design was excellent at:
Token generation under tight constraints — once I defined the magenta scale and the obsidian range, it could produce consistent semantic mappings (success, warning, danger, info) that fit the system's logic.
Refactoring repeated patterns — when I renamed a token convention from accent-500 to magenta-500, the change propagated cleanly.
Documentation tables — usage matrices, do/don't pairs, and breakpoint specs were faster to generate than to write from scratch.
Variant generation — five button variants with three sizes and four states is exactly the kind of combinatorial work the AI handles well.
First drafts of HTML and CSS structure that I could then edit in place.
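The semantic-mapping work described above follows a simple pattern: roles alias raw scale steps, so a rename like accent-500 to magenta-500 only touches one layer. A sketch with invented values, not the system's actual mappings:

```css
/* Semantic roles alias raw scale tokens. Renaming the raw layer
   (e.g. --accent-500 → --magenta-500) leaves every consumer intact.
   All values here are illustrative. */
:root {
  --magenta-500: #e0218a;

  --color-info:    #4f8ef7;
  --color-success: #2fbf71;
  --color-warning: #f5a623;
  --color-danger:  #e5484d;

  --color-accent:  var(--magenta-500);
}
```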
It was poor at:
Editorial hierarchy. A page with many sections and dense micro-headings needs a sense of which thing is more important than which other thing. The AI gave me a flat hierarchy by default; I had to impose verticality manually.
Brand decisions. The first three palettes Claude proposed were defensible and forgettable. Taste lives outside the model's distribution — or rather, the model's distribution is the average, and brand is the deviation.
Accessibility edge cases. Contrast ratios at the boundary of WCAG AA, focus states that work for keyboard navigation but not for screen readers — I caught issues the AI didn't surface (and I caught them, in part, because I'd done the studying to know what to look for).
Precise undo. Statistical models don't regenerate identically. Iterating on a section twice produces three slightly different versions, none of which is the "previous" one. Experimenting costs tokens and precision.
Refining individual components is harder than building them. Generating a button from scratch is fast. Refining a button that's almost right but not quite is where the workflow strains. Sometimes Claude Design loops on the same change as if it doesn't fully understand the request — three passes, three near-identical outputs, none of them the one I asked for. Sometimes it fixes one thing but breaks another, and in a system of many interconnected components, the regression isn't always immediately visible. The bigger the system, the higher the cost of catching it.
The model sees code, not pixels. This one took me a while to notice. Claude Design reads the structure of what it generates — the markup, the tokens, the CSS — but it doesn't see the rendered output the way I do. So sometimes there's a small mismatch between what I'm looking at on screen and what the model thinks it produced. I describe a visible problem, the model reads its own code and tells me everything is correct. Both of us are right, in our own frames. Closing that gap means becoming very specific in the way you describe what you see — almost like writing a bug report to yourself.
The takeaway, which I'd write on a wall if I had one: the tool didn't replace what I knew — it amplified it.
The trade-offs
In the spirit of "name the dead ends" — three things I'm still navigating.
Cost. Claude Design runs on subscription usage limits. Even on the Max plan, an intense session can burn through allowances faster than expected. This isn't yet a workflow for an 8-hour design day without thinking about the meter.
No reliable undo. Statistical models regenerate, they don't recall. If I generate a card section, then ask for a variant, then decide I preferred the first, I can't reach back to it byte-for-byte. The closest I have is "regenerate something similar to what I had before," which is asymptotic at best. For experimental work, this is a real cost.
The seduction of the first output. Claude Design returns something that looks competent on the first try. There's a real psychological resistance to asking it to redo the work — it feels like rejecting a gift. The discipline of saying "this is fine, but it isn't right" is a learned one. I'm still learning it.
What I learned about working with AI
Five things, in the order I learned them.
The prompt is the design brief. Every skill that lets me write a clear brief for a developer translates directly to writing a clear prompt for an AI. The designers who'll struggle most with AI are the ones who can't articulate their own intent.
Iterating in conversation beats writing the perfect prompt. I spent the first day trying to write longer, more detailed prompts. The breakthrough was realizing Claude Design rewards quick prompt → real output → comment inline → corrected output, not long prompt → finished work. Treat it like a conversation with a junior designer, not a search engine. Section by section. Refine, validate, move on.
AI is a constraint amplifier, not a constraint generator. Give it a tight system to work within, and it executes faster than you can. Ask it to invent the system, and it averages.
Knowledge is the multiplier. Without the design experience to know what good looked like, and without the Anthropic Academy training to know how to talk to the model, this project would have taken months instead of days — or it wouldn't have existed at all. AI doesn't lower the floor; it raises the ceiling for people who already know how to climb.
The designer's job didn't shrink. It moved.
Built to be read by machines
Halfway through building Amaca, I noticed something happening in the design community I hadn't planned for: a new file format was quietly emerging. DESIGN.md — a structured, machine-readable description of a design system, written specifically so AI coding agents could read it and produce faithful output without further context.
Google Stitch shipped one. The community started collecting them. Anthropic's own Claude Design supports exporting them. By April 2026, the pattern wasn't novel anymore — it was becoming infrastructure.
So I added one to Amaca.
The reasoning was straightforward. A design system in 2026 has two readers: humans and machines. Both need to use it, and they need different things. Humans need narrative — context, reasoning, the warm prose that explains why a decision was made. Machines need rules — declarative, numerical, paired with do's and don'ts they can pattern-match against.
Trying to serve both with the same text fails both. So Amaca speaks in two registers.
The live site at amaca.design is editorial: written for a designer reading on a Tuesday afternoon, paragraphed and contextual. The DESIGN.md on GitHub is the same system, expressed differently: token by token, rule by rule, "use this for X, never for Y." One file, written for machines, but readable by humans who want the dense version.
I tested it the only way that matters. I gave the file to Claude Code with a prompt to build a screen the system didn't yet contain — a pricing page — and watched what came back. The first output was 70% there. The misses became new entries in the file's Don'ts section. By the third pass, Claude Code was producing surfaces that genuinely felt like Amaca, with no context beyond the DESIGN.md.
That's the test. Not "does the file exist," but "does an AI agent reading it produce work the system would accept?"
The file is published openly on GitHub, alongside the source of the live system itself. It's at v0.1 — the Don'ts list is intentionally short, designed to grow with every off-brand output an AI agent generates from it. That growth is part of the design.
A design system in 2026 has two readers. It has to speak both languages.
What's next
Amaca is at v1.2.6. The plan ahead, in priority order:
More animated, interactive components. The original intuition — components that demonstrate their own state machines, motion, and behavior. The reason I started building this in the first place. With Claude Design's hybrid editing, this is finally tractable for someone who isn't a React developer.
A pattern library for AI products. Chat interfaces, citation pills, model selectors, agentic flow indicators, fallback states. The components a designer needs to design for AI, not just with it.
That last point is the bridge to what's coming next. The next case study on this portfolio is about designing an AI tutor for designers — building Amaca taught me what I needed to know to design that one well.
Building with AI taught me what to look for when designing for AI.
Amaca is a personal, evolving project. This case study reflects the state of the system at v1.1.4 (April 2026). The system itself lives at amaca.design with the full source on GitHub.