Kaizen vs Red Team
I rebuilt our sales comp plan from scratch. Twice (heck, maybe more). Both times after I thought I was nearly done.
I’d been running kaizen loops on the plan for days. Multiple personas, fresh sessions, the whole workflow I wrote about before. Each pass made the plan sharper. Better incentive curves, cleaner accelerators, tighter language. And then I ran a red team against it and an agent playing a behavioral comp analyst took it apart in about two prompts. There was a glaring fatal flaw I’d been polishing right past.
So I rebuilt the plan and ran more kaizen on V2. It was tighter, and I felt good about it. Then I ran another red team, this time with an agent playing a recruiter, and got taken apart again from a completely different angle. I’d sharpened the wrong plan a second time. Ugh.
That’s when it clicked that I’d been running the wrong loop. Kaizen makes things better. Red team decides if they should exist. Different jobs, different prompts, and most of the time I’d been collapsing them into one.
Kaizen is the friendly loop. You’ve decided to do the thing. Now you want it sharper. So you build a room of collaborators (Oliver reading the brief, Liz reading the timeline, the customer reading the copy) and you ask them where it wobbles. Will this make sense to you? What would you push back on? What’s missing? The agents are on your side. They want the thing to work. You leave a kaizen loop with a sharper version of what you started with.
Red team is not friendly. Before you commit, you want someone trying to kill the idea, not improve it. What’s the strongest case against this strategy? What am I assuming that, if wrong, takes the whole thing down? Who would I least want to read this draft, and what would they say? The agents are not on your side. You want them looking for blood. You leave a red team loop with either conviction you’ve earned or a decision reversed.
The prompts are not interchangeable, and that’s where most people get into trouble. A kaizen prompt names a collaborator: act as Liz reading this brief. A red team prompt names an opponent: act as a behavioral comp analyst who has seen ten of these plans fail because the accelerators incentivize the wrong behavior. Attack this plan. Run a red team prompt with a collaborator persona and you get encouragement dressed up as feedback. Run a kaizen prompt with an adversary persona and you get sabotage. Both feel like signal. Neither is.
The constraints differ too. For kaizen, I ask for the smallest change, the lowest cost, the fastest test. That forces the loop to stay incremental. For red team, I ask the agent to find the single load-bearing claim and attack only that. Otherwise you get a tidy list of ten objections that nobody’s going to act on.
One move that’s pure red team: steelman first. Before the agents attack, I have them rebuild the strongest version of my position, sharper than what I wrote. Then they attack that. If the steelman falls, my actual draft was never going to survive. Kaizen doesn’t need this. You’re not trying to kill the thing.
So the sequence is red team first, kaizen second. If the work survives red teaming, run kaizen to sharpen it. If it doesn’t, scrap it and start over. Which is what I should have done with V1 of that comp plan instead of polishing it over and over and over again.
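To see the two prompts side by side, here is a minimal Python sketch. It only builds prompt strings. The function names and exact wording are illustrative assumptions, not a canonical template, and `survives_red_team` stands in for your own read of the red team output.

The red team prompt names an opponent, asks for a steelman first, and attacks a single load-bearing claim. The kaizen prompt names a collaborator and asks for the smallest, cheapest, fastest change.

```python
def red_team_prompt(opponent: str, draft: str) -> str:
    """Opponent persona: steelman first, then attack only the load-bearing claim."""
    return (
        f"Act as {opponent}. You are not on my side.\n"
        "1. Steelman: rebuild the strongest version of this position, "
        "sharper than what I wrote.\n"
        "2. Name the single load-bearing claim in your steelman.\n"
        "3. Attack only that claim. If it falls, say the whole thing falls.\n\n"
        f"DRAFT:\n{draft}"
    )


def kaizen_prompt(collaborator: str, draft: str) -> str:
    """Collaborator persona, constrained to incremental improvement."""
    return (
        f"Act as {collaborator}. You want this to work.\n"
        "Where does it wobble? What would you push back on? What's missing?\n"
        "For each point, propose the smallest change, the lowest cost, "
        "and the fastest test.\n\n"
        f"DRAFT:\n{draft}"
    )


def next_step(survives_red_team: bool) -> str:
    """Red team first, kaizen second. If the work doesn't survive, start over."""
    return "run kaizen" if survives_red_team else "scrap and start over"


if __name__ == "__main__":
    plan = "Draft sales comp plan with tiered accelerators."
    print(red_team_prompt(
        "a behavioral comp analyst who has seen ten of these plans fail", plan))
    print(kaizen_prompt("Oliver reading the brief", plan))
    print(next_step(survives_red_team=False))
```

Keeping the two as separate functions is the point of the design. You can't accidentally hand a collaborator persona the attack instructions, or an opponent the "smallest change" constraint.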
AI: 40% | Human: 60% — Jesse wanted to turn his experience rewriting a comp plan several times into a blog post; AI kept losing his voice
The JD is the Antifilter
We posted a full stack engineering role and got 1,500 applications in under 24 hours.
Five years ago that number was a hiring success story. Today it's a diagnosis. AI broke the application funnel. Candidates fire hundreds of tailored applications a week, generated and customized by the same models you might be using to screen them. Your ATS isn't a filter anymore. It's a firehose.
The instinct is to add filters downstream. Better screening tools. AI knockout questions. Resume scorers. Keyword gates. That's a losing arms race. You can't out-filter a model that writes 500 cover letters an hour and learns whatever screener you put in front of it. Every dollar spent on the back-end filter is a dollar spent on the wrong side of the problem.
The fix is upstream. The JD is the antifilter.
Volume is no longer the metric.
The metric is the density of right-fit candidates per application, and the best JDs are designed to lose applications on purpose. If your inbox fills up with wrong-fit resumes, the JD is doing its job badly, even if applications are up. Every line is a signal that attracts or repels, and the work is to scare off the wrong people upstream.
A junior reads the JD literally, checking qualifications. The senior IC you actually want reads it as a document and infers culture, taste, and rigor from how it's written. They're deciding whether you sound like a place worth their next four years before they ever get to the requirements. A sloppy JD says "sloppy team" before the candidate ever clicks apply.
The JD writes the team before it writes the candidate.
Writing tight forces a hiring manager to commit to must-have versus nice-to-have. If you can't write it, you don't know what you're hiring for. A candidate should finish reading and know three things: what you do, what kind of company you are, and whether they'd thrive there. If they can't tell, the JD isn't done.
The antifilter does two jobs at once. It repels the wrong applications, and it makes the hiring manager commit to what good looks like before a single resume comes in. Most JDs fail at both because the writer never had to choose.
Five checks.
The hiring manager runs five checks, and if any one fails, the JD doesn't ship. Then a peer in a different function reads it cold and signs off. The peer read is what catches the things a checklist can't see.
Stranger test. Hand paragraph one to someone outside the company and ask what you do. If they can't tell you in a sentence, the lede is broken and nothing downstream matters.
Specificity. Pull three requirement bullets at random. Each one has to name a tool, a technique, or something measurable. "Comfortable with AI tools" fails. "Has shipped production code with Claude Code and can debug an agent that's confidently wrong" passes. A generic bullet on a random pull means the writer didn't think hard enough about the role.
Must-have versus nice-to-have. Every requirement carries one of those two labels. If the hiring manager won't commit, they don't know what they're hiring for and the JD isn't ready.
Three repels named. The hiring manager writes down, somewhere other than the JD itself, three specific candidate profiles the JD is built to lose. Not "junior generalists." Closer to "career consultants who haven't owned a P&L." Fewer than three and the funnel isn't pointed.
A value doing filter work. At least one of your values has to show up in the body of the JD in a way that costs you applications. Listing values is decoration. Showing what a value disqualifies is filtering.
After all five, the peer read. What they're answering is whether a senior IC could finish reading and picture spending four years here. If not, figure out why before you post.
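The five checks and the peer read are judgment calls, but the gate around them is mechanical. Here is a minimal Python sketch, assuming the hiring manager records each judgment as a yes/no.

Every class and field name is hypothetical, and the third requirement in the example is invented for illustration. Any single failure blocks the JD, and the peer sign-off only counts once all five pass.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Requirement:
    text: str
    label: str      # must be "must-have" or "nice-to-have"
    specific: bool  # reviewer's call: names a tool, a technique, or something measurable


@dataclass
class JDReview:
    stranger_gets_it: bool          # outsider can say what you do from paragraph one
    requirements: list[Requirement]
    repels: list[str]               # written down outside the JD itself
    value_does_filter_work: bool    # at least one value visibly costs applications
    peer_signed_off: bool = False   # cold read by a peer in a different function


def failed_checks(jd: JDReview, rng: random.Random) -> list[str]:
    """Return the names of failed checks. An empty list means all five pass."""
    failures = []
    if not jd.stranger_gets_it:
        failures.append("stranger test")
    # Pull three requirement bullets at random (or all of them, if fewer exist).
    sample = rng.sample(jd.requirements, k=min(3, len(jd.requirements)))
    if not sample or not all(r.specific for r in sample):
        failures.append("specificity")
    if not jd.requirements or any(
        r.label not in ("must-have", "nice-to-have") for r in jd.requirements
    ):
        failures.append("must-have vs nice-to-have")
    if len(jd.repels) < 3:
        failures.append("three repels named")
    if not jd.value_does_filter_work:
        failures.append("value doing filter work")
    return failures


def can_ship(jd: JDReview, rng: random.Random) -> bool:
    # Any failed check blocks the JD. The peer read comes after all five pass.
    return not failed_checks(jd, rng) and jd.peer_signed_off


if __name__ == "__main__":
    draft = JDReview(
        stranger_gets_it=True,
        requirements=[
            Requirement("Has shipped production code with Claude Code and can debug "
                        "an agent that's confidently wrong", "must-have", True),
            Requirement("Comfortable with AI tools", "nice-to-have", False),
            Requirement("Has carried a production on-call rotation", "must-have", True),
        ],
        repels=["career consultants who haven't owned a P&L"],
        value_does_filter_work=True,
    )
    print(failed_checks(draft, random.Random(0)))  # ['specificity', 'three repels named']
    print(can_ship(draft, random.Random(0)))       # False
```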
Then require a cover letter.
Once the JD is right, make the apply itself a filter. Require a cover letter. Not as a formality but as a test. Will they include one? Will it be more than two paragraphs of boilerplate? Will it demonstrate they read the JD, understand the company, have a point of view about the role, and are genuinely interested?
AI didn't kill the cover letter. It made it more useful. Generic AI cover letters are easy to spot. Same structure, same hedges, same "I am excited about the opportunity" opener. The effort to produce a real one is higher than ever, which means the signal is higher than ever. A candidate who writes a real letter has already done the thing the AI flood made expensive. They've thought about you.
I've written separately about what makes a great cover letter. The point here is simpler. The JD is the first filter. The cover letter requirement is the second. Stack them and most of the noise never reaches your inbox.
The math has flipped.
If your application volume goes up year over year as AI gets better, your hiring is getting worse. The number that matters now is the ratio of right-fit to total. Most teams are still trying to grow the numerator by growing volume. The leverage is in shrinking the denominator.
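To make the ratio of right-fit to total applications concrete, here is a small worked example in Python. Only the 1,500 comes from the posting described above. The right-fit counts and the 200-application tighter posting are made-up numbers for illustration.

```python
def right_fit_density(right_fit: int, total: int) -> float:
    """Right-fit applications divided by total applications."""
    if total <= 0:
        raise ValueError("total must be positive")
    return right_fit / total


# Hypothetical numbers: only the 1,500 comes from the post.
loose = right_fit_density(right_fit=30, total=1500)
tight = right_fit_density(right_fit=24, total=200)
print(f"loose JD: {loose:.0%} right-fit")  # loose JD: 2% right-fit
print(f"tight JD: {tight:.0%} right-fit")  # tight JD: 12% right-fit
print(f"reads per right-fit: {1 / loose:.0f} vs {1 / tight:.0f}")  # reads per right-fit: 50 vs 8
```

In this illustration, the tighter JD surfaces fewer right-fit candidates in absolute terms. But each application is six times as likely to be worth reading, and the team reads about 8 applications per good candidate instead of 50.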
The antifilter is the JD. Build it before you post.
AI: 30% | Human: 70% — Jesse provided the data point and the framing from an ongoing conversation about JD writing. Claude structured the post and drafted iterations. Jesse edited every pass.
Map the Physics, Not the Sociology
Driving back from the lake last week I listened to Lex Fridman interview Jensen Huang. Two things stuck with me.
First, Jensen pointed out that every company's org chart looks the same. Hamburger companies and software companies and car companies all have the same boxes with the same layers and the same reporting lines.
Second, he described how NVIDIA runs. He calls it extreme co-design: no one-on-ones, no siloed ownership. Every expert, whether memory, CPU, GPU, optics, or cooling, is in the room on every problem. The problem decides who's needed, not the org chart.
That reframed something I've been chewing on.
Agents are the forcing function.
AI agents don't organize by department. They organize by process. An agent doesn't care whether a task belongs to "marketing" or "ops" or "finance." It cares about the trigger, the data, the output, and the next state. And when it hits ambiguity, it just fails visibly. That's what finally makes the cost of unclear process real.
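Here is a sketch of what "trigger, data, output, next state" looks like as a data structure, in Python. The class and step names are hypothetical. What matters is what's missing: there's no department or owner field. A step with any blank gets rejected loudly instead of guessed at.

```python
from dataclasses import dataclass
from typing import Optional


class AmbiguousStep(Exception):
    """Raised when a step is missing something an agent needs to act."""


@dataclass
class Step:
    # No department, no owner. Only what the work itself needs.
    name: str
    trigger: Optional[str] = None     # what starts the step
    data: Optional[str] = None        # what it reads
    output: Optional[str] = None      # what it produces
    next_state: Optional[str] = None  # where the work goes next


def validate(step: Step) -> Step:
    """Fail visibly instead of guessing."""
    fields = ("trigger", "data", "output", "next_state")
    missing = [f for f in fields if not getattr(step, f)]
    if missing:
        raise AmbiguousStep(f"{step.name}: no {', '.join(missing)} defined")
    return step


if __name__ == "__main__":
    validate(Step("send disclosure", trigger="rate locked", data="loan terms",
                  output="disclosure PDF", next_state="awaiting signature"))
    try:
        validate(Step("follow up", trigger="whenever it feels right"))
    except AmbiguousStep as err:
        print(err)  # follow up: no data, output, next_state defined
```

The second step is the kind of thing a human team tolerates for years because someone "just knows". Written down this way, the gap is impossible to miss.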
Map the flow first. Build the system around the flow. Then put humans where judgment is irreplaceable and let agents handle everything mechanical. The org falls out of the work, rather than the work getting crammed into the org.
The alternative is the cul-de-sac most companies are stuck in. They ask where AI can help with what they already do, bolt agents onto workflows designed around 2010 staffing assumptions, and get a slightly faster version of the wrong thing. The org chart is a fossil. Pouring agents into it preserves the fossil.
Every company is a flow. We've been treating them like relay races.
Walk through how value actually gets created in your business. You'll find separate threads with separate owners and separate rulebooks, mostly running in parallel, meeting at the finish line because that's where the calendar said to meet.
Ask a twenty-year veteran to describe the process and they'll describe the people: the roles, the handoffs, the way it's always been done. That's not the work. That's the sociology that grew up around it.
The work itself has physics. Information gets created somewhere and consumed somewhere else. State changes at specific gates. Some dependencies are real. Most are inherited from habit. The sequence is often arbitrary, just the order the humans happened to get to things.
Look at the physics and you'll find your future org.
Doing the map.
Mapping isn't a metaphor. It's a literal exercise. You're looking for the stages, the gates, the decision points. Where information gets created and where it's consumed. Where the work branches and where it converges. Which dependencies are real and which are inherited from habit.
The output is a flow diagram that fits on a wall, not a process doc. Run the exercise honestly and the answer rarely matches the org chart.
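The wall diagram is the real deliverable, but the idea of real versus inherited dependencies can be sketched in code. This sketch uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group stages into waves that could run at the same time.

The stage names and their "real"/"habit" tags are hypothetical. Deciding which edges are habit is the hard, human part. The code only shows what the flow looks like once those edges are dropped.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each stage maps to its predecessors, and each dependency
# is tagged "real" (the work needs it) or "habit" (inherited sequence).
FLOW = {
    "intake": {},
    "verify": {"intake": "real"},
    "price": {"intake": "real", "verify": "habit"},
    "draft": {"price": "real"},
    "review": {"draft": "real", "verify": "real"},
}


def waves(flow: dict, keep: tuple = ("real", "habit")) -> list:
    """Group stages into waves, honoring only the kept dependency kinds."""
    graph = {
        stage: [p for p, kind in preds.items() if kind in keep]
        for stage, preds in flow.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every stage whose inputs are done
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    print(waves(FLOW))                  # [['intake'], ['verify'], ['price'], ['draft'], ['review']]
    print(waves(FLOW, keep=("real",)))  # [['intake'], ['price', 'verify'], ['draft'], ['review']]
```

Dropping the one habit edge lets pricing run alongside verification, which shortens the flow from five waves to four. The stages are the same, but the shape is different.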
Who maps it.
If you're building from scratch, you have the rare gift of designing the org around the work from day one. Don't waste it by hiring the org chart your industry expects.
If you're inside an existing company, you're looking for someone who can hold the whole flow in their head and see the shape of it. They've usually done the work for years, but the years are the proxy. The thing you actually want is the wiring. Some twenty-year veterans have it, some five-year operators do too. These are the people who become more valuable in the new world, not less, and the move isn't to extract their knowledge and move on. You build the company around the way they think.
Either way, the instruction is the same. Forget the people, the roles, and how it's always been done. Map the work, not the desks.
A concrete example.
At Lineage we run real estate transactions. The conventional view in our industry is three parallel processes, each with its own owner: buying, lending, insurance. Streams cross at predictable handoffs and converge at the closing table. That's the sociology, and it's how every competitor in the category operates.
The physics view is different. There's one deal moving through stages, and most of what looks like the structure of the work is actually the structure of how the industry staffed itself decades ago. The system you need to build when you see that looks nothing like the org chart you'd inherit.
The honest part.
In larger organizations, this transition will cost some roles. The mechanical layer compresses, and the work that compresses is done by real people. Pretending otherwise is dishonest, and the leaders who skip past it lose credibility with everyone who has to live the change. The new work that emerges is real and valuable, but it doesn't always fall to the same people. Operators owe both halves of that truth to their teams.
This isn't a tooling exercise. It's an architecture exercise. The companies that win are the ones that designed themselves around the work, not the ones that bolted AI onto a structure they inherited.
Map the physics. That's where the future is.
AI: 25% | Human: 75% — Jesse drove the framing from a Jensen Huang interview and an internal note to a teammate. Claude abstracted the email into thematic content, structured the first draft, and drafted iterations based on feedback. Jesse edited every pass.
How We Work with AI at Lineage
We didn't hire you to absorb the system. We hired you because your experience and instincts are going to change it.
Most companies are adding AI to their workflows. We built the company around it. The default at Lineage is that Claude runs the workflow. We don't start with a job and ask where AI can help. We start with what Claude can do alone, then add humans where judgment actually matters.
We're a financial services company. Nothing ships without a human in the loop. The only question is how much Claude does and what the human is on the hook for.
Three modes.
AI-forward. Claude does the work. A human verifies specific claims before it ships.
Bridge. Claude does the heavy lifting. A human applies judgment.
Human-led. A person does the work. Claude prepares the inputs.
The speed comes from Claude working faster, not from cutting the human out. A human-led task at Lineage still moves faster than the same task at a company where the human is also doing the prep.
Taste is the hire.
The people who thrive here aren't the best prompters. They're the ones who think clearly, write precisely, and know when to trust the machine and when to override it. That's taste. We don't automate it. We hire for it.
If your instinct when Claude gives you a confident answer is to ship it, this isn't the place for you. If your instinct is to ask what it got wrong, keep reading.
Last month Claude drafted a borrower communication that was technically accurate and completely wrong in tone. Confident, polished, and going to land badly with the customer it was written for. The underwriter caught it, rewrote the opening, and we updated the skill so the next draft starts from a better place. That's the loop. The skill got sharper because someone with taste pushed back on it.
Onboarding works the same way.
Before you start, Claude drafts your first-week plan and loads the skills for your role. On day one, you're not reading a manual. You're doing the work with a chief of staff that knows how Lineage operates. Your manager owns the relationship and the read on whether you're set up to succeed.
Then the part that matters. Within your first weeks, the skills evolve because of what you push back on, what you do differently, what you think we've got wrong. The system gets smarter because you're in it, not in spite of you.
We challenge every process. If something exists because "that's how we've always done it," it's on the table. Writing stays on the human side more than most things, because writing is how you think. You can't outsource the part where ideas get distilled. The draft is the thinking. We use Claude to pressure test and sharpen, not to replace the act.
This isn't a tools initiative. It's how the company operates. We build on skills: modular, composable units of instruction and context that load based on what the agent is doing. Our skills encode how Lineage works, and they evolve every time someone pushes back on them. That's the compounding asset we're building alongside the business.
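Here is a minimal Python sketch of what "modular, composable, and loaded based on what the agent is doing" can mean mechanically. It's an illustration under assumptions, not Lineage's implementation and not the format of any particular product. Each hypothetical skill declares the task tags that trigger it, and the agent loads only the matching skills. Pushback produces a revised skill rather than a one-off fix.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: frozenset  # task tags that cause this skill to load
    instructions: str    # how this piece of work gets done here


def load_skills(registry, task_tags):
    """Select only the skills relevant to the current task, keeping registry order."""
    tags = set(task_tags)
    return [s for s in registry if s.triggers & tags]


def build_context(skills):
    """Compose the selected skills into one instruction block for the agent."""
    return "\n\n".join(f"[{s.name}]\n{s.instructions}" for s in skills)


def revise(registry, name, instructions):
    """Pushback becomes a new version of the skill. Every later load picks it up."""
    return [replace(s, instructions=instructions) if s.name == name else s
            for s in registry]


# Hypothetical skills, for illustration only.
REGISTRY = [
    Skill("borrower-comms", frozenset({"borrower", "email"}),
          "State the terms accurately."),
    Skill("underwriting-summary", frozenset({"underwriting"}),
          "Summarize income, assets, and open conditions."),
]

if __name__ == "__main__":
    print(build_context(load_skills(REGISTRY, ["email"])))
    # Someone with taste catches a tone problem, so the skill itself changes:
    REGISTRY = revise(REGISTRY, "borrower-comms",
                      "Open with the borrower's situation, then state the terms accurately.")
    print(build_context(load_skills(REGISTRY, ["email"])))
```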
That's our OS.
AI: 25% | Human: 75% — Jesse directed the framing, wrote the content, and edited every draft. Claude structured the essay, ran prompted research agents, and drafted iterations.
How I Use Claude to Think
I don't use AI to write for me. I use it to think with me. That's not a semantic difference. If you hand Claude a prompt and publish what comes back, you've outsourced your judgment. You'll produce a lot of mediocre work very fast. If you use it to pressure test your thinking from angles you'd miss alone, you've gained something you can't get any other way.
I start with a position. I never open a session with "what should I think about X." I come in with a thesis, a half-formed plan, an argument I'm not sure holds up. Claude is a thinking partner, not a thinking replacement. If you don't have a point of view going in, you'll accept whatever comes back.
I build the thing, then I break it. First session, I work with Claude on the research. Then I start a new session and begin kaizen. I launch a bunch of agents to stress test it, each with a different point of view. How many depends on the problem. If I'm testing website copy for conversion, it's twenty personas across different buyer segments. If I'm evaluating vendors, perhaps it's three. If it's something internal, it's the actual team members who'll be affected. Claude is good at sizing the right set of perspectives when you give it the context.
I synthesize and loop. I take those reactions, decide what's signal and what's noise, and synthesize a revised version. New session. New agents, different lenses. Synthesize again. Sometimes that's one loop. Sometimes it's three. Sometimes I walk away for a few days, find something new on the web, and come back with the original doc and the new input and say "run a kaizen loop against this with these personas." The loop reopens whenever the thinking needs it.
One specific instruction I give the synthesizing agent every time: don't argue with the subagents, and don't average them. When personas disagree, the job isn't to broker a compromise. Compromise produces a position no agent actually held, weakened to offend none of them. Either reason to first principles and pick the strongest answer, or come back to me and ask. A middle ground is almost always a tell that the synthesizer gave up.
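Here is the loop described above, reduced to its shape in Python. `new_session` is a hypothetical stand-in for opening a brand-new conversation with whatever model client you use. Calling each session with a single prompt and no history is what keeps every persona, and the synthesizer, fresh. The synthesizer rules paraphrase the instruction above. Everything else is an illustrative assumption.

```python
from typing import Callable, List

# A session is a hypothetical callable: full prompt in, text out.
Session = Callable[[str], str]

SYNTHESIZER_RULES = (
    "Don't argue with the subagents, and don't average them. "
    "When personas disagree, don't broker a compromise. "
    "Reason to first principles and pick the strongest answer, or stop and ask me."
)


def kaizen_round(draft: str, personas: List[str],
                 new_session: Callable[[], Session]) -> str:
    """One loop: each persona reacts in its own fresh session, then a fresh synthesizer revises."""
    reactions = []
    for persona in personas:
        session = new_session()  # clean context for every persona
        reply = session(f"Act as {persona}. React to this draft:\n{draft}")
        reactions.append(f"{persona}:\n{reply}")
    synthesizer = new_session()  # the synthesizer doesn't inherit any persona's context
    return synthesizer(
        f"{SYNTHESIZER_RULES}\n\nDRAFT:\n{draft}\n\nREACTIONS:\n" + "\n\n".join(reactions)
    )


if __name__ == "__main__":
    def stub_session() -> Session:
        # Stand-in for opening a new conversation with a real model client.
        return lambda prompt: f"[reply to a {len(prompt)}-char prompt]"

    print(kaizen_round("Website copy v3", ["first-time buyer", "investor"], stub_session))
```

Running another loop is just calling `kaizen_round` again on the revised draft with a different persona list.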
One of my most frequent follow-up prompts is "convince me why I'm totally wrong." If Claude can't make a strong case against my position, I have more confidence in it. If it can, I just learned something.
Then I write for the team. Once the plan is solid, new session. I write the execution version. Then I run agents as the actual people who'll be reading the work. Does this make sense to Oliver? Is Liz going to push back on the timeline? Will the customer convert? That round isn't about the idea anymore. It's about how the people receiving it will receive it and whether they can act on it.
Fresh sessions are the discipline. Long conversations rot. Claude starts agreeing with you, echoing your framing, losing its edge. Every phase gets a clean session. Planning, stress testing, drafting, etc. It feels slower, but it's faster, and the output is incomparably better.
You have to think. This is the part that doesn't fit in a process doc. You can run kaizen loops all day, but if you're not a critical reader of what comes back, you're just generating noise. The tool multiplies whatever you bring to it. Bring sharp, critical thinking and you get sharper thinking back. Bring nothing and you get polished nothing, and you will be exposed.
AI: 20% | Human: 80% — Jesse described his full workflow, directed the framing, wrote the content, and edited every draft. Claude structured the essay, ran research agents to validate the approach against published best practices, and drafted iterations.