The Karpathy Loop, Applied to Marketing

Google will optimize your ads for you.

Inside their platform. Using their algorithm. Measured by their definitions. You don't get to see how it works. You don't get to change what "better" means. You just get a dashboard that says "trust us."

Meta will do the same. Advantage+ picks your creative. Google's Performance Max picks your audience. The platforms call it automation.

It's automation the way a casino automates your betting strategy. The house built the algorithm. The house defines winning. And the house always optimizes for the house.

You're not the customer. You're the supply.

In March 2026, Andrej Karpathy released a repo called autoresearch.

630 lines of code. One GPU. One file the AI agent can edit. One file the human writes. One metric that decides what lives and what dies.

The agent forms a hypothesis. Edits the training code. Runs a five-minute experiment. Checks the score. If the score improved, it keeps the change. If not, it rolls back. Then it loops.

126 experiments overnight. No human in the loop.
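
The mechanism is small enough to sketch. This is not the repo's actual code; `propose`, `edit`, and `evaluate` are stand-ins for the real pieces, and the loop assumes a lower score is better (val_bpb is a loss):

```python
# A minimal sketch of the keep-or-rollback ratchet. Illustrative,
# not the actual autoresearch code: the callables stand in for the
# agent and the training run.
import shutil
from typing import Callable

def ratchet(propose: Callable[[list], str],
            edit: Callable[[str, str], None],
            evaluate: Callable[[str], float],
            runs: int = 126) -> list:
    history = []
    best = evaluate("train.py")                    # baseline experiment
    for _ in range(runs):
        hypothesis = propose(history)              # agent forms a hypothesis
        shutil.copy("train.py", "train.py.bak")    # snapshot before editing
        edit("train.py", hypothesis)               # the one file it may touch
        score = evaluate("train.py")               # same budget every run
        if score < best:
            best = score                           # improvement locks in
        else:
            shutil.copy("train.py.bak", "train.py")  # regression rolls back
        history.append((hypothesis, score))        # nothing is hidden
    return history
```

That one `if` statement is the entire ratchet.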

28,000 developers starred the repo in a week. Everyone saw a machine learning tool.

I saw the opposite of a black box.

Which is embarrassing, because I'd been inside the black box for years. Not questioning it. Just reading the dashboards.

Here's what makes autoresearch different from platform automation.

The human defines the constraints. The human defines success. The human sees every experiment, every result, every decision. The agent does the work. The human owns the logic.

Nothing is hidden. Nothing is optimized for someone else's revenue. The entire system fits in your head.

That's not how your ad platform works.

That's not how your attribution tool works. That's not how any vendor in your stack works.

Every tool you rent is someone else's opinion about your business, packaged as a fact.

The dashboard says you're winning. The P&L says something different. You can't explain the gap because the logic lives in someone else's box.

I built a tool called LoopKit. It transplants Karpathy's architecture into marketing.

The mapping is 1:1.

In autoresearch, there's a frozen file called prepare.py. The agent can't touch it. It defines the data, the evaluation rules, the environment.

In LoopKit, that's the brand brief. Product. Audience. Voice guidelines. The constraints the agent works within. Yours. Not Google's.

In autoresearch, there's an editable file called train.py. The agent rewrites it every iteration.

In LoopKit, that's the ad copy. Headlines. Descriptions. The agent rewrites them every cycle. Angles, hooks, structure. Everything is fair game.

In autoresearch, there's a single metric called val_bpb: validation loss in bits per byte. It's what the ratchet turns on. Lower it and the change stays; raise it and the change dies.

In LoopKit, that's a composite score. Relevance, clarity, urgency, brand fit. Defined by you. Scored transparently. Not a black-box number that means whatever the vendor needs it to mean.

Same architecture. Different domain. And one critical difference: you own the entire loop.
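
What does "defined by you, scored transparently" look like? At its simplest, a weighted rubric you can read in ten seconds. A hypothetical sketch; LoopKit's actual dimensions and weights are whatever you set them to:

```python
# Hypothetical composite scorer. The dimensions and weights are yours:
# define them once, freeze them, let the ratchet turn on them.
WEIGHTS = {"relevance": 0.35, "clarity": 0.25, "urgency": 0.15, "brand_fit": 0.25}

def composite_score(ratings: dict[str, float]) -> float:
    """Each dimension rated 0-10 by a rater or a model; weighted 0-10 result."""
    assert set(ratings) == set(WEIGHTS), "every dimension gets scored, always"
    return sum(WEIGHTS[d] * ratings[d] for d in WEIGHTS)

# One ad variant, scored in the open:
print(round(composite_score(
    {"relevance": 8, "clarity": 9, "urgency": 5, "brand_fit": 7}), 2))  # 7.55
```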

"AI writes ad copy" is a commodity. Every tool does that now.

Nobody runs 50 experiments on ad copy. Nobody lets an agent hypothesize, test, fail, learn, and iterate against a fixed standard across dozens of cycles.

The difference between asking AI for a draft and running an autonomous research program is the difference between asking one question and conducting a study.

One gives you an answer. The other gives you understanding.

And understanding is the thing the platforms will never sell you. Because understanding in your hands is leverage out of theirs.

Karpathy made six design decisions that make the loop work. Each one is a principle. Each one is the opposite of how platforms treat you.

The agent can change everything in one domain and nothing outside it. And it can't redefine success. You set the metric. You freeze it.

Platforms give you no visibility into what they change, and they redefine success every time they update their algorithm.

Every run gets the same budget. Fair comparison. Improvements lock in. Regressions roll back. Progress is structural, not statistical.

Platforms allocate your budget across experiments you can't see, then show you aggregate performance and hide the ones that failed underneath it.

Every hypothesis, every result, every decision is logged. You can trace any outcome back to the change that caused it.
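
A log that delivers on that can be one append-only line per experiment. A hypothetical record shape, with made-up values:

```python
# Hypothetical experiment record: one JSON line per run, append-only.
import json

record = {
    "run": 47,
    "hypothesis": "lead with the outcome, not the feature",
    "change": "headline: 'Fast reports' -> 'Reports in 30 seconds'",
    "score_before": 6.9,
    "score_after": 7.55,
    "decision": "keep",      # or "rollback"
}

with open("experiments.jsonl", "a") as log:
    log.write(json.dumps(record) + "\n")    # grep-able months later
```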

You write the strategy. The agent writes the tactics. Platforms make you the observer. They're the investigator, the lab, and the judge.

The same architecture applies to email subject lines, landing page headlines, bid strategies, audience configurations. Anything with a bounded search space and a scorable outcome.

The pattern is the product. The domain is a variable.
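
In code terms, that's literally true. The loop needs three operations from a domain and nothing else. A hypothetical interface, not LoopKit's actual API:

```python
# Hypothetical: the loop is generic; a domain is anything that can
# propose a change, apply it, and score the result against a frozen metric.
from typing import Protocol

class Domain(Protocol):
    def propose(self, history: list) -> str: ...     # next hypothesis
    def apply(self, hypothesis: str) -> str: ...      # new artifact version
    def score(self, artifact: str) -> float: ...      # your metric, frozen

# AdCopy, SubjectLines, LandingPage, BidRules: each implements Domain.
# The ratchet never changes.
```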

But the real product isn't the pattern.

It's the shift from renting intelligence to building it.

Two organizations look at the same market.

One logs into platforms. Reads dashboards built by vendors. Makes decisions based on data filtered through someone else's priorities. When the algorithm changes, it scrambles.

You know the morning routine. Open the laptop. Check the dashboard. Scan the green numbers. Close the laptop.

Nothing learned.

The other runs its own loops. Defines its own metrics. Owns every experiment. When the algorithm changes, it already has the data to measure the impact independently.

The first is a customer.

The second is a competitor.

Everyone is asking what AI can do.

The better question is what loops you should be running overnight. On your terms. With your data. Measured by your definitions.

Not inside their black box.

Inside yours.