Co-founded · 2025 · Open-sourcing soon

Pulp AI

A simulation platform that lets founders and product teams test copy, pricing, and features against ~1,000 AI-generated customer personas, shipping micro-decisions in minutes instead of weeks.

Multi-agent LLMs · Synthetic personas · Provider-agnostic · FastAPI · pgvector · Docker
10 · Early-stage pilots · Founding teams who ran live tests
~1,000 · Agents per simulation · Synthesized to match the target population
70% · Back-test accuracy · Calling the winning variant vs. 50% chance
Minutes · vs. weeks of research · Turnaround on a micro-decision

The Problem

Founders make a hundred decisions a week, and research can't keep up

Building from the ground up, I kept running into the same trap. Decisions were either anchored too heavily to my own way of thinking, or taken as shots in the dark off a handful of responses from a narrow slice of the market.

Talking to other founders surfaced a pattern. You have to move fast and make a lot of calls, and there is rarely the money or the time to do deep, meaningful market research for every important micro-decision, whether that is the copy, the price point, or the framing of a feature.

The people who felt this most acutely were founding teams and PMs who needed a clear, intuitive read on how their target demographic would react to a product change, frequently and quickly.

Their alternatives were user interviews and surveys, which are slow to run and rarely wide enough to capture the full diversity of the audience. Pulp AI set out to compress that loop from weeks to minutes without giving up the diversity of perspective that makes research worth doing.

The Wedge

Why a population of personas beats asking one model

Prompting a single LLM with "what would users think?" flattens every perspective into one averaged voice that is directionally vague and easy to over-trust.

Single prompt

1 averaged voice

One blended answer that hides disagreement and gives no quantitative read on who reacts how.

Pulp AI

~1,000 distinct agents

Each persona reasons independently, then reactions are clustered into segments, yielding richer insight with quantitative directionality.

Worked example: Should the Pro tier launch at $29 or $49 per month?

Ask one model

"It depends on your audience. $39 could be a sensible middle ground that balances accessibility against perceived value."

Plausible and instantly forgettable. No read on who wants what, and no number you can act on.

Ask a population

64% lean toward $29, but enterprise buyers, who make up a quarter of the population, read $49 as a quality signal and disengage at the lower price. Price-sensitive SMBs anchor hard on $29.

Ship $29 as the entry point and keep $49 as a clearly positioned upper tier. A decision, backed by the segments that drove it.

How it works

From a population seed to a segmented verdict

One run takes a product description and a test, synthesizes a matching population, and returns how each market segment will respond.

01
Setup

Define the simulation

The founder describes their product and target demographic, then picks a test type (A/B copy, pricing sensitivity, feature framing) and supplies the variants.

02
~1,000 agents

Synthesize the population

The persona generator expands the population seed into ~1,000 heterogeneous agents matching the target distribution. Existing matching profiles are reused via pgvector, and gaps are synthesized and cached.
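The reuse-then-synthesize step can be pictured roughly as follows. This is a minimal in-memory sketch, not Pulp AI's code: a plain Python list stands in for the pgvector table, and `reuse_or_synthesize`, the `threshold` value, and the persona dictionary shape are all hypothetical names chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def reuse_or_synthesize(targets, cache, synthesize, threshold=0.9):
    """For each target embedding, reuse the closest cached persona
    if it is similar enough; otherwise synthesize one and cache it.
    The 0.9 threshold is an assumed value, not taken from the product."""
    population = []
    for target in targets:
        best = max(cache, key=lambda p: cosine(p["embedding"], target), default=None)
        if best is not None and cosine(best["embedding"], target) >= threshold:
            population.append(best)          # existing profile matches
        else:
            fresh = synthesize(target)       # fill the gap
            cache.append(fresh)              # and keep it for later runs
            population.append(fresh)
    return population
```

A real pipeline would also need to stop one cached persona from being picked for many near-identical targets, which this sketch does not attempt.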

03
Multi-step

Agents deliberate

Each persona runs a bounded reason → act → observe loop, calling tools mid-flight and finalizing a validated decision that contains a choice, free-text reasoning, and a confidence score.

04
Scored

Aggregate deterministically

A weight- and confidence-adjusted tally produces the winner, margin, and overall confidence, with no LLM judge, so the headline number is reproducible.

05
Insight

Cluster & narrate

Reactions are clustered into market segments and a single narrative pass writes the insight and recommendation, so the founder sees how each segment will respond.

Architecture

A modular AI layer over a three-tier core

A React frontend app talks to a FastAPI backend that owns the entire run lifecycle, from persona creation through test coordination, agent execution, scoring, and the final summary. Persistent state is split across three stores. PostgreSQL holds relational data, pgvector holds persona embeddings that power dedup and demographic-consistent reuse, and Redis handles sessions and caching. Every model call flows through one provider-agnostic LLM adapter, with concurrency bounded to respect provider rate limits.

Request path

Client · React SPA

Test builder, live run view, and the segment-results dashboard.

API · orchestration core · FastAPI

Owns the full run lifecycle end to end.

1 Persona creation → 2 Test coordination → 3 Agent execution → 4 Deterministic scoring → 5 Narrative summary

Modular AI layer

Persona engine

Expands a population seed into N heterogeneous agents with bios, traits, and behavioral weights.

Loop harness

Drives each agent's bounded reason → act → observe loop and validates the submit() decision.

Analyzer

Weight- and confidence-adjusted tally, then a single narrative pass. No LLM judge in the loop.

Provider-agnostic LLM adapter
claude-sonnet-4-6 · gpt-4o-mini

Persistence

PostgreSQL · Relational state

Products, test definitions, runs, and per-agent results.

pgvector · Persona embeddings

Similarity search to dedupe agents and recycle profiles that match new demographics.

Redis · Sessions & cache

Session state and hot-path caching across a run.

Per run: N personas × 1–5 deliberation calls + 1 generation + 1 narrative
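To make the per-run formula concrete, the call budget can be computed directly. The function name `calls_per_run` is made up for this illustration; the constants come from the formula above.

```python
def calls_per_run(n_personas, min_steps=1, max_steps=5):
    """Return (lower, upper) bounds on model calls for one simulation."""
    overhead = 2  # 1 generation call + 1 narrative call
    lower = n_personas * min_steps + overhead
    upper = n_personas * max_steps + overhead
    return lower, upper

print(calls_per_run(1000))  # (1002, 5002)
```

At ~1,000 agents, a run therefore lands somewhere between roughly 1,000 and 5,000 model calls, depending on how many deliberation steps each agent uses.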

Engineering

Key technical decisions

Extracting one JSON answer from a prompt is trivial. Orchestrating a thousand asynchronous, multi-turn agents into a validated, reproducible verdict, without one bad agent corrupting the whole, was the real work.

Persona generation

Synthetic populations from a single seed

Rather than static research data or hand-authored profiles, a generation pipeline expands a population seed (the simulation's ideal customer profile, or ICP) into N heterogeneous agents, each with a bio, traits, and behavioral weights drawn from free-text hints or predefined cohorts.

Consistency

Identity anchoring & deterministic replay

A frozen system prompt re-injects each agent's core demographic profile into every model call to prevent persona drift. In the offline harness, outputs are seeded by a hash of the system prompt, so any given agent's output is reproducible across runs while the population stays diverse.
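Seeding by a prompt hash might look like the sketch below. It is an assumption-laden illustration: the hash function (SHA-256), the 8-byte truncation, and the `offline_decision` stub are choices made here, not details from the actual harness. `hashlib` is used rather than Python's built-in `hash()` because the latter is salted per process for strings and would not replay.

```python
import hashlib
import random

def seed_from_prompt(system_prompt: str) -> int:
    """Derive a stable integer seed from the frozen system prompt."""
    digest = hashlib.sha256(system_prompt.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def offline_decision(system_prompt, variants):
    """Hypothetical offline stub: same prompt in, same decision out."""
    rng = random.Random(seed_from_prompt(system_prompt))
    choice = rng.choice(variants)
    confidence = round(rng.uniform(0.5, 1.0), 2)
    return {"choice": choice, "confidence": confidence}
```

Because each persona has a different system prompt, each gets a different seed, so repeatability per agent does not collapse the population into identical answers.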

Agent design

Bounded reason → act → observe loops

Every persona is a true multi-step agent, not a single call. It can invoke tools, fold observations back into context, and iterate up to a step budget before finalizing through a synthetic submit tool whose schema is the persona's decision model.
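A bounded loop of this shape can be sketched in a few lines. Everything named here (`AgentState`, `run_agent`, `scripted_policy`, the action dictionary format) is hypothetical, and a scripted policy stands in for the model so the example runs offline.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    persona: str
    observations: list = field(default_factory=list)

def run_agent(state, policy, tools, max_steps=5):
    """Reason -> act -> observe, up to max_steps, ending on submit."""
    for step in range(max_steps):
        action = policy(state, step)            # reason: pick the next action
        if action["tool"] == "submit":          # finalize through submit
            return action["args"]
        tool = tools.get(action["tool"])
        if tool is None:
            state.observations.append(f"error: unknown tool {action['tool']}")
            continue
        try:
            state.observations.append(tool(**action["args"]))  # act + observe
        except Exception as exc:
            # A tool failure is fed back as an observation, not raised.
            state.observations.append(f"error: {exc}")
    # Step budget exhausted: assumed shape of a low-confidence fallback.
    return {"choice": None, "reasoning": "step budget exhausted", "confidence": 0.0}

def scripted_policy(state, step):
    """Stand-in for the model: one calc call, then submit."""
    if not state.observations:
        return {"tool": "calc", "args": {"a": 29, "b": 12}}
    return {"tool": "submit", "args": {
        "choice": "A",
        "reasoning": f"annual cost {state.observations[-1]}",
        "confidence": 0.8,
    }}
```

Calling `run_agent(AgentState("price-sensitive SMB owner"), scripted_policy, {"calc": lambda a, b: a * b})` performs one tool call and then returns the submitted decision.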

Reliability

Provider-agnostic, schema-validated output

The submit tool's JSON schema is derived from a Pydantic decision model and inlined so enums survive across Anthropic and OpenAI. Validation failures trigger budgeted retries rather than letting malformed data persist, with no vendor lock-in above the adapter layer.
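Inlining matters because a schema generator such as Pydantic typically places an enum under `$defs` and points to it with a `$ref`. The sketch below shows one way to flatten such references using only the standard library. The `inline_refs` helper and the sample schema are illustrative assumptions, not the project's code, and the helper handles only local, non-recursive `#/$defs/...` references.

```python
import copy

def inline_refs(schema):
    """Replace local '#/$defs/...' references with their definitions.
    Sibling keys next to a $ref are dropped; recursive refs are not supported."""
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                return resolve(copy.deepcopy(defs[ref.split("/")[-1]]))
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)

# Hand-written schema in the shape a decision model might produce.
DECISION_SCHEMA = {
    "$defs": {"Choice": {"enum": ["A", "B"], "type": "string"}},
    "type": "object",
    "properties": {
        "choice": {"$ref": "#/$defs/Choice"},
        "reasoning": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["choice", "reasoning", "confidence"],
}

flat = inline_refs(DECISION_SCHEMA)
```

After flattening, `flat["properties"]["choice"]` carries the enum values directly, so the same tool schema can be handed to either provider without relying on `$ref` support.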

Fault tolerance

Three-tier isolation across the population

Tool errors become observations, step-limit breaches fall back to low-confidence results, and catastrophic agent crashes are encapsulated in a result container. One malfunctioning agent never corrupts the aggregate of a thousand.
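The outermost tier, wrapping each agent so a crash becomes data rather than an exception, could look roughly like this. `AgentResult`, `run_isolated`, and `flaky_agent` are hypothetical names for illustration; the source only says crashes are encapsulated in a result container.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentResult:
    agent_id: int
    decision: Optional[dict] = None
    error: Optional[str] = None

async def run_isolated(agent_id, agent_fn):
    """Run one agent; any exception is captured in the result container."""
    try:
        return AgentResult(agent_id, decision=await agent_fn(agent_id))
    except Exception as exc:
        return AgentResult(agent_id, error=f"{type(exc).__name__}: {exc}")

async def run_population(agent_fn, n):
    return await asyncio.gather(*(run_isolated(i, agent_fn) for i in range(n)))

async def flaky_agent(agent_id):
    # Stand-in agent: number 3 crashes, the rest decide normally.
    if agent_id == 3:
        raise RuntimeError("model returned garbage")
    return {"choice": "A", "confidence": 0.7}
```

The aggregator can then skip or down-weight any result whose `error` field is set, and the other agents' decisions are unaffected.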

Aggregation

Deterministic scoring, then one narrative pass

No LLM judge. A weight- and confidence-adjusted tally produces the winner, margin, and confidence, so the headline number is reproducible. A single narrative pass then writes the insight and recommendation on top.
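As a rough illustration of a weight- and confidence-adjusted tally, the sketch below scores each variant as the sum of weight × confidence over the agents that chose it. That scoring rule, the `share` and `margin` definitions, and the alphabetical tie-break are assumptions made for this example; the source does not give the exact formula.

```python
from collections import defaultdict

def tally(decisions):
    """Deterministically pick a winner from per-agent decisions.
    Each decision needs 'choice', 'weight', and 'confidence' keys."""
    scores = defaultdict(float)
    for d in decisions:
        scores[d["choice"]] += d["weight"] * d["confidence"]
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("no scorable decisions")
    # Sort by score descending, then by name so ties resolve the same way.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return {
        "winner": winner,
        "share": top / total,                  # winner's fraction of total score
        "margin": (top - runner_up) / total,   # lead over second place
    }

result = tally([
    {"choice": "A", "weight": 1.0, "confidence": 0.8},
    {"choice": "A", "weight": 1.0, "confidence": 0.6},
    {"choice": "B", "weight": 2.0, "confidence": 0.5},
])
```

Because the function is pure arithmetic over the stored decisions, rerunning it on the same per-agent results always gives the same headline.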

The per-agent loop

Deliberate

Reason over the variant

Act

Call a tool (calc, research)

Observe

Feed result back as context

Finalize

submit() → validated decision

Bounded to ~5 steps
Tool errors become observations, not crashes
Schema-validated submit() with budgeted retries
Bounded concurrency respects provider rate limits

Demo

Run a simulation

Pick a test, press run, and watch a synthetic population react, then see the aggregated verdict and segment breakdown. This is a stylized mockup with pre-computed results. The product ran this live against a freshly synthesized population each time.

app.pulp.ai/simulate

1 · Pick a test

Question under test

Should the Pro tier launch at $29/mo or $49/mo?

A · $29 / month
B · $49 / month
Awaiting run

Press Run simulation to spawn a synthetic population and watch agents react.

2 · Aggregated verdict

Your aggregated verdict and segment breakdown will appear here.


Validation

Back-tested against real outcomes

Across 10 early-stage pilots, the most consistent signal was speed. Micro-decisions that used to take weeks of interviews resolved in minutes. To sanity-check the predictions, we replayed historical A/B tests through the simulator.

Winning-variant accuracy · back-test over 10 historical A/B tests

Pulp AI prediction
70%
Random baseline
50%

20 percentage points above chance at calling the winner, plus directional lift, on a small historical sample.

On a small set of 10 historical A/B tests, the simulator picked the winning variant in 7 of 10 cases (70%), above the 50% coin-flip baseline, and recovered the directional lift. Given the sample size, this is a promising early signal rather than a validated benchmark.
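One way to see why the sample size calls for caution is a one-sided binomial check: how often would pure coin-flipping call at least 7 of 10 winners? The sketch assumes the 10 tests are independent and that chance is exactly 50% per test; `binom_tail` is a helper written for this illustration.

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_tail(7, 10)
print(p_value)  # 0.171875, i.e. 176 / 1024
```

Under those assumptions a coin would do at least this well about 17% of the time, so the result is encouraging but a larger back-test would be needed to treat it as a benchmark.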

Learnings & what's next

Fun to build, hard to make cheap

Engineering a system like this is genuinely a thrill: orchestrating a thousand reasoning agents into one coherent answer is a satisfying problem. The hard part isn't getting it to work. It's getting it to work within an acceptable cost window without sacrificing the efficiency and fidelity that make the output worth trusting.

That tension between cost and quality is the core lesson I'm carrying forward. The plan now is to open-source the core concept as a library, so anyone can spin up persona simulations for their own decisions and tune the cost/fidelity trade-off for their use case.

The core engine is going open-source

A library version of Pulp AI's persona-simulation core is on the way. Want a heads-up when it lands, or to swap notes on agent orchestration? Reach out.
