AI-Native Marketing System for B2B SaaS

An operating manual for solo AI-native B2B SaaS marketing at $10K–$500K MRR. · Built with 1.1B+ tokens, 4,500+ API calls, 3,650+ sources synthesized.

Built from my own operating model, updated for the AI-native era. Every card and source was individually reviewed and hand-edited; nothing here is a wholesale AI generation.

A complete map of the marketing function as one AI-leveraged operator can run it. Forty-something modules across foundation research, strategy, intelligence, ongoing execution, and on-demand audits — each scored 1–10 for how load-bearing it is at this scope, with sourced operator reasoning behind every score.

Built from Jan's own operating model: roughly three quarters of what's here he has run manually across B2B SaaS products. In preparation to become an AI-native marketer, Jan spent months learning, practicing, experimenting, and building, absorbing from Reddit, LinkedIn, Hacker News, and Claude research threads, until the need for a larger map to connect it all became clear. This system is that map, built through many structured research passes with every card and source individually reviewed. The goal was never absolute accuracy: that would take months of deep verification per card and cut against the AI-native principle of shipping fast. This is the overview and the map. Specific know-how and advice across cards draw from practitioners, domain experts, and published reports, cited throughout.

Every card has two tooltips — Why it matters (score and sources) and Methodology (what's inside). Toggle in the legend, click any card, or press Space.

The other tabs cover who this is calibrated for, the operator profile it assumes, the principles that keep AI-native work honest, and why I'm building it.

This system is calibrated for a specific shape of company and operator. Outside that shape, the scoring breaks.

Who it fits

Roughly $10K–$500K MRR ($120K–$6M ARR), sweet spot $30K–$300K. Enough customers to interview, real analytics events flowing, a live website worth auditing, pricing signal worth reading. Below this you're still in customer discovery; above this, specialists creep in.
Solo marketing operator → 1–4 person team. Each person covers multiple disciplines. Occasionally up to 6 with contractors.
B2B SaaS, leaning product-led. Self-serve, hybrid, or sales-assist with a real product motion.
AI-native operator. Comfortable in Claude Code or Cowork, comfortable with MCP servers, willing to build context files and run evals on AI outputs.

Why MRR, not funding stage

Funding stage is a weak proxy for marketing readiness. A bootstrapped $80K-MRR company has more material to work with than a $20M-raised pre-revenue lab. MRR collapses ACV variance — 10 customers at $1K and 100 at $100 both land at $10K MRR and both have enough signal to start. What matters is whether you have customers to interview, events to audit, retention to read, and pricing signal to act on. (Translating loosely: late seed through early Series A, but the gating constraint is revenue, not the round you raised.)
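
The sizing arithmetic above can be sketched in a few lines. This is only an illustration: the function names `mrr` and `fit_band` are hypothetical, and the thresholds are the ones stated in this section ($10K–$500K MRR, sweet spot $30K–$300K).

```python
def mrr(customers: int, monthly_price: float) -> float:
    """Monthly recurring revenue for a flat-priced customer base."""
    return customers * monthly_price


def fit_band(mrr_usd: float) -> str:
    """Classify a company against the MRR range this map is calibrated for."""
    if mrr_usd < 10_000:
        return "below range: still customer discovery"
    if mrr_usd > 500_000:
        return "above range: specialists creep in"
    if 30_000 <= mrr_usd <= 300_000:
        return "sweet spot"
    return "in range"


# Two very different price points land on the same MRR.
print(mrr(10, 1_000), mrr(100, 100))  # 10000 10000
print(fit_band(80_000))               # sweet spot
print(mrr(10, 1_000) * 12)            # ARR: 120000
```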

Why the ceiling sits where it does

Past ~$500K MRR, dedicated specialists creep in even at lean AI-native shops — a PMM hire, a RevOps person, lifecycle ops as a function. The map still has reference value, but the "one operator orchestrates all 49 modules" framing strains. Series B-ish companies need ops fabric (RevOps team, PMM org, CDP layer) that this map intentionally doesn't model.

Who this is not for

Pre-PMF / pre-launch / under ~$10K MRR — you don't yet have the material this map operates on (customers to interview, events to audit, retention to measure). Do customer discovery and find first design partners; come back when you have a working motion to refine.
Enterprise (500+ employees) — needs analyst relations, formal ABM ops, dedicated CMO + PMM director split.
100-AE sales-led organizations — RevOps and sales enablement dominate; marketing is a smaller wedge of the stack.
Consumer products — performance-creative-led; positioning, JTBD, AEO matter differently or less.
Mature B2B SaaS (~$1M+ MRR with dedicated specialists) — load-bearing modules outgrow this map (lifecycle ops as a function, CDP governance, partnerships team, brand campaigns).

If you're outside the box, the underlying frameworks (Dunford, Balfour's Four Fits, JTBD, Bullseye) still apply. The scoring doesn't.

~$10K–$500K MRR · Solo → 1–4 person team · B2B SaaS, leaning product-led · AI-native (Claude Code + MCP)

This system assumes a specific kind of operator. Not the channel specialist, not the campaign manager, not the head of demand gen. Something newer that emerged in late 2025 and accelerated through Q1 2026.

The AI-native

Jeffrey Bussgang's definition (Flybridge Capital, April 2026): AI-native employees are "wildly adept at using a wide range of modern AI tools in their Jobs To Be Done quest — a skill acquired through intentional and frequent experimentation." Not just vibe coding — research, presentations, financial analysis, briefing documents, everything. The 10× claim is specific: AI-native operators outperform typical counterparts by an order of magnitude, not by working harder but by running everything through AI first. AI-native companies ask "can AI do this?" before hiring, "how can AI make this more efficient?" before building a new workflow, and "how can we leverage AI to deliver this immediately?" when customers ask for new features. The answer isn't always AI — but AI-native companies ask the question, while traditional companies are still debating whether AI is going to be a thing.

The Gen Marketer

Emily Kramer's term (MKT1, September 2025) for the marketing generalist built for the generative AI era. Five DNA traits: AI-powered execution, audience-first strategy, end-to-end campaign production, π-shaped skillset (depth in 2+ sub-functions, capable across product marketing / growth / content & brand), and the orchestrator instinct. Kramer's argument is structural: every prior wave (internet, social, mobile, PLG) created specialists. AI doesn't. "AI makes specialization accessible to everyone" — which inflates the value of generalists who connect the dots.

The multithreaded marketer

Kady Srinivasan's framing (Topline, December 2025): "part strategist, part operator, part creative engine, part systems architect. Someone who can run multiple plays, in multiple modes, in parallel and interwoven, without descending into chaos." Srinivasan is explicit that multithreaded ≠ full-stack. Full-stack means "I have these skills." Multithreaded means "I can activate these skills in parallel toward outcomes that compound." Her load-bearing claim: a modern CMO holds 8–12 threads in parallel, not by cloning headcount but by orchestrating humans + AI + contractors.

Architects, not operators

Stuart Brameld's "Gen Marketer: Why AI-Powered Generalists Are Replacing Marketing Specialists" (Growth Method, February 2026) names the shift directly — AI-powered generalists are replacing channel specialists at a structural level, the same dynamic Marc Andreessen called the "Mexican standoff" (every role's tools now accessible to the others) and that Boris Cherny built Anthropic's engineering team around. His companion piece "Marketers Should Be Architects, Not Operators" sharpens it: operators ask "how do I build this campaign?" Architects ask "what outcome do I want, and how should the system achieve it?" The shift is from clicking to describing. Specialists who memorized Salesforce menus or Marketo workflow logic are being replaced by people who define outcomes clearly and let AI handle the mechanics.

The Anthropic proof point

Austin Lau, a growth marketer at Anthropic with no coding background, built a Figma plugin and a /rsa Claude slash command that judges responsive search ad headlines against brand tone, product accuracy, and Google Ads RSA constraints. Ad-set generation went from 30 minutes to 30 seconds — a ~60× compression on a repetitive creative-production task. Lau's takeaway: "All you need to know is how to explain your challenge in a very clear, concise manner." This is the canonical "marketer becomes a marketing engineer" case.

The pulse data

Kyle Poyar's 2026 Claude Code for GTM report (April 2026) surveyed 200 GTM operators: 92% saved time, 67% reported "previously impossible" outputs, 55% replaced an existing tool or vendor (typically ChatGPT, an agency, or a contractor). Two killer use cases: content creation (Cowork-led) and GTM engines / prospecting (Code-led). The threshold finding: Claude Code lets one person produce what previously required a team or agency relationship.

The post-AI founder reality

Patrick Thompson, building Clarify (March 2026): building got easier, distribution got harder. Output is 2–3× but stress is unchanged because the bottleneck moved from production to clarity. "Being right about the problem matters more than ever. Being clear about what you do and who it's for matters more than ever." This is the lever the system tries to pull — front-load foundation work so the AI multiplier compounds against real signal, not against noise.

The operator this system is calibrated for is some blend of all of the above: Gen Marketer DNA, multithreaded execution, architect-rather-than-operator instincts, comfortable embedding judgment into AI workflows rather than delegating it wholesale.

Six principles for running this system without making the classic AI-era mistakes.

1. Evidence over opinion. Always.

The customer evidence sprint is the input to everything else — positioning, ICP, messaging, brand voice, content, ads, onboarding all read from it. AI cannot interview a real human, read body language, probe an unexpected response, or develop empathy. AI can synthesize 50 transcripts in an afternoon. Both halves matter; you do the human half, AI does the synthesis half.

2. Context is the product

What makes Claude useful for your company is not the model. It's the CLAUDE.md plus a Skills folder plus a versioned MCP catalog plus an evals layer. Poyar's pulse data backs this directly: the operators who get "previously impossible" outputs are the ones who spent real time on context engineering. The model is a constant; the context is the variable you control.

3. AI executes and judges within calibrated bounds — humans set the bounds

The 2024 framing of "AI executes, humans judge" no longer matches reality. Anthropic's Austin Lau ships a /rsa skill that judges ad-headline fit against brand tone in production. Karpathy reframed "vibe coding" as "agentic engineering" in February 2026 — AI does judge, with calibrated oversight. The human contribution is the rubric, the eval set, and the drift alerts. "Engineering" is back in the name on purpose.
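
What "calibrated bounds" can look like in code: a minimal sketch of a human-written rubric that an automated judge runs against. This is not Lau's /rsa implementation, which is not described here in detail. The `Rubric` shape and the banned phrases are hypothetical placeholders; the only external fact used is the 30-character limit on Google Ads RSA headlines.

```python
from dataclasses import dataclass, field

RSA_HEADLINE_MAX_CHARS = 30  # Google Ads responsive search ad headline limit


@dataclass
class Rubric:
    """Human-written bounds. The phrases here are placeholder examples."""
    banned_phrases: list[str] = field(
        default_factory=lambda: ["best in the world", "guaranteed"]
    )
    max_chars: int = RSA_HEADLINE_MAX_CHARS


def judge_headline(headline: str, rubric: Rubric) -> list[str]:
    """Return rubric violations; an empty list means the headline passes."""
    problems = []
    if len(headline) > rubric.max_chars:
        problems.append(f"too long: {len(headline)} > {rubric.max_chars} chars")
    lowered = headline.lower()
    for phrase in rubric.banned_phrases:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase!r}")
    return problems


rubric = Rubric()
for candidate in [
    "Product feedback, one inbox",
    "Guaranteed growth for every SaaS team on earth",
]:
    print(candidate, "->", judge_headline(candidate, rubric) or "pass")
```

The deterministic checks are the cheap part. Tone and product accuracy would still need a model call scored against an eval set that a human maintains.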

4. Don't outsource taste and high-level judgment to AI

Principle 3 says AI judges fine within calibrated bounds. The corollary is the failure mode: AI is dangerous at open-ended high-level judgment such as strategic advice, positioning calls, legal risk, and "help me grow my business" prompts. Brainstorming, editing, discovery, and bounded judgment (the rubric, the eval set, the brand-tone Skill, the format constraints) are where the multiplier lives. Outsourcing taste is where operators get hurt, through four failure modes. Sycophancy steers the model to agree with you (49% more often than the human baseline in Stanford's 2026 Science paper). Trendslop means AI is structurally incapable of strategic reasoning (15,000 simulations across 7 frontier models returned the same biased answers regardless of context; rich industry-specific context moved the bias only ~11%). Hallucination dresses up made-up specifics in confident prose. Cognitive offloading lets the speed gain mask deskilling. The full register (sources, named cautionary tales, and practical guards) lives in the AI Caution tab.

Take-home: positioning, brand voice, channel bets, pricing structure, legal risk, hiring calls — humans own these. AI can stress-test them, expand options, surface blind spots, and execute against a rubric humans wrote. The taste and the rubric stay human. The further your prompt drifts from "judge this against [specific rubric]" toward "what should we do," the more carefully you read the answer.

5. Build once, use everywhere

A customer quote becomes a testimonial, a messaging angle, a case study, ad copy, and an onboarding tooltip. A positioning paragraph in CLAUDE.md becomes the voice across every output. Anthropic Skills (October 2025) made this composable: each Skill is portable across Claude.ai, Code, and the API. Stop building one Claude Code project per execution engine. Build Skills that compose across the system.

6. The hard part is the ingredient, not the recipe

AI writes copy in seconds — but only if you fed it real psychology to work with. Customer evidence, the messaging document, the brand voice synthesis: those are the ingredients. Without them, AI output regresses to the internet's mean. Quick versions of foundation modules (5 interviews instead of 20, a 1-hour founder voice synthesis instead of a workshop) always beat skipping them entirely.

On output volume vs felt effort: Patrick Thompson named this directly. AI-native operators ship 2–3× more, but it doesn't feel easier. The bottleneck moved from production to judgment. The principles above are about protecting the judgment layer, not freeing it.

AI gives a solo operator the leverage of a small team — and an entirely new class of failure modes. Each entry below has been documented in peer-reviewed studies, court rulings, or named operator incidents in 2024–2026. This is the working register: what to be wary of, what to verify, and what to never delegate. Lean on it the way the system itself is meant to be used — pick what's load-bearing for your stage, instrument against it, move on.

The most dangerous failure mode is AI for strategic advice: it looks authoritative, adapts to your context, and is systematically biased regardless of how carefully you prompt it. This is by far the most important section on this tab. If you read one thing, start here: "Researchers Asked LLMs for Strategic Advice. They Got 'Trendslop' in Return" (HBR, March 2026), then watch the two video deep-dives, "Harvard just discovered what AI actually is" and "Harvard Just Caught AI Lying to Every Executive in America" (Brendan Dell, The Leverage Class).

1. Trendslop: AI cannot be trusted for strategic advice

HBR's March 2026 study tested 7 frontier models across 15,000 simulations spanning 7 core strategic tensions — differentiation vs. commoditization, automation vs. augmentation, short-term vs. long-term, radical vs. incremental, centralization vs. decentralization, competition vs. collaboration, exploration vs. exploitation. Every model clustered on the same trendy answer regardless of industry, company size, or market context: differentiation, augmentation, long-term, decentralization, collaboration. Better prompts shifted bias 2%. Rich industry-specific context: 11%. Flipping which option appeared first on the page: 19%. The model is not analyzing the business — it is parroting popular consensus from Reddit, Substack, and management blogs. Michael Porter's cost-leadership strategy (Walmart, Costco, Southwest) was dismissed nearly every time — not because differentiation was right for those businesses, but because it's trendier online. The researchers named this trendslop.

The structural root is training data. These models learned what "good strategy" looks like from the internet, which skews heavily toward fashionable management advice. RLHF then amplifies it: human raters reward agreeable responses, so models learn to agree. Starting a prompt with "I think" or "I believe" literally suppresses the model's own learned knowledge in later network layers. Three of five frontier models followed medically illogical requests 100% of the time; the fourth, 94%.

The chain-of-thought reasoning adds false confidence. Anthropic tested whether Claude's step-by-step explanations reflect what it's actually doing: Claude used hidden hints to change its answer 75% of the time without mentioning them; when the hint was unethical, it hid it 59% of the time. Given a chance to cheat and be rewarded for the wrong answer, it cheated in over 99% of cases — and acknowledged it in fewer than 2% of its written explanations. The write-ups while hiding the cheat were longer and more confident than the honest ones.

This is a fundamental structural inability to perform strategic reasoning, not a prompting problem. "It gives you the most eloquent, most confident, most beautifully formatted version of what everybody else already thinks." Two video deep-dives: "Harvard just discovered what AI actually is" and "Harvard Just Caught AI Lying to Every Executive in America" (Brendan Dell). Guard: stress-test positions you wrote, not positions AI generated. Make it argue against your plan. Treat "do both" as dodging. The strategic call stays human.

2. Sycophancy: the model agrees with you 49% more than a human would

Cheng et al., "Sycophantic AI decreases prosocial intentions" (Science, March 2026), 11 frontier models, ~12,000 prompts: AI affirmed users' actions 49% more often than humans, including in cases of "deception, illegality, or other harms" — and users rated the sycophantic outputs more trustworthy. SycEval (2025): 58% sycophancy across ChatGPT-4o, Claude-Sonnet, Gemini-1.5-Pro. The April 2025 GPT-4o rollback (OpenAI postmortem) confirmed the structural cause: human raters reward sycophancy, training picks it up; stored memory amplifies it. Guard: force counterargument generation, flip option order, prefix prompts with Stanford's "wait a minute", and avoid stored memory for high-stakes deliberation.
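
The "flip option order" guard can be made mechanical. The sketch below assumes a hypothetical `ask` callable that wraps whichever model you use and returns the text of the option it chose; no real API is called, and the stub model exists only to show what an order-driven answer looks like.

```python
from typing import Callable


def order_flip_check(question: str, option_a: str, option_b: str,
                     ask: Callable[[str], str]) -> dict:
    """Ask the same question with the options in both orders and compare picks.

    `ask` is a hypothetical wrapper around your model call; it must return
    the text of the option the model chose.
    """
    template = (
        "{q}\nOption 1: {first}\nOption 2: {second}\n"
        "Answer with the option text only."
    )
    pick_ab = ask(template.format(q=question, first=option_a, second=option_b)).strip()
    pick_ba = ask(template.format(q=question, first=option_b, second=option_a)).strip()
    return {
        "pick_ab": pick_ab,
        "pick_ba": pick_ba,
        # If the pick changes when only the order changes, position drove the answer.
        "stable": pick_ab == pick_ba,
    }


def first_option_stub(prompt: str) -> str:
    """Stand-in model that always picks whatever is listed first."""
    return prompt.split("Option 1: ")[1].split("\n")[0]


result = order_flip_check(
    "Which pricing direction fits a cost-sensitive segment?",
    "cost leadership", "differentiation", ask=first_option_stub,
)
print(result["stable"])  # False: the stub is purely position-driven
```

An unstable result does not say which option is right. It only says the answer should not be treated as a recommendation.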

3. Hallucination is worse in reasoning-era models, not better

Vectara's late-2025 HHEM refresh (7,700 articles, law/medicine/finance): Grok-4-fast-reasoning 20.2%; GPT-5, Claude Sonnet 4.5, Gemini-3-Pro all >10% on grounded summarization. GPT-5 system card: 47% hallucination on SimpleQA without web access; 9.6% with web search (3–5× reduction from grounding). AA-Omniscience: GPT-5.5 hallucinates on 86% of unknowns. Columbia Journalism Review (March 2025): >60% of AI search citations wrong on average; Grok-3 at 94%. OpenAI's September 2025 paper: training rewards confident guessing over admitting uncertainty; MIT (Jan 2025) — when models hallucinate, they are 34% more likely to use confident phrases like "definitely." Guard: verify every claim, ground via web search, treat the most confident-sounding outputs as the most suspect.

4. Citation, package, and legal-research hallucination

Walters & Wilder (Sci. Reports 2023): 51% of 732 ChatGPT-generated citations were fabricated across 6 studies (range 17–94%). Stanford RegLab/HAI: Lexis+ AI 17%+ hallucinated; Westlaw AI 33% hallucinated, 42% accurate — both rebutted LexisNexis's "100% hallucination-free" marketing. Spracklen et al. (USENIX 2025): 19.7% of recommended packages don't exist; 205,474 fake names; 58% repeat across runs (trivial reconnaissance for attackers). Lasso's empty huggingface-cli pulled 30,000+ downloads in three months. Damien Charlotin's database: over 1,200 court cases globally of AI-fabricated citations; Q1 2026 sanctions exceeded $145,000. Guard: click every citation, resolve every package against npm/PyPI, verify legal/medical/regulatory claims against the source.
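
A minimal sketch of the "resolve every package" guard for Python dependencies. It relies on PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`), which answers 404 for names that are not registered. The function names and the injectable `status_of` parameter are my own additions so the check can be exercised without a network call.

```python
import urllib.error
import urllib.request
from typing import Callable

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request (makes a network call)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def missing_packages(names: list[str],
                     status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the names PyPI does not know (HTTP 404 on the JSON endpoint)."""
    return [n for n in names if status_of(PYPI_JSON_URL.format(name=n)) == 404]


# Usage (network required):
#   missing_packages(["requests", "some-name-an-assistant-suggested"])
```

Existence is a necessary check, not a sufficient one. The huggingface-cli case above shows a hallucinated name can be registered by someone else, so a package that resolves still deserves a look at its maintainer, age, and download history.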

5. Context rot: long context degrades even within the limit

Chroma's "Context Rot" report (July 2025), 18 frontier LLMs: "performance grows increasingly unreliable as input length grows" — a 200K-token model can show significant degradation at 50K. "Lost in the Middle" (TACL 2024): U-shaped accuracy mirrors the human serial-position effect. RULER (NVIDIA): of 17 LMs claiming 32K+ effective context, only Mixtral held baseline at 2× training length. Morph: "context rot is the primary failure mode. Not model capability." Guard: start a new chat past 30–50K tokens or when the task pivots; persistent rules belong in CLAUDE.md and Skills (re-injected each call), not in scrollback.

6. Your speed gain may be illusory; cognitive offloading is real

METR's randomized controlled trial (July 2025), 16 experienced devs, 246 real tasks (Cursor + Claude 3.5/3.7 Sonnet): forecast +24%, perceived +20% afterward, actual −19% (slower) — perception gap of ~39 points. Microsoft Research / CMU (CHI 2025), 319 knowledge workers: higher confidence in GenAI correlates with less critical thinking. MIT Media Lab "Your Brain on ChatGPT" (June 2025), EEG, n=54: LLM users showed up to 55% reduced neural connectivity; 83% could not quote from essays they had just written. Guard: instrument actual cycle time, not feeling; articulate your own answer before reading the AI's; use AI for synthesis after a human pass, not before.
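
"Instrument actual cycle time" reduces to two small calculations. This sketch assumes you log task durations in minutes for comparable tasks done with and without AI; the function names are hypothetical, and the sign convention (positive means faster) is a choice made here to match the figures above.

```python
from statistics import mean


def speedup_pct(baseline_minutes: list[float], ai_minutes: list[float]) -> float:
    """Measured speedup of AI-assisted work vs baseline, in percent.

    Positive means faster with AI, negative means slower.
    """
    base, ai = mean(baseline_minutes), mean(ai_minutes)
    return (base - ai) / base * 100


def perception_gap(perceived_pct: float, measured_pct: float) -> float:
    """Percentage points between how fast it felt and how fast it was."""
    return perceived_pct - measured_pct


# The METR figures quoted above: felt +20%, measured -19%.
print(perception_gap(20, -19))  # 39
```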

7. Coding agents have production blast radius

Replit (July 2025): an agent ran destructive commands during a code freeze, deleted Jason Lemkin's database (1,206 executives, 1,196+ companies), then fabricated 4,000 fake users and lied that rollback was impossible — Lemkin had told it not to make changes "11 times in ALL CAPS"; agent self-rated severity at 95/100. Cursor "Sam" invented a one-device-per-subscription policy and lost paying customers. Cursor rm -rf / reports: agents wiping computers without user approval. Gemini CLI deleted user files (July 2025). Guard: never give an agent unmediated production write access; dev/prod separation, automatic backups, plan/chat-only mode for any agent that touches data; distrust agent self-reports of irreversibility.

8. Prompt injection: the lethal trifecta

Simon Willison's "lethal trifecta" (June 2025): private data + untrusted content + external comms in one agent = exploitation near-certain. OWASP 2025 LLM Top 10 lists prompt injection #1 with the note "complete prevention isn't currently feasible." Real CVEs: CVE-2025-53773 (Copilot/VS Code RCE, August 2025) with self-replicating worm variants demonstrated; CVE-2025-54135 ("CurXecute") in Cursor; EchoLeak (M365 Copilot) and CamoLeak (CVSS 9.6) exfiltrated docs through Markdown image URLs. Most solo operators install web-search MCP + Gmail/Drive MCP + a coding agent in the same session — that is the trifecta by definition. Guard: sandbox; private-data MCPs and web-fetch MCPs don't share a session; verify MCP server commands and args, not just names (CVE-2025-54136 "MCPoison" bound trust to name only).
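
Both guards can be checked before a session starts. The sketch below is an assumption-laden illustration, not a security control: the capability flags are hand-assigned labels, `Tool` and `fingerprint` are hypothetical names, and `some-mcp-server` is a placeholder rather than a real package.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One MCP server or agent capability in a session (labels are hand-assigned)."""
    name: str
    private_data: bool = False       # can read mail, docs, CRM, files
    untrusted_content: bool = False  # ingests web pages, inbound email, issues
    external_comms: bool = False     # can send requests, email, or write outside


def has_lethal_trifecta(session: list[Tool]) -> bool:
    """True if the session as a whole combines all three capabilities."""
    return (
        any(t.private_data for t in session)
        and any(t.untrusted_content for t in session)
        and any(t.external_comms for t in session)
    )


def fingerprint(command: str, args: list[str]) -> str:
    """Hash the full launch command so approval is bound to what runs, not to a name."""
    payload = json.dumps({"command": command, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


web = Tool("web-search", untrusted_content=True, external_comms=True)
drive = Tool("drive", private_data=True)
print(has_lethal_trifecta([web, drive]))  # True
print(has_lethal_trifecta([drive]))       # False

approved = fingerprint("npx", ["-y", "some-mcp-server"])
current = fingerprint("npx", ["-y", "some-mcp-server", "--extra-flag"])
print(approved == current)  # False: same name, different command line
```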

9. Anything you type is discoverable

Cyberhaven 2026: 39.7% of all AI interactions involve sensitive data; corporate sensitive-data share 34.8% (up from 10.7% two years prior). Personal-account usage bypassing SSO: 32.3% ChatGPT, 58.2% Claude, 60.9% Perplexity of enterprise traffic. Samsung's three leaks in 20 days (2023) ended in a company-wide ChatGPT ban. Anthropic trains on consumer Claude by default since Sept 28, 2025 (5-year retention if opted in). NYT v OpenAI compels OpenAI to preserve all consumer ChatGPT chats indefinitely, including deleted ones. Krafton's CEO learned this the hard way — see Cautionary tales below. Guard: enterprise plans with no-training contracts and ZDR for client work; never put strategy or legal questions into a consumer LLM; assume every chat will be read out in court.

10. You own what your bot says

Moffatt v. Air Canada (BCCRT 2024): "It makes no difference whether the information comes from a static page or a chatbot" — CA$812 + the precedent. NYC's MyCity told businesses they could take cuts of workers' tips and run cashless stores (banned by 2020 law); ~$500K, eventually labeled "functionally unusable." Cursor's "Sam" invented a device-licensing policy and lost paying customers. DPD's chatbot wrote a poem about how bad DPD was after a routine update loosened guardrails (1.3M+ views). Chevy of Watsonville sold a $76K Tahoe for $1 with "no takesies backsies." Grok went on antisemitic tirades for ~16 hours after a single system-prompt edit; Linda Yaccarino resigned the next morning. Guard: treat system prompts as production code (version control, staging, red-team set, rollback); Walters v OpenAI shows disclaimers reduce defamation exposure but don't excuse bad output.

11. Cost runaways and the homogenization tax

Costs: a four-agent system locked in Analysis-Verification recursion for 11 days, ending at $47,000 ($127 → $891 → $6,200 → $18,400 → $47K), while all health checks passed. Cursor's June–July 2025 pricing reset hit users with $350+ overages in a week. Subagent-heavy workflows add 200–500% overhead vs single-agent. Homogenization: Doshi & Hauser (Science Advances 2024) found GPT-4-assisted creative work is 5–11% more similar to AI ideas (collective novelty loss). 53.7% of LinkedIn long-form posts were flagged "Likely AI" in 2025. Sites with AI content sold for 39% less and took 54% longer to sell. Google's 2025 Search Quality Rater Guidelines assign "the Lowest rating" to "all or almost all" AI-generated content with little originality. Guard: hard max_turns and workspace spend caps, cheaper models for routine work; brand voice is a moat, and foundation modules prevent regression to the internet's mean.
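
A hard cap is simple to express in code. The sketch below is a standalone illustration of a turn-and-spend budget wrapped around an agent loop; `RunBudget` and `BudgetExceeded` are hypothetical names, not part of any SDK, and the per-turn cost is assumed to come from your own metering.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run hits its turn or spend cap."""


class RunBudget:
    """Hard caps for one agent run. Cost figures come from your own metering."""

    def __init__(self, max_turns: int, max_usd: float) -> None:
        self.max_turns = max_turns
        self.max_usd = max_usd
        self.turns = 0
        self.spent_usd = 0.0

    def charge(self, turn_cost_usd: float) -> None:
        """Record one turn; stop the run as soon as either cap is crossed."""
        self.turns += 1
        self.spent_usd += turn_cost_usd
        if self.turns > self.max_turns:
            raise BudgetExceeded(f"turn cap hit at turn {self.turns}")
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spend cap hit at ${self.spent_usd:.2f}")


budget = RunBudget(max_turns=50, max_usd=25.0)
try:
    for cost in [0.40] * 100:  # simulated per-turn costs
        budget.charge(cost)
except BudgetExceeded as stop:
    print("halted:", stop)
```

The point of raising an exception rather than logging a warning is that a passing health check cannot mask it: the loop stops.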

Cautionary tales worth knowing

Krafton / Subnautica 2 (Delaware Chancery, March 2026): CEO bypassed lawyers and re-prompted ChatGPT until it produced "Project X" — a takeover plan for breaking a $250M earnout contract. Shared via Slack and email. He deleted parts of the chat during discovery; the OpenAI subpoena recovered them. Vice Chancellor Will: "Fearing he had agreed to a 'pushover' contract, Krafton's CEO consulted an artificial intelligence chatbot to contrive a corporate 'takeover' strategy." Founders reinstated, earnout extended to September 2026.
Mata v. Avianca / Wadsworth v. Walmart / Sullivan & Cromwell (2023–2026): lawyers sanctioned for ChatGPT-fabricated case citations. Q1 2026 sanctions exceeded $145,000; Oregon single-case sanction $109,700; Nebraska imposed the first indefinite license suspension tied to AI hallucinations; Sullivan & Cromwell — first major Wall Street firm hit publicly.
Deloitte Australia (October 2025): A$440K (US$290K) assurance review for the Australian government had hallucinated academic citations and a fabricated quote attributed to a Federal Court judge. Deloitte refunded ~A$97K, disclosed Azure OpenAI use after the fact, faced parliamentary scrutiny.
Klarna AI-first reversal (2024–2025): February 2024 — AI doing the work of 700 agents; headcount 5,500 → ~3,400. May 2025: "really investing in the quality of the human support is the way of the future for us." Hiring resumed. The "replace humans" narrative was abandoned.
Sports Illustrated fake AI authors (November 2023): Futurism exposed bylined product reviews credited to authors with AI-generated headshots; bylines silently swapped when challenged. Arena Group stock fell over 22% on the day of the report.

Operating habits

Verify every citation, every package, every legal claim. No exceptions. Web-search grounding cuts hallucination 3–5×.
Force counterargument generation before treating any AI output as a recommendation. Use "wait a minute" priming and flip option order.
Strategic calls stay human. Trendslop says ~11% bias correction is the ceiling from richer context; the outside human stays in the loop.
Sandbox the trifecta. Private-data MCPs and web-fetch MCPs don't share a session.
Cap costs hard. max_turns, max_tokens, workspace spend; /clear between tasks.
Treat system prompts as production code. Version control, staging, red-team set, rollback.
Disclose AI use; label AI outputs. Walters v OpenAI shows disclaimers reduce defamation exposure. Sports Illustrated and Deloitte show the cover-up is worse than the AI use.
Assume your chats are corporate evidence. Krafton, NYT v OpenAI, Slack discovery — all real.

AI is not the risk. Deployment context is. Operators who treat AI as a fast intern with great handwriting and no judgment will outlive operators who treat it as an oracle. Verify, scaffold, label, cap, and assume your chats will be read out in court.

The marketer part

10+ years in B2B SaaS, mostly as the solo marketing function. Customer interviews, positioning, messaging, websites, analytics, onboarding, pricing research — done manually, sequentially, one hat at a time. Three years running growth at SatisMeter, a product feedback tool: repositioned it from NPS tool to full product feedback platform, ran many JTBD buyer-journey interviews across various B2B SaaS products, built the site from scratch, did pricing analysis without a budget for consultants. Ended in an acquisition by Productboard. Since then: consulting through Realign, positioning and GTM work for early-stage SaaS teams. LinkedIn · jkuzel.com

How AI came in

LLMs for copy and research. 5,000+ sessions, hundreds of deep research threads across ChatGPT and Claude. Drafting, synthesis, analysis, strategy.
Image and video generation. ~15,000 images and hundreds of video clips across Google Nano Banana, Google Veo, Kling, Wan, Grok Imagine, GPT, FLUX, Higgsfield Soul, Seedream, Seedance. Spent serious time understanding each model's strengths: character creation, reference-based generation, editing, graphics, text rendering. Generated a Meta video ad for a B2B SaaS with a natural-looking character, after exploring many workflows to avoid the obvious AI look most people default to, and set up a workflow for one-shotting static ad creative with branding and text.
Claude Code. Started recently, now using daily — getting deeper as I go. This is where the shift from AI-assisted to AI-native work happened. One concrete example: the customer evidence sprint. Manually, multiple passes per transcript to track 20 extraction targets — 10 hours per interview. With ChatGPT: ~2 hours. With Claude Code and the right setup: under 20 minutes, same depth.

What I've actually built

HTML tools and pages deployed to the web; BAT scripts and bookmarklets that trigger actions; TamperMonkey scripts that modify website UIs; interactive calculators and reports; data analysis pulling from 10+ large data sources — where Python earned its keep; MCP-driven scrapers for building context. Mostly things for myself — helping me in my daily life and on the web. Now getting more into the production side — the Claude Code execution engines, Skills, and MCP wiring this map calls for.

Why I built this

Started building standalone Claude Code projects for marketing tasks and hit a wall: automations are only as good as the context they run on. Without knowing what feeds what, you end up with disconnected tools that don't compound.

— Jan

Solid border

Core module

Present in almost every effective B2B SaaS marketing stack. Build this before anything optional in its layer.

Dashed border

Optional module

Stage-dependent or business-model-specific. Useful in the right situation — evaluate before committing resources.

Dependency order

Numbered cards within a layer have dependencies — the output of an earlier module feeds into the next. Work through them in sequence rather than in parallel.

Example

In Foundation: analytics identifies your best customers (01), which tells you who to interview (03), whose answers seed the competitive landscape (04).
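
Dependency order is a topological sort. A minimal sketch using Python's standard `graphlib`: the edges are my reading of the example above (analytics feeds interviews, interviews feed the competitive landscape), and treating the marketing context audit as a second audit that runs in parallel is an assumption.

```python
from graphlib import TopologicalSorter

# module -> set of modules whose output it needs first
foundation = {
    "marketing context audit": set(),
    "analytics audit": set(),
    "customer evidence sprint": {"marketing context audit", "analytics audit"},
    "competitive landscape": {"customer evidence sprint"},
}

order = list(TopologicalSorter(foundation).static_order())
print(order)
```

The two audits have no edge between them, so either may come first; the sort only guarantees that nothing runs before its inputs exist.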

Score 1–10

What the score means

How big the impact is when this card is done well — independent of whether you choose to do it. The dashed-vs-solid border tells you whether to activate; the score tells you what's at stake when you do.

10 — essential

The whole motion bends around this. Highest possible impact.

8–9 — core

Load-bearing for most teams. High, compounding impact.

6–7 — useful

Real, reliable contribution to the system without being central.

4–5 — situational

Strong impact in specific motions or stages; quiet otherwise. Activate when the situation calls.

1–3 — noise

Pitch-deck artifacts or consultant rituals. Unlikely to drive operating leverage.
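
The score bands map to tier names as follows. This is a small sketch of the legend as a lookup, with hypothetical function names; the label format mirrors how scores appear on the cards (for example `core·9`).

```python
def tier(score: int) -> str:
    """Map a card's impact score to the tier names used in the legend."""
    if score < 0 or score > 10:
        raise ValueError("score must not be negative or above 10")
    if score == 10:
        return "essential"
    if score >= 8:
        return "core"
    if score >= 6:
        return "useful"
    if score >= 4:
        return "situational"
    return "noise"


def card_label(score: int) -> str:
    """Format a score the way the cards show it, e.g. 'core·9'."""
    return f"{tier(score)}·{score}"


print(card_label(9))   # core·9
print(card_label(10))  # essential·10
```

The tier says nothing about whether to activate a card. That decision belongs to the solid-versus-dashed border.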

Realign

Realign — strategy & evidence

Jan Kužel's productized service for the strategy and evidence layer. Marketing context audit, customer interviews, competitive landscape, master company context, positioning, ICP, messaging, and brand identity. The foundation everything else reads from.

CtrlGrowth

CtrlGrowth — creative execution

Jan Kužel's productized service for the creative-output layer. Website, content engine, AI creative engine, and paid campaigns. Done-with-you or done-for-you.

Foundation layer

Run-once research that produces the evidence every other layer reads from. Audits in parallel, then interviews, then competitive landscape — analyse before you write.

Marketing context audit

Stakeholder conversations, team, stack, site, product surfaces, reviews, and competitive signals — baseline that seeds Master context.

Week-1 inventory

core·8

Analytics audit

Tracking plan, event taxonomy, MCP wiring, baseline metrics. The instrumented spine every dashboard and cohort reads from.

Tracking plan + MCPs

core·9

Customer evidence sprint

JTBD buyer-journey interviews, transcribed and synthesized. The seed every strategy, brand, and execution module reads from.

JTBD interviews

essential·10

Competitive landscape

The alternatives buyers actually consider — competitors, complements, frenemies. Built from interviews, not Crunchbase tags.

Ecosystem map

useful·7

Master company context

Two synthesis cards the operator builds and reads from (1–2). Two hygiene inputs the company maintains and marketing aligns to (3–4).

Company context

Markdown master file every AI tool reads, plus a synced HTML view humans can scan. Turns generic AI into your company's AI.

Markdown + HTML

core·9

AI-skill engineering

Versioned skill files, MCP catalog, prompt governance, evals. The composable AI layer that compounds across every workflow.

Skills + MCP catalog

core·9

Company sprint log

Weekly async log of what the company is doing. The heartbeat marketing reads to stay aligned without sitting in every meeting.

Weekly async log

useful·6

Company roadmap

Forward-looking 6–24 month plan across product, GTM, ops. Marketing reads it to align launches, channel bets, and capacity.

Quarterly cross-functional

useful·6

Strategy layer

Numbered by dependency — positioning gates everything. Enter with a rough ICP hypothesis, refine through positioning, loop. Revisit quarterly in fast-moving markets.

Positioning strategy

Why customers should pick us vs the alternatives. Dunford's 5-step framework, evidence-fed.

Dunford 5-step

essential·10

ICP & segmentation

Who we sell to and who we don't. Firmographics, triggers, decision structure, disqualifiers.

Tiered profiles

core·9

Brand identity

Logo, palette, typography, voice, personality. The anchor every AI generation reads from.

Voice + visual basics

useful·7

Messaging framework

Positioning translated to copy. Value props, proof points, objection handling, segment angles.

Messaging document

core·9

Channel strategy

Acquisition channel ranking. Balfour's Four Fits as primary; Bullseye as brainstorm step.

Four Fits + Bullseye

core·8

Analytics layer

Three tools capture the raw data; one custom Claude Code dashboard synthesizes it. Sorted by score: dashboard and product analytics are core; marketing and subscription are useful.

Metrics dashboard

One custom unified view across every analytics source. Scheduled Claude diffs flag drops, anomalies, and tracking errors to Slack.

Claude Code + MCPs

core·9
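
A minimal sketch of the "flag drops" idea, assuming two metric snapshots are already available as plain dictionaries. The function name `flag_drops`, the metric names, the numbers, and the 20% threshold are all hypothetical; a real dashboard would pull these from the analytics sources and post the result to Slack.

```python
def flag_drops(previous: dict[str, float], current: dict[str, float],
               threshold: float = 0.2) -> dict[str, float]:
    """Return metrics whose relative drop between snapshots meets the threshold."""
    flagged = {}
    for name, before in previous.items():
        after = current.get(name)
        # Metrics that are missing or had a non-positive baseline are skipped here;
        # a fuller version might report them separately as possible tracking errors.
        if after is None or before <= 0:
            continue
        change = (after - before) / before
        if change <= -threshold:
            flagged[name] = change
    return flagged


# Illustrative numbers only.
last_week = {"signups": 120, "trials": 40, "mrr": 50_000}
this_week = {"signups": 84, "trials": 41, "mrr": 49_500}
flagged = flag_drops(last_week, this_week)
print(flagged)
```

With these sample numbers only signups fall by more than the threshold; trials and MRR move too little to be flagged.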

Product analytics

Events, funnels, retention cohorts, feature adoption. PostHog, Mixpanel, or Amplitude — all with official MCPs Claude can query.

Tool + official MCP

core·9

Marketing analytics

Traffic, sources, search performance, ad campaigns. GA, Search Console, Plausible — most served by community MCPs in 2026.

Tool + community MCP

useful·7

Subscription analytics

MRR, churn, ARPA, NRR, cohort retention. Source of truth for recurring revenue. ChartMogul, Baremetrics, or Stripe direct.

Tool + Stripe API

useful·7
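
For reference, the metrics on this card can be computed from a monthly snapshot roughly as follows. This sketch uses the common textbook definitions; the `MonthSnapshot` structure and the sample numbers are assumptions for illustration, and tools such as ChartMogul or Baremetrics may define edge cases differently.

```python
from dataclasses import dataclass


@dataclass
class MonthSnapshot:
    """Hypothetical monthly figures for one cohort of existing customers."""
    starting_mrr: float
    expansion_mrr: float
    contraction_mrr: float
    churned_mrr: float
    starting_customers: int
    churned_customers: int


def arpa(mrr: float, accounts: int) -> float:
    """Average revenue per account: MRR divided by paying accounts."""
    return mrr / accounts


def customer_churn_rate(s: MonthSnapshot) -> float:
    """Share of starting customers lost during the month."""
    return s.churned_customers / s.starting_customers


def net_revenue_retention(s: MonthSnapshot) -> float:
    """MRR kept from existing customers, including expansion, over starting MRR."""
    retained = s.starting_mrr + s.expansion_mrr - s.contraction_mrr - s.churned_mrr
    return retained / s.starting_mrr


# Illustrative numbers only.
month = MonthSnapshot(50_000, 4_000, 1_000, 2_000, 200, 6)
print(f"ARPA: {arpa(month.starting_mrr, month.starting_customers):.2f}")
print(f"Customer churn: {customer_churn_rate(month):.1%}")
print(f"NRR: {net_revenue_retention(month):.1%}")
```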

Operational assets

Living records the operator maintains and the rest of the system reads from. Sorted by score within the layer.

Calls & transcripts

Recorded, transcribed, queryable corpus. The substrate JTBD synthesis, sprint logs, and Customer advocacy all read from.

Transcription + storage

core·9

Social proof library

Testimonials, case studies, logos, awards — approval-ready, segment-tagged. The library every other module pulls proof from.

Tagged proof database

core·8

Design system

Templates, tested AI image prompts, character refs, format catalogs. A growing library that follows the decisions Brand identity makes upstream.

Markdown + assets

useful·7

Content library

Catalog of every published asset across channels. Title, performance, repurpose status, links to related and derivative pieces.

Tagged content database

useful·6

Execution engines

Many run as Claude Code projects reading from Master context and shipping output. Sorted by score: essentials first, then core, useful, situational.

Website

Homepage, features, pricing, comparisons, case studies, trust center. Often the highest-leverage AI build a solo operator ships.

Claude Code

essential·10

Onboarding & activation

Signup → aha moment. Setup flow, first-value emails, in-app guidance, friction removal. Highest-ROI growth lever for PLG.

Bowling-Alley flow

essential·10

Customer advocacy

Case study pipeline, references, advocate identification, review programs. Uplift's 2024 survey: #1 ROI tactic in B2B SaaS.

26-Q script + AI

core·9

Content engine

Blog, newsletter, company social. Editorial calendar driven by evidence + AEO targets, AI drafts, human edits, repurpose.

Claude Code + edit

core·9

Lifecycle & retention

Post-activation life of the customer. Triggered messages, churn signals, expansion, NPS surveys. Biggest gap in most B2B SaaS.

Claude Code + ESP

core·9

AEO production

Citation engineering for AI answer engines. Schema, llms.txt, expert-quoted sectioning, entity consistency, training plays.

AEO + Claude Code

core·9

Founder-led distribution

Founder LinkedIn, podcast appearances, X, founder newsletter. The highest-leverage acquisition channel for many B2B SaaS.

Idea bank + AI + edit

core·9

Organic SEO

Google search and category marketplaces. Schema, content, internal linking, technical baseline. Pairs with AEO production.

AI research + manual

core·8

GTM-engineered outbound

Signal-based account selection, agent-driven personalization, mandatory human review. Clay + Claude + Smartlead/Instantly stack.

Signal + Clay + agent

core·8

Sales enablement

Per-prospect demo prep, custom decks, battle cards, objection handling. Useful for any operator doing customer-facing calls.

Per-prospect prep

useful·7

Paid campaigns

Meta, Google, LinkedIn ads. Shared context, per-platform API. Often the major scaling channel once positioning + funnel work.

Claude Code + APIs

useful·7

Product comms

Feature launches: changelog, in-app, blog, social, docs, email. Templated tiered GTM pipeline per release size and channel set.

Tiered launch templates

useful·7

Marketer-built micro-products

Internal tools + external-facing resources built by the marketer in Claude Code, Lovable, v0 — no engineering queue.

Vibe coding

useful·7

AI creative engine

Image + video gen for Content, Founder-led, Paid. Brand-anchored prompts, recurring characters, repeatable output.

Image + video + library

useful·6

Events & webinars

Live-audience marketing — webinars, conferences, workshops. AI multiplier on prep + repurposing; the live moment stays human.

Live + AI prep

useful·6

Founder-led community

Owned community (Slack/Discord/forum) led by the founder. Works when the founder personally leads it rather than delegating.

Relational + AI

useful·6

Deep audits

Heavy on-demand investigations. Triggered by a specific decision.

Growth model

Unit economics, growth loops, payback, LTV/CAC by channel, scenario modeling. Subsumes expansion analysis.

Data + AI

core·8
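
A rough sketch of the per-channel unit economics this card refers to, using the simple textbook formulas (LTV as margin-adjusted ARPA divided by monthly churn, payback as CAC divided by monthly gross profit per account). The channel names and every number below are hypothetical; a real model would read them from the analytics layer.

```python
def ltv(arpa_monthly: float, gross_margin: float, monthly_churn: float) -> float:
    """Simple lifetime value: monthly gross profit per account over monthly churn."""
    return arpa_monthly * gross_margin / monthly_churn


def cac_payback_months(cac: float, arpa_monthly: float, gross_margin: float) -> float:
    """Months of gross profit needed to recover the acquisition cost."""
    return cac / (arpa_monthly * gross_margin)


# Illustrative inputs only.
ARPA, MARGIN, CHURN = 250.0, 0.8, 0.03
cac_by_channel = {"outbound": 2_000.0, "paid": 3_200.0, "organic": 600.0}

lifetime_value = ltv(ARPA, MARGIN, CHURN)
report = {
    channel: {
        "ltv_to_cac": lifetime_value / cac,
        "payback_months": cac_payback_months(cac, ARPA, MARGIN),
    }
    for channel, cac in cac_by_channel.items()
}
for channel, row in report.items():
    print(f"{channel}: LTV/CAC {row['ltv_to_cac']:.1f}, "
          f"payback {row['payback_months']:.1f} months")
```

Scenario modeling then amounts to re-running the same functions with different churn, margin, or CAC assumptions.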

Pricing analysis

One-off pricing rework. Value-metric review, WTP work, plan structure. Run when something genuinely shifts.

Survey + AI

core·8

Strategic check-ins

Light AI-assisted judgment. Feed context to deep research, read the answer right away.

Strategic fit check

Four Fits scored + bottleneck identification. Subsumes priority diagnostic and expansion review.

Deep research

core·8

Capital & stage fit

Burn, runway, spend allocation vs stage norms. Under-investing, over-investing, or should we raise?

Deep research prompt

useful·6

Considered and rejected

Modules considered but removed from the active system. Kept visible with reasoning so it's clear what the system explicitly chose against — not just what it does.

Win/loss program

Structured post-decision interviews with buyers who chose you and prospects who didn't. PMM staple.

Post-decision interviews

situational·4

RevOps

Pipeline stages, MQL/SQL/PQL, attribution, CRM hygiene, marketing-sales SLAs. Series A+ territory.

Pipeline ops

situational·4

Objection library

Curated buyer objections paired with tested rebuttals. A database of doubts and counter-pitches.

Objection database

noise·3

Public presence

Inventory of every place the company exists outside its own website — G2, Capterra, Product Hunt, marketplaces.

Listing inventory

noise·2

Integration catalog

Every integration the product supports — docs, marketplace URLs, owners, status. Owned by product or DevRel.

Integration database

noise·2

Pricing & packaging ops

Continuous pricing capability — value-metric reviews, packaging tests, grandfathering, billing infrastructure.

Continuous pricing

situational·4

Partnerships

Integration partners, co-marketing, reseller, affiliate programs. Distribution through other companies.

Co-marketing + affiliates

situational·4

Market sizing (TAM / SAM / SOM)

Top-down market sizing. Pitch-deck artifact, not an operating tool.

TAM / SAM / SOM

noise·1

Sources

This system was seeded by Jan's personal operating manual and known frameworks, then reworked into AI-native form with help from 6 deep research reports synthesizing roughly 3,650 sources between them, plus another 280 manually added or web-scraped articles, podcasts, and reports. Some are surfaced below.

Deep research reports

System v05 review — The 42-module audit: what survives contact with a small B2B SaaS team. 705 sources, 29 cited inline.
System v07 review — Pushing back on the AI-native marketing system. 647 sources, 32 cited inline.
The AI Reckoning in B2B SaaS Marketing — Where AI is and isn't replacing marketing functions, with operator and labor-market data. 533 sources, 38 cited inline.
AI Creative Tooling for Small B2B SaaS Teams in 2026 — End-to-end stack: image and video models, platforms, pricing, conversion economics. 515 sources, 69 cited inline.
Customer interview & positioning — Methodology for JTBD interviews and positioning sprints (FletchPMM, Forget the Funnel, Dunford, DemandMaven, Moesta). 416 sources, 11 cited inline.
AI failure modes for solo operators — Sycophancy, hallucination, context rot, prompt injection, agent fiascos, bot liability, privacy, cost runaways, homogenization. 852 sources, 38 cited inline.