Build a GTM Agent From Scratch in 7 Steps

To build a GTM agent from scratch you need three parts: a runtime that hosts the model and its tool calls, one skill file written in plain markdown that states the job and its guardrails, and a small set of MCP connections to your data and your systems of record. The step almost everyone botches is grading the output on real leads before trusting it.

That is the whole build, and the rest of this piece walks through each part in order. The runtime sits at the center of an agentic GTM operating system, the wider frame this build slots into, and both sit under AI-native GTM engineering as the discipline. Here we go one level deeper, from a blank file to something that scores a lead and routes it.

What a GTM agent actually is

The marketing version of a GTM agent is a black box that books meetings while you sleep. The operator version is more exact. A GTM agent is a runtime plus a skill plus a handful of MCP connections. The runtime hosts the model and runs the tool calls. The skill is a markdown file that tells the model what job it does and what rules apply. The MCPs are the data and action endpoints the agent reaches for during the job.

MCP is the load-bearing standard underneath all of this. Anthropic introduced the Model Context Protocol on November 25, 2024 as an open standard for connecting models to the systems where data lives, and in December 2025 donated it to the Agentic AI Foundation, a directed fund under the Linux Foundation that Anthropic co-founded with Block and OpenAI. That governance move matters for a builder. The connectors you wire today are not a single vendor's proprietary format, so the agent you write is portable across runtimes that speak the protocol.

The non-obvious rule here is the bar you set for "agent." It is not autonomous reasoning. It is reliable orchestration of a workflow you could already run by hand. If you cannot read the prompt, you do not own the agent; you rent a demo. Three properties separate a real one from a vendor pitch: it reads from live data sources rather than a static export, it writes back to the systems your team already runs on, and every prompt and connection lives in code you can open and rewrite. If you have read the AI SDR tools field map, this is the layer underneath every category on it.

Which runtime should I use to build a GTM agent?

There are three credible runtimes in 2026, and they solve different problems. The default for a first agent is Claude Code, and the decision rule is to climb the ladder only when the rung you are on stops fitting.

Runtime	Best for	Where it runs	Cost shape
Claude Code	First agent, operator who thinks in workflows	Your laptop	Flat plan, Pro $20/mo or Max $100 to $200/mo
Anthropic Agent SDK	Same logic as a hosted service	Your server	Per-token API plus hosting
Custom loop	Full control over scheduling and state	Anywhere you build it	Per-token API plus your own infrastructure

Claude Code is the fastest path. The runtime sits on your laptop, talks to MCPs over the standard protocol, and runs skills written in plain markdown, with tool calls, file access, and the model in one process. The pricing is a flat subscription rather than per-token billing, $20 a month for Pro and $100 to $200 for Max as of June 2026, which removes the per-run cost anxiety while you are still iterating. Most of the workflows in the playbook on using Claude Code for sales ship as a single skill in this exact runtime, and the demand side gets the same treatment in Claude Code for marketing teams.

Anthropic's Agent SDK (published as the Claude Agent SDK) is the right pick when you outgrow the laptop and need the agent to run as a service. It is the same harness that powers Claude Code, exposed as a library, and it ships with the built-in tool loop, optional human checkpoints, subagents, persistent sessions, and a first-class MCP client. You wrap the same logic in code, host it on a server, and call it from a cron, a webhook, or another agent. The cost moves to per-token API billing, so it is worth knowing the rate before you schedule anything heavy: Opus runs $5 per million input tokens and $25 per million output, Sonnet $3 and $15. The benefit is reach. The cost is more infrastructure and a slower iteration loop.
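As a rough worked example of the per-token rates quoted above, the sketch below estimates what one scheduled batch would cost. The rates come from the text; the workload (200 leads, 8,000 input and 500 output tokens per lead) and the helper name `estimate_batch_cost` are illustrative assumptions, not measurements.

```python
# Rates in USD per million tokens, as quoted in the text above.
RATES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def estimate_batch_cost(
    model: str,
    leads: int,
    input_tokens_per_lead: int,
    output_tokens_per_lead: int,
) -> float:
    """Estimate the USD cost of one batch (hypothetical helper)."""
    rate = RATES[model]
    input_cost = leads * input_tokens_per_lead / 1_000_000 * rate["input"]
    output_cost = leads * output_tokens_per_lead / 1_000_000 * rate["output"]
    return round(input_cost + output_cost, 4)


# Assumed workload: 200 leads, 8,000 input and 500 output tokens each.
print(estimate_batch_cost("opus", 200, 8_000, 500))    # 10.5
print(estimate_batch_cost("sonnet", 200, 8_000, 500))  # 6.3
```

Under these assumptions input tokens dominate the bill, so trimming what each tool call returns into context is likely to save more than shortening the rationale.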

A custom runtime means you write your own loop around the API: read input, build the prompt, call tools, manage retries, log output. The reason to do it is total control over scheduling, observability, and state. The reason to avoid it is everything else. Most operators never need to go here, and jumping to it early is the most expensive mistake on this list.
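To make the scope of a custom loop concrete, here is a minimal sketch of the five jobs named above: read input, build the prompt, call tools, manage retries, log output. The model is a stand-in function, not a real API client, and every name here (`run_agent`, `call_with_retries`, `fake_model`, `lookup_company`, the reply shape) is a hypothetical chosen for illustration.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gtm_loop")


def call_with_retries(fn: Callable[[], dict], attempts: int = 3, delay_s: float = 0.0) -> dict:
    """Retry a flaky call a fixed number of times, then re-raise."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == attempts:
                raise
            time.sleep(delay_s)
    raise RuntimeError("attempts must be at least 1")


def run_agent(lead: dict, model: Callable[[str], dict], tools: dict, max_turns: int = 5) -> str:
    """Build the prompt, ask the model, run any requested tool, repeat."""
    prompt = f"Qualify this lead: {lead}"
    for _ in range(max_turns):
        reply = call_with_retries(lambda: model(prompt))
        if reply["type"] == "final":
            log.info("final answer: %s", reply["text"])
            return reply["text"]
        # The model asked for a tool: run it and feed the result back in.
        result = call_with_retries(lambda: tools[reply["tool"]](**reply["args"]))
        prompt += f"\nTool {reply['tool']} returned: {result}"
    raise RuntimeError("max_turns reached without a final answer")


def fake_model(prompt: str) -> dict:
    """Stand-in for a real model call: one lookup, then a final answer."""
    if "returned" not in prompt:
        return {"type": "tool", "tool": "lookup_company", "args": {"domain": "example.com"}}
    return {"type": "final", "text": "score=4"}


tools = {"lookup_company": lambda domain: {"domain": domain, "headcount": 120}}
print(run_agent({"email": "a@example.com"}, fake_model, tools))
```

Even this toy version leaves out scheduling, state, and observability, which is most of what the managed runtimes give you for free.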

How do I write a GTM agent as a markdown skill?

A skill is the contract between the operator and the agent. It is one markdown file that states the job, the inputs, the steps, the guardrails, and the outputs. No graph, no canvas, no clickable UI.

A working GTM skill carries six sections.

Purpose

One sentence on what the agent does. If you cannot write it in one sentence, the agent is doing two jobs and should be two skills.

Inputs

What the agent expects, named exactly. A company domain, a person email, a list of leads in a known shape.

Steps

The workflow in plain English. Look up the company, enrich the people, score the fit, write the brief. Each step maps to one decision a human would make.

Tool calls

The MCPs and the specific calls the agent is allowed to make. This is also where you ban calls, which is how you cap cost and blast radius.

Guardrails

Cost ceilings, rate limits, and the actions the agent must never take without a human.

Outputs

What the agent returns and where it writes.

The reason markdown beats a workflow canvas is that markdown compounds. Every run teaches you something: an edge case missed, a phrasing that backfires, a tool call that times out. You open the file, fix the line, save, and the next run hits the sharper version. A node graph forces a redeploy for the same change, and a vendor UI usually will not let you read the underlying instructions at all. Plain text in a repo also means anyone on the team can diff it, which is the difference between an asset the team owns and a black box one person babysits.
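Because the skill is plain text, the six-section contract is easy to check mechanically before a run. The sketch below is one way to do that, assuming the skill file marks each section with a markdown `#` heading; the function name `missing_sections` and the sample draft are illustrative, not part of any runtime.

```python
import re

REQUIRED_SECTIONS = ["Purpose", "Inputs", "Steps", "Tool calls", "Guardrails", "Outputs"]


def missing_sections(skill_text: str) -> list[str]:
    """Return the required sections that have no matching markdown heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", skill_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# A deliberately incomplete draft skill (assumed layout).
draft = """# Lead qualifier
## Purpose
Score inbound leads against the ICP and route them.
## Inputs
A list of leads, each with an email and a company domain.
## Steps
Look up the company, enrich the person, score the fit, write the brief.
## Guardrails
Never write to the CRM without a passing eval.
"""

print(missing_sections(draft))  # ['Tool calls', 'Outputs']
```

A check like this fits naturally in a pre-commit hook, so a skill with no tool allowlist or no declared output never reaches a run.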

Which data MCPs should the agent use?

A GTM agent without data is a chatbot. The first wiring step is the read layer, and the rule is to wire the data MCPs your workflow actually needs and stop there.

Crustdata is the workhorse for people and company intelligence: search, enrichment, and signals through one API. The MCP exposes those as named tools, so the skill reads like a decision tree. If the lead has no LinkedIn URL, search people by email. If headcount is missing, enrich the company by domain. Auth, rate limits, and retries sit inside the MCP, not in your prompt.

FullEnrich is the waterfall layer for email and phone. When the person resolves but the email is blank, the agent calls FullEnrich to fill the gap and gets back contact data with confidence scores. The skill decides what to do below a confidence threshold, whether that is skip, queue for review, or retry. Hard-coding that threshold in the skill is what keeps a bad email from reaching your sending domain.
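The threshold rule can be sketched as a small pure function. This assumes the confidence arrives as a number between 0 and 1 and covers two of the three options above, skip and queue for review. The cutoffs, the field shape, and the name `email_action` are all assumptions for illustration, not FullEnrich's actual response format.

```python
from typing import Optional

MIN_CONFIDENCE = 0.80  # assumed cutoff for sending; tune per sending domain
REVIEW_FLOOR = 0.50    # assumed floor; below this the email is dropped


def email_action(email: Optional[str], confidence: float) -> str:
    """Decide what to do with an enriched email (hypothetical rule)."""
    if not email or confidence < REVIEW_FLOOR:
        return "skip"
    if confidence < MIN_CONFIDENCE:
        return "queue_for_review"
    return "send"


print(email_action("jane@example.com", 0.92))
print(email_action("jane@example.com", 0.65))
print(email_action(None, 0.99))
```

Keeping the two numbers as named constants mirrors the point above: the threshold is written down where anyone can read and change it, not left to the model.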

PredictLeads supplies the signal layer: hiring posts, job openings, technographic shifts. The agent watches the feed for triggers that match the ICP defined in the skill, then routes a fresh lead into qualification. That is signal-first outbound at the agent level, and the protocol mechanics behind wiring any of these get a full treatment in the explainer on MCPs for sales teams.

Which action MCPs let the agent write back?

Data MCPs read. Action MCPs write, and an agent earns its keep the moment it writes to the systems your team already runs on. Two data MCPs and two action MCPs are plenty for a first agent.

Unipile is the LinkedIn and email action layer. The agent sends an invite, drops a message, or schedules a follow-up through calls that respect per-account daily caps. If the agent sends cold email at any volume, this is also where the deliverability rules bite. Since February 2024, Google and Yahoo require bulk senders above 5,000 messages a day to authenticate with SPF and DKIM, publish a DMARC record, offer one-click unsubscribe honored within two days, and keep the spam complaint rate under 0.3%. Write those constraints into the guardrails so the agent never volume-sends past a clean domain's limits.
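Those constraints translate into a pre-send check the agent runs before every batch. The sketch below uses the 0.3% complaint ceiling from the text; the per-account daily cap, the counters, and the function name `can_send` are assumptions, and in practice the numbers would come from your sending platform's own reporting.

```python
MAX_SPAM_COMPLAINT_RATE = 0.003  # 0.3%, from the bulk sender rules above


def can_send(
    sent_today: int,
    daily_cap: int,
    complaints: int,
    delivered: int,
    domain_authenticated: bool,
) -> tuple[bool, str]:
    """Return (allowed, reason) for the next send (hypothetical guard)."""
    if not domain_authenticated:
        return False, "domain is missing SPF, DKIM, or DMARC"
    if sent_today >= daily_cap:
        return False, "daily cap reached"
    if delivered > 0 and complaints / delivered >= MAX_SPAM_COMPLAINT_RATE:
        return False, "spam complaint rate at or above 0.3%"
    return True, "ok"


print(can_send(sent_today=10, daily_cap=50, complaints=0, delivered=1_000, domain_authenticated=True))
print(can_send(sent_today=10, daily_cap=50, complaints=4, delivered=1_000, domain_authenticated=True))
```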

HubSpot is the CRM write target. When the agent qualifies a lead it writes the score, the rationale, and the next action straight onto the contact record. No CSV export, no Zap, no second system of truth.

Notion is where the agent writes its own state: run logs, enrichment caches, signal history. The MCP gives the agent a structured store it can read on the next run, which is how middle-mile work compounds instead of restarting cold every time.

Slack is the human-in-the-loop layer. When the agent finishes a batch or hits an edge case it cannot resolve, it posts to the right channel with context and a proposed next action. The operator replies, and the agent reads that reply on the following run. The non-obvious point is that the Slack message is not a notification, it is an input, which is what keeps a human in control of the actions that carry risk.

What guardrails does a GTM agent need before shipping?

Skipping guardrails is the fastest way to lose the team's trust in a new agent. Three are mandatory, and the eval is the one that gets cut and the one that matters most.

Cost guardrails

Set a hard ceiling on tokens and tool calls per run. If the skill loops or the model gets stuck, the agent halts before the bill ships. Cost runaway almost always happens on the first production run, not the hundredth, so the cap has to be in place before the first real batch.
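A hard ceiling can be as small as a counter that refuses the next charge once either limit would be crossed. The sketch below is a minimal version under assumed limits; the class names `RunBudget` and `BudgetExceeded` are hypothetical, and a real runtime would feed it actual token counts from each model response.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would cross its token or tool-call ceiling."""


class RunBudget:
    def __init__(self, max_tokens: int, max_tool_calls: int) -> None:
        self.max_tokens = max_tokens
        self.max_tool_calls = max_tool_calls
        self.tokens = 0
        self.tool_calls = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Reserve budget before doing the work; raise if it would overshoot."""
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token ceiling reached")
        if self.tool_calls + tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call ceiling reached")
        self.tokens += tokens
        self.tool_calls += tool_calls


budget = RunBudget(max_tokens=10_000, max_tool_calls=3)
try:
    while True:  # simulates a skill stuck in a loop
        budget.charge(tokens=1_500, tool_calls=1)
except BudgetExceeded as exc:
    print(f"halted: {exc}")
```

Charging before the call rather than after means the run stops at the ceiling instead of one step past it.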

Rate guardrails

Every data MCP has rate limits and every action MCP has per-account caps. Unipile applies different daily caps by LinkedIn account type. Crustdata has per-minute and per-day ceilings. Write the limits into the skill so the agent never runs hot and never trips an account suspension you cannot undo.
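One common way to enforce a per-minute or per-day ceiling on the agent's side is a sliding-window limiter. The sketch below is a generic implementation, not any vendor's API; the limit values are placeholders, and the real numbers should come from each provider's documentation for your plan and account type.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_s-second window."""

    def __init__(self, max_calls: int, window_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


# Placeholder limit: 2 calls per 60 seconds, driven by a fake clock.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, window_s=60, clock=lambda: fake_now[0])
print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
fake_now[0] = 61.0
print(limiter.allow())  # True
```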

Eval guardrails

Before the agent writes to any production system, run it on a real sample and grade the output. Pull 30 leads, score them by hand, score them with the agent, and compare. If agreement is below 80 percent, the prompt is wrong, not the leads. This is the discipline that makes the operator playbook for B2B lead generation repeatable, and it is the single step that separates an agent you trust from one you shelve after week one.
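The comparison itself is a few lines. The sketch below treats agreement as exact score match between the hand grade and the agent grade, which is an assumption; you may prefer to count a one-point difference as agreement. The sample scores are made up to show the arithmetic on 30 leads.

```python
def agreement_rate(human: list[int], agent: list[int]) -> float:
    """Share of leads where the agent's score equals the human's score."""
    if not human or len(human) != len(agent):
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(h == a for h, a in zip(human, agent))
    return matches / len(human)


def passes_eval(human: list[int], agent: list[int], threshold: float = 0.80) -> bool:
    """True when agreement meets the 80 percent bar from the text."""
    return agreement_rate(human, agent) >= threshold


# 30 hand-scored leads (illustrative data); the agent disagrees on the first 5.
human = [1, 2, 3, 4, 5] * 6
agent = list(human)
for i in range(5):
    agent[i] = human[i] % 5 + 1  # shift each score so it differs

print(f"{agreement_rate(human, agent):.1%}")  # 83.3%
print(passes_eval(human, agent))              # True
```

Listing the disagreeing leads next to both scores is the natural next step, since those rows are exactly the edge cases the prompt needs to handle.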

How do I build a working lead qualifier?

The simplest agent that earns its keep is a qualifier. Input is a list of inbound leads, each with an email and a company domain. Output is a score from 1 to 5, a one-paragraph rationale, and a routed action.

The skill reads as a sequence. For each lead, look up the company through Crustdata and pull industry, headcount, funding stage, and recent signals. Look up the person and pull title, seniority, tenure, and recent role moves. If the email is empty, fill it through FullEnrich. Match the company shape against the ICP in the skill and the person against the buyer persona in the skill. Score 1 to 5 and write the rationale. If the score is 4 or 5, post a Slack message to the AE channel with the brief and propose a call. If it is 1 or 2, mark the contact disqualified in HubSpot. Anything in the middle goes to a nurture list in Notion.

The judgment that makes this work is keeping the routing thresholds in the skill rather than in the model's head. The model scores, but the skill decides what a 3 does, because that boundary is a business rule you will tune weekly and the model should never improvise it. The skill itself runs to roughly 150 lines of markdown, the MCP wiring is a few lines of config, and the value is that every override the operator makes gets written back into the rationale logic, so run fifty is sharper than run one.
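The routing rule described above can be pinned down as a lookup table that lives outside the model. In this sketch the destination names are placeholder labels, not real MCP tool names, and `route` is a hypothetical helper.

```python
# Score-to-action table, mirroring the thresholds described above.
ROUTES = {
    5: "slack_ae_channel",
    4: "slack_ae_channel",
    3: "notion_nurture_list",
    2: "hubspot_disqualified",
    1: "hubspot_disqualified",
}


def route(score: int) -> str:
    """Map a 1 to 5 score to its routed action; reject anything else."""
    if score not in ROUTES:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return ROUTES[score]


for score in (5, 3, 1):
    print(score, "->", route(score))
```

Changing what a 3 does is then a one-line edit to the table, and an out-of-range score from the model fails loudly instead of being routed by guesswork.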

How do I move from a local skill to scheduled production?

A skill that runs on demand from your laptop is already useful. A skill that runs on a schedule, ingests fresh signals, and writes back to your CRM is a system. The path there is four gradual steps, not a rewrite.

First, run the skill manually for two weeks until you trust the output. Watch every run, override anything that looks off, and edit the skill after every override. Second, point the skill at a live input, a PredictLeads feed or a HubSpot view, and let it process new leads as they appear, still on a manual trigger. Third, schedule it. The Agent SDK lets you wrap the same skill behind an endpoint that runs on a cron, and the skill itself does not change, only the runtime around it. Fourth, add observability by logging every run, tool call, and override into Notion, which is how you answer the finance questions that arrive in month two: what does it cost, how many leads does it qualify, and where does it disagree with the human?
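Those three finance questions map onto three numbers you can compute from the run log. The sketch below uses an in-memory list as a stand-in for the Notion store; the record fields, the "score of 4 or higher counts as qualified" rule, and the names `RunRecord` and `summarize` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    lead_id: str
    cost_usd: float
    agent_score: int
    human_override: Optional[int] = None  # set only when the operator changed the score


def summarize(records: list[RunRecord]) -> dict:
    """Answer: what did it cost, how many qualified, how often was it overridden."""
    overridden = [
        r for r in records
        if r.human_override is not None and r.human_override != r.agent_score
    ]
    return {
        "total_cost_usd": round(sum(r.cost_usd for r in records), 2),
        "leads_qualified": sum(1 for r in records if r.agent_score >= 4),
        "override_rate": len(overridden) / len(records) if records else 0.0,
    }


log_rows = [
    RunRecord("lead-1", 0.05, agent_score=5),
    RunRecord("lead-2", 0.04, agent_score=3, human_override=4),
    RunRecord("lead-3", 0.06, agent_score=2),
]
print(summarize(log_rows))
```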

The pattern across all four steps is the same. The skill stays in markdown. The MCPs stay decoupled. The runtime grows around the skill, never on top of it.

Common mistakes when shipping the first agent

First agents fail in predictable ways, and naming them is cheaper than learning them live.

Skipping the eval is the most common. The first five outputs look right, the team ships, and by output fifty the failure mode is obvious and embarrassing. Grade a real sample first, every time.

Wiring too many MCPs is next. A qualifier does not need eight tools. Two data MCPs and two action MCPs cover it, and you add more only when the workflow demands it.

Hiding the prompt turns the agent into a black box that one person owns. Keep it in markdown, in a repo, where anyone can read it and propose a change.

Running with no guardrails means the first run goes hot, the bill shocks the team, and the agent gets shelved. Cost and rate limits are not optional.

Optimizing the runtime before the skill works is the most expensive of all. Operators jump from Claude Code to a custom framework before the skill has proven itself. The skill is the asset and the runtime is the host, so iterate the skill until it converges, then move the runtime.

What to do this week

Pick one workflow you would run by hand. Open a fresh markdown file and write the Purpose, Inputs, Steps, Tool calls, Guardrails, and Outputs sections. Choose Claude Code as the runtime for a first agent. Wire two data MCPs and two action MCPs. Add a cost cap, a rate cap, and an eval set of 30 real cases. Run the skill manually until the output is right, then schedule it.

The shortcut is to start from a skill that already runs and rewrite it for your playbook. The Unipile campaign skill is the closest reference for a LinkedIn outreach agent. Clone it, change the inputs, change the rationale prompt, change the routing logic, and ship in a week. That is what it takes to build a GTM agent in 2026. Not a graph of nodes, not a vendor canvas. One markdown skill, four MCP connections, and a runtime that runs on the laptop you already have.

Frequently Asked Questions

What is a GTM agent?

A GTM agent is a runtime that hosts an AI model, a skill file that defines the job in plain language, and a set of MCP connections to your data and your systems of record. It automates a go-to-market workflow you could already run by hand, such as qualifying inbound leads or sending outreach, and writes the results back to tools like your CRM. The bar is reliable orchestration, not autonomous reasoning.

Do I need to code to build a GTM agent?

Not for the first one. A skill is a single markdown file written in plain English, and a runtime like Claude Code runs it on your laptop with the MCP wiring handled in a few lines of config. You move to writing code only when you outgrow the laptop and need the agent to run as a hosted service on a schedule, which is what the Anthropic Agent SDK is for.

What does it cost to run a GTM agent?

It depends on the runtime. Claude Code is a flat subscription, $20 a month for Pro and $100 to $200 for Max as of June 2026, which removes per-run cost while you iterate. Running through the Agent SDK shifts to per-token API billing, roughly $5 per million input tokens and $25 per million output on Opus, so a hard cost cap per run is the first guardrail you set.

Why use MCP instead of building custom integrations?

The Model Context Protocol is an open standard Anthropic introduced in November 2024 and donated to the Linux Foundation's Agentic AI Foundation in December 2025, with adoption from OpenAI and Google. Wiring a connector once through MCP means it works across any runtime that speaks the protocol, so your agent is portable rather than locked to one vendor's proprietary integration format.

How long does it take to ship a first GTM agent?

About a week if you start from an existing skill and rewrite it for your playbook rather than building from a blank file. Budget two weeks of manual runs after that to grade output and tune the skill before you schedule it, because the trust comes from the eval, not the code.
