
Firecrawl MCP and the Yalc Framework
The web data layer for every Yalc workflow that pulls from outside our database stack. JS rendering, anti-bot handling, and structured extraction are all available through Claude tool calls.
Add Firecrawl to Claude Code in one command
claude mcp add firecrawl --env FIRECRAWL_API_KEY=fc-xxx -- npx -y firecrawl-mcp
Get a free API key at firecrawl.dev (the free tier covers 500 pages). Replace `fc-xxx` with your key, run the command, and restart Claude Code. The hosted version is the default. Self-host the open source version if your data sensitivity requires it.
Firecrawl, plainly
The Firecrawl MCP is the official `firecrawl-mcp` package from Mendable AI. It exposes 8 web data verbs as native Claude tools: scrape, batch_scrape, map, search, crawl, extract, interact, and agent. Output is markdown by default, optimized for LLM context windows, with optional structured JSON via schemas.
For Yalc workflows, Firecrawl is the canonical web intake layer. When a prompt says "look up this competitor's pricing", "extract structured data from these 30 vendor pages", "monitor changes on a product changelog", or "search the web for fintech news", Firecrawl handles the wire. Yalc decides what to fetch and what to do with the result.
Position in the GTM operating system
The Firecrawl MCP sits at the **intake** node for any web-sourced data. It complements Crustdata: Crustdata for structured B2B databases, Firecrawl for everything else (vendor sites, blogs, changelogs, product pages, news).
JS rendering, smart wait, anti-bot handling, and caching all happen inside Firecrawl. Yalc workflows treat each Firecrawl call as a black box that returns clean markdown or structured JSON.
Deploying the Firecrawl MCP inside Yalc workflows
Workflow position
The web intake node. Yalc invokes Firecrawl when the answer lives on a public page rather than in a database. Output flows downstream into Notion, Claude analysis, or comparison reports.
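For the "comparison reports" end of that flow, here is a minimal sketch of a downstream step that compares two markdown snapshots of the same page, for example a changelog scraped on two different days. It uses only Python's standard `difflib`; the function name `changelog_diff` and the sample snapshots are hypothetical and not part of Firecrawl or Yalc.

```python
import difflib


def changelog_diff(previous_md: str, current_md: str) -> list[str]:
    """Return added ("+ ") and removed ("- ") lines between two snapshots."""
    diff = difflib.ndiff(previous_md.splitlines(), current_md.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical snapshots of a product changelog, as returned markdown.
previous = "# Changelog\n\n## 1.2\n- Faster exports"
current = "# Changelog\n\n## 1.3\n- SSO support\n\n## 1.2\n- Faster exports"

changes = changelog_diff(previous, current)
for line in changes:
    print(line)
```

An empty result means the page did not change, so the workflow can skip the Notion update or the Claude analysis step for that run.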
Prompt patterns
Copy paste prompts for Claude Code that invoke the Firecrawl MCP.
Chaining recommendations
Anti patterns to avoid
Compatibility
Works in Claude Code (primary), Claude Desktop, Cursor, Codex, and any MCP-compatible client. Open source on GitHub (mendableai/firecrawl-mcp-server). Self-host option available if data sensitivity prohibits third-party scraping.
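For clients configured through a JSON file rather than the `claude mcp add` command, the same launch parameters can usually be expressed as an `mcpServers` entry. The sketch below only builds and prints that entry; the file it belongs in, and whether a given client uses exactly this key layout, are assumptions to check against that client's documentation.

```python
import json
import os


def firecrawl_mcp_config(api_key: str) -> dict:
    """Build an `mcpServers` entry that launches firecrawl-mcp through npx."""
    return {
        "mcpServers": {
            "firecrawl": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": {"FIRECRAWL_API_KEY": api_key},
            }
        }
    }


# Read the key from the environment; "fc-xxx" is the same placeholder as above.
config = firecrawl_mcp_config(os.environ.get("FIRECRAWL_API_KEY", "fc-xxx"))
print(json.dumps(config, indent=2))
```

The command, arguments, and environment variable mirror the install command one to one, so switching clients does not change how the server is started.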
Pros, cons, who it's for
Pros
- 500 page free tier. Real workflows ship without paying.
- Open source. 100k+ GitHub stars. Active maintenance.
- 8 verbs cover the full web data surface (scrape, batch_scrape, map, search, crawl, extract, interact, agent).
- JS rendering, smart wait, anti-bot all handled. No tuning needed.
- Markdown output by default. LLM context-window optimized.
Cons
- Sites with aggressive anti-bot (Cloudflare strict, Datadome) still occasionally fail.
- Crawl verb is expensive. Easy to burn budget if not careful.
- JSON schema extraction works most of the time. Complex schemas need iteration.
- Self-hosting means maintaining browsers, queues, and scaling. Real work.
Who it's for
- GTM engineers building agentic research workflows
- Operators who need pricing, changelog, and news scraping in regular workflows
- Data teams piloting LLM-driven web data ingestion at small to mid scale
The Firecrawl ecosystem inside Yalc
MCPs to consider instead
Frequently asked
How does the MCP compare to Firecrawl's REST API?
Functionally equivalent results. The MCP is more convenient inside Claude Code because the verbs become native tool calls Claude composes during conversation. The REST API is better for headless cron jobs and batch pipelines.
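For the headless case, a cron script can call the REST API directly. The sketch below uses only the standard library and assumes the hosted v1 scrape endpoint, a bearer-token header, and a response with the markdown under `data.markdown`; verify all three against the current Firecrawl API reference before relying on them. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: hosted v1 scrape endpoint; check the current API reference.
SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST request asking for the page as markdown."""
    payload = {"url": page_url, "formats": ["markdown"]}
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(page_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the markdown body (performs network I/O)."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response shape is {"success": ..., "data": {"markdown": ...}}.
    return body["data"]["markdown"]
```

A scheduled job would call `scrape_markdown(url, os.environ["FIRECRAWL_API_KEY"])` and write the result wherever the pipeline expects it. Building the request separately from sending it keeps the payload easy to inspect and test.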
Can I scrape JavaScript-heavy sites?
Yes. Firecrawl renders JS by default with smart wait. SPAs and React apps work without manual configuration.
Does the MCP work for LinkedIn or Reddit?
For LinkedIn, no. LinkedIn aggressively blocks general scrapers. Use the Unipile MCP instead. For Reddit, technically yes, but Apify's Reddit actors are battle-tested for production volume.
How does the free tier behave inside the MCP?
Same 500-page allotment as the REST API. The MCP returns the same rate limit errors when you exceed it. Plan accordingly with cache hints and selective verbs.
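One way to plan for the allotment in scripted workflows is a client-side guard that counts pages and serves repeat URLs from a local cache. The sketch below is a local technique, not Firecrawl's own caching or credit accounting, and it assumes one fetched URL costs one page, which is a simplification. `PageBudget` and the `fetcher` callback are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

FREE_TIER_PAGES = 500  # allotment quoted above


class PageBudget:
    """Counts pages fetched and serves repeats from an on-disk cache."""

    def __init__(self, cache_dir: Path, limit: int = FREE_TIER_PAGES) -> None:
        self.cache_dir = cache_dir
        self.limit = limit
        self.used = 0
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.md"

    def fetch(self, url: str, fetcher: Callable[[str], str]) -> str:
        cached = self._path(url)
        if cached.exists():
            return cached.read_text(encoding="utf-8")  # cache hit: no page spent
        if self.used >= self.limit:
            raise RuntimeError(f"page budget of {self.limit} exhausted")
        markdown = fetcher(url)  # the only place a page is actually spent
        self.used += 1
        cached.write_text(markdown, encoding="utf-8")
        return markdown


# Usage with a stand-in fetcher; a real one would call Firecrawl.
with tempfile.TemporaryDirectory() as tmp:
    budget = PageBudget(Path(tmp), limit=1)
    first = budget.fetch("https://example.com/pricing", lambda url: "# Pricing")
    again = budget.fetch("https://example.com/pricing", lambda url: "# Changed")
    pages_used = budget.used
```

The second call returns the cached copy, so only one page is counted. A cache like this never expires on its own; pages that change often, such as changelogs, need an explicit refresh policy.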
How do I extract structured data from a page?
Pass a JSON schema to the scrape or extract verb. Firecrawl runs the page through an LLM with the schema and returns structured JSON. Works most of the time. Complex schemas may need a few iterations.
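As an illustration, here is a hypothetical schema for a vendor pricing page, plus a small check for required fields the extraction left out, which is usually the first sign a schema needs another iteration. The schema, the field names, and the `missing_required` helper are assumptions for this sketch; how the schema is passed to the verb depends on the Firecrawl version, so consult the tool's parameter documentation.

```python
# Hypothetical schema for a vendor pricing page.
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "monthly_price_usd": {"type": "number"},
                },
                "required": ["name", "monthly_price_usd"],
            },
        },
    },
    "required": ["vendor", "plans"],
}


def missing_required(data: dict, schema: dict, path: str = "") -> list[str]:
    """List required keys absent from `data`, walking nested objects and arrays."""
    problems = [f"{path}{key}" for key in schema.get("required", []) if key not in data]
    for key, sub in schema.get("properties", {}).items():
        value = data.get(key)
        if sub.get("type") == "object" and isinstance(value, dict):
            problems += missing_required(value, sub, f"{path}{key}.")
        elif sub.get("type") == "array" and isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    problems += missing_required(
                        item, sub.get("items", {}), f"{path}{key}[{index}]."
                    )
    return problems


# A made-up extraction result where the model skipped a price.
sample = {"vendor": "Acme", "plans": [{"name": "Pro"}]}
print(missing_required(sample, PRICING_SCHEMA))
# → ['plans[0].monthly_price_usd']
```

This checks presence only, not value types. A full JSON Schema validator is the better tool once the schema stabilizes.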
Is the open source version the same as hosted?
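Same core engine. The `firecrawl-mcp` server is MIT licensed; the core Firecrawl repository is primarily AGPL-3.0, so check its license before you self-host. Self-hosting means you manage the infra (browsers, queues, scaling). Hosted is the convenient option for most teams.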
Install the Firecrawl MCP
Drop it into Claude Code and orchestrate from your next Yalc prompt.
claude mcp add firecrawl --env FIRECRAWL_API_KEY=fc-xxx -- npx -y firecrawl-mcp