feedstock

Why Feedstock

How feedstock compares to crawl4ai and firecrawl for web crawling and scraping.

A technical comparison of three popular web crawling tools for building AI-powered data pipelines.

At a Glance

| | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Language | TypeScript (Bun) | Python (asyncio) | TypeScript (Node.js) |
| License | Apache-2.0 | Apache-2.0 | AGPL-3.0 |
| Model | Library | Library + CLI | Hosted API + self-host |
| Dependencies | 3 (playwright, cheerio, turndown) | Heavy (playwright, litellm, etc.) | Redis, PostgreSQL, Playwright |
| Install | bun add feedstock | pip install crawl4ai | Sign up for API key or Docker Compose |

Runtime and Architecture

Feedstock runs on Bun with zero infrastructure requirements. Import it, call crawl(), get results. The fetch-first engine tries a lightweight HTTP request before launching a browser, auto-escalating only when it detects an SPA shell or anti-bot block. Static pages never spin up Chromium.
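
The fetch-first decision described above can be sketched as a pure function over the result of a plain HTTP request. This is only an illustration: the status codes, the mount-point ids, the 50-character threshold, and the names `needsBrowser` and `looksLikeSpaShell` are assumptions, not feedstock's actual detection logic.

```typescript
// Hypothetical heuristics for illustration; not feedstock's real implementation.
type FetchOutcome = { status: number; html: string };

// Status codes commonly returned by anti-bot layers (assumed list).
const BLOCK_STATUSES = new Set([403, 429, 503]);

// Strip scripts, styles, and tags, then collapse whitespace to estimate visible text.
function visibleText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// An SPA shell usually has an empty mount point or almost no visible text.
function looksLikeSpaShell(html: string): boolean {
  const hasEmptyRoot =
    /<div[^>]+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  return hasEmptyRoot || visibleText(html).length < 50;
}

// Decide whether the cheap HTTP result is usable or a browser is needed.
function needsBrowser(outcome: FetchOutcome): boolean {
  if (BLOCK_STATUSES.has(outcome.status)) return true;
  return looksLikeSpaShell(outcome.html);
}
```

A caller would issue the HTTP request first, pass the status and body to `needsBrowser`, and launch a browser only when it returns true.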

Crawl4AI is Python-native with async browser pooling via Playwright. Each browser instance consumes significant memory, which limits single-machine concurrency.

Firecrawl uses a queue-based worker architecture that requires Redis, PostgreSQL, and a Playwright microservice. Even self-hosted, you're running 4+ services.

Feedstock needs no infrastructure: a single install command, no service orchestration, and no browser launch for static pages. You write TypeScript end to end.

Browser Support

| | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Playwright (Chromium/Firefox/WebKit) | Yes | Yes | Yes (microservice) |
| Generic CDP (Browserbase, Browserless, etc.) | Yes | No | No (hosted only) |
| Lightpanda | Yes (local + cloud) | No | No |
| Fetch-first (no browser) | Yes (auto) | No | No |

Feedstock's { kind: "cdp", wsUrl: "..." } backend lets you connect to any cloud browser provider with one config line. Crawl4AI and Firecrawl lock you into their browser management.
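
One way to model a pluggable backend setting like this is a discriminated union keyed on `kind`. In the sketch below only the `{ kind: "cdp", wsUrl }` shape comes from the text above; the other variants, their field names, and the `validateBackend` helper are hypothetical and may not match feedstock's real types.

```typescript
// Only { kind: "cdp", wsUrl } appears in the text; other variants are assumptions.
type BrowserBackend =
  | { kind: "playwright"; browser: "chromium" | "firefox" | "webkit" }
  | { kind: "cdp"; wsUrl: string }
  | { kind: "lightpanda"; mode: "local" | "cloud" };

// Return a list of problems with a backend config (empty when it looks valid).
function validateBackend(backend: BrowserBackend): string[] {
  const errors: string[] = [];
  if (backend.kind === "cdp") {
    let url: URL | undefined;
    try {
      url = new URL(backend.wsUrl);
    } catch {
      errors.push("wsUrl is not a valid URL");
    }
    // CDP endpoints are WebSocket URLs, so anything else is rejected.
    if (url && url.protocol !== "ws:" && url.protocol !== "wss:") {
      errors.push("wsUrl must use ws:// or wss://");
    }
  }
  return errors;
}
```

Because the union is keyed on `kind`, the compiler narrows each branch, so `wsUrl` is only reachable when the backend is a CDP connection.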

Content Extraction

| Capability | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Markdown output | Yes (Turndown) | Yes | Yes |
| CSS selector extraction | Yes | Yes | No |
| XPath extraction | Yes | No | No |
| Regex extraction | Yes | Yes | No |
| Table extraction | Yes | No | No |
| Accessibility tree extraction | Yes | No | No |
| LLM-based extraction | No | Yes | Yes |
| Schema-based structured output | Yes (CSS/XPath schemas) | Yes (via LLM) | Yes (via LLM) |

Feedstock has six deterministic extraction strategies that run locally with zero API costs. Crawl4AI and Firecrawl offer LLM-based extraction, which is powerful for unstructured content but adds latency, cost, and non-determinism.

If you need "extract the product name and price from any arbitrary page," LLM extraction is hard to beat. If you know the page structure, feedstock's CSS/XPath/table strategies are faster, cheaper, and deterministic. Feedstock doesn't include LLM extraction — pipe markdown output to your LLM of choice.
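
To make "deterministic" concrete, here is a minimal sketch of schema-driven regex extraction over markdown text. The `RegexSchema` shape, `extractFields`, and the sample patterns are hypothetical; they illustrate the general technique and are not feedstock's extraction API.

```typescript
// Hypothetical schema: field name -> regular expression with one capture group.
// Patterns should not use the global flag, since exec() would then keep state.
type RegexSchema = Record<string, RegExp>;

// Apply every pattern to the text; missing fields come back as null.
function extractFields(
  text: string,
  schema: RegexSchema,
): Record<string, string | null> {
  const result: Record<string, string | null> = {};
  for (const [field, pattern] of Object.entries(schema)) {
    const match = pattern.exec(text);
    result[field] = match?.[1]?.trim() ?? null;
  }
  return result;
}

// Example schema for a product page whose structure is known in advance.
const productSchema: RegexSchema = {
  name: /^#\s+(.+)$/m,
  price: /Price:\s*\$(\d+(?:\.\d{2})?)/,
  sku: /SKU:\s*([A-Z0-9-]+)/,
};
```

The same input always yields the same output, with no network call and no per-request cost. The trade-off is that the schema breaks when the page layout changes.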

Performance

| Feature | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Fetch-first engine | Yes | No | No |
| Resource blocking profiles | fast, minimal, media-only, custom | text_mode, light_mode | No granular control |
| Navigation strategies | commit, domcontentloaded, load, networkidle | Limited | No control |
| In-page extraction | Yes (skips HTML serialization) | No | No |
| Content hashing (skip unchanged) | Yes | No | No |

Because of the fetch-first engine, a 10-page crawl of static documentation sites may never launch a browser. Crawl4AI and Firecrawl, by contrast, launch Playwright for every page.
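
Content hashing, listed in the table above, can be illustrated with a small cache that remembers one hash per URL and reports whether a page changed since the last crawl. This sketch assumes a dependency-free FNV-1a hash over UTF-16 code units; `ContentCache` and `fnv1a` are hypothetical names, and a real crawler would more likely use a cryptographic hash such as SHA-256.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: tiny and dependency-free,
// but NOT collision-resistant. Used here purely for illustration.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

class ContentCache {
  private hashes = new Map<string, string>();

  // Returns true when the content is new or changed, and records the new hash.
  hasChanged(url: string, content: string): boolean {
    const next = fnv1a(content);
    const changed = this.hashes.get(url) !== next;
    this.hashes.set(url, next);
    return changed;
  }
}
```

A crawl loop can call `hasChanged` right after fetching and skip extraction, markdown conversion, and downstream processing when it returns false.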

Deep Crawling

| Feature | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| BFS / DFS / BestFirst | All three | All three | BFS only |
| Streaming results | Yes (AsyncGenerator) | Yes | Polling-based |
| URL scoring | Composable (keyword, path depth, freshness, domain authority) | Keyword-based | No |
| Filter chains | Composable (domain, pattern, content-type, max depth) | Yes (FilterChain) | URL include/exclude patterns |
| Rate limiting | Per-domain with backoff | Yes | Server-side |
| Robots.txt compliance | Built-in | Built-in | Built-in |
| Sitemap discovery | Yes (URLSeeder) | Yes | Yes (Map endpoint) |
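
Composable URL scoring combined with best-first ordering can be sketched as small scorer functions merged by a weighted average. All names below (`UrlScorer`, `keywordScorer`, `pathDepthScorer`, `compose`, `prioritize`) are hypothetical and only illustrate the idea; they are not taken from feedstock's API.

```typescript
// A scorer maps a URL to a value in [0, 1]. All names here are hypothetical.
type UrlScorer = (url: URL) => number;

// Fraction of keywords that appear (as substrings) in the URL path.
const keywordScorer = (keywords: string[]): UrlScorer => (url) => {
  const path = url.pathname.toLowerCase();
  const hits = keywords.filter((k) => path.includes(k.toLowerCase())).length;
  return keywords.length === 0 ? 0 : hits / keywords.length;
};

// Shallower paths score higher: depth 0 -> 1, depth >= maxDepth -> 0.
const pathDepthScorer = (maxDepth: number): UrlScorer => (url) => {
  const depth = url.pathname.split("/").filter(Boolean).length;
  return Math.max(0, 1 - depth / maxDepth);
};

// Combine scorers as a weighted average (assumes at least one positive weight).
function compose(parts: Array<{ scorer: UrlScorer; weight: number }>): UrlScorer {
  const total = parts.reduce((sum, p) => sum + p.weight, 0);
  return (url) =>
    parts.reduce((sum, p) => sum + p.weight * p.scorer(url), 0) / total;
}

// Best-first ordering: visit the highest-scoring URLs first.
function prioritize(urls: string[], scorer: UrlScorer): string[] {
  return urls
    .map((u) => ({ u, score: scorer(new URL(u)) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.u);
}
```

A best-first crawler would re-run `prioritize` (or maintain a priority queue) each time newly discovered links are added to the frontier.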

Anti-Bot and Stealth

| Feature | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Stealth mode (single flag) | Yes | Yes | Hosted handles it |
| User-agent rotation | 9-agent pool | Yes | Managed |
| navigator.webdriver override | Yes | Yes | Managed |
| Human simulation (mouse/scroll) | Yes | Yes (scroll for lazy content) | No |
| Proxy rotation with health tracking | Yes | Yes (3-tier escalation) | Managed (hosted) |
| Block detection + auto-retry | Yes | Yes | Managed |
| Consent popup removal | Yes | No | No |
| Storage state persistence | Yes (cookies/localStorage) | Yes | Via API |
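
Proxy rotation with health tracking generally means cycling through a pool while skipping proxies that keep failing. The sketch below shows one simple policy, round-robin with a failure threshold. The `ProxyPool` class, its method names, and the default threshold of 3 are assumptions for illustration, not feedstock's implementation.

```typescript
// Hypothetical sketch: round-robin rotation that skips unhealthy proxies.
class ProxyPool {
  private proxies: string[];
  private maxFailures: number;
  private failures = new Map<string, number>();
  private cursor = 0;

  constructor(proxies: string[], maxFailures = 3) {
    this.proxies = proxies;
    this.maxFailures = maxFailures;
  }

  // Next healthy proxy in round-robin order, or null when none are healthy.
  next(): string | null {
    for (let i = 0; i < this.proxies.length; i++) {
      const index = (this.cursor + i) % this.proxies.length;
      const proxy = this.proxies[index];
      if ((this.failures.get(proxy) ?? 0) < this.maxFailures) {
        this.cursor = (index + 1) % this.proxies.length;
        return proxy;
      }
    }
    return null;
  }

  // Record a failed request; enough failures mark the proxy unhealthy.
  reportFailure(proxy: string): void {
    this.failures.set(proxy, (this.failures.get(proxy) ?? 0) + 1);
  }

  // A successful request clears the failure count.
  reportSuccess(proxy: string): void {
    this.failures.delete(proxy);
  }
}
```

A production pool would typically also let unhealthy proxies recover after a cooldown; that is left out here to keep the sketch short.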

Output Formats

| Format | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Raw HTML | Yes | Yes | Yes |
| Cleaned HTML | Yes | Yes | Yes |
| Markdown | Yes (with citations) | Yes (raw + fit) | Yes |
| Screenshots | Yes (base64) | Yes | Yes |
| PDFs | Yes | Yes | No |
| Accessibility snapshots | Yes (@e refs, 3-10x smaller than HTML) | No | No |
| Network request logs | Yes | No | No |
| Console messages | Yes | No | No |
| Change tracking (diffs) | Yes | No | No |
| Interactive element map | Yes | No | No |

Feedstock's accessibility snapshots produce compact semantic representations with @e1, @e2 refs that AI agents can reference directly — useful for building autonomous browsing agents.
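
To show why ref-tagged snapshots are compact, here is a minimal sketch that walks a simplified accessibility tree and assigns `@e` refs to interactive nodes. Only the `@e1`, `@e2` ref style comes from the text above; the `A11yNode` shape, the list of interactive roles, and the output layout are assumptions, not feedstock's actual snapshot format.

```typescript
// Hypothetical node shape; real accessibility trees carry many more fields.
interface A11yNode {
  role: string;
  name?: string;
  children?: A11yNode[];
}

// Roles treated as interactive and therefore given a ref (assumed list).
const INTERACTIVE = new Set(["link", "button", "textbox", "checkbox", "combobox"]);

// Render the tree as indented text and collect a ref -> node lookup table.
function snapshot(root: A11yNode): { text: string; refs: Map<string, A11yNode> } {
  const lines: string[] = [];
  const refs = new Map<string, A11yNode>();

  const visit = (node: A11yNode, depth: number): void => {
    let line = "  ".repeat(depth) + node.role;
    if (node.name) line += ` "${node.name}"`;
    if (INTERACTIVE.has(node.role)) {
      const ref = `@e${refs.size + 1}`;
      refs.set(ref, node);
      line += ` [${ref}]`;
    }
    lines.push(line);
    for (const child of node.children ?? []) visit(child, depth + 1);
  };

  visit(root, 0);
  return { text: lines.join("\n"), refs };
}
```

An agent can read the text form, answer with something like "click @e2", and the caller resolves that ref back to a concrete node through the `refs` map.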

Cost

| | Feedstock | Crawl4AI | Firecrawl |
|---|---|---|---|
| Library cost | Free | Free | Free (self-hosted) |
| Hosting cost | Your compute | Your compute | Credits (hosted) or Redis+PG+Playwright (self-hosted) |
| LLM costs | None (no LLM features) | Per-extraction (OpenAI/etc.) | Per-extraction (OpenAI) |
| Infrastructure | None | Playwright binary | Redis + PostgreSQL + Playwright (self-hosted) |

When to Use What

Choose Feedstock if:

  • Your stack is TypeScript/Bun
  • You want zero infrastructure — just a library
  • You need deterministic, fast extraction without LLM costs
  • You're building AI agents that need accessibility snapshots
  • You need multiple browser backends (cloud CDP, Lightpanda)
  • Performance matters — fetch-first engine, resource blocking profiles

Choose Crawl4AI if:

  • Your stack is Python
  • You need LLM-based extraction out of the box
  • You want a mature, well-documented library with a large community

Choose Firecrawl if:

  • You want a managed service with zero ops
  • You need to scrape at massive scale with queuing infrastructure
  • You're building integrations (Zapier, n8n) and want a REST API

Honest Gaps

Feedstock is early-stage software. Here's what it doesn't do yet:

  • No LLM extraction — pipe markdown to your own LLM
  • Bun-only — no Node.js compatibility
  • Smaller community — API may evolve
  • No hosted service — you run it yourself
  • No distributed crawling — single-process, no queue system
