feedstock

Configuration

BrowserConfig and CrawlerRunConfig reference.

Feedstock uses two configuration objects: BrowserConfig for browser-level settings, and CrawlerRunConfig for per-crawl behavior.

BrowserConfig

Controls the browser instance. Create it once and pass it to the WebCrawler constructor.

```typescript
import { createBrowserConfig } from "feedstock";

const config = createBrowserConfig({
  browserType: "chromium",   // "chromium" | "firefox" | "webkit"
  headless: true,
  viewport: { width: 1920, height: 1080 },
  userAgent: "my-bot/1.0",
  proxy: { server: "http://proxy:8080" },
  backend: { kind: "playwright" },
});
```

| Option | Type | Default | Description |
|---|---|---|---|
| browserType | "chromium" \| "firefox" \| "webkit" | "chromium" | Browser engine |
| headless | boolean | true | Run without a visible browser window |
| viewport | { width, height } | 1920x1080 | Page viewport size (px) |
| userAgent | string \| null | null | Custom user agent |
| proxy | ProxyConfig \| null | null | Proxy server |
| ignoreHttpsErrors | boolean | true | Ignore HTTPS certificate errors |
| javaEnabled | boolean | true | Enable JavaScript |
| extraArgs | string[] | [] | Extra browser launch arguments |
| textMode | boolean | false | Text-only mode |
| backend | BrowserBackend | { kind: "playwright" } | Browser backend |
| verbose | boolean | false | Enable verbose logging |

CrawlerRunConfig

Controls per-crawl behavior. Pass it to crawl() or crawlMany().

```typescript
import { createCrawlerRunConfig, CacheMode } from "feedstock";

const config = createCrawlerRunConfig({
  cacheMode: CacheMode.Bypass,
  waitFor: { kind: "selector", value: "#loaded" },
  screenshot: true,
  excludeTags: ["nav", "footer", "aside"],
  cssSelector: "article",
});
```

Content Options

| Option | Type | Default | Description |
|---|---|---|---|
| wordCountThreshold | number | 10 | Minimum word count for content |
| excludeTags | string[] | [] | HTML tags to strip |
| includeTags | string[] | [] | Only keep these tags |
| removeOverlayElements | boolean | false | Remove modals and popups |
| cssSelector | string \| null | null | Extract only matching elements |
| generateMarkdown | boolean | true | Generate markdown output |

Browser Behavior

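| Option | Type | Default | Description |
|---|---|---|---|
| jsCode | string \| string[] \| null | null | JavaScript to execute on the page |
| waitFor | WaitForType \| null | null | Wait condition |
| waitAfterLoad | number | 0 | Additional wait after load (ms) |
| pageTimeout | number | 60000 | Navigation timeout (ms) |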

Wait Conditions

```typescript
// Each value below is passed as the waitFor option of createCrawlerRunConfig()

// Wait for a CSS selector (timeout in ms)
{ kind: "selector", value: "#content", timeout: 5000 }

// Wait for network idle
{ kind: "networkIdle" }

// Wait a fixed delay (ms)
{ kind: "delay", ms: 2000 }

// Wait for a JS function to return a truthy value
{ kind: "function", fn: "() => document.readyState === 'complete'" }
```
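The four shapes above form a discriminated union keyed on kind. The sketch below models that union so the compiler can check each shape; WaitCondition and describeWait are hypothetical names written for illustration, and the library's own WaitForType may define more fields than these examples show.

```typescript
// Hypothetical model of the wait conditions above; not feedstock's own type.
type WaitCondition =
  | { kind: "selector"; value: string; timeout?: number } // timeout in ms
  | { kind: "networkIdle" }
  | { kind: "delay"; ms: number }
  | { kind: "function"; fn: string }; // JS source, evaluated in the page

// Exhaustive switch: adding a new kind without handling it is a compile error.
function describeWait(w: WaitCondition): string {
  switch (w.kind) {
    case "selector":
      return w.timeout !== undefined
        ? `selector ${w.value} (up to ${w.timeout} ms)`
        : `selector ${w.value}`;
    case "networkIdle":
      return "network idle";
    case "delay":
      return `fixed delay of ${w.ms} ms`;
    case "function":
      return `function ${w.fn}`;
    default: {
      const unreachable: never = w;
      return unreachable;
    }
  }
}

const examples: WaitCondition[] = [
  { kind: "selector", value: "#content", timeout: 5000 },
  { kind: "networkIdle" },
  { kind: "delay", ms: 2000 },
  { kind: "function", fn: "() => document.readyState === 'complete'" },
];

const descriptions = examples.map(describeWait);
```

Modeling the union this way also documents which fields belong to which kind: ms only makes sense for "delay", and value only for "selector".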

Capture Options

| Option | Type | Default | Description |
|---|---|---|---|
| screenshot | boolean | false | Capture a full-page screenshot |
| pdf | boolean | false | Capture the page as a PDF |
| captureNetworkRequests | boolean | false | Log network requests |
| captureConsoleMessages | boolean | false | Log console output |

Layered Configuration

Feedstock resolves configuration from four layers, listed here from highest to lowest precedence. Each layer overrides the ones below it:

  1. Programmatic overrides: values passed to createBrowserConfig() / createCrawlerRunConfig()
  2. Environment variables: FEEDSTOCK_* vars
  3. Project config file: feedstock.json in your project root
  4. Built-in defaults
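The precedence rules amount to an ordered merge in which later layers win. As a rough sketch, assuming a shallow per-key merge (feedstock may combine nested values such as viewport or proxy differently), the hypothetical resolveLayers function below applies the layers from lowest to highest precedence using a few BrowserConfig options and their documented defaults:

```typescript
// Hypothetical, trimmed-down option set; defaults taken from the BrowserConfig table.
interface BrowserOptions {
  browserType: "chromium" | "firefox" | "webkit";
  headless: boolean;
  userAgent: string | null;
  verbose: boolean;
}

const builtInDefaults: BrowserOptions = {
  browserType: "chromium",
  headless: true,
  userAgent: null,
  verbose: false,
};

// Spread from lowest to highest precedence: each later layer overrides the keys it sets.
// Note that an explicit undefined also overrides, so layers should omit unset keys.
function resolveLayers(
  projectFile: Partial<BrowserOptions>,
  env: Partial<BrowserOptions>,
  overrides: Partial<BrowserOptions>,
): BrowserOptions {
  return { ...builtInDefaults, ...projectFile, ...env, ...overrides };
}

const resolved = resolveLayers(
  { browserType: "firefox", verbose: true }, // feedstock.json
  { headless: false, verbose: false },       // FEEDSTOCK_* variables
  { headless: true },                        // values passed in code
);
// browserType comes from the file, headless from the override,
// verbose from the environment, and userAgent from the defaults.
```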

Project Config File

Create a feedstock.json in your project root (or in any ancestor directory). Feedstock walks up from the current working directory to find it.

```json
{
  "browser": {
    "browserType": "chromium",
    "headless": true,
    "stealth": true,
    "verbose": false
  },
  "crawl": {
    "pageTimeout": 30000,
    "screenshot": false,
    "generateMarkdown": true
  }
}
```

The file accepts two top-level keys: browser (partial BrowserConfig) and crawl (partial CrawlerRunConfig).
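The upward search can be pictured as checking each directory from the start directory up to the filesystem root. The sketch below is a hypothetical illustration rather than feedstock's code: candidatePaths, findProjectFile, and the injected fileExists callback are invented names, only POSIX-style absolute paths are handled, and it assumes the nearest feedstock.json wins.

```typescript
// Candidate locations from the start directory up to the root.
// POSIX-style absolute paths only; a real implementation would use node:path.
function candidatePaths(startDir: string, fileName = "feedstock.json"): string[] {
  const parts = startDir.split("/").filter((p) => p.length > 0);
  const candidates: string[] = [];
  for (let depth = parts.length; depth >= 0; depth--) {
    candidates.push("/" + [...parts.slice(0, depth), fileName].join("/"));
  }
  return candidates;
}

// Return the nearest existing project file, or null if none exists.
// fileExists is injected so the search logic runs without touching the disk.
function findProjectFile(
  startDir: string,
  fileExists: (path: string) => boolean,
): string | null {
  for (const candidate of candidatePaths(startDir)) {
    if (fileExists(candidate)) return candidate;
  }
  return null;
}

// Pretend filesystem where only /home/me/feedstock.json exists.
const onDisk = new Set(["/home/me/feedstock.json"]);
const found = findProjectFile("/home/me/app/src", (p) => onDisk.has(p));
// found === "/home/me/feedstock.json"
```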

Environment Variables

FEEDSTOCK_* environment variables override settings from the project file, which is useful for CI/CD and Docker deployments.

| Variable | Type | Maps to |
|---|---|---|
| FEEDSTOCK_BROWSER_TYPE | string | browser.browserType |
| FEEDSTOCK_HEADLESS | "true" \| "false" | browser.headless |
| FEEDSTOCK_USER_AGENT | string | browser.userAgent |
| FEEDSTOCK_STEALTH | "true" \| "false" | browser.stealth |
| FEEDSTOCK_VERBOSE | "true" \| "false" | browser.verbose |
| FEEDSTOCK_TEXT_MODE | "true" \| "false" | browser.textMode |
| FEEDSTOCK_CDP_URL | string | browser.backend (sets { kind: "cdp", wsUrl }) |
| FEEDSTOCK_PROXY | string | browser.proxy.server |
| FEEDSTOCK_PROXY_USERNAME | string | browser.proxy.username |
| FEEDSTOCK_PROXY_PASSWORD | string | browser.proxy.password |
| FEEDSTOCK_PAGE_TIMEOUT | number | crawl.pageTimeout |
| FEEDSTOCK_SCREENSHOT | "true" \| "false" | crawl.screenshot |
| FEEDSTOCK_BLOCK_RESOURCES | "true" \| "false" \| profile name | crawl.blockResources |
| FEEDSTOCK_GENERATE_MARKDOWN | "true" \| "false" | crawl.generateMarkdown |

Because loadConfig() reads FEEDSTOCK_CDP_URL, you can set it only in CI to connect to a remote browser over CDP while local runs keep the default backend, without changing your code between environments.
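Every environment variable arrives as a string, so a loader has to coerce it into the type listed in the table. The sketch below shows one way to do that for a representative subset of the variables; envToLayer, EnvLayer, and the strict "true"/"false" parsing are assumptions for illustration, and feedstock's actual loader may accept other spellings or handle invalid values differently.

```typescript
// Hypothetical sketch: coerce FEEDSTOCK_* strings into partial config layers.
// Only a representative subset of the variables in the table is handled.
type Env = Record<string, string | undefined>;

interface EnvLayer {
  browser: {
    headless?: boolean;
    userAgent?: string;
    backend?: { kind: "cdp"; wsUrl: string };
    proxy?: { server: string; username?: string; password?: string };
  };
  crawl: {
    pageTimeout?: number;
    blockResources?: boolean | string; // boolean, or a profile name
  };
}

// Assumption: only the exact strings "true" and "false" count as booleans.
function parseBool(value: string | undefined): boolean | undefined {
  if (value === "true") return true;
  if (value === "false") return false;
  return undefined;
}

function envToLayer(env: Env): EnvLayer {
  const layer: EnvLayer = { browser: {}, crawl: {} };

  const headless = parseBool(env.FEEDSTOCK_HEADLESS);
  if (headless !== undefined) layer.browser.headless = headless;

  const userAgent = env.FEEDSTOCK_USER_AGENT;
  if (userAgent) layer.browser.userAgent = userAgent;

  // A CDP URL replaces the backend with { kind: "cdp", wsUrl }.
  const cdpUrl = env.FEEDSTOCK_CDP_URL;
  if (cdpUrl) layer.browser.backend = { kind: "cdp", wsUrl: cdpUrl };

  const proxyServer = env.FEEDSTOCK_PROXY;
  if (proxyServer) {
    layer.browser.proxy = {
      server: proxyServer,
      username: env.FEEDSTOCK_PROXY_USERNAME,
      password: env.FEEDSTOCK_PROXY_PASSWORD,
    };
  }

  // Accept only plain digit strings; anything else is ignored.
  const rawTimeout = env.FEEDSTOCK_PAGE_TIMEOUT;
  if (rawTimeout && /^\d+$/.test(rawTimeout)) layer.crawl.pageTimeout = Number(rawTimeout);

  // "true"/"false" become booleans; any other value is kept as a profile name.
  const block = env.FEEDSTOCK_BLOCK_RESOURCES;
  if (block) layer.crawl.blockResources = parseBool(block) ?? block;

  return layer;
}

// "images" is a placeholder profile name, not a documented feedstock profile.
const layer = envToLayer({
  FEEDSTOCK_HEADLESS: "false",
  FEEDSTOCK_CDP_URL: "ws://localhost:9222/devtools/browser/example",
  FEEDSTOCK_PAGE_TIMEOUT: "30000",
  FEEDSTOCK_BLOCK_RESOURCES: "images",
});
```

Taking the environment as a plain object rather than reading process.env directly keeps the coercion logic easy to test.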

Using loadConfig()

loadConfig() merges the project file and environment variable layers and returns an object with browser and crawl keys. Spread each one into the matching config creator, then add programmatic overrides after the spread so they take precedence:

```typescript
import {
  loadConfig,
  createBrowserConfig,
  createCrawlerRunConfig,
  WebCrawler,
} from "feedstock";

// Project file and FEEDSTOCK_* variables, already merged
const layered = loadConfig();

const browserConfig = createBrowserConfig({
  ...layered.browser,
  // Programmatic overrides (highest precedence) go after the spread
  headless: false,
});

const crawlConfig = createCrawlerRunConfig({
  ...layered.crawl,
  screenshot: true, // programmatic override
});

const crawler = new WebCrawler(browserConfig);
const result = await crawler.crawl("https://example.com", crawlConfig);
```

loadConfig() accepts an optional { startDir } option that sets the directory where the search for feedstock.json begins (defaults to process.cwd()).
