Configuration
BrowserConfig and CrawlerRunConfig reference.
Feedstock uses two configuration objects: BrowserConfig for browser-level settings, and CrawlerRunConfig for per-crawl behavior.
BrowserConfig
Controls the browser instance. Set once when creating the WebCrawler.
import { createBrowserConfig } from "feedstock";
const config = createBrowserConfig({
browserType: "chromium", // "chromium" | "firefox" | "webkit"
headless: true,
viewport: { width: 1920, height: 1080 },
userAgent: "my-bot/1.0",
proxy: { server: "http://proxy:8080" },
backend: { kind: "playwright" },
});

| Option | Type | Default | Description |
|---|---|---|---|
| browserType | "chromium" \| "firefox" \| "webkit" | "chromium" | Browser engine |
| headless | boolean | true | Run without UI |
| viewport | { width, height } | 1920x1080 | Page viewport size |
| userAgent | string \| null | null | Custom user agent |
| proxy | ProxyConfig \| null | null | Proxy server |
| ignoreHttpsErrors | boolean | true | Ignore SSL errors |
| javaEnabled | boolean | true | Enable JavaScript |
| extraArgs | string[] | [] | Extra browser launch args |
| textMode | boolean | false | Text-only mode |
| backend | BrowserBackend | { kind: "playwright" } | Browser backend |
| verbose | boolean | false | Enable verbose logging |
CrawlerRunConfig
Controls per-crawl behavior. Pass to crawl() or crawlMany().
import { createCrawlerRunConfig, CacheMode } from "feedstock";
const config = createCrawlerRunConfig({
cacheMode: CacheMode.Bypass,
waitFor: { kind: "selector", value: "#loaded" },
screenshot: true,
excludeTags: ["nav", "footer", "aside"],
cssSelector: "article",
});

Content Options
| Option | Type | Default | Description |
|---|---|---|---|
| wordCountThreshold | number | 10 | Min words for content |
| excludeTags | string[] | [] | HTML tags to strip |
| includeTags | string[] | [] | Only keep these tags |
| removeOverlayElements | boolean | false | Remove modals/popups |
| cssSelector | string \| null | null | Extract only matching elements |
| generateMarkdown | boolean | true | Generate markdown output |
Browser Behavior
| Option | Type | Default | Description |
|---|---|---|---|
| jsCode | string \| string[] \| null | null | JavaScript to execute |
| waitFor | WaitForType \| null | null | Wait condition |
| waitAfterLoad | number | 0 | Additional wait (ms) |
| pageTimeout | number | 60000 | Navigation timeout (ms) |
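
The waitFor option accepts one of several tagged objects, distinguished by their kind field. As a rough illustration (not Feedstock's actual type definition), such a set of shapes could be modeled as a TypeScript discriminated union. The WaitFor type name and the describeWait helper are hypothetical; the four shapes mirror the ones listed under Wait Conditions.

```typescript
// Hypothetical model of the wait-condition shapes; the real WaitForType may differ.
type WaitFor =
  | { kind: "selector"; value: string; timeout?: number }
  | { kind: "networkIdle" }
  | { kind: "delay"; ms: number }
  | { kind: "function"; fn: string };

// Illustrative helper: turns a wait condition into a human-readable description.
function describeWait(wait: WaitFor): string {
  switch (wait.kind) {
    case "selector":
      return wait.timeout === undefined
        ? `wait for selector ${wait.value}`
        : `wait up to ${wait.timeout} ms for selector ${wait.value}`;
    case "networkIdle":
      return "wait for network idle";
    case "delay":
      return `wait ${wait.ms} ms`;
    case "function":
      return `wait until ${wait.fn} returns truthy`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a kind is unhandled.
      const unreachable: never = wait;
      return unreachable;
    }
  }
}

console.log(describeWait({ kind: "delay", ms: 2000 }));
```

Switching on kind lets the compiler narrow each branch to the fields that shape actually carries, so a typo such as reading ms on a selector condition is caught before runtime.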
Wait Conditions
// Wait for a CSS selector
{ kind: "selector", value: "#content", timeout: 5000 }
// Wait for network idle
{ kind: "networkIdle" }
// Wait a fixed delay
{ kind: "delay", ms: 2000 }
// Wait for a JS function to return truthy
{ kind: "function", fn: "() => document.readyState === 'complete'" }

Capture Options
| Option | Type | Default | Description |
|---|---|---|---|
| screenshot | boolean | false | Capture full-page screenshot |
| pdf | boolean | false | Capture page as PDF |
| captureNetworkRequests | boolean | false | Log network requests |
| captureConsoleMessages | boolean | false | Log console output |
Layered Configuration
Feedstock resolves configuration from three layers stacked on top of the built-in defaults. Listed from highest to lowest precedence, each entry overrides the ones below it:
- Programmatic overrides: values passed to createBrowserConfig() / createCrawlerRunConfig()
- Environment variables: FEEDSTOCK_* vars
- Project config file: feedstock.json in your project root
- Built-in defaults
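
The precedence order above can be sketched with plain object spreads. This is only an illustration of the override rule, not Feedstock's implementation: the BrowserOptions subset, the DEFAULTS constant, and the resolveLayers helper are all hypothetical names.

```typescript
// Hypothetical subset of the documented browser options.
interface BrowserOptions {
  browserType: "chromium" | "firefox" | "webkit";
  headless: boolean;
  verbose: boolean;
}

// Stand-in for the built-in defaults (values taken from the BrowserConfig table).
const DEFAULTS: BrowserOptions = {
  browserType: "chromium",
  headless: true,
  verbose: false,
};

// Layers are applied lowest precedence first, so later layers win.
// Note: a key explicitly set to undefined in a layer would still overwrite
// the value below it; this sketch assumes layers only contain defined keys.
function resolveLayers(
  defaults: BrowserOptions,
  ...layers: Partial<BrowserOptions>[]
): BrowserOptions {
  return layers.reduce<BrowserOptions>(
    (acc, layer) => ({ ...acc, ...layer }),
    { ...defaults },
  );
}

const fromFile: Partial<BrowserOptions> = { browserType: "firefox", verbose: true };
const fromEnv: Partial<BrowserOptions> = { headless: false };
const overrides: Partial<BrowserOptions> = { browserType: "webkit" };

const resolved = resolveLayers(DEFAULTS, fromFile, fromEnv, overrides);
console.log(resolved);
```

Here browserType ends up as the programmatic value, headless comes from the environment layer, and verbose comes from the project file, because no higher layer sets it.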
Project Config File
Create a feedstock.json in your project root (or any ancestor directory). Feedstock walks up from cwd to find it.
{
"browser": {
"browserType": "chromium",
"headless": true,
"stealth": true,
"verbose": false
},
"crawl": {
"pageTimeout": 30000,
"screenshot": false,
"generateMarkdown": true
}
}

The file accepts two top-level keys: browser (partial BrowserConfig) and crawl (partial CrawlerRunConfig).
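
The upward search for feedstock.json described in this section could look roughly like the following. It is a sketch under assumptions, not Feedstock's actual lookup code: findProjectConfig and its exists parameter are hypothetical, and only Node's built-in fs and path modules are used.

```typescript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: walks from startDir toward the filesystem root and
// returns the first matching config path, or null if none is found.
// The exists parameter is injectable so the search can be exercised without touching disk.
function findProjectConfig(
  startDir: string = process.cwd(),
  fileName: string = "feedstock.json",
  exists: (path: string) => boolean = existsSync,
): string | null {
  let dir = resolve(startDir);
  while (true) {
    const candidate = join(dir, fileName);
    if (exists(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return null; // dirname of the root is the root itself
    dir = parent;
  }
}

console.log(findProjectConfig() ?? "no feedstock.json found");
```

The nearest file wins in this sketch, so a feedstock.json in a package directory would shadow one at the repository root.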
Environment Variables
Set FEEDSTOCK_* environment variables to override project file settings. Useful for CI/CD and Docker deployments.
| Variable | Type | Maps to |
|---|---|---|
| FEEDSTOCK_BROWSER_TYPE | string | browser.browserType |
| FEEDSTOCK_HEADLESS | "true" \| "false" | browser.headless |
| FEEDSTOCK_USER_AGENT | string | browser.userAgent |
| FEEDSTOCK_STEALTH | "true" \| "false" | browser.stealth |
| FEEDSTOCK_VERBOSE | "true" \| "false" | browser.verbose |
| FEEDSTOCK_TEXT_MODE | "true" \| "false" | browser.textMode |
| FEEDSTOCK_CDP_URL | string | browser.backend (sets { kind: "cdp", wsUrl }) |
| FEEDSTOCK_PROXY | string | browser.proxy.server |
| FEEDSTOCK_PROXY_USERNAME | string | browser.proxy.username |
| FEEDSTOCK_PROXY_PASSWORD | string | browser.proxy.password |
| FEEDSTOCK_PAGE_TIMEOUT | number | crawl.pageTimeout |
| FEEDSTOCK_SCREENSHOT | "true" \| "false" | crawl.screenshot |
| FEEDSTOCK_BLOCK_RESOURCES | "true" \| "false" \| profile name | crawl.blockResources |
| FEEDSTOCK_GENERATE_MARKDOWN | "true" \| "false" | crawl.generateMarkdown |
Set FEEDSTOCK_CDP_URL in your environment and your code doesn't need to change between local development and CI: as long as your configuration is built from loadConfig(), the environment layer picks it up automatically.
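
To make the mapping in the table concrete, here is a minimal sketch of how a few FEEDSTOCK_* variables might be turned into a partial config. The variable names come from the table; everything else (the EnvLayer shape, readEnvLayer, and the rule that unrecognized values are ignored) is an assumption for illustration and may not match how Feedstock actually parses its environment.

```typescript
// Hypothetical types and helpers; only the variable names are taken from the table.
type Env = Record<string, string | undefined>;

interface EnvLayer {
  browser: { browserType?: string; headless?: boolean };
  crawl: { pageTimeout?: number; screenshot?: boolean };
}

// Accepts only the literal strings "true" and "false".
function parseBool(raw: string | undefined): boolean | undefined {
  if (raw === "true") return true;
  if (raw === "false") return false;
  return undefined; // unset or unrecognized: leave the value to lower layers
}

// Accepts finite numbers; empty or non-numeric strings are ignored.
function parseNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// Builds a partial layer containing only the keys that were actually set.
function readEnvLayer(env: Env): EnvLayer {
  const layer: EnvLayer = { browser: {}, crawl: {} };

  const browserType = env.FEEDSTOCK_BROWSER_TYPE;
  if (browserType !== undefined) layer.browser.browserType = browserType;

  const headless = parseBool(env.FEEDSTOCK_HEADLESS);
  if (headless !== undefined) layer.browser.headless = headless;

  const pageTimeout = parseNumber(env.FEEDSTOCK_PAGE_TIMEOUT);
  if (pageTimeout !== undefined) layer.crawl.pageTimeout = pageTimeout;

  const screenshot = parseBool(env.FEEDSTOCK_SCREENSHOT);
  if (screenshot !== undefined) layer.crawl.screenshot = screenshot;

  return layer;
}

const envLayer = readEnvLayer({
  FEEDSTOCK_HEADLESS: "false",
  FEEDSTOCK_PAGE_TIMEOUT: "30000",
  FEEDSTOCK_SCREENSHOT: "yes", // neither "true" nor "false", so skipped in this sketch
});
console.log(envLayer);
```

Leaving unset keys out of the layer entirely, rather than writing undefined, keeps a later object spread from masking values that come from the project file or the defaults.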
Using loadConfig()
The loadConfig() function merges the project file and environment variable layers. Spread the result into your config creators to apply all layers:
import {
loadConfig,
createBrowserConfig,
createCrawlerRunConfig,
WebCrawler,
} from "feedstock";
const layered = loadConfig();
const browserConfig = createBrowserConfig({
...layered.browser,
// Programmatic overrides (highest precedence)
headless: false,
});
const crawlConfig = createCrawlerRunConfig({
...layered.crawl,
screenshot: true,
});
const crawler = new WebCrawler(browserConfig);
const result = await crawler.crawl("https://example.com", crawlConfig);

loadConfig() accepts an optional { startDir } to control where the project file search begins (defaults to process.cwd()).