feedstock

Hooks

Inject custom behavior at key points in the crawl lifecycle.

Hooks let you run custom code at specific points while a page is being crawled. Each hook receives the Playwright Page object, so it can use any Playwright page API, such as locators, event listeners, or script injection.

Available Hooks

| Hook | When | Use Case |
| --- | --- | --- |
| onPageCreated | After new page is created | Set up listeners, inject scripts |
| beforeGoto | Before page.goto() | Modify headers, log navigation |
| afterGoto | After navigation completes | Check page state, dismiss dialogs |
| onExecutionStarted | Before custom JS runs | Set up monitoring |
| beforeReturnHtml | Before capturing HTML | Final DOM modifications |

Usage

const crawler = new WebCrawler();
await crawler.start();

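crawler.setHook("afterGoto", async (page) => {
  // Dismiss cookie banners. first() keeps the locator valid when several
  // elements match; isVisible() on a multi-match locator would throw.
  const banner = page.locator('[class*="cookie-banner"]').first();
  if (await banner.isVisible()) {
    await banner.locator("button").first().click();
  }
});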

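crawler.setHook("beforeReturnHtml", async (page) => {
  // Expand all collapsed sections. An expanded toggle no longer matches the
  // selector, so click the first remaining match each time. The loop is
  // bounded by the initial count in case a toggle does not expand.
  const toggles = page.locator('[aria-expanded="false"]');
  const initialCount = await toggles.count();
  for (let i = 0; i < initialCount && (await toggles.count()) > 0; i++) {
    await toggles.first().click();
  }
});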

const result = await crawler.crawl("https://example.com");
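
As an illustration of the onPageCreated use case from the table (setting up listeners), the sketch below collects browser console errors. The helper name createConsoleErrorCollector and the PageLike and ConsoleMessageLike interfaces are assumptions made for this example. They are minimal structural stand-ins for the parts of Playwright's Page and ConsoleMessage that the hook touches, so the snippet can be read and run without a browser.

```typescript
// Assumed stand-ins for the parts of Playwright's ConsoleMessage and Page
// used below. Playwright's real types expose these same methods.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}

interface PageLike {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
}

// Hypothetical helper: returns a hook plus the array that the hook fills.
function createConsoleErrorCollector() {
  const errors: string[] = [];

  const hook = async (page: PageLike): Promise<void> => {
    // Attach the listener before navigation so early errors are captured.
    page.on("console", (msg) => {
      if (msg.type() === "error") {
        errors.push(msg.text());
      }
    });
  };

  return { hook, errors };
}
```

With a crawler set up as in the example above, the hook would presumably be registered with crawler.setHook("onPageCreated", hook) before calling crawl, and the errors array inspected afterwards.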

Hook Signature

type HookFn = (page: Page, ...args: unknown[]) => Promise<void>;

All hooks receive the Playwright Page as their first argument. The beforeGoto hook also receives the target URL as its second argument.
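
Because the extra arguments are typed as unknown[], a beforeGoto hook has to narrow the URL before using it. The following is a minimal sketch of that narrowing. The helper targetUrlFrom, the visited array, and the generic TPage parameter are assumptions for illustration. In real use the page type would be Playwright's Page.

```typescript
// Same shape as the hook signature above, made generic over the page type
// (an assumption) so the sketch does not require Playwright to run.
type HookFn<TPage> = (page: TPage, ...args: unknown[]) => Promise<void>;

// Hypothetical helper: returns the first extra argument if it is a string.
function targetUrlFrom(args: unknown[]): string | undefined {
  const [first] = args;
  return typeof first === "string" ? first : undefined;
}

// URLs seen by the hook, in navigation order.
const visited: string[] = [];

// A beforeGoto-style hook that records each navigation target.
const logNavigation: HookFn<unknown> = async (_page, ...args) => {
  const url = targetUrlFrom(args);
  if (url !== undefined) {
    visited.push(url);
  }
};
```

Checking the type at runtime keeps the hook safe if it is registered under a hook name that passes no extra arguments.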

Custom Strategy Hooks

If you implement a custom CrawlerStrategy, the base class manages the registered hooks. Your crawl method calls this.executeHook at each lifecycle point to run them:

class MyStrategy extends CrawlerStrategy {
  async crawl(url: string, config: CrawlerRunConfig) {
    const page = /* ... */;
    await this.executeHook("onPageCreated", page);
    await this.executeHook("beforeGoto", page, url);
    // ... navigate ...
    await this.executeHook("afterGoto", page);
    // ...
  }
}
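
To make the setHook and executeHook pairing concrete, here is a minimal sketch of how such a hook registry could work. It is not feedstock's actual base class: HookRegistry, HookName, and RegisteredHook are hypothetical names, and holding one hook per name is an assumption.

```typescript
// Hook names taken from the table of available hooks.
type HookName =
  | "onPageCreated"
  | "beforeGoto"
  | "afterGoto"
  | "onExecutionStarted"
  | "beforeReturnHtml";

// Generic over the page type so the sketch runs without Playwright.
type RegisteredHook<TPage> = (page: TPage, ...args: unknown[]) => Promise<void>;

// Hypothetical stand-in for the hook handling of a strategy base class.
class HookRegistry<TPage> {
  private readonly hooks = new Map<HookName, RegisteredHook<TPage>>();

  // Registers a hook, replacing any hook already stored under the same name.
  setHook(name: HookName, fn: RegisteredHook<TPage>): void {
    this.hooks.set(name, fn);
  }

  // Runs the hook for this name if one is registered; otherwise does nothing.
  async executeHook(name: HookName, page: TPage, ...args: unknown[]): Promise<void> {
    const fn = this.hooks.get(name);
    if (fn) {
      await fn(page, ...args);
    }
  }
}
```

In this design an unregistered hook is a no-op, so a strategy can call executeHook at every lifecycle point without checking whether the user supplied a hook.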
