feedstock

Accessibility Snapshots

Compact semantic page representations for AI consumption.

Snapshots extract a compact, semantic representation of a page — headings, links, buttons, inputs, images — with deterministic @e refs. Orders of magnitude smaller than raw HTML.

Usage

const result = await crawler.crawl("https://example.com", {
  snapshot: true,
});

console.log(result.snapshot);
// @e1 [heading] "Example Domain" [level=1]
// [paragraph] "This domain is for use in illustrative examples..."
// @e2 [link] "More information..." [-> https://www.iana.org/domains/example]

Why Snapshots?

FormatSizeAI-friendlyStructure
Raw HTML~50KBNoFull DOM noise
Cleaned HTML~10KBSomewhatLess noise
Markdown~5KBYesFlat text
Snapshot~1KBYesSemantic tree

Snapshots are what AI browser agents (Anthropic computer use, OpenAI tools, agent-browser) converge on for page understanding.

Static vs CDP Snapshots

Static (default)

Works with any engine (including FetchEngine). Parses HTML with Cheerio to extract semantic elements:

import { buildStaticSnapshot } from "feedstock";

const snap = buildStaticSnapshot(html);
console.log(snap.text);     // formatted text tree
console.log(snap.tree);     // structured SnapshotNode[]
console.log(snap.refs);     // Map<string, { role, name }>
console.log(snap.nodeCount); // total refs assigned

CDP (browser-only)

Uses Chrome's Accessibility.getFullAXTree for a more precise tree. Requires Playwright engine:

import { takeSnapshot } from "feedstock";

// page is a Playwright Page object
const snap = await takeSnapshot(page, {
  interactiveOnly: false,
  maxDepth: 10,
});

Node Categories

CategoryRolesGets ref?
Interactivebutton, link, textbox, checkbox, radio, combobox, tab, switch, sliderAlways
Contentheading, paragraph, img, article, region, navigationIf named
Structuralgeneric, group, listFiltered out

Reference System

Each interactive or named content node gets a deterministic ref (@e1, @e2, ...):

@e1 [heading] "Welcome" [level=1]
@e2 [link] "About" [-> /about]
@e3 [button] "Sign Up"
@e4 [textbox] "Email" 
@e5 [checkbox] [checked=false]

The refs map lets you look up any element:

snap.refs.get("e3"); // { role: "button", name: "Sign Up" }

With processHtml

Snapshots work on raw HTML without a browser:

const result = await crawler.processHtml(html, { snapshot: true });
console.log(result.snapshot);
Edit on GitHub

Last updated on

On this page