Markdown Generation
Converting crawled HTML to clean Markdown with citations.
Feedstock converts cleaned HTML to Markdown using Turndown, a widely used HTML-to-Markdown converter.
Default Output
Every crawl result includes a MarkdownGenerationResult:
interface MarkdownGenerationResult {
rawMarkdown: string; // Clean markdown
markdownWithCitations: string; // Links replaced with [1], [2], etc.
referencesMarkdown: string; // [1] https://...\n[2] https://...
fitMarkdown: string | null; // Reserved for content filtering
}
Usage
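To make the relationship between rawMarkdown, markdownWithCitations, and referencesMarkdown concrete, here is a minimal sketch of citation-style conversion. It is not Feedstock's implementation: the helper name toCitations and the CitationOutput type are hypothetical, and the choice to keep the link text and append the number after it is an assumption about the output format.

```typescript
// Illustrative only: a hypothetical helper, not Feedstock's actual code.
interface CitationOutput {
  markdownWithCitations: string;
  referencesMarkdown: string;
}

function toCitations(rawMarkdown: string): CitationOutput {
  // Maps each distinct URL to its citation number, in order of first appearance.
  const numbers = new Map<string, number>();

  // Matches inline links [text](url); the lookbehind skips images ![alt](url).
  const inlineLink = /(?<!!)\[([^\]]+)\]\(([^)\s]+)\)/g;

  const markdownWithCitations = rawMarkdown.replace(
    inlineLink,
    (_match: string, text: string, url: string) => {
      let n = numbers.get(url);
      if (n === undefined) {
        n = numbers.size + 1;
        numbers.set(url, n);
      }
      // Assumption: keep the link text and append the reference number.
      return `${text} [${n}]`;
    },
  );

  // One "[n] url" line per distinct URL.
  const referencesMarkdown = Array.from(numbers.entries())
    .map(([url, n]) => `[${n}] ${url}`)
    .join("\n");

  return { markdownWithCitations, referencesMarkdown };
}

const sample =
  "See [the docs](https://example.com/docs) and [the blog](https://example.com/blog), " +
  "then [the docs](https://example.com/docs) again.";
const out = toCitations(sample);
console.log(out.markdownWithCitations);
console.log(out.referencesMarkdown);
```

In this sketch a repeated URL reuses its first number, so the sample prints "See the docs [1] and the blog [2], then the docs [1] again." followed by two reference lines. Whether Feedstock deduplicates URLs the same way is not stated in this page.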
const result = await crawler.crawl("https://example.com");
// Raw markdown
console.log(result.markdown?.rawMarkdown);
// Citation-style (links as numbered references)
console.log(result.markdown?.markdownWithCitations);
// Just the references list
console.log(result.markdown?.referencesMarkdown);
Disabling Markdown
const result = await crawler.crawl("https://example.com", {
generateMarkdown: false,
});
// result.markdown will be null
Custom Markdown Generator
Extend MarkdownGenerationStrategy to customize output:
import { MarkdownGenerationStrategy, WebCrawler } from "feedstock";
class CustomMarkdownGenerator extends MarkdownGenerationStrategy {
generate(url: string, html: string) {
// Your custom logic
return {
rawMarkdown: "...",
markdownWithCitations: "...",
referencesMarkdown: "",
fitMarkdown: null,
};
}
}
const crawler = new WebCrawler({
markdownGenerator: new CustomMarkdownGenerator(),
});
Turndown Configuration
The default generator uses these Turndown options:
headingStyle: "atx" (# H1, ## H2, etc.)
codeBlockStyle: "fenced" (triple backticks)
bulletListMarker: "-" (dashes for unordered lists)
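To see what these options produce, the three settings listed above can be passed directly to Turndown outside of Feedstock. This is a configuration sketch rather than Feedstock's internal setup: it assumes the turndown npm package is installed and that default imports are enabled (esModuleInterop), and Feedstock may apply additional options or rules not shown here.

```typescript
import TurndownService from "turndown";

// Only the three options documented above; anything else stays at Turndown's defaults.
const turndown = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
  bulletListMarker: "-",
});

// A small HTML fragment exercising a heading, a list, and a code block.
const html =
  "<h1>Title</h1>" +
  "<ul><li>First</li><li>Second</li></ul>" +
  "<pre><code>const x = 1;</code></pre>";

console.log(turndown.turndown(html));
```

Running this on a fragment from your own pages is a quick way to check how headings, lists, and code blocks will look before writing a custom generator.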