URL Scorers

Prioritize URLs during BestFirst deep crawling.

Scorers assign a relevance score (0-1) to discovered URLs. BestFirstDeepCrawlStrategy uses these scores to decide which URLs to crawl first.

CompositeScorer

Combine multiple scorers with weighted averaging:

import {
  CompositeScorer,
  KeywordRelevanceScorer,
  PathDepthScorer,
  FreshnessScorer,
  DomainAuthorityScorer,
} from "feedstock";

const scorer = new CompositeScorer()
  .add(new KeywordRelevanceScorer(["docs", "api", "guide"], 2.0))
  .add(new PathDepthScorer(10, 1.0))
  .add(new FreshnessScorer(0.5))
  .add(new DomainAuthorityScorer(["example.com"], 1.5));

const score = scorer.score("https://example.com/docs/api", 1); // url, depth
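The composite score is the weighted average of the child scorers' scores. A minimal standalone sketch of that combination step (a re-implementation of the stated formula for illustration, not the library's code; `weightedAverage` is a hypothetical helper):

```typescript
// Weighted average: sum(weight_i * score_i) / sum(weight_i).
function weightedAverage(parts: { score: number; weight: number }[]): number {
  const totalWeight = parts.reduce((sum, p) => sum + p.weight, 0);
  if (totalWeight === 0) return 0;
  const weightedSum = parts.reduce((sum, p) => sum + p.score * p.weight, 0);
  return weightedSum / totalWeight;
}

// e.g. keyword scorer returns 0.66 at weight 2.0, path-depth scorer 0.8 at weight 1.0
const combined = weightedAverage([
  { score: 0.66, weight: 2.0 },
  { score: 0.8, weight: 1.0 },
]);
// combined ≈ 0.707
```

Higher weights pull the composite score toward that scorer, so the weight passed to each built-in scorer's constructor controls its influence on crawl priority.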

Built-in Scorers

KeywordRelevanceScorer

Scores based on keyword matches in the URL and anchor text.

new KeywordRelevanceScorer(["product", "pricing"], 2.0)

Score = (matching keywords) / (total keywords), with matches counted in both the URL and context.anchorText when anchor text is available.
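The documented formula can be sketched as a standalone function (an illustration of the stated behavior, not the library's implementation; case-insensitive matching is an assumption):

```typescript
// Score = matches / total keywords, checked against URL and anchor text.
function keywordRelevance(url: string, keywords: string[], anchorText = ""): number {
  const haystack = (url + " " + anchorText).toLowerCase();
  const matches = keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
  return keywords.length === 0 ? 0 : matches / keywords.length;
}

keywordRelevance("https://example.com/pricing", ["product", "pricing"]);
// → 0.5 (1 of 2 keywords matched)
```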

PathDepthScorer

Shallower URLs score higher. /about scores higher than /a/b/c/d/e.

new PathDepthScorer(10, 1.0) // maxPathDepth, weight

Score = max(0, 1 - segments / maxPathDepth), where segments is the number of path segments in the URL.
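A worked sketch of the formula (standalone, for illustration only):

```typescript
// Score = max(0, 1 - segments / maxPathDepth).
function pathDepthScore(url: string, maxPathDepth = 10): number {
  const segments = new URL(url).pathname.split("/").filter(Boolean).length;
  return Math.max(0, 1 - segments / maxPathDepth);
}

pathDepthScore("https://example.com/about");     // 1 segment  → 0.9
pathDepthScore("https://example.com/a/b/c/d/e"); // 5 segments → 0.5
```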

FreshnessScorer

URLs with date patterns (e.g., /2024/01/post) score based on recency.

new FreshnessScorer(0.5)
  • Current year: ~1.0
  • 5+ years old: 0
  • No date signal: 0.3
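The tiers above can be sketched as a standalone heuristic. This is an illustration of the documented behavior, not the library's code; the linear decay between the current year and the 5-year cutoff is an assumption:

```typescript
// Extract a 4-digit year from the path and scale by recency:
// current year → 1.0, 5+ years old → 0, no date signal → 0.3.
function freshnessScore(url: string, now = new Date().getFullYear()): number {
  const match = new URL(url).pathname.match(/\/(19|20)\d{2}(\/|$)/);
  if (!match) return 0.3; // no date signal
  const year = parseInt(match[0].replace(/\//g, ""), 10);
  const age = now - year;
  if (age <= 0) return 1.0;
  return Math.max(0, 1 - age / 5); // assumed linear decay over 5 years
}

freshnessScore("https://example.com/blog/post"); // → 0.3 (no date in path)
```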

DomainAuthorityScorer

Preferred domains score highest.

new DomainAuthorityScorer(["example.com"], 1.5)
  • Exact match: 1.0
  • Subdomain of preferred: 0.8
  • Unknown domain: 0.3
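The three tiers can be sketched as a standalone function (illustrative only, not the library's implementation):

```typescript
// Exact preferred domain → 1.0, subdomain of a preferred domain → 0.8, else 0.3.
function domainAuthorityScore(url: string, preferred: string[]): number {
  const host = new URL(url).hostname;
  if (preferred.includes(host)) return 1.0;
  if (preferred.some((d) => host.endsWith("." + d))) return 0.8;
  return 0.3;
}

domainAuthorityScore("https://docs.example.com/api", ["example.com"]); // → 0.8
```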

Custom Scorers

Extend URLScorer:

import { URLScorer, type ScorerContext } from "feedstock";

class ContentLengthScorer extends URLScorer {
  constructor(weight = 1.0) {
    super("content-length", weight);
  }

  score(url: string, depth: number, context?: ScorerContext): number {
    // Prefer shorter URLs (likely more important pages)
    return Math.max(0, 1 - url.length / 200);
  }
}
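Because the score is a pure function of the URL, the formula can be sanity-checked in isolation. A standalone sketch of the same computation (no feedstock import needed):

```typescript
// Same formula as ContentLengthScorer.score: shorter URLs score higher,
// reaching 0 at 200 characters.
function contentLengthScore(url: string): number {
  return Math.max(0, 1 - url.length / 200);
}

contentLengthScore("https://example.com/"); // 20 chars → 0.9
```

An instance of ContentLengthScorer can then be added to a CompositeScorer alongside the built-in scorers.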