Cosine Relevance Scorer

How It Works

Three signals. One score.

Google doesn't just count links; it evaluates whether the linking page is contextually relevant to your content. We model this with three semantic signals.

Context ↔ Target Page

The paragraph surrounding your backlink is compared against the target page body. This is the strongest signal, worth 50% of the score. It models Google's context2 field, a hash of the terms near the anchor.

Referring Page ↔ Target Page

The full text of the referring page is compared to the target page. This captures topical alignment even when the specific paragraph is weak. Worth 30% of the score, it models Google's siteFocusScore.

Anchor ↔ Target Page

The anchor text is matched against the target page content. Worth 20%. Generic anchors ("click here", bare URLs) are detected and their weight is redistributed to the context signals, which prevents score inflation from irrelevant anchors.
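A minimal sketch of how generic-anchor handling could look. The page doesn't say how the freed 20% is split, so this assumes it is shared between the two context signals in proportion to their existing weights. The phrase list and the names GENERIC_PHRASES, is_generic_anchor and effective_weights are illustrative, not the tool's actual implementation.

```python
import re

# Illustrative list only; the real detector's phrase list is not published.
GENERIC_PHRASES = {"click here", "here", "read more", "learn more", "this", "link", "website"}

# Base weights from the 50/30/20 formula.
BASE_WEIGHTS = {"context": 0.50, "ref_page": 0.30, "anchor": 0.20}

# Matches anchors that are nothing but a URL.
_URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def is_generic_anchor(anchor: str) -> bool:
    """Return True for empty anchors, stock phrases, and bare URLs."""
    text = anchor.strip().lower()
    if not text:
        return True
    return text in GENERIC_PHRASES or bool(_URL_RE.match(text))


def effective_weights(anchor: str) -> dict[str, float]:
    """Drop the anchor weight for generic anchors and rescale the rest.

    Assumption: the freed 20% is shared proportionally, so 0.50/0.30
    becomes 0.625/0.375. The source does not specify the split.
    """
    if not is_generic_anchor(anchor):
        return dict(BASE_WEIGHTS)
    remaining = BASE_WEIGHTS["context"] + BASE_WEIGHTS["ref_page"]
    return {
        "context": BASE_WEIGHTS["context"] / remaining,
        "ref_page": BASE_WEIGHTS["ref_page"] / remaining,
        "anchor": 0.0,
    }
```

Under this assumption, a "click here" anchor contributes nothing and the weights always sum to 1, so a generic anchor neither inflates nor deflates the final score.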

0.50 × cos(context, target) + 0.30 × cos(refPage, target) + 0.20 × cos(anchor, target)

Strong ≥ 0.45

Moderate 0.25 – 0.44

Weak < 0.25
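The formula and tiers above can be sketched in a few lines. The function names combined_score and tier are hypothetical, and the sketch assumes a descriptive (non-generic) anchor, so the fixed 50/30/20 weights apply.

```python
def combined_score(cos_context: float, cos_ref_page: float, cos_anchor: float) -> float:
    """Weighted sum of the three cosine similarities (50/30/20)."""
    return 0.50 * cos_context + 0.30 * cos_ref_page + 0.20 * cos_anchor


def tier(score: float) -> str:
    """Map a score to a tier; 0.25 up to (but not including) 0.45 is Moderate."""
    if score >= 0.45:
        return "Strong"
    if score >= 0.25:
        return "Moderate"
    return "Weak"
```

For example, similarities of 0.62 (context), 0.41 (referring page) and 0.30 (anchor) combine to 0.31 + 0.123 + 0.06 = 0.493, which lands in the Strong tier. A perfect anchor match with no contextual overlap scores only 0.20, which is Weak.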

Model: all-MiniLM-L6-v2 (384-dim) via ONNX Runtime. Deterministic — same inputs always produce the same score.

Calculator

Score your backlinks

Paste two URLs — we extract context, anchor, and body text automatically. No keywords needed. Or use text/weighted mode for manual control.

Referring Page URL

The page that links to you — we'll find the link and extract anchor + surrounding paragraph

Detected Link

Target Page URL

Your page — we'll extract the body content

Extracted Body

Automatic 3-signal analysis

We extract everything from the URLs: link context paragraph (50%), referring page topical relevance (30%), and anchor text alignment (20%). No manual keywords needed — the target page content IS the keyword signal.
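To illustrate the extraction step, here is a rough sketch that pulls the anchor text and the enclosing paragraph for the first link to a target URL, using only Python's standard html.parser. It assumes well-formed HTML where the link sits inside a closed <p> element; the class and function names are hypothetical, and the real extractor is likely more robust.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect the anchor text and enclosing <p> text for the first link to target_url."""

    def __init__(self, target_url: str):
        super().__init__()
        self.target_url = target_url.rstrip("/")
        self.anchor = None          # anchor text of the matched link
        self.context = None         # text of the paragraph containing it
        self._para = []             # text chunks of the current <p>
        self._anchor_chunks = None  # text chunks of the link being read, if any
        self._in_para = False
        self._para_has_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_para, self._para, self._para_has_link = True, [], False
        elif tag == "a" and self.anchor is None:
            href = (dict(attrs).get("href") or "").rstrip("/")
            if href == self.target_url:
                self._anchor_chunks = []

    def handle_data(self, data):
        if self._in_para:
            self._para.append(data)
        if self._anchor_chunks is not None:
            self._anchor_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_chunks is not None:
            # Collapse whitespace in the collected anchor text.
            self.anchor = " ".join("".join(self._anchor_chunks).split())
            self._anchor_chunks = None
            self._para_has_link = self._in_para
        elif tag == "p" and self._in_para:
            if self._para_has_link and self.context is None:
                self.context = " ".join("".join(self._para).split())
            self._in_para = False


def extract_link_context(html: str, target_url: str):
    """Return (anchor_text, paragraph_text); either may be None if not found."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    return parser.anchor, parser.context


sample = (
    '<nav><a href="/home">Home</a></nav><p>Intro text.</p>'
    '<p>We benchmark rankings with <a href="https://example.com/tool/">this SEO tool</a> every week.</p>'
)
anchor, context = extract_link_context(sample, "https://example.com/tool")
```

Because only text inside the matching paragraph is kept, navigation and footer links are ignored by construction. Links that live outside a paragraph return an anchor with no context in this sketch.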

-

Text A

e.g. the paragraph surrounding your backlink

Text B

e.g. your target page body or keyword cluster

-

Referring Page Context

The surrounding paragraph - the most important signal (50%)

Target Page Body

Your page's main content - what the link points to

Anchor Text

The clickable link text (20%)

Target Keywords

Comma-separated keyword cluster — context↔keywords is 30%

-

Upload a CSV with URLs or text. Columns: url_a, url_b (fetches & extracts) or text_a, text_b (direct text).

Drop your CSV here or click to browse

Accepts .csv - max 200 rows for URL mode, 500 for text mode
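As an illustration of the two CSV layouts and their row limits, the sketch below detects the mode from the header and enforces the stated maximums. The function name load_batch is hypothetical, and choosing URL mode when both column pairs are present is an assumption, not documented behavior.

```python
import csv
import io

MAX_ROWS = {"url": 200, "text": 500}  # limits stated on the upload form


def load_batch(csv_text: str):
    """Return (mode, rows) for a CSV in URL mode or text mode.

    Raises ValueError if the header matches neither layout or the
    row limit for the detected mode is exceeded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    if {"url_a", "url_b"} <= columns:
        # Assumption: URL columns take precedence if both pairs exist.
        mode, keys = "url", ("url_a", "url_b")
    elif {"text_a", "text_b"} <= columns:
        mode, keys = "text", ("text_a", "text_b")
    else:
        raise ValueError("expected columns url_a,url_b or text_a,text_b")

    # Missing cells come back as None from DictReader; treat them as empty.
    rows = [tuple((row[k] or "").strip() for k in keys) for row in reader]
    if len(rows) > MAX_ROWS[mode]:
        raise ValueError(f"{mode} mode accepts at most {MAX_ROWS[mode]} rows")
    return mode, rows
```

A file with the header text_a,text_b and one data row would come back as ("text", [(..., ...)]), while a URL-mode file with 201 rows would be rejected before any page is fetched.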

Download text template · Download URL template

Methodology

Why cosine similarity matters for backlinks

The Reasonable Surfer Model

Google's Reasonable Surfer patent (US 7,716,225, filed 2004, granted 2010) assigns different weights to links based on their probability of being clicked. A link in a contextually relevant paragraph carries more weight than one in a footer or sidebar. Our formula models this: the two context signals together account for 80% of the score, and anchor text for 20%.

Confirmed by the 2024 API Leak

The 2024 Google Content Warehouse API leak (documented by iPullRank, SparkToro) exposed real production ranking fields:

context2 — hash of terms near the anchor (paragraph-level context, NOT full page body)
fullLeftContext / fullRightContext — extended text window around the link
anchorMismatchDemotion — penalty when anchor topic doesn't match destination
sourceType — quality tier of the linking page (HIGH/MEDIUM/LOW)
siteFocusScore — how topically focused the target site is
siteRadius — how far individual pages deviate from the site's topic centroid

Critically, context2 captures the paragraph around the link, not the full referring page body. On a 2,000-word marketing page with a single paragraph about SEO tools, that paragraph is what this field sees for a link to an SEO tool site.

Why 50/30/20 — Three Distinct Signals?

The leak shows context2 is a primary signal, siteFocusScore evaluates overall topical alignment, and anchor text is secondary — subject to anchorMismatchDemotion. Our formula models all three: context paragraph (50%) captures the immediate link environment, referring page body (30%) captures topical relevance of the entire page, and anchor text (20%) captures link label alignment. Generic anchors ("click here", bare URLs) are detected and their weight redistributed to context signals — preventing score inflation from irrelevant anchor text.

Why all-MiniLM-L6-v2?

The model produces 384-dimensional sentence embeddings and scores 82.03 Spearman on the STS Benchmark. We also tested Nomic (768-dim), but it compressed all scores into the 0.42-0.84 range, which made tier differentiation impossible. MiniLM's wider spread maps naturally to meaningful quality tiers. ONNX Runtime gives deterministic FP32 output: same inputs, same score, every time.

Threshold Calibration

Strong (≥0.45) requires genuine contextual alignment across multiple signals and is not achievable by anchor match alone. Moderate (0.25-0.44) indicates a topical connection with room to improve. Weak (<0.25) means the linking context has minimal semantic overlap with the target page. Calibrated against 1,186 real backlinks across 6 domains.

WLDM Cosine Scoring Pipeline
─────────────────────────────

Step 1: Extract
  URL → fetch HTML → strip tags
  → clean body text (no nav/footer)

Step 2: Embed
  All text → all-MiniLM-L6-v2 (ONNX)
  → 384-dimensional unit vectors

Step 3: Compare
  Cosine similarity = dot product
  of L2-normalized vectors

  Score range: 0.0 → 1.0

Step 4: Weight
  0.50 × cos(context ↔ target)   ← context2
  0.30 × cos(refPage ↔ target)   ← siteFocus
  0.20 × cos(anchor  ↔ target)   ← anchorMatch
  Generic anchors → weight redistributed

Step 5: Classify
  ● Strong   ≥ 0.45
  ● Moderate 0.25 – 0.44
  ● Weak     < 0.25

Deterministic Guarantee
  Same inputs → same ONNX graph
  → same FP32 result every time
  No randomness. No sampling.
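Step 3 of the pipeline above can be illustrated in plain Python. The toy vectors in the usage note stand in for the 384-dimensional embeddings, and the helper names l2_normalize and cosine are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity as the dot product of the two unit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

For instance, cosine([3, 4], [4, 3]) works out to 24/25 = 0.96, and orthogonal vectors give 0. For arbitrary vectors the result can fall anywhere between -1 and 1. The 0.0 to 1.0 range stated in Step 3 presumably reflects how these embeddings behave in practice or a clamp, which the page doesn't spell out.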
