Similarity Explorer

What this shows

Similarity algorithms

Poem mode

Verse mode

Formula clusters

Geography tab — Network View

Geography tab — Place Focus

Place Focus metrics

Interactions

Poem Similarity Explorer

Compare poems under two similarity algorithms side by side. Search for any poem above, or add ?poem=ID to the URL.

Six algorithms are available: TF-IDF Lemma, Jaccard Wordform, Translation-Pivot, Alignment, Verse Match, and Combined RRF (reciprocal rank fusion).
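As a rough illustration of how a combined ranking such as Combined RRF can be built, here is a minimal sketch of reciprocal rank fusion over per-algorithm result lists. The function name, the poem IDs, and the constant k=60 (a common default for RRF) are assumptions for illustration; the site's actual parameters are not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of poem IDs into a single ranking.

    rankings: dict mapping an algorithm name to a list of poem IDs,
              best match first. k is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, poem_id in enumerate(ranked_ids, start=1):
            # Each list contributes 1 / (k + rank) for every poem it contains.
            scores[poem_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by poem ID for stability.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical result lists for one query poem.
rankings = {
    "tfidf_lemma": ["p3", "p1", "p2"],
    "jaccard_wordform": ["p1", "p3", "p4"],
    "alignment": ["p1", "p2", "p3"],
}
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

A poem that ranks well in several lists rises to the top even if no single algorithm puts it first everywhere, which is the usual motivation for rank fusion.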

Poem Comparison

Formulaic Patterns (curated entry points)

What are formula clusters?

In runosong (regilaul / Kalevala-meter poetry), singers composed by combining stock verse formulas. A formula cluster groups verse lines that are near-identical variants of each other, found across many poems, places, and singers.

How they were discovered

The system analyzed 4.36 million verse lines with four similarity algorithms: Jaccard wordform overlap, TF-IDF lemma cosine, cross-lingual translation pivot, and character bigram similarity. It then combined the evidence from all four to identify clusters whose verses are similar along several dimensions. The 200 largest clusters are shown here.
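To make two of these measures concrete, the sketch below computes a Jaccard overlap on word forms and a character bigram similarity for a pair of verse lines. It is a simplified illustration: tokenizing on whitespace and using Jaccard over bigram sets are assumptions, and the two strings are illustrative spellings rather than corpus records.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def wordform_jaccard(verse1, verse2):
    """Overlap of lowercased, whitespace-separated word forms."""
    return jaccard(verse1.lower().split(), verse2.lower().split())


def char_bigrams(text):
    """Set of adjacent character pairs in the lowercased text."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(verse1, verse2):
    """Jaccard overlap of character bigram sets (an assumed definition)."""
    return jaccard(char_bigrams(verse1), char_bigrams(verse2))


# Illustrative pair: the last word differs only in its ending.
v1 = "vaka vanha väinämöinen"
v2 = "vaka vanha väinämöini"
word_sim = wordform_jaccard(v1, v2)
bigram_sim = bigram_similarity(v1, v2)
print(word_sim, round(bigram_sim, 3))
```

The word-level measure treats the two spellings of the last word as entirely different tokens, while the bigram measure still sees most of the word as shared. This is one reason to combine evidence from several algorithms.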

Reading the display

Members = total verse occurrences in the cluster. Places = distinct collection locations. The language bar shows the Estonian (blue) / Finnish (orange) proportion. Bilingual clusters appear in both language traditions.
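The three display values can be derived from a list of cluster members roughly as follows. This is a sketch under assumptions: the record fields "lang" and "place", the language codes "et" and "fi", and the sample records are all hypothetical.

```python
from collections import Counter


def summarize_cluster(members):
    """Return the member count, distinct places, and language shares."""
    total = len(members)
    # Distinct collection locations; records without a place are skipped.
    places = {m["place"] for m in members if m.get("place")}
    langs = Counter(m["lang"] for m in members)
    return {
        "members": total,
        "places": len(places),
        "estonian_share": langs["et"] / total if total else 0.0,
        "finnish_share": langs["fi"] / total if total else 0.0,
    }


# Hypothetical member records for one cluster.
sample_members = [
    {"lang": "et", "place": "Kuusalu"},
    {"lang": "et", "place": "Kuusalu"},
    {"lang": "et", "place": "Haljala"},
    {"lang": "fi", "place": "Ilomantsi"},
]
summary = summarize_cluster(sample_members)
print(summary)
```

In this toy cluster the language bar would be three quarters blue and one quarter orange.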

Geographic data

When you select a formula, the system looks up its sample variant texts in the verse search index to find where they were collected. This covers a subset of the cluster's full membership (5 representative variants out of potentially thousands).
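The lookup described above can be pictured as a union of places over the sample variants. The sketch below uses a plain dictionary as a stand-in for the verse search index; the function name, index layout, and sample data are assumptions for illustration only.

```python
def places_for_formula(sample_variants, verse_index):
    """Collect the places where any of the sample variants was recorded.

    verse_index: hypothetical mapping from verse text to a list of
                 collection places; variants missing from it add nothing.
    """
    places = set()
    for text in sample_variants:
        places.update(verse_index.get(text, []))
    return sorted(places)


# Hypothetical index and sample variants.
verse_index = {
    "variant a": ["Kuusalu", "Haljala"],
    "variant b": ["Haljala"],
}
found = places_for_formula(["variant a", "variant b", "variant c"], verse_index)
print(found)
```

Because only the sample variants are looked up, the resulting map is a lower bound on where the formula actually occurs.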

Integration with Verse Explorer

Click "Explore in Verse Tab" on any formula to load its representative verse into the Verse Explorer above, where you can see all four similarity algorithms and navigate to related verses.


Verse Formulaicity Explorer

This tool reveals the formulaic skeleton of Finnic oral poetry. Each verse in a poem is scored by how closely it matches verses in other poems across the corpus.

Key Finding

99.8% of scorable verses have at least one close match elsewhere in the corpus, which supports the Parry-Lord oral-formulaic theory. The remaining 0.2% (highlighted in yellow) are unique verses, where singers composed something genuinely new.

How to Read

Color strip: Each cell = one verse. Green = high match score (formulaic), yellow = unique (no matches), gray = too short to score. The strip shows the poem's formulaic “skeleton” at a glance.

Score: 0–100 scale based on the best similarity score a verse achieves in any of four algorithms (Jaccard, TF-IDF, Translation, CharBigram). Click a verse to see its five closest parallels from other poems, each tagged by the algorithm that found it (Jac / TF / Tx / Bi).
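The scoring rule and the parallel list can be sketched as follows. This assumes each algorithm reports a similarity between 0 and 1 that is scaled to 0–100 by taking the maximum; the scaling, the function names, and the sample values are assumptions, not the site's documented implementation.

```python
def verse_score(best_by_algorithm):
    """Score a verse by its best similarity across all algorithms.

    best_by_algorithm: dict mapping an algorithm tag (Jac, TF, Tx, Bi)
                       to that algorithm's best similarity in [0, 1].
    """
    if not best_by_algorithm:
        return 0
    return round(100 * max(best_by_algorithm.values()))


def top_parallels(candidates, n=5):
    """Return the n closest parallels as (verse_id, tag, similarity)."""
    return sorted(candidates, key=lambda c: c[2], reverse=True)[:n]


# Hypothetical best similarities for one verse.
score = verse_score({"Jac": 0.42, "TF": 0.87, "Tx": 0.10, "Bi": 0.65})

# Hypothetical candidate parallels found by different algorithms.
candidates = [
    ("v1", "Jac", 0.42), ("v2", "TF", 0.87), ("v3", "Bi", 0.65),
    ("v4", "Tx", 0.10), ("v5", "TF", 0.80), ("v6", "Bi", 0.30),
]
parallels = top_parallels(candidates)
print(score, [(vid, tag) for vid, tag, _ in parallels])
```

Taking the maximum means a verse counts as formulaic if any one algorithm finds a close parallel, even when the others find none.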

Short verses (fewer than two distinct words) are shown with a gray “SHORT” badge. They are displayed for context but excluded from the formulaic / unique counts.
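The exclusion rule can be illustrated with a small classifier that labels each verse as short, formulaic, or unique and then computes the formulaic share over scorable verses only. The threshold for a "close match" is not given in this text, so `close_threshold` is a hypothetical parameter, and the sample verses and scores are made up for illustration.

```python
def classify_verse(text, score, close_threshold=1):
    """Label a verse as 'short', 'formulaic', or 'unique'.

    close_threshold is an assumed cutoff on the 0-100 score.
    """
    # Fewer than two distinct words: shown for context, never scored.
    if len(set(text.lower().split())) < 2:
        return "short"
    return "formulaic" if score >= close_threshold else "unique"


def formulaic_share(verses):
    """Share of formulaic verses among scorable (non-short) verses."""
    labels = [classify_verse(text, score) for text, score in verses]
    scorable = [label for label in labels if label != "short"]
    if not scorable:
        return None
    return sum(label == "formulaic" for label in scorable) / len(scorable)


# Illustrative (text, score) pairs, not corpus data.
verses = [
    ("vaka vanha väinämöinen", 92),  # scorable, has a close match
    ("oi", 0),                       # one word: short
    ("laula laula", 0),              # one distinct word: short
    ("aaa bbb", 0),                  # scorable, no match: unique
]
labels = [classify_verse(text, score) for text, score in verses]
share = formulaic_share(verses)
print(labels, share)
```

Note that a repeated word counts once, so a line like "laula laula" is treated as short under this reading of "distinct words".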

Academic Context

Albert Lord's “The Singer of Tales” (1960) proposed that oral poets compose by combining inherited formulas. This visualization makes the formulaic fabric visible.

Top 20 Most Formulaic Poems
