Explore how Estonian and Finnish folk poems relate to each other. This page compares 292,000 poems and 4.29 million verse lines from three archives, using seven different methods to find similarities — from shared words to shared meaning across languages.
Tip: Press ? on your keyboard to toggle this panel. It auto-selects the section matching your current tab.
Five views
Poem Similarity — pick a poem, see its closest matches
Verse Similarity — search for a verse line and find similar ones
Verse Pairs — browse the 50,000 strongest verse-to-verse connections
Geography — see which places share poetic traditions on a map
Formulaicity — discover which verses are traditional formulas and which are unique
How similarity is measured
Seven methods look for connections in different ways:
Jaccard — same exact words shared between two texts
TF-IDF — same root words (e.g., “laulu” and “laulude” both count as “laul”)
Thematic (Translation) — same meaning across languages (compared through English translations)
Alignment — similar verses appearing in the same order
Semantic (GloVe) — similar overall themes, even with different words
Verse Match — compares individual verse lines (shared words, rare words, translations, letter patterns), then aggregates to a poem score
Combined (RRF) — merges evidence from all methods; strong only when multiple agree
Scores are percentages: higher means more similar. Each algorithm keeps its own color throughout the page.
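To make the percentage scale concrete, here is a minimal Python sketch of the Jaccard method on exact words. The tokenization (lowercasing, whitespace splitting, stripping surrounding punctuation) is an assumption for illustration; the page's own preprocessing may differ.

```python
import string


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation (assumed normalization)."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return {w for w in words if w}


def jaccard_percent(a: str, b: str) -> float:
    """Jaccard similarity of two texts' distinct word sets, scaled to 0-100."""
    words_a, words_b = tokenize(a), tokenize(b)
    union = words_a | words_b
    if not union:
        return 0.0  # two empty texts: nothing to compare
    return 100.0 * len(words_a & words_b) / len(union)


if __name__ == "__main__":
    # One shared word ("laula") out of three distinct words: about 33%.
    print(round(jaccard_percent("Laula, laula, suukene", "laula laula keeleke"), 1))
```

Repeated words count once, because Jaccard works on sets of distinct words.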
Start by searching for a poem, either by archive ID (e.g., “H II 1, 389”) or by a word in its text. You’ll see the most similar poems ranked by whichever method you choose.
Side-by-side comparison
The pair buttons (e.g., “TF-IDF vs Jaccard”) show two rankings next to each other — useful for seeing which matches are found by different methods. Poems appearing in both columns are especially reliable matches.
Combined (RRF)
Merges evidence from multiple methods. A poem ranks high only if several algorithms agree. The algorithm badges (T, J, Tr, A) on each card show which methods contributed.
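Reciprocal Rank Fusion (RRF) is a standard way to merge ranked lists. The sketch below assumes the common formulation, summing 1 / (k + rank) over every method that ranked a poem, with the conventional k = 60; the page's actual constant and inputs are not stated here, and the poem IDs are made up.

```python
from collections import defaultdict


def rrf(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every method that ranked a poem."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, poem_id in enumerate(ranked_ids, start=1):
            scores[poem_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    fused = rrf({
        "tfidf": ["P1", "P2", "P3"],
        "jaccard": ["P2", "P1", "P4"],
        "translation": ["P5", "P2", "P1"],
    })
    for poem_id, score in fused:
        print(poem_id, round(score, 5))
```

In this example P5 is ranked first by one method only, yet it ends up below P2 and P1, which all three methods found. That is the "high only if several algorithms agree" behavior.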
Poem comparison overlay
Click “Compare” on any match to see both poems verse by verse. Highlighted words show what’s shared:
Yellow — same exact word
Blue — same root word (different form)
Purple — same meaning across languages
Curved lines between the poems connect matching verses. Thicker lines = stronger matches.
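The difference between yellow and blue highlights can be sketched as a two-step check: an exact word match first, then a match on lemmas. The lemma table below is a hypothetical toy stand-in for a real Estonian/Finnish lemmatizer, and the purple translation step is omitted.

```python
# Hypothetical toy lemma table; a real system would use an Estonian/Finnish lemmatizer.
LEMMAS = {"laulu": "laul", "laulude": "laul", "laulan": "laulma"}


def lemma(word: str) -> str:
    """Look up a word's dictionary form, falling back to the word itself."""
    return LEMMAS.get(word, word)


def highlight_shared(verse_a: str, verse_b: str) -> dict[str, str]:
    """Label each word of verse_a that verse_b shares: 'exact' (yellow) or 'lemma' (blue)."""
    words_b = set(verse_b.lower().split())
    lemmas_b = {lemma(w) for w in words_b}
    labels: dict[str, str] = {}
    for word in verse_a.lower().split():
        if word in words_b:
            labels[word] = "exact"  # same exact word
        elif lemma(word) in lemmas_b:
            labels[word] = "lemma"  # same root word, different form
    return labels


if __name__ == "__main__":
    print(highlight_shared("laulu laulan", "laulude laulan"))
```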
Network graph & similarity map
A network graph shows how your poem connects to its matches and to their matches. The similarity map shows where the matching poems were collected.
Tips
Use the score slider to hide weak matches
Check Cross-lingual only (on Translation, Alignment) to see Estonian-Finnish connections
Click any poem ID to chain-navigate — breadcrumbs track your path
Deep link: ?poem=H II 1, 389
Verse Similarity
Search for a verse line and see which other verses across the corpus sound similar. Results come from four different methods plus a combined view.
Type any verse text in the search bar
Click the arrow on a match to explore from that verse instead — the verse trail tracks where you’ve been
Language badges (ET/FI) show which tradition each match comes from
Four algorithm columns (TF-IDF, Jaccard, Translation, CharBigram) plus an RRF combined view
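The page does not say exactly how CharBigram scores are computed. One plausible reading, sketched below purely as an assumption, is a Dice coefficient over the sets of character bigrams (pairs of adjacent letters) in each verse, with words padded by spaces so that word beginnings and endings count.

```python
def char_bigrams(text: str) -> set[str]:
    """Character bigrams of each word, padded with spaces so word edges count."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f" {word} "
        grams.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return grams


def char_bigram_percent(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets, scaled to 0-100 (an assumed formula)."""
    grams_a, grams_b = char_bigrams(a), char_bigrams(b)
    if not grams_a and not grams_b:
        return 0.0
    return 100.0 * 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # Estonian "sinine" and Finnish "sininen" (blue) share no exact word.
    print(round(char_bigram_percent("sinine", "sininen"), 1))
```

This shows why a letter-pattern method can link spelling variants that exact-word Jaccard scores as 0: the two words above share 5 of their bigrams, giving roughly 77%.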
Formulaic Patterns
Below the results, the Formulaic Patterns browser lists the 200 largest groups of recurring verse lines in the tradition. Click a cluster to load its representative verse and explore from there.
Verse Pairs
Browse the strongest verse-to-verse connections found across the whole corpus — the 50,000 most similar pairs, ranked by combined evidence from shared words, rare words, translations, and letter patterns.
Sort by any algorithm or use Combined (default)
Check Cross-lingual only to focus on Estonian-Finnish pairs
Click a pair to see both poems, per-algorithm scores, and a map link
Switch to By Place Pair to see which collection places share the most verse material
Select pairs (checkboxes) to highlight them on the mini-map
Algorithm badges: J (blue) = Jaccard, T (green) = TF-IDF, Tr (teal) = Translation, C (pink) = CharBigram.
The By Place Pair view counts identical formulas and near-matches for each pair of places; together they make up the combined total. Sort by combined count, identical formulas, near-matches, best RRF score, or cross-lingual near-matches.
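Conceptually, the By Place Pair view is a group-by over verse pairs. The sketch below assumes each verse pair carries its two collection places and a flag saying whether it is an identical formula or a near-match; the `VersePair` record, the sort order, and the example places are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VersePair:
    place_a: str
    place_b: str
    identical: bool  # True = identical formula, False = near-match (hypothetical field)


def place_pair_totals(pairs: list[VersePair]) -> list[tuple[tuple[str, ...], int, int, int]]:
    """Return (place pair, identical, near-matches, combined) rows, largest combined first."""
    identical, near = Counter(), Counter()
    for p in pairs:
        key = tuple(sorted((p.place_a, p.place_b)))  # treat A-B and B-A as the same pair
        if p.identical:
            identical[key] += 1
        else:
            near[key] += 1
    rows = [(key, identical[key], near[key], identical[key] + near[key])
            for key in identical.keys() | near.keys()]
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows
```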
Geography
Five map-based views show where poetic traditions connect geographically.
Verse Network
Lines on the map connect collection locations; thicker lines mean stronger connections. Green = same language, gold = cross-lingual.
Language filter — All / Cross-lingual only / Same-language only
Min connections — hide weaker connections
Show top (10–500) — limit to the N strongest
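The three controls above can be read as successive filters on a list of place-to-place edges. This is a sketch under assumptions: the `Edge` fields, the parameter names, and the order of filtering are hypothetical, not the page's actual code.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    place_a: str
    place_b: str
    connections: int     # number of shared verse connections (assumed edge weight)
    cross_lingual: bool  # True when one place is Estonian and the other Finnish


def filter_edges(edges: list[Edge], language: str = "all",
                 min_connections: int = 1, top_n: Optional[int] = None) -> list[Edge]:
    """Apply the language filter, the minimum-connections threshold, then keep the top N."""
    if language == "cross":
        edges = [e for e in edges if e.cross_lingual]
    elif language == "same":
        edges = [e for e in edges if not e.cross_lingual]
    elif language != "all":
        raise ValueError(f"unknown language filter: {language!r}")
    kept = [e for e in edges if e.connections >= min_connections]
    kept.sort(key=lambda e: e.connections, reverse=True)  # strongest first
    return kept if top_n is None else kept[:top_n]
```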
Cross-Lingual Formulas
Estonian-Finnish connections: which places and song types bridge the two traditions.
Language filter (All / ET / FI), Min clusters, Min connections, Top N sliders
Show all checkbox — show all connections instead of filtered top N
Song type pairs: “confirmed” = known to scholars, “new” = algorithmic discoveries
Poem Connections
Poem-level connections on the map. Lines connect places where similar poems were collected.
Algorithm selector (RRF / Thematic / Semantic), Min pairs slider
Show all checkbox — show all connections
Combined
Merges verse-level and poem-level connections onto one map.
Algorithm selector, Min verse pairs, Min poem pairs sliders
Show all checkbox (on by default) — toggle full network vs filtered
Place Focus
Pick a single place and see all its connections radiating outward.
Tradition Strength — how much poetic material this place shares with each partner
Surprising Connections — connections stronger than expected given place sizes
Click Focus on this place in any popup to chain-navigate
Deep link: ?mode=geography&submode=placefocus&place=Kuusalu
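Surprising Connections rely on normalized surprise, which the glossary defines as the shared count divided by the geometric mean of the two places' collection sizes. A direct transcription follows; whether "size" means poems or verses is an assumption left open here.

```python
import math


def normalized_surprise(shared: int, size_a: int, size_b: int) -> float:
    """Shared count divided by the geometric mean of the two places' collection sizes."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("collection sizes must be positive")
    return shared / math.sqrt(size_a * size_b)
```

With made-up numbers: two small places (100 and 400 items) sharing 20 verses score 0.1, while two large places (10,000 and 40,000 items) sharing 200 verses score 0.01. The smaller pair's connection is the more surprising one despite its lower raw count.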
Formulaicity
See which parts of a poem are traditional formulas shared across the tradition, and which are found nowhere else in the corpus. Each verse gets a color: dark green = widely shared, yellow = found nowhere else, gray = too short to score.
Browse Most Formulaic or Most Unique poems, or search by poem ID
Click any verse in the color strip to see its 5 closest matches from other poems
Each verse's score is its highest similarity across Jaccard, TF-IDF, Thematic (Translation), and CharBigram. Verses with fewer than 2 distinct words are excluded (SHORT badge). 99.8% of scorable verses have at least one match elsewhere in the corpus, supporting the Parry-Lord theory of oral-formulaic composition. Algorithm badges on each match: Jac/TF/Tx/Bi.
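The scoring rule above (maximum across four algorithms, SHORT below 2 distinct words) can be sketched as follows. It assumes each algorithm's best-match score against other poems has already been computed on a 0 to 100 scale, and it splits words on whitespace; both are simplifications.

```python
from typing import Optional


def formulaicity(verse: str, best_scores: dict[str, float]) -> Optional[float]:
    """Score a verse 0-100 as its best match across algorithms, or None if it is SHORT.

    best_scores maps an algorithm badge (Jac/TF/Tx/Bi) to the verse's best similarity
    against other poems for that algorithm (assumed precomputed, on a 0-100 scale).
    """
    if len(set(verse.lower().split())) < 2:
        return None  # fewer than 2 distinct words: excluded, shown with the SHORT badge
    return max(best_scores.values(), default=0.0)
```

Note that "laula laula" counts as SHORT: it has two words but only one distinct word.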
Glossary
Alignment: Compares poems by matching verses in order, like aligning two song sheets
CharBigram: Compares how words look letter by letter (pairs of adjacent letters); catches dialect spelling differences
Combined / RRF: Merges rankings from several methods; strong only when multiple methods agree
Cosine similarity: Measures how closely two lists of numbers point in the same direction (1 = same proportions, 0 = nothing in common); for non-negative values such as word weights, scores range from 0 to 1
Cross-lingual: Across languages; here, Estonian and Finnish
ERAB: Estonian runosong archive (~165,000 poems)
Formula: A verse line that appears in many poems across the tradition
Jaccard: Share of distinct exact words that two texts have in common (shared words divided by all distinct words in either text)
JR: Finnish/Ingrian archive of unpublished runosong (~38,000 poems)
Lemma: Dictionary form of a word (“laulu” and “laulude” → “laul”)
Near-match: Verses that are algorithmically similar but not identical
Normalized surprise: How unexpected a connection is, given the sizes of both places; the shared count divided by the geometric mean of the two places' collection sizes
Runosong: Traditional Baltic-Finnic oral poetry (Estonian: regilaul, Finnish: runolaulu)
SKVR: Main Finnish runosong archive (~89,000 poems)
Thematic (Translation): Compares texts through English translations; works across languages
TF-IDF: Matches root words, weighting rare shared words more heavily
Tradition Strength: Total quality of verse connections between two places
Verse Match: Compares verse lines (shared words, rare words, translations, letter patterns), then aggregates them into a poem score
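Several glossary terms (lemma, TF-IDF, cosine similarity) fit together as follows. This sketch assumes texts are already lemmatized and uses the plain idf = log(N / df) variant; the page may use a different weighting or smoothing, and the toy documents are made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights per document; each document is a list of lemmas (lemmatization assumed)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vecs = tfidf_vectors([["laul", "ilus"], ["laul", "ilus"], ["laul", "meri"]])
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because "laul" occurs in every toy document, its weight is 0, so the first and third documents score 0 even though they share it; the rarer shared lemma "ilus" carries the similarity between the first two. This is what "weighting rare shared words more heavily" means in practice.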
New here? Press ? or click ? Help for a guide to this page.
Pick a poem and see which other poems are most like it. Choose two methods to compare side by side, or use Combined for an overall ranking. Click any match to jump to that poem and keep exploring.
Use the score slider to hide weak matches
Check Cross-lingual only to see Estonian-Finnish connections
Click Compare to see both poems verse by verse, with shared words highlighted
Search for a verse line and find similar ones across the corpus. Matches are found by shared words, rare words, translations, and letter patterns, plus a combined view.
Click the arrow on a match to explore from that verse — the verse trail tracks where you’ve been
Language badges (ET/FI) show which tradition each match comes from
In runosong, singers composed by combining stock verse formulas. A formula cluster groups verse lines that are near-identical variants of each other, found across many poems and places.
How they were discovered
The system analyzed 4.29 million verse lines using four similarity algorithms (Jaccard, TF-IDF, Translation, CharBigram), then combined evidence to identify clusters. The 200 largest clusters are shown here.
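The exact clustering step is not described here. One simple approach, shown purely as an assumption, treats every sufficiently similar verse pair as a link and takes connected components with a union-find structure; the largest components would then be the top clusters. Verse IDs are plain integers for illustration.

```python
def cluster_verses(n_verses: int, similar_pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Group verses into connected components of the similar-pair graph, largest first."""
    parent = list(range(n_verses))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for verse in range(n_verses):
        groups.setdefault(find(verse), []).append(verse)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    clusters = cluster_verses(6, [(0, 1), (1, 2), (3, 4)])
    print(clusters[:200])  # keep only the largest clusters, as the page does with 200
```

A design caveat: pure connected components can chain loosely related verses together (A like B, B like C), which is one reason a real system might combine evidence more carefully.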
Reading the display
Members = total verse occurrences. Places = distinct collection locations. The language bar shows the Estonian (blue) and Finnish (orange) proportions. Bilingual clusters occur in both traditions.
Integration
Click “Explore in Verse Tab” to load a formula into the Verse Explorer above.
The 50,000 most similar verse pairs, ranked by combined evidence from shared words, rare words, translations, and letter patterns. Use By Place Pair to see which collection places share the most verse material.
See which parts of a poem are traditional formulas shared across the tradition, and which are found nowhere else in the corpus.
Color strip
Each cell = one verse. Dark green = widely shared formula, yellow = found nowhere else, gray = too short to score.
How to use
Click a verse to see its top 5 matches from other poems. Score 0–100: 80+ = well-known formula, under 20 = largely unique. Algorithm badges (Jac/TF/Tx/Bi) show which method found each match.
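The score bands above can be written as a small classifier. The 80 and 20 cut-offs come from the text; the label for the middle range and the use of None for SHORT verses are hypothetical.

```python
from typing import Optional


def describe_score(score: Optional[float]) -> str:
    """Map a 0-100 formulaicity score to a reading; None stands for a SHORT verse."""
    if score is None:
        return "too short to score"
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "well-known formula"
    if score < 20:
        return "largely unique"
    return "partly shared"  # hypothetical label for the unnamed middle range
```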