Explore semantic parallelism in Estonian and Finnish runosongs — consecutive verses expressing the same idea with different words.
Parallelism
Consecutive verse parallelism — immediately successive verses expressing the same idea with different words. The defining feature of Baltic-Finnic oral poetry (Steinitz 1934, Sarv 2017).
Red words and blue words show aligned substitution pairs between consecutive verses. Per-pair scores (shown as percentages with → arrows) indicate the strength of each consecutive pair connection within a group.
How are parallelism groups calculated?
Each consecutive verse pair is scored using a weighted combination of signals:
Word overlap (25%) — proportion of shared content words between the two verses
Best substitution pair (35%) — strongest word pair match from the full-corpus word pair table (word2vec cosine similarity plus PMI co-occurrence), BERT and FAISS embedding similarity, or cluster membership
POS sequence similarity (10%) — how well the part-of-speech tag sequences of the two verses match (same syntactic structure = stronger parallelism evidence)
Morphological similarity (10%) — derived from the average character-level edit distance between differing words, so that a smaller distance gives a higher score (catches inflectional variants)
Length ratio (5%) — similarity in word count, character count, and per-position word lengths
Substitution pair count (15%) — how many word substitution pairs are found between the verses
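The weighted combination above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the signal names, the `pair_score` function, and the assumption that every signal is already normalised to the range 0 to 1 are all hypothetical. Only the six weights come from the list above.

```python
# Weights as listed above; they sum to 1.0.
# The dictionary keys are illustrative names, not identifiers from the real pipeline.
WEIGHTS = {
    "word_overlap": 0.25,
    "best_substitution": 0.35,
    "pos_similarity": 0.10,
    "morph_similarity": 0.10,
    "length_ratio": 0.05,
    "substitution_count": 0.15,
}


def pair_score(signals):
    """Combine per-signal values (each assumed to lie in [0, 1]) into one score."""
    missing = set(WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Made-up signal values for one consecutive verse pair.
example = {
    "word_overlap": 0.5,
    "best_substitution": 0.8,
    "pos_similarity": 1.0,
    "morph_similarity": 0.6,
    "length_ratio": 0.9,
    "substitution_count": 0.4,
}
score = pair_score(example)
print(f"{score:.1%}")  # prints 67.0%
```

Because the weights sum to 1, a pair that scores 1.0 on every signal reaches 100% before any bonuses are applied.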
Additional bonuses and adjustments:
End-rhyme bonus — +8% if the last words of consecutive verses share 4+ final characters, +5% for 3 characters. Rewards the common runosong pattern of rhyming verse endings.
Position weighting — substitution pairs in similar relative positions within their verses score higher. A pair at the start of both verses scores more than one at the start vs. end.
Translation overlap — when differing words share English translation components (from the RunoLex gloss index), a small bonus is added, catching semantic parallels invisible to surface-form matching.
Lemma collapsing — dialectal variants that map to the same lemma (via gloss index or dialect synonyms) are treated as shared words rather than differences. This handles regional spelling variation across the corpus.
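As an illustration of the adjustments above, the end-rhyme bonus could look roughly like the following Python sketch. The function names are hypothetical, whitespace tokenisation and lowercasing are assumptions, and the description does not say how identical last words are treated, so this sketch simply counts shared final characters. The strings in the usage lines are toy inputs, not corpus verses.

```python
def shared_suffix_len(a, b):
    """Count how many final characters two words have in common."""
    n = 0
    for ca, cb in zip(reversed(a), reversed(b)):
        if ca != cb:
            break
        n += 1
    return n


def end_rhyme_bonus(verse_a, verse_b):
    """Return 0.08 for 4+ shared final characters, 0.05 for exactly 3, else 0.0."""
    words_a = verse_a.lower().split()
    words_b = verse_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    shared = shared_suffix_len(words_a[-1], words_b[-1])
    if shared >= 4:
        return 0.08
    if shared == 3:
        return 0.05
    return 0.0


# Toy inputs: only the last word of each string matters.
print(end_rhyme_bonus("x laulan", "y kaulan"))  # 5 shared characters -> 0.08
print(end_rhyme_bonus("x maha", "y laha"))      # 3 shared characters -> 0.05
print(end_rhyme_bonus("x maa", "y puu"))        # none shared -> 0.0
```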
Consecutive above-threshold pairs are grouped together. Large groups (4+ verses) may be split at weak bridges — pairs whose score drops significantly below the group mean — producing tighter, more coherent groups. Groups are classified as structural (≥50% word overlap) or semantic (more substitution-driven).
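The grouping and splitting step can be sketched as follows. The threshold of 0.5, the margin of 0.15 used to decide what counts as "significantly below the group mean", and all function names are assumptions made for this example; the actual values are not stated here. In the sketch, `scores[i]` is the score linking verse `i` to verse `i + 1`.

```python
from statistics import mean


def group_pairs(scores, threshold=0.5):
    """Collect runs of consecutive pair indices whose score meets the threshold."""
    groups, current = [], []
    for i, s in enumerate(scores):
        if s >= threshold:
            current.append(i)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def split_at_weak_bridges(pairs, scores, margin=0.15, min_verses=4):
    """Split a group of 4+ verses at pairs scoring below (group mean - margin)."""
    if len(pairs) + 1 < min_verses:  # n pairs link n + 1 verses
        return [pairs]
    cutoff = mean(scores[i] for i in pairs) - margin
    parts, current = [], []
    for i in pairs:
        if scores[i] < cutoff:
            # Weak bridge: close the current part and drop this pair.
            if current:
                parts.append(current)
            current = []
        else:
            current.append(i)
    if current:
        parts.append(current)
    return parts


def verse_span(pairs):
    """Convert a run of pair indices into (first_verse, last_verse), inclusive."""
    return (pairs[0], pairs[-1] + 1)


# Made-up scores for a seven-verse poem (six consecutive pairs).
scores = [0.9, 0.85, 0.55, 0.9, 0.3, 0.7]
groups = group_pairs(scores)                    # [[0, 1, 2, 3], [5]]
parts = split_at_weak_bridges(groups[0], scores)
print([verse_span(p) for p in parts])           # [(0, 2), (3, 4)]
```

In this example the first group covers five verses with a mean pair score of 0.8, so the pair scoring 0.55 falls below the 0.65 cutoff and the group is split into verses 0 to 2 and verses 3 to 4.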
Formulaic Recurrence
Verse pairs within the same poem that share structural formulas, potentially far apart. Detected via Jaccard word overlap from within-poem verse similarity. This is a different phenomenon from parallelism — it captures formulaic reuse across a poem, not the consecutive verse variation that defines runosong poetics.
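Jaccard word overlap is the size of the intersection of two word sets divided by the size of their union. A minimal sketch of applying it within a poem is shown below; the function names, the 0.5 cutoff, the minimum distance of two verses (used here to skip adjacent verses, which the parallelism analysis already covers), and whitespace tokenisation are all assumptions for illustration. The input uses placeholder tokens rather than real verses.

```python
def jaccard(verse_a, verse_b):
    """Jaccard overlap between the word sets of two verses."""
    a = set(verse_a.lower().split())
    b = set(verse_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recurrent_pairs(verses, min_overlap=0.5, min_distance=2):
    """Find non-adjacent verse pairs in one poem with high word overlap."""
    hits = []
    for i in range(len(verses)):
        for j in range(i + min_distance, len(verses)):
            score = jaccard(verses[i], verses[j])
            if score >= min_overlap:
                hits.append((i, j, score))
    return hits


# Placeholder tokens standing in for the words of five verses.
verses = ["a b c", "d e f", "a b g", "h i j", "a b c"]
print(recurrent_pairs(verses))
# [(0, 2, 0.5), (0, 4, 1.0), (2, 4, 0.5)]
```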
Word Clusters
Groups of semantically related words that substitute for each other in the same verse template positions across the corpus.
How to use this page
Parallelism: Browse poems ranked by parallelism density. Select a poem to see consecutive verse groups with color-coded substitution pairs.
Formulaic Recurrence: Browse poems ranked by within-poem formula reuse. Shared words are underlined.
Word Clusters: Browse 2,206 clusters of semantically related words. Each cluster shows word members and substitution pairs ranked by frequency.
Cross-links: Click poem IDs to open them in the Poem Reader. Word forms link to the main Dictionary.
Color coding
Green, Blue, Purple, Orange, Pink — up to 5 parallel verse groups per poem
Underlined words — words shared across parallel verses
Red / Blue — substitution word pairs (Parallelism tab)
Data
200,627 poems with detected parallelism (651,170 groups, avg 3.2/poem). 102,065 poems with formulaic recurrence. 2,206 word clusters.