RunoVerse

Verse Similarity Analysis

Cross-algorithm comparison of verse-level similarity across 4.29 million Estonian and Finnish runosong verse occurrences, with formulaic cluster analysis and geographic spread.

What this page shows

A statistical analysis of verse-level similarity across the entire Finnic runosong corpus (4.29 million verse occurrences, 289,702 poems). The analysis compares how four different algorithms find similar verses and identifies formulaic patterns.

Algorithm Comparison Dashboard

Cross-Algorithm Discordance

For verses that have matches under more than one algorithm, how much do their neighbor lists overlap? Low overlap (high discordance) means the algorithms retrieve largely different similar verses for the same verse, which suggests they capture complementary aspects of similarity.
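One way to quantify neighbor-list overlap is sketched below. It assumes Jaccard overlap on the top-k neighbors, averaged over the verses both algorithms cover; the page does not state which overlap measure it uses, so the metric, the function names (`jaccard`, `mean_overlap`) and the cutoff `k` are illustrative assumptions rather than the dashboard's actual method.

```python
def jaccard(a, b):
    """Jaccard overlap of two neighbor lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(a | b)


def mean_overlap(neighbors_a, neighbors_b, k=10):
    """Average top-k overlap over verses present in both algorithms.

    neighbors_a / neighbors_b: dict mapping verse id -> ranked list of
    neighbor verse ids (hypothetical input shape).
    Returns None when the two algorithms share no verses.
    """
    shared = neighbors_a.keys() & neighbors_b.keys()
    if not shared:
        return None
    scores = [jaccard(neighbors_a[v][:k], neighbors_b[v][:k]) for v in shared]
    return sum(scores) / len(scores)


def discordance(neighbors_a, neighbors_b, k=10):
    """Discordance defined here as 1 - mean overlap (an assumption)."""
    overlap = mean_overlap(neighbors_a, neighbors_b, k)
    return None if overlap is None else 1.0 - overlap
```

Under this definition a discordance near 1 means the two algorithms almost never return the same neighbors for a verse, while a value near 0 means they largely agree.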

Formulaic Verse Clusters

Geographic Spread Charts

The scatter plot shows how cluster size relates to geographic distribution. The histogram shows how many clusters occur in each number of distinct collection places. Widely distributed formulas represent the most universal elements of Finnic oral poetry.
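The data behind both charts can be derived from cluster membership alone. The sketch below assumes each cluster is a list of (verse id, place) pairs; that input shape, the function names and the sample place labels are hypothetical, not taken from the page.

```python
from collections import Counter


def spread_stats(clusters):
    """Return (cluster_id, cluster_size, distinct_places) per cluster.

    clusters: dict mapping cluster id -> list of (verse_id, place) tuples
    (assumed shape). Members without a place still count toward size.
    """
    points = []
    for cluster_id, members in clusters.items():
        places = {place for _, place in members if place}
        points.append((cluster_id, len(members), len(places)))
    return points


def place_histogram(points):
    """Count how many clusters occur in each number of distinct places."""
    return Counter(n_places for _, _, n_places in points)


# Made-up sample data for illustration only.
sample = {
    "c1": [(1, "Kuusalu"), (2, "Kuusalu"), (3, "Ilomantsi")],
    "c2": [(4, "Setomaa")],
}
points = spread_stats(sample)      # scatter plot: size vs. distinct places
hist = place_histogram(points)     # histogram: clusters per place count
```

Each tuple in `points` is one dot in the scatter plot, and `hist` gives the bar heights of the histogram.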

Algorithm Comparison Dashboard
Side-by-side statistics for each similarity algorithm. Match types: s = same language, x = cross-lingual, w = within-poem.
Charts: Cross-Lingual Match Percentage; Average Similarity Score; Verse Coverage (Verses with Matches); Total Similarity Pairs.
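The match types and the cross-lingual percentage can be illustrated with a short sketch. It assumes each verse record carries a `poem_id` and a `lang` field and that a within-poem pair is labelled w regardless of language; the field names and that precedence rule are assumptions, since the page only defines the three labels.

```python
from collections import Counter


def match_type(a, b):
    """Label a verse pair as w (within-poem), x (cross-lingual) or s (same language).

    Assumption: within-poem takes precedence over the language check.
    """
    if a["poem_id"] == b["poem_id"]:
        return "w"
    if a["lang"] != b["lang"]:
        return "x"
    return "s"


def cross_lingual_pct(pairs):
    """Percentage of pairs labelled x among all pairs (0.0 if there are none)."""
    counts = Counter(match_type(a, b) for a, b in pairs)
    total = sum(counts.values())
    return 100.0 * counts["x"] / total if total else 0.0


# Made-up records for illustration only.
et1 = {"poem_id": "p1", "lang": "et"}
et1b = {"poem_id": "p1", "lang": "et"}
et2 = {"poem_id": "p2", "lang": "et"}
fi1 = {"poem_id": "p3", "lang": "fi"}
fi2 = {"poem_id": "p4", "lang": "fi"}
pairs = [(et1, et1b), (et1, fi1), (et1, et2), (fi1, fi2)]
pct = cross_lingual_pct(pairs)
```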
Cross-Algorithm Discordance
Pairwise comparison of how much the algorithms agree on their top neighbors for the same verses.
Formulaic Verse Clusters (Top 200)
The 200 largest groups of nearly identical verses found across the corpus. These represent formulaic expressions shared across poems and regions.
Geographic Spread of Formulaic Clusters
How widely the formulaic verse clusters are distributed across distinct collection places.
Charts: Cluster Size vs. Distinct Places (scatter plot); Distribution of Geographic Spread (# Places) (histogram).