RunoVerse

Frequency Distribution

A complete frequency dashboard for the Finnic runosong lexicon: 439,746 lemmas across 15.3 million tokens, with 161,223 hapax legomena (36.7%). Includes the Zipf curve, cumulative token coverage, part-of-speech profiles, and a side-by-side Estonian–Finnish comparison.

Frequency bands

Lemmas grouped by total occurrence count. Hapax legomena (count = 1) form the largest band by number of lemmas, while the ultra-high-frequency lemmas account for half of all tokens.
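The banding described above can be sketched as follows. This is a minimal illustration, not the dashboard's own code: the band edges and labels in `BANDS` are hypothetical, since the actual cut-offs are not stated here, and the counts in the usage lines are invented toy values.

```python
from collections import Counter

# Hypothetical band edges (inclusive lower bound, label).
# The dashboard's real cut-offs are an unknown; these are assumptions.
BANDS = [
    (1, "hapax (1)"),
    (2, "rare (2-9)"),
    (10, "mid (10-99)"),
    (100, "high (100-999)"),
    (1000, "ultra-high (1000+)"),
]

def band_of(count):
    """Return the label of the highest band whose lower bound is <= count."""
    label = None
    for lower, name in BANDS:
        if count >= lower:
            label = name
    return label

def band_summary(lemma_counts):
    """Map each band label to (number of lemmas, number of tokens)."""
    lemmas = Counter()
    tokens = Counter()
    for _lemma, count in lemma_counts.items():
        band = band_of(count)
        lemmas[band] += 1
        tokens[band] += count
    return {name: (lemmas[name], tokens[name]) for _, name in BANDS}

# Invented toy counts with placeholder lemma names, not corpus data.
toy = {"lemma_a": 1500, "lemma_b": 120, "lemma_c": 12,
       "lemma_d": 3, "lemma_e": 1, "lemma_f": 1}
for band, (n_lemmas, n_tokens) in band_summary(toy).items():
    print(band, n_lemmas, n_tokens)
```

Even in this toy input the pattern from the description appears: the hapax band holds the most lemmas, while the single ultra-high lemma holds most of the tokens.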

Zipf curve

Lemma frequency plotted against frequency rank on a log-log scale. A straight line is the signature of Zipf's law, and its slope gives the exponent. The fitted exponent is shown below.
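One way to obtain such an exponent is ordinary least squares on the log-transformed ranks and frequencies. The sketch below assumes that approach; the fitting method actually used for the dashboard is not stated, and `fit_zipf_exponent` is a hypothetical name. Least squares on log-log data is simple but can be biased by the long tail of rare lemmas, so treat the result as a rough estimate.

```python
import math

def fit_zipf_exponent(counts):
    """Estimate s in f(r) ~ C / r**s by least squares on (log r, log f).

    `counts` is an iterable of positive lemma frequencies (at least two).
    """
    freqs = sorted(counts, reverse=True)          # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                             # slope of the log-log line
    return -slope                                 # Zipf exponent is its negative

# Synthetic frequencies that follow f(r) = 1000 / r, so the fit is close to 1.
synthetic = [1000 / r for r in range(1, 51)]
print(round(fit_zipf_exponent(synthetic), 3))
```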

Token coverage

What fraction of distinct lemmas accounts for what fraction of all tokens? In a Zipfian distribution, a tiny set of common words covers most of the corpus.
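The coverage question above amounts to a cumulative sum over lemmas sorted by frequency. A minimal sketch, with hypothetical function names and invented toy counts:

```python
def coverage_curve(counts):
    """Return (lemma_fraction, token_fraction) points, most frequent lemma first."""
    freqs = sorted(counts, reverse=True)
    total = sum(freqs)
    points = []
    running = 0
    for i, f in enumerate(freqs, start=1):
        running += f
        points.append((i / len(freqs), running / total))
    return points

def lemmas_needed(counts, target):
    """Smallest number of top-ranked lemmas whose tokens reach `target` coverage."""
    freqs = sorted(counts, reverse=True)
    total = sum(freqs)
    running = 0
    for i, f in enumerate(freqs, start=1):
        running += f
        if running / total >= target:
            return i
    return len(freqs)

# Invented toy counts, not corpus data: 7 lemmas, 100 tokens.
toy = [50, 25, 15, 5, 3, 1, 1]
print(lemmas_needed(toy, 0.5))   # top lemmas needed for half the tokens
print(coverage_curve(toy)[-1])   # the curve always ends at full coverage
```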

Part-of-speech profile

How many lemmas and tokens belong to each grammatical class? The hapax rate per POS reveals which categories are dominated by words that occur only once.
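A per-POS profile of this kind can be computed in one pass over the lemma list. The sketch below assumes that the hapax rate is the share of a class's lemmas with count 1; the dashboard may define it differently. The input format, the function name, and the tags in the usage lines are all hypothetical.

```python
from collections import defaultdict

def pos_profile(entries):
    """entries: iterable of (lemma, pos, count). Returns pos -> stats dict.

    Assumed definition: hapax_rate = hapax lemmas / all lemmas in that POS.
    """
    stats = defaultdict(lambda: {"lemmas": 0, "tokens": 0, "hapax": 0})
    for _lemma, pos, count in entries:
        s = stats[pos]
        s["lemmas"] += 1
        s["tokens"] += count
        if count == 1:
            s["hapax"] += 1
    return {
        pos: {**s, "hapax_rate": s["hapax"] / s["lemmas"]}
        for pos, s in stats.items()
    }

# Invented toy entries with placeholder lemmas and tags, not corpus data.
toy = [("lemma_a", "NOUN", 5), ("lemma_b", "NOUN", 1),
       ("lemma_c", "VERB", 1), ("lemma_d", "NOUN", 1)]
for pos, s in pos_profile(toy).items():
    print(pos, s)
```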

Hapax spotlight

A few of the 161,223 lemmas that appear exactly once in the corpus, with their English glosses. Many are dialectal isolates, scribal variants, or borrowings.

Estonian vs Finnish

Side-by-side statistics for the Estonian (ERAB) and Finnish (SKVR/JR) traditions. Distinctive lemmas are those strongly overrepresented in one language relative to the other.
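The measure of overrepresentation is not specified here. One common choice is a smoothed log-ratio of relative frequencies, sketched below as an assumption rather than as the dashboard's method; the function names, the smoothing constant, and the toy counts are all hypothetical.

```python
import math

def log_ratio(count_a, total_a, count_b, total_b, smoothing=0.5):
    """log2 of the ratio of smoothed relative frequencies (A over B).

    Positive values mean the lemma is overrepresented in corpus A,
    negative values in corpus B. Smoothing avoids division by zero
    for lemmas absent from one corpus.
    """
    rel_a = (count_a + smoothing) / (total_a + smoothing)
    rel_b = (count_b + smoothing) / (total_b + smoothing)
    return math.log2(rel_a / rel_b)

def distinctive(counts_a, counts_b, top_n=3):
    """Return (most distinctive for A, most distinctive for B) as (score, lemma) lists."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    lemmas = set(counts_a) | set(counts_b)
    scored = sorted(
        ((log_ratio(counts_a.get(l, 0), total_a, counts_b.get(l, 0), total_b), l)
         for l in lemmas),
        reverse=True,
    )
    return scored[:top_n], scored[-top_n:][::-1]

# Invented toy counts with placeholder lemma names, not corpus data.
corpus_a = {"only_in_a": 90, "shared": 10}
corpus_b = {"only_in_b": 90, "shared": 10}
top_a, top_b = distinctive(corpus_a, corpus_b, top_n=1)
print(top_a, top_b)
```

A raw log-ratio favours rare lemmas, so in practice it is often combined with a minimum-frequency threshold or a significance statistic such as log-likelihood.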