A complete frequency dashboard for the Finnic runosong lexicon: 439,746 lemmas across 15.3 million tokens, with 161,223 hapax legomena (36.7%). Includes the Zipf curve, cumulative token coverage, part-of-speech profiles, and a side-by-side Estonian–Finnish comparison.
Frequency bands
Lemmas grouped by total occurrence count. Hapax legomena (count = 1) form the largest group; ultra-high-frequency lemmas account for half the tokens.
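The banding described above can be sketched in a few lines. This is only an illustration: the band edges and labels below are hypothetical (the dashboard's actual cut-offs are not stated here), and `lemma_counts` is an assumed mapping from lemma to total occurrence count.

```python
from collections import Counter

# Hypothetical band edges (inclusive upper bounds); the real cut-offs are not given.
BANDS = [
    (1, "hapax (1)"),
    (9, "rare (2-9)"),
    (99, "mid (10-99)"),
    (999, "high (100-999)"),
]
TOP_BAND = "ultra-high (1000+)"


def band_of(count):
    """Return the label of the first band whose upper bound covers `count`."""
    for upper, label in BANDS:
        if count <= upper:
            return label
    return TOP_BAND


def band_summary(lemma_counts):
    """Map each band label to (number of lemmas, number of tokens)."""
    lemmas = Counter()
    tokens = Counter()
    for count in lemma_counts.values():
        band = band_of(count)
        lemmas[band] += 1
        tokens[band] += count
    return {band: (lemmas[band], tokens[band]) for band in lemmas}


if __name__ == "__main__":
    # Toy counts with placeholder lemma names, not corpus data.
    toy = {"lemma_a": 1, "lemma_b": 1, "lemma_c": 5, "lemma_d": 2000}
    for band, (n_lemmas, n_tokens) in band_summary(toy).items():
        print(band, n_lemmas, n_tokens)
```

Reporting both the lemma count and the token count per band is what makes the contrast visible: a band can hold most of the lemmas while contributing very few of the tokens.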
Zipf curve
Lemma frequency plotted against frequency rank on log-log axes. An approximately straight line is the signature of Zipf's law, and its slope gives the exponent. The fitted exponent is shown below.
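One simple way to estimate such an exponent is an ordinary least-squares line through the points (log rank, log frequency). The sketch below assumes that approach; the fitting method actually used for the dashboard is not stated, and least squares on log-log data is known to be a rough estimator compared with maximum-likelihood fits. The function name `fit_zipf_exponent` is hypothetical.

```python
import math


def fit_zipf_exponent(counts):
    """Estimate s in frequency ~ C / rank**s by least squares on log-log axes."""
    freqs = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line slopes downward, so the exponent is the negated slope.
    return -slope


if __name__ == "__main__":
    # Synthetic, exactly Zipfian frequencies with s = 1 (not corpus data).
    synthetic = [1000.0 / rank for rank in range(1, 51)]
    print(fit_zipf_exponent(synthetic))
```

On real counts the tail of hapax legomena forms a long flat step at frequency 1, so the estimate depends noticeably on which rank range is included in the fit.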
Token coverage
What fraction of distinct lemmas accounts for what fraction of all tokens? In a Zipfian distribution, a tiny set of common words covers most of the corpus.
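The coverage question can be answered with a running sum over lemmas sorted by frequency. The following is a minimal sketch under the assumption that only a list of per-lemma counts is available; `coverage_curve` and `lemmas_needed` are illustrative names, not functions from the dashboard.

```python
def coverage_curve(counts):
    """Return (share of lemmas, share of tokens covered) at each rank."""
    freqs = sorted(counts, reverse=True)
    total = sum(freqs)
    running = 0
    curve = []
    for rank, freq in enumerate(freqs, start=1):
        running += freq
        curve.append((rank / len(freqs), running / total))
    return curve


def lemmas_needed(counts, target):
    """Smallest number of top-ranked lemmas whose tokens reach `target` share."""
    freqs = sorted(counts, reverse=True)
    total = sum(freqs)
    running = 0
    for rank, freq in enumerate(freqs, start=1):
        running += freq
        if running / total >= target:
            return rank
    return len(freqs)


if __name__ == "__main__":
    # Toy counts, not corpus data.
    toy = [50, 30, 10, 5, 5]
    print(coverage_curve(toy))
    print(lemmas_needed(toy, 0.5))
```

Plotting the pairs from `coverage_curve` gives the cumulative curve; `lemmas_needed` reads off a single point on it, such as how many lemmas cover half the tokens.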
Part-of-speech profile
How many lemmas and tokens belong to each grammatical class? Hapax rate per POS reveals which categories are dominated by one-time words.
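A per-class profile of this kind amounts to a grouped count. The sketch below assumes each lemma comes as a `(lemma, pos, count)` triple; the tag names in the toy data are placeholders, since the corpus's tagset is not given here.

```python
from collections import defaultdict


def pos_profile(entries):
    """Aggregate (lemma, pos, count) triples into per-POS statistics."""
    stats = defaultdict(lambda: {"lemmas": 0, "tokens": 0, "hapax": 0})
    for _lemma, pos, count in entries:
        row = stats[pos]
        row["lemmas"] += 1
        row["tokens"] += count
        if count == 1:
            row["hapax"] += 1
    # Hapax rate = share of a class's lemmas that occur exactly once.
    return {
        pos: {**row, "hapax_rate": row["hapax"] / row["lemmas"]}
        for pos, row in stats.items()
    }


if __name__ == "__main__":
    # Toy entries with placeholder lemmas and tags, not corpus data.
    toy = [
        ("n1", "NOUN", 1), ("n2", "NOUN", 1), ("n3", "NOUN", 40), ("n4", "NOUN", 8),
        ("v1", "VERB", 1), ("v2", "VERB", 9), ("v3", "VERB", 3), ("v4", "VERB", 7),
    ]
    for pos, row in pos_profile(toy).items():
        print(pos, row)
```

Note that the hapax rate here is computed over lemmas, not tokens; a class can have a high hapax rate and still supply a large share of the running text.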
Hapax spotlight
A few of the 161,223 lemmas that appear exactly once in the corpus, with their English glosses. Many are dialectal isolates, scribal variants, or borrowings.
Estonian vs Finnish
Side-by-side statistics for the Estonian (ERAB) and Finnish (SKVR/JR) traditions. Distinctive lemmas are those strongly overrepresented in one language relative to the other.
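Overrepresentation can be scored in several ways, and the measure behind the dashboard is not stated. As one illustrative option, the sketch below uses a smoothed log-ratio of relative frequencies: positive scores mean a lemma is relatively more frequent in the first subcorpus, negative scores in the second. The function names and the smoothing constant are assumptions.

```python
import math


def log_ratio(count_a, total_a, count_b, total_b, smoothing=0.5):
    """log2 of the ratio of relative frequencies, smoothed to tolerate zero counts."""
    rate_a = (count_a + smoothing) / total_a
    rate_b = (count_b + smoothing) / total_b
    return math.log2(rate_a / rate_b)


def distinctive(counts_a, counts_b, top_n=3):
    """Return (lemmas most typical of A, lemmas most typical of B)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    scores = {
        lemma: log_ratio(counts_a.get(lemma, 0), total_a,
                         counts_b.get(lemma, 0), total_b)
        for lemma in vocab
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], ranked[-top_n:][::-1]


if __name__ == "__main__":
    # Toy subcorpora with placeholder lemma names, not ERAB or SKVR data.
    side_a = {"shared": 50, "only_a": 50}
    side_b = {"shared": 50, "only_b": 50}
    print(distinctive(side_a, side_b, top_n=1))
```

Dividing by each subcorpus total before comparing matters whenever the two sides differ in size; raw count differences would otherwise favor the larger collection.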