RunoVerse

Corpus & Geography Guide

What is this page?

This guide introduces five RunoVerse tools for exploring corpus metadata: who collected the poems, where they were gathered, when, and how many. Each section below describes one tool, what data it provides, and links directly to it.

How to use this guide

Read from top to bottom for a full overview, or jump to the Overview table at the end to find the right tool for your question. Each section ends with a direct link to the explorer page it describes.

Related guides

Dictionary Guide covers annotation sources and word lookup.
Languages Guide explains the Estonian/Finnish bilingual corpus.
Poetics Guide covers alliteration, parallelism, and meter.
Similarity Guide explains the 7 similarity algorithms.

Tip

The Dashboard and Regional Heritage pages combine corpus statistics with other data layers for cross-cutting exploration.

RunoVerse brings together three major folklore collections spanning four centuries of documentation across Estonia and Finland. These tools help you explore the who, where, when, and how many of the corpus itself: its collection history, geographic distribution, and statistical properties.

Corpus Statistics

The Statistics page provides an interactive dashboard of corpus-wide metrics. It gives you an overview of how the three source corpora — SKVR, JR, and ERAB — compare in size, vocabulary diversity, and linguistic features. Use it to understand the overall shape of the data before diving into specific explorations.

Open Corpus Statistics →

Frequency Distribution

The Distribution page lets you explore how the 439,746 lemmas are distributed by frequency. Like most natural language corpora, runosong vocabulary follows Zipf's law: a small number of very common words account for most of the text, while the vast majority of words are rare. This tool visualizes that pattern and lets you explore its implications.
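To make the Zipf pattern concrete, here is a minimal Python sketch of a rank-frequency table and a coverage figure (the share of all tokens accounted for by the top N lemmas). The function names and the toy lemma list are hypothetical and are not drawn from RunoVerse or its data; the page itself may compute these values differently.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, lemma, count) tuples, most frequent lemma first."""
    counts = Counter(tokens)
    return [
        (rank, lemma, count)
        for rank, (lemma, count) in enumerate(counts.most_common(), start=1)
    ]

def coverage(tokens, top_n):
    """Share of all tokens accounted for by the top_n most frequent lemmas."""
    counts = Counter(tokens)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total

# Toy lemma sequence (hypothetical, for illustration only).
toy = ["olla"] * 8 + ["tulla"] * 4 + ["meri"] * 2 + ["kulta", "laulu"]

table = rank_frequency(toy)
top1_share = coverage(toy, 1)  # a single lemma covers half of this toy text
top2_share = coverage(toy, 2)
```

In this toy sample, two of the five lemmas cover three quarters of the tokens, which is the kind of skew the Distribution page visualizes at corpus scale.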

Open Frequency Distribution →

Regional Vocabulary

The Places page maps the geographic dimension of the corpus across 803 collection places and 292,092 poems. An interactive Leaflet map shows where poems were gathered, and you can search for any word to see where it appears geographically. This is particularly useful for studying dialectal variation and regional poetic traditions.
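A word search of this kind can be thought of as counting, for each collection place, the poems that contain the word. The Python sketch below illustrates that idea under stated assumptions: the record layout (a "place" field and a set of "lemmas" per poem), the function name, and the toy records are all hypothetical and do not reflect how RunoVerse stores or queries its data.

```python
from collections import Counter

def poems_per_place(poems, word):
    """Count, per collection place, the poems whose lemma set contains `word`."""
    counts = Counter()
    for poem in poems:
        if word in poem["lemmas"]:
            counts[poem["place"]] += 1
    return counts

# Toy records (hypothetical layout and values, for illustration only).
toy_poems = [
    {"place": "Place A", "lemmas": {"meri", "laulu"}},
    {"place": "Place A", "lemmas": {"meri"}},
    {"place": "Place B", "lemmas": {"kulta", "laulu"}},
]

meri_by_place = poems_per_place(toy_poems, "meri")
laulu_by_place = poems_per_place(toy_poems, "laulu")
```

Counts like these are what a map layer would then draw as markers or shading, one value per place.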

Open Regional Vocabulary →

Corpus Timeline

The Timeline page shows when the 285,946 dated poems were collected across four centuries of Finnic folklore documentation, from the 1560s through the 1970s. It reveals the waves of collection activity that built up these corpora, and how the focus shifted between regions and traditions over time.
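As a rough illustration of how collection waves can be read from dates, the Python sketch below bins collection years into decades and works out the dated share of the corpus. It assumes the 285,946 dated poems are a subset of the 292,092 poems cited elsewhere in this guide; the function name and the toy years are hypothetical, and the Timeline page may group dates differently.

```python
from collections import Counter

def poems_per_decade(years):
    """Count poems per decade; None marks an undated poem and is skipped."""
    return Counter((year // 10) * 10 for year in years if year is not None)

# Toy collection years (hypothetical, for illustration only).
toy_years = [1564, 1888, 1889, 1892, None, 1971]
decades = poems_per_decade(toy_years)

# Dated share, assuming the dated poems are a subset of the full corpus.
dated_share = 285_946 / 292_092
undated = 292_092 - 285_946
```

Under that assumption, roughly 98% of the poems carry a date, leaving 6,146 that would not appear on a timeline.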

Open Corpus Timeline →

Collector Explorer

The Collector Explorer lets you browse the 7,482 individuals who gathered the 292,092 poems in the corpus. Behind every poem is a collector — from well-known folklorists like Jakob Hurt and Elias Lönnrot to anonymous local contributors who recorded a handful of songs from their communities. This tool lets you explore the human effort behind the data.

Open Collector Explorer →

Overview

Tool          Key Data                                    Best For
Statistics    3 corpora, 439K lemmas, 15.3M tokens        Understanding corpus composition and size
Distribution  439,746 lemmas across frequency bands       Exploring vocabulary frequency and coverage
Places        803 locations, 292,092 poems, 10 map modes  Geographic patterns and dialectal variation
Timeline      285,946 dated poems, 1560s–1970s            Historical collection patterns
Collectors    7,482 collectors, 292,092 poems             People behind the corpus
