Emotion Vocabulary

About the Emotion Vocabulary Browser

This page presents the emotion-related vocabulary of Finnic runosongs (regilaulud), discovered through semantic parallelism detection. In the runosong tradition, consecutive verses repeat the same syntactic structure while semantically related words substitute for one another. The substitution test exploits this formulaic structure to find words that share meaning.

How the vocabulary was built

Discover → Classify → Expand → Review → Merge

Discover: For each emotion seed word, verse templates are created by removing the word from its verse context; the corpus is then searched for other words that fill the same position.
Classify: Top candidates are classified by Claude AI into categories: core emotion (E), morphological derivation (M), corpus artifact (A), adjacent (J), or noise (N).
Expand: Confirmed emotion words become seeds for the next round. 26 seed words were explored to depth 5, with translation-based expansion adding further candidates.
Review: All lemmas were reviewed by Claude Opus against actual verse evidence, wordform translations, and corpus frequencies. Lemmas and individual wordforms can be removed or restored.
Merge: Three layers of corrections were applied: per-form moves, lemma-level merges, and recovered forms from removed lemmas. The corrections were propagated to the RunoVerse lexicon and poem index.
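
The Discover and Expand steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: it assumes whitespace tokenization and one-slot templates, the toy verses are invented for the example rather than taken from the corpus, and is_emotion is a hypothetical stand-in for the Classify step.

```python
from collections import defaultdict

SLOT = "_"  # hypothetical placeholder marking the removed word

# Invented toy verses for illustration only (not corpus data).
CORPUS = [
    "mul on rõõm",
    "mul on mure",
    "tuli suur mure",
    "tuli suur kurbus",
    "tuli suur vihm",
]

def templates_for(verse, word):
    """Yield one template per position where `word` occurs in `verse`."""
    tokens = verse.split()
    for i, tok in enumerate(tokens):
        if tok == word:
            yield tuple(tokens[:i] + [SLOT] + tokens[i + 1:])

def build_index(corpus):
    """Map every one-slot template in the corpus to the words that fill it."""
    index = defaultdict(set)
    for verse in corpus:
        tokens = verse.split()
        for i, tok in enumerate(tokens):
            index[tuple(tokens[:i] + [SLOT] + tokens[i + 1:])].add(tok)
    return index

def discover(seed, corpus, index):
    """Count, per candidate, how many of the seed's templates it also fills."""
    counts = defaultdict(int)
    seen = set()
    for verse in corpus:
        for tpl in templates_for(verse, seed):
            if tpl in seen:
                continue  # count each distinct template once
            seen.add(tpl)
            for filler in index[tpl]:
                if filler != seed:
                    counts[filler] += 1
    return dict(counts)

def expand(seeds, corpus, is_emotion, max_depth=5):
    """Confirmed candidates become the seeds of the next round."""
    index = build_index(corpus)
    confirmed = set(seeds)
    frontier = set(seeds)
    for _ in range(max_depth):
        next_frontier = set()
        for seed in frontier:
            for cand in discover(seed, corpus, index):
                if cand not in confirmed and is_emotion(cand):
                    confirmed.add(cand)
                    next_frontier.add(cand)
        if not next_frontier:
            break  # nothing new was confirmed in this round
        frontier = next_frontier
    return confirmed

# Stand-in classifier: the real pipeline uses an LLM classification step here.
def toy_classifier(word):
    return word in {"mure", "kurbus"}

print(sorted(expand(["rõõm"], CORPUS, toy_classifier)))
```

In this toy run, the seed rõõm reaches mure through a shared template in the first round, and mure reaches kurbus in the second; vihm fills the same slot but is rejected by the stand-in classifier, which is the role the Classify step plays between rounds.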

Classification categories

E: Emotion | M: Morphological | A: Artifact | J: Adjacent

E = core emotion word (e.g., rõõm ‘joy’). M = derivation of an emotion word (e.g., rõõmustama ‘to rejoice’). A = dialect variant or lemmatization error merged to its correct family (e.g., reem → rõõm). J = contextually related but not an emotion itself (e.g., laul ‘song’, süda ‘heart’).

Using this browser

Sidebar: Browse emotion families grouped by semantic domain. Click a family to view its members.
Search: Search by lemma, word form, or English translation. Matches filter the sidebar in real time.
Word forms: Blue forms were found via the substitution test; gray forms are corpus-attested. Frequency badges show corpus count.
Verse examples: Hover over a word form to see an example verse. Click to open the word’s detailed view with poem contexts.
Sort: Word forms can be sorted alphabetically (A-Z), by corpus frequency (Freq), or by template IDF weight (IDF), which reflects how specific the substitution templates are.
Export: Use the CSV or JSON export buttons on the landing page.

Translation cognates

Each family page includes translation cognates: Estonian lemmas from the runosong corpus that share English translations with the family’s members. The RunoVerse lexicon (242K glosses, 1.5M mappings from DeepSeek AI translations) is inverted to build a gloss profile for each lemma. For each family, lemmas whose translation profiles overlap with the family’s are scored with an IDF-weighted overlap coefficient: common glosses (‘to’, ‘little’) are downweighted via idf = log(1 + N/(1 + df)), while specific glosses (‘contempt’, ‘mock’) count more. A candidate must share at least two glosses with the family and reach a minimum IDF sum. This step finds semantically related words that the substitution test may have missed.
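
The scoring described above might look roughly like the following Python sketch. Several details are assumptions made for the example: N is taken to be the number of lemma profiles and df the number of profiles containing a gloss, the weighted overlap coefficient is computed as the IDF sum of shared glosses divided by the smaller of the two profiles' IDF sums, and the min_idf_sum threshold of 2.0 is an arbitrary illustrative value. All lemma names and profiles are invented.

```python
import math
from collections import Counter

def idf_weights(profiles):
    """idf = log(1 + N / (1 + df)), with N = number of profiles (assumed)."""
    n = len(profiles)
    df = Counter(g for glosses in profiles.values() for g in glosses)
    return {g: math.log(1 + n / (1 + d)) for g, d in df.items()}

def weighted_overlap(a, b, idf):
    """Return (score, IDF sum of shared glosses, number of shared glosses)."""
    shared = a & b
    shared_sum = sum(idf[g] for g in shared)
    denom = min(sum(idf[g] for g in a), sum(idf[g] for g in b))
    score = shared_sum / denom if denom else 0.0
    return score, shared_sum, len(shared)

def cognates(family_lemmas, profiles, idf, min_shared=2, min_idf_sum=2.0):
    """Rank non-family lemmas whose gloss profiles overlap the family's."""
    family_glosses = set().union(*(profiles[m] for m in family_lemmas))
    results = []
    for lemma, glosses in profiles.items():
        if lemma in family_lemmas:
            continue
        score, shared_sum, n_shared = weighted_overlap(
            family_glosses, glosses, idf
        )
        if n_shared >= min_shared and shared_sum >= min_idf_sum:
            results.append((lemma, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Invented toy profiles: 'to' and 'little' occur almost everywhere.
profiles = {
    "family_member": {"contempt", "mock", "to", "little"},
    "cand_specific": {"contempt", "mock", "scorn"},
    "cand_common": {"to", "little", "go"},
}
for i in range(20):
    profiles[f"filler_{i}"] = {"to", "little", f"gloss_{i}"}

idf = idf_weights(profiles)
result = cognates({"family_member"}, profiles, idf)
print(result)
```

Both candidates share exactly two glosses with the family, so the shared-gloss count alone cannot separate them. The IDF sum does: the pair ‘contempt’ and ‘mock’ clears the threshold, while the near-ubiquitous pair ‘to’ and ‘little’ does not.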

Source data

Corpus: Estonian and Finnish runosong collections (7.3M + 7.4M tokens, 451K + 701K unique forms). BERT embeddings fine-tuned on runosong texts (190K words × 768 dimensions). Translations from the RunoVerse lexicon (DeepSeek AI translations, 1.36M entries).