This page presents the emotion-related vocabulary of Finnic runosongs (regilaulud), discovered through semantic parallelism detection. In the runosong tradition, consecutive verses repeat the same syntactic structure while semantically related words substitute for each other. The substitution test exploits this formulaic structure to find words that share meaning.
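The idea behind the substitution test can be illustrated with a minimal sketch. It assumes verses are already lemmatized into token lists and uses a deliberately strict rule: two consecutive verses count as parallel only if they have the same length and differ in exactly one slot. The function name `substitution_pairs` and the toy verses are invented for illustration; they are not quotations from the corpus, and the actual detection method may be considerably more permissive.

```python
from collections import Counter

def substitution_pairs(verses):
    """Count word pairs that swap in one slot of otherwise identical consecutive verses.

    `verses` is a list of lemma lists. This strict one-slot rule is an
    assumption of the sketch, not the project's documented criterion.
    """
    pairs = Counter()
    for prev, curr in zip(verses, verses[1:]):
        if len(prev) != len(curr):
            continue  # different length: not treated as parallel here
        diffs = [(a, b) for a, b in zip(prev, curr) if a != b]
        if len(diffs) == 1:
            # Store the pair in sorted order so (x, y) and (y, x) are counted together.
            pairs[tuple(sorted(diffs[0]))] += 1
    return pairs

# Invented toy lemma sequences (hypothetical, for illustration only).
toy_verses = [
    ["süda", "täis", "rõõm"],
    ["süda", "täis", "ilo"],
    ["laul", "tulema", "suu"],
]
pairs = substitution_pairs(toy_verses)
print(pairs.most_common())
```

Pairs that recur across many verse pairs would be the strongest candidates for shared meaning. A single co-occurrence, as in this toy input, would carry little weight.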
E = core emotion word (e.g., rõõm ‘joy’). M = derivation of an emotion word (e.g., rõõmustama ‘to rejoice’). A = dialect variant or lemmatization error merged into its correct family (e.g., reem → rõõm). J = contextually related word that is not itself an emotion (e.g., laul ‘song’, süda ‘heart’).
Each family page includes translation cognates: Estonian lemmas from the runosong corpus that share English translations with the family’s members. The RunoVerse lexicon (242K glosses, 1.5M mappings from DeepSeek AI translations) is inverted to build a gloss profile for each lemma. For each family, lemmas whose translation profiles overlap with the family’s profile are scored with an IDF-weighted overlap coefficient: common glosses (‘to’, ‘little’) are downweighted via idf = log(1 + N/(1 + df)), while specific glosses (‘contempt’, ‘mock’) count for more. A candidate must share at least 2 glosses with the family and reach a minimum IDF sum. This step surfaces semantically related words that the substitution test may have missed.
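The scoring step can be sketched as follows. Several details here are assumptions, since the page does not spell them out: N is taken as the number of lemmas with a gloss profile, df as the number of lemmas whose profile contains a gloss, the overlap coefficient is normalized by the smaller of the two IDF-weighted profile sums, and the threshold `min_idf_sum` is a hypothetical parameter. The gloss profiles below are invented toy data, not entries from the RunoVerse lexicon.

```python
import math
from collections import Counter

def idf_weights(profiles):
    """idf = log(1 + N / (1 + df)); N = number of lemmas (assumed), df = lemmas having the gloss."""
    n = len(profiles)
    df = Counter(g for glosses in profiles.values() for g in glosses)
    return {g: math.log(1 + n / (1 + d)) for g, d in df.items()}

def weighted_overlap(a, b, idf):
    """Return (IDF-weighted overlap coefficient, IDF sum of shared glosses, shared count)."""
    shared = a & b
    shared_sum = sum(idf.get(g, 0.0) for g in shared)
    denom = min(sum(idf.get(g, 0.0) for g in a), sum(idf.get(g, 0.0) for g in b))
    score = shared_sum / denom if denom else 0.0
    return score, shared_sum, len(shared)

def candidates(members, profiles, min_shared=2, min_idf_sum=1.0):
    """Rank non-member lemmas by weighted overlap with the family's pooled gloss profile."""
    idf = idf_weights(profiles)
    family = set().union(*(profiles[m] for m in members))
    ranked = []
    for lemma, glosses in profiles.items():
        if lemma in members:
            continue
        score, shared_sum, n_shared = weighted_overlap(family, glosses, idf)
        if n_shared >= min_shared and shared_sum >= min_idf_sum:
            ranked.append((lemma, score))
    return sorted(ranked, key=lambda item: -item[1])

# Invented toy gloss profiles (hypothetical, for illustration only).
toy_profiles = {
    "põlgama": {"contempt", "mock", "to"},
    "pilkama": {"mock", "contempt", "ridicule"},
    "naerma": {"mock", "laugh", "to"},
    "väike": {"little", "to"},
}
strict = candidates({"põlgama"}, toy_profiles, min_idf_sum=1.5)
loose = candidates({"põlgama"}, toy_profiles, min_idf_sum=0.0)
print(strict)
print(loose)
```

In this toy setup, ‘contempt’ appears in fewer profiles than ‘to’, so a lemma sharing ‘contempt’ and ‘mock’ with the family outranks one sharing ‘mock’ and ‘to’, and a lemma sharing only one gloss is excluded by the shared-gloss requirement.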
Corpus: Estonian and Finnish runosong collections (7.3M and 7.4M tokens; 451K and 701K unique forms, respectively). Embeddings: BERT fine-tuned on runosong texts (190K words × 768 dimensions). Translations: the RunoVerse lexicon (DeepSeek AI translations, 1.36M entries).
Emotional Geography Explorer — maps the geographic distribution of emotion vocabulary across collection parishes, showing how different emotions cluster in different regions of the runosong tradition.
Note: This vocabulary is computationally derived and AI-reviewed. It is intended as a research tool, not a definitive classification. Dialectal and archaic Estonian (13th–19th century) presents inherent lemmatization challenges.