This tool lets you explore 704K+ annotated wordforms in the Finnic runosong corpus through three semantic tiers, organized by reliability:
Translation-based semantic groupings using English seed keywords. High precision for this corpus because the method relies on dictionary translations rather than NLP models. Domains include: Family & Kinship, Animals, Body & Health, Plants & Crops, Water & Sea, and 20 more.
thesaurus_index.json (25 domains, 40K+ keywords)
Multi-method emotion vocabulary audit using GoEmotions, SetFit, NRC EmoLex, and EKKD dictionary data. ~90% accurate. Organized into 38 families across 26 domains (e.g., rõõm, viha).
Entity categories from NLP models (GLiNER NER, WordNet, thesaurus keyword matching, morphological detection, lemma propagation). These models were trained on modern English and general-domain text, so they may be less accurate on archaic, dialectal runosong vocabulary. Results are preliminary; use with caution.
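The translation-based grouping in the first tier can be pictured with a minimal sketch. It is not the tool's actual implementation: the seed keyword sets, the sample lemmas and glosses, and the function name group_by_domain are all hypothetical, chosen only to show how matching dictionary translations against English seed keywords assigns words to domains without any NLP model.

```python
from collections import defaultdict

# Hypothetical English seed keywords per domain (illustrative only,
# not the contents of thesaurus_index.json).
SEED_KEYWORDS = {
    "Family & Kinship": {"mother", "father", "sister", "brother"},
    "Animals": {"horse", "bird", "wolf"},
    "Water & Sea": {"sea", "river", "wave"},
}

# Hypothetical dictionary translations: lemma -> English glosses.
TRANSLATIONS = {
    "ema": ["mother"],
    "veli": ["brother"],
    "hobune": ["horse"],
    "meri": ["sea"],
    "kuld": ["gold"],  # matches no seed keyword, so it stays ungrouped
}


def group_by_domain(translations, seeds):
    """Assign each lemma to every domain whose seed keywords
    overlap with the lemma's English glosses."""
    groups = defaultdict(set)
    for lemma, glosses in translations.items():
        gloss_set = {gloss.lower() for gloss in glosses}
        for domain, keywords in seeds.items():
            if gloss_set & keywords:  # at least one shared keyword
                groups[domain].add(lemma)
    # Sort lemmas so the result is deterministic.
    return {domain: sorted(lemmas) for domain, lemmas in groups.items()}


if __name__ == "__main__":
    for domain, lemmas in group_by_domain(TRANSLATIONS, SEED_KEYWORDS).items():
        print(domain, lemmas)
```

Because the match is an exact lookup on dictionary glosses, a word is grouped only when a translation explicitly supports it, which is one way to read the high-precision claim for this tier.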