Surface forms that map to more than one lemma — where a runosong word is genuinely ambiguous.
What is wordform ambiguity?
A wordform is the exact surface spelling seen in a poem. A lemma is its dictionary headword. In Finnic languages many wordforms are genuinely ambiguous: the same spelling can be an inflected form of two or more different lemmas. For example, a form could be either a case form of a noun or a verb form, or it could be an inflected form of two different nouns. This page collects wordforms that have two or more candidate lemmas and shows which lemmas each could belong to.
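As a rough illustration of that selection step, the sketch below groups lemmatiser output by surface form and keeps only the forms with two or more candidates. The function name `ambiguous_forms`, the triple layout, and the sample rows are assumptions made for this example; the rows use well-known standard Finnish homographs (tuli: the noun "fire" or a past form of tulla "to come"; voi: the noun "butter" or a form of voida "to be able") and are not taken from the runosong corpus.

```python
from collections import defaultdict

# Hypothetical lemmatiser output: (wordform, lemma, POS) triples.
# Illustrative standard Finnish examples, not corpus data.
analyses = [
    ("tuli", "tuli", "NOUN"),
    ("tuli", "tulla", "VERB"),
    ("tuli", "tuli", "NOUN"),   # repeated analysis, counted once
    ("voi", "voi", "NOUN"),
    ("voi", "voida", "VERB"),
    ("meri", "meri", "NOUN"),   # unambiguous, should be dropped
]


def ambiguous_forms(analyses, min_lemmas=2):
    """Return {wordform: sorted candidate (lemma, POS) pairs} for forms
    that have at least `min_lemmas` distinct candidates."""
    candidates = defaultdict(set)
    for form, lemma, pos in analyses:
        candidates[form].add((lemma, pos))
    return {
        form: sorted(lemmas)
        for form, lemmas in candidates.items()
        if len(lemmas) >= min_lemmas
    }


result = ambiguous_forms(analyses)
for form, lemmas in result.items():
    print(form, lemmas)
```

Keying candidates on the (lemma, POS) pair rather than the lemma string alone is a design choice of this sketch: it keeps homographic lemmas with different parts of speech apart.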
How to read a card
Bold green word — the ambiguous surface form as it appears in the corpus.
The orange pill shows how many distinct lemmas this form could map to.
Each row below is one candidate lemma with its POS tag. The bar and number show that lemma's overall corpus frequency, not how often this specific surface form resolves to it. More frequent lemmas are the statistically likelier parse, but the page does not resolve the ambiguity.
Sort options
Most Ambiguous — ranks by number of candidate lemmas × frequency of the top candidate, surfacing the forms most worth reviewing.
Most Frequent Top Candidate — puts the commonest words first (the cases that matter most for corpus-wide statistics).
Most Lemma Candidates — forms with the largest number of possible parses.
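The three sort options above can be read as three sort keys over the same cards. The following is a minimal sketch under stated assumptions: the `Card` class, the key function names, the placeholder form names, and all frequencies are invented for illustration and do not reflect the page's actual implementation or data.

```python
from dataclasses import dataclass


@dataclass
class Card:
    form: str            # ambiguous surface form (placeholder names here)
    lemma_freqs: dict    # candidate lemma -> overall corpus frequency


def most_ambiguous(card):
    # number of candidate lemmas x frequency of the top candidate
    return len(card.lemma_freqs) * max(card.lemma_freqs.values())


def most_frequent_top_candidate(card):
    return max(card.lemma_freqs.values())


def most_lemma_candidates(card):
    return len(card.lemma_freqs)


# Invented frequencies, chosen so the three orderings differ.
cards = [
    Card("form_a", {"lemma_1": 500, "lemma_2": 20}),
    Card("form_b", {"lemma_1": 300, "lemma_2": 60, "lemma_3": 5, "lemma_4": 1}),
    Card("form_c", {"lemma_1": 800, "lemma_2": 3}),
]

by_ambiguity = sorted(cards, key=most_ambiguous, reverse=True)
by_top_freq = sorted(cards, key=most_frequent_top_candidate, reverse=True)
by_candidates = sorted(cards, key=most_lemma_candidates, reverse=True)

for name, ordering in [
    ("Most Ambiguous", by_ambiguity),
    ("Most Frequent Top Candidate", by_top_freq),
    ("Most Lemma Candidates", by_candidates),
]:
    print(name, [card.form for card in ordering])
```

With these invented numbers, form_b has a lower top frequency than form_a but ranks above it under Most Ambiguous because it has four candidates, which is the effect the product score is meant to capture.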
Why it matters
Runosong language is archaic, dialectal, and often uses syncopated or truncated forms. Disambiguation is a real bottleneck for any frequency count or semantic analysis — this page is a way to audit where the lemmatiser hedges, and to decide whether the hedge is justified.
How to read these cards: Each wordform (bold, green) has multiple candidate lemmas. The number next to each lemma is its overall frequency across the whole corpus, not how often this specific wordform maps to it. Lemmas are sorted by overall frequency; more frequent candidates are statistically likelier parses, but the ambiguity itself is left unresolved.