Lemma Quality Explorer
Explore 325,332 lemmatization conflicts between corpus annotations and DeepSeek AI analysis. Discover magnet lemmas, suffix patterns, and annotation uncertainty.
What are conflicts?
A conflict occurs when the corpus annotation and the DeepSeek AI analysis assign different lemmas to the same wordform. This does not necessarily mean that either is wrong: many conflicts are spelling variants or cross-language differences. Only true conflicts represent genuine disagreements about word identity.
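The definition above can be sketched in a few lines of Python. This is only an illustration: the `Annotation` record, its field names, and the sample rows are assumptions made for the example, not the RunoVerse data format.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record: one wordform with both lemma assignments."""
    wordform: str
    corpus_lemma: str
    ai_lemma: str


def find_conflicts(annotations):
    """Return the annotations where corpus and AI lemmas differ."""
    return [a for a in annotations if a.corpus_lemma != a.ai_lemma]


# Illustrative rows only; not taken from the actual dataset.
sample = [
    Annotation("oli", "olla", "olla"),    # both sources agree: no conflict
    Annotation("sai", "saama", "saada"),  # lemmas differ: a conflict
]
conflicts = find_conflicts(sample)
```

Note that the check only detects disagreement. Deciding whether a conflict is a true conflict, a spelling variant, or a cross-language difference needs a separate classification step.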
Conflict Classes
True conflict: Different lemma interpretations (216K).
Spelling variant: Same word, different orthography (99K).
Cross-language: Estonian vs Finnish lemma (10K).
Magnet Lemmas
Magnet lemmas are common lemmas (such as “ei”, “saama”, “olla”) to which the corpus pipeline assigns many wordforms, even when the AI suggests different lemmas. They reveal systematic biases in the annotation pipeline.
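One way to surface magnet lemmas is to count how many distinct conflicting wordforms each corpus lemma attracts. The sketch below assumes conflicts are available as `(wordform, corpus_lemma, ai_lemma)` tuples; that layout and the placeholder rows are hypothetical.

```python
from collections import defaultdict


def magnet_lemmas(conflicts):
    """Rank corpus lemmas by the number of distinct conflicting wordforms."""
    forms_by_lemma = defaultdict(set)
    for wordform, corpus_lemma, _ai_lemma in conflicts:
        forms_by_lemma[corpus_lemma].add(wordform)
    counts = [(lemma, len(forms)) for lemma, forms in forms_by_lemma.items()]
    # Largest magnets first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Placeholder wordforms and AI lemmas (w1, x1, ...) are invented for illustration.
rows = [
    ("w1", "ei", "x1"),
    ("w2", "ei", "x2"),
    ("w3", "ei", "x3"),
    ("w1", "ei", "x4"),   # same wordform again: counted once
    ("w4", "olla", "x5"),
]
ranked = magnet_lemmas(rows)
```

Counting distinct wordforms rather than raw annotation rows keeps a single very frequent wordform from making a lemma look like a magnet.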
Current Pick
For each conflict, RunoVerse currently uses a selection algorithm that considers corpus unanimity, count ratios, and cross-language signals. The “current pick” shows what the system chose.
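The page does not spell out the selection rules, so the following is a purely hypothetical sketch of how the three signals could be combined. The field names, the rule order, the `ratio_threshold` value, and the handling of cross-language conflicts are all assumptions, not the RunoVerse algorithm.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """Hypothetical conflict record; field names are assumptions."""
    corpus_lemma: str
    corpus_count: int
    ai_lemma: str
    ai_count: int
    corpus_unanimous: bool  # every corpus annotation of the wordform agrees
    conflict_class: str     # "true", "spelling", or "cross_language"


def current_pick(c: Conflict, ratio_threshold: float = 2.0) -> str:
    """Pick a lemma from the signals named on this page (assumed rules)."""
    # Assumption: for cross-language conflicts, keep the corpus lemma.
    if c.conflict_class == "cross_language":
        return c.corpus_lemma
    # Assumption: a unanimous corpus with at least as many counts wins.
    if c.corpus_unanimous and c.corpus_count >= c.ai_count:
        return c.corpus_lemma
    # Assumption: the AI lemma wins only if it clearly outnumbers the corpus.
    if c.ai_count >= ratio_threshold * c.corpus_count:
        return c.ai_lemma
    return c.corpus_lemma


# Placeholder lemmas "a" and "b" are invented for illustration.
pick_unanimous = current_pick(Conflict("a", 10, "b", 3, True, "true"))
pick_ai_heavy = current_pick(Conflict("a", 2, "b", 9, False, "true"))
pick_close = current_pick(Conflict("a", 5, "b", 6, False, "spelling"))
```

In this sketch the corpus lemma is the default, and the AI lemma must clear an explicit count ratio to override it. A real system could weight the signals quite differently.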
Conflict Class Distribution
Corpus vs AI Dominance
Most Contested Wordforms
Wordforms with the highest combined annotation counts where corpus and AI disagree:
| Wordform | Corpus Lemma | Corpus Count | AI Lemma | AI Count | Class | Current Pick |
|---|---|---|---|---|---|---|
| Lemma | Attracted Forms | Unanimity | True Conflicts | Spelling | Cross-Lang |
|---|---|---|---|---|---|
Top Suffix Patterns in Conflicts
Systematic morphological patterns in conflicting lemma assignments:
| # | Wordform Suffix | Lemma Suffix | Conflicts |
|---|---|---|---|
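A suffix pattern like those in the table can be derived by stripping the longest common prefix from a wordform and its lemma and counting the leftover endings. The sketch below assumes this prefix-stripping approach and uses illustrative pairs. The page does not say how its patterns are computed or which lemma (corpus or AI) is compared.

```python
from collections import Counter


def suffix_pair(wordform, lemma):
    """Strip the longest common prefix; return (wordform suffix, lemma suffix)."""
    i = 0
    while i < min(len(wordform), len(lemma)) and wordform[i] == lemma[i]:
        i += 1
    return wordform[i:], lemma[i:]


def suffix_patterns(pairs):
    """Count how often each (wordform suffix, lemma suffix) pair occurs."""
    return Counter(suffix_pair(w, l) for w, l in pairs)


# Illustrative (wordform, lemma) pairs; not taken from the actual dataset.
pairs = [("kalad", "kala"), ("majad", "maja"), ("tuli", "tulema")]
patterns = suffix_patterns(pairs)
```

Here `("kalad", "kala")` and `("majad", "maja")` both reduce to the pattern `("d", "")`, so it is counted twice, while `("tuli", "tulema")` yields `("i", "ema")`.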
Magnet Lemma Size Distribution
How many lemmas attract a given number of conflicting wordforms:
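Such a distribution is a histogram over magnet sizes. A minimal sketch, assuming a mapping from lemma to its number of attracted conflicting wordforms is already available (the mapping and its values below are invented):

```python
from collections import Counter


def size_distribution(attracted_counts):
    """Map each magnet size to the number of lemmas that have that size."""
    return Counter(attracted_counts.values())


# Hypothetical sizes: lemma -> number of conflicting wordforms it attracts.
sizes = {"ei": 3, "olla": 3, "saama": 1}
distribution = sorted(size_distribution(sizes).items())
```

In this example two lemmas attract three wordforms each and one lemma attracts a single wordform, so the sorted result is `[(1, 1), (3, 2)]`.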