Lemma Quality Explorer

Explore 325,332 lemmatization conflicts between corpus annotations and DeepSeek AI analysis. Discover magnet lemmas, suffix patterns, and annotation uncertainty.

What are conflicts?

A conflict occurs when the corpus annotation and the DeepSeek AI analysis assign different lemmas to the same wordform. A conflict does not necessarily mean that either source is wrong: many conflicts are spelling variants or cross-language differences. Only true conflicts represent genuine disagreements about word identity.
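The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each source is available as a plain wordform-to-lemma mapping; the names `Conflict` and `find_conflicts` and the sample data are hypothetical and not part of RunoVerse.

```python
from typing import NamedTuple


class Conflict(NamedTuple):
    wordform: str
    corpus_lemma: str
    ai_lemma: str


def find_conflicts(corpus: dict[str, str], ai: dict[str, str]) -> list[Conflict]:
    """Return wordforms present in both sources whose lemmas differ."""
    shared = sorted(corpus.keys() & ai.keys())
    return [
        Conflict(w, corpus[w], ai[w])
        for w in shared
        if corpus[w] != ai[w]
    ]


# Made-up example data, not taken from the corpus.
corpus_lemmas = {"olen": "olla", "saan": "saama", "kulda": "kuld"}
ai_lemmas = {"olen": "olema", "saan": "saama", "kulda": "kulta"}
conflicts = find_conflicts(corpus_lemmas, ai_lemmas)
```

Deciding whether a given conflict is a true conflict, a spelling variant, or a cross-language difference needs additional rules that are not described here, so the sketch stops at detection.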

Conflict Classes

True conflict: different lemma interpretations (216K).
Spelling variant: same word, different orthography (99K).
Cross-language: Estonian vs Finnish lemma (10K).

Magnet Lemmas

Magnet lemmas are common lemmas (such as “ei”, “saama”, “olla”) to which the corpus pipeline assigns many wordforms, even when the AI suggests different lemmas. They reveal systematic biases in the annotation pipeline.
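One way to surface magnet lemmas is to count the distinct conflicting wordforms attached to each corpus lemma. The sketch below assumes conflicts are available as `(wordform, corpus_lemma, ai_lemma)` tuples; the function names and the placeholder rows are hypothetical.

```python
from collections import defaultdict


def attracted_forms(conflicts):
    """Map each corpus lemma to the set of distinct conflicting wordforms it attracts."""
    forms = defaultdict(set)
    for wordform, corpus_lemma, _ai_lemma in conflicts:
        forms[corpus_lemma].add(wordform)
    return dict(forms)


def top_magnets(conflicts, n=10):
    """Return the n corpus lemmas with the most attracted wordforms, largest first."""
    counts = {lemma: len(ws) for lemma, ws in attracted_forms(conflicts).items()}
    # Sort by descending count, then alphabetically to keep ties stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Placeholder rows (wordform, corpus_lemma, ai_lemma); not real corpus data.
rows = [
    ("form_a", "ei", "x"),
    ("form_b", "ei", "y"),
    ("form_c", "ei", "z"),
    ("form_d", "saama", "w"),
    ("form_a", "ei", "x"),  # repeated wordform is counted once
]
magnets = top_magnets(rows)
```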

Current Pick

For each conflict, RunoVerse currently uses a selection algorithm that considers corpus unanimity, count ratios, and cross-language signals. The “current pick” shows what the system chose.
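The text names the three signals but not how they are combined, so the following is only an illustrative sketch, not the RunoVerse algorithm. The rule order, the `ratio_threshold` value, and every field name are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ConflictRow:
    corpus_lemma: str
    corpus_count: int
    ai_lemma: str
    ai_count: int
    corpus_unanimous: bool  # assumed flag: all corpus annotations agree
    cross_language: bool    # assumed flag: conflict class is cross-language


def current_pick(row: ConflictRow, ratio_threshold: float = 2.0) -> str:
    """Pick a lemma using assumed rules; the real selection logic may differ."""
    # Assumed rule 1: a cross-language difference is not an error, keep the corpus lemma.
    if row.cross_language:
        return row.corpus_lemma
    # Assumed rule 2: a unanimous corpus with at least as much support wins.
    if row.corpus_unanimous and row.corpus_count >= row.ai_count:
        return row.corpus_lemma
    # Assumed rule 3: the AI lemma wins only when it clearly dominates by count.
    if row.ai_count >= ratio_threshold * row.corpus_count:
        return row.ai_lemma
    # Assumed default: fall back to the corpus annotation.
    return row.corpus_lemma


ai_dominant = ConflictRow("a", 10, "b", 30, False, False)
close_call = ConflictRow("a", 10, "b", 15, False, False)
pick_dominant = current_pick(ai_dominant)
pick_close = current_pick(close_call)
```

With these placeholder rules, `pick_dominant` is the AI lemma because 30 is at least twice 10, while `pick_close` stays with the corpus lemma.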

Conflict Class Distribution

Corpus vs AI Dominance

Most Contested Wordforms

Wordforms with the highest combined annotation counts where corpus and AI disagree:

Wordform	Corpus Lemma	Corpus Count	AI Lemma	AI Count	Class	Current Pick

Lemma	Attracted Forms	Unanimity	True Conflicts	Spelling	Cross-Lang

Top Suffix Patterns in Conflicts

Systematic morphological patterns in conflicting lemma assignments:

#	Wordform Suffix	Lemma Suffix	Conflicts
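A suffix pattern of this kind can be approximated by stripping the longest shared prefix from a wordform and its lemma and counting the leftover endings. This is a sketch under assumptions: the page does not say how suffixes are extracted or which lemma (corpus or AI) the "Lemma Suffix" column refers to, and the function names and sample pairs are hypothetical.

```python
from collections import Counter
from os.path import commonprefix


def suffix_pair(wordform: str, lemma: str) -> tuple[str, str]:
    """Strip the longest shared prefix; return (wordform suffix, lemma suffix)."""
    k = len(commonprefix([wordform, lemma]))
    return wordform[k:], lemma[k:]


def top_suffix_patterns(pairs, n=10):
    """Count (wordform suffix, lemma suffix) patterns, most frequent first."""
    counts = Counter(suffix_pair(w, l) for w, l in pairs)
    return counts.most_common(n)


# Illustrative (wordform, lemma) pairs; not real conflict data.
pairs = [("majad", "maja"), ("kalad", "kala"), ("laulan", "laulma")]
patterns = top_suffix_patterns(pairs)
```

An empty lemma suffix, as in the first two pairs, means the lemma is a prefix of the wordform.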

Magnet Lemma Size Distribution

How many lemmas attract a given number of conflicting wordforms:
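Such a distribution can be derived from per-lemma counts of attracted wordforms. The sketch below assumes those counts are already available as a dictionary; the function name and the numbers are hypothetical, and a real chart would probably group sizes into bins.

```python
from collections import Counter


def size_distribution(attracted_counts: dict[str, int]) -> dict[int, int]:
    """Map each magnet size to the number of lemmas that have exactly that size."""
    return dict(sorted(Counter(attracted_counts.values()).items()))


# Made-up counts of attracted wordforms per lemma.
example_counts = {"ei": 3, "saama": 1, "olla": 1}
distribution = size_distribution(example_counts)
```

Here two lemmas attract one wordform each and one lemma attracts three.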