Poem Length vs Shape Distribution
Methodology disclaimer
This page measures emotional arcs by counting positive vs negative emotion-family wordforms per verse against a 60,707-entry Estonian/Finnish lexicon. It does NOT understand negation ("ei ole kurb" scores negative), quoted speech, or formulaic epithets ("vaene olin, vaene olen"). Only ~17% of verses contain any emotion token; arcs for poems with <15% scorable verses are excluded. Treat results as a heuristic summary of affective vocabulary density, not as a validated emotion measurement.
What is this page?
This page tracks how emotions rise and fall across the verses of a runosong. For each poem, every verse is checked against a 60,707-word Estonian/Finnish emotion lexicon. Words are classified into emotion families (e.g. kurb = sadness, rõõm = joy), and each family is either positive or negative. The resulting verse-by-verse score creates an emotional arc — a line showing how the poem's mood shifts from beginning to end.
Around 92,000 poems have enough emotion words to produce a meaningful arc. These arcs are grouped into 8 shapes (clusters) by similarity, so you can see which emotional patterns are common across the tradition.
Tab-by-tab guide
1. Poem tab
View the emotional arc of a single poem. Type a poem ID in the search box (e.g. SKVR X1 1 or H II 3) or press Random to load a randomly chosen poem.
Once a poem loads you will see:
- Arc chart (left) — a line graph of the poem's per-verse emotion score. Each point is one verse. Points above the middle line lean positive (joy, love, beauty); points below lean negative (sadness, anger, fear).
- Dashed "avg" line — the corpus-wide average arc, computed as the weighted mean of all 8 cluster centroids. It shows where a "typical" poem sits emotionally. If your poem's line sits well above or below the average, that poem is emotionally more intense or unusual than the norm.
- Verse list (right) — every verse of the poem, with a coloured badge showing its emotion direction (+ positive, − negative, · neutral) and the dominant emotion family for that verse.
- Emotion word highlighting — toggle "Show emotion words" to see which specific words in each verse matched the lexicon and which family they belong to.
- Key numbers strip — the row of summary boxes at the top: verse count, scorable verse %, overall valence, dominant emotion family, and which shape cluster this poem belongs to.
- Arc statistics — numerical descriptors of the arc: start/end valence, trajectory slope, variance (how much the arc jumps around), flip count (how many times the arc crosses from positive to negative or vice versa), and coverage (what percentage of verses had any emotion word).
- Emotion family breakdown — a stacked bar showing how often each emotion family appears in this poem.
- Cluster navigation — "Prev" and "Next" buttons let you browse other poems in the same shape cluster.
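As a rough illustration of the arc statistics listed above, the sketch below computes them from a per-verse score list. The exact definitions used by the page are not given here, so the least-squares slope, the population variance, and the rule that neutral verses are skipped when counting flips are all assumptions; `arc_stats` is a hypothetical helper name.

```python
from statistics import pvariance

def arc_stats(scores):
    """Summarise a per-verse valence series (values in [-1, 1], 0 = neutral)."""
    n = len(scores)
    # Assumed definition of "trajectory slope": least-squares fit of
    # score against verse index.
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    slope = sxy / sxx if sxx else 0.0
    # Assumed flip rule: count sign changes between consecutive
    # non-neutral verses, ignoring zeros in between.
    signs = [1 if y > 0 else -1 for y in scores if y != 0]
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    # Approximation: a verse with balanced positive and negative words
    # would also score 0, so this can undercount true coverage.
    coverage = sum(1 for y in scores if y != 0) / n
    return {
        "start": scores[0],
        "end": scores[-1],
        "slope": slope,
        "variance": pvariance(scores),
        "flips": flips,
        "coverage": coverage,
    }

stats = arc_stats([1.0, 0.0, -1.0, -1.0, 0.0, 1.0])
```

In this toy arc the mood goes positive, negative, then positive again, so `stats["flips"]` is 2, while the symmetric shape gives a slope of zero.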
Compare mode: press Compare (or the C key) to load a second poem side-by-side. Both arcs appear on the same chart so you can see how two poems differ emotionally.
Export: press Export (or E) to download a JSON file with the poem's raw arc data, verse scores, and metadata.
2. Shapes tab
An overview of all 8 emotion-arc clusters. Each card shows one shape — a typical emotional trajectory shared by thousands of poems.
On each card you will see:
- Mini arc chart — the cluster's centroid (the average arc of all poems in that cluster). The dashed grey line labelled avg is the corpus-wide average for comparison.
- Shape name — labels like "Arm-dominant (steady)" or "Kurb-dominant (mild-fall)" describe which emotion family dominates and how the arc moves. "Arm" means love/affection, "Kurb" means sadness.
- Descriptor chips — four small badges summarising each shape:
- ↘ falling / ↗ rising / → flat — the overall direction of the arc. "Falling" means the poem tends to end more negative than it began; "rising" means it ends more positive; "flat" means it stays roughly level.
- σ (sigma) — standard deviation: how much the arc wobbles. Low σ means a smooth, steady arc; high σ means a turbulent one.
- flips — how many times the centroid crosses the zero line (switches between positive and negative mood).
- Δ (delta) — the difference between the end and the start of the arc. Positive Δ means the poem ends happier than it began; negative means it ends sadder.
- Statistics — number of poems, ET/FI split, median verse count, top emotion families, year range.
- Language bar — coloured strip showing the proportion of Estonian vs Finnish poems.
- Most representative poems — the poems whose arc is closest to the cluster centroid (highest cosine similarity). Click any poem ID to view it in the Poem tab.
- Checkbox — tick two cards to open a side-by-side comparison of their centroids and statistics.
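The dashed avg line on each card is described as a weighted mean of the cluster centroids. A minimal sketch of that calculation follows, assuming the weights are the poem counts of each cluster (the page does not state the weights) and using two toy 4-bucket centroids in place of the real 8 centroids of 20 buckets. The function name `weighted_mean_arc` is hypothetical.

```python
def weighted_mean_arc(centroids, sizes):
    """Average cluster centroids, weighting each by its assumed poem count."""
    total = sum(sizes)
    length = len(centroids[0])
    return [
        sum(c[i] * w for c, w in zip(centroids, sizes)) / total
        for i in range(length)
    ]

# Toy data: two clusters, four buckets each.
centroids = [[0.4, 0.2, 0.0, -0.2], [-0.2, -0.2, -0.2, -0.2]]
sizes = [300, 100]

avg = weighted_mean_arc(centroids, sizes)

# The delta chip for the first cluster: end of the arc minus its start.
delta = centroids[0][-1] - centroids[0][0]
```

Because the first cluster is three times larger, the average sits closer to its centroid; its negative `delta` corresponds to a "falling" chip.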
3. Similar tab
Find poems whose emotional arc looks like a specific poem's arc.
How to "anchor" a poem:
- First load any poem on the Poem tab (search or press Random).
- Switch to the Similar tab. The poem you were viewing is automatically used as the anchor.
- Alternatively, type a different poem ID into the "Anchor poem" box.
- Press Find shape-similar.
The page computes the cosine similarity between the anchor poem's 20-bucket valence vector and every other poem's vector, then lists the closest matches. Higher scores mean more similar emotional trajectories.
Filters: use the language pills (All / ET only / FI only) to limit results by corpus. Use the cluster pills to show only poems from a specific shape cluster.
CSV: export the similarity results as a CSV file for further analysis.
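The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the page's actual code: the poem IDs, the `languages` mapping, and the `find_similar` function are hypothetical, and 4-bucket vectors stand in for the real 20-bucket ones.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_similar(anchor_id, vectors, languages, lang=None, top_k=3):
    """Rank all other poems by similarity to the anchor, optionally by language."""
    anchor = vectors[anchor_id]
    scored = [
        (cosine(anchor, vec), pid)
        for pid, vec in vectors.items()
        if pid != anchor_id and (lang is None or languages[pid] == lang)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(pid, round(score, 3)) for score, pid in scored[:top_k]]

# Hypothetical poems with toy valence vectors.
vectors = {
    "A": [1.0, 0.0, -1.0, 0.0],
    "B": [2.0, 0.0, -2.0, 0.0],   # same shape as A, larger amplitude
    "C": [-1.0, 0.0, 1.0, 0.0],   # mirror image of A
    "D": [1.0, 1.0, -1.0, -1.0],
}
languages = {"A": "et", "B": "fi", "C": "et", "D": "et"}

all_matches = find_similar("A", vectors, languages)
et_matches = find_similar("A", vectors, languages, lang="et")
```

Cosine similarity ignores amplitude, so poem B ranks first even though its scores are twice as large as the anchor's; the mirror-image poem C ranks last.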
4. Traditions tab
Compares emotional arcs across collection traditions (SKVR, Eesti regilaulude andmebaas, etc.). Each tradition has its own mean arc and cluster distribution.
- Overlay chart — all tradition mean arcs plotted together. Click legend items to show/hide individual traditions.
- Comparison table — sortable statistics for each tradition: poem count, mean valence, standard deviation, median verses, top cluster, and the ET/FI breakdown.
- Tradition cards — individual panels for each tradition, showing the mean arc, cluster distribution histogram, and dominant shape.
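A tradition's mean arc can be pictured as a bucket-by-bucket average over its poems. The sketch below assumes an unweighted mean in which every poem counts equally, which the page does not confirm; `mean_arc_by_tradition` is a hypothetical name and the 3-bucket vectors are toy data.

```python
from collections import defaultdict

def mean_arc_by_tradition(poems):
    """poems: iterable of (tradition, vector) pairs with equal-length vectors."""
    groups = defaultdict(list)
    for tradition, vector in poems:
        groups[tradition].append(vector)
    # zip(*vecs) walks the vectors bucket by bucket.
    return {
        name: [sum(col) / len(vecs) for col in zip(*vecs)]
        for name, vecs in groups.items()
    }

poems = [
    ("SKVR", [1.0, 0.0, -1.0]),
    ("SKVR", [0.0, 0.0, -1.0]),
    ("Eesti regilaulude andmebaas", [-1.0, -1.0, 0.0]),
]
tradition_arcs = mean_arc_by_tradition(poems)
```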
5. Findings tab
Corpus-wide statistics and research findings derived from the emotion arc data.
- Summary tiles — key numbers: total poems analysed, excluded poems, average scorable verse %, most/least emotional clusters.
- Decade chart — stacked bars showing how the 8 arc shapes are distributed across recording decades. Hover over a bar to see the breakdown.
- Emotion domain inventory — the full list of positive and negative emotion families used by the lexicon.
- ET vs FI comparison — side-by-side metrics for Estonian and Finnish subcorpora.
- Family frequency chart — horizontal bars showing how often each emotion family appears across the corpus.
- Most extreme poems — the poems with the highest positive and negative overall valence.
- Methodology — technical details of how arcs are computed, poems are clustered, and exclusions are applied.
- Build provenance — date and parameters of the data build.
Key concepts
Valence score
Each verse gets a score between −1 and +1. The formula is:
A verse with only positive words scores +1. A verse with only negative words scores −1. A verse with no emotion words scores 0 and is treated as neutral.
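The formula itself is not reproduced in this text. One formula consistent with the three cases just described is (positive count − negative count) / (positive count + negative count), and the sketch below uses it as an assumption. The four-word `LEXICON` is a toy stand-in for the real 60,707-entry lexicon, and whitespace tokenisation is a simplification.

```python
# Toy lexicon: wordform -> polarity (+1 positive family, -1 negative family).
LEXICON = {"kurb": -1, "viha": -1, "rõõm": +1, "ilu": +1}

def verse_valence(verse):
    """Assumed formula: (pos - neg) / (pos + neg), or 0.0 with no emotion words."""
    tokens = verse.lower().split()  # simplification: no punctuation handling
    pos = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    neg = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

only_positive = verse_valence("rõõm ja ilu")
only_negative = verse_valence("viha ja kurb")
no_emotion = verse_valence("laulan laulu")
negated = verse_valence("ei ole kurb")
```

Note that `negated` comes out as −1.0: a plain lexicon lookup sees "kurb" and has no notion that "ei ole" reverses it, which is the negation limitation described elsewhere on this page.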
20-bucket vector
Poems vary wildly in length (from 5 to hundreds of verses). To compare them, each poem's verse-by-verse valence series is resampled into exactly 20 evenly spaced buckets. This creates a fixed-length "shape fingerprint" that can be compared across poems of any length.
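One way to picture the resampling is sketched below. The page does not say how buckets are filled, so this version makes two assumptions: each bucket averages the verses that fall in its slice of the poem, and for poems shorter than 20 verses a bucket reuses the single verse at its position. The name `resample` is hypothetical.

```python
def resample(scores, buckets=20):
    """Resample a per-verse series of any length (>= 1) into a fixed number of buckets."""
    n = len(scores)
    out = []
    for b in range(buckets):
        lo = b * n // buckets
        # Guarantee at least one verse per bucket, so short poems repeat verses.
        hi = max(lo + 1, (b + 1) * n // buckets)
        chunk = scores[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

# A 40-verse poem: two verses are averaged into each bucket.
long_poem = resample([1.0] * 20 + [-1.0] * 20)

# A 5-verse poem: each verse is stretched across four buckets.
short_poem = resample([1.0, 0.0, -1.0, 0.0, 1.0])
```

Both results have exactly 20 values, so a 5-verse poem and a 40-verse poem can be compared position by position.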
Emotion families
The lexicon groups words into families by semantic meaning. Examples:
- Positive: arm (love), hea (good), ilu (beauty), rõõm (joy), õnn (happiness), naer (laughter), imestus (wonder), uhke (pride), hell (tender)
- Negative: kurb (sad), viha (anger), hirm (fear), valu (pain), häda (distress), lein (grief), nutma (crying), kuri (evil), vaev (suffering), häbi (shame)
Known limitations
- No negation handling — "ei ole kurb" (is not sad) still counts "kurb" as negative.
- No context awareness — quoted speech, irony, and formulaic epithets ("vaene olin, vaene olen") are scored the same as sincere statements.
- Sparse coverage — only about 17% of verses contain any emotion word at all. The arc is a heuristic summary of affective vocabulary density, not a validated emotion measurement.
- Exclusions — poems with fewer than 5 verses, or where fewer than 15% of verses have any emotion word, are excluded entirely.
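The exclusion rule in the last item can be expressed as a small filter. The thresholds come from the text above; the function name `is_included` and the boolean-list input are assumptions made for illustration.

```python
MIN_VERSES = 5        # poems with fewer verses are excluded
MIN_COVERAGE = 0.15   # minimum share of verses with any emotion word

def is_included(verse_has_emotion):
    """verse_has_emotion: one boolean per verse (True if any lexicon word matched)."""
    n = len(verse_has_emotion)
    if n < MIN_VERSES:
        return False
    return sum(verse_has_emotion) / n >= MIN_COVERAGE

too_short = is_included([True] * 4)                 # only 4 verses
too_sparse = is_included([True] + [False] * 9)      # 10% coverage
kept = is_included([True, True] + [False] * 8)      # 20% coverage
```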
Keyboard shortcuts
- 1–6 — switch between tabs
- R — load a random poem
- S — copy a shareable link
- C — toggle compare mode
- E — export poem data as JSON
- ← / → — previous / next poem in the same cluster
- ? — toggle shortcut overlay