What is an n-gram phrase?
An n-gram is a sequence of n consecutive words. This page shows two sizes:
- Bigrams — 2-word phrases (e.g. kulla ema "dear mother", veljeni verise "my bloody brother").
- Trigrams — 3-word phrases (e.g. kulla ema kulla). Trigrams are sparser but more distinctive.
Unlike collocates, which count any co-occurrence within a window, n-grams count only adjacent words in a fixed order, so they capture the stock phrases that singers reused verbatim.
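The difference between the two counts can be sketched in a few lines of Python. This is only an illustration, not the code behind this page: the function names, the window size of 3, the whitespace tokenization, and the choice to keep n-grams inside a single verse are all assumptions, and the verses are toy strings built from the example words above rather than real corpus lines.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every adjacent, ordered n-word sequence in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(verses, n):
    """Count n-grams verse by verse (assumption: phrases do not cross verse ends)."""
    counts = Counter()
    for verse in verses:
        counts.update(ngrams(verse.lower().split(), n))
    return counts


def count_window_pairs(verses, window=3):
    """Collocate-style count: unordered word pairs within `window` tokens."""
    counts = Counter()
    for verse in verses:
        tokens = verse.lower().split()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1:i + 1 + window]:
                counts[frozenset((left, right))] += 1
    return counts


# Toy verses, not corpus data.
verses = ["kulla ema kulla", "ema kulla", "kulla ema"]

bigrams = count_ngrams(verses, 2)
trigrams = count_ngrams(verses, 3)
pairs = count_window_pairs(verses)

print(bigrams[("kulla", "ema")])          # ordered, adjacent: 2
print(bigrams[("ema", "kulla")])          # the reverse order is a separate bigram: 2
print(pairs[frozenset(("kulla", "ema"))])  # unordered, windowed: 4
```

In this toy input the windowed count merges both word orders into one pair, while the bigram count keeps kulla ema and ema kulla apart, which is what makes n-grams suitable for spotting verbatim phrases.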
How to use
- Search a word — see the top bigrams/trigrams that contain that word.
- View pills — switch between the corpus-wide top-100 bigrams, the corpus-wide top-100 trigrams, and the word-specific views.
- Alliterative filter — restrict to phrases whose content words share an initial sound. Alliteration is the defining sound device of Kalevala-meter poetry, so alliterative phrases are often formula cores.
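A rough idea of what such a filter checks, as a sketch under stated assumptions: the "initial sound" is approximated here by the first letter, every word counts as a content word unless it appears in a hypothetical `function_words` set, and a phrase passes only if all remaining words share that letter. The page's actual detection rules are not documented here and may differ.

```python
def initial_sound(word):
    """Crude stand-in for the initial sound: the first letter, lowercased."""
    return word[0].lower()


def is_alliterative(phrase, function_words=frozenset()):
    """True if at least two content words remain and all share an initial letter.

    `function_words` is a hypothetical exclusion list, empty by default.
    """
    words = [w for w in phrase.split() if w.lower() not in function_words]
    if len(words) < 2:
        return False
    return len({initial_sound(w) for w in words}) == 1


phrases = ["veljeni verise", "kulla ema", "kulla ema kulla"]
alliterative = [p for p in phrases if is_alliterative(p)]
print(alliterative)  # ['veljeni verise']
```

A stricter implementation would compare sounds rather than letters; the first-letter shortcut is used only to keep the example small.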
Tags on a phrase row
- Allit — alliteration detected inside the phrase.
- Refrain — the phrase shows up in refrain-like distributional patterns (the same phrase repeating across adjacent verses).
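One way to picture the Refrain pattern is to count how often a phrase appears in two neighbouring verses of the same poem. The sketch below is an assumption about how such a check could work, not the page's actual rule: the function names, the `min_repeats` threshold of 2, and the toy poem (with placeholder tokens `w1`, `w2`, `w3`) are all hypothetical.

```python
def contains(tokens, phrase):
    """True if `phrase` (a tuple of words) occurs as adjacent tokens."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))


def adjacent_repeats(poem, phrase):
    """Count pairs of neighbouring verses that both contain `phrase`."""
    target = tuple(phrase.lower().split())
    hits = [contains(verse.lower().split(), target) for verse in poem]
    return sum(a and b for a, b in zip(hits, hits[1:]))


def looks_like_refrain(poem, phrase, min_repeats=2):
    """Hypothetical rule: tag the phrase once enough adjacent repeats are seen."""
    return adjacent_repeats(poem, phrase) >= min_repeats


# Toy poem, not corpus data; w1..w3 are placeholder tokens.
poem = ["kulla ema w1", "kulla ema w2", "w3 kulla ema", "veljeni verise"]

print(adjacent_repeats(poem, "kulla ema"))         # 2
print(looks_like_refrain(poem, "kulla ema"))       # True
print(looks_like_refrain(poem, "veljeni verise"))  # False
```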
Related pages
For formula clusters with variants, see Formulas; for single-word co-occurrence see Collocates; for verse-level repetition see Formulaicity.