Verse Path Finder
Find the shortest chain of similar verses connecting any two verses in the Finnic runosong corpus (4.29M verses, 289,702 poems).
What this does
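Finds the shortest chain (fewest hops) of similar verses connecting two verses through the similarity network. It uses bidirectional breadth-first search, expanding from both endpoints at once until the two frontiers meet. Because each side only has to cover about half of the path, this visits far fewer verses than a search that starts from one end only.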
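A minimal sketch of bidirectional BFS, for illustration only and not the app's actual code. It assumes the similarity graph is exposed through a hypothetical `neighbors(verse_id)` callback and treats edges as undirected. Top-5 neighbor lists are not necessarily symmetric, so the backward search needs reverse edges as well. Expanding the smaller frontier first is a common optimization chosen here, not something the tool is documented to do.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def bidirectional_bfs(
    start: Hashable,
    goal: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    max_hops: int,
) -> Optional[List[Hashable]]:
    """Fewest-hop path from start to goal, or None if none exists within max_hops.

    Assumes an undirected graph: neighbors(v) must list every verse linked to v
    in either direction, so the backward search can walk edges from the goal.
    """
    if start == goal:
        return [start]
    # Parent pointers per side; their key sets double as "visited" sets.
    fwd_parents: Dict[Hashable, Optional[Hashable]] = {start: None}
    bwd_parents: Dict[Hashable, Optional[Hashable]] = {goal: None}
    fwd_frontier, bwd_frontier = [start], [goal]
    hops = 0  # combined depth explored by both sides so far
    while fwd_frontier and bwd_frontier and hops < max_hops:
        # Expand one full level of the smaller frontier.
        expand_fwd = len(fwd_frontier) <= len(bwd_frontier)
        frontier = fwd_frontier if expand_fwd else bwd_frontier
        parents, other = (fwd_parents, bwd_parents) if expand_fwd else (bwd_parents, fwd_parents)
        next_frontier = []
        for node in frontier:
            for nb in neighbors(node):
                if nb in parents:
                    continue
                parents[nb] = node
                if nb in other:  # frontiers meet: join the two half-paths
                    return _stitch(nb, fwd_parents, bwd_parents)
                next_frontier.append(nb)
        hops += 1
        if expand_fwd:
            fwd_frontier = next_frontier
        else:
            bwd_frontier = next_frontier
    return None


def _stitch(meet, fwd_parents, bwd_parents):
    """Rebuild start -> meet from forward parents, then meet -> goal from backward parents."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = fwd_parents[node]
    path.reverse()
    node = bwd_parents[meet]
    while node is not None:
        path.append(node)
        node = bwd_parents[node]
    return path


# Toy graph with hypothetical verse ids.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(bidirectional_bfs("a", "e", lambda v: graph[v], max_hops=4))  # ['a', 'b', 'd', 'e']
print(bidirectional_bfs("a", "e", lambda v: graph[v], max_hops=2))  # None
```

With `max_hops=2` no path is returned because the shortest chain needs three hops, which mirrors the "increase max hops if no path is found" tip.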
How similarity works
- Jaccard: shared exact wordforms between verse lines
- TF-IDF: shared lemmas (dictionary forms), weighted by rarity
- Translation: shared English translations, enabling cross-lingual Estonian-Finnish matches
- CharBigram: character bigram overlap, capturing surface-level textual similarity
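The two surface-level measures can be sketched as set overlaps. This is an illustration under stated assumptions: tokenization is a plain lowercase whitespace split, and the bigram score is assumed to be a Jaccard index over character-bigram sets. The app's exact tokenizer and formula are not documented here. TF-IDF and Translation need a lemmatizer and a translation table, so they are omitted.

```python
def wordform_jaccard(a: str, b: str) -> float:
    """Jaccard index over the sets of exact (lowercased) wordforms."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def char_bigrams(text: str) -> set:
    """All overlapping two-character substrings, spaces included."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_overlap(a: str, b: str) -> float:
    """Jaccard index over character-bigram sets (one plausible overlap measure)."""
    ba, bb = char_bigrams(a), char_bigrams(b)
    union = ba | bb
    return len(ba & bb) / len(union) if union else 0.0


v1 = "Vaka vanha Väinämöinen"
v2 = "vanha vaka Väinämöinen"  # same words, different order
v3 = "Vaka vanha Väinämöisen"  # inflected form of the name
print(wordform_jaccard(v1, v2))  # 1.0
print(wordform_jaccard(v1, v3))  # 0.5
print(round(bigram_overlap(v1, v3), 2))  # 0.84
```

The inflected name counts as a different wordform, so Jaccard drops to 0.5, while character bigrams still see most of the shared surface text. This is why combining measures helps connectivity.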
In "All (RRF)" mode, the neighbor lists from all four algorithms are merged with Reciprocal Rank Fusion (RRF). In single-algorithm mode, only that algorithm's neighbors are followed.
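Reciprocal Rank Fusion scores each candidate by summing `1 / (k + rank)` over every ranked list it appears in. The sketch below assumes each algorithm supplies an ordered list of neighbor ids and uses `k = 60`, a commonly used default. The app's actual constant and list lengths are not stated, and `rrf_merge` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_merge(rankings: Dict[str, List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse per-algorithm neighbor rankings with Reciprocal Rank Fusion.

    Each verse scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores come first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings.values():
        for rank, verse_id in enumerate(ranking, start=1):
            scores[verse_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical neighbor lists for one verse.
neighbor_lists = {
    "jaccard": ["v2", "v3", "v5"],
    "tfidf": ["v3", "v2"],
    "translation": ["v7", "v3"],
    "charbigram": ["v2", "v5"],
}
for verse_id, score in rrf_merge(neighbor_lists):
    print(verse_id, round(score, 4))
```

Because RRF uses only ranks, it can merge lists whose raw scores are on different scales. Here `v2` and `v3` each appear in three lists, but `v2` ranks first because it holds two first places.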
Reading the path
- Each card is a verse; green endpoints are your selected verses
- The percentage on each arrow is the similarity score between adjacent verses
- Cross-lingual edges (ET/FI) are marked with a special badge
- Click any verse card to open it in the Verse Network explorer
Enriched search
When enabled, the path finder computes additional Jaccard (exact wordform) matches on the fly against all cached chunks, discovering connections that fell below the pre-computed top-5 cutoff. This greatly improves connectivity between verses that share vocabulary.
- Only discovers Jaccard (exact wordform) matches, not TF-IDF, Translation, or CharBigram matches
- Only searches within already-loaded chunks (up to 30 cached)
- Enriched edges are marked with a Computed badge in hop details
- Disable for faster searches when pre-computed neighbors suffice
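Enrichment can be pictured as pre-computed edges plus extra Jaccard edges scored against verses that are already cached. This is a hypothetical sketch, not the tool's implementation. The `Edge` type, the `enriched_neighbors` function, the chunk layout (`{chunk_id: {verse_id: text}}`), and the `min_score` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Edge:
    target: str
    score: float
    computed: bool  # True = found on the fly ("Computed" badge), False = pre-computed


def jaccard(a: str, b: str) -> float:
    """Jaccard index over lowercased wordform sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def enriched_neighbors(
    verse_id: str,
    verse_text: str,
    precomputed: Dict[str, List[Edge]],
    cached_chunks: Dict[str, Dict[str, str]],
    min_score: float = 0.5,  # hypothetical cutoff, not a documented setting
) -> List[Edge]:
    """Pre-computed edges plus extra Jaccard edges found in cached chunks only."""
    edges = list(precomputed.get(verse_id, []))
    seen = {e.target for e in edges} | {verse_id}
    for chunk in cached_chunks.values():  # uncached chunks are never fetched
        for other_id, other_text in chunk.items():
            if other_id in seen:
                continue
            score = jaccard(verse_text, other_text)
            if score >= min_score:
                edges.append(Edge(other_id, score, computed=True))
                seen.add(other_id)
    return sorted(edges, key=lambda e: e.score, reverse=True)


# Toy data: "v9" is a pre-computed neighbor that lives in an uncached chunk.
precomputed = {"v1": [Edge("v9", 0.8, computed=False)]}
cached_chunks = {
    "chunk-0": {
        "v1": "vaka vanha väinämöinen",
        "v2": "vanha vaka väinämöinen",
        "v3": "vaka vanha väinämöisen",
        "v4": "mieleni minun tekevi",
    }
}
for edge in enriched_neighbors("v1", "vaka vanha väinämöinen", precomputed, cached_chunks):
    print(edge)
```

The loop over every cached verse is the extra cost that makes enriched searches slower. The sketch does not model the 30-chunk cache limit; it simply scans whatever it is given.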
Tips
- Start with max hops = 3 or 4; increase if no path is found
- Enable "Enriched" mode to discover more connections through shared wordforms
- Switching to "Translation only" mode can make cross-lingual paths easier to find
- Very common formulaic verses act as "hubs"; paths often pass through them
- If two verses have no path within 7 hops with enrichment, they may be in disconnected parts of the similarity graph