Matlab SPRIT (a prototype)
Matlab SPRIT is a program (about 1000 lines of Matlab code) that
- Identifies various fixed poetic forms. The
forms depend on various types of features and combinations of features
- Identifies some less fixed forms that depend on repetition
(of verses, lines or features)
- Provides statistics on some textual features
- Displays features graphically
It's a prototype (written in Matlab) of a program being written in Java.
It's hoped that the full version will be available (free) in summer 2007.
Fixed Forms currently identified are
- Sonnet - 6 variants, including Shakespearian. Reports
how close the rhythm is to the metre. (uses line and syllable count;
rhyme check; rhythm check)
- Haiku - (uses line and syllable count)
- Limerick - (uses line count, and rhyme check)
- Villanelle - (uses line count, line-repetition, and rhyme check)
- Word Square - where the lines have the same number of words (which is the number of lines)
- Syllable Square - where the lines have the same number of syllables (which is the number of lines)
- Terza Rima - (uses lines-per-stanza check, rhyme check)
- Sestina - (uses line count and word repetition)
- Rondeau - (uses line count, syllable count and word repetition)
More General Forms identified are
- Boxed Poem - a poem of at least 3 stanzas
where all lines are roughly the same length and all stanzas (except
perhaps the final one) have the same number of lines. Line-length tolerance is
adjustable: by default it's 20%.
- Boxed Poem except for the final stanza - where all but the final stanza are the
same boxed shape (same lines/stanza and roughly the same width) and the
final stanza (which may be a single line) is about the same width as the
other stanzas. Line-length tolerance is adjustable: by default it's 20%.
- Syllabic - where the syllables-per-line pattern is the same in all stanzas.
- Word Stanza Pattern - where the words-per-line pattern is the same in all stanzas.
- Regular Rhyme - where the rhyme pattern is the same in all stanzas.
Statistics gathered include the per-stanza endstop figures that appear in the sample output.
The following sample output was produced when the program
was given 14 poems
*** Sonnet ***
It's a Shakespearian sonnet
Per-stanza endstops: 0.50
*** Haiku ***
It's a haiku
*** Limerick ***
It's a limerick
*** Villanelle ***
It's a villanelle
Per-stanza endstops: 0.75 0.50 0.50 0.67 0.50 0.88
*** Word Square ***
It's a word square
*** Syllable Square ***
It's a syllable square
*** Dante ***
It's terza rima
Per-stanza endstops: 0.50 0.00 0.50 0.50 0.33
*** Box ***
Except for the final stanza, it's a boxed poem
*** Tyger ***
It's a regular rhyme
Per-stanza endstops: 0.38 0.75 0.88 0.75 0.75 0.38
*** Wheelbarrow ***
It's a word-stanza poem
*** Rondeau ***
It's a rondeau
*** Hymn ***
It's a syllabic poem
Per-stanza endstops: 0.69 0.62 0.69 0.50 0.69 0.50 0.69
*** Mona Lisa ***
*** Sestina ***
It's a sestina
Per-stanza endstops: 0.50 0.50 0.25 0.42 0.50 0.25 0.67
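As a rough illustration of how checks like those described for the haiku, word square and syllable square could lead to verdict lines like the ones in the sample output, here is a minimal sketch in Python (not the prototype's Matlab, and not the SPRIT code). The function names and the example text are made up. Syllable counts are passed in by hand, and the 5-7-5 haiku pattern is the conventional one, assumed here because the page only says that line and syllable counts are used.

```python
def is_haiku(syllables_per_line):
    """Conventional 5-7-5 check; the pattern is an assumption of this sketch."""
    return list(syllables_per_line) == [5, 7, 5]


def is_word_square(lines):
    """Every line has the same number of words, equal to the number of lines."""
    n = len(lines)
    return n > 0 and all(len(line.split()) == n for line in lines)


def is_syllable_square(syllables_per_line):
    """Every line has the same number of syllables, equal to the number of lines."""
    n = len(syllables_per_line)
    return n > 0 and all(count == n for count in syllables_per_line)


def report(title, lines, syllables_per_line):
    """Build a report in the style of the sample output: title, then verdicts."""
    out = [f"*** {title} ***"]
    if is_haiku(syllables_per_line):
        out.append("It's a haiku")
    if is_word_square(lines):
        out.append("It's a word square")
    if is_syllable_square(syllables_per_line):
        out.append("It's a syllable square")
    return "\n".join(out)


# Made-up text: three lines of three words; syllable counts entered by hand.
lines = ["one two three", "four five six", "seven eight nine"]
syllables = [3, 3, 4]
print(report("Example", lines, syllables))
```

For this made-up text only the word-square test succeeds, so the report is the title line followed by "It's a word square". A text that matches none of the checks would produce the title line alone.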
Graphical output is still under development - it's not clear how useful it is.
Color-coding aims to point
out the repeated blocks of lines, the rhyme pattern, and the beats. The
length of the rows shows the number of syllables. The picture below
illustrates "All Things Bright and Beautiful" and a sonnet
Several poems can be analysed together. Their characteristics can be
averaged, or trends can be studied.
Below are graphs for groups of poems showing a) "stanza-length" "Summary" - the
overall distribution of stanza-lengths, and b) "stanza-length" "Trend" - how the average stanza length
changed from poem to poem
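The two group views mentioned above can be computed from nothing more than each poem's lines-per-stanza list. The following Python sketch shows one way to do it. The function names and the three sample poems are invented for illustration, and the program's own calculation may differ.

```python
from collections import Counter


def stanza_length_summary(poems):
    """Overall distribution: stanza length -> number of stanzas of that length."""
    return Counter(length for poem in poems for length in poem)


def stanza_length_trend(poems):
    """Average stanza length of each poem, in the order the poems were given."""
    return [sum(poem) / len(poem) for poem in poems]


# Hypothetical group of three poems, each given as its lines-per-stanza list.
poems = [
    [4, 4, 4, 2],        # three quatrains and a couplet
    [3, 3, 3, 3, 3, 4],  # five tercets and a quatrain
    [2, 2, 2, 2],        # four couplets
]

summary = stanza_length_summary(poems)
trend = stanza_length_trend(poems)
for length in sorted(summary):
    print(f"{length}-line stanzas: {summary[length]}")
print("Average stanza length per poem:", [round(t, 2) for t in trend])
```

The summary is what a histogram would be drawn from, and the trend list is what a poem-by-poem line graph would plot.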
Statistical analysis of texts pre-dates computers. Initially work was
mostly related to word-frequency analysis. More recently, stylistic
analysis has developed, and has helped with forgery detection. The
work here combines ideas from several fields
- Computational Linguistics
- Code Analysis and Software Metrics - the more organised the text,
the more ideas can be used from this discipline
- More sophisticated pattern-matching strategies are being used on texts - see for example a neural-net approach
As many features as possible are codified into lists of numbers. For example,
William Carlos Williams' The Red Wheelbarrow can be codified as follows
- Syllables per line: 4 2 0 3 2 0 3 2 0 4 2
- Lines per stanza: 2 2 2 2
- Beat pattern for line 2: 0 1 (i.e. no beat on syllable 1, but a beat on syllable 2)
etc. An advantage of working this way is that the same fuzzy matching
algorithms can be used on different features - the code to see if all stanzas
have the same rhyme pattern is very similar to the code that checks if
all stanzas have the same syllable-per-line pattern.
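To make the number-list idea concrete, here is a small Python sketch in which one routine checks any per-line feature for a repeated stanza pattern. It assumes that the 0 entries in the per-line list mark the blank lines between stanzas. The function names are made up, the words-per-line list was counted by hand for this example, and exact equality stands in for the fuzzy comparison the program uses.

```python
def split_stanzas(per_line_values):
    """Split a per-line list into stanzas, treating 0 as a stanza break."""
    stanzas, current = [], []
    for value in per_line_values:
        if value == 0:
            if current:
                stanzas.append(current)
            current = []
        else:
            current.append(value)
    if current:
        stanzas.append(current)
    return stanzas


def same_pattern_in_all_stanzas(per_line_values):
    """True if there are at least two stanzas and all have the same sequence."""
    stanzas = split_stanzas(per_line_values)
    return len(stanzas) > 1 and all(s == stanzas[0] for s in stanzas)


# Syllables per line for The Red Wheelbarrow, as listed in the text.
syllables = [4, 2, 0, 3, 2, 0, 3, 2, 0, 4, 2]
# Words per line for the same poem (counted by hand for this example).
words = [3, 1, 0, 3, 1, 0, 3, 1, 0, 3, 1]

print([len(s) for s in split_stanzas(syllables)])  # lines per stanza
print(same_pattern_in_all_stanzas(syllables))      # same syllable pattern?
print(same_pattern_in_all_stanzas(words))          # same word pattern?
```

The same function serves both features: the syllable pattern varies between stanzas (4 2 versus 3 2), while the words-per-line pattern (3 1) repeats in every stanza.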
The final program supports plug-ins so that extra feature-detection can
be added by users.
- Core - The core program converts a piece of text into a Poem object - an internal
representation of the poem and its features. It also has some utility features:
it can display the poem and its phoneme representation, for example. It
can save the static information of the Poem object into an XML file
- Plug-ins - The core supports plug-ins (code fragments that can add
functionality). Plug-ins have access to the Poem object and the core's utilities.
- Batch - The core offers a way to process many texts, letting the user
choose once which features to report and how to output the results
XML is a format for text files that lets structured information be stored.
Routines exist in many languages to extract the information from such files.
Rather than write plug-ins, users can write programs that analyse information
in the XML files. This offers more flexibility than plug-ins, but these
programs won't be able to access the core's utility functions.
Some of the processing done by the core is straightforward (line-counting, etc)
but it also performs text-to-phoneme (TTP) conversion (phonemes are units of sound) so that syllables can be
counted and rhyme analysis can be done. This introduces several difficulties
- Dialects and accents
- Conversion errors
- Parsing ambiguity - Words like "several" can be vocalised in more than one
way - with either 2 or 3 syllables. In poetry, the prevailing rhythm
often determines the output - in a poem with a regular beat, 'The man ate
several biscuits' will have a unique reading. Ideally, the program should delay
syllable analysis until it has determined the prevailing beat, but it can't
determine the beat until it's analysed the syllables. The plan is that a
first pass will check to see if the poem is close to a form. If it is, a
second pass will adjust the syllables to see if a closer match can be obtained.
Beat analysis is even more subjective than TTP conversion.
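The two-pass plan for ambiguous words such as "several" can be sketched as follows in Python. This only illustrates the idea and is not the program's code. The function name, the way alternative syllable counts are stored, and the target of 8 syllables are all assumptions made for the example.

```python
from itertools import product


def best_syllable_reading(word_options, target):
    """Choose one syllable count per word so the line total is closest to target.

    word_options is a list of tuples, one per word, holding the syllable
    counts that word may take. Returns (chosen_counts, total).
    """
    best = min(product(*word_options),
               key=lambda counts: abs(sum(counts) - target))
    return list(best), sum(best)


# 'The man ate several biscuits': "several" may have 2 or 3 syllables.
line = [("The", (1,)), ("man", (1,)), ("ate", (1,)),
        ("several", (2, 3)), ("biscuits", (2,))]
options = [counts for _, counts in line]

# First pass: take the first listed count for every word.
first_pass_total = sum(counts[0] for counts in options)

# Second pass: suppose the first pass pointed to a form with 8-syllable
# lines (a hypothetical target), and see whether another reading fits better.
chosen, total = best_syllable_reading(options, target=8)
print(first_pass_total, chosen, total)
```

Trying every combination is fine for a single line with one or two ambiguous words. A line with many ambiguous words would need a less brute-force search.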
Fuzzy matching will be especially important when
the text-to-phoneme translator is used, because it's bound to get some things 'wrong'.
Besides, poets like bending the rules - repeated lines in villanelles often don't
match exactly, and in Blake's "Tyger, Tyger"
the first and last verses differ by only one word.
Our Fuzzy Matching routine lets us choose the tolerance. Here are examples
of its output when matching the phrase "once upon a time"
| Phrase | Tolerance (%) | Output |
| "once upon a time" | 20 | 1 |
| "once upon a tim" | 20 | 0.9375 |
| "once upon a " | 20 | 0 |
| "once upon a " | 40 | 0.7500 |
| "upon a time" | 20 | 0 |
| "upon a time" | 40 | 0.6875 |
| "once upon the time" | 40 | 0.8333 |
An output of 1 means an exact match. An output of 0 means that there was no
match within the requested tolerance.
The routine works just as well when asked to compare 2 lists of numbers.
The program uses this routine in several places to offer the user configurable
tolerances when making comparisons.
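The page does not say how the matching routine is implemented. As one possible reading, the Python sketch below scores two sequences as 1 minus their edit (Levenshtein) distance divided by the longer length, and returns 0 when the mismatch exceeds the tolerance. The function names and the scoring rule are assumptions of this sketch. The rule happens to give the same figures as the examples above for those inputs, but that does not show that the original routine works this way.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def fuzzy_match(a, b, tolerance_percent=20):
    """Similarity in [0, 1], or 0 if the mismatch exceeds the tolerance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    mismatch = edit_distance(a, b) / longest
    if mismatch > tolerance_percent / 100:
        return 0.0
    return 1 - mismatch


target = "once upon a time"
for phrase, tol in [("once upon a tim", 20), ("once upon a ", 20),
                    ("once upon a ", 40), ("once upon the time", 40)]:
    print(repr(phrase), tol, round(fuzzy_match(target, phrase, tol), 4))

# The same function accepts lists of numbers, e.g. syllables per line.
print(fuzzy_match([4, 2, 3, 2], [4, 2, 3, 3], 40))
```

Because the distance function only compares items for equality, the same call works for strings, syllable counts, rhyme labels or beat patterns.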
- Identify more forms
- Make the analysis of groups of poems more configurable
- Introduce the notion of rhyme strength
- Add this prototype to the production version - the full version has a
text-to-phoneme translator allowing automated syllable-counting and end-rhyme detection. In this prototype these latter features are detected manually.
- Detect irregular patterns of sonic features
- Detect Metaphors - See "A Computer Method for Recognising Metaphors in Sentences", Dan Foss, "Research in Humanities Computing", V.3, 1994, p. 127-144 (it uses Prolog)
- Can the XML output files (rather than the raw text) be loaded in? Can they
be edited? Can the edited version's beat pattern etc. override what the program calculates?
- Analysing Sound Patterns (a previous project)
- "Virtual Verse Analysis: Analysing
Patterns in Poetry" in "Literary and Linguistic Computing", V21, Suppl
Issue 2006, by Marc R. Plamondon - "This article discusses the problem of
computer identification of some basic patterns in poetry: rhythm and
rhyme" - see http://llc.oxfordjournals.org/cgi/content/short/21/suppl_1/127
The program (called "AnalysePoems", written in Visual Basic + .NET) was used
to classify poems in a database. It
doesn't do TTP conversion - it looks up syllable decomposition, stress and
rhyme in a database. When given a word not in the database it asks the user
to provide syllable decomposition and accent info. Even though it doesn't
focus on iambic pentameter it rather assumes a regular metrical form.
- "Virtual Muse: Experiments in Computer Poetry", Hartman, 1996
- "The Connectionist model of poetic meter", Hayward, 1996 (in "Poetics" V20)
- "Computer-Assisted Phonetic Analysis of English Poetry: A Preliminary Case Study of Browning and Tennyson" by Marc R. Plamondon, from "TEXT Technology"
23rd May 2007