Long weekends are good. They need not be too long to be good. A workday off on a Friday makes a weekend a long weekend. Though it’s just a single-day addition to the regular two-day weekend, the quantum of relaxation and creativity it brings is more than 33%. You could go on a long drive to the beach and have fun. Rather than long drives and beaches, a programmer’s idea of fun is to design and implement a mini-project outside of their work domain.
During one long weekend this year, occasioned by the harvest festival Makar Sankranthi falling on Friday, the 14th of January, I set out to write a program that finds the number of syllables in a word.
Perhaps the best-known poetic pattern is the iambic pentameter; Shakespeare wrote most of his verse in it. To understand what it is, we need a few terms, which Debora Schwartz explains very clearly:
- Meter: a recognizable rhythm in a line of verse consisting of a pattern of regularly recurring stressed and unstressed syllables.
- Foot/feet: a metric “foot” refers to the combination of a strong stress and the associated weak stress (or stresses) that make up the recurrent metric unit of a line of verse.
- Iamb: a particular type of metric “foot” consisting of two syllables, an unstressed syllable followed by a stressed syllable (“da DUM”); the opposite of a “trochee.” An unstressed syllable is conventionally represented by a curved line resembling a smile (a U is as close as I can get here). A stressed syllable is conventionally represented by a /. Thus, an iamb is conventionally represented as U/.
- Iambic pentameter: a ten-syllable line consisting of five iambs is said to be in iambic pentameter (“penta” = five).
Not just iambic pentameter but every rhythmic poetry pattern is built on the same basic unit, the syllable. The number of syllables in a word is essentially the number of vowel sounds you make when pronouncing it.
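In Schwartz's notation, a line's stress pattern can be written as a string of "U" and "/" marks, with one mark per syllable. The small Ruby sketch below checks whether such a pattern is exactly five iambs. The method name `iambic_pentameter?` is hypothetical. The sketch does not attempt the hard part, which is deciding which syllables in a real line are stressed.

```ruby
# Schwartz's notation: "U" for an unstressed syllable, "/" for a stressed one.
IAMB = "U/"

# Hypothetical helper: a line is in iambic pentameter if its stress
# pattern is exactly five iambs in a row.
def iambic_pentameter?(stress_pattern)
  stress_pattern == IAMB * 5
end

p iambic_pentameter?("U/U/U/U/U/")  # => true
p iambic_pentameter?("/UU/U/U/U/")  # => false (opens with a trochee, "DUM da")
syllables = (IAMB * 5).length
p syllables                          # => 10
```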
What follows is a journal of what I did during my long weekend.
I spent Friday poking around the web and taking notes (this process is more fancifully called research).
There are some general rules for counting the number of syllables in a word.
- Count the vowels in a word.
- Subtract any silent vowels.
- Subtract one vowel when two vowel sounds form one speech sound (diphthong).
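The three general rules can be approximated in a few lines of Ruby. This is my own rough sketch, not the algorithm I later ported. It treats every run of consecutive vowels (including "y") as one vowel sound, which approximates the diphthong rule. It also drops a final silent "e" except in "-le" endings. The method name `rough_syllables` is hypothetical.

```ruby
# Hypothetical helper: a rough syllable count from the three general rules.
def rough_syllables(word)
  w = word.downcase.gsub(/[^a-z]/, "")
  return 0 if w.empty?

  # Count the vowels, counting a run of consecutive vowels ("ea") as one sound.
  count = w.scan(/[aeiouy]+/).length
  # Subtract a silent final "e" ("make"), but keep it in "-le" endings ("table").
  count -= 1 if w.end_with?("e") && !w.end_with?("le") && count > 1
  [count, 1].max
end

p %w[make table reader Hyderabad].map { |w| rough_syllables(w) }  # => [1, 2, 2, 4]
```

Simple vowel-group counting like this is easy to fool, which is why more extensive rule sets exist.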
A more extensive algorithm, which I found along with its Python code, goes like this:
- If the word has fewer than three letters, return 1.
- If it doesn’t end with “ted”, “tes”, “ses”, or “ied”, discard “es” and “ed” at the end.
- Discard a trailing “e”, except where the ending is “le”; also handle the “le_exceptions” words.
- If consecutive vowels exist, as pairs or triplets, count each group as one.
- Count the remaining vowels in the word.
- Add one if it starts with “mc”.
- Add one if it ends with “y” but the “y” is not surrounded by vowels.
- Add one if a “y” is surrounded by non-vowels and is not last in the word.
- If it starts with “tri-” or “bi-” followed by a vowel, add one.
- If it ends with “-ian”, count that ending as two syllables, except for “-tian” and “-cian”.
- If it starts with “co-” followed by a vowel, check whether it exists in the double-syllable dictionary. If not, check whether it is in the single-syllable dictionary and act accordingly.
- If it starts with “pre-” followed by a vowel, check whether it exists in the double-syllable dictionary. If not, check whether it is in the single-syllable dictionary and act accordingly.
- Check for “-n’t” and cross-match with a dictionary to add a syllable.
- Handle the exceptional words.
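Below is a minimal Ruby sketch of only the first eight steps of that list. It assumes "aeiou" as the vowel set, with "y" handled by its own two rules. It leaves out the "le_exceptions" list, the prefix rules, the "-ian" and "-n't" rules, the dictionaries, and the exception words. It illustrates how the steps compose; it is neither my ported code nor the original Python. The method name `partial_syllables` is hypothetical.

```ruby
VOWELS = "aeiou" # assumption: "y" is treated by its own rules below

# Hypothetical helper implementing only the first eight steps of the list.
def partial_syllables(word)
  w = word.downcase.gsub(/[^a-z]/, "")
  return 1 if w.length < 3                       # fewer than three letters

  # Drop a final "es"/"ed" unless the word ends in ted/tes/ses/ied.
  w = w.sub(/(es|ed)\z/, "") unless w.end_with?("ted", "tes", "ses", "ied")

  # Drop a trailing "e" unless the ending is "le" (le_exceptions not handled).
  w = w.chomp("e") unless w.end_with?("le")

  # Each run of consecutive vowels counts once; then count the runs.
  count = w.scan(/[aeiou]+/).length

  count += 1 if w.start_with?("mc")              # "mc" prefix
  # Final "y" not preceded by a vowel ("happy").
  count += 1 if w.length > 1 && w.end_with?("y") && !VOWELS.include?(w[-2])
  # Inner "y" with non-vowels on both sides ("Hyderabad").
  (1...(w.length - 1)).each do |i|
    next unless w[i] == "y"
    count += 1 unless VOWELS.include?(w[i - 1]) || VOWELS.include?(w[i + 1])
  end

  [count, 1].max
end

%w[Hyderabad happy table make jumped wanted McDonald].each do |w|
  puts "#{w}: #{partial_syllables(w)}"
end
# Hyderabad: 4, happy: 2, table: 2, make: 1, jumped: 1, wanted: 2, McDonald: 3
```

The inner-"y" step is what lifts "Hyderabad" from three to four in this sketch. Without it, only the vowels "e", "a", "a" are counted.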
My favorite programming language is Ruby, and my favorite framework is Rails. With the syllable-counting algorithm and its code in hand, my task was simple: I just had to port the code from Python to Ruby.
On the second day, I had LitTech fun. LitTech is my term for what data science calls NLP (natural language processing). I converted the extensive Python syllable-counting algorithm I had found on Google to Ruby. I quickly built a Rails application around that algorithm and deployed it on the cloud server where I run my side projects. To test it, I entered the words “Hyderabad Readers And Writers.” It gave the syllable count as seen in the following screenshot.
I had yet to verify the accuracy of my ported Ruby algorithm, so I planned to run some test cases on Sunday. I also told the members of my literature club (HydRAW) about the application.
Sunday, the third day of the Sankranthi weekend, started dark, cloudy, and cool. But that did not deter me from LitTech fun. I played the soulful crooning of Manjari and Papon on YouTube continuously for some extra nourishment of the spirit. One of my book club members pointed out a discrepancy: “Hyderabad” has four syllables, whereas my application was counting three.
So, I cross-checked with RapidAPI, which gave four syllables for “Hyderabad.” As a quick fix, I decided to use RapidAPI in my program. For each word in the input, the program gets the syllable count from both RapidAPI and the current algorithm. If RapidAPI returns a count for every word and no network exception occurs for any word, its result is sent to the user. If, in addition, my algorithm’s count matches RapidAPI’s for every word, the result is displayed as coming from the current algorithm.
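The reconciliation logic can be sketched in Ruby as follows. The remote lookup is a stand-in lambda, not a real RapidAPI client; the actual endpoint and request format are not covered here. Only the success cases are described above, so falling back to the local algorithm when RapidAPI raises or returns no count is my assumption. `reconcile` and `Result` are hypothetical names.

```ruby
# Hypothetical result type: the counts shown to the user and their source.
Result = Struct.new(:counts, :source)

# local:  callable returning the algorithm's syllable count for a word.
# remote: callable standing in for a RapidAPI lookup (hypothetical stub); it may
#         raise on a network error or return nil when it has no count.
def reconcile(words, local:, remote:)
  local_counts = words.map { |w| local.call(w) }
  begin
    remote_counts = words.map { |w| remote.call(w) }
  rescue StandardError
    # Assumption: any network exception means falling back to the local algorithm.
    return Result.new(local_counts, :current_algorithm)
  end
  # Assumption: a missing remote count also means falling back.
  return Result.new(local_counts, :current_algorithm) if remote_counts.any?(&:nil?)

  # Every word matched, so present the result as the current algorithm's.
  return Result.new(local_counts, :current_algorithm) if local_counts == remote_counts

  Result.new(remote_counts, :rapidapi)
end

vowel_runs = ->(w) { w.downcase.scan(/[aeiou]+/).length } # toy local counter
stub = { "hyderabad" => 4, "readers" => 2 }               # stand-in remote data
remote_ok = ->(w) { stub[w.downcase] }
remote_down = ->(_w) { raise IOError, "network unreachable" }

p reconcile(%w[Hyderabad Readers], local: vowel_runs, remote: remote_ok).to_a
# => [[4, 2], :rapidapi]
p reconcile(%w[Hyderabad Readers], local: vowel_runs, remote: remote_down).to_a
# => [[3, 2], :current_algorithm]
```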
Once that functionality was done, I set out to find why the current algorithm was giving “Hyderabad” three syllables. The algorithm did include a check for the letter “y.”
But I had goofed up twice in converting the code from Python to Ruby.
I had to convert the string to an array of characters. I should have called `chars` on the word, but I used just `split`, which was converting the word string into a single-element array rather than an array of characters. I changed it to `chars`.
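A quick check in Ruby shows the difference between the two calls on a single word. Both are standard `String` methods.

```ruby
word = "Hyderabad"

# String#split with no argument splits on whitespace, so a single word
# comes back as a one-element array.
p word.split          # => ["Hyderabad"]

# String#chars returns an array of the word's characters.
p word.chars          # => ["H", "y", "d", "e", "r", "a", "b", "a", "d"]

# Splitting on the empty string yields the same characters.
p word.split("") == word.chars  # => true
```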
The second boo-boo was that I had missed an if condition, so I added it.
The interesting part is that Python’s `enumerate` over a list yields the index and then the element, whereas Ruby’s `each_with_index` yields them in the reverse order: element, then index. So, for the main loop, I had to write `|j, i|` instead of `|i, j|`. The following three screenshots show the original Python code, my erroneous Ruby code, and the fixed Ruby code.
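The parameter-order difference is easy to see in a few lines of Ruby. The Python behaviour appears only in a comment for comparison. Naming the block parameters the "Python way" raises no error; it silently binds the index variable to the character.

```ruby
word = "Hyd"

# Python's enumerate yields (index, element):
#   list(enumerate("Hyd"))  ->  [(0, 'H'), (1, 'y'), (2, 'd')]
# Ruby's each_with_index yields |element, index|, so the block parameters
# must be written |j, i| for i to stay the index and j the character.
pairs = word.chars.each_with_index.map { |j, i| [i, j] }
p pairs    # => [[0, "H"], [1, "y"], [2, "d"]]

# Writing |i, j| runs without error but binds i to the character.
swapped = word.chars.each_with_index.map { |i, _j| i }
p swapped  # => ["H", "y", "d"]
```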
Now the algorithm counted Hyderabad correctly as having four syllables.
I also added code that writes words to a file whenever the syllable count from my own algorithm differs from the one given by RapidAPI. I can use those words to check and improve the algorithm.
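A minimal sketch of such logging in Ruby, assuming the per-word counts from both sources are already available as arrays. The helper name `log_discrepancies` and the tab-separated "word, local count, remote count" line format are hypothetical choices. Opening the file in append mode keeps discrepancies from earlier runs.

```ruby
require "tmpdir"

# Hypothetical helper: append every word whose local count differs from the
# remote count to a log file, one "word<TAB>local<TAB>remote" line per word.
# Returns the number of words logged.
def log_discrepancies(words, local_counts, remote_counts, path)
  mismatches = words.zip(local_counts, remote_counts).reject { |_w, l, r| l == r }
  return 0 if mismatches.empty?

  File.open(path, "a") do |f|
    mismatches.each { |w, l, r| f.puts("#{w}\t#{l}\t#{r}") }
  end
  mismatches.length
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "discrepancies.tsv")
  puts log_discrepancies(%w[Hyderabad Readers], [3, 2], [4, 2], path)  # prints 1
  puts File.read(path)  # prints Hyderabad, 3 and 4 separated by tabs
end
```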
The application is on my side-project server: https://www.mahboob.tech/poetry/
The source code is available in my GitHub repositories: https://github.com/mh-github/poetry