Finding Palindromes in the Latin Library


A playful diversion for the morning: What is the longest palindrome in the Latin language? And secondarily, what are the most common? (Before we even check, it won’t be too much of a surprise that non takes the top spot. It is the only palindrome in the Top 10 Most Frequent Latin Words list.)

As with other experiments in this series, we will use the Latin Library as a corpus and let it be our lexical playground. In this post, I will comment on method and report results. The code itself, using the CLTK and the CLTK Latin Library corpus with Python 3, is available in this notebook.

As for method, this experiment is fairly straightforward. First, we import the Latin Library, preprocess it in the usual ways, tokenize the text, and remove tokens of fewer than three letters. Now that we have a list of tokens, we can look for palindromes. We can use Python’s string slicing with a negative step to create a test for palindromes. Something like this:

def is_palindrome(token):
    return token == token[::-1]

This function takes a token, makes a copy with the letters in reverse order, and returns True if the copy matches the original. At this point, we can filter our list of tokens using this test and report our results. So…

Drumroll, please—the most frequently occurring palindromes in the Latin language are:

non, 166078
esse, 49426
illi, 9922
ibi, 7155
ecce, 3662
tot, 3443
sumus, 2678
sis, 1526
usu, 1472
tenet, 1072
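
For anyone who wants to try a tally like the one above without opening the notebook, here is a minimal sketch of the filter-and-count step. It is not the notebook's code: the regular-expression tokenizer is a simplified stand-in for the CLTK preprocessing, the names preprocess and palindrome_counts are made up for illustration, and the sample sentence is a toy text rather than the Latin Library.

```python
import re
from collections import Counter


def is_palindrome(token):
    """Return True if the token reads the same forwards and backwards."""
    return token == token[::-1]


def preprocess(text):
    """Lowercase the text and keep runs of the letters a-z as tokens.

    Assumption: a simplified stand-in for the post's preprocessing,
    which uses CLTK tools rather than this regular expression.
    """
    return re.findall(r"[a-z]+", text.lower())


def palindrome_counts(text, min_length=3):
    """Count palindromic tokens that have at least min_length letters."""
    tokens = [t for t in preprocess(text) if len(t) >= min_length]
    return Counter(t for t in tokens if is_palindrome(t))


# Toy sample standing in for the corpus (hypothetical text, not real data).
sample = "Non esse ibi. Ecce, non sumus ibi; tot homines non tenet."
counts = palindrome_counts(sample)

# Report in the same "word, count" shape as the list above.
for word, freq in counts.most_common(2):
    print(f"{word}, {freq}")
```

Using a Counter keeps the reporting step to a single call to most_common, which returns (word, count) pairs sorted by descending count.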

Second drumroll, please: the longest palindrome in the Latin language is Massinissam (11 letters!), the accusative form of Massinissa, the first king of Numidia. We find other proper names in the top spots for longest palindromes: Aballaba, a site along Hadrian’s Wall reported in the Notitia Dignitatum; Suillius, a 1st-cent. Roman politician; and the Senones, a Celtic tribe well known to us from Livy among others. The longest Latin palindrome that is not a proper name is the dative/ablative plural of the superlative for similis, namely simillimis (10 letters). Rounding out the top ten are: the accusative plural of sarabara, “wide trowsers,” namely sarabaras; the feminine genitive plural of muratus, “walled,” namely muratarum; the first-person plural imperfect subjunctive of sumere, that is sumeremus; the dative/ablative plural of silvula, “a little wood,” namely silvulis (a palindrome only once v is normalized to u, giving siluulis); and rotator, “one who turns a thing round in a circle, a whirler round,” as Lewis & Short define it.
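
The u/v point is worth a quick illustration, since it decides whether silvulis counts at all. The sketch below is only an assumption about how such a normalization and a longest-first ranking could be written in plain Python. The helpers normalize and longest_palindromes are hypothetical names, and the word list is a handful of forms taken from this post, not corpus output.

```python
def normalize(token):
    """Lowercase a token and fold v into u (assumed normalization step)."""
    return token.lower().replace("v", "u")


def is_palindrome(token):
    """Return True if the token reads the same forwards and backwards."""
    return token == token[::-1]


def longest_palindromes(tokens, n=10):
    """Return the n longest distinct palindromes after normalization."""
    unique = {normalize(t) for t in tokens}
    palindromes = [t for t in unique if is_palindrome(t)]
    # Longest first; ties are broken alphabetically for a repeatable order.
    return sorted(palindromes, key=lambda t: (-len(t), t))[:n]


# A few forms mentioned in this post, plus one non-palindrome (similis).
words = ["Massinissam", "simillimis", "silvulis", "rotator", "mutatum", "similis"]

print(is_palindrome("silvulis"))             # False
print(is_palindrome(normalize("silvulis")))  # True
print(longest_palindromes(words, n=3))
# ['massinissam', 'simillimis', 'siluulis']
```

Normalizing before deduplicating means spelling variants such as silvulis and siluulis collapse into one entry instead of competing for separate spots in the ranking.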

Not much here other than a bit of Latin word trivia. But we see again that, with a large corpus like the Latin Library and Python/CLTK, we can easily extract information about the language. This sort of casual experiment lays the foundation for similar work that could perhaps be used to look into questions of greater philological significance.

A closing note. Looking over the list of Latin palindromes, I think my favorite is probably mutatum, a word that means something has changed, but when reversed stays exactly the same.