Machine learning techniques are providing new tools that could help archaeologists understand the past, particularly when it comes to deciphering ancient texts. The latest example is an AI model created by Alphabet subsidiary DeepMind that not only helps restore text missing from ancient Greek inscriptions but also suggests when the text was written (within a 30-year period) and where it may have originated.
“Inscriptions are really important because they are direct sources of evidence ... written directly by ancient people themselves,” Thea Sommerschield, a historian and machine learning expert who helped create the model, told journalists in a press briefing.
Due to their age, these texts are often damaged, making restoration a rewarding challenge. And because they are often inscribed on inorganic materials like stone or metal, methods like radiocarbon dating can’t be used to determine when they were written. “To solve these tasks, epigraphers look for textual and contextual parallels in similar inscriptions,” said Sommerschield, who was co-lead on the work alongside DeepMind staff research scientist Yannis Assael. “However, it’s really difficult for a human to harness all existing, relevant data and to discover underlying patterns.”
That’s where machine learning can help.
The new software, named Ithaca, is trained on a dataset of 78,608 ancient Greek inscriptions, each labeled with metadata describing where and when it was written (to the best of historians’ knowledge). Like all machine learning systems, Ithaca looks for patterns in this data, encodes them in complex mathematical models, and uses those patterns to suggest text, date, and origins.
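To make the idea of learning from labeled inscriptions more concrete, here is a toy Python sketch. It is not Ithaca's method: it simply mimics, in a crude way, the epigrapher's habit of looking for parallels, by finding the most similar labeled texts and borrowing their region and date. The `Inscription` record, the `suggest` function, and every text, region label, and year in `CORPUS` are invented for illustration and are not drawn from the real dataset.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean


@dataclass
class Inscription:
    text: str    # transcribed text (placeholder transliteration)
    region: str  # where it was written, per the label
    year: int    # when it was written; negative values mean BCE


# Invented placeholder records, NOT real data from Ithaca's training set.
CORPUS = [
    Inscription("edoxen tei boulei kai toi demoi", "Attica", -420),
    Inscription("edoxen tei boulei kai toi demoi epi", "Attica", -400),
    Inscription("agathei tychei edoxe tai boulai", "Delos", -250),
    Inscription("theos tychai edoxe tai polei", "Crete", -200),
]


def similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(text: str, corpus: list[Inscription], k: int = 2) -> tuple[str, int]:
    """Suggest a region and year from the k most similar labeled texts."""
    parallels = sorted(
        corpus, key=lambda rec: similarity(text, rec.text), reverse=True
    )[:k]
    # Majority vote for the region, average of the labeled years for the date.
    region = Counter(rec.region for rec in parallels).most_common(1)[0][0]
    year = round(mean(rec.year for rec in parallels))
    return region, year


region, year = suggest("edoxen tei boulei", CORPUS)
print(region, year)
```

A real system replaces the string comparison with a learned model, but the shape of the task is the same: labeled examples go in, and suggestions for an unlabeled fragment come out.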
In a paper published in Nature that describes Ithaca, the scientists who created the model say it is 62 percent accurate when restoring letters in damaged texts. It can attribute an inscription’s geographic origins to one of 84 regions of the ancient world with 71 percent accuracy and can date a text to within, on average, 30 years of its known year of writing.
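For readers who want to see what figures like these measure, the Python sketch below shows one plausible way to compute an accuracy score and an average dating error. The function names and the sample predictions are hypothetical, and the paper's exact metric definitions may differ from this simplified version.

```python
from statistics import mean


def accuracy(predicted, actual):
    """Fraction of items where the prediction matches the known answer."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_abs_year_error(predicted_years, known_years):
    """Average distance, in years, between predicted and known dates."""
    return mean(abs(p - k) for p, k in zip(predicted_years, known_years))


# Hypothetical predictions for illustration only (negative years mean BCE).
predicted_regions = ["Attica", "Delos", "Attica", "Crete"]
known_regions = ["Attica", "Attica", "Attica", "Crete"]
predicted_years = [-410, -250, -330]
known_years = [-400, -280, -310]

region_accuracy = accuracy(predicted_regions, known_regions)
dating_error = mean_abs_year_error(predicted_years, known_years)
print(f"region accuracy: {region_accuracy:.0%}")
print(f"average dating error: {dating_error} years")
```

The same `accuracy` function works for restored letters if each list holds one character per missing position.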
These are promising statistics, but it’s important to remember that Ithaca is not capable of operating independently of human expertise. Its suggestions are ultimately based on data collected by traditional archaeological methods, and its creators are positioning it as simply another tool in a wider set of forensic methods, rather than a fully automated AI historian. “Ithaca was designed as a complementary tool to aid historians,” said Sommerschield.
Eleanor Dickey, a professor of classics at the University of Reading who specializes in ancient Greek and Latin sociolinguistics, told The Verge that Ithaca was an “exciting development that may improve our knowledge of the ancient world.” But she added that 62 percent accuracy for restoring lost text was not reassuringly high (“when people rely on it they will need to keep in mind that it is wrong about one third of the time”) and that she was not sure how the software would fit into existing academic methodologies.
For example, DeepMind highlighted tests that showed the model helped improve the accuracy of historians restoring missing text in ancient inscriptions from 25 percent to 72 percent. But Dickey notes that those being tested were students, not professional epigraphers. She says that AI models may be broadly accessible, but that doesn’t mean they can or should replace the small cadre of specialized academics who decipher texts.
“It is not yet clear to what extent use of this tool by genuinely qualified editors would result in an improvement in the editions generally available, but it will be interesting to find out,” said Dickey. She added that she was looking forward to trying the Ithaca model out for herself. The software, along with its open-source code, is available online for anyone to test.
Ithaca and its predecessor (named Pythia and released in 2019) have already been used to help inform recent archaeological debates, including helping to date inscriptions discovered on the Acropolis of Athens. However, the true potential of the software has yet to be seen.
Sommerschield stresses that the real value of Ithaca may be in its flexibility. Although it was trained on ancient Greek inscriptions, it could be easily configured to work with other ancient scripts. “Ithaca’s architecture makes it really applicable to any ancient language, not just Latin, but Mayan, cuneiform; really any written medium — papyri, manuscripts,” she said. “There’s a lot of opportunities.”