September 11, 2024
How to Restore Ancient Texts Using Human and Artificial Intelligence
Barbara Graziosi
Professor of Classics, Princeton University
Minutes of the First Meeting of the 83rd Year
George Bustin presided over the meeting held on 11 September 2024. The Invocation was led by Frances Slade. No minutes of the previous meeting were read. There were ten guests, four of whom attended as potential members. Kathryn Trenner brought Sara Oderwald as her guest. Robert Varrin brought Charles Bushnell. Ricardo Fernandez brought Anders Boss and Jay Kuris. All of these guests have been proposed for membership. In addition, Anne Seltzer brought Marna Seltzer as her guest. Stephen Schreiber brought Harold Shapiro. Ferris Olin brought Sonia Yacov. Joan Fleming brought John Fleming as her guest and Michael Mathews brought Elise Mathews. There were 120 people in attendance.
A moment of silence was observed in memory of six Old Guard members who died since the last meeting. These members were Landon Y. Jones, Jr.; Bernard Miller; Judith Pinch; Everard K. Pinneo; Charles Taggart; and Ralph N. Tottenham-Smith.
The speaker, Barbara Graziosi, is a Professor of Classics at Princeton University. She is a scholar of classical Greek with a particular interest in Homer. In conjunction with an undergraduate computer science student and a computer science faculty member at Princeton, she helped create Logion, an AI tool that helps with the restoration of pre-modern Greek texts.
Professor Graziosi used her first slide to list the topics she intended to cover in her talk and then expanded on each of them, connecting them in an interesting and engaging manner. First up was “Artificial Intelligence and some challenges.” The Professor recounted that until a few years ago she had had no interest in AI and machine learning. But then a “math genius” (her description) in one of her classes casually asked her one day after class about research in her field. She gave him an answer that evidently impressed him; the next thing she knew, he had gotten one of his computer science professors involved, and they ended up with a wonderful interdisciplinary project, with “crucial undergraduate” involvement, spanning math, computer science, and the humanities. While she is clearly excited about the project, she admits it has both rewards and perils.
Professor Graziosi identified three foci of AI:
First, the definition of AI was linked to a moment in 1956 at the Dartmouth Conference, at which four scientists from different fields asked the Rockefeller Foundation for funding to research the conjecture that learning and intelligence can be so accurately described that a machine can be made to simulate them. This had two major consequences: first, artificial intelligence was defined as a major field of inquiry; second, humanists pushed back against everything “intelligent” that machines achieved.
As an example, Professor Graziosi reminded the audience that in 1997 IBM developed a chess computer called Deep Blue, which beat Garry Kasparov. In response, The Economist and other publications declared that chess isn’t a sign of intelligence; real intelligence is the competent use of language, recognizing another’s emotional state, and so on. But in 2024 we all know these are things that machines can in fact do, so the terms are constantly being redefined. Professor Graziosi wants to focus not on competition between humans and machines but on collaboration: not just interdisciplinary collaboration but also collaboration between humans and machines.
The Economist said that machine intelligence was all about number crunching, but some computer scientists say the brain too is a number cruncher, and that the only difference between a machine and a brain is that the latter is biodegradable.
The Professor feels we are at a level of interpretability (her second area of focus) at which no one really knows what goes on at the deep levels of learning. Deep learning involves so many layers of computation, and so many readjustments to the structure of the models, that no one really knows how, for example, ChatGPT arrives at the results it yields. The same is true of the human brain: there is much we don’t know about how it works, so there is an analogy between AI and the brain.
We feel a moment of amazement when AI does something like beating Kasparov: how could a machine beat him when no human could? We feel amazed, and then we calm down. Now there are computers playing chess against other computers, so in a sense computers have been “domesticated” in chess, serving as tools for analysis and training, while our interest remains in the human players. Right now, while we are excited about AI, we are still anxious about it; it doesn’t feel “domesticated” to us. Yet as we become more accustomed to AI, we get less anxious, and we decide that it’s not intelligence after all. If an AI model is successful, it’s not AI anymore. We are left with a paradox.
One way AI is very far from domestication is language models, because they work by statistics: they use a large amount of data and extrapolate a likely answer from it. For example, given the sentence “____ went to work and ____ stayed home,” AI will fill in the blanks with “he” and “she” respectively, based on the statistics of occurrences in its training data. As an aside, Professor Graziosi noted that there is a lot of research showing biases in chatbots against non-standardized uses of English, e.g., African American English; the bias persists even if the bot is given anti-racist training. The great harm here is that these chatbots are being used to screen job applicants for interviews.
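The statistical fill-in-the-blank behavior described above can be sketched in a few lines. This is a toy illustration only, not Logion or any real chatbot: a hypothetical miniature corpus stands in for web-scale training data, and the blank is filled with whichever word appears most often in that context, which is exactly how majority usage comes to dominate minority usage.

```python
# Toy illustration: filling a blank by corpus statistics.
from collections import Counter

# Tiny hypothetical corpus standing in for web-scale training data.
corpus = [
    "he went to work", "he went to work", "he went to work",
    "she went to work",
    "she stayed home", "she stayed home", "she stayed home",
    "he stayed home",
]

def fill_blank(template: str) -> str:
    """Replace '____' with the word most frequent in that context."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        pattern = template.split()
        if len(words) == len(pattern) and all(
            p == "____" or p == w for p, w in zip(pattern, words)
        ):
            counts[words[pattern.index("____")]] += 1
    return counts.most_common(1)[0][0]

print(fill_blank("____ went to work"))  # -> "he" (majority usage wins)
print(fill_blank("____ stayed home"))   # -> "she"
```

Because “he went to work” outnumbers “she went to work” in this toy corpus, the majority pronoun always wins; scaled up to real training data, the same mechanism produces the biases the professor described.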
Professor Graziosi said that as a classicist and classical philologist, she is accustomed to preserving unique forms of speech and language. She noted that scholars at the Library of Alexandria recorded thousands of these, including a list of expressions used only once in the works of Homer. She posed the question: why was this heroic task undertaken? Her answers: to record that those expressions exist, and to instill the idea that they need protecting, because the diversity of human expression should be valued. Today scholars continue that tradition through the codification of classical philology, doing textual criticism, recording variations across different manuscripts, and so on.
There is a principle in philology that if you have a trivial version of a passage and a more meaningful, more difficult, thought-provoking one, the difficult one is more likely to be the original, because as one copies a manuscript, one becomes tired and tends to simplify the text. (If a manuscript contains gibberish, that does not mean the gibberish is original, because gibberish could not have produced a more meaningful reading in a later copy.) This is all a probability game, but these principles, which make one stop and consider the rare and unique reading, counteract the statistical bias toward majority usage that is among the current challenges in uses of AI.
Professor Graziosi then went into more specifics about the work that classical philologists do. That work can be divided into three categories: gap filling, textual emendation, and manuscript evaluation/attribution. She noted that machine learning can help with all of these tasks, with an emphasis on help, as machines are very far from taking over the field of philology. Experts are still needed, which is a challenge because there are fewer and fewer of them. The good news is that students in the field are very excited about using AI. And going back to the idea of collaboration, computer science has an overflow of majors while the humanities do not, so it is a wonderful thing that they can work together.
Professor Graziosi then introduced her project Logion and used some examples of fragmentary manuscripts to demonstrate its capabilities in the three areas in which philologists work. One of the first projects Logion worked on was a fragmentary manuscript by the author Psellos. Her group trained Logion to help fill in the missing pieces of the manuscript. The Professor did not provide details except to say that for each section they worked on, they would estimate how large the missing section was, in number of letters, and then give Logion a limit on the number of suggestions they wanted it to generate for their consideration. They also had their own ideas of what the missing letters could be.
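The workflow described, estimating the gap length, generating candidate fills of that length, and ranking them for human consideration, can be sketched as follows. This is a hedged illustration, not Logion’s actual code: Logion uses a BERT-style neural model trained on pre-modern Greek, whereas here a simple character-bigram model over a made-up English training text stands in so the example is self-contained.

```python
# Hypothetical sketch of gap filling: rank candidate fills of the
# estimated length by how probable the filled-in passage is under a
# statistical model (a character-bigram model stands in for Logion's
# neural model).
from collections import defaultdict
import math

def train_bigrams(text: str) -> dict:
    """Count character-bigram frequencies in a training corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def score(candidate: str, context: str, counts: dict) -> float:
    """Log-probability of the context with the candidate filled in."""
    filled = context.replace("____", candidate)
    logp = 0.0
    for a, b in zip(filled, filled[1:]):
        total = sum(counts[a].values()) or 1
        logp += math.log((counts[a][b] + 1) / (total + 27))  # add-one smoothing
    return logp

training_text = "the cat sat on the mat and the cat ran"
counts = train_bigrams(training_text)

# Candidates of the estimated gap length (3 letters), ranked best-first
# for a human expert to consider.
candidates = ["cat", "dog", "xqz"]
ranked = sorted(candidates, key=lambda c: score(c, "the ____ sat", counts),
                reverse=True)
print(ranked[0])  # "cat": its bigrams are well attested in the training text
```

The design choice mirrors the talk: the machine proposes and ranks, while the philologist, who also has their own idea of the missing letters, decides.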
Sometimes Logion’s suggestions were wrong, sometimes they were right, and sometimes Logion provided a better answer than the one the group had come up with. The group tested Logion by creating fake gaps in known text and then seeing whether Logion supplied the correct text; it did 95% of the time. Professor Graziosi and her group felt that this was a useful success rate.
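The fake-gap evaluation the group used can be sketched generically: hide words that are actually known, ask the model to restore them, and report the fraction restored exactly. The `suggest` argument below is a placeholder for any gap-filling model; the trivial demo model is hypothetical and only shows the harness at work.

```python
# Sketch of evaluation by artificial gaps: mask known text, ask the
# model for a fill, and measure how often the top suggestion matches
# the hidden original.
def evaluate(sentences, suggest):
    """Fraction of artificial gaps the model restores exactly."""
    correct = 0
    for sentence in sentences:
        words = sentence.split()
        for i, hidden in enumerate(words):  # hide each word in turn
            gapped = " ".join(
                w if j != i else "____" for j, w in enumerate(words)
            )
            if suggest(gapped) == hidden:
                correct += 1
    total = sum(len(s.split()) for s in sentences)
    return correct / total

# Trivial demo model: always guesses "the".
rate = evaluate(["the cat sat on the mat"], lambda gapped: "the")
print(rate)  # 2 of the 6 words are "the", so 0.333...
```

The same harness, run with Logion on pre-modern Greek, is what produced the 95% figure the professor cited.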
This was an example of the gap-filling work of philologists, also known as “forgery.” Professor Graziosi noted that this work is not as crucial as emendation, the second important task of philologists, which corrects the errors, simplifications, modernizations, and so on that manuscripts have accumulated over many hundreds of years.
As an example of this, Professor Graziosi’s group devised an algorithm to discover the least probable text in its context, so that any bit of text the model finds highly unlikely is flagged by Logion as a possible error. The group then figured that if changing just one letter in that bit of text yields a very likely bit of text, the change is a plausible correction; that became the algorithm, and it yielded amazing results. Professor Graziosi demonstrated this with a slide of a manuscript fragment in which Logion highlighted a word with one wrong letter and suggested a different letter, which changed the word and made the phrase meaningful. When the corrected word and phrase were compared to the original manuscript, it was found that the copyist had made a mistake and Logion had supplied the correct letter.
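The one-letter emendation heuristic described, flag text that is improbable as transmitted but becomes much more probable after a single-letter change, can be sketched as follows. This is an illustrative reconstruction, not Logion’s implementation: word frequencies in a tiny made-up reference corpus stand in for the neural model’s probability estimates.

```python
# Sketch of one-letter emendation: a transmitted word is suspect if a
# single-letter substitution turns it into something far better
# attested in the reference corpus.
from collections import Counter
import string

reference = "the scribe wrote the word and the word was good".split()
freq = Counter(reference)  # stand-in for a trained model's probabilities

def one_letter_variants(word: str):
    """All strings obtainable by substituting exactly one letter."""
    for i in range(len(word)):
        for c in string.ascii_lowercase:
            if c != word[i]:
                yield word[:i] + c + word[i + 1:]

def suggest_emendation(word: str, min_gain: int = 2):
    """Return a one-letter correction if it is much better attested."""
    best = max(one_letter_variants(word), key=lambda v: freq[v])
    if freq[best] >= freq[word] + min_gain:
        return best
    return None

print(suggest_emendation("wird"))  # -> "word": one letter off, well attested
print(suggest_emendation("word"))  # -> None: already well attested
```

As in the talk, the tool only flags and suggests; deciding whether the scribe really erred remains the philologist’s job.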
In another example, a manuscript recounts someone being beaten “at home.” Logion highlighted this phrase and indicated that it did not make sense. Checking the original manuscript showed that the copy was in fact faithful; further research revealed that the original scribe, caught up in the emotion of the tale, had made a mistake in recording the time and place of the event. In this case the justification for the emendation had to be peer reviewed before the change was permitted. Logion was also helpful here in providing evidence that the attributed author was in fact wrong; manuscript attribution is the third area in which philologists work.
Professor Graziosi next addressed some of the problems with data, noting that there are virtuous and vicious circles. Machines need data and currently data is biased towards contemporary U.S. values when the machine is trained only on things found on the internet in English. A model is only as good as the data on which it is trained.
Scholars of antiquity are charged with preserving all the ancient texts in all their various media (wood, parchment, bamboo, clay tablets, stone, etc.) and those ancient texts that have been digitized are preserved in open access so that machines as well as humans can learn from them. But some ancient texts, e.g. Homer, Aristotle, Sophocles, exist in digital form but not in open access. There are no copyright laws in play so access shouldn’t be a problem. But there is no champion of philology.
The “virtuous” circle is that colleagues can assess the suggestions Logion makes and then have the machine learn from those assessments. All this feedback provides what computer scientists call “alignment,” so the machines get better, good enough to justify engagement with the results they produce, creating the virtuous circle.
Another important feature of the collaboration between humans and AI is capacity building. Algorithms built for classical Greek can be used for other languages. Professor Graziosi used classical Syriac as an example: currently, refugees from Syria check the data by hand, but, as the classical Greek work shows, machines can be trained to help, which will result in a real human-AI collaboration.
And finally, Professor Graziosi addressed humans and AI, stating that brains learn differently from the way AI learns. The main difference is that machines restructure their architecture in order to learn something new. Human brains do not throw out old knowledge in order to learn something new; this makes us more resilient and better able to learn new things in the future—something humanists have always known.
The first question during the Q&A was about bias. Professor Graziosi responded that bias can be mitigated by training on less data or by combining larger and smaller data models. Another question asked how to handle machine learning when a “dumb” answer is suggested; the heart of her answer was that suggested answers are based on the statistical data in the model, not on what makes sense. Another question asked how to train machines to make sense of idioms. Professor Graziosi said that Logion is not currently doing translation, so this is not yet an issue; someone at Princeton University is working on it, but the issue is “nontrivial.”
Another question noted that syntax and language change over time, and that these changes would seem to date a manuscript; how does machine learning deal with this? Professor Graziosi said there is currently one program attempting this, but she is skeptical for now; in ten years, though, she expects machine learning to be able to date manuscripts. The final question asked whether AI can be programmed to record the steps of its process. Professor Graziosi said that right now it cannot: it is very difficult to understand how much processing goes on in machine learning. She added that she is very lucky to be at Princeton University with ready access to computer processing time; her colleague at the Bibliothèque Nationale in Paris is not so fortunate.
Respectfully submitted,
Sarah Ringer