A professor at Carnegie Mellon University in Qatar (CMU-Q) has compiled decades of research on Turkish natural language processing.
Kemal Oflazer completed his bachelor’s and master’s degrees at Middle East Technical University in Ankara. He went on to pursue his PhD in computer science at Carnegie Mellon University and then worked in the US for a full decade.
When he returned to Turkey to teach at Bilkent University, he found that his time away had given him a new perspective. “You rarely get the chance to see what your first language looks like from an external point of view. I wrote a Turkish document, and I realised, there is no Turkish spell-checker,” said Oflazer.
It was the early 1990s, and this observation raised questions that would guide Oflazer’s research interests for the next three decades.
Turkish is an agglutinative language, which means suffixes are attached to a root word. One complex Turkish word with several suffixes could express the same meaning as an entire sentence in English.
“In English, the computer can check spelling against a finite list of words,” he explained. “In Turkish, a given verb root can give rise to about 1.5 million different word forms. It is rather amazing.”
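The contrast Oflazer draws can be illustrated with a small sketch. The Python snippet below is a toy, not his spell-checker: the three-word lexicon, the two suffix slots (plural and case) and the helper names `attach`, `forms` and `recognize` are all assumptions made for illustration. It shows how a checker might accept words by rule instead of by list, and how each extra suffix slot multiplies the number of valid forms.

```python
from itertools import product

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("çfhkpsşt")

# Toy lexicon (assumption); a real system would hold many thousands of roots.
ROOTS = {"ev", "okul", "kitap"}  # house, school, book

# Suffix slots in attachment order. "A" is a vowel resolved by vowel
# harmony (a/e); "D" is a consonant resolved by voicing (d/t).
# The empty string means the slot is left unfilled.
PLURAL = ["", "lAr"]
CASE = ["", "DA", "DAn"]  # nominative, locative, ablative


def attach(stem, suffix):
    """Attach one suffix template to a stem, resolving A and D."""
    out = stem
    for ch in suffix:
        if ch == "A":
            # Harmonise with the most recent vowel in the word so far.
            last_vowel = next(
                c for c in reversed(out) if c in BACK_VOWELS | FRONT_VOWELS
            )
            out += "a" if last_vowel in BACK_VOWELS else "e"
        elif ch == "D":
            # d becomes t after a voiceless consonant.
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += ch
    return out


def forms(root):
    """All surface forms of a root under the two toy suffix slots."""
    return {attach(attach(root, p), c) for p, c in product(PLURAL, CASE)}


def recognize(word):
    """Accept a word if some known root can generate it by rule."""
    return any(word in forms(r) for r in ROOTS if word.startswith(r))


print(sorted(forms("ev")))
print(recognize("evlerde"), recognize("evlarda"))
```

With only two slots, each root already yields six forms (2 × 3). Real Turkish morphology has many more slots, including derivational suffixes that can feed further suffixation, which is how the count for a single root grows so large.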
This rich morphology also brings other interesting properties, such as free word order: the subject, object and verb of a sentence can appear in any order. In English, by comparison, the order is largely fixed.
In the early 1990s, no work was being done on Turkish natural language processing (NLP). With funding from the NATO Science for Stability Programme, the European Union and the Turkish Scientific and Technological Research Council, Oflazer and his graduate students carried out research and development in the field.
In 2012, Oflazer was invited to deliver a talk at the Language Resources and Evaluation Conference (LREC) in Istanbul on the challenges of Turkish NLP. After the lecture, he was approached by Springer-Verlag with a proposal to compile a book on the state of the art in Turkish NLP. Along with co-editor Murat Saraçlar of Bogazici University in Istanbul, Oflazer spent more than four years working with researchers to bring together 25 years of work. The book was published in 2018 in both hard-copy and online versions, and so far more than 2,000 copies of the various chapters have been downloaded.
While Turkish is spoken by more than 70 million people in Turkey, the Middle East and Europe, the wider family of Turkic languages is spoken natively by approximately 165 million people worldwide.
Kemal Oflazer continues his research in the area of NLP with projects supported by the Qatar National Research Fund.