What are large language models and what is special about them?
Volker Tresp: Large language models are AI models that use machine learning methods to analyze huge amounts of text. They draw on more or less the entire knowledge of the World Wide Web: its websites, social media, books and articles. In this way, they can answer complex questions, compose texts and provide recommendations for action. Dialog and translation systems are built on large language models, most recently and most prominently ChatGPT. You could say that Wikipedia or the Google Assistant can do much of this as well. But the new language models deal creatively with knowledge, their answers resemble those of human authors, and they can solve a variety of tasks independently. They can be scaled to handle arbitrarily large amounts of data and are much more flexible than previous language models.
Large language models have moved from research into practice within a few years, and of course there are still shortcomings that the best minds in the world are working on. But even if the systems still occasionally give incorrect answers or misunderstand questions, the technical achievements here are phenomenal. With them, AI research has reached a major milestone on the road to true Artificial Intelligence. We need to be clear about one thing: the technology we are talking about here is not a vision of the future, but a reality. Anyone can use language assistants and chatbots via their web browser. The current language models are true game changers. In the next few years, they will significantly change the way society, science and business deal with information and knowledge.
What applications do the language models enable - and what prerequisites must be created for them?
Volker Tresp: Language models can be used for a wide range of applications. They can improve information systems and search engines. For service engineers, for example, a language model could analyze thousands of error reports and problem messages from previous cases. For physicians, it can aid in diagnosis and treatment. Language models belong to the family of so-called generative Transformer models, which can generate not only text but also images or videos. Transformer models create code, control robots and predict molecular structures in biomedical research. In sensitive areas, of course, it will always be necessary for humans to check the results of a language model and ultimately make the decision. The answers given by language models are still not always correct, or they digress from the topic. How can this be improved? How can we further integrate information sources? How can we prevent language models from carrying the biases in their training texts over into their answers? These are essential questions that require much research. So there is still a lot of work to be done. We need to nurture talent in the AI field and establish professorships and research positions to address these challenges.
In addition, if we want to use language models for applications in and from Europe, we need European language models that can handle the languages spoken here and take into account the needs of our companies and the ethical requirements of our society. Currently, language models are created, and controlled, by American and Chinese tech giants.
Who can benefit from large language models? Only large companies or also medium-sized businesses?
Volker Tresp: Even small and medium-sized companies can use language models in their applications, because the models can be adapted very well to individual company problems. Certainly, medium-sized companies will also need technical support in this process. Service providers, in turn, can make adapting language models to companies' needs part of their business model. There are no limits to companies' creativity when it comes to developing solutions. As with search engines, the use cases will multiply like an avalanche. However, to avoid financial hurdles for small and medium-sized companies, we need large basic language models under European auspices that provide free or low-cost access to the technology.
The interview is released for editorial use (if source is acknowledged © Plattform Lernende Systeme).