Currently, we see pronounced momentum in the development of large neural networks such as OpenAI's GPT-3, DeepMind's Gato, or Google's LaMDA. How do you assess this development?
Stefan Wrobel: These large neural networks, often referred to as foundation models, are indeed an extremely significant development in machine learning and artificial intelligence. Most machine learning applications to date have relied on pre-classified training examples, from which a predictive model is learned in a supervised fashion. Such pre-classified data is typically available only in very limited quantities. Foundation models, in contrast, are the result of a long line of work on unsupervised training of neural networks. With these architectures we are no longer restricted to pre-classified data, but can train models on the almost unlimited material found in existing texts, images, and increasingly videos.

At the same time, because the corresponding architectures can now be implemented in a scalable way on high-performance computing systems, it has become feasible to train models with over 500 billion parameters. This allows the models not only to absorb an enormous amount of information, but also, through the way they are trained, to store it in a new internal representation and to link the pieces with one another. This now works across different modalities (text, image, video) and enables intelligent-seeming performance that we would have thought impossible not long ago - whether answering questions that implicitly require logical inference, composing text, or even generating photorealistic images.

This is a very significant step for anyone using machine learning: we are moving away from highly specialized models trained on a single task toward foundation models that contribute in a fundamental way to a wide variety of tasks, and without which people will probably soon not want to develop such systems.
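To make the contrast with task-specific supervised models concrete, here is a minimal sketch in Python using the Hugging Face transformers library. It is an illustration only: the small gpt2 model stands in for a much larger foundation model, and the prompts are invented for this example.

```python
# Minimal sketch: reusing one pre-trained model for several tasks,
# instead of training a separate supervised model per labeled dataset.
# Assumes the Hugging Face `transformers` library is installed;
# model choice and prompts are illustrative, not from the interview.
from transformers import pipeline

# A single language model, pre-trained in a self-supervised fashion
# on large amounts of unlabeled text.
generator = pipeline("text-generation", model="gpt2")

# Task 1: question answering, specified purely via the prompt.
out = generator("Q: What is the capital of France?\nA:", max_new_tokens=10)
print(out[0]["generated_text"])

# Task 2: text composition, using the very same model and weights.
out = generator("Write a short product description for a coffee mug:",
                max_new_tokens=40)
print(out[0]["generated_text"])
```

The point of the sketch is that the task is specified in the prompt at call time; the same pre-trained weights serve both requests, with no task-specific labeled data involved.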
The development of large neural networks requires considerable resources and know-how. Is market concentration to be expected here, similar to what we have seen in Internet search and social networks?
Stefan Wrobel: A key element of the impressive current performance of such models lies in scaling - both in the size of the datasets used for training and in the number of model parameters. It is hard to imagine smaller organizations and companies training such models themselves. Businesses will therefore have to source such large models from suitable vendors and integrate them into their own AI systems and applications. So we cannot rule out that a market situation develops here as well in which only a few large providers have the resources, expertise, and capabilities to train such models. At the same time, technical progress is tremendous, in academia as well, so further developments could make similar results possible even with much smaller models. Nevertheless, it is of course good and important that Germany, for example, is investing in this area. With the OpenGPT-X project, a strong consortium is in place that includes not only large providers but also powerful partners in the field of high-performance computing. Even beyond OpenGPT-X, efforts are underway to create alternatives to the currently known large models. The AI competence centers that have recently been established in Germany will certainly play an important role here.
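As an illustration of what sourcing a model from a vendor could look like in practice, here is a minimal sketch of integrating a vendor-hosted large model into an application over HTTP. The endpoint URL, request parameters, and response schema are purely hypothetical placeholders; real vendor APIs differ.

```python
# Minimal sketch: calling a vendor-hosted large model from an application.
# The endpoint, parameters, and response format below are hypothetical
# placeholders, not any real vendor's API.
import os
import requests

API_URL = "https://api.example-llm-vendor.com/v1/generate"  # hypothetical
API_KEY = os.environ["LLM_VENDOR_API_KEY"]  # credential issued by the vendor

def generate(prompt: str, max_tokens: int = 64) -> str:
    """Send a prompt to the hosted model and return the generated text."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "max_tokens": max_tokens},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema

if __name__ == "__main__":
    print(generate("Summarize the idea of foundation models in one sentence."))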
Some of these large networks are no longer limited to a single task. Are we thus on the way to general artificial intelligence?
Stefan Wrobel: The current large-scale models are certainly an important step away from systems trained for just a single task and toward systems that are trained in a general fashion and can be used in different contexts. We would probably reserve the term general artificial intelligence for systems that, in suitable tests, prove equivalent or even superior to human intelligence in ways yet to be defined. The classic Turing test requires that a human in a chat can no longer determine whether they are conversing with an AI or with another human. We can already see that this test will probably no longer suffice, because the human tendency to perceive human traits in a counterpart can easily lead us astray. It is therefore hardly surprising that the first AI researchers already believe they have discovered consciousness in the answers of what is quite obviously a technical system. Irrespective of how exactly one wants to define consciousness or general intelligence, this clearly shows that we would do well to engage intensively, and early, with the ethical principles for the use of such systems.
This interview is released for editorial use, provided the source is acknowledged (© Plattform Lernende Systeme).