Risks and opportunities of the new shared AI

On 15 March in Trento, the “Industry, Technology and Digital” ministerial meeting of the Italian G7 presidency closed with the approval of a declaration on the rules governing artificial intelligence. The G7 members expressed their intention to develop artificial intelligence systems in an ethical manner and in accordance with the principles that underpin democracy, so as to promote the cohesion, resilience and well-being of society. The declaration includes a section on artificial intelligence in the public sector, calling for an “open” and enabling environment for the development and release of secure and reliable systems. In line with the spirit of the AI Act recently approved by the European Parliament, the declaration reiterates an approach based on risk analysis of artificial intelligence systems. But what exactly does it mean to create an open environment? The declaration does not go into detail on this point, but we can try to give a reasonable interpretation based on concepts widely shared by the IT community.

Consider how an artificial intelligence system is created: software programs are written and then trained on a large amount of data, with many people involved in verifying and correcting the system's behaviour. For example, teaching a system to recognize cats requires collecting and using a great many images and then checking that the system works correctly: it must not mistake the image of a dog for that of a cat. The software programs used may be publicly available, in which case we speak of open-source software, or the exclusive property of whoever wrote them, in which case we speak of closed or proprietary software.
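The training process described above can be illustrated with a deliberately toy sketch (it is not the pipeline of any real system): a tiny logistic-regression "cat vs. dog" classifier trained by gradient descent on synthetic feature vectors standing in for images. The learned weights play the role of the "parameters" discussed later in the article.

```python
# Toy illustration only: a "cat vs. dog" classifier on synthetic
# feature vectors (stand-ins for image features), trained with
# plain gradient-descent logistic regression.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": cat features cluster around +1, dog features around -1.
cats = rng.normal(loc=1.0, scale=0.5, size=(100, 4))
dogs = rng.normal(loc=-1.0, scale=0.5, size=(100, 4))
X = np.vstack([cats, dogs])
y = np.concatenate([np.ones(100), np.zeros(100)])  # 1 = cat, 0 = dog

# Training: these weights are the model's "parameters".
w = np.zeros(4)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid prediction
    w -= 0.5 * (X.T @ (p - y)) / len(y)     # gradient step on weights
    b -= 0.5 * np.mean(p - y)               # gradient step on bias

def predict(x):
    """Return 'cat' or 'dog' for one feature vector."""
    return "cat" if 1.0 / (1.0 + np.exp(-(x @ w + b))) > 0.5 else "dog"

print(predict(np.full(4, 1.0)))   # cat-like features -> "cat"
print(predict(np.full(4, -1.0)))  # dog-like features -> "dog"
```

The verification step the article mentions corresponds to the final checks: after training, a dog-like input must not be classified as a cat.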

Training data can also be public or proprietary. The G7’s hope of building an open and welcoming environment therefore presupposes, first of all, the availability of the software programs needed to develop artificial intelligence systems. To be technically accurate, however, the availability of software programs is not enough: the data used for the initial training, or the parameters calculated from that data, are also essential. In current systems the number of these parameters is enormous: we are talking about 175 billion parameters (175B) for GPT-3.5, while GPT-4 could have 100,000 billion (100T), as claimed by Andrew Feldman, CEO of Cerebras, in an interview about OpenAI.

There is a significant difference between software programs and parameters. The former are based on algorithms, often published in the specialized scientific literature. In recent years, science has made tremendous progress in devising increasingly sophisticated algorithms based on complex mathematical and statistical techniques. You do not need to be an expert in these areas to write a software program starting from one of these algorithms; moreover, tutorials and libraries available on the web make this possible with a minimum of computing knowledge. Calculating the parameters, however, is another story. Very powerful computers and huge amounts of computing time are needed: training GPT-3, for example, took 15 days on 10,000 compute units running in parallel, which amounts to roughly four million hours of computation. Incidentally, such a large number of calculations requires energy consumption with significant effects on the environment, not to mention the economic cost of acquiring such computational resources.
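The "roughly four million hours" figure follows directly from the numbers quoted above, as a quick back-of-the-envelope check shows:

```python
# Back-of-the-envelope check of the training figure quoted above:
# 10,000 compute units running in parallel for 15 days.
units = 10_000
days = 15
unit_hours = units * days * 24
print(unit_hours)  # 3,600,000 unit-hours, i.e. roughly four million
```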

A scientific paper published in February 2024 by Princeton University’s Sayash Kapoor and 24 other international researchers examined the impact on society of “open foundation models”: artificial intelligence systems built with machine-learning techniques on large amounts of data. These are a special case of the AI Act’s “general-purpose models”, capable of competently performing a wide variety of tasks, regardless of how the model is then brought to market. The article questions the assumption that the public availability of foundation models helps the scientific progress of the field, and the growth of the economy built on their use, without significant side effects. On the contrary, the authors highlight risks related to security and misinformation: risks mainly due to the availability of the parameters, whose calculation, as we have already noted, requires huge computing resources. The parameters are an information asset that enormously reduces the cost of creating an AI system, making it easier, and indeed advantageous, for anyone who intends to exploit it. We are thus faced with a classic ethical dilemma: publish and make readily available a technology that can support the growth of society, or keep it secret to prevent it from being used against society itself. A similar dilemma has arisen with other potentially dangerous technologies, including nuclear technology. The current trend is to prefer partial openness, concealing only some essential elements, the parameters in the case of foundation models. This is the path taken by Mistral, although Microsoft’s recent entry into the equity capital of the French company that developed the artificial intelligence system of the same name suggests a change of strategy.

Broadening the horizon, one must also take into account the issue of digital sovereignty, and especially European sovereignty. The entry into force of numerous regulations on digital services, data access and privacy protection has laid the foundations of what is referred to as the European Union’s technological and economic perimeter, a response to the aggressive policies of other international actors. The declaration and protection of the rights of European citizens has so far attached great importance to freedom in the various forms in which it manifests itself: freedom of movement, free movement of capital and, more generally, the freedom of individuals. The development of artificial intelligence systems raises, in a geopolitical context, a number of questions related to the risks inherent in their use. Science can and must provide the conceptual tools that enable policy-makers to assess these risks and implement adequate mitigation strategies.

