Artificial intelligence will get ‘crazier and crazier’ without controls

Emad Mostaque, the founder of Stability AI, has warned that large artificial intelligence models will become increasingly unpredictable and potentially dangerous unless more control is exerted over the data they are trained on. Mostaque argues that training models such as OpenAI’s GPT-4 and Google’s LaMDA on the entire internet could pose an existential threat to humanity, a concern shared by other experts in the field. While Stability AI acknowledges the need for a discussion of the risks these models pose, the company faces its own challenges over the data used to train its AI products, including copyright infringement claims and questions about who owns the outputs its systems generate.

Stability AI collaborated on the development of Stable Diffusion, a text-to-image AI, and recently launched a more advanced image-generating AI called DeepFloyd IF. To make its models safer, the company removes illegal, violent and pornographic images from the training data, since exposure to such content could influence the AI’s output. Even so, training the AI effectively still requires a vast amount of image data gathered from the web. Stability AI is also working on new datasets that respect individuals’ data rights.
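
Stability AI has not published the details of its cleaning pipeline, but this kind of filtering is commonly done by scoring every image with a safety classifier and discarding anything above a threshold. A minimal sketch in Python, assuming a LAION-style metadata shard with a precomputed punsafe probability column (the file names and threshold here are illustrative, not Stability AI’s actual values):

```python
import pandas as pd

# Illustrative sketch only: public LAION-style metadata shards include a
# "punsafe" column, the estimated probability that an image is unsafe.
# The file names and threshold below are assumptions for this sketch,
# not Stability AI's actual pipeline values.
UNSAFE_THRESHOLD = 0.1  # keep only images the detector scores as very likely safe

shard = pd.read_parquet("metadata-shard-0000.parquet")
filtered = shard[shard["punsafe"] < UNSAFE_THRESHOLD]
filtered.to_parquet("metadata-shard-0000-filtered.parquet")

print(f"kept {len(filtered)} of {len(shard)} rows")
```

Filtering the metadata before any images are downloaded keeps unwanted material out of the training set entirely, rather than relying on moderating the model’s outputs after the fact.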

The concerns extend beyond copyright infringement: a growing share of the text, images and code available online is itself generated by AI systems. The proportion of newly written code produced with AI assistants, for instance, has grown rapidly in a short time. Text-generating AIs are likewise being used to produce online content, including news reports, raising concerns about the accuracy and reliability of information on the web.

The risk lies in AI-generated content that is intentionally misleading, harmful or simply of poor quality. Because new AIs are trained on data scraped from the web, and a growing share of that data was itself produced by other AIs, errors and junk can feed back into future models, progressively polluting both the web and the systems trained on it. That feedback loop is why the data used to train powerful AIs must be chosen carefully.
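
Detecting AI-generated text reliably at scale remains an open problem, so one blunt mitigation that has been discussed is restricting training data to material collected before generative models became widespread. A minimal sketch, assuming each crawl record carries a capture date (the cutoff and field names are illustrative assumptions, not a method the article attributes to anyone):

```python
from datetime import date

# Illustrative only: keep web documents captured before large text
# generators became widespread, to reduce AI-generated training data.
# The cutoff date and record fields are assumptions for this sketch.
CUTOFF = date(2022, 11, 30)  # roughly when ChatGPT became publicly available

def keep(record: dict) -> bool:
    """Keep a crawl record only if it predates the cutoff."""
    return record["crawl_date"] < CUTOFF

corpus = [
    {"url": "https://example.com/essay", "crawl_date": date(2021, 5, 3)},
    {"url": "https://example.com/post", "crawl_date": date(2023, 2, 14)},
]

filtered = [r for r in corpus if keep(r)]
print([r["url"] for r in filtered])  # only the pre-cutoff page survives
```

A date cutoff discards plenty of legitimate recent material, which is part of why curating datasets deliberately, as discussed below, is the more sustainable approach.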

Mostaque suggests that building more specific and diverse datasets, tailored to the people a model is intended to serve, would be a starting point for safer and better-aligned AI. Models trained on data that reflects a broad range of human experience, rather than the narrow set available to a select few, are more likely to align with human values.
