
The AI Feedback Loop: Navigating the Risks of 'Model Collapse'

Alec Foster · 2023-08-01

Generative AI, AI Models

In the rapidly evolving world of AI, generative models like OpenAI's ChatGPT have become integral to the workflows of many global companies. However, as AI-generated content proliferates on the internet, a new challenge arises: What happens when AI models begin to train on AI-generated content instead of primarily human-generated content?

A recent paper from researchers at universities in the UK and Canada delves into this very issue. The findings are concerning for the current state of generative AI technology and its future. The researchers discovered that using model-generated content in training causes irreversible defects in the resulting models, a phenomenon they call 'model collapse.'

Understanding 'Model Collapse'

Model collapse occurs when the data AI models generate contaminates the training set for subsequent models. As models are exposed to more AI-generated data, they perform worse over time, producing more errors and far less non-erroneous variety in their responses.

The researchers found that even when 10% of the original human-authored data is retained for training in subsequent generations, model collapse still happens, albeit more slowly. This "pollution" with AI-generated data leaves models with a distorted perception of reality, as each generation reinforces the narrowed view of the one before it.
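To make the intuition concrete, here is a minimal toy simulation in Python/NumPy (my own sketch, not an experiment from the paper): a one-dimensional Gaussian is repeatedly refit on its own samples, optionally keeping a fixed share of the original "human" data in each generation's training set. The function and parameter names (simulate, human_fraction) are illustrative only.

```python
# A toy numerical sketch (my own illustration, not an experiment from the paper):
# fit a one-dimensional Gaussian to some data, replace the data with samples
# drawn from the fitted model, refit, and repeat. Because each generation sees
# only its predecessor's output, estimation error compounds, and the fitted
# spread typically shrinks across generations (exact values depend on the seed).

import numpy as np

rng = np.random.default_rng(0)

def simulate(generations=500, n=50, human_fraction=0.0):
    """Recursively refit a Gaussian on its own samples.

    human_fraction: share of each generation's training set drawn from the
    original "human" data (0.0 means later generations are fully synthetic).
    Returns the fitted standard deviation at each generation.
    """
    human = rng.normal(0.0, 1.0, size=n)   # stand-in for human-written data
    data, stds = human, []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()
        stds.append(sigma)
        n_human = int(human_fraction * n)
        synthetic = rng.normal(mu, sigma, size=n - n_human)        # model-generated data
        retained = rng.choice(human, size=n_human, replace=False)  # pristine human data
        data = np.concatenate([synthetic, retained])
    return stds

for frac in (0.0, 0.1):
    stds = simulate(human_fraction=frac)
    print(f"human_fraction={frac:.1f}  fitted std at generations 1/100/250/500:",
          [round(stds[g], 3) for g in (0, 99, 249, 499)])
```

In a typical run the fully self-trained chain loses spread over many generations, while keeping a slice of the original data slows the drift, mirroring the 10% finding above, though in this toy it does not eliminate it.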

The Implications of 'Model Collapse'

The implications of model collapse are far-reaching. As AI-generated content fills the internet, it becomes harder to train newer models by scraping the web, which advantages firms that have already scraped it or that control access to human-generated content at scale. It could also entrench discrimination based on gender, ethnicity, or other sensitive attributes, especially if generative AI gradually narrows to representing only some groups in its responses while "forgetting" that others exist.

How to Avoid 'Model Collapse'

Fortunately, there are ways to avoid model collapse, and the researchers highlight two. The first is to retain a pristine copy of the original, exclusively or predominantly human-produced dataset and avoid contaminating it with AI-generated data. The second is to periodically reintroduce new, clean, human-generated datasets into training.

However, this would require a mass labeling mechanism, or a coordinated effort by content producers and AI companies, to differentiate AI-generated from human-generated content, and no such mechanism currently exists on a reliable, large-scale basis.
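As a concrete sketch of the first strategy, the snippet below rebuilds each generation's training set from a frozen copy of the human corpus and caps the share of model-generated text. It is a hypothetical illustration, not anything proposed in the paper: the build_training_set function, its parameters, and above all the provenance labels ("human" vs. "synthetic") are assumptions, and that labeling is exactly the infrastructure the paragraph above notes does not yet exist.

```python
# Hypothetical sketch: assemble a training set from a frozen, pristine human
# corpus plus newly collected documents, capping the synthetic share.
# Assumes every new document carries a provenance label ("human"/"synthetic"),
# which is precisely the labeling infrastructure that is missing today.

import random

def build_training_set(pristine_corpus, new_documents,
                       max_synthetic_share=0.1, seed=0):
    """Combine the frozen human corpus with newly collected documents.

    new_documents: iterable of (text, source) pairs, source in {"human", "synthetic"}.
    max_synthetic_share: cap on the fraction of the final set that is synthetic.
    """
    rng = random.Random(seed)
    human_new = [text for text, source in new_documents if source == "human"]
    synthetic = [text for text, source in new_documents if source == "synthetic"]

    base = list(pristine_corpus) + human_new                     # never contaminated
    cap = int(max_synthetic_share * len(base) / (1 - max_synthetic_share))
    sampled = rng.sample(synthetic, k=min(cap, len(synthetic)))  # capped synthetic slice
    return base + sampled
```

The design choice is simply that the pristine corpus is never written back to, so later generations always have an uncontaminated anchor, while fresh human data can still be folded in per the second strategy.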

The Future of AI and 'Model Collapse'

While the news is worrisome for current generative AI technology and the companies seeking to monetize it, there is a silver lining for human content creators. In a future filled with generative AI tools and their content, human-created content will be even more valuable than it is today — if only as a source of pristine training data for AI.

These findings underscore the risks of unchecked generative processes and emphasize the need for improved methodologies to maintain the integrity of generative models over time. As we continue to navigate the age of generative AI, it is clear that model collapse is an issue that needs to be addressed to ensure generative AI continues to improve.



Alec Foster

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 License.
