Artificial intelligence models are trained largely on data taken from the internet. But given the sheer volume of data required to train AI, many models end up consuming other AI-generated content, which can, in turn, degrade the model as a whole.
Is AI cannibalization a problem? AI is eating itself. When generative AI is "trained on its own content, its output can also drift away from reality," said The New York Times. This is known as model collapse.
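The drift the Times describes can be caricatured with a toy statistical experiment (an illustrative sketch only, not how any real lab trains its models): treat a "model" as nothing more than a word-frequency table, and train each generation solely on text sampled from the previous one. Rare words can disappear by sampling chance, and once gone they never come back.

```python
import random
from collections import Counter

random.seed(42)

# Toy illustration (assumed setup, not a real training pipeline):
# a "language model" that is just a word-frequency table. Each
# generation is trained only on text sampled from the previous one.
corpus = ["the"] * 60 + ["cat"] * 25 + ["sat"] * 12 + ["quokka"] * 3

def train(text):
    """Fit the model: estimate word probabilities from counts."""
    counts = Counter(text)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(model, n=100):
    """Sample n words from the model's distribution."""
    words = list(model)
    weights = [model[w] for w in words]
    return random.choices(words, weights=weights, k=n)

model = train(corpus)
history = [set(model)]
for _ in range(20):
    model = train(generate(model))  # train on the model's own output
    history.append(set(model))

# Rare words can vanish by sampling chance and, crucially, can never
# return: the support of the distribution only shrinks over time.
print(history[0] - history[-1])  # words lost after 20 generations
```

The mechanism, not the exact numbers, is the point: each resampling step can only lose vocabulary, never recover it, which is the statistical intuition behind a model's output narrowing and "drifting away from reality."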
Still, AI companies have their hands tied. "To develop ever more advanced AI products, Big Tech might have no choice but to feed its programs AI-generated content, or it just might not be able to sift human fodder from the synthetic," said The Atlantic. As it stands, synthetic data is needed to keep pace with the technology's appetite for training material.
That's not to say that all AI-generated data is bad. There are "certain contexts where synthetic data can help AIs learn," said the Times. "For example, when output from a larger AI model is used to train a smaller one or when the correct answer can be verified, like the solution to a math problem or the best strategies in games like chess or Go." Also, experts are working to create synthetic data sets that are less likely to collapse a model.
Is AI taking over the internet? The monumental amount of AI content on the internet, including tweets by bots, absurd pictures and fake reviews, has given rise to a more sinister belief. The "dead internet" theory is the belief that the "vast majority of internet traffic, posts and users have been replaced by bots and AI-generated content and that people no longer shape the direction of the internet," said Forbes.
Luckily, experts say that the dead internet theory has not come to fruition yet. "The vast majority of posts that go viral — unhinged opinions, witticisms, astute observations, reframing of the familiar in a new context — are not AI-generated," said Forbes.