Future LLM training data is at risk
A recent research paper argues that if the internet keeps filling up with LLM-generated text, the training of future LLMs is at risk.

Consider a first-generation LLM trained on human-generated data. That model produces new text that ends up on the internet. When a second-generation model is trained, its data source now contains text generated by the first-generation model, so the second generation learns little that the first did not already know. Repeated across generations, each model inherits and amplifies the quirks and blind spots of the one before it; the paper calls this degenerative process "model collapse".

Once most of the available training data is polluted in this way, records of human interaction with LLMs become valuable: they are the most likely remaining source of untampered, human-generated text.
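The generational mechanism can be illustrated with a toy simulation (a minimal sketch of the idea, not the paper's actual experiment): repeatedly fit a simple model to samples drawn from the previous generation's model and watch the fitted distribution degenerate. Here a one-dimensional Gaussian stands in for the LLM, and its shrinking standard deviation plays the role of collapsing diversity in generated text.

```python
# Toy illustration of model collapse: each "generation" is trained
# (here, a Gaussian is fit) only on samples produced by the previous
# generation's model. Hypothetical sketch, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 50        # size of each generation's "training set"
n_generations = 200   # how many times we retrain on generated data

mu, sigma = 0.0, 1.0  # generation 0: the "human" data distribution
for gen in range(n_generations):
    data = rng.normal(mu, sigma, n_samples)  # sample from current model
    mu, sigma = data.mean(), data.std()      # refit (MLE) on that sample
    if gen % 50 == 0:
        print(f"generation {gen:3d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")

print(f"after {n_generations} generations: sigma = {sigma:.3f}")
# sigma tends toward 0: the tails of the original distribution are
# forgotten first, a simple analogue of the collapse described above.
```

In this toy setting the fitted variance shrinks in expectation every generation (the MLE fit underestimates variance from a finite sample), so rare events disappear first, which loosely mirrors how the paper describes early model collapse in LLMs.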
References:

AI models collapse when trained on recursively generated data. Nature (2024). https://www.nature.com/articles/s41586-024-07566-y