Artificial intelligence is advancing every day; we know that. Its ability to generate images has long been a subject of discussion, and such images have already been used to spread misinformation and produce fabricated media. To help address this, Google released the “About this image” tool, first announced at the Google I/O developer conference in 2023, which surfaces an image’s source and background, including its metadata where available. However, recent revelations indicate that a dataset used to train AI image generators contained links to child abuse imagery. This is not only problematic but deeply concerning, given the strict rules worldwide against the circulation of such content. In the United States, for instance, federal law makes it illegal; a conviction can carry up to life imprisonment and a fine of up to $250,000.
Stanford researchers found traces of child abuse imagery in a generative AI training dataset
Researchers at Stanford University’s Internet Observatory investigated AI image-generation datasets. They discovered that LAION-5B, a dataset used by Stability AI’s Stable Diffusion and Google’s Imagen image generators, contained at least 1,679 illegal images scraped from social media posts and well-known adult websites.
Starting in September 2023, the researchers examined the LAION dataset closely to determine whether it contained any inappropriate images of children. They primarily relied on “image hashes,” compact fingerprints computed from an image’s content, to check the entries. They then used tools such as PhotoDNA to confirm their findings, and experts at the Canadian Centre for Child Protection reviewed and agreed with the results.
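The researchers’ exact pipeline isn’t fully public, and PhotoDNA itself is proprietary and restricted to vetted organizations, but the general technique is perceptual-hash matching: compute a fingerprint of each image and compare it against a list of fingerprints of known illegal material. Below is a minimal sketch of that idea using the open-source Python `imagehash` library; the hash value, distance threshold, and file path are hypothetical placeholders, not anything from the Stanford study.

```python
# Minimal sketch of perceptual-hash matching (the general technique behind
# tools like PhotoDNA, which is proprietary). Hashes and paths are placeholders.
from PIL import Image
import imagehash

# Hypothetical set of perceptual hashes of known flagged images. In practice,
# such hash lists are supplied by child-safety organizations, never built by hand.
KNOWN_HASHES = {
    imagehash.hex_to_hash("d1c48f0a3e5b7290"),
}

MAX_DISTANCE = 5  # Hamming-distance threshold for counting a near-duplicate match


def is_flagged(path: str) -> bool:
    """Return True if the image's perceptual hash is close to a known hash."""
    candidate = imagehash.phash(Image.open(path))
    # Subtracting two ImageHash objects yields their Hamming distance.
    return any(candidate - known <= MAX_DISTANCE for known in KNOWN_HASHES)


if __name__ == "__main__":
    print(is_flagged("downloaded_image.jpg"))  # hypothetical local file
```

Unlike a cryptographic hash, a perceptual hash changes only slightly when an image is resized or re-encoded, which is why matching uses a distance threshold rather than exact equality.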
Many people believe that the LAION dataset stores actual pictures, but that’s not accurate. Instead, it serves as a comprehensive index or list directing users to where they can find images online. It stores web links to these images along with the accompanying text descriptions.
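In concrete terms, LAION-5B is distributed as metadata tables (parquet files) of links and captions rather than image files. The sketch below shows how such an index might be read with pandas; the column names (`URL`, `TEXT`) follow the commonly documented LAION schema but should be treated as an assumption here, and the file name is a hypothetical local sample.

```python
# Minimal sketch of reading a LAION-style index. LAION-5B ships as parquet
# metadata, not images; "laion_sample.parquet" is a hypothetical local file.
import pandas as pd

df = pd.read_parquet("laion_sample.parquet")

# Each row points to an image hosted somewhere on the web plus its caption;
# fetching the actual pixels is a separate download step.
for _, row in df.head(3).iterrows():
    print(row["URL"], "->", row["TEXT"])
```

This structure is why the dataset could reference harmful images without hosting them: the problematic content lived at the linked URLs, not inside the dataset itself.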
LAION responds and affirms its “zero-tolerance policy”
LAION, the non-profit organization that manages the dataset, told Bloomberg that it maintains a “zero-tolerance policy” against harmful content and would temporarily take its datasets offline. Responding to the same report, Stability AI emphasized its policies for preventing misuse of its platforms, clarifying that although its models were trained on portions of the LAION-5B dataset, they were refined and adjusted with safety in mind.
Although the researchers found traces of child abuse imagery in the dataset, they explained that this does not necessarily affect a model’s output. They cautioned, however, that there remains a risk that the models learned undesirable information from those images.