ChatGPT Generates Fake Data Set to Support Scientific Hypothesis


Researchers have shown that GPT-4 ADA, the latest high-performance version of the OpenAI language model underlying ChatGPT, can generate convincing but entirely fabricated clinical trial datasets. Such datasets could be used, for example, to prop up a scientific hypothesis. In a study comparing two surgical protocols, the AI-generated data falsely indicated that one was superior to the other, raising concerns about the integrity and credibility of scientific research in the era of AI.

Released in the first quarter of this year, GPT-4 distinguished itself from previous versions with significant improvements in the quality and coherence of its responses. Its capabilities have recently been extended with Advanced Data Analysis (ADA), a feature that writes and executes Python code, enabling both statistical analysis and data visualization.

Despite GPT-4 ADA's potential to accelerate scientific research, experts question how ethically it may be used, since its features could facilitate the generation of high-quality fictitious analytical and statistical data. Researchers from the Italian universities Magna Graecia of Catanzaro and Cagliari tested this by having the model compare two surgical protocols without relying on any empirical data. The results are detailed in the journal JAMA Ophthalmology.

Results Contradicting Genuine Clinical Trials

The generated data concerned treatments for keratoconus, an eye disease that deforms the cornea and degrades vision. In 15 to 20% of cases, treatment involves corneal transplantation via one of two surgical protocols. The first, penetrating keratoplasty (PK), removes the full thickness of the damaged cornea and replaces it with healthy donor tissue. The second, deep anterior lamellar keratoplasty (DALK), replaces only the outer layer of the cornea, leaving the inner layer intact.

The study's researchers asked the AI to generate data, for a cohort of 300 patients, supporting the conclusion that DALK yields better results than PK. The AI had to show a statistical difference in the imaging tests used to evaluate corneal shape and irregularities, and the numbers also had to reflect patients' improvement in visual acuity after the procedures.

The AI duly concluded that DALK was the superior procedure, contradicting genuine clinical trials, which indicate similar results for both procedures even two years after the intervention. Study co-author Giuseppe Giannaccare, an ophthalmic surgeon at the University of Cagliari, emphasized that the goal was to show how easily a dataset unsupported by real evidence, and even at odds with it, can be created in a matter of minutes.

These results show how readily the AI will invent false data to support a hypothesis, which is all the more worrying because the data looks authentic to an unsuspecting reader. Elisabeth Bik, a microbiologist and independent researcher, notes that while generative AI was already known for producing text that is hard to identify as machine-written, the ability to fabricate realistic datasets raises the concern to a new level. The technique could easily generate false measurements for nonexistent patients or experiments, and such data would be difficult to catch before acceptance for publication, especially as peer review rarely involves reanalyzing the raw data.

Need for Quality Control Updates

Although the data generated by GPT-4 ADA seemed authentic at first glance, a detailed examination by another group of experts revealed numerous inconsistencies. An authenticity check showed almost no realistic relationships between variables. For many participants, the recorded sex did not match what their first name would typically suggest. There was also no correlation between pre- and post-operative measures of visual capacity and ocular imaging. And the statistical distribution of certain data columns showed unusual clustering: for example, a disproportionate number of participants had ages ending in 7 or 8.
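To illustrate how one such check works, here is a minimal Python sketch of a terminal-digit test on ages. This is an illustration under stated assumptions, not the protocol the experts actually ran, and the ages in it are invented purely to demonstrate the test:

```python
import numpy as np
from scipy.stats import chisquare

def terminal_digit_check(ages):
    """Chi-square test for uniformity of the last digit of each age.

    In genuine data, the terminal digit of a quantity like age is
    roughly uniform over 0-9; fabricated data often over-uses a few
    digits (e.g. 7 and 8, as reported for the GPT-4 ADA dataset).
    """
    digits = np.asarray(ages) % 10
    observed = np.bincount(digits, minlength=10)
    expected = np.full(10, len(digits) / 10)
    return chisquare(observed, expected)

# Hypothetical example: 300 invented ages whose last digits cluster
# on 7 and 8, mimicking the anomaly described in the article.
rng = np.random.default_rng(0)
suspicious_ages = np.concatenate([
    rng.integers(18, 80, size=100),                              # plausible background
    10 * rng.integers(2, 7, size=200) + rng.choice([7, 8], 200), # ages ending in 7 or 8
])
stat, p = terminal_digit_check(suspicious_ages)
print(f"chi-square = {stat:.1f}, p = {p:.2g}")
```

A very small p-value indicates that the last digits are far from uniformly distributed, exactly the kind of clustering that betrayed the fabricated dataset.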

These findings show that fabricated data can still be detected on close inspection, although future AI models may produce fabrications that are harder to catch. The results underscore the importance of updating the quality-control protocols of scientific journals to detect potentially AI-generated data (and articles). Jack Wilkinson, a biostatistician at the University of Manchester (UK) and one of the experts who analyzed the GPT-4 ADA data, suggests that AI could be part of both the problem and the solution: some of these checks might be automated, but generative AI could likely learn to evade such screens. The scientific community and publishers must therefore step up their diligence to ensure the authenticity of published data and prevent misinformation.
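By way of illustration, one of the simpler checks to automate is a correlation screen on paired baseline/outcome columns, like the pre- and post-operative measures mentioned above. The sketch below is a hypothetical example rather than a tool used in the study; the column names pre_acuity and post_acuity and the flagging threshold are assumptions made for the demonstration:

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_screen(pre, post, threshold=0.3):
    """Flag a pre/post measurement pair whose correlation is implausibly low.

    In a genuine cohort, post-operative values usually depend on the
    patient's baseline, so some positive correlation is expected.
    Columns generated independently of each other, as in the
    fabricated dataset, show correlations near zero.
    """
    r, p_value = pearsonr(pre, post)
    return r, p_value, abs(r) < threshold

# Hypothetical example: two independently generated columns for
# 300 "patients" with no baseline-outcome relationship at all.
rng = np.random.default_rng(1)
pre_acuity = rng.normal(0.4, 0.1, size=300)   # invented baseline values
post_acuity = rng.normal(0.8, 0.1, size=300)  # invented outcomes
r, p, flagged = correlation_screen(pre_acuity, post_acuity)
print(f"r = {r:.2f}, p = {p:.2f}, flagged = {flagged}")
```

A screen this simple would not prove fabrication on its own, but as part of a battery of automated checks it could tell editors which submissions deserve a closer look.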