Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain
Yanshen Sun, Jianfeng He
Abstract
The pervasive influence of misinformation has far-reaching and detrimental effects on both individuals and society. The COVID-19 pandemic has witnessed an alarming surge in the dissemination of medical misinformation. However, existing datasets pertaining to misinformation predominantly focus on textual information, neglecting the inclusion of visual elements, and tend to center solely on COVID-19-related misinformation, overlooking misinformation surrounding other diseases. Furthermore, the potential of Large Language Models (LLMs), such as the ChatGPT developed in late 2022, in generating misinformation has been overlooked in previous works. To overcome these limitations, we present Med-MMHL, a novel multi-modal misinformation detection dataset in a general medical domain encompassing multiple diseases. Med-MMHL not only incorporates human-generated misinformation but also includes misinformation generated by LLMs like ChatGPT. Our dataset aims to facilitate comprehensive research and development of methodologies for detecting misinformation across diverse diseases and various scenarios, including human and LLM-generated misinformation detection at the sentence, document, and multi-modal levels. To access our dataset and code, visit our GitHub repository: \url{this https URL}.
People
Publication Details
- Date of publication:
- June 15, 2023
- Journal:
- Cornell University
- Publication note:
Yanshen Sun, Jianfeng He, Shuo Lei, Limeng Cui, Chang-Tien Lu: Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain. CoRR abs/2306.08871 (2023)