Performance of ChatGPT and BARD AI in the National Examination for the Revalidation of Medical Diplomas in Brazil

DOI:

https://doi.org/10.13037/ras.vol22.e20249478

Keywords:

Artificial Intelligence; Education, Medical; Technological Development

Abstract

BACKGROUND: ChatGPT and Bard AI are artificial intelligence tools designed to generate human-like language and perform a wide range of tasks. These tools have been studied for various applications, including medical education, where their performance has been assessed on exams relevant to professional practice. OBJECTIVE: This study aimed to evaluate and compare the performance of ChatGPT-3.5 and Bard AI in answering questions from the 2023 Brazilian national exam for the revalidation of medical diplomas. METHODS: Objective exam questions were entered into each tool, and the responses were compared with the official answer keys. Questions were categorized by area, scenario, and complexity. RESULTS: Both tools achieved over 60% accuracy, with Bard AI outperforming ChatGPT-3.5. No statistically significant differences in tool performance were found when questions were categorized by area, scenario, or complexity. CONCLUSIONS: It is crucial for healthcare professionals to recognize the potential and limitations of these tools, and further research is needed to integrate them effectively into medical education.

Author Biographies

Fernanda Gabriele Fernandes Morais, Universidade Federal de Juiz de Fora

Medical student at the Universidade Federal de Juiz de Fora (UFJF). Juiz de Fora, Minas Gerais, Brazil.

Sabrine Teixeira Ferraz Grunewald, Universidade Federal de Juiz de Fora

Adjunct Professor at the School of Medicine, Universidade Federal de Juiz de Fora (UFJF), Department of Maternal and Child Health. Juiz de Fora, Minas Gerais, Brazil.

Published

2024-12-18

Section

Special Thematic Issue: Innovation in Health Education
