Performance of ChatGPT and BARD AI in the National Examination for the Revalidation of Medical Diplomas in Brazil
DOI: https://doi.org/10.13037/ras.vol22.e20249478
Keywords: Artificial Intelligence; Education, Medical; Technological Development
Abstract
BACKGROUND: ChatGPT and Bard AI are artificial intelligence tools designed to generate human-like language and perform a wide range of tasks. Their performance has been studied in many settings, including medical education, where they have been assessed on examinations relevant to professional practice.
OBJECTIVE: To evaluate and compare the performance of ChatGPT-3.5 and Bard AI in answering questions from the 2023 Brazilian national examination for the revalidation of medical diplomas.
METHODS: The exam's objective (multiple-choice) questions were entered into each tool, and the responses obtained were compared with the official answer keys. Questions were categorized by area, scenario, and complexity.
RESULTS: Both tools achieved accuracy above 60%, with Bard AI outperforming ChatGPT-3.5. No statistically significant differences in performance were found when questions were grouped by area, scenario, or complexity.
CONCLUSIONS: Healthcare professionals must recognize the potential and limitations of these tools, and further research is needed to integrate them effectively into medical education.
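To make the comparison described in the METHODS and RESULTS concrete, the sketch below illustrates how per-tool accuracy against an answer key and a category-wise comparison could be computed. It is not the authors' analysis code: the chi-square test, the pandas/SciPy tooling, and all data values are assumptions used only for demonstration.

```python
# Illustrative sketch only -- not the study's analysis script. The chi-square
# test and all data below are assumptions for demonstration purposes.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical per-question records: which tool answered, the question's
# category (area, scenario, or complexity could be handled the same way),
# and whether the answer matched the official key (1 = correct).
records = pd.DataFrame({
    "tool":     ["ChatGPT-3.5"] * 6 + ["Bard AI"] * 6,
    "category": ["Clinical", "Clinical", "Surgery", "Surgery",
                 "Pediatrics", "Pediatrics"] * 2,
    "correct":  [1, 0, 1, 1, 0, 1,   1, 1, 1, 0, 1, 1],
})

# Overall accuracy per tool (the study reports both tools above 60%).
print(records.groupby("tool")["correct"].mean())

# Chi-square test of independence between tool and correctness per category;
# a non-significant p-value corresponds to the reported absence of differences.
for category, subset in records.groupby("category"):
    table = pd.crosstab(subset["tool"], subset["correct"])
    if table.shape == (2, 2):  # both outcomes observed for both tools
        chi2, p_value, dof, expected = chi2_contingency(table)
        print(category, f"p = {p_value:.3f}")
```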
License
Copyright (c) 2024 Fernanda Gabriele Fernandes Morais, Sabrine Teixeira Ferraz Grunewald

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Policy Proposal for Journals offering Free Delayed Access
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License after publication, allowing the work to be shared with acknowledgment of its authorship and of its initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for non-exclusive distribution of the version of the work published in this journal (e.g., posting it to an institutional repository or publishing it as a book chapter), with acknowledgment of its authorship and of its initial publication in this journal.
- Authors are permitted and encouraged to post and distribute their work online (e.g., in institutional repositories or on their personal website) at any point before or during the editorial process, as this can lead to productive exchanges as well as greater impact and citation of the published work (see The Effect of Open Access).