Barr, Kyle.
“GPT-4 Is a
Giant Black Box and
Its Training Data
Remains a Mystery.”
Gizmodo, March 2023.
https://gizmodo.com/chatbot-gpt4-open-ai-ai-bing-microsoft-1850229989.
Branco, Ruben, António Branco, João António Rodrigues, and João
Ricardo Silva.
“Shortcutted Commonsense:
Data Spuriousness in Deep
Learning of Commonsense
Reasoning.” In
Proceedings of the 2021
Conference on Empirical
Methods in Natural Language
Processing, 1504–21. Online; Punta Cana,
Dominican Republic: Association for Computational Linguistics,
2021.
https://doi.org/10.18653/v1/2021.emnlp-main.113.
Carlini, Nicholas, Daphne Ippolito, Matthew Jagielski, Katherine
Lee, Florian Tramer, and Chiyuan Zhang.
“Quantifying
Memorization Across Neural
Language Models.” arXiv, February
2022.
https://doi.org/10.48550/arXiv.2202.07646.
DeGrave, Alex J., Joseph D. Janizek, and Su-In Lee.
“AI for Radiographic COVID-19
Detection Selects Shortcuts over Signal.” Nature
Machine Intelligence 3, no. 7 (July 2021): 610–19.
https://doi.org/10.1038/s42256-021-00338-7.
Hauptman, Max.
“Marines Outwitted an AI
Security Camera by Hiding in a Cardboard Box and Pretending to Be
Trees.” Task & Purpose, January 2023.
https://taskandpurpose.com/news/marines-ai-paul-scharre/.
Huang, Shih-Cheng, Akshay S. Chaudhari, Curtis P. Langlotz, Nigam
Shah, Serena Yeung, and Matthew P. Lungren.
“Developing
Medical Imaging AI for Emerging Infectious
Diseases.” Nature Communications 13, no. 1
(November 2022): 7060.
https://doi.org/10.1038/s41467-022-34234-4.
Jang, Myeongjun, and Thomas Lukasiewicz.
“Consistency
Analysis of ChatGPT.” arXiv,
March 2023.
https://doi.org/10.48550/arXiv.2303.06273.
Kapoor, Sayash, and Arvind Narayanan.
“Leakage and the
Reproducibility Crisis in
ML-Based Science,” 2022.
https://doi.org/10.48550/ARXIV.2207.07048.
Lewis, Patrick, Pontus Stenetorp, and Sebastian Riedel.
“Question and Answer
Test-Train Overlap in
Open-Domain Question
Answering Datasets.” In
Proceedings of the 16th Conference of the
European Chapter of the
Association for Computational
Linguistics: Main
Volume, 1000–1008. Online: Association for
Computational Linguistics, 2021.
https://doi.org/10.18653/v1/2021.eacl-main.86.
Lin, Stephanie, Jacob Hilton, and Owain Evans.
“TruthfulQA: Measuring
How Models Mimic
Human Falsehoods.” In
Proceedings of the 60th Annual
Meeting of the Association for
Computational Linguistics
(Volume 1: Long
Papers), 3214–52. Dublin, Ireland: Association
for Computational Linguistics, 2022.
https://doi.org/10.18653/v1/2022.acl-long.229.
Narayanan, Arvind, and Sayash Kapoor.
“GPT-4
and Professional Benchmarks: The Wrong Answer to the Wrong
Question.” Substack newsletter.
AI Snake Oil,
March 2023.
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks.
Pikuliak, Matúš.
“ChatGPT Survey:
Performance on NLP Datasets,”
March 2023.
http://opensamizdat.com/posts/chatgpt_survey/.
Raji, Inioluwa Deborah, I. Elizabeth Kumar, Aaron Horowitz, and
Andrew Selbst.
“The Fallacy of AI
Functionality.” In
2022 ACM
Conference on Fairness,
Accountability, and Transparency,
959–72. Seoul Republic of Korea: ACM, 2022.
https://doi.org/10.1145/3531146.3533158.
Rogers, Anna.
“Closed AI Models
Make Bad Baselines.”
Hacking Semantics, April 2023.
https://hackingsemantics.xyz/2023/closed-baselines/.
Ross, Casey.
“Epic’s Overhaul of a Flawed Algorithm Shows
Why AI Oversight Is a Life-or-Death Issue.”
STAT, October 2022.
https://www.statnews.com/2022/10/24/epic-overhaul-of-a-flawed-algorithm/.
Wang, Tony T., Adam Gleave, Tom Tseng, Nora Belrose, Kellin
Pelrine, Joseph Miller, Michael D. Dennis, et al.
“Adversarial Policies Beat
Superhuman Go AIs.”
arXiv, February 2023.
https://doi.org/10.48550/arXiv.2211.00241.
Wei, Jason, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph,
Sebastian Borgeaud, Dani Yogatama, et al.
“Emergent
Abilities of Large Language
Models.” arXiv, October 2022.
https://doi.org/10.48550/arXiv.2206.07682.
Wong, Andrew, Erkin Otles, John P. Donnelly, Andrew Krumm, Jeffrey
McCullough, Olivia DeTroyer-Cooley, Justin Pestrue, et al.
“External Validation of a Widely
Implemented Proprietary
Sepsis Prediction Model in
Hospitalized Patients.” JAMA
Internal Medicine 181, no. 8 (August 2021): 1065–70.
https://doi.org/10.1001/jamainternmed.2021.2626.
Wynants, Laure, Ben Van Calster, Gary S. Collins, Richard D.
Riley, Georg Heinze, Ewoud Schuit, Marc M. J. Bonten, et al.
“Prediction Models for Diagnosis and Prognosis of Covid-19:
Systematic Review and Critical Appraisal.” BMJ
(Clinical Research Ed.) 369 (April 2020): m1328.
https://doi.org/10.1136/bmj.m1328.