Generative AI: What You Need To Know

Language Models

They are text synthesis engines

They are a mathematical model of the distribution of various text tokens—words, syllables, punctuation—across a large collection of writing. They work by generating text that is a statistically plausible continuation of the provided prompt under that model.
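To make that concrete, here is a deliberately tiny sketch in Python of what “modelling the token distribution and sampling plausible continuations” means. It is an illustrative toy (a bigram counter over a handful of words), not how production models work; they learn the distribution with a neural network trained on billions of tokens, but the core loop of picking the next token according to a learned distribution is the same idea.

```python
import random
from collections import Counter, defaultdict

# Toy illustration only: estimate "which token tends to follow which"
# from a tiny corpus, then sample a plausible continuation of a prompt.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(token: str) -> str:
    """Pick a next token in proportion to how often it followed `token`."""
    followers = counts[token]
    return random.choices(list(followers), weights=list(followers.values()))[0]

prompt = ["the"]
for _ in range(6):
    prompt.append(sample_next(prompt[-1]))

print(" ".join(prompt))  # e.g. "the dog sat on the mat ."
```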

They do not “learn” or get “inspired”

Every time a language model needs to learn something genuinely new, it has to be rebuilt from scratch using the entirety of its training data set. The human equivalent would be if we had to regrow our brains and relive our entire lives every time we wanted to learn something or get inspired.

They do not reason

Their “thinking” is entirely based on finding correlations between texts. They only “reason” by replaying and recombining records of reasoning that were described in writing in their training data. This makes their “reasoning” extremely brittle and prone to breaking when the prompt is reworded.

A language model only hallucinates

The output is all hallucination. But occasionally the hallucinations have factual elements. Fabricated text is the rule, not the exception.

They are great at pure textual tasks

These models lend themselves to tasks that require converting or modifying text, such as turning a casual email into a formal letter. That includes many software development tasks.
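As a sketch of what such a task looks like in practice, here is a minimal Python helper. The `complete()` parameter is a hypothetical stand-in for whichever text-completion API you use; the name and signature are assumptions for illustration, not any specific vendor's interface.

```python
def formalise_email(casual_email: str, complete) -> str:
    """Rewrite a casual email as a formal letter.

    `complete` is a hypothetical callable standing in for your model
    provider's text-completion call: it takes a prompt string and
    returns the generated text.
    """
    prompt = (
        "Rewrite the following casual email as a formal business letter, "
        "keeping every factual detail unchanged:\n\n"
        f"{casual_email}"
    )
    return complete(prompt)
```

The point is that the input, the instruction, and the output are all plain text, which is the ground these models are built on.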

Language models are bad at tasks that require any reasoning or empathy

Plotting. Textual structure. Customer feedback. Therapy. Anything that requires a consistent structure or an understanding of people or the world is going to be an ineffective use of a language model.

Language models are unreliable

They can easily generate unsafe output. They are hard to secure. Their “reasoning” is very error-prone. Their advice can be harmful. All of their output is fabricated.

Larger models aren’t necessarily better

Language model “reasoning” and textual fluency seem to improve with model size, but many other factors degrade. Smaller, open source models may well suit your requirements better.

Larger training data sets are undocumented

This increases your risk, as you have no reliable way of forming a clear picture of the range of behaviours the language model offers. It makes it hard to assess its biases, its tendency towards unsafe output, or your overall legal liability. Smaller, open source models may well suit your requirements better.

Language models are very labour-intensive

Modern language models require an enormous amount of cheap human labour to filter the training data and fine-tune the models themselves.

Language models are complex

Refer to the other cards in this set for further information and references on many of the specific issues with language models.

References


These cards were made by Baldur Bjarnason.

They are based on the research done for the book The Intelligence Illusion: a practical guide to the business risks of Generative AI.

Armstrong, Evan. “AI Looks Like a Bubble,” February 2023. https://every.to/napkin-math/ai-looks-like-a-bubble.
Barr, Kyle. “GPT-4 Is a Giant Black Box and Its Training Data Remains a Mystery.” Gizmodo, March 2023. https://gizmodo.com/chatbot-gpt4-open-ai-ai-bing-microsoft-1850229989.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. FAccT ’21. New York, NY, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3442188.3445922.
Bender, Emily M., and Alexander Koller. “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5185–98. Online: Association for Computational Linguistics, 2020. https://doi.org/10.18653/v1/2020.acl-main.463.
Bogost, Ian. “ChatGPT Is Dumber Than You Think.” The Atlantic, December 2022. https://www.theatlantic.com/technology/archive/2022/12/chatgpt-openai-artificial-intelligence-writing-ethics/672386/.
Bourtoule, Lucas, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. “Machine Unlearning.” arXiv, December 2020. https://doi.org/10.48550/arXiv.1912.03817.
Branco, Ruben, António Branco, João António Rodrigues, and João Ricardo Silva. “Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning.” In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1504–21. Online; Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021. https://doi.org/10.18653/v1/2021.emnlp-main.113.
Chiang, Ted. “ChatGPT Is a Blurry JPEG of the Web.” The New Yorker, February 2023. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web.
Daubenschütz, Tim. “The AI Crowd Is Mad.” Proof In Progress, February 2023. https://proofinprogress.com/posts/2023-02-01/the-ai-crowd-is-mad.html.
Devereaux, Bret. “Collections: On ChatGPT.” A Collection of Unmitigated Pedantry, February 2023. https://acoup.blog/2023/02/17/collections-on-chatgpt/.
Epley, Nicholas, Adam Waytz, and John T. Cacioppo. “On Seeing Human: A Three-Factor Theory of Anthropomorphism.” Psychological Review 114, no. 4 (October 2007): 864–86. https://doi.org/10.1037/0033-295X.114.4.864.
“How Much of AI’s Recent Success Is Due to the Forer Effect? – Terence Eden’s Blog,” February 2023. https://shkspr.mobi/blog/2023/02/how-much-of-ais-recent-success-is-due-to-the-forer-effect/.
Jang, Myeongjun, and Thomas Lukasiewicz. “Consistency Analysis of ChatGPT.” arXiv, March 2023. https://doi.org/10.48550/arXiv.2303.06273.
Kim, Tae. “Let’s Stop Pretending ChatGPT Isn’t That Smart.” Barron’s, February 2023. https://www.barrons.com/articles/chatgpt-ai-openai-chatbot-b9f4fa03.
Liao, Thomas, Rohan Taori, Inioluwa Deborah Raji, and Ludwig Schmidt. “Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning,” 2022. https://openreview.net/forum?id=mPducS1MsEK.
Mahowald, Kyle, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, and Evelina Fedorenko. “Dissociating Language and Thought in Large Language Models: A Cognitive Perspective.” arXiv, January 2023. https://doi.org/10.48550/arXiv.2301.06627.
Marcus, Gary. “Deep Learning: A Critical Appraisal.” arXiv, January 2018. https://doi.org/10.48550/arXiv.1801.00631.
Marcus, Gary, and Ernest Davis. “How Not to Test GPT-3.” Substack newsletter. The Road to AI We Can Trust, February 2023. https://garymarcus.substack.com/p/how-not-to-test-gpt-3.
McCoy, Tom, Ellie Pavlick, and Tal Linzen. “Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3428–48. Florence, Italy: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/P19-1334.
Narayanan, Arvind, and Sayash Kapoor. “People Keep Anthropomorphizing AI. Here’s Why.” Substack newsletter. AI Snake Oil, February 2023. https://aisnakeoil.substack.com/p/people-keep-anthropomorphizing-ai.
Niven, Timothy, and Hung-Yu Kao. “Probing Neural Network Comprehension of Natural Language Arguments.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4658–64. Florence, Italy: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/P19-1459.
Pennartz, Cyriel M. A., Michele Farisco, and Kathinka Evers. “Indicators and Criteria of Consciousness in Animals and Intelligent Machines: An Inside-Out Approach.” Frontiers in Systems Neuroscience 13 (2019). https://www.frontiersin.org/articles/10.3389/fnsys.2019.00025.
Perrigo, Billy. “Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer.” Time, January 2023. https://time.com/6247678/openai-chatgpt-kenya-workers/.
Raji, Inioluwa Deborah, Emily M. Bender, Amandalynne Paullada, Emily Denton, and Alex Hanna. “AI and the Everything in the Whole Wide World Benchmark.” arXiv, November 2021. https://doi.org/10.48550/arXiv.2111.15366.
Raji, Inioluwa Deborah, I. Elizabeth Kumar, Aaron Horowitz, and Andrew Selbst. “The Fallacy of AI Functionality.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, 959–72. Seoul Republic of Korea: ACM, 2022. https://doi.org/10.1145/3531146.3533158.
Robb, Bradley. “WTF Is ChatGPT - a No BS Breakdown.” Accessed February 21, 2023. https://bradleyrobb.net/notes/47/.
Rogers, Anna. “Closed AI Models Make Bad Baselines.” Hacking Semantics, April 2023. https://hackingsemantics.xyz/2023/closed-baselines/.
Salles, Arleen, Kathinka Evers, and Michele Farisco. “Anthropomorphism in AI.” AJOB Neuroscience 11, no. 2 (April 2020): 88–95. https://doi.org/10.1080/21507740.2020.1740350.
Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. “Are Emergent Abilities of Large Language Models a Mirage?” arXiv, April 2023. https://doi.org/10.48550/arXiv.2304.15004.
Shanahan, Murray. “Talking About Large Language Models.” arXiv, February 2023. https://doi.org/10.48550/arXiv.2212.03551.
Simonite, Tom. “Now That Machines Can Learn, Can They Unlearn?” Wired. Accessed February 21, 2023. https://www.wired.com/story/machines-can-learn-can-they-unlearn/.
Stark, Luke, and Jevan Hutson. “Physiognomic Artificial Intelligence.” SSRN Scholarly Paper. Rochester, NY, September 2021. https://doi.org/10.2139/ssrn.3927300.
Vincent, James. “Introducing the AI Mirror Test, Which Very Smart People Keep Failing.” The Verge, February 2023. https://www.theverge.com/23604075/ai-chatbots-bing-chatgpt-intelligent-sentient-mirror-test.
Walton, Adele. “The Ghosts Behind AI,” February 2023. https://thelead.uk/ghosts-behind-ai.
Watson, David. “The Rhetoric and Reality of Anthropomorphism in Artificial Intelligence.” Minds and Machines 29, no. 3 (September 2019): 417–40. https://doi.org/10.1007/s11023-019-09506-6.
Weizenbaum, Joseph. Computer Power and Human Reason: From Judgment to Calculation. San Francisco: Freeman, 1976.
Wolfram, Stephen. “What Is ChatGPT Doing … and Why Does It Work?” February 2023. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/.