Generative AI: What You Need To Know

Bias & Safety

AI models are trained on unsafe and undocumented data

The training data sets for most of the big language models are undocumented, making it impossible to assess the risk of biased or unsafe output. Some of the data used is unsafe, biased, or even outright illegal.

Unsafe output can expose you to liability

It’s unclear whether hosting immunity such as the US’s Section 230 applies to the output of hosted AI. Organisations might be liable for extremist, violent, or pornographic content generated by a model they host.

Biased output can expose you to liability

In most countries, it’s illegal to outright discriminate against women and minorities, but this is exactly what language models tend to do. Using them to automate decisions or assessments risks exposing you to legal liability.

Prompts are next to impossible to secure

You will have users who try to generate porn or unsafe output. Preventing it is next to impossible. That’s why you should prefer internal use for productivity over external use by customers.
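
A minimal sketch of why filtering prompts is so hard, assuming a hypothetical keyword blocklist. The names `BLOCKED_TERMS` and `naive_prompt_filter` are illustrative and do not come from any real product. The filter catches the direct request but lets through trivial obfuscation and plain rephrasing:

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_TERMS = {"porn", "nude"}


def naive_prompt_filter(prompt: str) -> bool:
    """Return True if the prompt passes a simple keyword check."""
    # Split the lowercased prompt into runs of letters and compare each
    # run against the blocklist.
    words = re.findall(r"[a-z]+", prompt.lower())
    return not any(word in BLOCKED_TERMS for word in words)


direct = "Generate a nude portrait"
obfuscated = "Generate a n-u-d-e portrait"
indirect = "Generate a portrait of a person wearing nothing"

for prompt in (direct, obfuscated, indirect):
    verdict = "allowed" if naive_prompt_filter(prompt) else "blocked"
    print(f"{verdict}: {prompt}")
```

Only the first prompt is blocked. Each new rule invites a new workaround, and the space of possible rephrasings is effectively unbounded. Real moderation systems are more sophisticated than this sketch, but they face the same open-ended input.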

Dated language can cost you

Language models are trained on mostly dated language. Social media from the 2010s. Marketing blog posts. Book collections gathered a decade ago.

Marketing written in dated language will be less effective once demographics have changed.

Increased costs, not decreased

When not-good-not-bad content can be generated in unprecedented volume, effective writing and illustration become more expensive. You will need to work harder and invest more to stand out.



These cards were made by Baldur Bjarnason.

They are based on the research done for the book The Intelligence Illusion: a practical guide to the business risks of Generative AI.

Abid, Abubakar, Maheen Farooqi, and James Zou. “Persistent Anti-Muslim Bias in Large Language Models.” arXiv, January 2021. https://doi.org/10.48550/arXiv.2101.05783.
Bartoletti, Ivana. “ChatGPT Gender Bias.” Twitter, March 2023. https://twitter.com/IvanaBartoletti/status/1637401609079488512.
Bastian, Matthias. “Stable Diffusion V2 Removes NSFW Images and Causes Protests.” THE DECODER, November 2022. https://the-decoder.com/stable-diffusion-v2-removes-nude-images-and-causes-protests/.
Bender, Emily M., and Batya Friedman. “Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science.” Transactions of the Association for Computational Linguistics 6 (2018): 587–604. https://doi.org/10.1162/tacl_a_00041.
Birhane, Abeba, Vinay Uday Prabhu, and Emmanuel Kahembwe. “Multimodal Datasets: Misogyny, Pornography, and Malignant Stereotypes.” arXiv, October 2021. https://doi.org/10.48550/arXiv.2110.01963.
Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 77–91. PMLR, 2018. https://proceedings.mlr.press/v81/buolamwini18a.html.
Burgess, Matt. “The Hacking of ChatGPT Is Just Getting Started.” Wired. Accessed April 14, 2023. https://www.wired.com/story/chatgpt-jailbreak-generative-ai-hacking/.
Chmielinski, Kasia S., Sarah Newman, Matt Taylor, Josh Joseph, Kemi Thomas, Jessica Yurkofsky, and Yue Chelsea Qiu. “The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence.” arXiv, March 2022. https://doi.org/10.48550/arXiv.2201.03954.
Crypto, CRYPTOINSIGHT PRO. “Stable Diffusion 2.0 Has ‘Forgotten’ How to Generate NSFW Content.” Substack newsletter. CRYPTOINSIGHT.PRO Crypto NFT DeFi, November 2022. https://cryptoinsightpro.substack.com/p/technologies2.
Dastin, Jeffrey. “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women.” Reuters, October 2018. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G.
Foong, Ng Wai. “Stable Diffusion 2: The Good, The Bad and The Ugly.” Medium, December 2022. https://towardsdatascience.com/stable-diffusion-2-the-good-the-bad-and-the-ugly-bd44bc7a1333.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. “Datasheets for Datasets.” Communications of the ACM 64, no. 12 (November 2021): 86–92. https://doi.org/10.1145/3458723.
Goodin, Dan. “Hackers Are Selling a Service That Bypasses ChatGPT Restrictions on Malware.” Ars Technica, February 2023. https://arstechnica.com/information-technology/2023/02/now-open-fee-based-telegram-service-that-uses-chatgpt-to-generate-malware/.
Greshake, Kai, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. “More Than You’ve Asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models.” arXiv, February 2023. https://doi.org/10.48550/arXiv.2302.12173.
Heikkilä, Melissa. “The Viral AI Avatar App Lensa Undressed Me—Without My Consent.” MIT Technology Review, December 2022. https://www.technologyreview.com/2022/12/12/1064751/the-viral-ai-avatar-app-lensa-undressed-me-without-my-consent/.
Johnson, Khari. “Chatbots Got Big—and Their Ethical Red Flags Got Bigger.” Wired. Accessed February 21, 2023. https://www.wired.com/story/chatbots-got-big-and-their-ethical-red-flags-got-bigger/.
Kapoor, Sayash, and Arvind Narayanan. “Quantifying ChatGPT’s Gender Bias.” Substack newsletter. AI Snake Oil, April 2023. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias.
Kotek, Hadas. “#ChatGPT Doubles down on Gender Stereotypes Even When They Don’t Make Sense in Context.” Twitter, April 2023. https://twitter.com/HadasKotek/status/1648453764117041152.
Maluleke, Vongani H., Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, and Devin Guillory. “Studying Bias in GANs Through the Lens of Race.” arXiv, September 2022. https://doi.org/10.48550/arXiv.2209.02836.
Mauro, Gianluca, and Hilke Schellmann. “‘There Is No Standard’: Investigation Finds AI Algorithms Objectify Women’s Bodies.” The Guardian, February 2023. https://www.theguardian.com/technology/2023/feb/08/biased-ai-algorithms-racy-women-bodies.
McMillan-Major, Angelina, Emily M. Bender, and Batya Friedman. “Data Statements: From Technical Concept to Community Practice.” ACM Journal on Responsible Computing, May 2023. https://doi.org/10.1145/3594737.
Meade, Nicholas, Elinor Poole-Dayan, and Siva Reddy. “An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1878–98. Dublin, Ireland: Association for Computational Linguistics, 2022. https://doi.org/10.18653/v1/2022.acl-long.132.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–29. FAT* ’19. New York, NY, USA: Association for Computing Machinery, 2019. https://doi.org/10.1145/3287560.3287596.
“Model Cards.” Accessed May 21, 2023. https://huggingface.co/blog/model-cards.
Monge, Jim Clyde. “Stable Diffusion 2.1 Released: NSFW Image Generation Is Back.” MLearning.ai, December 2022. https://medium.com/mlearning-ai/stable-diffusion-2-1-released-nsfw-image-generation-is-back-8bcc5c069d60.
Nkonde, Mutale. “ChatGPT: New AI System, Old Bias?” Mashable, February 2023. https://mashable.com/article/chatgpt-ai-racism-bias.
Noor, Poppy. “Can We Trust AI Not to Further Embed Racial Bias and Prejudice?” BMJ 368 (February 2020): m363. https://doi.org/10.1136/bmj.m363.
Perault, Matt. “Section 230 Won’t Protect ChatGPT.” Lawfare, February 2023. https://www.lawfareblog.com/section-230-wont-protect-chatgpt.
Perrigo, Billy. “Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer.” Time, January 2023. https://time.com/6247678/openai-chatgpt-kenya-workers/.
Raghavan, Manish, Solon Barocas, Jon Kleinberg, and Karen Levy. “Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 469–81, 2020. https://doi.org/10.1145/3351095.3372828.
Sloane, Mona, Emanuel Moss, and Rumman Chowdhury. “A Silicon Valley Love Triangle: Hiring Algorithms, Pseudo-Science, and the Quest for Auditability.” arXiv, May 2022. https://doi.org/10.48550/arXiv.2106.12403.
Stoyanovich, Julia, and Bill Howe. “Nutritional Labels for Data and Models.” A Quarterly Bulletin of the Computer Society of the IEEE Technical Committee on Data Engineering 42, no. 3 (September 2019). https://par.nsf.gov/biblio/10176629-nutritional-labels-data-models.
Tatman, Rachael. “Gender and Dialect Bias in YouTube’s Automatic Captions.” In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, 53–59. Valencia, Spain: Association for Computational Linguistics, 2017. https://doi.org/10.18653/v1/W17-1606.
Taylor, Josh. “ChatGPT’s Alter Ego, Dan: Users Jailbreak AI Program to Get Around Ethical Safeguards.” The Guardian, March 2023. https://www.theguardian.com/technology/2023/mar/08/chatgpt-alter-ego-dan-users-jailbreak-ai-program-to-get-around-ethical-safeguards.
“The Radicalization Risks of GPT-3 and Neural Language Models.” Middlebury Institute of International Studies at Monterey, September 2020. https://www.middlebury.edu/institute/academics/centers-initiatives/ctec/ctec-publications/radicalization-risks-gpt-3-and-neural-language.
Vincent, James. “Anyone Can Use This AI Art Generator — That’s the Risk.” The Verge, September 2022. https://www.theverge.com/2022/9/15/23340673/ai-image-generation-stable-diffusion-explained-ethics-copyright-data.
Whittaker, Meredith, Meryl Alper, Cynthia L. Bennett, Sara Hendren, Liz Kaziunas, Mara Mills, Meredith Ringel Morris, et al. “Disability, Bias, and AI,” November 2019. https://www.microsoft.com/en-us/research/publication/disability-bias-and-ai/.
Willison, Simon. “Prompt Injection Attacks Against GPT-3.” Accessed February 21, 2023. http://simonwillison.net/2022/Sep/12/prompt-injection/.
———. “Prompt Injection: What’s the Worst That Can Happen?” April 2023. https://simonwillison.net/2023/Apr/14/worst-that-can-happen/.
Xiang, Chloe, and Emanuel Maiberg. “ISIS Executions and Non-Consensual Porn Are Powering AI Art.” Vice, September 2022. https://www.vice.com/en/article/93ad75/isis-executions-and-non-consensual-porn-are-powering-ai-art.
Yeo, Catherine. “How Biased Is GPT-3?” Fair Bytes, June 2020. https://www.fairbytes.org/post/how-biased-is-gpt-3.