Introduction

Language models are a useful innovation

They perform well on many natural language tasks. The more synthetic the task is—the more disconnected it is from real-world processes, structures, and communications, the better they work.

Language models are very flawed

“Hallucinations” are fundamental to how they work. They have many hard-to-solve security issues. Their output is biased—often reflecting decades-old attitudes that go down poorly in modern marketing. They aren’t nearly as functional as vendors claim.

Diffusion models are a useful innovation

They do a good job of synthesising images in a variety of styles and converting them between styles. Diffusion models are hard to control—much of their output is mediocre and getting specific results takes a lot of work. They should be a useful addition to creative work.

Diffusion models are very flawed

Like language models, they are biased in that they reflect outdated attitudes and stereotypes that do not perform well in modern marketing. Their visual styles are limited. Many of those styles have been implemented without permission from the original artist.

Both language and diffusion models are lawsuit magnets

Their exact legal status in most major jurisdictions is uncertain, pending the outcome of regulatory action and numerous lawsuits. They are being positioned to directly employees, which is likely to provoke at least some additional regulatory action.

Most of what you see and hear about AI is bullshit

The AI industry has a long tradition of poor research, over-promising, making unfounded claims, and then under-delivering.

What do the words mean?

Artificial Intelligence

Not actual intelligence. This refers to several different methods of building neural network models that can solve tasks for us. They are versatile tools even if they are currently quite unreliable.

Neural Network

A data structure based on early theories on how the brain might work. In a biological neural network each neuron is one of the more complex living cells in the body. Conversely, the “neuron” of an artificial neural network is just a number.

Machine Learning

What software developers actually mean when they say ‘AI’. Not actual learning. We don’t rebuild our brains from scratch using only numbers, by reliving our entire lives up to that point—including everything we’ve read—every time we want to learn something new.

Artificial General Intelligence

What the public thinks ‘AI’ means. This lies somewhere between science fiction and fantasy. It very definitely does not exist yet.

Training data

This is the labelled or unlabelled data that’s fed into the algorithm to incrementally build an AI model’s artificial neural network.

Model

Usually used to describe the collection of parameters of an artificial neural network together with some (but not all) of the software needed to use it. What makes the “AI” tick.

Generative AI

AI models that generate images, video, audio, text, or code.

Language Model

An AI model that has been trained on text. Smaller models are usually specialised and have limited functionality.

Large Language Model

An AI language model that’s large enough to generate fluent text and simulate convincing conversations. This is what ChatGPT is built on.

Foundation Model

A model big enough and versatile enough to be used as a building block for more specialised “fine-tuned” models.

Fine-tuning

Additional training that improves a model’s capability at a specialised task, or degrades its ability to generate a specific kind of negative outcome, sometimes at the expense of versatility.

Diffusion model

A model trained on a large collection of images that are “diffused” into and out of random noise. This “teaches” the model to see images in noise and generate them from scratch. Almost every major image-generating AI model on the market today is a diffusion model.

Language Models

They are text synthesis engines

They are a mathematical model of the distribution of various text tokens—words, syllables, punctuation—across a large collection of writing. Their functionality is based on generating writing that’s a plausible representation of this model and the provided prompts.

They do not “learn” or get “inspired”

Every time a language model needs to learn something genuinely new it needs to be rebuilt from scratch, using the entirety of its training data set. The human equivalent would be if we had to regrow our brains and relive our entire lives every time we wanted to learn something or get inspired.

They do not reason

Their “thinking” is entirely based on finding correlations between texts. They only “reason” through replaying and combining records of reasoning as described in writing. This makes their “reasoning” extremely brittle and prone to breaking when the prompt is reworded or rephrased.

A language model only hallucinates

The output is all hallucination. But occasionally the hallucinations have factual elements. Fabricated text is the rule, not the exception.

They are great at pure textual tasks

These models lend themselves to tasks that require converting or modifying text, such as turning a casual email into a formal letter. That includes many software development tasks.

Language models are bad at tasks that require any reasoning or empathy

Plotting. Textual structure. Customer feedback. Therapy. Anything that requires a consistent structure or an understanding of people or the world is going to be an ineffective use of a language model.

Language models are unreliable

They can easily generate unsafe output. They are hard to secure. Their “reasoning” is very error-prone. Their advice can be harmful. All of their output is fabricated.

Larger models aren’t necessarily better

Language model “reasoning” and textual fluency seems to improve with model size, but many other factors degrade. Smaller, open source models may well suit your requirements better.

Larger training data sets are undocumented

This increases your risk as you have no reliable way of forming a clear picture of what sort of range of behaviours the language model offers. It makes it hard to assess its biases, tendency towards unsafe output, or your overall legal liability. Smaller, open source models may well suit your requirements better.

Language models are very labour-intensive

Modern language models require an enormous amount of cheap human labour to filter the training data and fine-tune the models themselves.

Language models are complex

Refer to the other cards in this set for further information and references on many of the specific issues with language models.

Image Generation

Diffusion models can generate realistic images

These systems are trained on large image collections, which lets them mimic many common photography styles.

Those images often have flaws or defects

An AI model does not have any understanding of anatomy, physics, or three-dimensional spaces. They frequently get relative sizes, structure, and the mechanics of objects wrong. Some of these flaws can be prevented by the AI vendor, but not all.

Integration into existing software is vital

The most effective way to mitigate the limitations of a diffusion model and leverage its strengths, is to use it as an integrated tool in larger image manipulation software. It can be extremely capable at transforming or modifying existing images.

By their nature, the images are mediocre

Due to the way these systems are designed, their standalone output is neither great nor bad. It isn’t unusual to have to generate many images to get even one that’s acceptable. This can get expensive and time-consuming. Even the images that are passable often need substantial post-processing before they can be useful.

They are extremely effective at misinformation and abuse

Setting up a system to create “deepfakes” of a real person is relatively inexpensive and doesn’t require many images of the victim. This capability has been used to abuse children, drive people out of their jobs, frame victims of crimes, and spread misinformation on social media.

AI art generally doesn’t get copyright protection

All AI art should be treated as non-exclusive as a result, unless it is a direct conversion of an existing work, or gets directly integrated into a larger non-AI work.

Many AI vendors seem to be undermining artists and photographers

AI vendors train their systems on the work of artists and photographers without asking permission. They include affordances in their software specifically for copying known artists, again without permission. These tools and systems seem to be positioned as direct replacements for artists and photographers.

AI art is seen by many creative communities as an attack

Using AI art in your business will be seen as a hostile gesture by many, if not most, in creative communities.

Artificial General Intelligence

Artificial General Intelligence (AGI)

AGI is a theoretical AI capable of adaptive and robust general-purpose problem-solving. Current AI “reason” entirely through correlation, are unreliable, rely on specific wording, and work only in narrow, predefined circumstances.

AGI is Science-Fiction

Nobody in AI or tech is claiming that AGI exists. The concern is that it might exist in the near future. There is no reliable research or data that supports this assertion. Everything indicates that these models are incapable of genuine general reasoning.

The intelligence illusion

AI chatbots create a very convincing illusion of intelligence, but that falls apart under scientific scrutiny.

Anthropomorphism

What makes the intelligence illusion so strong is anthropomorphism—our tendency to see inanimate objects and non-human entities as human.

Insects are smarter than AI

Despite having a “neural network” that’s a million times less complex than GPT-4’s, bees are more capable of robust and adaptable general-purpose problem-solving than the language model. The humble bumblebee has more general-purpose smarts than ChatGPT, because the AI has none.

Believing in AGI is harmful

The myth of imminent AGI gives you a skewed mental model for how AI works and serves only to market AI solutions. Believing in it will short-circuit your ability to plan strategies around your use of AI tools.

Snake Oil

AI has a long history of pseudoscience

AI researchers have in the past made impossible claims, such as being able to detect criminality, psychopathy, or sexual orientation from head shape or gait. The field has a long history of pseudoscience.

AI has a long history of over-promising

Throughout the field’s history, the AI industry has routinely claimed that its systems are much more capable than they have turned out to be. There is every indication that this continues today.

AI has a long history of very poor research

AI research tends to be poorly structured, designed to prove a predetermined outcome, be impossible to replicate, or all of the above. Most of the research you see discussed on social media or by journalists is marketing and not structured scientific or academic research.

AI has a long history of outright fraud

The US FTC has repeatedly warned about false promises and dubious practices in the AI industry.

AI has questionable legal and regulatory compliance

Regulatory bodies in the US, Canada, and the EU have opened investigations into recent practices in the AI industry. Many have issued warnings.

AI has privacy and confidentiality issues

Machine “unlearning” is still not practical. The AI does not forget. Confidential and private data has already been leaked because of it.

AI is a magnet for lawsuits

Most of the major AI companies are facing lawsuits because of their practices surrounding copyright and personal data.

Shortcut “Reasoning”

Language models only “reason” through shortcuts

They only solve problems through statistical correlation. They effect pseudo-reasoning by replaying and combining records of prior reasoning.

Shortcut “reasoning” depends on the training data

Because their pseudo-reasoning is based on correlations, the simplest correlation will always win out. For example, an AI trained to detect COVID-19 from chest x-rays ended up only detecting the position of the patient. Prone patients were sicker and so more likely to have COVID-19.

Shortcut “reasoning” is extremely fragile

Correlative pseudo-reasoning breaks very easily. Language model reasoning often falls apart when the question is rephrased or reworded, which is the unpredictability that gives users the impression that prompts work more like magic incantations than commands.

“Reasoning” performance is due to data contamination

The effectiveness of language models at completing standardised tests, for example, seems to be entirely down to those exams, practice questions, and documentation on them being included in the training data set, making the pseudo-reasoning correlations very simple.

AI models can’t handle the genuinely new

It’s “reasoning” mechanism is entirely based on finding patterns in existing data. It will not be able to handle genuinely new problems or circumstances.

AI “reasoning” is extremely vulnerable to simple attacks

AI inability to handle novel problems makes them extremely vulnerable. An attacker can manipulate or bypass a model’s reasoning simply by employing unusual, even ridiculous, tactics.

Bias & Safety

AI are trained on unsafe and undocumented data

The training data sets for most of the big language models are undocumented, making it impossible to assess the risk of biased or unsafe output. Some of the data used is unsafe, biased, or even outright illegal.

Unsafe output can expose you to liability

It’s unclear whether hosting immunity such as the US’s Section 230 applies to the output of hosted AI. Organisations might be liable for extremist, violent, or pornographic content generated by a model they host.

Biased output can expose you to liability

In most countries, it’s illegal to outright discriminate against women and minorities, but this is exactly what language models tend to do. Using them to automate decisions or assessments risks exposing you to legal liability.

Prompts are next to impossible to secure

You will have users who try to generate porn or unsafe output. Preventing it is next to impossible. That’s why you should prefer internal use for productivity over external use by customers.

Dated language can cost you

Language models are trained on mostly dated language. Social media from the 2010s. Marketing blog posts. Book collections gathered a decade ago.

Dated language in marketing, when demographics have changed, will not be as effective.

Increased costs, not decreased

When not-good-not-bad content can be generated in unprecedented volume, effective writing and illustration becomes more expensive. You will need to work harder and invest more to stand out.

Fraud & Abuse

Deepfakes are ideal for fraud

AI tools can be used to generate fake recordings or pictures of real people. There have already been multiple instances of deep-faked voice recordings being used for extortion or fraud.

Abuse

The accessibility of generative AI have made them ideal tools for targeted harassment. Abusers can use a victim’s social media to create fake porn and fake recordings to implicate them in criminal or unethical behaviour. This has already driven innocent people out of their jobs.

Astroturfing and Social media manipulation

AI tools are already being used to manipulate social media with fake profiles and auto-generated text and images. These can only be reliably detected if the scammers are incompetent enough to allow AI-specific responses through.

Ecommerce and streaming fraud

Any paying venue for text, image, or audio is already being flooded with AI-generated output, lowering the signal-to-noise ratio for everybody on those platforms. These are often coupled with click- or streaming fraud to extract fraudulent royalty or advertising revenue.

AI “Checkers” don’t work

No existing software reliably detects AI output. Most of them are worse than nothing and will regularly classify human works as AI-generated.

Standardised tests no longer work

Any written test that is documented and standardised lends itself very well to being solved by AI tools. Any organisation that uses a standardised test to protect or safeguard a valuable process or credential will need to change their strategy.

Privacy

Training data sets seem to include personal data

This includes data both scraped from the web and provided by the customers of the AI vendor in question.

Models can’t “unlearn”, yet

Once a model has trained on a data set, removing that data from the model is difficult. “Machine Unlearning” is still immature, and it’s uncertain whether it can be made to work on models like GPT-4.

Language Models are vulnerable to many privacy attacks

Attackers can discover whether specific personal data was in the training data set. They can often reconstruct and extract specific data. Some attacks let you infer the specific properties of the data set, such as the gender ratios of a medial AI.

Hosted software has fewer privacy guarantees

Many major AI tools are hosted, which limits the privacy assurances they can make. Pasting confidential data into a ChatGPT window is effectively leaking it. Do not enter private or confidential data into hosted AI software.

Data is often reviewed by underpaid workers

Even if personal data in the training set doesn’t end up in the model itself, much of that data is reviewed by a small army of underpaid workers.

AI vendors are being investigated

Privacy regulators are looking into AI industry practices. That includes most major European countries, the EU itself, Canada, four regulatory bodies in the US, and more. The US FTC has forced tech companies in the past to delete models that were trained on unauthorised personal data.

Hallucinations

All AI text is fabricated

All AI output is fabricated based on its mathematical model of language. These answers are only factual as a side effect of the distribution of those facts in the training data. Hallucination is the default because AI is only a text synthesis engine.

Larger AI models hallucinate more

In all research so far and in all problem domains, hallucinations tend to increase in frequency with model size. Bigger is worse.

AI hallucinate while summarising

Language models also hallucinate while summarising. You cannot trust that an AI’s summarisation of a web page, email thread, or article is accurate. They will make up quotations, references, authors, and page numbers.

AI ‘advice’ is dangerous

Since these systems are not minds, they have no notion of the outside world or consequences. You can’t trust that its healthcare, pet, or plant care advice is accurate. Some of it can be harmful.

Do not use them for search or research

These systems have no notion of facts or knowledge and will routinely give completely fabricated answers. They lie!

There is no general solution to AI hallucinations

OpenAI’s approach is to correct falsehoods, one-by-one. This obviously is not a general solution. It only works for common misconceptions. Most answers will still be filled with hallucinations. Truth is scarce, while the long tail of falsehoods is infinite.

Copyright

Most AI-generated art is automatically public domain

Every major legal jurisdiction requires human authorship to some extent. Art and text generated from a prompt is very unlikely to be protected by copyright.

This means that anybody can copy and make money off AI-generated works you ‘create’ and you can’t stop them.

Treat AI art and text as non-exclusive

Because of its lack of copyright protection, it’s more appropriate to think of AI works as non-exclusive. Treat them as you would a work provided by a stock art service.

Some AI-generated art is protected

If the generated art or text is integrated into a larger work that is human-authored, then the work as a whole is usually protected.

If you arrange generated works in specific ways, the arrangement is protected—but not the works themselves.

If an AI generates a work from another human-authored work, then the resulting output is a derivative work of the original and is likely to be protected.

Integration into productivity software is vital

By integrating the generation process into a larger writing or creative tool, the work is more likely to be the result of a mix of AI and human work and get copyright protection.

This is not legal advice

Always trust the opinions of a real lawyer over some guy on the internet.

Plagiarism

All models memorise and overfit

Memorisation is the industry term for when an AI model copies and stores something directly from the training data. Overfitting is when the output fits the training data too well and generates a verbatim copy of something memorised. This happens in all language and diffusion models.

Large models memorise and copy more

The rate of memorisation increases with model size and seems to be part of what increases the performance of larger models.

Copying rate is roughly 1%

Across GitHub Copilot, Stable Diffusion, and many language models, verbatim copying of data from the training data set and into the output happens around 0.1–1% of the time, depending on whether the vendor is specifically trying to minimise it or not. That’s very high for daily use by a team.

Many of these copies are clear copyright violations

Sometimes chunks of the training data are copied exactly. Sometimes only elements from the work. But it happens often enough for it to have already come up as an issue on social media and elsewhere.

Infringement is a matter of outcomes not process

It doesn’t matter if the infringing work is generated by an AI or a chimpanzee. If you publish it and profit from it, you would be in legal trouble.

Copying doesn’t have to be exact to be infringement

Paraphrased text is still infringement. If the work was original enough, then even restaged photographs can be infringement. Being inexact will not protect you.

This is not legal advice

Always trust the opinions of a real lawyer over some guy on the internet.

Poisoning

Training data sets can be poisoned

An attacker can create documents or media that is tailored to affect the output or even general functionality of an AI model.

Poisoning can manipulate results for specific keywords

A poisoning can alter the output sentiment for a specific keyword. In some cases it can even inject a controlled mistranslation of a word. This is the black-hat SEO’s dream.

Poisoning doesn’t require much effort

In many cases it only requires one hundred or so “toxic” documents. The expense seems to be minimal and was as low as $60 USD in one study.

Preventing seems to be difficult or even impossible

The manipulated keyword doesn’t have to appear in the “toxic” content. Some researchers have even argued that filtering out attacks and other harmful training data is mathematically impossible for larger language models.

OpenAI’s defence seems to be staleness

Most of the proprietary large language models are built on training data that is cut off at a specific point in time. OpenAI’s training data set, for example, doesn’t have data from after the year 2021. This should prevent new attacks from affecting the AI.

Fine-tuning can be poisoned as well

Most AI vendors have used prompts and other user-provided data to fine-tune their AI in the past. It’s possible that they have already been poisoned.

Code Generation

AI Copilots risk licence contamination

GitHub’s safeguards against contaminating your code base with GPL-licenced code are insufficient. If the Copilot copies code under the GPL and modifies it in even the slightest way, GitHub’s safeguards no longer work, but your code will still be contaminated.

They are prone to insecure code

They seem to generate code that is at least as insecure as that written by a novice programmer.

They trigger our Automation and Anchoring Biases

Tools for cognitive automation trigger your automation bias. If you’re using a tool to help you think less, that’s exactly what you do, which compromises your judgement about what the tool is doing. We also have a bias that favours whatever “anchors” the current context, usually the first result, even if that result is subpar.

They are “stale”

Most of the code language models are trained on is legacy code. They will not be aware of deprecations, security vulnerabilities, new frameworks, new platforms, or updated APIs.

They reinforce bad ideas

Unlike research, an AI will never tell you that your truly bad idea is truly bad. This is a major issue in software development as programmers are fond of reinventing bad ideas.

They promote code bloat

The frequency of defects is software generally increases proportionally with lines of code. Larger software projects are also more prone to failure and have much higher maintenance costs. Code copilots promote code bloat and could that way increase not decrease development costs in the long term.