
“I asked an AI tool to summarise an article for me. It summarised an article that didn't exist. I have questions.”
Why Does AI Make Things Up, and Can I Actually Trust It?
AI tools make things up because of how they are built, not because they are broken, and not because they are lying. The technical term is hallucination, an unfortunate word because it implies the tool is having some kind of experience. It isn't. What is actually happening is more mechanical and more manageable than the word suggests, and understanding it is the difference between using these tools confidently and avoiding them out of a fear that is only partly warranted. We looked at how the three most widely used tools (ChatGPT, Claude, and Gemini) handle the problem, what each company says about it, and what independent testing shows about when it happens and when it doesn't.
The short version: yes, AI tools make things up. The longer version is the one worth knowing.
AI tools generate text by predicting what word is most likely to come next, based on patterns learned from an enormous amount of human writing. That is worth sitting with for a moment. They are not retrieving facts from a database the way a search engine does. They are constructing sentences. And sometimes those sentences are confidently, fluently wrong.
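To make that concrete, here is a toy sketch of the prediction step. The probability table is invented for illustration and is not any real model's output; actual systems learn these patterns from enormous amounts of text, but the selection step works on the same principle.

```python
# Toy illustration of next-word prediction. The "model" here is a hand-written
# table of made-up probabilities; real models learn these from training data.

toy_model = {
    ("the", "study", "was", "published", "in"): {
        "Nature": 0.31,   # statistically plausible continuation
        "2019": 0.24,     # also plausible
        "Science": 0.23,  # also plausible
        "a": 0.22,
    },
}

def next_word(context):
    """Return the most probable next word for a given context."""
    probs = toy_model[context]
    return max(probs, key=probs.get)

print(next_word(("the", "study", "was", "published", "in")))  # -> "Nature"
```

Notice what is missing: nothing in that step checks whether the study was actually published in Nature. The model only knows that "Nature" often follows those words, which is exactly how a fluent, confident falsehood gets produced.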
This happens most often in specific situations: when you ask about something very recent that falls outside what the tool was trained on, when you ask for precise details like citations, statistics, or dates, and when you ask about obscure or niche topics where the training data was thin. ChatGPT, Claude, and Gemini all produce false information in these conditions. The rate varies by tool and task, but none of them are immune.
What they are considerably more reliable for: summarising content you paste directly into the conversation, drafting text in a style you describe, answering general questions about well-documented topics, and working through reasoning problems step by step. The hallucination risk is highest when you ask the tool to retrieve specific facts it has to generate from memory rather than from something you have given it.
The honest editorial reaction: this is a real limitation and it deserves to be taken seriously, but it is a specific limitation, not a general one. A tool that makes up citations is still genuinely useful for drafting an email. These are not contradictory facts.

If you asked an AI tool to summarise an article and it gave you a confident, fluent summary of something that did not exist, that is the hallucination problem in its most visible form, and your instinct to pause before using it again was correct. The practical adjustment is not to stop using AI tools but to stop asking them to retrieve things they have to generate from memory: specific facts, citations, statistics, names, dates. Paste the actual article in and ask for a summary of that. You will get a reliable result. Use it to draft, explain, and work through ideas; use a search engine to look things up.
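If it helps to see that difference spelled out, here is a minimal sketch. The ask() function is a hypothetical stand-in for whatever chat tool or API you actually use; the part that matters is the contrast between the two prompts.

```python
# ask() is a hypothetical stub standing in for your AI tool of choice.
def ask(prompt: str) -> str:
    # Replace this with a real call to the tool you use.
    return f"[reply to a {len(prompt)}-character prompt]"

# Hallucination-prone: the tool has to reconstruct the article from memory,
# and may confidently describe one that never existed.
risky = ask("Summarise the article about X that ran in Y last week.")

# Reliable: the tool only has to condense text it can actually see.
article_text = """(paste the full text of the real article here)"""
grounded = ask("Summarise the following article:\n\n" + article_text)
```

The second prompt turns a memory-retrieval task, where hallucination lives, into a text-condensing task, where these tools are genuinely strong.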
Rule of thumb: if you can check it with a search, check it with a search. AI is for the stuff that search can't do.
Questions people ask
Is one AI tool more reliable than the others?
All three major tools (ChatGPT, Claude, and Gemini) hallucinate, and independent benchmarking shows the gap between them is smaller than their marketing suggests. The more useful distinction is task type: all three are more reliable when working with content you provide than when generating facts from memory. Choosing a tool based on hallucination rates alone is less useful than understanding which tasks expose you to the problem.
How can I tell when an AI tool is making something up?
Confident, fluent delivery is not a signal of accuracy. AI tools sound equally certain whether they are right or wrong, which is the core of the problem. Practical signals to watch for: citations that feel oddly specific, statistics with unusual precision, quotes attributed to real people that you cannot verify elsewhere. The rule is simple. If it matters enough to act on, verify it independently before you do.
Will this problem get better over time?
It is already improving. The major tools have added features that reduce hallucination, including retrieval augmentation (the tool searches the web in real time instead of generating facts from its training memory alone) and citations that link to sources so you can check them yourself. These features do not eliminate the problem, but they change the risk profile significantly.
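For readers who want the mechanics, here is roughly what the retrieval-augmentation pattern looks like. The search_web() and ask() functions are hypothetical stand-ins, not any vendor's actual API; the point is the order of operations, with the search happening before the model answers.

```python
# Sketch of retrieval augmentation. search_web() and ask() are hypothetical
# stubs; a real system would wire them to a search backend and a model.

def search_web(query: str) -> list[dict]:
    # Stand-in: a real system calls a live search index here.
    return [{"url": "https://example.com/source", "text": "...relevant passage..."}]

def ask(prompt: str) -> str:
    # Stand-in for the language model call.
    return "[answer grounded in the supplied snippets, with source links]"

def answer_with_retrieval(question: str) -> str:
    snippets = search_web(question)  # fetch current, checkable sources first
    context = "\n".join(f"[{s['url']}] {s['text']}" for s in snippets)
    # The model condenses text it was handed, rather than generating facts
    # from training memory alone, which is where hallucination happens.
    return ask(f"Using only these sources, answer: {question}\n\n{context}")

print(answer_with_retrieval("What did the new study find?"))
```

This is why these features change the risk profile rather than eliminating it: the model can still misread or over-trust a source, but at least the sources exist and can be checked.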