LLM. Large Language Model. You will hear this term constantly now and most explanations of it are either way too technical or so vague they're useless. Here's my attempt at something in between.
What the words actually mean
Large — the model was trained on an enormous amount of data. We're talking trillions of words: a huge slice of the public internet, plus a lot of books and code on top of that.
Language — it works with language. Text in, text out. (Some can also handle images and audio now, but the core is still language.)
Model — in this context, a model is a mathematical system that's learned to predict things. It's not a database. It's not storing facts. It's a network of billions of parameters that together represent patterns in language.
Put it together: a very large mathematical system that learned patterns from an enormous amount of text, and can now generate new text that follows those patterns. That's an LLM.
How it actually generates text
This part is worth understanding because it explains a lot of LLM behaviour. When you send a message to Claude or ChatGPT, the model doesn't "look up" an answer. It predicts the most likely next word, then the next, then the next — based on everything it learned and everything you've said in the conversation so far.
One word at a time. (Strictly, one token at a time — a token is a word or a chunk of one.) That's why responses stream out piece by piece rather than appearing all at once — it's genuinely generating in real time.
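The loop is easy to sketch. Here's a deliberately tiny toy version — the probability table below is hand-written and made up, standing in for the billions of learned parameters a real model has — but the generate-one-word, feed-it-back-in loop is the same shape:

```python
import random

# Toy "model": for each word, the probabilities of what comes next.
# Everything in this table is invented for illustration.
next_word_probs = {
    "the":  {"cat": 0.5, "dog": 0.5},
    "cat":  {"sat": 0.7, "ran": 0.3},
    "dog":  {"sat": 0.4, "ran": 0.6},
    "sat":  {"down": 1.0},
    "ran":  {"away": 1.0},
    "down": {"<end>": 1.0},
    "away": {"<end>": 1.0},
}

def generate(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        options = next_word_probs.get(words[-1])
        if options is None:
            break
        # Sample the next word according to its probability.
        choices, weights = zip(*options.items())
        nxt = random.choices(choices, weights=weights)[0]
        if nxt == "<end>":
            break
        words.append(nxt)  # what was just generated becomes part of the context
    return " ".join(words)

print(generate("the"))
```

Notice there's no lookup of a stored answer anywhere — just repeated "what's likely next?" predictions, each one conditioned on everything generated so far.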
This also explains hallucinations — the thing where AI confidently states something completely wrong. If the model predicts that the next word "should" be something that makes the sentence sound authoritative, it'll produce that, even if the underlying fact is made up. It's optimising for plausible-sounding text, not for truth. Knowing this means you use it accordingly: trust the structure and thinking, verify the specific facts.
Context windows — the thing people don't explain well
Every LLM has what's called a context window — essentially, the amount of text it can "hold in mind" at once. Think of it like short-term memory. Anything in the context window is what it's working with. Anything outside it might as well not exist.
Modern models have very large context windows — Claude can handle hundreds of thousands of words at once. This is actually a big deal. It means you can paste in an entire document, a full email thread, a long contract — and the model will actually read and reason about all of it.
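One way to feel what "anything outside it might as well not exist" means: when a conversation outgrows the window, something has to be dropped, and the dropped text is simply gone from the model's view. Here's a crude sketch, assuming "one word = one token" (real tokenizers split text differently) and a comically small limit:

```python
# Tiny on purpose; real models allow hundreds of thousands of tokens.
CONTEXT_LIMIT = 8

def fit_to_window(messages, limit=CONTEXT_LIMIT):
    """Keep the most recent messages that fit; older ones drop out entirely."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk backwards from the newest message
        cost = len(msg.split())      # crude stand-in for a token count
        if used + cost > limit:
            break                    # everything older than this is "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "hi there",
    "tell me about contracts",
    "sure here is a long answer",
    "thanks",
]
print(fit_to_window(history))
```

The point of the big windows in modern models is exactly that this trimming rarely has to happen — your whole document or thread usually fits.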
Why the differences between models matter
Claude, ChatGPT, Gemini, Llama — these are all LLMs, but they were trained differently, on different data, with different techniques. They have different strengths. Different personalities. Different comfort with nuance. Different context window sizes. Different knowledge cutoff dates.
This is why "just use AI" isn't really useful advice. Which one, and for what? That's a whole other post. But the short answer: try Claude for anything nuanced, writing-heavy, or long-context. It's where I consistently get the best results for the work I actually do.