LLM & Prompt FoundationsHow Large Language Models Actually Work

Discussion Questions

  1. 1Prompting an AI model is sometimes compared to 'programming in natural language'. How is this similar to and different from writing code?
  2. 2Why does the way you phrase a question to an AI model significantly change the answer you get?
  3. 3What do you think is more important when writing a prompt: being precise, being creative, or being concise?
📌

Add a Padlet or Mentimeter board

Embed a shared class board for student contributions and ideas.

How Large Language Models Actually Work

A practical mental model for prompt engineers - without the math.

Quick Take· 60 sec warm-up

Before the lesson

Watch this 60-second clip for a fast vibe-check on the concept. Then dive into the full lesson below.

Video lesson· 12 min
Reading

The 60-second model

A large language model (LLM) is a giant function that, given a sequence of tokens, predicts the next token. Repeat that prediction step a few hundred times and you get a paragraph.

That's it. Everything else - chat, tools, reasoning, code - is built on top of "predict the next token."

Tokens, not words

Models read tokens, which are usually 3–4 characters in English.

  • "prompt" → 1 token
  • "Engineering" → 1–2 tokens
  • "prompt-engineering" → 3–4 tokens (hyphens and casing matter)
  • "私は学生です" → many tokens (Asian languages cost more)

You can inspect any prompt in OpenAI's tokenizer or with the tiktoken library.

How the model was trained (briefly)

  1. Pre-training - predict the next token across trillions of tokens of internet text, code, and books.
  2. Supervised fine-tuning (SFT) - humans write good answers; the model learns to imitate the answer style.
  3. Reinforcement learning from human/AI feedback (RLHF / RLAIF / DPO) - humans rank model answers; the model learns to prefer the kind of answers humans (or constitutional rules) like.
  4. Post-training for reasoning - newer models (o-series, Claude with extended thinking, Gemini Thinking) learn to use a private "scratchpad" before answering.

What that means for you as a prompt engineer

  • The model has no memory between calls (unless you give it some).
  • It's a probabilistic function - same prompt, different runs can give different output.
  • It only knows what was in its training data + your prompt + tools you give it. It doesn't browse, remember, or "look things up" unless you wire that in.
  • It's optimised to be plausible, not necessarily true. Hallucination is a structural feature, not a bug.

Activity

Take three short prompts (a question, a one-line task, and a paragraph). Paste each into the OpenAI tokenizer. Record the token count, then estimate cost at $5 per 1M input tokens. What does this tell you about how to write prompts cheaply?

Key takeaways

  • 1An LLM predicts the next token; everything else is layered on top.
  • 2Models read tokens, not words - token count drives both cost and quality.
  • 3LLMs are stateless and probabilistic; memory and determinism are things you engineer in.

Quick self-check