LLM & Prompting Basics
In this section, you’ll learn the fundamentals of how large language models (LLMs) work — just enough to understand how to write better prompts.
You’ll see how models process information, what influences their outputs, and how your prompt acts like a “mini program” controlling their behavior.
What is an LLM (and Why It Matters for Prompting)
A Large Language Model (LLM) is an AI trained to generate human-like text by predicting what comes next, one token (word or piece of a word) at a time.
For prompt engineers, this matters because the model doesn’t “understand”; it predicts.
That means every word in your prompt changes the statistical landscape the model sees.
Good prompts narrow those possibilities to the outputs you actually want.
Explain Like I’m 5
Imagine your phone’s autocomplete — you type “I’m going to the…” and it suggests “store.”
An LLM is like that, but with way more training data and parameters. It’s guessing, not reasoning.
So your job as a prompt engineer is to guide its guessing — to steer it toward the right kind of continuation.
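To make that concrete, here’s a tiny sketch of “guided guessing” in Python. The phrase and the probabilities are invented for illustration; a real model scores tens of thousands of possible tokens.

```python
import random

# Hypothetical next-token probabilities for the prompt "I'm going to the".
# A real model scores tens of thousands of tokens; these numbers are made up
# purely to illustrate the idea of weighted guessing.
next_token_probs = {
    "store": 0.42,
    "gym": 0.21,
    "park": 0.13,
    "moon": 0.01,
}

# The model "guesses" by sampling from this distribution, so likelier
# continuations come up more often. Nothing is looked up or reasoned out.
tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights, k=1)[0])
```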
Under the Hood: How LLMs Actually Work
Large Language Models aren’t databases or search engines; they don’t “look things up.” Instead, everything they know is stored in a dense web of numerical patterns called parameters.
These parameters represent compressed relationships learned from massive amounts of text during training — things like how words relate to each other, what structures are common in language, and even subtle ideas about tone and intent.
When you send a prompt, you’re not asking the model to recall a fact — you’re activating patterns that it’s learned. Your words trigger regions of the model’s internal space that are statistically related to your topic. That’s why the phrasing, structure, and context of your prompt matter so much: small wording changes can cause the model to activate completely different parts of its learned knowledge.
What Are Parameters?
Think of parameters as the “memory knobs” of the model: billions or even trillions of tiny weights that decide how strongly certain words, concepts, and structures are connected.
During training, the model is shown examples of text and asked to predict the next word. If it guesses wrong, it slightly adjusts those parameters using math (a process called gradient descent). After billions of rounds of this, the model builds a nuanced map of how language behaves.
For prompt engineers, this means every prompt is like a set of coordinates on that map. You’re not just asking for text — you’re steering the model toward the region of its parameter space that produces the behavior you want. Clearer, more focused prompts lead to more predictable “paths” through that space.
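As a rough sketch (a toy illustration, not real training code), the gradient descent update described above boils down to nudging each parameter a small step in the direction that would have made the correct next word more likely:

```python
# Toy illustration of one gradient-descent update. Real training repeats this
# for billions of parameters across billions of next-word predictions.
learning_rate = 0.01

def update(parameter: float, gradient: float) -> float:
    # The gradient points in the direction that increases the prediction error,
    # so we nudge the parameter a small step the opposite way.
    return parameter - learning_rate * gradient

weight = 0.5           # one of the model's "memory knobs" (made-up value)
grad_of_loss = -2.3    # made-up gradient from one wrong next-word guess
weight = update(weight, grad_of_loss)
print(weight)          # 0.523: slightly adjusted toward a better guess
```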
Token Prediction: The Core Behavior
At its simplest, every LLM is a next-token predictor. It doesn’t plan its full answer ahead of time; it just looks at all the tokens so far and predicts what comes next, one step at a time.
When you send a prompt, the model:
Breaks your text into tokens (words or parts of words).
Calculates probabilities for what the next token should be.
Chooses one (influenced by sampling settings like temperature and top-p).
Appends that token to the text and repeats the process.
This “one step at a time” process is called autoregression.
For prompt engineering, that’s an important insight: the model doesn’t have a master plan — it builds its answer as it goes. So if your prompt doesn’t create a clear, consistent setup, the model can easily drift. Each token is influenced by all the ones before it, meaning your early instructions have an outsized impact on everything that follows.
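Here’s a minimal sketch of that autoregressive loop in Python. The next_token_probabilities function is a hypothetical stand-in for the model’s forward pass, but the temperature and top-p logic mirrors how real samplers reshape the distribution before picking a token:

```python
import random

def next_token_probabilities(tokens: list[str]) -> dict[str, float]:
    # Stand-in for the model itself: in reality this is a forward pass through
    # billions of parameters. Here it returns a fixed, made-up distribution.
    return {"the": 0.4, "a": 0.3, "store": 0.2, ".": 0.1}

def sample_next_token(probs: dict[str, float],
                      temperature: float = 0.7,
                      top_p: float = 0.9) -> str:
    # Temperature reshapes the distribution: lower = sharper and more predictable,
    # higher = flatter and more surprising.
    scaled = {t: p ** (1.0 / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    scaled = {t: p / total for t, p in scaled.items()}

    # Top-p (nucleus) sampling keeps only the most likely tokens whose combined
    # probability reaches top_p, then samples from that smaller set.
    nucleus, cumulative = {}, 0.0
    for token, p in sorted(scaled.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*nucleus.items())
    return random.choices(tokens, weights=weights, k=1)[0]

def generate(prompt_tokens: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probabilities(tokens)   # score candidate next tokens
        tokens.append(sample_next_token(probs))    # pick one, append, repeat
    return tokens

print(" ".join(generate(["I'm", "going", "to"])))
```

Try lowering the temperature toward zero and the output becomes nearly deterministic; raising it lets rarer tokens through more often.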
How the Model Stores Knowledge
Instead of storing sentences or facts verbatim, LLMs store patterns: statistical relationships between concepts. When you write a prompt, the model isn’t pulling a memorized answer; it’s generating one dynamically based on patterns it’s seen.
You can think of it as extreme compression: the model has encoded ideas like “how to explain things,” “how code is structured,” or “what documentation looks like” in mathematical form. So, when your prompt says, “Write an onboarding guide,” it doesn’t look that up — it reconstructs it by statistically recombining those learned patterns.
This is also why vague prompts lead to vague results. If your request is underspecified, the model activates a broad mix of patterns — producing generic or inconsistent output. A specific, structured prompt narrows the space of possibilities, focusing the model’s internal associations on the task you actually want.
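For instance (both prompts invented for illustration):

```python
vague_prompt = "Write an onboarding guide."

specific_prompt = (
    "Write a one-page onboarding guide for new backend engineers. "
    "Use a friendly tone and a numbered checklist of first-week tasks, "
    "and keep it under 400 words."
)

# The vague prompt activates a broad mix of learned patterns (generic output);
# the specific one narrows the model to a much smaller region of possibilities.
```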
Transformers
Transformers are the engine behind modern language models — the architecture that made context-aware prompting possible. Unlike older neural networks that read text one word at a time, transformers look at all tokens simultaneously using a mechanism called self-attention. This lets the model measure how important each word is relative to every other word in the prompt.
For prompt engineers, this has huge implications. Because the model distributes its attention unevenly, where you put information in your prompt determines how much influence it has. The start of a prompt (role, tone, and instructions) and the end (the actual task or data) carry the most weight; details buried in the middle often fade. This is why prompt structure, brevity, and hierarchy matter — you’re shaping the model’s attention map.
Context engineering is essentially working with the transformer’s attention system: deciding what belongs in view, what can be trimmed, and how to order information so the model focuses on the right parts. In short, prompting well isn’t about magic words — it’s about sculpting attention.
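To ground the idea, here’s a minimal sketch of scaled dot-product self-attention using NumPy and made-up toy vectors. It isn’t a full transformer, but it shows how each token’s output becomes a weighted mix of information from every token, with the weights acting as the “attention map” described above:

```python
import numpy as np

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # how strongly each token attends to each other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row becomes an attention map
    return weights @ V                               # each output mixes information from every token

# Toy example: 4 tokens with 8-dimensional embeddings and random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8): one updated vector per token
```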
Context and Limits
When you send a prompt, everything — your instructions, examples, and prior text — becomes the model’s context window. The model reads this entire window every time it generates a new token. It doesn’t have memory between runs; it rebuilds its understanding from scratch on each call.
That’s why prompt design is partly an information architecture problem. You’re deciding what context to include, what to omit, and how to order it.
A few key takeaways for prompt engineers:
Models have finite context windows (from a few thousand to hundreds of thousands of tokens, depending on the model). Anything past that gets cut off.
The start of your prompt sets behavior and tone.
The end provides the immediate input for generation.
The middle can fade in importance, especially in very long prompts.
When designing large workflows, this limitation is what motivates techniques like multi-stage prompting — breaking tasks into smaller, chained prompts that preserve clarity and avoid context overflow.
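A sketch of that idea, assuming a hypothetical call_llm(prompt) helper that wraps whatever API you use: each stage gets a small, focused prompt and passes only its result forward, so no single call has to carry the whole document.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM API or client you use."""
    raise NotImplementedError("plug in your provider's completion call here")

def summarize_then_draft(long_document: str) -> str:
    # Stage 1: compress the raw material so it fits comfortably in the context window.
    summary = call_llm(
        "Summarize the key points of the document below as a bullet list "
        "of at most 10 items.\n\n" + long_document
    )
    # Stage 2: work only from the summary, keeping this prompt small and focused.
    return call_llm(
        "Using only these bullet points, write a clear one-page guide "
        "for new team members:\n\n" + summary
    )
```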
How Prompts Drive the Model
A prompt is much more than a request; it’s the entire environment the model uses to predict its next token. It defines:
The task (what to do),
The tone (how to sound),
The structure (what format to follow),
And the constraints (how much, how long, or what to avoid).
From the model’s perspective, your prompt is the beginning of a story it’s trying to continue — token by token. The clearer and more complete your “story setup” is, the more accurately it can predict what comes next.
That’s why strong prompts don’t just say what to do — they show how to think about the task.
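For example, a single prompt can cover all four at once (this sample is invented for illustration, not a canonical template):

```
You are a senior technical writer. Rewrite the release notes below as a
changelog entry. Keep the tone neutral and factual. Format the result as a
short heading followed by bullet points grouped under "Added", "Changed",
and "Fixed". Keep it under 150 words and do not mention internal ticket
numbers.

Release notes:
<paste notes here>
```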
Prompts Are Soft Programs
At the end of the day, a prompt is like a lightweight, natural-language program. It defines rules, parameters, and control flow — but instead of code, it uses plain English.
A well-written prompt includes:
Inputs (the data or context the model should use),
Logic (the reasoning process or steps to follow),
Constraints (tone, format, or word limits),
Outputs (the structure or schema you expect).
When you start thinking of prompts as programs instead of requests, you can test, debug, and reuse them the same way you would with code. This is the mindset of a true prompt engineer.
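As a sketch of that mindset (the names and wording are illustrative, not a standard library), the same kind of prompt can be written as a small function with inputs, steps, constraints, and an output schema, which makes it easy to version, parameterize, and test:

```python
def changelog_prompt(release_notes: str, max_words: int = 150) -> str:
    """A prompt treated as a small program: inputs, logic, constraints, output schema."""
    return f"""You are a senior technical writer.

Input: the release notes below.

Steps:
1. Group the changes into Added, Changed, and Fixed.
2. Rewrite each change as a short, user-facing bullet point.

Constraints:
- Neutral, factual tone.
- At most {max_words} words.
- No internal ticket numbers.

Output format (JSON): {{"added": [...], "changed": [...], "fixed": [...]}}

Release notes:
{release_notes}"""

# Because the prompt is now a function, you can reuse and test it like code:
assert "Fixed" in changelog_prompt("Example notes")
```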
LLM Educational Resources
With a subject like AI, it often helps to have visual aids when trying to understand the core concepts. Below is a list of curated videos explaining these basics in a way that’s easy to grasp and visually appealing:
https://www.youtube.com/watch?v=zjkBMFhNj_g&t=1s&pp=ygUVbGFyZ2UgbGFuZ3VhZ2UgbW9kZWxz
https://www.youtube.com/watch?v=LPZh9BOjkQs&pp=ygUVbGFyZ2UgbGFuZ3VhZ2UgbW9kZWxz
https://www.youtube.com/watch?v=GLGJCh7fXmI&utm_source=chatgpt.com
Next Steps
Now that you understand the basics of LLMs and what role prompting plays in all this, you’re ready to continue on with the prompt engineering training!