Model Mechanism

Learning state

updated 2026-06-21T22:45
sessions 4 · events 30
built 2026-06-21 22:45

Exit test — the deliverable

1/8

Overall mastery

Concepts mastered

0/72

The inference loop

The spine. Everything hangs off this.

Tokenization (BPE) 63%

letter-counting failures · JSON corruption · multilingual cost asymmetry

1 resolved

Autoregressive loop 58%

sequential latency · streaming · can't revise emitted tokens

1 resolved

Logits → softmax → sampling 70%

nondeterminism · creativity · repetition loops

3 resolved

Attention & the transformer

Conceptual, no QKV math.

Self-attention (intuition) 79%

O(n²) context cost · order sensitivity · in-context learning

Layers & residual stream 70%

composition with depth

1 resolved

Positional encoding / RoPE 63%

lost in the middle · position bias · context extension

Context window / KV cache 70%

context ≠ memory · cost scaling · context rot

Why it behaves that way

The insight layer. Most important.

Hallucination is structural 0%

confident wrong citations · confidence ≠ correctness

no data yet

In-context learning 0%

few-shot with no weight updates · induction

no data yet

Training, load-bearing parts only

Causal story, no math.

Pretraining 0%

knowledge cutoff · parametric vs retrieved knowledge

no data yet

Post-training (SFT → RLHF/DPO) 0%

assistant persona · refusals · sycophancy

no data yet

Variants & edge cases

Practical.

Reasoning / test-time compute 0%

scratchpad tokens · different cost/latency class

no data yet

Mixture of Experts 0%

fast for its size · routing nondeterminism

no data yet

Quantization 60%

smaller/faster/measurably dumber

Multimodal 0%

images → patch embeddings → tokens

no data yet

Why temp=0 isn't reproducible 0%

floating point · batching · MoE routing

no data yet

L2 — Context & prompt engineering

Get the most from the window without touching weights.

Prompt structure & roles 0%

system/user/assistant roles · instruction placement · delimiters & formatting

no data yet

Few-shot / in-context examples 0%

example selection · format consistency · when examples beat instructions

no data yet

Chain-of-thought & decomposition 0%

scratchpad reasoning · task decomposition · when CoT helps vs hurts

no data yet

Structured output 0%

function calling / JSON mode · schema-constrained decoding · validity failure modes

no data yet

Context engineering 0%

what to put in the window · ordering vs lost-in-the-middle · retrieval vs stuffing

no data yet

Prompt injection (intro) 0%

untrusted input in the prompt · instruction-override risk

no data yet

Automated prompt optimization 0%

DSPy-style optimization · eval-driven prompt search · stop hand-tuning

no data yet

L3 — Retrieval / RAG

Knowledge from the corpus, not the weights.

Embeddings & vector space 0%

semantic similarity as distance · embedding models · dimensionality

no data yet

Chunking strategies 0%

size & overlap · semantic vs fixed · chunk boundaries

no data yet

Vector stores & ANN 0%

approximate nearest neighbour · index types · recall vs latency tradeoff

no data yet

Hybrid search 0%

BM25 + dense · when lexical beats semantic

no data yet

Reranking 0%

cross-encoders · two-stage retrieve-then-rerank · why a second stage

no data yet

Retrieval evaluation 0%

recall@k · MRR · context relevance

no data yet

RAG failure modes 0%

irrelevant retrieval · stale index · context dilution

no data yet

Document ingestion & parsing 0%

PDF/HTML/code parsing · OCR & tables · cleaning: garbage-in, garbage-out

no data yet

Query transformation 0%

HyDE · multi-query & decomposition · query rewriting

no data yet

Metadata filtering 0%

structured filters + semantic search · access control in retrieval

no data yet

Advanced retrieval patterns 0%

contextual retrieval · GraphRAG · parent-doc / small-to-big

no data yet

L5 — Evaluation

The differentiator. A lens, not a final topic — introduced early, applied everywhere.

The eval mindset 0%

offline vs online · measure before/after · no eyeballing

no data yet

Golden / eval-set construction 0%

coverage · edge cases · labeling quality

no data yet

LLM-as-judge 0%

rubric prompting · judge biases (position/verbosity/self-preference) · when judges fail

no data yet

Task metrics 0%

exact vs semantic match · rubric scoring · pass@k

no data yet

Regression testing / CI gates 0%

prompt/chain regression suites · CI gates · catching silent drift

no data yet

Online eval & monitoring 0%

production monitoring · drift detection · A/B testing

no data yet

Evaluating RAG & agents 0%

component vs end-to-end · trajectory evaluation · beyond single completions

no data yet

Build an eval harness 0%

30+ case harness · error bars · before/after deltas

no data yet

Synthetic data generation 0%

generating eval cases · augmentation & edge-case mining · data for tuning

no data yet

The eval data flywheel 0%

mine production traces into tests · failure-driven test growth · continuous eval-set expansion

no data yet

L4 — Agents & orchestration

Tools, loops, memory — and when a single call is better.

Tool use / function calling 0%

tools as agency · tool schemas · tool selection

no data yet

ReAct & plan-then-execute 0%

reason-act loops · plan then execute · when to stop

no data yet

State & memory 0%

working vs persistent memory · context is not memory at the app layer · when each is needed

no data yet

Multi-agent patterns 0%

decomposition across agents · when one agent is better · coordination cost

no data yet

Orchestration & MCP 0%

graphs / state machines (LangGraph) · MCP (Model Context Protocol) for tools & context · deterministic vs model-driven control

no data yet

Agent failure modes 0%

loops · runaway cost · error propagation · silent wrong-tool calls

no data yet

Human-in-the-loop & approvals 0%

approval gates before actions · confidence-based escalation · review queues

no data yet

L6 — Inference ops / production

Bounded latency & cost, graceful degradation, fully traced.

Latency budgeting & token accounting 0%

prefill vs decode · time-to-first-token vs total · token budgets

no data yet

Cost modeling 0%

per request / per user / at scale · input vs output token cost · context cost scaling

no data yet

Streaming 0%

token streaming UX · partial parsing

no data yet

Caching 0%

prompt caching · semantic caching · when each applies

no data yet

Reliability 0%

retries/timeouts/fallbacks · circuit breaking · structured-output reliability at scale

no data yet

Rate limits & batching 0%

rate-limit handling · batching · throughput

no data yet

Observability / tracing 0%

spans · token + cost per span · tracing chains & agents

no data yet

Model selection & routing 0%

cascades: cheap-first, escalate · fallback chains · which model per call

no data yet

Serving open-weight models 0%

vLLM / TGI · self-host vs API tradeoff · throughput basics

no data yet

Deployment & CI/CD for AI 0%

prompt/chain versioning · staged rollout · gating deploys on evals

no data yet

L8 — Safety & guardrails

Adversarial input is the default, not the exception.

Prompt injection & jailbreaks (defense) 0%

direct vs indirect injection · defense patterns · isolating untrusted content

no data yet

Output validation & refusals 0%

schema enforcement · refusal handling · fail-closed

no data yet

PII & data governance 0%

PII detection/redaction · retention policy · logging hygiene

no data yet

Content moderation 0%

moderation layers · policy enforcement

no data yet

Adversarial robustness basics 0%

attack-surface mapping · red-teaming mindset

no data yet

Hallucination mitigation & grounding 0%

forced citations · abstention / I-don't-know · verification passes

no data yet

L7 — Adaptation / fine-tuning

Lowest priority for the app layer; knowing when NOT to is the skill.

Fine-tune vs RAG vs prompt 0%

the decision framework · cost/benefit framing · when each wins

no data yet

SFT and LoRA / PEFT (conceptual) 0%

what SFT changes · LoRA/PEFT idea (no math) · adapter swapping

no data yet

Preference tuning / DPO (conceptual) 0%

preference data · DPO vs RLHF idea (no math)

no data yet

Distillation 0%

teacher to student · why distill

no data yet

Data curation for tuning 0%

dataset quality · data beats technique

no data yet

Exit test — course complete at 4/4

○ Explain mechanistically why a model produces a confident, wrong citation
Hallucination is structural · Logits → softmax → sampling · Pretraining
3 concepts to go
✓ Given two prompts, predict which costs more and why · 1 attempt
Tokenization (BPE) · Self-attention (intuition) · Context window / KV cache
○ Explain why the same prompt at temp=0 returned two different answers
Logits → softmax → sampling · Why temp=0 isn't reproducible
2 concepts to go
○ Explain why last week's fact isn't in the model but works once pasted into context
Pretraining · Context window / KV cache · In-context learning
3 concepts to go
○ B1 (build): turn messy dev artifacts (logs, stack traces, API docs) into validated JSON with a ~100% validity rate, without fine-tuning
Structured output · Context engineering
2 concepts to go
○ B2 (build): docs/code RAG over a real repo, with a retrieval eval harness that proves a measured improvement
RAG failure modes · Retrieval evaluation · Advanced retrieval patterns · The eval mindset
4 concepts to go
○ B3 (build): a dev-tools agent (PR review / log triage) with human approval gates that recovers from tool failure
Agent failure modes · Tool use / function calling · Human-in-the-loop & approvals · Prompt injection & jailbreaks (defense)
4 concepts to go
○ B4 CAPSTONE: ship a cloud service built from B2/B3 with model routing, caching, full tracing, a 30+ case eval suite gating CI, and a documented before/after metric
Build an eval harness · Model selection & routing · Deployment & CI/CD for AI · Observability / tracing · Online eval & monitoring
5 concepts to go

Bug patterns (tutor's read)

depth-as-procrastination · dormant
treats 'I could go deeper here' as a reason to stay on a mastered concept; optimization bias operating on the syllabus
completion-seeking · dormant
wants full coverage before moving on; control preference. Redirect to the exit test, not to coverage
premature-convergence · dormant
closes options / commits to an explanation before testing alternatives; efficiency over exploration