Welcome to the Agentic AI Course!
Course Objectives
- Design, diagram, implement, evaluate, productize, and scale agentic AI systems using modern LLM frameworks and tooling.
- Identify and evaluate problems that are appropriate for agentic AI solutions, and distinguish them from problems better solved by traditional software or non-agentic LLM approaches.
- Clearly define Agentic AI and related concepts, including agents, tools, planning, grounding, orchestration, evaluation, and human-in-the-loop workflows.
- Develop and articulate ethical guidelines for the use of LLMs and agentic AI, including responsible data usage, AI governance principles, safety, privacy, and organizational policy considerations.
- Build the capability to independently learn, evaluate, and adapt to new AI models, tools, and paradigms in a rapidly evolving AI ecosystem.
Course Overview
Unit 1
Build a “Bad” Agent (Weeks 1–4)
Goal: Learn the tools to create a powerful but uncontrollable agent.
Topics:
- Intro to agent concepts and LLM programming
- API keys, environment setup, and class materials
- Making API calls, defining and calling tools, and managing statefulness
- Connecting tools: MCP, file system, browser use, internet search, custom functions
- Sandboxing with LangChain / Daytona
- Subagents and multi-agent architectures
- Writing effective prompts from scratch
Project: Personal Assistant Agent
Build a personal assistant agent that connects to real tools — Canvas, Google Calendar, Slack, the file system, and the web — and carries on a coherent, stateful conversation. You will write your own prompts from scratch; no AI, peer, or internet-generated prompts are permitted. The agent will be stress-tested in class to deliberately surface its weaknesses, and those failures become the raw material for Unit 2.
Outcomes:
- Call an LLM and receive a response
- Write original prompts for agent context
- Define custom tools and turn existing code into tools
- Connect and invoke tools (MCP, file system, browser, web search, Canvas, Google Calendar, Slack)
- Implement conversation history manually and with built-in methods
- Introduce statefulness to an agent
- Run agent code safely in a sandboxed environment
- Build a simple two-agent pipeline
- Demonstrate a working personal assistant agent and document where it breaks
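To make the "define custom tools" and "connect and invoke tools" outcomes concrete, here is a minimal, framework-free sketch of what a tool is structurally: a named, described function plus a registry the agent uses to execute whatever call the model emits. All names (`get_weather`, `add`, `dispatch`) are illustrative, not any particular framework's API.

```python
# Minimal sketch of a tool registry and dispatch loop.
# Names here are hypothetical, not a specific framework's API.

def get_weather(city: str) -> str:
    """Return a canned weather report (stand-in for a real API)."""
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# The registry maps the name the model emits to the actual callable.
TOOLS = {
    "get_weather": get_weather,
    "add": add,
}

def dispatch(tool_call: dict):
    """Execute a tool call shaped like {"name": ..., "args": {...}},
    as a model might emit once it decides a tool is needed."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["args"])

# Example: the model decided to call `add`.
result = dispatch({"name": "add", "args": {"a": 2, "b": 3}})
print(result)  # prints 5
```

Frameworks like LangChain wrap this same idea (name, description, callable, dispatch) behind the `@tool` decorator covered on Day 2.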
Note on AI use: AI assistance is not permitted this unit. Debugging help comes from class time and the Slack group (ChatClass). This is intentional — it builds class unity, surfaces your own assumptions, and gives you room to fail. Failure is a core part of this unit, not a grading penalty. Set a time limit on any problem; if you’re stuck, post in Slack or bring it to class unfinished. That is OK.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | APIs & Your First LLM Call — define an API, make a call, hide your key, configure the model |
| 2 | Tools: Structure & Uses — what a tool is, how the agent decides to call it, the @tool decorator, community tools |
| 3 | Tool Wrappers & Local Data — tool wrapper structure, local data access, combining web search with local files |
| 4 | Response Formats & Datatypes — define response format, pass photos, PDFs, and other datatypes |
| 5 | Statefulness & Conversation History — stateless vs stateful agents, LangGraph, manual and built-in memory |
| 6 | Connecting to the World via MCP — what MCP is, launch MCP stdio locally, MCP logging |
| 7 | MCP from GitHub — get a real MCP off GitHub (Canvas MCP), evaluate it in depth |
| 8 | Subagents & Multi-Agent Intro — multi-agent overview, subagents vs skills vs handoffs vs routers |
| Project | Build a Powerful Bot — personal assistant using all unit tools: Canvas, Google Calendar, Slack, file system, web |
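Day 5's core idea, statefulness via manually managed conversation history, can be sketched in a few lines: an agent is "stateful" only if prior turns are re-sent with each request. `call_llm` below is a hypothetical stand-in for a real API call.

```python
# Sketch of manual conversation history (Day 5). `call_llm` is a
# stand-in; it reports how much context the model would receive.

def call_llm(messages):
    """Stand-in for a real LLM API call."""
    return f"(model saw {len(messages)} messages)"

class Conversation:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text: str) -> str:
        # Append the user turn, call the model with the FULL history,
        # then store the reply so the next turn has context.
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

chat = Conversation("You are a helpful assistant.")
chat.send("Hi!")
chat.send("What did I just say?")  # history now includes the first exchange
```

LangGraph's built-in memory (also covered on Day 5) automates exactly this bookkeeping.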
Unit 2
Diagnostics — Why Is It “Bad”? (Weeks 5–8)
Goal: Develop diagnostic tools to peer into the black box of agentic AI, in preparation for building reliable agents.
Topics:
- Categories of agent failure: errors, loops, cost explosion, hallucination, unreliability
- Observability and tracing with LangSmith
- Token and latency analysis
- Debugging hallucinations and unexpected behavior
- Prompt failure patterns
- Cost and loop failure analysis
- Identifying and fixing code bugs
Project: Diagnostic Report — Fix a Broken Agent
You are given a pre-built broken agent to diagnose. Your job is to implement tracing, identify every failure, document root causes, and produce a structured diagnostic report. You will implement at least one fix and verify it with traces before presenting to the class.
Outcomes:
- Set up LangSmith and capture screenshot traces
- Implement tracing features and explain what each one does and why
- Measure token usage and latency across tool calls and LLM responses
- Identify categories of agent failure from trace data
- Distinguish model errors from prompt errors from code bugs
- Identify and rewrite failing prompts based on traced results
- Implement basic loop guards and context limits
- Fix identified bugs and verify with re-tracing
- Produce and present a diagnostic report
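The "basic loop guards" outcome can be sketched without any framework: stop the agent when it exceeds a step budget or repeats the same tool call back-to-back, two common runaway-loop symptoms you will see in traces. The class and method names below are illustrative.

```python
# Sketch of a basic loop guard. Stops an agent loop on two signals:
# a step budget and an identical repeated tool call.

class LoopGuard:
    def __init__(self, max_steps: int = 10):
        self.max_steps = max_steps
        self.steps = 0
        self.last_call = None

    def check(self, tool_name: str, args: dict) -> bool:
        """Return True if the agent may proceed with this tool call."""
        self.steps += 1
        if self.steps > self.max_steps:
            return False  # step budget exhausted
        call = (tool_name, tuple(sorted(args.items())))
        if call == self.last_call:
            return False  # identical call twice in a row: likely a loop
        self.last_call = call
        return True

guard = LoopGuard(max_steps=3)
print(guard.check("search", {"q": "weather"}))  # prints True
print(guard.check("search", {"q": "weather"}))  # prints False: repeated call
```

A production guard would also cap tokens and wall-clock time, but the shape is the same: check before every step, refuse when a limit trips.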
Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | What Makes an Agent “Bad”? — review Unit 1 failures, intro observability, define runs/traces, navigate LangSmith |
| 2 | Defining Main Failures — define failure types, compare good vs bad traces |
| 3 | LangSmith Setup & Trace Reading — get API key, set up tracing, interpret each step of a trace |
| 4 | Token & Latency Analysis — monitor performance, choose evaluation metrics, understand cost growth |
| 5 | Evals & Golden Test Sets — build a test set, write success criteria, judge outputs |
| 6 | Debugging Failure Patterns — repeatable diagnostic process, categorize and prioritize failures |
| 7 | Automation Rules & Alerts — sampling, filters, and alerts to catch bad runs; connect user feedback to trace evidence |
| 8 | Diagnosing a Broken Agent — provide a diagnostic report, apply a fix, retest and compare |
| 9 | Unit Review & Assessment — review the unit and assess via the agreed-upon testing method |
| Project | Diagnostic Report — trace, document, and fix a pre-built broken agent; present findings to the class |
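Day 5's golden test set can be prototyped in plain Python: a list of inputs with success criteria, an agent under test, and a judge that scores outputs. Real evals often use an LLM as the judge; a keyword check stands in here so the idea runs without an API key. All data and names are made up for illustration.

```python
# Sketch of a golden test set and a simple programmatic judge (Day 5).

GOLDEN_SET = [
    {"input": "What's 2+2?", "must_contain": "4"},
    {"input": "Capital of France?", "must_contain": "Paris"},
]

def fake_agent(prompt: str) -> str:
    """Stand-in agent; replace with a real agent call."""
    return {"What's 2+2?": "The answer is 4.",
            "Capital of France?": "Paris is the capital."}[prompt]

def run_evals(agent, golden_set):
    """Return the fraction of cases whose output meets its success criterion."""
    passed = sum(case["must_contain"] in agent(case["input"])
                 for case in golden_set)
    return passed / len(golden_set)

print(run_evals(fake_agent, GOLDEN_SET))  # prints 1.0
```

Running this after every change gives you the before/after comparison the Day 8 "retest and compare" step asks for.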
Unit 3
Context Engineering — Make a “Good” Agent (Weeks 9–12)
⚠️ Unit 3 is currently in production. Days will be added as they are finalized.
Goal: Engineer context effectively to produce agents that are reliable, specialized, and capable of operating over long horizons — then prove it in competition.
Topics:
- Context engineering vs. prompt engineering
- The 4 levels of intent: session, user, organization, world
- Simple context management: truncation, summarization, selection, file write, database, isolation/delegation
- Context offloading: semantic search, forgetting vector, graph memory
- Subagent architecture: parallelization, serialization, concurrency
- Checkpointing, human-in-the-loop, and context self-adaptation
- Grounding and RAG
- Long-horizon task planning
- Cost-aware context design
Project: Battle of the Bots
Apply everything from all three units to build a production-ready agent of your own design. Agents compete in structured challenges testing usefulness, resilience, and cost efficiency. The best agent is not necessarily the most powerful — it is the most reliable, the most purposeful, and the most improved from where you started in Unit 1.
Outcomes:
- Define context engineering and distinguish it from prompt engineering
- Implement truncation and summarization-based context management
- Apply context offloading using semantic search or graph memory
- Implement parallelized and serialized subagent patterns
- Add checkpointing and human-in-the-loop interrupts
- Build a RAG pipeline and test retrieval quality
- Design agents that maintain state across long multi-step tasks
- Apply context engineering improvements to the Unit 1 personal assistant
- Compete in Battle of the Bots and articulate design decisions made
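The two context-management strategies named in the outcomes, truncation and summarization, differ only in what happens to the dropped turns. A minimal sketch, with a crude placeholder string standing in for an LLM-written summary:

```python
# Sketch of truncation vs. summarization for context management.
# The "summary" is a stand-in for a real LLM-generated summary.

def truncate(messages, max_messages: int):
    """Keep the system prompt plus the most recent turns."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-(max_messages - 1):]

def summarize(messages, max_messages: int):
    """Compress everything truncation would drop into one summary turn."""
    if len(messages) <= max_messages:
        return messages
    system, rest = messages[:1], messages[1:]
    keep = rest[-(max_messages - 2):]
    dropped = rest[:len(rest) - len(keep)]
    summary = {"role": "system",
               "content": f"[summary of {len(dropped)} earlier messages]"}
    return system + [summary] + keep

history = [{"role": "system", "content": "Be helpful."}] + [
    {"role": "user", "content": f"turn {i}"} for i in range(10)
]
print(len(truncate(history, 4)))   # prints 4
print(len(summarize(history, 4)))  # prints 4
```

Truncation is cheap but forgets; summarization keeps a lossy memory of the dropped turns at the cost of an extra LLM call, which is where cost-aware context design comes in.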
Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | What Is Context Engineering? — definition, levels of intent, why prompting alone isn’t enough |
| 2 | Planning Agentic Systems — distinguish prompt, context, and intent engineering; plan systems on a whiteboard |
| 3 | Context Management Strategies — truncate, summarize, select, file write, database, isolate/delegate; implement all three core types |
| More days coming | Check back as Unit 3 is finalized |
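Grounding and RAG, listed in this unit's topics, reduce to a retrieve-then-generate shape. A toy retrieval step is sketched below: score documents by word overlap with the query and ground the prompt in the top hit. Real pipelines use embeddings and a vector store; the documents and helper names here are made up for illustration.

```python
# Sketch of a toy retrieval step for a RAG pipeline: keyword-overlap
# scoring stands in for embedding similarity. All data is illustrative.

DOCS = [
    "Canvas assignments are due Sunday at midnight.",
    "The Slack channel for debugging help is ChatClass.",
    "Office hours are Tuesday afternoons.",
]

def retrieve(query: str, docs):
    """Return the doc sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    """Ground the model's answer in the retrieved context."""
    context = retrieve(query, DOCS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(retrieve("When are Canvas assignments due?", DOCS))
```

Testing retrieval quality, another Unit 3 outcome, means checking exactly this step against a golden set of (query, expected document) pairs, the same eval pattern from Unit 2.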
Credits
This course was created by: James Beeson, Ethan Saline, Wil Jones, Ian Blad, Camilla Ramirez, and Kimberly Juarez.