Welcome to the Agentic AI Course!
Course Objectives
- Design, diagram, implement, evaluate, productize, and scale agentic AI systems using modern LLM frameworks and tooling.
- Identify and evaluate problems that are appropriate for agentic AI solutions, and distinguish them from problems better solved by traditional software or non-agentic LLM approaches.
- Clearly define Agentic AI and related concepts, including agents, tools, planning, grounding, orchestration, evaluation, and human-in-the-loop workflows.
- Develop and articulate ethical guidelines for the use of LLMs and agentic AI, including responsible data usage, AI governance principles, safety, privacy, and organizational policy considerations.
- Build the capability to independently learn, evaluate, and adapt to new AI models, tools, and paradigms in a rapidly evolving AI ecosystem.
Course Overview
Unit 1
Build a “Bad” Agent (Weeks 1–4)
Goal: Learn the tools to create a powerful but uncontrollable agent.
Topics:
- Intro to agent concepts and LLM programming
- API keys, environment setup, and class materials
- Making API calls, defining and calling tools, and managing statefulness
- Connecting tools: MCP, file system, browser use, internet search, custom functions
- Sandboxing with LangChain / Daytona
- Subagents and multi-agent architectures
- Writing effective prompts from scratch
Project: Personal Assistant Agent
Build a personal assistant agent that connects to real tools — Canvas, Google Calendar, Slack, the file system, and the web — and carries on a coherent, stateful conversation. You will write your own prompts from scratch; no AI, peer, or internet-generated prompts are permitted. The agent will be stress-tested in class to deliberately surface its weaknesses, and those failures become the raw material for Unit 2.
Outcomes:
- Call an LLM and receive a response
- Write original prompts for agent context
- Define custom tools and turn existing code into tools
- Connect and invoke tools (MCP, file system, browser, web search, Canvas, Google Calendar, Slack)
- Implement conversation history manually and with built-in methods
- Introduce statefulness to an agent
- Run agent code safely in a sandboxed environment
- Build a simple two-agent pipeline
- Demonstrate a working personal assistant agent and document where it breaks
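To make the "define custom tools" and "connect and invoke tools" outcomes concrete, here is a minimal, framework-free sketch of what a tool is structurally: a named, described function plus a registry the agent uses to execute whatever call the model emits. All names (`get_weather`, `add`, `dispatch`) are illustrative, not any particular framework's API.

```python
# Minimal sketch of a tool registry and dispatch loop.
# Names here are hypothetical, not a specific framework's API.

def get_weather(city: str) -> str:
    """Return a canned weather report (stand-in for a real API)."""
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# The registry maps the name the model emits to the actual callable.
TOOLS = {
    "get_weather": get_weather,
    "add": add,
}

def dispatch(tool_call: dict):
    """Execute a tool call shaped like {"name": ..., "args": {...}},
    as a model might emit once it decides a tool is needed."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["args"])

# Example: the model decided to call `add`.
result = dispatch({"name": "add", "args": {"a": 2, "b": 3}})
print(result)  # prints 5
```

Frameworks like LangChain wrap this same idea (name, description, callable, dispatch) behind the `@tool` decorator covered on Day 2.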
Note on AI use: AI assistance is not permitted this unit. Debugging help comes from class time and the Slack group (ChatClass). This is intentional — it builds class unity, surfaces your own assumptions, and gives you room to fail. Failure is a core part of this unit, not a grading penalty. Set a time limit on any problem; if you’re stuck, post in Slack or bring it to class unfinished. That is OK.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | APIs & Your First LLM Call — define an API, make a call, hide your key, configure the model |
| 2 | Tools: Structure & Uses — what a tool is, how the agent decides to call it, the @tool decorator, community tools |
| 3 | Tool Wrappers & Local Data — tool wrapper structure, local data access, combining web search with local files |
| 4 | Response Formats & Datatypes — define response format, pass photos, PDFs, and other datatypes |
| 5 | Statefulness & Conversation History — stateless vs stateful agents, LangGraph, manual and built-in memory |
| 6 | Connecting to the World via MCP — what MCP is, launch MCP stdio locally, MCP logging |
| 7 | MCP from GitHub — get a real MCP off GitHub (Canvas MCP), evaluate it in depth |
| 8 | Subagents & Multi-Agent Intro — multi-agent overview, subagents vs skills vs handoffs vs routers |
| Project | Build a Powerful Bot — personal assistant using all unit tools: Canvas, Google Calendar, Slack, file system, web |
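Day 5's core idea, statefulness via manually managed conversation history, can be sketched in a few lines: an agent is "stateful" only if prior turns are re-sent with each request. `call_llm` below is a hypothetical stand-in for a real API call.

```python
# Sketch of manual conversation history (Day 5). `call_llm` is a
# stand-in; it reports how much context the model would receive.

def call_llm(messages):
    """Stand-in for a real LLM API call."""
    return f"(model saw {len(messages)} messages)"

class Conversation:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text: str) -> str:
        # Append the user turn, call the model with the FULL history,
        # then store the reply so the next turn has context.
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

chat = Conversation("You are a helpful assistant.")
chat.send("Hi!")
chat.send("What did I just say?")  # history now includes the first exchange
```

LangGraph's built-in memory (also covered on Day 5) automates exactly this bookkeeping.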
Unit 2
Diagnostics — Why Is It “Bad”? (Weeks 5–8)
Goal: Develop diagnostic tools to peer into the black box of agentic AI, in preparation for building reliable agents.
Topics:
- Categories of agent failure: errors, loops, cost explosion, hallucination, unreliability
- Observability and tracing with LangSmith
- Token and latency analysis
- Debugging hallucinations and unexpected behavior
- Prompt failure patterns
- Cost and loop failure analysis
- Identifying and fixing code bugs
Project: Diagnostic Report — Fix a Broken Agent
You are given a pre-built broken agent to diagnose. Your job is to implement tracing, identify every failure, document root causes, and produce a structured diagnostic report. You will implement at least one fix and verify it with traces before presenting to the class.
Outcomes:
- Set up LangSmith and capture screenshot traces
- Implement tracing features and explain what each one does and why
- Measure token usage and latency across tool calls and LLM responses
- Identify categories of agent failure from trace data
- Distinguish model errors from prompt errors from code bugs
- Identify and rewrite failing prompts based on traced results
- Implement basic loop guards and context limits
- Fix identified bugs and verify with re-tracing
- Produce and present a diagnostic report
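The "basic loop guards" outcome can be sketched without any framework: stop the agent when it exceeds a step budget or repeats the same tool call back-to-back, two common runaway-loop symptoms you will see in traces. The class and method names below are illustrative.

```python
# Sketch of a basic loop guard. Stops an agent loop on two signals:
# a step budget and an identical repeated tool call.

class LoopGuard:
    def __init__(self, max_steps: int = 10):
        self.max_steps = max_steps
        self.steps = 0
        self.last_call = None

    def check(self, tool_name: str, args: dict) -> bool:
        """Return True if the agent may proceed with this tool call."""
        self.steps += 1
        if self.steps > self.max_steps:
            return False  # step budget exhausted
        call = (tool_name, tuple(sorted(args.items())))
        if call == self.last_call:
            return False  # identical call twice in a row: likely a loop
        self.last_call = call
        return True

guard = LoopGuard(max_steps=3)
print(guard.check("search", {"q": "weather"}))  # prints True
print(guard.check("search", {"q": "weather"}))  # prints False: repeated call
```

A production guard would also cap tokens and wall-clock time, but the shape is the same: check before every step, refuse when a limit trips.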
Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | What Makes an Agent “Bad”? — review Unit 1 failures, intro observability, define runs/traces, navigate LangSmith |
| 2 | Defining Main Failures — define failure types, compare good vs bad traces |
| 3 | LangSmith Setup & Trace Reading — get API key, set up tracing, interpret each step of a trace |
| 4 | Token & Latency Analysis — monitor performance, choose evaluation metrics, understand cost growth |
| 5 | Evals & Golden Test Sets — build a test set, write success criteria, judge outputs |
| 6 | Debugging Failure Patterns — repeatable diagnostic process, categorize and prioritize failures |
| 7 | Automation Rules & Alerts — sampling, filters, and alerts to catch bad runs; connect user feedback to trace evidence |
| 8 | Diagnosing a Broken Agent — provide a diagnostic report, apply a fix, retest and compare |
| 9 | Unit Review & Assessment — review the unit and assess via the agreed-upon testing method |
| Project | Diagnostic Report — trace, document, and fix a pre-built broken agent; present findings to the class |
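Day 5's golden test set can be prototyped in plain Python: a list of inputs with success criteria, an agent under test, and a judge that scores outputs. Real evals often use an LLM as the judge; a keyword check stands in here so the idea runs without an API key. All data and names are made up for illustration.

```python
# Sketch of a golden test set and a simple programmatic judge (Day 5).

GOLDEN_SET = [
    {"input": "What's 2+2?", "must_contain": "4"},
    {"input": "Capital of France?", "must_contain": "Paris"},
]

def fake_agent(prompt: str) -> str:
    """Stand-in agent; replace with a real agent call."""
    return {"What's 2+2?": "The answer is 4.",
            "Capital of France?": "Paris is the capital."}[prompt]

def run_evals(agent, golden_set):
    """Return the fraction of cases whose output meets its success criterion."""
    passed = sum(case["must_contain"] in agent(case["input"])
                 for case in golden_set)
    return passed / len(golden_set)

print(run_evals(fake_agent, GOLDEN_SET))  # prints 1.0
```

Running this after every change gives you the before/after comparison the Day 8 "retest and compare" step asks for.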
Unit 3
Context Engineering — Make a “Good” Agent (Weeks 9–12)
⚠️ Unit 3 is currently in production. Days will be added as they are finalized.
Goal: Engineer context effectively to produce agents that are reliable, specialized, and capable of operating over long horizons — then prove it in competition.
Topics:
- Context engineering vs. prompt engineering
- The 4 levels of intent: session, user, organization, world
- Simple context management: truncation, summarization, selection, file write, database, isolation/delegation
- Context offloading: semantic search, forgetting vector, graph memory
- Subagent architecture: parallelization, serialization, concurrency
- Checkpointing, human-in-the-loop, and context self-adaptation
- Grounding and RAG
- Long-horizon task planning
- Cost-aware context design
Project: Battle of the Bots
Apply everything from all three units to build a production-ready agent of your own design. Agents compete in structured challenges testing usefulness, resilience, and cost efficiency. The best agent is not necessarily the most powerful — it is the most reliable, the most purposeful, and the most improved from where you started in Unit 1.
Outcomes:
- Define context engineering and distinguish it from prompt engineering
- Implement truncation and summarization-based context management
- Apply context offloading using semantic search or graph memory
- Implement parallelized and serialized subagent patterns
- Add checkpointing and human-in-the-loop interrupts
- Build a RAG pipeline and test retrieval quality
- Design agents that maintain state across long multi-step tasks
- Apply context engineering improvements to the Unit 1 personal assistant
- Compete in Battle of the Bots and articulate design decisions made
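The two context-management strategies named in the outcomes, truncation and summarization, differ only in what happens to the dropped turns. A minimal sketch, with a crude placeholder string standing in for an LLM-written summary:

```python
# Sketch of truncation vs. summarization for context management.
# The "summary" is a stand-in for a real LLM-generated summary.

def truncate(messages, max_messages: int):
    """Keep the system prompt plus the most recent turns."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-(max_messages - 1):]

def summarize(messages, max_messages: int):
    """Compress everything truncation would drop into one summary turn."""
    if len(messages) <= max_messages:
        return messages
    system, rest = messages[:1], messages[1:]
    keep = rest[-(max_messages - 2):]
    dropped = rest[:len(rest) - len(keep)]
    summary = {"role": "system",
               "content": f"[summary of {len(dropped)} earlier messages]"}
    return system + [summary] + keep

history = [{"role": "system", "content": "Be helpful."}] + [
    {"role": "user", "content": f"turn {i}"} for i in range(10)
]
print(len(truncate(history, 4)))   # prints 4
print(len(summarize(history, 4)))  # prints 4
```

Truncation is cheap but forgets; summarization keeps a lossy memory of the dropped turns at the cost of an extra LLM call, which is where cost-aware context design comes in.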
Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.
Daily Schedule:
| Day | Topic |
|---|---|
| 1 | What Is Context Engineering? — definition, levels of intent, why prompting alone isn’t enough |
| 2 | Planning Agentic Systems — distinguish prompt, context, and intent engineering; plan systems on a whiteboard |
| 3 | Context Management Strategies — truncate, summarize, select, file write, database, isolate/delegate; implement all three core types |
| More days coming | Check back as Unit 3 is finalized |
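Grounding and RAG, listed in this unit's topics, reduce to a retrieve-then-generate shape. A toy retrieval step is sketched below: score documents by word overlap with the query and ground the prompt in the top hit. Real pipelines use embeddings and a vector store; the documents and helper names here are made up for illustration.

```python
# Sketch of a toy retrieval step for a RAG pipeline: keyword-overlap
# scoring stands in for embedding similarity. All data is illustrative.

DOCS = [
    "Canvas assignments are due Sunday at midnight.",
    "The Slack channel for debugging help is ChatClass.",
    "Office hours are Tuesday afternoons.",
]

def retrieve(query: str, docs):
    """Return the doc sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    """Ground the model's answer in the retrieved context."""
    context = retrieve(query, DOCS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(retrieve("When are Canvas assignments due?", DOCS))
```

Testing retrieval quality, another Unit 3 outcome, means checking exactly this step against a golden set of (query, expected document) pairs, the same eval pattern from Unit 2.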
Credits
This course was created by: James Beeson, Ethan Saline, Wil Jones, Ian Blad, Camilla Ramirez, and Kimberly Juarez.