Welcome to the Agentic AI Course!

Course Objectives

  • Design, diagram, implement, evaluate, productize, and scale agentic AI systems using modern LLM frameworks and tooling.
  • Identify and evaluate problems that are appropriate for agentic AI solutions, and distinguish them from problems better solved by traditional software or non-agentic LLM approaches.
  • Clearly define Agentic AI and related concepts, including agents, tools, planning, grounding, orchestration, evaluation, and human-in-the-loop workflows.
  • Develop and articulate ethical guidelines for the use of LLMs and agentic AI, including responsible data usage, AI governance principles, safety, privacy, and organizational policy considerations.
  • Build the capability to independently learn, evaluate, and adapt to new AI models, tools, and paradigms in a rapidly evolving AI ecosystem.

Course Overview

Unit 1

Build a “Bad” Agent (Weeks 1–4)

Goal: Learn the tools to create a powerful but uncontrollable agent.

Topics:
- Intro to agent concepts and LLM programming
- API keys, environment setup, and class materials
- Making API calls, defining and calling tools, and managing statefulness
- Connecting tools: MCP, file system, browser use, internet search, custom functions
- Sandboxing with LangChain / Daytona
- Subagents and multi-agent architectures
- Writing effective prompts from scratch
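
The tool-call loop at the heart of these topics can be sketched without any framework. In this illustrative sketch, `fake_model` is a stand-in for a real LLM API call and `get_weather` is a made-up toy tool; the point is the control flow, where the model either requests a tool or returns a final answer:

```python
# Framework-free sketch of the agent tool-call loop.
# `fake_model` and `get_weather` are illustrative stubs, not real APIs.

def get_weather(city: str) -> str:
    """A toy tool the agent can call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    # A real model would read `messages` and choose freely; this stub
    # requests the tool once, then answers using the tool result.
    if messages[-1]["role"] != "tool":
        return {"tool_call": {"name": "get_weather",
                              "args": {"city": "Provo"}}}
    return {"content": f"Forecast: {messages[-1]['content']}"}

def run_agent(user_input):
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = fake_model(messages)
        if "tool_call" in reply:                     # model wants a tool
            call = reply["tool_call"]
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
        else:                                        # model is done
            return reply["content"]

print(run_agent("What's the weather?"))  # Forecast: Sunny in Provo
```

In class you will use LangChain's `@tool` decorator and real model APIs instead of these stubs, but the dispatch loop is the same idea.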

Project: Personal Assistant Agent

Build a personal assistant agent that connects to real tools — Canvas, Google Calendar, Slack, the file system, and the web — and carries on a coherent, stateful conversation. You will write your own prompts from scratch; AI-generated, peer-written, or internet-sourced prompts are not permitted. The agent will be stress-tested in class to deliberately surface its weaknesses, and those failures become the raw material for Unit 2.

Outcomes:
- Call an LLM and receive a response
- Write original prompts for agent context
- Define custom tools and turn existing code into tools
- Connect and invoke tools (MCP, file system, browser, web search, Canvas, Google Calendar, Slack)
- Implement conversation history manually and with built-in methods
- Introduce statefulness to an agent
- Run agent code safely in a sandboxed environment
- Build a simple two-agent pipeline
- Demonstrate a working personal assistant agent and document where it breaks
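
Implementing conversation history manually is one of the outcomes above; a minimal sketch follows. This is not LangGraph's built-in memory — it is a hand-rolled version, and the trimming threshold is an arbitrary choice for illustration:

```python
# Minimal manual conversation history: append turns, and trim the oldest
# non-system messages once the history grows past a fixed size.
class ConversationHistory:
    def __init__(self, max_messages=6):
        self.max_messages = max_messages
        self.messages = [{"role": "system",
                          "content": "You are a helpful assistant."}]

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Keep the system message plus only the most recent turns.
        if len(self.messages) > self.max_messages:
            self.messages = ([self.messages[0]] +
                             self.messages[-(self.max_messages - 1):])

history = ConversationHistory(max_messages=4)
for i in range(5):
    history.add("user", f"message {i}")
print([m["content"] for m in history.messages])
```

Running this keeps the system message plus the three most recent turns, which is exactly the statefulness-versus-cost trade-off you will revisit with built-in memory later in the unit.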

Note on AI use: AI assistance is not permitted this unit. Debugging help comes from class time and the Slack group (ChatClass). This is intentional — it builds class unity, surfaces your own assumptions, and gives you room to fail. Failure is a core part of this unit, not a grading penalty. Set a time limit on any problem; if you’re stuck, post in Slack or bring it to class unfinished. That is OK.

Daily Schedule:

Day Topic
1 APIs & Your First LLM Call — define an API, make a call, hide your key, configure the model
2 Tools: Structure & Uses — what a tool is, how the agent decides to call it, the @tool decorator, community tools
3 Tool Wrappers & Local Data — tool wrapper structure, local data access, combining web search with local files
4 Response Formats & Datatypes — define response format, pass photos, PDFs, and other datatypes
5 Statefulness & Conversation History — stateless vs stateful agents, LangGraph, manual and built-in memory
6 Connecting to the World via MCP — what MCP is, launch MCP stdio locally, MCP logging
7 MCP from GitHub — get a real MCP off GitHub (Canvas MCP), evaluate it in depth
8 Subagents & Multi-Agent Intro — multiagent overview, subagents vs skills vs handoffs vs routers
Project Build a Powerful Bot — personal assistant using all unit tools: Canvas, Google Calendar, Slack, file system, web

Unit 2

Diagnostics — Why Is It “Bad”? (Weeks 5–8)

Goal: Develop diagnostic tools to peer into the black box of agentic AI, in preparation for building reliable agents.

Topics:
- Categories of agent failure: errors, loops, cost explosion, hallucination, unreliability
- Observability and tracing with LangSmith
- Token and latency analysis
- Debugging hallucinations and unexpected behavior
- Prompt failure patterns
- Cost and loop failure analysis
- Identifying and fixing code bugs
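
Token and latency analysis from the topics above can be roughed out before LangSmith is set up. This sketch uses a stubbed `fake_llm` and a crude roughly-4-characters-per-token heuristic (an assumption, not a real tokenizer) to show what you are measuring:

```python
# Rough, framework-free token and latency measurement around an LLM call.
# `fake_llm` is a stand-in for a real API call; the len//4 token estimate
# is a common rough heuristic, not an exact tokenizer.
import time

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def timed_call(llm_fn, prompt: str):
    start = time.perf_counter()
    response = llm_fn(prompt)
    latency = time.perf_counter() - start
    return {
        "latency_s": round(latency, 3),
        "prompt_tokens": estimate_tokens(prompt),
        "completion_tokens": estimate_tokens(response),
        "response": response,
    }

def fake_llm(prompt: str) -> str:
    return "stub response to: " + prompt

stats = timed_call(fake_llm, "Summarize my calendar for today.")
print(stats["prompt_tokens"], stats["completion_tokens"])
```

LangSmith records real token counts and per-step latency for you; the value of doing it by hand once is knowing exactly what those trace columns mean.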

Project: Diagnostic Report — Fix a Broken Agent

You are given a pre-built broken agent to diagnose. Your job is to implement tracing, identify every failure, document root causes, and produce a structured diagnostic report. You will implement at least one fix and verify it with traces before presenting to the class.

Outcomes:
- Set up LangSmith and capture screenshot traces
- Implement tracing features and explain what each one does and why
- Measure token usage and latency across tool calls and LLM responses
- Identify categories of agent failure from trace data
- Distinguish model errors from prompt errors from code bugs
- Identify and rewrite failing prompts based on traced results
- Implement basic loop guards and context limits
- Fix identified bugs and verify with re-tracing
- Produce and present a diagnostic report
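
A basic loop guard, one of the outcomes above, can be as simple as a step counter around the agent loop. In this sketch, `looping_step` is a deliberately buggy stand-in for one model-plus-tool iteration:

```python
# Minimal loop guard: cap the number of agent steps so a confused agent
# cannot spin (and spend) forever. `looping_step` is an illustrative
# stub for one model-plus-tool iteration that never terminates.
class LoopLimitExceeded(Exception):
    pass

def run_with_guard(step_fn, max_steps=10):
    state = {"done": False, "steps": 0}
    while not state["done"]:
        if state["steps"] >= max_steps:
            raise LoopLimitExceeded(f"stopped after {max_steps} steps")
        state = step_fn(state)
    return state

def looping_step(state):
    # A buggy step that never finishes -- the guard catches it.
    return {"done": False, "steps": state["steps"] + 1}

try:
    run_with_guard(looping_step, max_steps=5)
except LoopLimitExceeded as e:
    print(e)  # stopped after 5 steps
```

Real frameworks expose similar limits (e.g. a maximum recursion or iteration setting); the guard is what turns an infinite loop into a diagnosable failure in your traces.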

Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.

Daily Schedule:

Day Topic
1 What Makes an Agent “Bad”? — review Unit 1 failures, intro observability, define runs/traces, navigate LangSmith
2 Defining Main Failures — define failure types, compare good vs bad traces
3 LangSmith Setup & Trace Reading — get API key, set up tracing, interpret each step of a trace
4 Token & Latency Analysis — monitor performance, choose evaluation metrics, understand cost growth
5 Evals & Golden Test Sets — build a test set, write success criteria, judge outputs
6 Debugging Failure Patterns — repeatable diagnostic process, categorize and prioritize failures
7 Automation Rules & Alerts — sampling, filters, and alerts to catch bad runs; connect user feedback to trace evidence
8 Diagnosing a Broken Agent — provide a diagnostic report, apply a fix, retest and compare
9 Unit Review & Assessment — review the unit and the chosen assessment method
Project Diagnostic Report — trace, document, and fix a pre-built broken agent; present findings to the class

Unit 3

Context Engineering — Make a “Good” Agent (Weeks 9–12)

⚠️ Unit 3 is currently in production. Days will be added as they are finalized.

Goal: Engineer context effectively to produce agents that are reliable, specialized, and capable of operating over long horizons — then prove it in competition.

Topics:
- Context engineering vs. prompt engineering
- The 4 levels of intent: session, user, organization, world
- Simple context management: truncation, summarization, selection, file write, database, isolation/delegation
- Context offloading: semantic search, forgetting vector, graph memory
- Subagent architecture: parallelization, serialization, concurrency
- Checkpointing, human-in-the-loop, and context self-adaptation
- Grounding and RAG
- Long-horizon task planning
- Cost-aware context design
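
Summarization-based context management from the topics above can be sketched with a stubbed summarizer. In a real agent, `summarize` would itself be an LLM call; here it is a placeholder so the compaction logic is visible:

```python
# Sketch of summarization-based context compaction: once history grows
# past a threshold, fold older messages into a single summary message.
# `summarize` is an illustrative stub; a real agent would ask an LLM.
def summarize(messages):
    topics = ", ".join(m["content"] for m in messages)
    return f"[summary of earlier conversation: {topics}]"

def compact(messages, keep_recent=2, threshold=5):
    if len(messages) <= threshold:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
compacted = compact(history)
print(len(compacted))  # 3: one summary message plus two recent turns
```

Truncation simply drops the `old` messages; summarization trades an extra LLM call for keeping a compressed trace of them, which is the cost-aware design decision this unit asks you to make deliberately.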

Project: Battle of the Bots

Apply everything from all three units to build a production-ready agent of your own design. Agents compete in structured challenges testing usefulness, resilience, and cost efficiency. The best agent is not necessarily the most powerful — it is the most reliable, the most purposeful, and the most improved from where you started in Unit 1.

Outcomes:
- Define context engineering and distinguish it from prompt engineering
- Implement truncation and summarization-based context management
- Apply context offloading using semantic search or graph memory
- Implement parallelized and serialized subagent patterns
- Add checkpointing and human-in-the-loop interrupts
- Build a RAG pipeline and test retrieval quality
- Design agents that maintain state across long multi-step tasks
- Apply context engineering improvements to the Unit 1 personal assistant
- Compete in Battle of the Bots and articulate design decisions made

Note on AI use: AI assistance may be permitted this unit — details TBD. If used, post in Slack what you discussed and any lessons learned.

Daily Schedule:

Day Topic
1 What Is Context Engineering? — definition, levels of intent, why prompting alone isn’t enough
2 Planning Agentic Systems — distinguish prompt, context, and intent engineering; plan systems on a whiteboard
3 Context Management Strategies — truncate, summarize, select, file write, database, isolate/delegate; implement all three core types
More days coming. Check back as Unit 3 is finalized.

Credits

This course was created by: James Beeson, Ethan Saline, Wil Jones, Ian Blad, Camilla Ramirez, and Kimberly Juarez.