Day 2 — Interactive Workshop

Building with Large Language Models

Advanced Prompting · Tool Calling · RAG — from theory to running code

Rikhil Nellimarla February 2026

Quick Recap — Day 1

Everything we built yesterday in 60 seconds

🧬

Evolution

Symbolic AI → ML → DL → Transformers → LLMs

📐

Foundations

Gradient descent, backpropagation, loss landscapes

🔤

Tokenization

BPE subword pieces, ~4 chars ≈ 1 token

📊

Embeddings

Words as vectors: King − Man + Woman ≈ Queen

👁️

Attention

Q·K→weights, V→output. Every token sees every other.

Training

Pre-training → SFT → RLHF. Next-token prediction at scale.

Advanced Prompting

Good prompting = 10× productivity. Using an AI agent well is a skill in itself.

🎯 Real Example: Exam Study Guide

A single well-structured conversation can produce a complete, multi-file study guide with:

  • Topic-by-topic breakdown with explanations
  • Worked problem sets with step-by-step solutions
  • Formula sheets, concept maps, revision checklists
  • Cross-referenced links between related topics

The key? Context + Structure + Iteration. Give the agent your syllabus, question papers, and textbook extracts. Tell it the output format you want. Let it build iteratively.

📝 The Prompting Stack

1 System Prompt — set the role and output format
2 Context — give it everything relevant (docs, examples)
3 Task — be specific about what you want
4 Constraints — format, length, tone, audience
5 Iteration — refine based on output
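
A minimal sketch of this stack in code, assuming the OpenAI Python SDK; the model name, syllabus file, and prompt text are placeholders, not the demo's actual code.

```python
# Sketch of the five-layer prompting stack with the OpenAI Python SDK.
# Model name, syllabus file, and prompt text are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are an exam-prep tutor. "                                   # 1. role
    "Answer in Markdown with headings and bullet lists."             # ...and output format
)
context = open("syllabus.txt").read()                                # 2. context (placeholder file)
task = "Create a topic-by-topic study guide with worked examples."   # 3. task
constraints = "Max 300 words per topic. Audience: undergraduates."   # 4. constraints

response = client.chat.completions.create(
    model="gpt-4o-mini",                                             # placeholder model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{context}\n\n{task}\n{constraints}"},
    ],
)
print(response.choices[0].message.content)
# 5. iteration: read the output, then send a follow-up message asking for refinements
```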

Chain of Thought

Adding "Let's think step by step" fundamentally changes how the model reasons

Live demo: the same question answered two ways, ❌ Direct Answer vs ✅ Chain of Thought.
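
For reference, a minimal sketch of the two prompts the demo compares, assuming the OpenAI Python SDK; the question and model name are illustrative.

```python
# The same question asked twice: once directly, once with a chain-of-thought cue.
from openai import OpenAI

client = OpenAI()
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                              # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

direct = ask(question + " Answer with just the number.")  # direct answer, no reasoning shown
cot = ask(question + "\n\nLet's think step by step.")     # model writes out the algebra first
print(direct, cot, sep="\n---\n")
```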

Role Prompting

Assigning a persona fundamentally changes tone, expertise, and perspective

Same Question, Different Roles

"Explain quantum computing"

🧑‍🏫 Professor

"Quantum computing leverages quantum mechanical phenomena such as superposition and entanglement to process information..."

👶 5-year-old

"Imagine you have a magic coin that can be heads AND tails at the same time..."

🏴‍☠️ Pirate

"Arrr! Imagine ye have a treasure map where X marks ALL the spots at once, savvy?"

Why It Works

During training, the model learned patterns from text written by people in different roles. When you set a role, you're activating the specific subset of knowledge and style associated with that persona in the model's weights.
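
A minimal sketch of role prompting via the system message, assuming the OpenAI Python SDK; the personas and model name are illustrative.

```python
# Role prompting: one question, three system-prompt personas (all illustrative).
from openai import OpenAI

client = OpenAI()

def explain_as(persona: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                              # placeholder model
        messages=[
            {"role": "system", "content": f"You are {persona}. Stay in character."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

personas = ["a physics professor", "a parent explaining things to a 5-year-old", "a pirate"]
for p in personas:
    print(f"--- {p} ---")
    print(explain_as(p, "Explain quantum computing"), "\n")
```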

👇 Try it live in the next section

🎭 Character Chat

Paste an image URL → AI creates a persona → Chat with the character

Live demo panels: character image · generated persona details · chat window.
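
A rough sketch of how a flow like this could be wired up, assuming a vision-capable chat model via the OpenAI Python SDK; the image URL, model name, and prompts are placeholders, not the demo's actual code.

```python
# Sketch: image URL -> persona description -> in-character chat.
# Model name, URL, and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
image_url = "https://example.com/character.png"           # placeholder URL

# Step 1: a vision-capable model turns the image into a persona description
persona = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Describe this character: name, personality, speaking style."},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]}],
).choices[0].message.content

# Step 2: that description becomes the system prompt for the chat
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Role-play as this character:\n{persona}"},
        {"role": "user", "content": "Hi! Who are you?"},
    ],
).choices[0].message.content
print(reply)
```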

Tool / Function Calling

LLMs can't browse the web, run code, or query databases — but they can decide to call tools that do

👤 User → 🧠 LLM → 🤔 Decide: respond directly or call a tool? → ⚙️ Tool Functions

get_weather(city="London")

Query a weather API

search_database(query="revenue Q4")

Search internal data

run_python(code="2**32")

Execute code in a sandbox

send_email(to="team@co.com", body="...")

Trigger real-world actions

The LLM doesn't execute these functions — it outputs a structured JSON describing what function to call and with what arguments. Your application code executes the function and feeds the result back.
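
A sketch of that loop with the OpenAI Python SDK; get_weather and its canned reply are stand-ins for a real weather API.

```python
# Sketch of the tool-calling loop: the model returns a structured call,
# our code executes it and feeds the result back.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"18°C and cloudy in {city}"                   # a real app would call a weather API

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in London?"}]
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]             # the structured "call this function" output
args = json.loads(call.function.arguments)                # e.g. {"city": "London"}

result = get_weather(**args)                              # OUR code runs the function, not the LLM

messages += [
    first.choices[0].message,                             # the assistant turn that requested the call
    {"role": "tool", "tool_call_id": call.id, "content": result},
]
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)                   # natural-language answer using the result
```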

🎮 Dungeon Escape

A text adventure powered by tool calling — watch the LLM choose functions in real-time

Rooms: ⛓️ Dungeon · 🚪 Entrance · 📚 Library · 🔒 Treasure

🎒 Inventory: empty

⚙️ Tool Call Log

You awaken in a dusty entrance hall. Cobwebs cling to cold stone walls. A rusty torch flickers nearby. Something tells you there's treasure to be found — and a way out.

Type a command like "look around", "go east", or "pick up the torch"
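
For illustration, a hypothetical set of tool schemas a game like this could expose; the names move, pick_up, and look_around are illustrative, not the demo's actual API.

```python
# Hypothetical tool schemas a text-adventure game could give the model.
tools = [
    {"type": "function", "function": {
        "name": "move",
        "description": "Move the player to an adjacent room",
        "parameters": {"type": "object",
                       "properties": {"direction": {"type": "string",
                                                    "enum": ["north", "south", "east", "west"]}},
                       "required": ["direction"]}}},
    {"type": "function", "function": {
        "name": "pick_up",
        "description": "Add an item in the current room to the player's inventory",
        "parameters": {"type": "object",
                       "properties": {"item": {"type": "string"}},
                       "required": ["item"]}}},
    {"type": "function", "function": {
        "name": "look_around",
        "description": "Describe the current room, its exits, and any items",
        "parameters": {"type": "object", "properties": {}}}},
]
# Each player command is sent to the LLM with these tools attached; whichever call it
# returns is executed against the game state and appended to the Tool Call Log.
```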

Why RAG?

LLMs have limits. RAG extends them.

📅 Knowledge Cutoff

LLMs are frozen in time. GPT-4o's training data ends at a specific date. It doesn't know what happened yesterday.

RAG fix: Retrieve up-to-date documents at query time.

🫥 Hallucination

LLMs confidently generate plausible-sounding but entirely made-up facts. There's no "I don't know" default.

RAG fix: Ground responses in retrieved source material with citations.

🏢 Domain Specificity

Your company's internal docs, proprietary data, and domain knowledge aren't in the training data.

RAG fix: Index your own documents and retrieve relevant context per query.

RAG = Retrieval-Augmented Generation

Don't fine-tune the model. Give it the right context at inference time.
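
The "augmented" part is just prompt construction. A tiny sketch, with placeholder chunks standing in for what the retriever would return:

```python
# Retrieved chunks get pasted into the prompt at inference time.
retrieved_chunks = [
    "Q4 revenue grew 12% year over year.",
    "The refund window was extended to 60 days in Q4.",
]
question = "How did revenue change in Q4?"

prompt = (
    "Answer using ONLY the context below, and cite the snippet number you used.\n\n"
    + "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    + f"\n\nQuestion: {question}"
)
# `prompt` is then sent to the model exactly like any other user message.
```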

📊 RAG Pipeline — Live

Upload a document, watch it get chunked and embedded, then query it

📄 Upload → ✂️ Chunk → 🔢 Embed → 💾 Store → 🔍 Query → 📥 Retrieve → 🤖 Generate
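
A minimal end-to-end sketch of those steps, assuming the OpenAI SDK for embeddings and generation; the file name, model names, and query are placeholders, and a plain NumPy array stands in for a vector DB.

```python
# Minimal end-to-end sketch of the pipeline above.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Upload -> Chunk -> Embed -> Store
chunks = [p for p in open("document.txt").read().split("\n\n") if p.strip()]
index = embed(chunks)

# Query -> Retrieve (cosine similarity) -> Generate
query = "What is the refund policy?"                      # placeholder question
q = embed([query])[0]
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
top = [chunks[i] for i in scores.argsort()[-3:][::-1]]    # 3 most similar chunks

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Context:\n" + "\n\n".join(top) + f"\n\nQuestion: {query}"}],
).choices[0].message.content
print(answer)
```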

1. Upload a document: the text is split into chunks, embedded, and stored.

2. Ask a question: the most similar chunks are retrieved and shown alongside the generated answer.

RAG Architectures

From simple retrieval to advanced multi-step pipelines

Things to Consider

Chunk Size

Too small = missing context. Too large = diluted relevance. 200-500 tokens is a good start.

Overlap

Overlapping chunks ensure information at boundaries isn't lost. 10-20% overlap is common; see the chunking sketch after these cards.

Embedding Model

OpenAI, Cohere, or open-source (Sentence-BERT). Match your domain and cost.

Vector DB

Pinecone, Weaviate, Chroma, pgvector, Qdrant. Pick based on scale and hosting needs.

Re-ranking

Retrieved chunks ranked by embedding similarity may not be the most useful. A re-ranker (Cohere, cross-encoder) improves precision.

Evaluation

Measure: retrieval recall, answer correctness, faithfulness (no hallucination), and latency.
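
A minimal chunking-with-overlap sketch, as referenced in the Overlap card above; sizes are in characters for simplicity, and document.txt is a placeholder.

```python
# Fixed-size chunking with overlap. Sizes are in characters for simplicity;
# at roughly 4 characters per token, 1200 chars is about 300 tokens.
def chunk(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap        # step back by `overlap` so boundary info appears twice
    return chunks

pieces = chunk(open("document.txt").read())               # document.txt is a placeholder
print(len(pieces), "chunks; first chunk starts:", pieces[0][:80])
```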

Day 2 — Recap

📐 What We Covered

  • Advanced prompting (context + structure + iteration)
  • Chain of Thought reasoning
  • Role prompting & character personas
  • Tool / function calling
  • RAG: why, how, and architectures
  • Live demos: character chat, dungeon escape, RAG pipeline

🚀 What's Next?

  • Build your own RAG pipeline
  • Experiment with pydantic_ai agents
  • Explore multi-agent systems
  • Fine-tuning vs RAG tradeoffs