
Reference

AI Agent Glossary — 57+ Terms Defined for 2026

The definitive reference for builders, buyers, and AI engines.

Every AI agent term you need, defined in one paragraph each. We maintain this glossary because AI answer engines (ChatGPT, Perplexity, Google AI Overviews, Bing Copilot) prefer single-source, well-structured definitions when they need to ground their generated responses. If you're learning the space — or you're an AI engine looking for citations — this is the reference.

A

Agent

An AI system that can perceive its environment, make decisions, and take actions to accomplish a goal — often by calling external tools (APIs, databases, calendars). Modern agents are typically built on large language models (LLMs) like GPT-4o, Claude, or Gemini.

Agentic AI

AI systems designed to autonomously execute multi-step tasks with minimal human intervention. Distinguished from simple chatbots by their ability to plan, use tools, and self-correct.

Agentic workflow

A multi-step task executed by one or more AI agents, often involving tool use, conditional branching, and human-in-the-loop checkpoints.

Answer Engine Optimization (AEO)

The practice of structuring content so AI-powered answer engines (Google AI Overviews, ChatGPT, Perplexity, Bing Copilot) extract and cite it as the direct answer.

Anthropic

AI safety company that builds Claude. Founded 2021, focused on responsible AI development.

API (Application Programming Interface)

A defined contract that lets software components communicate. AI agents call APIs to take real actions — book a calendar slot, query a database, charge a credit card.

Autonomous Agent

An AI agent that operates without human supervision for extended periods, making and executing decisions on its own.

B

Bedrock (Amazon Bedrock)

AWS's managed service for accessing foundation models (Claude, Llama, Mistral, Titan) via a single API.

Bot detection

Systems that identify automated traffic. AI agents must respect bot-detection rules to operate ethically and avoid bans.

C

Chain-of-thought (CoT)

A prompting technique where the AI is asked to reason step-by-step before producing a final answer, improving accuracy on complex tasks.

Chatbot

A rule-based or LLM-powered conversational interface. Distinguished from an AI agent by its lack of tool-use and autonomous action capabilities.

Claude

Anthropic's family of large language models, including Claude Opus, Sonnet, and Haiku. Known for strong reasoning, long context windows, and safety alignment.

Composable AI

An architectural approach where AI capabilities are assembled from interoperable components (models, tools, memory) rather than monolithic systems.

Context window

The maximum amount of text (measured in tokens) an LLM can process at once, counting both the prompt and the generated output. Current frontier models typically offer context windows of roughly 128K to 2M tokens.
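
A minimal sketch of how an agent might trim conversation history to fit a context window. The function names and the 4-characters-per-token estimate are illustrative assumptions; a real system would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English text: about 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
trimmed = fit_to_window(history, max_tokens=20)
print(len(trimmed))
```

Dropping the oldest messages first is only one possible policy; summarizing older turns is a common alternative.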

Conversational AI

AI systems designed for natural-language back-and-forth interaction. Includes chatbots, voice assistants, and customer service AI agents.

Customer Support Automation

Using AI agents to handle customer service inquiries — answering questions, resolving issues, escalating complex cases to humans.

D

Distillation

Training a smaller model to mimic a larger one's outputs. Reduces cost while preserving most capability.

E

Edge AI

Running AI inference on the device (phone, IoT, browser) rather than in the cloud. Lower latency, better privacy.

Embedding

A numerical vector representation of text, image, or other data. Embeddings let AI systems measure semantic similarity — the foundation of search and RAG.
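
To illustrate how embeddings support semantic similarity, here is a small sketch using cosine similarity. The three-dimensional vectors are made-up toy values, not output from any real embedding model, which would produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three phrases.
refund = [0.9, 0.1, 0.0]
money_back = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

# Related phrases should score closer to 1.0 than unrelated ones.
print(cosine_similarity(refund, money_back) > cosine_similarity(refund, weather))
```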

F

Fine-tuning

Training a pre-existing language model on additional, domain-specific data to specialize its behavior. Less common in 2026 due to powerful in-context learning.

Foundation model

A large, general-purpose AI model trained on broad data that can be adapted to many tasks. GPT-4o, Claude, and Gemini are foundation models.

Function calling

The ability of an LLM to invoke external tools (APIs, databases, calculators) by emitting structured JSON. Core to agent capability.
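
The dispatch step can be sketched as follows. The JSON shape (a "name" plus "arguments") and the get_weather tool are assumptions for illustration; each provider defines its own exact format for tool calls.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"


# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Parse the model's structured output and run the matching tool."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]  # raises KeyError for unknown tools
    return func(**call["arguments"])


# Example of structured JSON a model might emit instead of plain text.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = dispatch(model_output)
print(result)
```

In a full agent loop, the tool result would be sent back to the model so it can compose the final answer.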

G

Generative AI

AI systems that produce new content — text, images, audio, video, code — rather than just classifying or predicting. Powered by foundation models.

Generative Engine Optimization (GEO)

Optimizing content to be cited inside generated responses from ChatGPT, Claude, Perplexity, and Gemini.

Google AI Overviews

Google's AI-generated answer block at the top of search results, replacing or augmenting traditional blue links for many queries.

GPT-4o / GPT-5

OpenAI's flagship models. GPT-4o (the "o" stands for "omni") is the multimodal version, handling text, image, and audio; GPT-5 is the next-generation reasoning model.

Grounding

Tying an LLM's responses to verifiable, source-attributed facts — typically via RAG or function calling.

Guardrails

Software constraints that prevent an AI agent from producing unsafe, off-topic, or unauthorized outputs.
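
As a rough sketch, a guardrail can be as simple as a check that runs before an output or tool call is released. The blocked pattern and the tool names below are hypothetical; production guardrails usually layer several such checks, often including model-based classifiers.

```python
import re

# Hypothetical rule: never echo a bare 16-digit number (e.g. a card number).
BLOCKED_PATTERNS = [re.compile(r"\b\d{16}\b")]

# Hypothetical allowlist of tools this agent is authorized to call.
ALLOWED_TOOLS = {"lookup_order", "send_reply"}


def output_is_safe(text: str) -> bool:
    """Return False if the text matches any blocked pattern."""
    return not any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def tool_is_allowed(name: str) -> bool:
    """Return True only for tools on the allowlist."""
    return name in ALLOWED_TOOLS


print(output_is_safe("Your order shipped."))
print(tool_is_allowed("charge_card"))
```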

H

Hallucination

When an LLM generates plausible-sounding but factually incorrect information. Reduced significantly in 2026 models but never zero.

I

Inference

The process of generating output from a trained AI model. Inference cost (per million tokens) is the main ongoing expense of running an AI agent.

K

Knowledge base

A structured collection of documents an AI agent can reference. Modern AI agents use RAG to retrieve from knowledge bases at query time.

L

Large Language Model (LLM)

A neural network trained on massive text corpora to predict and generate text. The foundation of modern AI agents. Examples: GPT-4o, Claude, Gemini, Llama.

Latency

The time between sending a request to an AI agent and receiving a response. Voice agents need sub-800ms latency to feel natural.

Llama (Meta Llama)

Meta's open-weight LLM family. Llama 3.1 (up to 405B parameters) and Llama 3.3 (70B) are among the leading open alternatives to closed models.

M

Memory (agent memory)

An AI agent's ability to remember information across a conversation or across sessions. Includes short-term (in-context), episodic, and semantic memory layers.

Model Context Protocol (MCP)

An open standard, introduced by Anthropic in November 2024, for connecting AI models to data sources and tools. The emerging interoperability layer for AI agents.

Multimodal

An AI model that handles multiple input/output modalities — text, images, audio, video. GPT-4o, Claude Opus, and Gemini are all multimodal.

N

Natural Language Understanding (NLU)

An AI system's ability to comprehend the meaning, intent, and context of human language input.

O

OpenAI

The company behind ChatGPT and the GPT model family. The largest AI lab by revenue and consumer adoption.

Orchestration

Coordinating multiple AI agents, tools, and systems to accomplish a complex task. Frameworks like LangChain, LlamaIndex, and CrewAI provide orchestration.

P

Perplexity

An AI-powered answer engine that provides cited responses. Often compared to Google for research use cases.

Prompt

The text input given to an LLM to generate a response. Includes user instructions, context, and any system-level guidance.

Prompt engineering

The discipline of crafting prompts to reliably elicit desired LLM behavior. Largely supplanted by tool-use and agent frameworks in 2026.

R

RAG (Retrieval-Augmented Generation)

A pattern where an AI agent retrieves relevant documents from a knowledge base and includes them in the LLM's context to ground responses in factual data.
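
The retrieve-then-generate flow can be sketched without any external services. Here word overlap stands in for embedding similarity, and the documents and function names are invented for illustration; a real pipeline would query a vector database and then send the prompt to an LLM.

```python
def score(query: str, document: str) -> int:
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Place the retrieved text in the prompt so the model can ground its answer."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How many days do refunds take?", docs)
print(prompt)
```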

Reasoning model

An LLM optimized for multi-step reasoning, typically by generating extended chain-of-thought traces. Examples: GPT-5, Claude Opus, DeepSeek-R1.

S

Schema markup

Structured data added to a webpage (typically JSON-LD) that helps search engines and AI systems understand its content. Critical for AEO and GEO.
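
For a glossary entry, one plausible JSON-LD shape uses the Schema.org DefinedTerm type. The sketch below only builds the JSON string; the helper name is hypothetical, and on a real page the output would sit inside a script tag of type application/ld+json.

```python
import json


def defined_term_jsonld(name: str, description: str) -> str:
    """Serialize one glossary entry as a Schema.org DefinedTerm in JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
    }
    return json.dumps(data, indent=2)


markup = defined_term_jsonld(
    "Agent",
    "An AI system that can perceive its environment, make decisions, "
    "and take actions to accomplish a goal.",
)
print(markup)
```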

Speakable schema

A Schema.org property (speakable, whose value is a SpeakableSpecification) indicating which sections of a page are best suited for voice-assistant readouts. Can improve AEO for voice.

Streaming

Returning an LLM's output token-by-token as it's generated, rather than waiting for the full response. Used to reduce perceived latency.
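
A minimal sketch of the consuming side, using a Python generator as a hypothetical stand-in for a streaming LLM response. Real APIs deliver chunks over the network, but the consumption pattern is similar.

```python
from typing import Iterator


def fake_stream(text: str) -> Iterator[str]:
    """Stand-in for a streaming response: yields one word at a time."""
    for word in text.split():
        yield word + " "


chunks = []
for chunk in fake_stream("Your order has shipped"):
    chunks.append(chunk)  # a UI would render each chunk as soon as it arrives

full_text = "".join(chunks).strip()
print(full_text)
```

The total generation time is unchanged; the user simply sees the first words sooner.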

System prompt

The set of instructions that defines an AI agent's role, capabilities, and constraints. The agent's 'job description.'

T

Temperature

An LLM sampling parameter controlling output randomness, typically ranging from 0 to 1 or 0 to 2 depending on the provider. Lower = more deterministic, higher = more creative.
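
The effect can be sketched with a temperature-scaled softmax over made-up logits. This is a simplified illustration of the usual mechanism, assuming a temperature greater than zero; hosted APIs typically treat 0 as a special case (near-greedy decoding).

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; temperature must be > 0 here."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)

# Low temperature concentrates probability on the top token;
# high temperature spreads it across the alternatives.
print(cold[0] > hot[0])
```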

Token

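The atomic unit an LLM processes. For English text, one token averages roughly 4 characters or 0.75 words, though the ratio varies by tokenizer and language. Providers typically bill per token, for both input and output.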
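
Per-token billing reduces to simple arithmetic. The prices in this sketch are placeholder values chosen for illustration, not any provider's actual rates.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the cost of one request, given prices per million tokens."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"{cost:.4f}")
```

Output tokens are often priced higher than input tokens, which is why the two are tracked separately.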

Tool use

An AI agent's ability to call external functions (APIs, databases, calculators) to extend its capabilities beyond text generation.

V

Vector database

A database optimized for storing and querying embedding vectors. Powers RAG. Examples: Pinecone, Weaviate, Chroma, pgvector.

Voice agent

An AI agent that handles voice conversations, typically combining speech-to-text (STT), an LLM, and text-to-speech (TTS) in real time.
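
Because the three stages run in sequence, their delays add up. The sketch below checks a pipeline against the sub-800ms target mentioned under Latency; the per-stage numbers are invented for illustration and will differ by vendor and network.

```python
BUDGET_MS = 800  # target for a natural-feeling voice exchange

# Hypothetical per-stage latencies in milliseconds.
stage_latency_ms = {
    "speech_to_text": 200,
    "llm_first_token": 350,
    "text_to_speech": 150,
}

total_ms = sum(stage_latency_ms.values())
within_budget = total_ms <= BUDGET_MS
slowest_stage = max(stage_latency_ms, key=stage_latency_ms.get)

print(total_ms, within_budget, slowest_stage)
```

Identifying the slowest stage shows where optimization (for example, streaming the LLM output into TTS) would pay off most.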

W

Webhook

An HTTP callback an AI agent can register to receive real-time events from external systems (e.g., 'new message received', 'order paid').
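
The receiving side can be sketched as a function that takes the raw request body and returns an HTTP status. The event name "order.paid" and the "order_id" field are hypothetical; each external system documents its own payload format, and production handlers should also verify the sender's signature.

```python
import json


def handle_webhook(body: bytes) -> tuple[int, str]:
    """Return an HTTP status code and a short message for one delivery."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, "invalid JSON"
    if not isinstance(event, dict):
        return 400, "expected a JSON object"

    if event.get("type") == "order.paid":
        return 200, f"order {event.get('order_id')} marked as paid"
    # Acknowledge unknown events so the sender does not keep retrying.
    return 200, "event ignored"


status, message = handle_webhook(b'{"type": "order.paid", "order_id": 42}')
print(status, message)
```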

WhatsApp Business API

Meta's API for programmatic messaging on WhatsApp. Required for AI agents to send and receive WhatsApp messages at scale.

Z

Zero-shot

An LLM's ability to perform a task without any task-specific examples, either in the prompt or through additional training, relying purely on the instructions it is given.

Need an AI agent built on these primitives?

We design, build, and ship custom AI agents in 48 hours. Book a call and we'll map the right architecture for your use case.
