AI Glossary

50 key terms in AI engineering, explained simply.

AI Agents (Agentic AI)

Autonomous AI systems that can reason, plan, use tools, and take actions to accomplish goals. They combine LLMs with too...

Attention (Deep Learning)

A neural network mechanism that allows models to focus on relevant parts of the input when producing each output. Self-a...
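
As a rough illustration of the mechanism, here is a minimal sketch of scaled dot-product attention in plain Python. The function names and the toy vectors are assumptions made for this example; real models use learned projections and tensor libraries.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query: score every key, turn the scores into weights,
    and return the weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy input: the query matches the first key far better than the second,
# so the output is dominated by the first value vector.
result = attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0], [0.0]])
```

In self-attention the queries, keys, and values are all derived from the same sequence, so every position can weigh every other position.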

AWQ (Quantization)

A GPU-optimized quantization method that identifies and preserves the most important weights based on activation pattern...

Backpropagation (ML Fundamentals)

The algorithm used to train neural networks. It calculates gradients of the loss function with respect to each weight by...

Batch Inference (Inference)

Processing multiple inference requests simultaneously to maximize GPU utilization. Continuous batching (used by vLLM) dy...

Bedrock (Cloud)

A fully managed AWS service for accessing foundation models (Claude, Llama, Titan) via API. Includes RAG (Knowledge Base...

BERT (NLP)

Bidirectional Encoder Representations from Transformers. A pre-trained language model by Google that understands context...

Chain-of-Thought (Prompting)

A prompting technique that asks the model to show its reasoning step by step before giving a final answer. Significantly...

Chunking (RAG)

The process of splitting documents into smaller pieces (chunks) for storage in a vector database. Chunk size and strateg...
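
One simple strategy is fixed-size chunks with some overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The sketch below is an assumption-laden example (character-based sizes, hypothetical `chunk_text` helper); production systems often split on tokens, sentences, or headings instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks

pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Smaller chunks tend to give more precise matches, while larger chunks keep more surrounding context per match.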

CI/CD for ML (MLOps)

Continuous Integration and Continuous Deployment adapted for machine learning. Includes automated testing of data, model...

CNN (Deep Learning)

A neural network architecture designed for processing grid-like data (images, video). Uses convolutional filters to dete...

Docker (DevOps)

A platform for packaging applications into containers — lightweight, portable, isolated environments that include all de...

DPO (Alignment)

A simpler alternative to RLHF for aligning LLMs with human preferences. Directly optimizes the model using preference pa...

Embeddings (ML Fundamentals)

Dense numerical vector representations of text, images, or other data that capture semantic meaning. Similar items have ...
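
Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

close = cosine_similarity(cat, kitten)
far = cosine_similarity(cat, invoice)
```

A score near 1 means the vectors point in nearly the same direction, and a score near 0 means they are unrelated.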

Feature Store (MLOps)

A centralized repository for storing, managing, and serving ML features. Ensures consistency between training and servin...

Few-Shot (Prompting)

Providing a small number of examples (2-5) in the prompt to guide the model's output format and behavior. A key prompt eng...
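
A few-shot prompt is just text, so it can be assembled with ordinary string handling. The sketch below assumes an "Input:/Output:" layout and a sentiment task; both are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # End with the new input and an open "Output:" for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

# Hypothetical examples, invented for this sketch.
examples = [
    ("The battery lasts all day.", "positive"),
    ("It broke after a week.", "negative"),
    ("Setup was quick and painless.", "positive"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.", examples, "The screen is too dim."
)
```

The resulting string would be sent to the model as-is; the examples show it the expected label vocabulary and output shape.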

Fine-Tuning (LLM Engineering)

The process of further training a pre-trained model on domain-specific data to adapt it for a particular task, style, or...

GGUF (Quantization)

A file format for storing quantized LLM weights that enables running large language models on consumer hardware (CPUs an...

GPTQ (Quantization)

A post-training quantization method for LLMs that uses approximate second-order information to minimize quantization err...

Gradient Descent (ML Fundamentals)

An optimization algorithm that iteratively adjusts model parameters in the direction that reduces the loss function. Var...
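
The update rule can be seen on a one-parameter problem. This minimal sketch assumes a hypothetical loss, (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are arbitrary choices for the example.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move in the direction that reduces the loss
    return w

# Loss (w - 3)^2 has its minimum at w = 3; its gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

With too large a learning rate the same loop overshoots and diverges, which is why the step size is one of the most important settings to tune.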

Hallucination (LLM Engineering)

When an LLM generates confident-sounding but factually incorrect information. A fundamental limitation of generative mod...

Hugging Face (Platforms)

The largest open-source AI platform. Hosts models, datasets, and spaces. Provides the Transformers library for using pre...

Inference (Production)

The process of running a trained model to generate predictions or outputs. In LLMs, inference means generating text toke...

Kubernetes (DevOps)

An open-source container orchestration platform that automates deployment, scaling, and management of containerized appl...

KV Cache (Inference)

A memory optimization for transformer inference that stores previously computed key-value pairs so they do not need to b...

LangChain (Frameworks)

A framework for building applications with LLMs. Provides abstractions for chains, agents, retrieval, memory, and tool u...

llama.cpp (Local AI)

A C/C++ implementation for running LLM inference on consumer hardware. Created by Georgi Gerganov, it enables running mo...

LLM (AI Fundamentals)

A neural network with billions of parameters trained on massive text datasets to understand and generate human language....

LoRA (Fine-Tuning)

A parameter-efficient fine-tuning technique that freezes the original model weights and trains small rank-decomposition ...
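
The idea can be sketched with small matrices. In the usual LoRA notation (an assumption here, since the entry does not spell it out), a frozen weight W of shape d_out x d_in is adapted as W + (alpha / r) * B A, where B is d_out x r, A is r x d_in, and only A and B are trained. The helper names below are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A); W itself is left untouched."""
    r = len(A)  # rank of the update: A has r rows
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def lora_trainable_params(d_out, d_in, r):
    """Parameters in A and B together."""
    return r * (d_in + d_out)

# Toy rank-1 update of a 2 x 2 weight matrix.
W = [[0.0, 0.0], [0.0, 0.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
adapted = lora_effective_weight(W, A, B, alpha=1.0)

full = 4096 * 4096                               # parameters in one full weight matrix
small = lora_trainable_params(4096, 4096, r=8)   # parameters LoRA would train instead
```

For a 4096 x 4096 layer at rank 8, the adapter holds well under 1% of the parameters of the full matrix, which is where the memory savings come from.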

MCP (Agentic AI)

An open protocol that standardizes how AI models connect to external tools and data sources. Created by Anthropic, it pr...

MLOps (Production)

The practice of deploying, monitoring, and maintaining machine learning models in production. Combines ML engineering, D...

Model Drift (MLOps)

When a deployed model's performance degrades over time because the real-world data distribution changes from what the mode...
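
One crude way to notice a shift in a single numeric feature is to compare the mean of live data against the training-time mean, measured in training-time standard deviations. This is a simplified heuristic assumed for illustration (the function name and the sample values are invented); monitoring tools typically use richer statistical tests.

```python
import statistics

def mean_shift_score(reference, live):
    """How far the live mean has moved, in reference standard deviations."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.pstdev(reference)
    shift = abs(statistics.mean(live) - ref_mean)
    if ref_std == 0:
        # A constant reference feature: any movement at all is a change.
        return 0.0 if shift == 0 else float("inf")
    return shift / ref_std

# Hypothetical feature values at training time and in production.
training_values = [1.0, 2.0, 3.0, 4.0, 5.0]
production_values = [11.0, 12.0, 13.0, 14.0, 15.0]
score = mean_shift_score(training_values, production_values)
```

A large score suggests the feature no longer looks like the training data and the model may need re-evaluation or retraining.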

Ollama (Local AI)

An open-source tool for running LLMs locally on your machine with a single command. Supports Llama, Mistral, Qwen, and o...

PEFT (Fine-Tuning)

A family of techniques that fine-tune only a small subset of model parameters instead of all weights. Includes LoRA, QLo...

Pinecone (Infrastructure)

A managed vector database service for similarity search. Commonly used in RAG systems to store and retrieve document emb...

Prompt Engineering (LLM Engineering)

The practice of designing and optimizing input prompts to get desired outputs from LLMs. Techniques include zero-shot, f...

PyTorch (Frameworks)

An open-source deep learning framework by Meta. The most popular framework for AI research and increasingly for producti...

QLoRA (Fine-Tuning)

A technique that combines 4-bit quantization with LoRA fine-tuning. Enables fine-tuning a 65B-parameter model on a singl...

Quantization (Optimization)

The process of reducing the precision of model weights (e.g., from 16-bit to 4-bit) to decrease model size and increase ...
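
The simplest form of the idea is symmetric round-to-nearest quantization with a single scale per group of weights. The sketch below assumes that scheme and uses hypothetical helper names; methods such as GPTQ and AWQ are considerably more sophisticated about which errors they minimize.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers of the given bit width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        scale = 1.0                       # all-zero input: any scale works
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [x * scale for x in q]

# Hypothetical weights, invented for this example.
weights = [0.70, -0.31, 0.02, 0.18, -0.66]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each stored value now needs 4 bits instead of 16, at the cost of a rounding error of at most half the scale per weight.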

RAG (LLM Engineering)

A technique that enhances LLM responses by retrieving relevant documents from an external knowledge base and including t...
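
The flow of retrieve-then-prompt can be sketched without any external services. The example below substitutes plain word overlap for embedding search, which is a deliberate simplification; the function names, documents, and prompt wording are all invented for illustration.

```python
def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query.

    Word overlap stands in for the vector similarity a real system would use.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(query, documents, k=2):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "Paris is the capital of France",
    "Bananas are yellow",
    "The Eiffel Tower is in Paris",
]
prompt = build_rag_prompt("What is the capital of France", docs)
```

The model then answers from the supplied context rather than from its parameters alone, which is the core of the technique.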

Reranking (RAG)

A second-stage retrieval step that re-scores and reorders initial search results using a more powerful model. Improves R...

RLHF (Alignment)

A technique for aligning LLMs with human preferences by training a reward model on human comparisons, then using reinfor...

SageMaker (Cloud)

A fully managed AWS service for building, training, and deploying ML models. Provides notebooks, training jobs, model ho...

Temperature (LLM Engineering)

A parameter that controls randomness in LLM outputs. Temperature 0 gives deterministic (most likely) outputs. Higher val...
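
Temperature is typically applied by dividing the model's logits by it before the softmax. The sketch below assumes that formulation and uses made-up logits; because dividing by zero is undefined, a temperature of 0 is usually handled as a special case that simply picks the highest-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                         # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                    # hypothetical next-token scores
sharp = softmax_with_temperature(logits, 0.2)   # low temperature: top token dominates
flat = softmax_with_temperature(logits, 2.0)    # high temperature: closer to uniform
```

Sampling from `sharp` almost always returns the top token, while sampling from `flat` gives the other tokens a realistic chance.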

Tokenization (NLP)

The process of splitting text into tokens (subwords, words, or characters) that a model can process. Common tokenizers i...

Transfer Learning (ML Fundamentals)

Using a model pre-trained on a large dataset as a starting point for a new task. Instead of training from scratch, you f...

Transformers (Deep Learning)

A neural network architecture based on self-attention mechanisms. The foundation of nearly all modern LLMs (GPT, BERT, Llama, M...

Vector Database (Infrastructure)

A database optimized for storing and querying high-dimensional vectors (embeddings). Used in RAG systems to find semanti...

vLLM (Infrastructure)

An open-source library for high-throughput LLM inference and serving. Uses PagedAttention to manage GPU memory efficient...

Zero-Shot (Prompting)

Using a model to perform a task it was not explicitly trained for, without any examples. LLMs can do zero-shot classific...