Autonomous AI systems that can reason, plan, use tools, and take actions to accomplish goals. They combine LLMs with too...
Attention (Deep Learning): A neural network mechanism that allows models to focus on relevant parts of the input when producing each output. Self-a...
AWQ (Quantization): A GPU-optimized quantization method that identifies and preserves the most important weights based on activation pattern...
Backpropagation (ML Fundamentals): The algorithm used to train neural networks. It calculates gradients of the loss function with respect to each weight by...
Batch Inference (Inference): Processing multiple inference requests simultaneously to maximize GPU utilization. Continuous batching (used by vLLM) dy...
Bedrock (Cloud): A fully managed AWS service for accessing foundation models (Claude, Llama, Titan) via API. Includes RAG (Knowledge Base...
BERT (NLP): Bidirectional Encoder Representations from Transformers. A pre-trained language model by Google that understands context...
Chain-of-Thought (Prompting): A prompting technique that asks the model to show its reasoning step by step before giving a final answer. Significantly...
Chunking (RAG): The process of splitting documents into smaller pieces (chunks) for storage in a vector database. Chunk size and strateg...
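
As an illustration of the chunking entry, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and the character-based sizes are assumptions made for this example; real pipelines often split on tokens, sentences, or headings instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.
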
CI/CD for ML (MLOps): Continuous Integration and Continuous Deployment adapted for machine learning. Includes automated testing of data, model...
CNN (Deep Learning): A neural network architecture designed for processing grid-like data (images, video). Uses convolutional filters to dete...
Docker (DevOps): A platform for packaging applications into containers: lightweight, portable, isolated environments that include all de...
DPO (Alignment): A simpler alternative to RLHF for aligning LLMs with human preferences. Directly optimizes the model using preference pa...
Embeddings (ML Fundamentals): Dense numerical vector representations of text, images, or other data that capture semantic meaning. Similar items have ...
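
One common way to compare two embeddings is cosine similarity. The sketch below assumes embeddings are plain lists of floats; the three toy vectors are invented for illustration and do not come from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", made up for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Values close to 1 mean the vectors point in nearly the same direction, while values near 0 mean they are unrelated.
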
Feature Store (MLOps): A centralized repository for storing, managing, and serving ML features. Ensures consistency between training and servin...
Few-Shot (Prompting): Providing a small number of examples (2-5) in the prompt to guide the model's output format and behavior. A key prompt eng...
Fine-Tuning (LLM Engineering): The process of further training a pre-trained model on domain-specific data to adapt it for a particular task, style, or...
GGUF (Quantization): A file format for storing quantized LLM weights that enables running large language models on consumer hardware (CPUs an...
GPTQ (Quantization): A post-training quantization method for LLMs that uses approximate second-order information to minimize quantization err...
Gradient Descent (ML Fundamentals): An optimization algorithm that iteratively adjusts model parameters in the direction that reduces the loss function. Var...
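
The update rule behind gradient descent can be sketched on a one-parameter problem. The loss f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions chosen for this example, not values from the glossary.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move in the direction that lowers the loss
    return x


# Toy loss f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))
```

With this loss, each step shrinks the distance to the minimum by a constant factor, so the estimate converges quickly. Real training loops apply the same update to millions of parameters at once.
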
Hallucination (LLM Engineering): When an LLM generates confident-sounding but factually incorrect information. A fundamental limitation of generative mod...
Hugging Face (Platforms): The largest open-source AI platform. Hosts models, datasets, and spaces. Provides the Transformers library for using pre...
Inference (Production): The process of running a trained model to generate predictions or outputs. In LLMs, inference means generating text toke...
Kubernetes (DevOps): An open-source container orchestration platform that automates deployment, scaling, and management of containerized appl...
KV Cache (Inference): A memory optimization for transformer inference that stores previously computed key-value pairs so they do not need to b...
LangChain (Frameworks): A framework for building applications with LLMs. Provides abstractions for chains, agents, retrieval, memory, and tool u...
llama.cpp (Local AI): A C/C++ implementation for running LLM inference on consumer hardware. Created by Georgi Gerganov, it enables running mo...
LLM (AI Fundamentals): A neural network with billions of parameters trained on massive text datasets to understand and generate human language....
LoRA (Fine-Tuning): A parameter-efficient fine-tuning technique that freezes the original model weights and trains small rank-decomposition ...
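
To see why LoRA is parameter-efficient, the arithmetic below compares the size of one full weight matrix with the size of its low-rank update. It assumes the usual formulation, in which a frozen matrix W of shape d_out x d_in is adapted by adding the product of B (d_out x r) and A (r x d_in). The 4096 x 4096 layer and rank 8 are illustrative choices.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full_matrix_params, lora_trainable_params) for one layer."""
    full = d_out * d_in           # parameters in the frozen weight matrix W
    lora = rank * (d_out + d_in)  # parameters in B (d_out x r) plus A (r x d_in)
    return full, lora


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # prints: 16777216 65536 0.39%
```

In this example the trainable update is well under 1% of the layer's original parameter count.
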
MCP (Agentic AI): An open protocol that standardizes how AI models connect to external tools and data sources. Created by Anthropic, it pr...
MLOps (Production): The practice of deploying, monitoring, and maintaining machine learning models in production. Combines ML engineering, D...
Model Drift (MLOps): When a deployed model's performance degrades over time because the real-world data distribution changes from what the mode...
Ollama (Local AI): An open-source tool for running LLMs locally on your machine with a single command. Supports Llama, Mistral, Qwen, and o...
PEFT (Fine-Tuning): A family of techniques that fine-tune only a small subset of model parameters instead of all weights. Includes LoRA, QLo...
Pinecone (Infrastructure): A managed vector database service for similarity search. Commonly used in RAG systems to store and retrieve document emb...
Prompt Engineering (LLM Engineering): The practice of designing and optimizing input prompts to get desired outputs from LLMs. Techniques include zero-shot, f...
PyTorch (Frameworks): An open-source deep learning framework by Meta. The most popular framework for AI research and increasingly for producti...
QLoRA (Fine-Tuning): A technique that combines 4-bit quantization with LoRA fine-tuning. Enables fine-tuning a 65B parameter model on a singl...
Quantization (Optimization): The process of reducing the precision of model weights (e.g., from 16-bit to 4-bit) to decrease model size and increase ...
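
The basic idea of quantization can be sketched with symmetric round-to-nearest quantization of a list of weights to 4-bit integers. This is a deliberately simple scheme chosen for illustration. It is not the algorithm used by AWQ, GPTQ, or GGUF, and the function names and sample weights are assumptions.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    if not weights:
        return [], 1.0
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.62, -0.31, 0.05, -0.88]  # made-up example weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q, [round(r, 3) for r in restored])
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away for the smaller representation.
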
RAG (LLM Engineering): A technique that enhances LLM responses by retrieving relevant documents from an external knowledge base and including t...
Reranking (RAG): A second-stage retrieval step that re-scores and reorders initial search results using a more powerful model. Improves R...
RLHF (Alignment): A technique for aligning LLMs with human preferences by training a reward model on human comparisons, then using reinfor...
SageMaker (Cloud): A fully managed AWS service for building, training, and deploying ML models. Provides notebooks, training jobs, model ho...
Temperature (LLM Engineering): A parameter that controls randomness in LLM outputs. Temperature 0 gives deterministic (most likely) outputs. Higher val...
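
Temperature is typically applied by dividing the model's logits by the temperature before the softmax. The sketch below assumes that formulation and treats temperature 0 as a plain argmax; the function name and the three logits are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Degenerate case: all probability mass goes to the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in cold], [round(p, 3) for p in hot])
```

At the lower temperature the top token takes a larger share of the probability mass. At the higher temperature the distribution flattens, so sampling becomes more varied.
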
Tokenization (NLP): The process of splitting text into tokens (subwords, words, or characters) that a model can process. Common tokenizers i...
Transfer Learning (ML Fundamentals): Using a model pre-trained on a large dataset as a starting point for a new task. Instead of training from scratch, you f...
Transformers (Deep Learning): A neural network architecture based on self-attention mechanisms. The foundation of all modern LLMs (GPT, BERT, Llama, M...
Vector Database (Infrastructure): A database optimized for storing and querying high-dimensional vectors (embeddings). Used in RAG systems to find semanti...
vLLM (Infrastructure): An open-source library for high-throughput LLM inference and serving. Uses PagedAttention to manage GPU memory efficient...
Zero-Shot (Prompting): Using a model to perform a task it was not explicitly trained for, without any examples. LLMs can do zero-shot classific...