AI Glossary

A

Agent (AI)

Also: autonomous agent, AI agent, intelligent agent

An AI agent is an autonomous system that perceives its environment, maintains internal state, and takes actions to achieve specific goals through a perceive-plan-act-observe cycle. Agents can use tools, interact with external systems, and adapt their behavior based on feedback and changing conditions.
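
The perceive-plan-act-observe cycle can be sketched with a toy example. Everything here (the `ThermostatAgent` class, the dictionary-based environment, the 0.5-degree tolerance) is a hypothetical illustration, not a reference to any agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Hypothetical agent that nudges a room temperature toward a target."""
    target: float
    history: list = field(default_factory=list)  # internal state

    def perceive(self, environment: dict) -> float:
        return environment["temperature"]

    def plan(self, temperature: float) -> str:
        if temperature < self.target - 0.5:
            return "heat"
        if temperature > self.target + 0.5:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # The "tool" here is a direct change to the toy environment.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += delta

    def observe(self, action: str, environment: dict) -> None:
        # Record the outcome so later decisions could use it.
        self.history.append((action, environment["temperature"]))


def run(agent: ThermostatAgent, environment: dict, steps: int) -> float:
    for _ in range(steps):
        reading = agent.perceive(environment)
        action = agent.plan(reading)
        agent.act(action, environment)
        agent.observe(action, environment)
    return environment["temperature"]


room = {"temperature": 17.0}
agent = ThermostatAgent(target=20.0)
final_temperature = run(agent, room, steps=5)
print(final_temperature, [action for action, _ in agent.history])
```

The agent heats for three steps, then idles once the reading is within tolerance of the target.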

Artificial Intelligence

Also: AI, machine intelligence

Artificial Intelligence is the field concerned with building machines that perform tasks normally associated with human intelligence, such as perception, reasoning, learning, and decision-making. AI systems perceive their environment, reason about information, and adapt their behavior to achieve specific goals across diverse domains.

Attention Mechanism

Also: attention, self-attention, multi-head attention, scaled dot-product attention

The attention mechanism is a neural network component that allows models to focus on relevant parts of input sequences when making predictions. Originally introduced to relieve the fixed-length bottleneck of RNN encoder-decoder models, attention lets a model directly access any input position. It has become the foundation of transformer architectures and of models like BERT and GPT.
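
As a rough illustration, scaled dot-product attention can be written in a few lines of plain Python. This is a minimal single-head sketch without masking or learned projections; the example vectors are made up.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-width vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]  # points along the first axis

outputs, weights = scaled_dot_product_attention(queries, keys, values)
print(weights[0])
print(outputs[0])
```

The query aligns with the first and third keys, so those positions receive most of the weight regardless of where they sit in the sequence.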

B

Backpropagation

Also: backprop, error backpropagation, backward propagation of errors

Backpropagation is the fundamental algorithm for training neural networks, using the chain rule of calculus to efficiently compute gradients of the loss function with respect to network parameters. It propagates error information backward through layers, enabling optimization algorithms to adjust weights and biases to minimize prediction errors.
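
A small worked example may help. The sketch below assumes a hypothetical two-weight network, y = w2 * tanh(w1 * x), with a squared-error loss, applies the chain rule by hand, and checks the result against finite differences.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y


def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Propagate the error backward, one chain-rule factor per layer."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target             # derivative of the loss w.r.t. the output
    dL_dw2 = dL_dy * h             # output layer weight
    dL_dh = dL_dy * w2             # error passed back to the hidden unit
    dh_dz = 1.0 - h ** 2           # derivative of tanh
    dL_dw1 = dL_dh * dh_dz * x     # hidden layer weight
    return dL_dw1, dL_dw2


def numerical_gradient(f, value, eps=1e-6):
    return (f(value + eps) - f(value - eps)) / (2 * eps)


w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, target)
check_w1 = numerical_gradient(lambda w: loss(w, w2, x, target), w1)
check_w2 = numerical_gradient(lambda w: loss(w1, w, x, target), w2)
print(grad_w1, check_w1)
print(grad_w2, check_w2)
```

The analytic and numerical gradients should agree to several decimal places. Real frameworks automate the same bookkeeping across millions of parameters.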

BERT (Bidirectional Encoder Representations from Transformers)

Also: BERT, bidirectional encoder, BERT model, masked language model

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model developed by Google that advanced natural language understanding by using bidirectional context. Unlike autoregressive models like GPT, which condition only on preceding tokens, BERT attends to the tokens on both the left and the right of each position, making it effective for understanding tasks like question answering, sentiment analysis, and text classification.

C

Convolutional Neural Network

Also: CNN, ConvNet, convolutional network

A Convolutional Neural Network is a specialized deep learning architecture designed for processing grid-like data such as images. It uses convolutional layers with learnable filters to detect local features, pooling layers to reduce dimensionality, and hierarchical feature extraction to achieve state-of-the-art performance in computer vision tasks.
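
The two core operations can be sketched in plain Python. In a real CNN the filter values are learned during training; here a vertical-edge filter is set by hand purely for illustration, and the 5x5 image is made up.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Dark region on the left, bright region on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

feature_map = conv2d(image, kernel)
pooled = max_pool2d(feature_map)
print(feature_map)
print(pooled)
```

The feature map is nonzero only where the edge sits, and pooling halves each spatial dimension while keeping that response.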

D

Deep Learning

Also: deep neural networks, deep nets, hierarchical learning

Deep learning is a subset of machine learning that uses neural networks with multiple hidden layers to automatically learn hierarchical representations of data. It has reshaped AI by matching or approaching human-level performance on specific benchmarks in image recognition, natural language processing, and other complex tasks, largely through end-to-end learning from raw data.

E

Embedding

Also: embeddings, vector embedding, word embedding, neural embedding

An embedding is a dense vector representation that captures semantic meaning of discrete objects like words, sentences, or images in a continuous numerical space. Embeddings enable machine learning models to process symbolic data by mapping similar concepts to nearby points in high-dimensional vector space, forming the foundation for modern NLP, recommendation systems, and similarity search applications.
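
The "nearby points" idea is usually measured with cosine similarity. The three-dimensional vectors below are invented for illustration; real embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, not produced by any real model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


print(nearest("cat"))
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```

Similarity search systems apply the same comparison at scale, usually with an approximate nearest-neighbor index instead of a full scan.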

F

Fine-tuning

Also: model fine-tuning, transfer learning, supervised fine-tuning

Fine-tuning is the process of adapting a pre-trained machine learning model to specific tasks or domains by continuing training on task-specific data. This transfer learning approach leverages the general knowledge learned during pre-training while specializing the model for particular applications, achieving better performance with less data and computational resources than training from scratch.

G

GAN (Generative Adversarial Network)

Also: GAN, generative adversarial network, adversarial network, generative model

A Generative Adversarial Network (GAN) is a machine learning architecture consisting of two neural networks, a generator and a discriminator, that compete against each other in a game-theoretic framework. The generator learns to create realistic synthetic data while the discriminator learns to distinguish between real and generated data, so that each network's improvement pushes the other to improve. GANs have been applied across domains like images, text, and audio.
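
The competition is commonly written as a minimax objective. The notation below follows the usual presentation of the original GAN formulation and is not defined elsewhere in this glossary: D(x) is the discriminator's estimate that x is real, G(z) is a sample generated from noise z, and p_data and p_z are the data and noise distributions.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by scoring real data high and generated data low; the generator minimizes V by producing samples that the discriminator scores as real.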

GPT (Generative Pre-trained Transformer)

Also: GPT, generative pre-trained transformer, GPT model, OpenAI GPT

GPT (Generative Pre-trained Transformer) is a series of large language models developed by OpenAI that use transformer architecture for autoregressive text generation. Starting with GPT-1 in 2018, the series evolved through GPT-2, GPT-3, and GPT-4, demonstrating how scaling model size and training data leads to emergent capabilities in language understanding, reasoning, and code generation.

Gradient Descent

Also: gradient descent optimization, steepest descent, gradient-based optimization

Gradient descent is a fundamental optimization algorithm used to minimize loss functions in machine learning by iteratively adjusting parameters in the direction of steepest decrease, that is, opposite to the gradient. It forms the backbone of neural network training and most machine learning optimization. The step size is set by a learning rate, and for non-convex losses the updates converge toward a local minimum rather than a guaranteed global one.
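
A minimal sketch of the update rule, assuming a one-dimensional loss f(w) = (w - 3)^2 chosen only because its minimum at w = 3 is easy to verify:

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3).
minimum = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(minimum)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step. A learning rate above 1.0 would make the same loop diverge on this function.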

L

Large Language Model (LLM)

Also: LLM, large language models, foundation model, generative language model

A Large Language Model (LLM) is a neural network trained on vast amounts of text data to understand and generate human language. Modern LLMs like GPT-4, Claude, and Gemini use transformer architectures with billions of parameters to perform diverse language tasks including text generation, question answering, code writing, and reasoning through in-context learning and emergent abilities.

Long Short-Term Memory

Also: LSTM, LSTM network, long short-term memory network

Long Short-Term Memory is a specialized recurrent neural network architecture designed to mitigate the vanishing gradient problem in traditional RNNs. It uses input, forget, and output gates together with a separate cell state to selectively remember and forget information over long sequences, making it effective for tasks requiring long-term temporal dependencies.
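
The gating idea can be sketched with a scalar LSTM cell. The parameter names (`w_f`, `u_f`, `b_f`, and so on) and the hand-picked bias values are assumptions for illustration; a trained network would learn them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell with parameters in dict p."""
    def pre(gate):
        return p["w_" + gate] * x + p["u_" + gate] * h_prev + p["b_" + gate]

    forget = sigmoid(pre("f"))       # how much old cell state to keep
    write = sigmoid(pre("i"))        # how much new information to store
    output = sigmoid(pre("o"))       # how much cell state to expose
    candidate = math.tanh(pre("g"))  # proposed new content
    c = forget * c_prev + write * candidate
    h = output * math.tanh(c)
    return h, c


def make_params(forget_bias):
    params = {f"{kind}_{gate}": 0.0 for kind in ("w", "u", "b") for gate in "fiog"}
    params["b_f"] = forget_bias
    params["b_i"] = -10.0  # keep the input gate almost closed
    return params


def final_cell_state(params, steps=50, c0=1.0):
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, params)
    return c


kept = final_cell_state(make_params(forget_bias=10.0))
dropped = final_cell_state(make_params(forget_bias=-10.0))
print(kept, dropped)
```

With the forget gate held open, the initial cell state survives 50 steps almost unchanged; with it closed, the state is erased after a step or two.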

M

Machine Learning

Also: ML, statistical learning, automated learning

Machine Learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed for every task. It uses algorithms to identify patterns, make predictions, and improve performance through experience, forming the foundation for most modern AI applications.

N

Neural Network

Also: artificial neural network, ANN, neural net, connectionist model

A neural network is a computational model inspired by biological neural networks, consisting of interconnected nodes (neurons) that process and transmit information. These networks learn patterns from data by adjusting connection weights through training, forming the foundation of modern deep learning and AI systems.

O

Overfitting

Also: model overfitting, overtraining, high variance

Overfitting occurs when a machine learning model learns the training data too well, memorizing specific examples rather than generalizing patterns. This results in excellent performance on training data but poor performance on new, unseen data, indicating the model has failed to capture the underlying relationships that enable good generalization.

P

Prompt Engineering

Also: prompting, prompt design, prompt optimization

Prompt engineering is the practice of designing and optimizing text prompts to effectively communicate with large language models and guide them toward desired outputs. This discipline combines understanding of model behavior, task specification, and iterative refinement to achieve better performance without model training, using techniques like few-shot learning, chain-of-thought reasoning, and structured prompting formats.
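
As one example of a structured prompting format, a few-shot prompt can be assembled from labeled examples. The `Review:` / `Sentiment:` layout and the sample reviews are illustrative choices, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # End with an open slot for the model to complete.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery lasts all day.", "positive"),
        ("It broke after a week.", "negative"),
    ],
    query="Setup was quick and painless.",
)
print(prompt)
```

Iterative refinement then amounts to changing the instruction or the examples and comparing the model's outputs on a held-out set of queries.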

R

RAG (Retrieval-Augmented Generation)

Also: RAG, retrieval-augmented generation, retrieval augmented generation, RAG system

Retrieval-Augmented Generation (RAG) is a hybrid AI approach that combines large language models with external knowledge retrieval systems. By first retrieving relevant documents from a knowledge base and then using that context to generate responses, RAG systems can provide more accurate, up-to-date, and factually grounded answers while reducing hallucinations and enabling access to information beyond the model's training data.
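
The retrieve-then-generate flow can be sketched without any model at all. The version below ranks documents by simple word overlap, a stand-in for the embedding-based retrieval most systems use, and stops at building the prompt; the final call to a language model is deliberately left out. Function names and the prompt wording are hypothetical.

```python
def tokenize(text):
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(question, documents, top_k=1):
    """Rank documents by the number of words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


knowledge_base = [
    "The Transformer architecture was introduced in 2017.",
    "GPT-1 was released by OpenAI in 2018.",
    "Pooling layers reduce the dimensionality of feature maps.",
]

question = "When was GPT-1 released?"
retrieved = retrieve(question, knowledge_base)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Because the answer is supplied in the prompt, the model can ground its response in the retrieved text instead of relying only on what it memorized during training.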

Recurrent Neural Network

Also: RNN, recurrent network, sequential neural network

A Recurrent Neural Network is a type of neural network designed for processing sequential data by maintaining internal memory through recurrent connections. RNNs can handle variable-length sequences and capture temporal dependencies, making them well suited to tasks like natural language processing, speech recognition, and time series analysis.
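
A scalar sketch of the recurrence, with hand-picked weights standing in for learned ones, shows how the hidden state carries information forward:

```python
import math


def rnn_forward(inputs, w_input, w_hidden, bias, h0=0.0):
    """Apply h_t = tanh(w_input * x_t + w_hidden * h_{t-1} + bias) over a sequence."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_input * x + w_hidden * h + bias)
        states.append(h)
    return states


weights = dict(w_input=1.0, w_hidden=0.5, bias=0.0)

short_states = rnn_forward([1.0, 0.0], **weights)
long_states = rnn_forward([1.0, 0.0, 0.0, 0.0], **weights)
silent_states = rnn_forward([0.0, 0.0, 0.0, 0.0], **weights)
print(short_states)
print(long_states)
print(silent_states)
```

The same weights process sequences of any length, and the initial input of 1.0 still influences the final state three steps later, though its effect fades with each step.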

Reinforcement Learning

Also: RL, sequential decision making, reward-based learning

Reinforcement learning is a machine learning paradigm where agents learn optimal behaviors through trial-and-error interactions with an environment, receiving rewards or penalties for their actions. The agent discovers strategies to maximize cumulative rewards over time without explicit supervision, making it ideal for sequential decision-making problems.
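
Tabular Q-learning is one common way to implement this loop. The sketch below assumes a made-up four-state corridor where the agent starts at state 0 and earns a reward of 1 for reaching state 3. For simplicity it explores with uniformly random actions instead of an epsilon-greedy policy.

```python
import random

N_STATES = 4
GOAL = 3
ACTIONS = ("left", "right")


def step(state, action):
    """Toy environment: move along the corridor, reward 1 on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # trial and error
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

No one tells the agent that moving right is correct; the preference emerges from the rewards it collects, discounted by how many steps away the goal is.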

S

Supervised Learning

Also: supervised machine learning, predictive modeling, labeled learning

Supervised learning is a machine learning approach where algorithms learn from labeled training data to make predictions on new, unseen data. The system learns to map inputs to correct outputs using examples, enabling tasks like classification and regression through pattern recognition in labeled datasets.
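
A one-nearest-neighbor classifier is about the smallest possible example of learning from labeled data. The feature values and the "small"/"large" labels below are invented for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_predict(training_data, point):
    """Return the label of the training example closest to `point`."""
    _, label = min(training_data, key=lambda example: squared_distance(example[0], point))
    return label


# Labeled examples: (features, label).
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

print(nearest_neighbor_predict(training_data, (2.0, 1.5)))
print(nearest_neighbor_predict(training_data, (8.5, 8.0)))
```

Both query points are unseen during "training", yet each receives the label of the region it falls in. Swapping the string labels for numeric targets would turn the same idea into a crude regressor.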

T

Tokenization

Also: tokenizer, text tokenization, subword tokenization

Tokenization is the process of breaking down text into smaller units called tokens (words, subwords, or characters) that can be processed by machine learning models. Modern tokenization methods like Byte-Pair Encoding (BPE) and SentencePiece enable language models to handle diverse vocabularies efficiently while managing out-of-vocabulary words and supporting multilingual text processing.
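
The training side of Byte-Pair Encoding can be sketched as repeated merging of the most frequent adjacent symbol pair. This is a simplified version (no end-of-word marker, ties broken by first occurrence) run on a made-up word-frequency table; production tokenizers add many details.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(corpus_counts, num_merges):
    # Start from individual characters.
    words = {tuple(word): freq for word, freq in corpus_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


corpus_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, words = learn_bpe(corpus_counts, num_merges=3)
print(merges)
print(words)
```

After a few merges, frequent fragments such as "est" become single tokens, while a rare or unseen word can still be spelled out from smaller pieces. That fallback is how subword tokenizers avoid out-of-vocabulary failures.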

Transformer

Also: transformer model, transformer architecture

The Transformer is a deep learning architecture introduced in 2017 that reshaped natural language processing by relying on self-attention instead of recurrence. Because it has no step-by-step recurrence, it can process all positions of a sequence in parallel, and each position can attend to relevant parts of the input regardless of distance. It forms the foundation for modern language models like GPT, BERT, and T5.

U

Unsupervised Learning

Also: unsupervised machine learning, exploratory data analysis, pattern discovery

Unsupervised learning is a machine learning approach that finds hidden patterns and structures in data without labeled examples or target outputs. It discovers relationships, groups similar data points, and reduces dimensionality to reveal insights from unlabeled datasets through techniques like clustering and dimensionality reduction.
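
Clustering can be illustrated with a one-dimensional k-means sketch. The data points and starting centroids are made up, and the fixed iteration count stands in for a proper convergence check.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its closest centroid.
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No labels are supplied, yet the algorithm separates the low values from the high ones and summarizes each group by its mean.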
