Glossary
A
Accuracy
Accuracy is the fraction of a model’s predictions that are correct. It is used to evaluate the performance of a model by comparing its predicted output with the actual output. The accuracy of a model is usually expressed as a percentage, with higher values indicating a more accurate model.
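As a minimal sketch of the comparison described above, accuracy can be computed by counting matching predictions. The function name and the sample labels are illustrative assumptions, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]
print(f"{accuracy(labels, predictions):.0%}")  # prints 80%
```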
Algorithm
An algorithm is a set of steps or instructions used to solve a specific problem or accomplish a task. Algorithms are used to train models, process data, and make predictions.
Annotation
Annotation refers to the process of adding information to data in order to make it usable for machine learning algorithms.
Application Programming Interface (API)
APIs are used to provide access to pre-trained models or services, allowing developers to integrate AI capabilities into their own applications without having to build the models from scratch.
Artificial Intelligence
Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems.
Autoencoder
An autoencoder is a type of neural network used for unsupervised learning, typically for dimensionality reduction or feature learning. An autoencoder consists of two parts: an encoder that maps the input data to a lower-dimensional representation, and a decoder that maps the lower-dimensional representation back to the original data space.
Automated Speech Recognition
Automated speech recognition, also known as speech-to-text, is a technology that allows computers to transcribe spoken language into text. This technology has numerous applications, including voice-controlled devices, speech-to-text dictation, and call center services.
B
Batch
A batch refers to a set of data instances that are processed together during training or evaluation. Batching is used to optimize the processing time and memory usage of machine learning algorithms, as processing large amounts of data at once can be computationally expensive.
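One simple way to picture batching is to split a dataset into fixed-size chunks, with a smaller final chunk if the data does not divide evenly. The helper below is an illustrative sketch; its name and the batch size are assumptions.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Ten hypothetical data instances split into batches of four.
dataset = list(range(10))
batches = list(iter_batches(dataset, 4))
print(batches)  # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```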
Bayes’ Theorem
Bayes’ theorem is a mathematical formula used in probability and statistics to calculate the probability of an event based on prior knowledge or information. It is used in Bayesian methods to incorporate prior knowledge into the model and make predictions based on uncertain or incomplete information.
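A worked example may help. The sketch below applies the theorem, P(A|B) = P(B|A) P(A) / P(B), to a hypothetical diagnostic test. The prior, sensitivity, and false positive rate are made-up numbers chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of observing B, over both A and not A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have a condition, the test detects it
# 95% of the time, and it wrongly flags 5% of people without the condition.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # prints 0.161
```

Even with a fairly reliable test, the low prior keeps the posterior probability well below the test’s sensitivity.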
Bias (Inductive Bias, Confirmation Bias)
Bias refers to a systematic error or deviation from the truth in a model’s predictions. Inductive bias refers to the assumptions that a machine learning model makes about the underlying structure of the data, while confirmation bias refers to the tendency of a model to favor outcomes that confirm its existing beliefs.
Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning that refers to the tension between two sources of error: bias, the error from overly simple assumptions that keep a model from fitting the training data, and variance, the error from sensitivity to the particular training set that keeps a model from generalizing to new, unseen data. High-bias models tend to underfit the data and have poor predictive performance, while high-variance models tend to overfit the data and have poor generalization performance. The goal of model selection is to find a balance between bias and variance that results in good predictive performance.
Boosting
Boosting is an ensemble learning method used to improve the performance of machine learning models. It works by combining multiple weak models into a single strong model, where each weak model focuses on correcting the mistakes of the previous model.
Bounding Box
A bounding box is a rectangular box that surrounds an object in an image. Bounding boxes are used to annotate objects in images and are used as input to object detection and segmentation models.
C
Chatbot
A chatbot is a computer program designed to simulate conversation with human users, typically through messaging applications, websites, or mobile apps. Chatbots use natural language processing and machine learning algorithms to understand and respond to user inputs.
Classification
Classification is a type of machine learning task where the goal is to predict the class or category of an input data instance based on its features.
Clustering
Clustering is a type of machine learning task where the goal is to group similar data instances together into clusters. Clustering algorithms do not use labeled data, and the goal is to find structure in the data based on similarity or distance measures.
Cold-Start
Cold-start refers to the scenario where a machine learning model has limited or no prior information about a new user, item, or context.
Collaborative Filtering
Collaborative filtering is a recommendation system that uses the preferences of users to make recommendations to other users. Collaborative filtering algorithms work by finding patterns in user-item interactions and making recommendations based on similar users or items.
Computer Vision
Computer vision is a field of artificial intelligence that deals with the simulation of visual perception and recognition by computers.
Contributor
Contributors are individuals who provide code, bug reports, or other work to the project.
Central Processing Unit (CPU)
A central processing unit (CPU) is the primary component of a computer that performs most of the processing. CPUs are used to perform the computationally intensive tasks involved in training and evaluating machine learning models.
D
Data
Data refers to information that is collected, stored, and processed for a specific purpose. It can be structured, such as tables in a database, or unstructured, such as text or images.
Decision Tree
A decision tree is a type of machine learning model used for both regression and classification tasks. Decision trees are composed of nodes and branches, where each node represents a decision or test on the input data, and each branch represents the outcome of the decision.
Deep Learning
Deep learning is a subfield of machine learning that focuses on the development of deep neural networks, which are composed of multiple layers of interconnected nodes.
E
Embedding
Embeddings are used to represent discrete objects, such as words or images, as dense vectors in a lower-dimensional space.
Ensemble Methods
Ensemble methods are a type of machine learning technique that combine multiple individual models to produce a single, more accurate model. Ensemble methods can be used for both classification and regression tasks, and the goal is to improve the performance of the model by combining the strengths of multiple models.
Entropy
Entropy is a measure of the impurity or uncertainty in a set of class labels. It is used in decision tree algorithms as a criterion for splitting the data into smaller groups based on their class labels. The goal is to split the data in a way that maximizes the reduction in entropy, resulting in purer and better-separated groups.
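The splitting criterion can be sketched with Shannon entropy, H = -Σ p_i log2(p_i), and the reduction in entropy achieved by a split (often called information gain). This is a minimal illustration using only the standard library; the function names and sample labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum the negated terms so a pure group yields 0.0 rather than -0.0.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Entropy of `parent` minus the size-weighted entropy of `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical labels: an evenly mixed group split into two pure groups.
mixed = ["a", "a", "b", "b"]
print(entropy(mixed))                                   # prints 1.0
print(information_gain(mixed, [["a", "a"], ["b", "b"]]))  # prints 1.0
```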
F
Feature
Features are the input variables used to make predictions.
Feature Learning
Feature learning is a type of machine learning task where the goal is to automatically learn a compact and meaningful representation of the input data, rather than using hand-designed features.
False Positive
A false positive is a prediction that an instance belongs to a certain class when it actually does not. False positives are also known as type I errors.
False Negative
A false negative is a prediction that an instance does not belong to a certain class when it actually does. False negatives are also known as type II errors.
F-Score
The F-score, also known as the F1-score, is a measure of the performance of a binary classification model. The F-score is the harmonic mean of the precision and recall of the model, where precision is the number of true positive predictions divided by the sum of true positive and false positive predictions, and recall is the number of true positive predictions divided by the sum of true positive and false negative predictions.
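The definitions above translate directly into code. The sketch below computes precision, recall, and the F-score from counts of true positives, false positives, and false negatives. Returning 0.0 when a denominator is zero is a convention assumed here, and the counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```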
G
Garbage In, Garbage Out
The phrase “garbage in, garbage out” refers to the idea that if the data used to train a machine learning model is of poor quality, then the predictions made by the model will also be of poor quality.
Genetic Algorithm
A genetic algorithm is a type of optimization algorithm that is inspired by the principles of natural selection and genetics. Genetic algorithms are used to solve optimization problems, such as finding the global minimum or maximum of a function. They work by generating a population of candidate solutions, evaluating their fitness, and iteratively recombining and mutating the solutions to produce new generations of solutions.
Generative Adversarial Networks (GANs)
Generative adversarial networks (GANs) are a type of machine learning model used for generative tasks, such as synthesizing new data instances that are similar to a training set. A GAN consists of two networks trained against each other: a generator that produces candidate data, and a discriminator that tries to distinguish the generated data from real data. The goal of training a GAN is to find a balance between the generator and discriminator, where the generator is able to synthesize high-quality data that is indistinguishable from the real data.
Graphics Processing Unit (GPU)
A graphics processing unit (GPU) is a specialized processor designed for handling graphics and video processing. GPUs can perform many calculations in parallel and are well-suited for matrix operations.
Ground Truth
Ground truth refers to the correct or true value or label of a data instance. It is used to evaluate the performance of a model by comparing its predictions to the actual or true values. Ground truth is also used to label and annotate data, such as in image or speech recognition tasks, to provide a reference for the model to learn from.
H
Hybrid AI
Hybrid AI refers to the combination of multiple AI techniques to solve a single problem. Hybrid AI systems often combine traditional rule-based systems, statistical methods, and machine learning algorithms to achieve a desired outcome.
Hyperparameter
A hyperparameter is a parameter that is set prior to training a machine learning model and is not learned from the data. Hyperparameters control the behavior of the model, such as its complexity, the size of its hidden layers, or the learning rate.
I
ImageNet
ImageNet is a large dataset of labeled images that is widely used for training and evaluating computer vision models. ImageNet contains over 14 million images and over 20,000 categories and has become a benchmark for image recognition and object detection tasks.
Image Recognition
Image recognition is a computer vision task that involves recognizing objects, people, scenes, or other elements in an image. Image recognition models use machine learning algorithms to learn from a large dataset of labeled images and make predictions based on the visual content of an image.
Inference
Inference refers to the process of making predictions or decisions based on a trained machine learning model. Inference involves applying the model to new, unseen data and using the model’s learned relationships between the input features and the target outputs to make predictions.
Insight Engines
Insight engines are systems that are designed to provide insights and recommendations based on large amounts of data. Insight engines use machine learning algorithms and natural language processing techniques to analyze data and provide actionable insights.
J
K
Knowledge Graph
A knowledge graph is a data structure that represents entities and their relationships in a graph format. Knowledge graphs are used to represent and organize knowledge.
Knowledge Model
A knowledge model is a type of artificial intelligence model that is designed to represent and reason about knowledge. Knowledge models can be used for a variety of tasks, including question answering, recommendation systems, and natural language processing. Knowledge models can be represented in a variety of formats, including knowledge graphs, semantic networks, and probabilistic graphical models.
L
Language Data
Language data refers to the text data used for natural language processing tasks, such as text classification, machine translation, and question answering.
Layer (Hidden Layer)
A layer refers to a set of interconnected nodes that perform a specific computation. Neural networks are composed of multiple layers, where the input layer receives the input data and the output layer produces the predictions. Hidden layers are the layers between the input and output layers, and are used to extract complex, high-level representations of the input data.
Learning Rate
The learning rate is a hyperparameter that controls the rate at which a machine learning model learns from the training data. The learning rate determines the step size of the updates to the model’s parameters, and affects the speed and stability of the learning process.
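The effect of the step size can be seen in a toy gradient descent on f(x) = x², whose gradient is 2x. This is an illustrative sketch only; the function name, starting point, and learning rates are assumptions chosen to show one stable and one unstable setting.

```python
def descend(learning_rate, steps=50, start=1.0):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x**2
        x = x - learning_rate * gradient  # step against the gradient
    return x


# A small learning rate shrinks x toward the minimum at 0.
print(abs(descend(0.1)))
# A learning rate that is too large overshoots further on every step.
print(abs(descend(1.1)))
```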
Linked Data
Linked data refers to a set of best practices for publishing and interlinking data on the Web. It involves representing data as a set of interlinked entities, where each entity is identified by a unique URI and can be linked to other entities to represent relationships.
Logit Function
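The logit function maps a probability between 0 and 1 to a real-valued number, the log-odds: logit(p) = log(p / (1 - p)). Its inverse, the logistic (sigmoid) function, maps a real-valued input to the range of 0 to 1, which can be interpreted as a probability. The logit function is used in logistic regression models to model the relationship between the input features and the target binary output.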
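As a minimal sketch, the logit (the log-odds of a probability) and its inverse, the logistic or sigmoid function, can be written with the standard library alone. The function names are illustrative.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the log-odds on the real line."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def sigmoid(x):
    """Inverse of the logit: map a real number to a probability in (0, 1)."""
    # Branch on the sign of x to avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


print(logit(0.5))           # prints 0.0
print(sigmoid(0.0))         # prints 0.5
print(sigmoid(logit(0.9)))  # round trip, approximately 0.9
```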
M
Machine Learning
Machine learning is a type of artificial intelligence that enables computers to learn from data and make predictions or decisions without being explicitly programmed. The goal of machine learning is to develop models that can automatically improve their performance over time as they are exposed to more data.
Machine Translation
Machine translation enables computers to automatically translate text from one language to another.
Model
A model is a mathematical representation of the relationship between the input features and the target output. Models can be used for a wide range of tasks, including classification, regression, clustering, and dimensionality reduction.
N
Natural Language Generation (NLG)
Natural language generation (NLG) is a type of artificial intelligence that enables computers to automatically generate human-like text.
Natural Language Processing (NLP)
Natural language processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language. NLP involves using machine learning and computational linguistics to process and analyze human language, and includes tasks such as text classification, sentiment analysis, machine translation, and named entity recognition.
Natural Language Understanding (NLU)
Natural language understanding (NLU) is a type of artificial intelligence that involves automatically extracting meaning and context from natural language inputs. NLU models use a combination of machine learning algorithms and computational linguistics to analyze text and identify its meaning, relationships, and context.
Neuron
A neuron is a computational unit that represents a single node in a neural network. Neurons receive input from other neurons, perform a computation, and pass the result to other neurons. Neural networks are composed of many neurons connected in a graph-like structure.
O
Optical Character Recognition
Optical character recognition (OCR) is a type of computer vision task that involves recognizing and converting printed or handwritten text into a machine-readable format.
Optimization
Optimization is the process of finding the best solution to a problem by adjusting the parameters of a model or algorithm.
Overfitting
Overfitting occurs when a machine learning model is too complex and has learned the noise or random fluctuations in the training data, rather than the underlying relationships. Overfitting results in a model that performs well on the training data but poorly on new, unseen data.
P
Pattern Recognition
Pattern recognition is a type of machine learning task that involves recognizing patterns and regularities in data. Pattern recognition models use machine learning algorithms to learn from labeled data and make predictions based on the input data.
Pooling
Pooling is a type of operation used in convolutional neural networks (CNNs) to reduce the spatial dimensionality of the feature maps generated by convolutional layers. It performs a down-sampling operation by taking the maximum, average, or other summary statistics of the values in a small window of the feature map.
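The down-sampling step can be illustrated with non-overlapping max pooling over a small feature map stored as a list of lists. This is a sketch under simplifying assumptions (stride equal to the window size, leftover rows and columns dropped); the function name and input values are hypothetical.

```python
def max_pool_2d(feature_map, size=2):
    """Apply non-overlapping max pooling with a size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# A hypothetical 4 x 4 feature map reduced to 2 x 2.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 8],
]
print(max_pool_2d(feature_map))  # prints [[6, 4], [7, 9]]
```

Replacing `max` with an average over the window would give average pooling instead.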
Precision
Precision measures the fraction of positive predictions that are actually correct. Precision is also known as the positive predictive value, and is defined as the number of true positive predictions divided by the total number of positive predictions.
Prediction
Prediction refers to the process of using a trained model to make predictions on new, unseen data.
Preprocessing
Preprocessing refers to the steps taken to prepare data for analysis or modeling.
Q
Quantum Computing
Quantum computing is a type of computing that uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Unlike classical computing, which uses binary bits to represent data and perform computations, quantum computing uses quantum bits or qubits.
R
Random Forest
Random forest is an ensemble learning method that combines multiple decision trees to make predictions. In a random forest, a set of decision trees are trained on random subsets of the training data and their predictions are combined through voting or averaging. The use of multiple trees helps to reduce overfitting and improve the stability of the model.
Recall
Recall measures the fraction of positive instances that were correctly predicted. Recall is also known as the true positive rate or sensitivity, and is defined as the number of true positive predictions divided by the total number of actual positive instances.
Recurrent Neural Networks
Recurrent neural networks (RNNs) are a type of artificial neural network designed to handle sequential data. RNNs have a hidden state that is updated at each time step, allowing the network to maintain information about the previous inputs and make predictions based on the entire sequence.
Regression
Regression is a type of machine learning task that involves predicting a continuous output value based on a set of input features. Regression models use machine learning algorithms to learn the relationship between the input features and the target output.
S
Search Query
A search query is a request made to a search engine by a user to find information on the internet.
Sentiment Analysis
Sentiment analysis is a type of natural language processing (NLP) task that involves determining the sentiment or emotion expressed in a piece of text.
Statistical Distribution
A statistical distribution is a function that describes the likelihood of different values of a random variable. Statistical distributions are used in statistics to model the distribution of data and can be used to make predictions and inferences about the underlying data.
Strong AI
Strong AI, also known as artificial general intelligence (AGI), is a type of artificial intelligence that can perform any intellectual task that a human can. Strong AI systems have the ability to learn and reason, and can perform tasks that go beyond simple rule-based systems or narrow domain-specific applications.
Supervised Learning
Supervised learning is a type of machine learning task that involves learning from labeled training data to make predictions on new, unseen data. In supervised learning, the model is trained on a labeled dataset, where the target outputs are known, and the goal is to learn the relationships between the input features and the target outputs.
Support Vector Machines (SVM)
Support vector machines (SVMs) are a type of machine learning algorithm used for classification and regression. SVMs find the hyperplane that best separates the data into different classes or predicts the target output.
Synthetic Data
Synthetic data is data that is artificially generated to represent real-world data. Synthetic data can be used to train machine learning models when real-world data is scarce or difficult to obtain, and can be used to test and evaluate the performance of models in a controlled environment.
T
Taxonomy
Taxonomies are used to organize and categorize data, such as text documents, images, or knowledge graphs. Taxonomies can be used to improve the accuracy and efficiency of information retrieval and search systems.
TensorFlow
TensorFlow is an open-source software library for machine learning and deep learning. It provides a flexible platform for building and training machine learning models, and includes a variety of tools and libraries for implementing complex neural network architectures.
Test Data
Test data is a set of data used to evaluate the performance of a machine learning model. Test data is typically separate from the training data, and is used to estimate the generalization error of the model and assess its ability to make predictions on new, unseen data.
Training Data
Training data is a set of data used to train a machine learning model. Training data consists of input features and corresponding target outputs, and is used by the model to learn the relationships between the input features and the target outputs.
Transfer Learning
Transfer learning is a technique in machine learning where a model trained on one task is fine-tuned or adapted to perform another related task. Transfer learning can be used to improve the performance of a model when there is limited training data available, by leveraging the knowledge learned from a related task.
Turing Test
The Turing Test is a measure of a machine’s ability to exhibit intelligent behavior that is indistinguishable from a human.
U
Uncertainty
Uncertainty refers to the lack of certainty in the predictions made by a machine learning model, or the lack of certainty in the underlying data used to train the model.
Unstructured Data
Unstructured data is a type of data that does not have a pre-defined format or structure, and does not fit neatly into traditional databases or data structures.
Unsupervised Learning
Unsupervised learning is a type of machine learning task that involves learning patterns and relationships in data without labeled target outputs. In unsupervised learning, the model is trained on a dataset without any prior knowledge of the target outputs, and the goal is to discover patterns and structure in the data.
V
Validation
Validation is the process of evaluating the performance of a machine learning model on a separate dataset that has not been used for training. Validation is used to estimate the generalization error of the model and to assess its ability to make predictions on new, unseen data.
Variance
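Variance is a measure of how much the predictions made by a model change when the model is trained on different samples of the training data. High variance is associated with overfitting.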
Variation
Variation refers to the differences in the input features or target outputs in the training data.
Visual Recognition
Visual recognition is a type of artificial intelligence task that involves recognizing objects, scenes, and people in images or videos. Visual recognition models use computer vision algorithms to analyze images and identify objects, scenes, and people based on their visual characteristics.
W
Weak AI
Weak AI, also known as narrow AI or artificial narrow intelligence (ANI), is a type of artificial intelligence that is designed to perform a specific task or set of tasks. Weak AI systems are limited in their scope and ability, and cannot perform the full range of intellectual tasks that a human can.
Web Crawler
A web crawler is a type of software program that is designed to automatically explore the internet and collect information.
Web Scraper
A web scraper is a type of software program that is designed to extract data from websites. Web scrapers are used to collect information from websites that do not have APIs or other means of accessing the data programmatically.