Whether you’re just starting your journey in machine learning or already working on a related project, machine learning vocabulary can be confusing and difficult to comprehend. We know the struggle, and because we want to make machine learning and computer vision more accessible to individuals and businesses, we’ve put together a machine learning terminology cheat sheet that also includes common computer vision terms. Just enough to make you feel comfortable discussing ML projects, yet not so much that you’re overwhelmed with definitions. Below we'll cover:
- Basic machine learning terms you can’t do without
- Terminology of machine learning algorithms
- Computer vision key terms
- Final thoughts
Basic machine learning terms you can’t do without
Let’s start with the basics. If you’re new to machine learning and terms like deep learning and neural networks make no sense to you, this is the best place to start. The list of machine learning terms and definitions below will significantly simplify your navigation through the unknown.
- Artificial intelligence
Though there’s no official definition for artificial intelligence (AI), it is often described as the intelligence demonstrated by machines. As a vast subfield of computer science, AI aims at building smart machines capable of performing tasks that typically require human intelligence.
- Machine learning
Think of machine learning as computer code that performs statistical analysis over an enormous amount of data, looks for patterns, and uses those patterns to make better predictions about new data in the future.
- Training data
This is the collection of information that’s going to be used for model training. For example, if you’re building a face detection system, your training data will probably consist of various images with annotated faces of different shapes, sizes, lighting conditions, etc. Clear and consistent training data is the most essential part of building a well-performing AI model.
- Batch
A batch is the set of training examples used in one iteration of model training. There are three types of gradient descent algorithms based on batch size: stochastic gradient descent, batch gradient descent, and mini-batch gradient descent.
- Accuracy
Accuracy expresses how often the machine learning (ML) model is correct, as the percentage of correct predictions. Dividing the number of correct predictions by the total number of predictions gives the model's accuracy.
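To make the definition concrete, here is a minimal sketch of the accuracy calculation in plain Python. The spam/ham predictions below are made-up illustrative values, not output from a real model:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical spam-filter outputs: 4 of the 5 predictions are correct.
preds = ["spam", "ham", "spam", "ham", "ham"]
truth = ["spam", "ham", "ham",  "ham", "ham"]
print(accuracy(preds, truth))  # 0.8
```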
- Bias
Bias is stereotyping, prejudice, or preference towards particular items (data points) over others. In ML, bias is considered a systematic error that occurs in the training set or ML model when the algorithm outcome is distorted in favor of or against a certain idea. Bias impacts data collection and interpretation, system design, and users' engagement with a system.
- Noise
Mislabeled data points, misrecorded or omitted feature values are all examples of noise. Essentially, anything that interferes with a clean and consistent dataset is considered noise.
- Epoch
One full pass of your entire training dataset through the model is known as an epoch.
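The relationship between epochs and batches can be sketched in a few lines of Python. This is a toy training loop under assumed numbers (10 examples, batch size 4, 3 epochs), with the actual model update left as a placeholder:

```python
def minibatches(data, batch_size):
    """Split the dataset into consecutive batches of at most batch_size examples."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

dataset = list(range(10))  # 10 training examples
for epoch in range(3):     # 3 epochs = 3 full passes over the data
    for batch in minibatches(dataset, batch_size=4):
        pass  # one gradient-descent update per batch would happen here
# Each epoch sees batches of sizes 4, 4, and 2.
```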
- Edge cases
Edge cases represent the occurrence of the unexpected or unpredictable behavior of an AI model or cases when the model doesn’t perform well on the outliers in data.
- Deep learning
Some refer to deep learning as merely a type of algorithm, but in reality, it’s a subfield of ML that operates with artificial neural networks. Multiple layers of deep learning processing are used to extract higher-level features from data. Deep learning is especially useful for image classification and object detection tasks.
- Artificial neural network
A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.
- Activation function
Artificial neural networks operate with activation functions. These take in the weighted sum of all of the inputs from the previous layer and then generate and pass an output value to the next layer.
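As a rough sketch of the description above, here is a single artificial neuron in plain Python: a weighted sum of inputs passed through two of the most common activation functions, sigmoid and ReLU. The input values and weights are arbitrary illustrative numbers:

```python
import math

def weighted_sum(inputs, weights, bias):
    """Combine the previous layer's outputs into one pre-activation value."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """Squash the weighted sum into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged; clip negatives to zero."""
    return max(0.0, z)

z = weighted_sum([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1)  # z = 0.2
print(sigmoid(z), relu(z))  # ≈ 0.5498 and 0.2
```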
- Convolutional neural network
Convolutional neural networks are by far the most popular neural networks for computer vision and image analysis tasks due to their ability to extract features and detect patterns via hidden convolutional layers within the network.
- Recurrent neural network
Recurrent neural networks (RNNs) are particularly popular for evaluating sequences because of their ability to run multiple times, so the hidden layers learn from previous runs of the neural network on earlier parts of the sequence. In other words, instead of moving the data only forward, RNNs memorize the output of one layer and “feed” it to neurons of other layers.
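The "memorize and feed back" behavior can be illustrated with a single recurrent step in plain Python: each step mixes the current input with the hidden state produced by the previous step. The weights here are fixed, made-up numbers; in a real RNN they would be learned:

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Process a sequence; the hidden state carries memory of earlier elements.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h)
```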
Terminology of machine learning algorithms
When talking about algorithms, we first discuss the three main categories: supervised, unsupervised, and reinforcement learning. The terminology of machine learning algorithms goes much further, as one would expect. Still, for the purposes of this article, we’ve covered the necessary foundations.
- Supervised learning
Supervised learning is an approach to creating AI, where a computer algorithm is trained on input data that has been labeled for a particular output. In other words, there is a “supervisor,” e.g., data annotator, who labels the training data points for future deployment.
- Unsupervised learning
Contrary to supervised learning, unsupervised learning does not involve human-suggested labels. It discovers the underlying structure or patterns among the data points by means of finding similarities or differences in information and clustering it.
- Reinforcement learning
Reinforcement learning is a type of algorithm that continuously mines feedback from previous iterations, learns on trial and error, and is led by the action-reward principle. In games, reinforcement learning algorithms are often used to analyze historical data and discover sequences that eventually lead either to victory or defeat.
- Active learning
Active learning represents a training approach in which the algorithm interactively queries an information source to label fresh data points with the intended outputs. Rather than learning from randomly chosen examples, the active learning algorithm strategically selects the data points it learns from, which is particularly valuable when labeled examples are scarce or expensive to obtain.
To learn more about how active learning functions, check out our articles about active learning.
- Active learning for semantic segmentation
- Active learning for classification models
- Active learning for object detection and human pose estimation
- Classification
Classification is a supervised learning technique that aims to categorize the target variable. For instance, detecting whether an email is spam or not is a classification task. It is also called binary classification, since the target variable has only two possible values: spam or not spam. If the target variable contains more than two values (i.e., classes), it is known as multi-class classification.
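To illustrate the spam example, here is a deliberately simple rule-based classifier in Python. A real classifier would learn its decision rule from labeled training data; the keyword list and threshold below are hypothetical, hand-picked values:

```python
SPAM_WORDS = {"free", "winner", "prize"}  # hypothetical keyword list

def classify_email(text, threshold=2):
    """Binary classification: 'spam' if enough spam keywords appear, else 'ham'."""
    hits = sum(word in SPAM_WORDS for word in text.lower().split())
    return "spam" if hits >= threshold else "ham"

print(classify_email("You are a winner claim your free prize"))  # spam
print(classify_email("Meeting moved to 3pm"))                    # ham
```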
- Regression
Regression is a supervised learning approach with continuous target variables. In regression tasks, we evaluate the performance of machine learning algorithms based on how close the predicted values are to the actual values.
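The simplest regression model is a straight line fitted by ordinary least squares. A minimal sketch in plain Python, using toy data that lies exactly on the line y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```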
- Clustering
Clustering groups similar data points together into the same cluster. Unlike classification, the data points in clustering do not have labels, hence it’s an unsupervised learning technique.
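A classic clustering algorithm is k-means; here is a tiny one-dimensional sketch in plain Python, with made-up points forming two obvious groups and arbitrary starting centers:

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest center, then update centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.7], centers=[0.0, 10.0])
print(sorted(centers))  # two centers, roughly 1.0 and 9.07
```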
Computer vision key terms
Now that we’ve more or less covered key machine learning terms, let’s take a deeper look into computer vision vocabulary. What is computer vision? To put it simply, it’s the ability of computers to “see” things and transfer that information to other systems. Not only is computer vision a huge passion of ours, but it’s also our day-to-day business, so if you’re interested in the topic and wish to learn more about the field, our blog is definitely worth checking out.
Computer vision tasks
- Image segmentation
Image segmentation is a type of digital image processing that aims at grouping similar regions or segments of an image under appropriate class labels.
- Semantic segmentation
Semantic segmentation classifies and labels images on a pixel level. This involves detecting objects within an image and grouping them based on defined categories. As opposed to instance segmentation, all pixels under a particular class hold the same pixel value in semantic segmentation.
- Instance segmentation
Instance segmentation takes semantic segmentation one step further: in addition to classifying every pixel, it distinguishes between separate instances of the same class. Where semantic segmentation would assign all people in an image the same “person” label, instance segmentation gives each individual person a unique label.
- Panoptic segmentation
Unifying semantic and instance segmentation, panoptic segmentation provides a complete view of an image, both category-wise and instance-wise.
- Object detection
As the name suggests, object detection is the ability to find or identify specific objects in any given image.
- Image classification
Image classification means identifying which class an object belongs to. Typically, the system is trained to choose from a set of given classes.
- Optical character recognition
Also known as OCR and commonly used to recognize text in scanned documents and images, optical character recognition is the process of identifying printed characters in digital images.
- Data annotation
The first and most essential step of any computer vision project is data annotation. In order for computers to recognize and detect items of interest in digital images, we must first provide a clear and consistent way of training and learning. We do annotation, and we do it well.
- Class
A class is a label that gives information about the instance. To illustrate, when annotating images of apples, we would annotate the apples and assign each of them to the class “Apple.”
- Attribute group/Attribute
Attribute groups complement the class with single attributes that provide information on class details. Again, if the class is “Apple,” the attribute group can be “Color” with “yellow,” “green,” and “red” attributes.
- Bounding box
A bounding box is a rectangular area around an object in a digital image, typically described by the (x, y) coordinates of its corners around an area of interest.
- Polygon
A polygon is a usually non-rectangular shape outlining the object of interest, allowing more precision than a regular bounding box.
- Polyline
Polylines are essentially small lines connected at vertices to define linear structures and trace the shape of structures, like roads, rail tracks, and pipelines.
In this article, we gave a short introduction to key machine learning terminology that lies at the foundation of most tech-related talks nowadays. We hope you were able to enrich your AI vocabulary and learn new concepts that will prove useful in your future initiatives. There’s always more to learn, so we’ll leave you with this data science glossary if you need to dig deeper. Other than that, we readily invite you to save our blog for further research into machine learning or computer vision-related topics.