When it comes to today's global trend, artificial intelligence and machine learning, the first thing we care about is data. A machine learning model's life starts with data and ends with the deployed model, and it turns out that high-quality training data is the backbone of a well-performing model.
Through this article, we'll examine the process that carries the core responsibility for producing ready-to-train data: data annotation.
Data annotation is the action of adding meaningful and informative tags to a dataset, making it easier for machine learning algorithms to understand and process the data. Previously, data annotation was not as crucial as it is now because data scientists were mostly using structured data, which did not require many annotations. During the last 5-10 years, data annotation has become more critical for machine learning systems to work effectively.
Without it, machine learning algorithms would be lost in a sea of unstructured data, struggling to distinguish one piece of information from another. Note that unstructured data makes up a big portion of data in the world – like emails, social media posts, image and audio data, text, sensor data, etc. – thus making the role of data annotation exceptionally important. We can make a bold statement and call data annotation an ingredient in the data processing cycle one can't afford to avoid. With the growing value of AI and machine learning and the exponentially growing amounts of data in the world, data annotation has become even more essential for businesses and organizations to stay competitive.
Whether you are new to data annotation or a seasoned professional, this article will provide valuable insights into the world of data annotation and help you stay on top of its latest trends.
By the end of this article, you'll learn about
- Image annotation
- Video annotation
- Text annotation
- LiDAR annotation
- Audio annotation
- Other annotation types
Let's start our data annotation journey with one of the most widely used processes in computer vision, image annotation. Image annotation is the action of tagging digital images with metadata or any additional information that helps to identify and understand the visual content.
Breakdown of image annotation and its importance
You can refer to image annotation as the process of making an image easier to find. The best way to achieve this is by giving the image some sort of description, otherwise referred to as annotation. By giving the annotated, structured image dataset to our machine learning models, we allow them to train and deliver the desired results (this depends on the quality of the training data).
With the advent of computer vision and machine learning, image data annotation has become an essential ingredient of data annotation for many applications, including autonomous vehicles, agricultural automation systems, medical imaging, and surveillance systems.
Image annotation tools and use cases
We've touched upon the widespread usage of image data annotation across multiple industries. Now it's time to take a closer look at each specific technique and its practical applications.
Image classification (or tagging): Image classification is a fundamental data annotation process that involves assigning one or more labels to an entire image. With image classification, you aim to automatically identify the content of an image and categorize it accordingly. Say you're a farmer who wants to analyze crop health. By classifying your training data's crop images, the algorithms can detect early signs of disease or stress, enabling you to take preventative measures and increase your crop yields.
Object detection: Object detection is the action of identifying and localizing objects within an image or a video. It is often confused with image classification, but there is a distinct difference between the two. Image classification categorizes an entire image into one class, whereas object detection localizes the individual objects in an image and assigns a tag to each of them.
A popular tool for object detection is the bounding box. Bounding boxes are rectangles that surround an object of interest in an image to mark the object's location, for example locating pedestrians for autonomous vehicles or identifying people and objects in security camera footage. The technique is remarkable for its simplicity: an annotator only needs to draw a rectangle around the object, and no complex machine learning algorithm is required to create the annotation.
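To make the idea concrete, here is a minimal sketch of how a bounding box annotation might be stored and compared. The `BoundingBox` schema and the `iou` helper are our own illustrative assumptions, not the format of any particular tool. Intersection over union (IoU) is a common way to measure how closely two boxes agree, for example when two annotators label the same object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical schema)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union of two boxes, a value between 0 and 1."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    return inter / (a.area() + b.area() - inter)


# Two annotators label the same pedestrian slightly differently.
annotator_1 = BoundingBox("pedestrian", 10, 20, 110, 220)
annotator_2 = BoundingBox("pedestrian", 20, 20, 120, 220)
print(round(iou(annotator_1, annotator_2), 3))
```

A score close to 1 means the two boxes almost coincide; a low score is a hint that the annotation should be reviewed.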
Image captioning (free text description): Image captioning is the process of extracting information from images and expressing it as text. It's like making descriptive stories from images and keeping them in the form of textual annotated data. You need to give the tool images and the data annotation requirements of the deliverable, and the tool will return the images together with the captions.
Optical character recognition: Optical character recognition (OCR) is a technology that allows computers to read and recognize text from scanned images or documents. Annotation for OCR involves, for example, drawing bounding boxes around each line or block of text, which can be used to train OCR algorithms to recognize and extract the text accurately. OCR technology has revolutionized the way we interact with printed and handwritten text. How? It has enabled us to digitize and preserve historical documents, automate data entry processes, and even improve accessibility for people with visual impairments. OCR opportunities are diverse, leaving us impatient to witness its future advancement.
Pose estimation (keypoint annotation): Pose estimation is the process of estimating the 2D or 3D coordinates of a human body in a given image or video. It involves detecting and tracking key points on the body and then using the information to determine the position and orientation of the body in 3D space. Key points usually correspond to joints, such as the shoulders, elbows, wrists, hips, knees, or other body parts. Applications of pose estimation in human health are widely recognized to analyze the movement of patients with neurological disorders, such as Parkinson's disease, strokes, and many other cases. It can track a patient's movement and analyze it in real-time, allowing for objective measurements of progress over time.
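As a small illustration of what keypoint annotations make possible, the sketch below computes the angle at a joint from three annotated points. The keypoint names and coordinates are made up for the example, and `joint_angle` is a hypothetical helper rather than part of any pose estimation library.

```python
import math

# Hypothetical keypoint annotation: joint name -> (x, y) pixel coordinates.
keypoints = {
    "right_shoulder": (100.0, 100.0),
    "right_elbow": (100.0, 200.0),
    "right_wrist": (200.0, 200.0),
}


def joint_angle(a, b, c):
    """Angle at point b, in degrees, between the segments b->a and b->c.

    Assumes the three points are distinct, otherwise a segment has zero length.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


elbow_angle = joint_angle(
    keypoints["right_shoulder"],
    keypoints["right_elbow"],
    keypoints["right_wrist"],
)
print(elbow_angle)
```

Tracking such an angle frame by frame is one simple way to turn keypoint annotations into an objective measurement of movement.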
Instance segmentation (polygon annotation): Instance segmentation is a computer vision task that involves identifying and localizing each individual object instance within an image or video and assigning a unique label to each instance (polygon points). Imagine this as a more advanced form of object detection that not only identifies the box coordinates of the objects but also the exact pixel locations of the object.
The polygon is a popular tool for instance segmentation that is used to create ground truth data. It traces the outline of each object using a set of connected vertices, which define the shape and location of the object. The user typically selects a set of points along the boundary of the object using a polygon tool and then creates the annotated data. Tracing an object usually takes a lot of time, which makes it very expensive to create large polygon datasets. To decrease the annotation time of these tedious tasks, researchers created several AI-assisted algorithms that help users create pixel-precise masks with a few button clicks. SuperAnnotate also integrates state-of-the-art algorithms, providing accurate and intuitive tools for polygon annotation.
Semantic segmentation: In computer vision, semantic segmentation refers to an AI or ML model that classifies each pixel in the image into one of a set of predefined classes. Semantic annotation is the process of labeling each pixel and is used in many fields such as autonomous driving, retail, and fashion. Classifying each pixel is probably the most tedious annotation work, which makes it really hard for a data scientist to create well-performing semantic segmentation algorithms. To ease the pain of semantic annotation, companies like Segment and SuperAnnotate created the SuperPixel-based approach, which can accelerate the annotation process several times over. The SuperPixel-based semantic annotation can be seen below.
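For intuition, a polygon annotation is just an ordered list of vertices. The sketch below, using a made-up triangle, computes the area the polygon encloses with the shoelace formula and the bounding box that would be needed to cover the same object. Both helper functions are illustrative assumptions, not a specific tool's API.

```python
def polygon_area(vertices):
    """Area enclosed by a simple (non-self-intersecting) polygon, shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_to_box(vertices):
    """Tightest axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical polygon annotation: vertices in pixel coordinates.
polygon = [(0, 0), (4, 0), (0, 3)]

x_min, y_min, x_max, y_max = polygon_to_box(polygon)
box_area = (x_max - x_min) * (y_max - y_min)
print(polygon_area(polygon), box_area)
```

In this example the box covers twice the area of the polygon, which is the kind of gap that makes polygons worth the extra annotation effort when exact object boundaries matter.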
Panoptic segmentation: Panoptic Segmentation combines semantic segmentation and instance segmentation into one algorithm. As a result, to annotate an image for panoptic segmentation one needs to use both techniques for semantic annotation and polygon annotation.
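The relationship between the three segmentation types can be sketched with two tiny masks. The class IDs, the instance IDs, and the convention of using instance 0 for regions without separate objects are assumptions made for this example only.

```python
# Hypothetical class IDs for a 2x4 image.
CLASS_NAMES = {1: "road", 2: "car"}

# Semantic mask: one class ID per pixel.
semantic = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]

# Instance mask: one object ID per pixel (0 means "no separate instance").
instance = [
    [0, 0, 1, 2],
    [0, 0, 1, 2],
]

# Panoptic annotation: every pixel carries both a class and an instance ID.
panoptic = [
    [(c, i) for c, i in zip(sem_row, inst_row)]
    for sem_row, inst_row in zip(semantic, instance)
]

# Each distinct (class, instance) pair is one segment in the image.
segments = sorted({pixel for row in panoptic for pixel in row})
for class_id, instance_id in segments:
    print(CLASS_NAMES[class_id], instance_id)
```

The semantic mask alone cannot tell the two cars apart, and the instance mask alone says nothing about the road; the combined view keeps both pieces of information.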
Other use cases include rotated box annotation, lane annotation, cuboids, etc. There are many other data annotation practices that are often used in niche markets. For example, rotated box annotation is quite similar to bounding box annotation, except that one also needs to specify the rotation angle of the bounding box. Cuboids are also very similar to bounding boxes in their concept, except they capture an object's depth in a 3D format. On the other hand, bounding boxes and lane annotations are very similar to polygon annotations. Such techniques are primarily used in the autonomous driving industry.
Image annotation with SuperAnnotate
SuperAnnotate offers a comprehensive set of tools for accurate and efficient image annotation for all the annotation tasks described above. The platform provides a wide range of user-friendly tools that make it easy to create accurate and precise data annotations. It also offers customization options, allowing users to create their own annotation templates and workflows. With the built-in quality control mechanisms as well as AI-assisted tools, SuperAnnotate makes sure that the annotations are up to their highest standards. SuperAnnotate also provides secure and private data storage, guaranteeing that your data is safe and confidential.
Take a look at this annotation demo video with SuperAnnotate's platform:
Next in the list of commonly used data annotation types is video annotation. In short, video annotation is the action of detecting and classifying objects or actions within a video, and it is often considered a more complex version of image annotation.
Introduction to video annotation and its importance
Since video data makes up a significant portion of media content, you can already guess the importance of its annotation practice. Let's dive deeper into the world of video annotation.
Video annotation tools and use cases
Video classification (or tagging): Video classification is the process of analyzing and categorizing video content into predefined classes or categories. In internet content moderation, video classification plays an important role in identifying and filtering out inappropriate, offensive, or harmful content, making sure that users have a safe and positive experience.
Video captioning (free text description): Similar to image captioning, video captioning deals with extracting story and knowledge from video data and maintains the deliverable in textual form.
Video event or action detection: Video event or action detection is widely implemented in activity recognition and classification in sports videos, drawing a lot of attention from computer vision industry experts. Common applications range from the classification of different actions in sports videos such as a basketball player dribbling the ball or shooting a three-pointer, to performance analysis, athlete recruitment, fan engagement, and much more. Event detection is also widely used as an active learning step in video surveillance applications. In such applications, the events occur rarely, and finding potential annotation frames can be done by event annotation.
Video object detection and tracking: Object detection in videos is the task of identifying the presence of an object in video frame sequences. Object tracking is monitoring an object's movement during a video sequence, including its presence, location, shape, size, etc. Here are some data annotation tools which are efficient for video tracking.
1. Tracking with bounding boxes is a fundamental technique in computer vision that involves detecting, localizing, and tracking objects within a sequence of video frames. Think about how vehicles can be detected and tracked in a traffic video. The process involves drawing bounding boxes around each vehicle and then generating a unique ID that can be used to track the same vehicle in the upcoming frames. Box tracking techniques are widely applied in industries such as autonomous driving, video surveillance, sports analytics, etc.
2. Polygon tracking, also known as video object segmentation, is similar to box tracking annotation but it tracks the exact object boundaries with precise polygons. The annotation is generally much more complex and can take longer if advanced automation tools are not used.
3. Keypoint annotation is used when the object shape itself is not our main concern, but we want to identify critical points within the shape of the object, track those points and know how they move or change their position. This technique is famous in human motion analysis, particularly in sport analytics applications.
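A common way to reduce the effort of box tracking is to label only a few keyframes and fill in the frames between them. The sketch below shows plain linear interpolation between two keyframes; the track structure and the `interpolate_box` helper are hypothetical and only meant to illustrate the idea.

```python
def interpolate_box(box_start, box_end, frame_start, frame_end, frame):
    """Linearly interpolate (x_min, y_min, x_max, y_max) between two keyframes."""
    if not frame_start <= frame <= frame_end:
        raise ValueError("frame must lie between the two keyframes")
    if frame_end == frame_start:
        return tuple(float(v) for v in box_start)
    t = (frame - frame_start) / (frame_end - frame_start)
    return tuple(s + t * (e - s) for s, e in zip(box_start, box_end))


# Hypothetical track for one vehicle: the annotator labels frames 0 and 10 only.
track = {
    "track_id": 7,
    "label": "car",
    "keyframes": {
        0: (100, 50, 200, 150),
        10: (150, 50, 250, 150),
    },
}

box_at_5 = interpolate_box(track["keyframes"][0], track["keyframes"][10], 0, 10, 5)
print(track["track_id"], box_at_5)
```

Linear interpolation works well when the object moves steadily; when it accelerates, turns, or gets occluded, the annotator adds another keyframe to correct the path.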
Video annotation in SuperAnnotate
There are several factors that one needs to consider when choosing a video annotation tool. For example, for action detection and video captioning, smooth video playback at different speeds is critical for efficient annotation. By contrast, when dealing with object tracking use cases with bounding boxes, polygons, or points, support for frame-based annotation and interpolation becomes key to speeding up the annotation process. In more advanced use cases, AI-assisted labeling tools based on optical flow, video object tracking, or segmentation are becoming essential to speed up annotation. SuperAnnotate's video annotation platform is made to speed up the annotation of all these use cases. Additionally, we created several tools to provide efficient collaboration and error detection, treating the quality assurance process as being as important as the annotation process. Here is a small video snippet that demonstrates what the tool looks like.
It's time to learn about the language processing superhero in data annotation. Text annotation is the action of adding extra information to a text with the aim of helping machines understand human language. With text data annotation, machines are able to understand concepts and relationships within texts even if they're in an unclear form or language. Think of it as giving machines magic glasses to see through the complexity of human language.
Introduction to text annotation and its importance
Data annotation in text space is becoming more important than ever, especially with new applications created by ChatGPT or other large language models (LLMs). However, before LLM use cases became popular, text annotation was still playing an integral role in extracting relevant data from various sources of text. In natural language processing (NLP), text annotation tasks are used for applications such as sentiment analysis, entity recognition, translation, and many more.
Text annotation tools and use cases
Text classification: Text classification is one of the most foundational tasks of NLP. Text classification algorithms analyze and recognize patterns within the text and assign the text to the appropriate categories. These algorithms are instrumental in a wide array of applications, such as sentiment analysis, spam filtering, topic detection, and document organization.
Language translation: The name itself is pretty self-explanatory. Language translation is about using machine learning models to understand text data and translate it into another language. Using an artificial neural network to predict the likelihood of a sequence of words in machine translation is called neural machine translation (NMT). One of the most vivid applications of NMT is in communications, where AI translation can facilitate multilingual communication between individuals and groups that speak different languages. This can be particularly useful for businesses that operate globally, or for international conferences and events.
Named entity recognition (entity annotation): Named entity recognition is another text annotation technique that is used for unstructured data annotation. NER involves identifying and annotating data of named entities with specific categories. You can understand entities under the same category as words or phrases that explain similar concepts or mean the same thing. Take the sentence
"SuperAnnotate was ranked as the best data annotation platform in G2".
In this sentence, we can extract multiple entities: SuperAnnotate and G2 fall under "company" category, and "data annotation platform" belongs to a "product" entity. Scale this example and you will have a proper understanding of entity annotation!
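Entity annotations like these are often stored as character offsets into the text plus a category. The sketch below shows one possible layout for the example sentence; the dictionary keys and the `make_span` helper are our own assumptions, not a specific tool's export format.

```python
text = "SuperAnnotate was ranked as the best data annotation platform in G2"


def make_span(text, phrase, label):
    """Build an entity annotation from the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} does not occur in the text")
    return {"start": start, "end": start + len(phrase), "label": label}


entities = [
    make_span(text, "SuperAnnotate", "company"),
    make_span(text, "data annotation platform", "product"),
    make_span(text, "G2", "company"),
]

for entity in entities:
    # Slicing with the stored offsets recovers the annotated phrase.
    print(text[entity["start"]:entity["end"]], "->", entity["label"])
```

Storing offsets instead of the phrases themselves keeps the annotation unambiguous when the same word appears more than once in a document.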
Coreference resolution (relationship annotation): Coreference resolution or relationship annotation is a text annotation task that identifies all phrases that refer to the same exact entity. To not confuse it with entity annotation, let's jump straight to a similar example.
"SuperAnnotate was ranked as the best data annotation platform in G2. The company received 92 reviews with 4.9/5 score in the world's largest software marketplace."
In this case, "the company" in the second sentence refers to "SuperAnnotate", and "world's largest software marketplace" refers to "G2". With a text annotation tool, the annotated data will look the following way:
Intent annotation: Intent annotation can be considered a subset of text classification but instead of predefined classes, one needs to classify based on the intent of the conversation's response (for example what your customers really want). Intent annotation is the ingredient for understanding the true purpose of text messages. By annotating each message with a specific intent category, such as "Booking Request" or "Complaint", you can unlock powerful insights into your customer's true needs and preferences.
Text annotation with SuperAnnotate
SuperAnnotate's text annotation tools are designed to be intuitive and easy to use, with features like keyboard shortcuts, auto-save, and collaborative annotation capabilities. Users can customize the annotation interface to annotate data based on their specific needs, and adjust settings such as font size, background color, and annotation type.
Now that we've covered image and video annotation, next comes LiDAR, a fancy abbreviation of light detection and ranging. LiDAR is a remote sensing technology that uses laser pulses to measure the distance to surrounding objects. LiDAR annotation has changed the game of data annotation and we're going to show you how.
What is LiDAR data and why annotate it?
The annotation techniques that we already discussed mostly covered detecting data in 2D space. However, let's not overlook the fact that we need a tool to calculate 3D information such as depth, the distance between objects, the reflectivity of the objects, and other cases where 2D techniques lack efficiency. LiDAR annotation addresses such issues and we're about to find out how.
To understand the most common application of LiDAR data annotation, let's first learn a new terminology: sensor fusion. Sensor fusion is the process of data collection from multiple sensors to create a more accurate and comprehensive understanding of the environment. In fact, information from just one source tends to be more biased and incomplete compared to combined annotated data from different sources. LiDAR is great for detecting the distance and position of objects in 3D space, but it can't always provide the full picture. That's where images come in, providing additional details such as color and texture.
LiDAR gained its popularity mainly after the recent hype around autonomous vehicles. As self-driving cars become more and more prevalent, LiDAR annotation emerges as a key technology that enables them to safely navigate their surroundings. Let's discuss the case of fusing LiDAR and images to create a more robust and accurate perception system for autonomous vehicles.
LiDAR annotated data can provide accurate distance measurements for detecting obstacles and identifying road features. However, LiDAR data alone cannot provide detailed information about the color, texture, and appearance of objects. By combining annotated images with LiDAR data, autonomous vehicles can extract additional information, such as object color and texture, to improve their understanding of the environment.
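A basic building block of LiDAR and camera fusion is projecting a 3D point into the image, so that the pixel at that location can supply color and texture. The sketch below uses the standard pinhole camera model. The intrinsic parameters are made-up values, and the points are assumed to be already expressed in the camera frame (x to the right, y down, z forward); a real pipeline would first apply the calibrated LiDAR-to-camera transform.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (u, v).

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)


# Hypothetical camera intrinsics: focal lengths and principal point, in pixels.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 360.0

lidar_points = [
    (1.0, 0.5, 10.0),   # in front of the camera
    (0.0, 0.0, -5.0),   # behind the camera, cannot be seen
]

for point in lidar_points:
    pixel = project_point(point, FX, FY, CX, CY)
    print(point, "->", pixel)
```

Once a point has pixel coordinates, an image annotation at that pixel (for example a class label or a color) can be attached to the 3D point, and the measured depth can be attached to the image annotation.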
LiDAR annotation use cases
LiDAR segmentation: Although LiDAR technology has existed for decades, it has recently become a hot topic, especially in autonomous driving, due to its ability to deliver detailed 3D information about a vehicle's surroundings. This information includes obstacles, their position, their velocity with respect to the vehicle, and other data that is crucial for a safe driving experience. LiDAR segmentation assigns each point in the point cloud one of a set of predefined labels. Accurate LiDAR segmentation algorithms allow autonomous systems to correctly identify all the obstacles and the road surface, thus making driving safer.
Object detection: Object detection with 3D bounding boxes for LiDAR data is the process of identifying and classifying objects in the point cloud data. Annotating objects in LiDAR data is much easier than segmentation and is often used for detecting pedestrians and cars for autonomous driving companies.
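To illustrate what a 3D bounding box annotation captures, the sketch below checks which points of a point cloud fall inside a cuboid described by a center, a size, and a rotation around the vertical axis (yaw). This parameterization is a common one, but the field layout and the `point_in_cuboid` helper are assumptions for this example.

```python
import math


def point_in_cuboid(point, center, size, yaw):
    """Return True if a 3D point lies inside a yaw-rotated cuboid.

    center: (x, y, z) of the box center
    size:   (length, width, height) along the box's own axes
    yaw:    rotation of the box around the z axis, in radians
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dz = point[2] - center[2]
    # Rotate the offset by -yaw to express it in the box's local frame.
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    local_x = cos_y * dx + sin_y * dy
    local_y = -sin_y * dx + cos_y * dy
    length, width, height = size
    return (
        abs(local_x) <= length / 2
        and abs(local_y) <= width / 2
        and abs(dz) <= height / 2
    )


# Hypothetical annotation: a car 10 m ahead, turned 90 degrees.
car_box = {"center": (10.0, 0.0, 0.0), "size": (4.0, 2.0, 1.5), "yaw": math.pi / 2}

point_cloud = [(10.0, 1.5, 0.0), (13.0, 0.0, 0.0), (10.0, 0.0, 0.5)]
inside = [
    p for p in point_cloud
    if point_in_cuboid(p, car_box["center"], car_box["size"], car_box["yaw"])
]
print(len(inside), "of", len(point_cloud), "points belong to the car")
```

Counting the points inside a box is also a handy sanity check during quality assurance, since a box that contains almost no points is likely misplaced.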
Sometimes the LiDAR data is collected as a sequence of frames. In such cases, Object Tracking becomes an important part of LiDAR annotation. Interpolation, AI-assisted labeling, and automated tracking algorithms are becoming essential in cases when one wants to perform fast and accurate annotation on LiDAR data.
LiDAR annotation with SuperAnnotate: SuperAnnotate provides both LiDAR annotation software and LiDAR annotation services. Choosing a vendor for both your software and services is not an easy task since there is a tradeoff between quality, price, scalability, security, domain expertise, etc. To make things simpler, we summarized several components that you need to consider when choosing a LiDAR annotation partner in a previous article. A visual summary of these components can be found below.
Audio annotation has revolutionized the game of sound in the modern world. Every time we ask Shazam to find that sound we really like, ask Siri a question, or Spotify to recommend us a new song, we're using the benefits of audio data annotation. This technology allows us to categorize different types of sounds from voice assistants to wildlife monitoring.
Put on the headphones to crank up the volume and get into the world of "singing" data annotation.
Why annotate audio?
Audio data is generated every day, and without its annotation, large amounts of audio value would be lost. With audio annotations, we can train machines to recognize and categorize different types of audio data, from speech and music to ambient noise and animal sounds. With this powerful tool, we can unlock a world of possibilities for everything from speech-to-text transcription to music recommendation systems.
Audio annotation use cases
Audio classification: Audio classification is the data annotation process of classifying sound training data based on their characteristics. The objective of audio classification is to enable machines to identify and differentiate various types of audio, including music, speech, nature sounds, and many more. It's also widely used to classify music genres which helps companies like Spotify and others to recommend similar music based on its genre.
Audio transcription: Audio transcription is the process of converting the spoken words in audio files into written text. A very useful application of audio transcription is creating captions for audio and video materials such as interviews, films, or TV shows. Automating audio annotation is essential for collecting high-quality training data at scale. Whisper is a recent model by OpenAI that helps transcribe audio files in different languages. The transcription is not always accurate when using such automated models, and to correct the model's initial predicted transcription one needs efficient audio transcription tools. Luckily we have a great step-by-step tutorial on how to efficiently use Whisper prediction API and correct those predictions in SuperAnnotate’s audio annotation tool.
Large language models (LLM) annotation
The end of 2022 marked the birth of what everyone's talking about now: ChatGPT and AI-generated text. The wonders that GPT and other large language models are creating are due to a massive amount of annotation labor, and we're about to explore a few types of language models and their annotation procedures.
Back in 1997 Ramon Neco and Mikel Forcada suggested the “encoder-decoder” structure for machine translations, which became popular after 2016. Imagine translation is a text-to-text procedure where you need techniques to first encode the input sentence to vector space, and then decode it to the translated sentence. This is the very simplified logic of encoder-decoder models.
Let's discuss the example of translation from English to French. The encoder logic is described in the first part of the above image. It takes the input sentence and converts it to some numerical representation that captures the inherent structure and patterns of that sentence.
The encoded information then passes to the decoder, which takes the information captured in the encoded sentence and generates the translated output in French. Of course, there are much more complex processes happening under the hood, but for the sake of simplicity, we kept everything in a few sentences and basic terms.
The data annotation process here consists of preparing training data as pairs of sentences in different languages. Each pair consists of an input sentence (in English) and an output sentence (in French). The source sentence serves as the input for the encoder, and the target sentence is the expected output of the decoder. This is just the case of translation, and depending on the task, the annotation process will differ.
Popular architectures used to build encoders in NLP include recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and more recently, transformer models like BERT (Bidirectional Encoder Representations from Transformers). While encoder-decoder models first went viral in translation, they were later applied to tasks like text classification, sentiment analysis, and text generation from prompts.
The problem with traditional encoder-decoder architectures lay in their sequential nature and their difficulty in capturing long-range dependencies in language. In the case of translation, for example, capturing the relationship between the first word and the last word in a long sentence becomes challenging for RNN-based models. This is what led to the birth of transformers.
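As a rough sketch of what such annotated pairs can look like when they are prepared for training, the example below turns each sentence pair into encoder and decoder sequences. The whitespace tokenization, the `<s>` and `</s>` marker tokens, and the `build_example` helper are simplifying assumptions; real systems use learned subword tokenizers.

```python
# Hypothetical annotated data: (English source, French target) pairs.
sentence_pairs = [
    ("The cat sleeps.", "Le chat dort."),
    ("I like tea.", "J'aime le thé."),
]


def build_example(source, target, bos="<s>", eos="</s>"):
    """Turn one sentence pair into encoder/decoder token sequences."""
    source_tokens = source.split()
    target_tokens = target.split()
    return {
        # The encoder reads the source sentence.
        "encoder_input": source_tokens,
        # During training the decoder sees the target shifted right by one token...
        "decoder_input": [bos] + target_tokens,
        # ...and learns to predict the next target token at every position.
        "decoder_target": target_tokens + [eos],
    }


examples = [build_example(src, tgt) for src, tgt in sentence_pairs]
print(examples[0])
```

The quality of the translated pairs directly limits the quality of the model, which is why careful annotation and review of the target sentences matters.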
Transformer-based models were developed in 2017 by researchers at Google and came as a replacement for recurrent neural networks, addressing the areas where RNNs fell short. They did so by introducing self-attention, which enables parallel processing and improves context understanding. The attention layer has access to all the previous states and weighs them according to a learned measure of relevance, providing relevant information about far-away tokens.
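The weighting described above can be sketched in a few lines. The example below implements scaled dot-product attention in plain Python for readability. It is a simplification under stated assumptions: in a real transformer the queries, keys, and values are learned linear projections of the token embeddings, while here the toy token vectors are used directly.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance of every key to this query.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Three toy token vectors; each token attends to all tokens, including far-away ones.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
for row in contextual:
    print([round(x, 3) for x in row])
```

Because every output row is computed independently of the others, all positions can be processed in parallel, which is the property that recurrent networks lack.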
The end of 2017 marked the culmination of the recurrent networks era and models were already fully based on self-attention.
Let's break down the transformers lifecycle:
- It all starts with basic language model training, which takes the majority of the time needed to build a transformer, since the model is trained on a huge amount of text data.
- After you have a good language model, it's time to fine-tune it. This involves training on a task-specific dataset with annotated training data.
Pre-training is typically done on a larger dataset than fine-tuning, due to the limited availability of labeled training data.
Reinforcement learning from human feedback (RLHF)
Reinforcement learning from human feedback (RLHF) is the practice of using human feedback and preferences in reinforcement learning tasks in order to optimize language models. By this, we aim to create a system that's able to quantify our preferences by assigning numerical rewards to language models' actions and trajectories. ChatGPT has been the greatest success of RLHF and takes responsibility for the current viral interest in RLHF; let's see what role data annotation takes in RLHF.
RLHF consists of the following phases:
- Pre-training a language model (LM)
- Training a reward model
- Fine-tuning the LM with RL
Data annotation is mainly involved in the second phase, training a reward model. Here, human annotators rank the outputs of the LM or give feedback in the simple form of yes/no approval; that is, the language model comes up with responses and the human indicates which responses are good enough to "deserve" a reward. It's important to note that the human feedback ultimately has to be converted into scalar rewards so that our preferences are represented numerically.
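One common way to turn human rankings into a training signal for the reward model is to split each ranking into pairwise comparisons and penalize the model whenever it scores the rejected response above the preferred one. The sketch below illustrates that idea. The pairwise logistic loss is a widely used choice, but it is our assumption here, as are the helper names and the example reward values.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-first ranking into (preferred, rejected) pairs."""
    # combinations() keeps the input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical annotation: an annotator ranks three model responses, best first.
ranking = ["response_A", "response_B", "response_C"]
pairs = ranking_to_pairs(ranking)
print(pairs)

# Hypothetical scalar rewards produced by a reward model for each response.
rewards = {"response_A": 2.0, "response_B": 0.5, "response_C": -1.0}
total_loss = sum(pairwise_loss(rewards[good], rewards[bad]) for good, bad in pairs)
print(total_loss)
```

The loss is small when the scalar rewards agree with the human ranking and grows when they contradict it, which is how the annotators' preferences end up represented numerically.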
Other types of data annotation
We already discussed the fundamental types of data annotation, but there are a few more that we shouldn't omit due to their widespread use in different industries. Let's explore a few of those data annotation methods.
A lot of documents are kept in PDF format, making PDF annotation a necessity in financial, legal, and governmental organizations for digitalization purposes. PDF annotation is the action of adding notes, comments, or other metadata to a PDF document to provide additional information or feedback.
Website annotation is the process of adding notes or comments on a live website page, as well as classifying different websites based on predefined classes. It is often needed for content moderation for multiple purposes such as finding out whether the website is safe or not, or whether it contains any nudity, hate speech, etc.
Time series annotation
Time series data annotation involves annotating data that changes over time, such as sensor readings, stock prices, and ECG data. It is often used to detect abnormal activities and anomalies, and the annotation tools help to identify and localize those events in the time series data.
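Time series annotations are often recorded as labeled intervals, which are then expanded into one label per sample for training. The sketch below shows this conversion on made-up sensor readings; the half-open `(start, end, label)` interval format and the `intervals_to_labels` helper are assumptions for illustration.

```python
def intervals_to_labels(n_samples, intervals, default="normal"):
    """Expand (start, end, label) intervals into one label per sample.

    Intervals are half-open: start is included, end is excluded.
    """
    labels = [default] * n_samples
    for start, end, label in intervals:
        for i in range(max(0, start), min(n_samples, end)):
            labels[i] = label
    return labels


# Hypothetical sensor readings with a short spike.
readings = [0.9, 1.0, 1.1, 5.2, 5.0, 1.0, 0.95, 1.05]

# The annotator marks samples 3 and 4 as an anomaly.
annotations = [(3, 5, "anomaly")]

labels = intervals_to_labels(len(readings), annotations)
for value, label in zip(readings, labels):
    print(value, label)
```

Keeping the intervals as the source of truth makes annotations easy to review and edit, while the per-sample labels are what a model typically consumes.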
Medical data annotation
Medical data annotation involves annotating various medical images and records, such as X-rays, CT scans, and patient records. With relevant information, it becomes easier to develop accurate machine learning models for medical diagnosis and treatment.
Annotating any other data types with SuperAnnotate
With SuperAnnotate, you can bring your own data format and build a custom annotation editor that is best suited for your annotation needs. Our robust project management and data management toolset will be attached to the annotation editor that you will build and in turn enable the creation of high-quality training data at scale. As a part of the custom editor, we already released an HTML editor, a PDF annotation editor, and a Website annotation editor. You can read more about these editors in our documentation.
In conclusion, data annotation plays an essential role in the success of supervised machine learning models as it provides accurately labeled datasets, and serves as the foundation for training data infrastructure. By employing robust annotation techniques for images, video, text, and audio files, machine learning engineers can ensure that their models effectively learn from high-quality annotations. These annotations become the new oil for companies trying to reach AI supremacy. We, at SuperAnnotate, are happy to fuel these companies to advance their AI capabilities through our platform and integrated services.