CVAT Resources

Explore our library of data annotation resources—from CVAT technical docs and release notes to case studies and video lessons.
Blog
The computer vision landscape is shifting from simple pattern recognition to deep, world-aware intelligence, defined by multimodal AI, 3D spatial mapping, and generative data pipelines that can simulate millions of miles of driving data or medical testing before a single prototype ever hits the road or the clinic. In fact, we predict that in 2026 these technologies will empower a new wave of applications, from autonomous drones navigating dense urban forests to medical systems performing real-time 3D surgical mapping.

While we believe AI models will become more powerful next year, their performance is only as good as the information they learn from. That is why the industry is moving beyond simply "collecting" data and toward "curating" it with surgical precision.

Why the Quality of a Dataset Matters for Computer Vision Applications

Garbage In, Garbage Out

"Garbage In, Garbage Out" is a classic rule of computing, and in 2026 it is absolute. A model's performance is fundamentally tied to the precision of its labeled data, especially as we move toward pixel-perfect segmentation. In high-stakes fields like healthcare or autonomous driving, the margin for error is non-existent. Even a 1% error in pixel-level labels can lead to catastrophic failures, such as a medical AI misidentifying a rare pathology or a navigation system miscalculating a curb's depth.

Mastering the "Hard Negatives" and Edge Cases

In the past, showing a model a thousand photos of a clear street was enough. Today, models must be robust against the unpredictable. High-quality datasets must now include "hard negatives" and rare edge cases, like a stop sign partially obscured by a reflection, or a pedestrian in low-light conditions that breaks standard shape recognition. Without these "long-tail" scenarios, a model remains a "fair-weather" pilot, unable to handle the beautiful messiness of real-world unpredictability.

Fairness and Representativeness

In 2026, quality data also needs fairness and representativeness. If a dataset lacks diversity, be it geographical, demographic, or environmental, the resulting AI will inevitably carry those biases into production. High-quality data ensures that a vision system works as accurately in a rainy rural village as it does in a sunny tech hub, avoiding the algorithmic bias that can stall global innovation.

What Determines a Good Dataset?

As models transition from laboratory settings to high-stakes production environments, a high-quality dataset should have the following features.

Multimodal Integration

The dominant paradigm for 2026 AI systems is multimodality: the ability to process and generate synchronized data from diverse sources. A superior dataset integrates various data streams to provide a holistic view of a scene. By combining visual inputs with other sensory information, datasets enable models to achieve higher accuracy and robustness in complex real-life scenarios. This is done by fusing multiple layers of information together, including:

Synchronized Sensor Streams: Aligning data from RGB cameras, LiDAR, radar, and infrared sensors.
Visual-Linguistic Pairs: Pairing images or videos with natural language descriptions for context-aware reasoning.
Metadata Context: Adding non-visual data such as temperature, motion, or GPS to deepen a model's understanding.

A prime example of this is the nuScenes dataset, which revolutionized the field by providing synchronized data from a full sensor suite (6 cameras, 1 LiDAR, 5 RADAR), allowing models to "see" and "feel" the environment simultaneously across varying weather and lighting conditions.
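If you want to poke at this kind of multimodal alignment yourself, the snippet below is a minimal sketch using the nuscenes-devkit package (pip install nuscenes-devkit). The v1.0-mini split and the local data path are assumptions for illustration, not part of the original post.

```python
# Minimal sketch: loading one time-synchronized multimodal sample from nuScenes.
# Assumes the v1.0-mini split has been downloaded to ./data/nuscenes.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-mini", dataroot="./data/nuscenes", verbose=True)

sample = nusc.sample[0]  # one keyframe with all sensors time-aligned

# Each sample bundles tokens for every sensor channel captured at that moment.
for channel in ("CAM_FRONT", "LIDAR_TOP", "RADAR_FRONT"):
    sensor_data = nusc.get("sample_data", sample["data"][channel])
    print(channel, sensor_data["filename"], sensor_data["timestamp"])
```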
Annotation Density and Precision

As computer vision applications move into high-stakes fields like healthcare and autonomous driving, the margin for error has effectively vanished. Precision is the bedrock of safety-critical AI, and in 2026 it is measured by "annotation density": the amount of labeled data within a dataset.

This shift moves development away from simple bounding boxes toward pixel-perfect masks and 3D metadata that capture the entirety of a scene. By increasing annotation density, teams can train models to recognize subtle features, such as the specific orientation of overlapping objects in a dense warehouse, which are critical for the next generation of "embodied" AI.

High-Resolution and Diversity

Diversity in a dataset is the primary defense against model bias and failure in production. In 2026, a "good" dataset must represent the full spectrum of global reality, capturing variations that standard web-scraped data often overlooks. High-resolution imagery further supports this by allowing models to detect small or distant objects with the clarity required for precision tasks.

To ensure true representativeness, top-tier datasets focus on:

Environmental Variation: Diverse weather conditions, lighting, and geographic locations, from urban tech hubs to rural villages.
Demographic Representation: Balanced coverage across different ethnicities, ages, body types, and physical attributes.
Long-Tail Scenarios: Intentional inclusion of rare edge cases and "hard negatives" that standard datasets typically miss.

The Cityscapes Dataset is a great example of capturing diverse data. It provides high-resolution frames from 50 different cities in various weather conditions, specifically curated to ensure that urban driving models aren't overfit to a single street or climate.

To ensure true representativeness, top-tier datasets focus on diverse weather conditions, lighting, and geographic locations. Source: https://people.ee.ethz.ch

Clear Provenance and Compliance

In the current regulatory landscape, a dataset is only as valuable as its paper trail. With the full enforcement of AI safety standards in 2026, clear provenance and legal compliance have become absolute business imperatives.
When using a dataset, organizations must now prove not only what data powers their models, but exactly how it was collected, transformed, and authorized. A compliant dataset in 2026 is defined by:

Verifiable Lineage: A documented history of data transformations, algorithm parameters, and validation steps.
Ethical Sourcing: Evidence of explicit consent from data rights holders and fair treatment of human annotators.
Transparency and Auditability: Public summaries of training data sources and adherence to privacy regulations.

The risks of ignoring these standards are best illustrated by the LAION-5B dataset. Despite its unprecedented size, LAION-5B faced significant scrutiny and was temporarily removed from distribution after a report by the Stanford Internet Observatory (SIO) discovered over 3,000 instances of suspected Child Sexual Abuse Material (CSAM) embedded as links within the data. This controversy highlighted how a lack of rigorous filtering and provenance can expose organizations to severe legal and ethical liabilities.

As a result, the industry has shifted toward "vetted" datasets like DataComp-1B. Unlike its uncurated predecessors, DataComp-1B prioritizes transparent source tracking and rigorous filtering, ensuring that performance gains are matched by legal integrity.

What Are Some Computer Vision Datasets to Follow in 2026?

While we can't cover every new dataset in this blog, we want to highlight a few that have caught our attention. They were chosen specifically for their inclusion in top-tier 2025 conferences like CVPR and NeurIPS, and for their focus on solving "frontier" problems, such as raw sensor processing, ultra-high-resolution reasoning, and fine-grained spatial logic, rather than simply iterating on older classics like COCO or ImageNet.

AODRaw

Source: https://github.com/lzyhha/AODRaw

Key Details:
Total Images: 7,785 high-resolution RAW captures.
Annotated Instances: 135,601 instances across 62 categories.
Atmospheric Diversity: Covers 9 distinct light and weather combinations, including low-light rain and daytime fog.
Processing Advantage: Supports direct RAW pre-training to bypass ISP overhead.

AODRaw is a pioneering dataset designed for object detection, specifically targeting adverse environmental conditions. It is particularly interesting because it addresses the "domain gap" that often causes models trained on clear daylight images to fail when conditions turn poor. By training directly on unprocessed sensor data, AI can "see" through noise and lighting artifacts that would typically obscure critical objects.

For developers, AODRaw offers a unique chance to build a single, robust model capable of handling multiple conditions simultaneously. Instead of training separate models for day and night, this data provides the diversity needed for a truly universal perception stack.

XLRS-Bench

Key Details:
Images Collected: 1,400 real-world ultra-high-resolution images.
Image Resolution: Average of 8,500 x 8,500 pixels per image.
Evaluation Depth: 16 sub-tasks covering 10 perception and 6 reasoning dimensions.
Total Questions: 45,942 human-annotated vision-language pairs.
Reasoning Focus: Includes tasks for spatiotemporal change detection and object motion state inference.

XLRS-Bench sets a new standard for ultra-high-resolution remote sensing (RS), designed specifically to evaluate Multimodal Large Language Models (MLLMs). It boasts an average image size of 8,500 x 8,500 pixels, with many images reaching 10,000 x 10,000, roughly 10 to 20 times larger than standard benchmarks.
This scale allows models to perform complex reasoning over entire city-level scenes rather than tiny, isolated crops. What makes XLRS-Bench a "must-watch" is its focus on cognitive processes like change detection and spatial planning. It moves beyond simple object classification to test whether an AI can understand "spatiotemporal changes," such as identifying new construction or inferring a ship's movement from its wake. This is a massive leap forward for urban planning, disaster response, and environmental monitoring.

DOTA-v2.0

Key Details:
Instance Count: Over 1.7 million oriented bounding boxes in DOTA-v2.0.
Tracking Volume: 234,000 annotated frames across multiple synchronized views in MITracker.
Geometric Precision: Uses 8 d.o.f. quadrilaterals for oriented object detection.
Recovery Metrics: MITracker improves target recovery rates from 56.7% to 79.2% in occluded scenarios.

DOTA-v2.0 and the integrated MITracker framework provide the ultimate benchmark for detecting and tracking objects from aerial perspectives. DOTA-v2.0 features 1.7 million instances across 18 categories, using Oriented Bounding Boxes (OBB) to capture objects like ships and airplanes at any angle. Meanwhile, MITracker enhances this by using multi-view integration to maintain stable tracking even when targets are temporarily occluded from one camera's view.

The core innovation here is the shift from 2D images to 3D feature volumes and Bird's Eye View (BEV) projections. By projecting multi-camera data into a unified 3D space, the AI can "stitch together" a target's trajectory even through complex intersections or warehouse clutter. This combination allows for "class-agnostic" tracking of 27 distinct object types, from everyday items to heavy machinery.

SURDS

Key Details:
Total Instances: 41,080 training instances and 9,250 evaluation samples.
Reasoning Categories: Covers depth estimation, pixel-level localization, and pairwise distance.
Logical Tasks: Includes front-behind relations and orientation reasoning.
Model Integration: Designed specifically to benchmark and improve fine-grained spatial logic in VLMs.

SURDS (Spatial Understanding and Reasoning in Driving Scenarios) is a large-scale benchmark designed to give Vision Language Models (VLMs) "common sense" in the physical world. Built on the nuScenes dataset, it contains over 41,000 vision-question-answer pairs that test an AI's ability to understand geometry, object poses, and inter-object relationships. It is the definitive test of whether an AI truly "understands" the road or is just memorizing patterns.

What makes SURDS fascinating is its focus on fine-grained spatial logic. Instead of just labeling a car, the model must answer questions about "Lateral Ordering" (which car is further left?) or "Yaw Angle Determination" (which direction is the truck facing?).

Surprise3D & Omni6D

Source: https://github.com/3dtopia/omni6d

Key Details:
Query Scale: 200,000+ vision-language pairs and 89,000+ human-annotated spatial queries in Surprise3D.
Object Diversity: 166 categories and 4,688 real-scanned instances in Omni6D.
Capture Volume: 0.8 million image captures for 6D pose estimation.
Reasoning Depth: Covers absolute distance, narrative perspective, and functional common sense.

Surprise3D and Omni6D represent the pinnacle of indoor 3D understanding, combining spatial reasoning with precise 6D object pose estimation. The "object-neutral" approach in Surprise3D is a breakthrough because it forces AI to rely on geometric reasoning rather than semantic shortcuts.
For example, a robot might be asked to find "the item used for sitting" rather than a "chair," ensuring it truly understands functional properties and 3D layout. This is critical for "embodied AI" that must navigate and interact with messy, unfamiliar human environments.

Omni6D complements this by providing rich annotations, including depth maps, NOCS maps, and instance masks, across a vast vocabulary of 4,688 real-scanned instances. It uses physical simulations to create diverse, challenging scenes with complex occlusions and lighting. Together, these datasets provide the foundation for robots to perform precise manipulations and navigate 3D spaces with unprecedented intelligence.

SA-1B (Segment Anything 1-Billion)

Key Details:
Total Images: 11 million high-resolution, privacy-protected licensed images.
Total Masks: Over 1.1 billion high-quality segmentation masks (the largest to date).
Image Quality: Average resolution of 3300×4950 pixels, ensuring granular detail for small objects.
Annotation Method: A three-stage "data engine" comprising assisted-manual, semi-automatic, and fully automatic mask generation.
Global Diversity: Features images from over 200 countries to ensure broad geographical and cultural representation.

The SA-1B dataset is the foundational engine behind Meta AI's Segment Anything Model (SAM). Released to democratize image segmentation, it moved the industry away from labor-intensive, task-specific training toward "zero-shot" generalization: the ability for a model to segment objects it has never seen before. Because SA-1B is class-agnostic, it focuses on the geometry and boundaries of "anything" rather than a limited set of pre-defined labels.

How Can You Find Other Trending Datasets in 2026?

While the datasets we've highlighted are currently at the forefront of the industry, the computer vision field moves at breakneck speed. By the time you've integrated one breakthrough, another is likely being presented at a conference or uploaded to a community hub. In 2026, staying ahead of the curve means knowing exactly where to look for the next wave of high-quality, long-tail data.

Here is how you can stay in the loop and find the latest trending datasets throughout the coming year.

Hugging Face & Kaggle: The Community Hubs

Hugging Face and Kaggle have become the primary hubs for trending open-source datasets and community-vetted benchmarks. These platforms act as living libraries where researchers and developers upload their latest work, often accompanied by "dataset cards" that explain the provenance, ethical considerations, and intended use cases. Kaggle, in particular, is invaluable for finding niche or competitive datasets that have been "stress-tested" by thousands of data scientists in real-world challenges.

To find new datasets here, leverage the "Trending" or "Most Downloaded" filters. On Hugging Face, the datasets library allows you to programmatically search for new uploads using specific tags like object-detection or multimodal.
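As a rough sketch of that programmatic route, the snippet below uses the huggingface_hub client (pip install huggingface_hub) to list popular datasets by task tag. The specific tag and sort values are illustrative choices, not recommendations from the original post.

```python
# Illustrative sketch: surface frequently downloaded computer vision datasets
# on the Hugging Face Hub by filtering on a task tag.
from huggingface_hub import HfApi

api = HfApi()
results = api.list_datasets(
    filter="task_categories:object-detection",  # or "task_categories:image-segmentation"
    sort="downloads",
    direction=-1,   # descending, i.e. most downloaded first
    limit=10,
)
for ds in results:
    print(ds.id)  # dataset repo IDs you can then open on the Hub
```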
Academic Portals: CVPR and WACV

For research-grade data that pushes the theoretical limits of AI, academic portals are your most reliable source. Major conferences like CVPR 2026 and WACV 2026 (taking place in March 2026) are the launchpads for datasets that define state-of-the-art benchmarks. These datasets are typically released alongside peer-reviewed papers, ensuring they have undergone rigorous validation.

The best way to stay updated is to follow the CVF (Computer Vision Foundation) Open Access library. During conference weeks, you can search for "Dataset" in the paper titles to find the newest repositories.

Google Dataset Search: The Universal Index

If you are looking for data hosted across fragmented university, government, or specialized repositories, Google Dataset Search is an essential tool. It functions as a specialized search engine that indexes over 25 million datasets from publishers worldwide, making it the fastest way to discover data that might not be on the major community hubs.

To use it effectively, filter results by last-updated date or usage rights (e.g., commercial vs. non-commercial). For example, search for specific sensors or environments, like "FMCW Radar datasets" or "Arctic urban driving," to find niche repositories hosted by academic institutions or government bodies that the broader AI community hasn't yet discovered.

Cloud Provider Libraries: AWS, Google Cloud, and Azure

For enterprise-level applications, exploring pre-curated collections from AWS, Google Cloud, and Microsoft Azure is highly recommended. These libraries host massive, high-value datasets that are optimized for their respective machine learning pipelines, such as Amazon SageMaker or Google's Vertex AI. These are often "industry-grade" sets, such as satellite imagery or large-scale medical archives, that would be too expensive for a single team to collect on their own.

You can access these via the AWS Open Data Registry, Google Public Datasets, or Azure Open Datasets. These providers frequently add new, ethically sourced collections that are compliant with global regulations.

Looking for a Platform to Create, Refine, and Manage Datasets?

While public datasets are an incredible resource for benchmarking and initial training, they eventually hit a ceiling. To achieve true competitive advantage and handle the specific "long-tail" scenarios unique to your business, your team must eventually move beyond open-source data and begin curating custom, proprietary datasets.

This is where CVAT becomes an essential part of your stack. Whether you are dealing with the multi-camera 3D views of Omni6D or the pixel-perfect requirements of AODRaw, CVAT provides a scalable, high-quality environment for the most complex segmentation workflows.

If you are a small team or a researcher looking to start annotating immediately with industry-leading tools, our cloud platform is the perfect place to begin.

Try CVAT Online for Free

For organizations requiring advanced security, custom deployments, and massive-scale collaboration, our enterprise solutions provide the control and power you need to manage your data at a global level.

Get Started With CVAT Enterprise
Industry Insights & Reviews
January 29, 2026

5 Ground-Breaking Datasets for Computer Vision Applications in 2026

Blog
Most computer vision projects don't actually fail because of the model architecture. They stall because teams hit an invisible wall trying to source high-quality training data. This is particularly true in semantic segmentation, the process of assigning a specific class label to every individual pixel in an image.

Because semantic segmentation relies on these high-fidelity, pixel-level labels to define the world, there is no room for the "close enough" approach used with bounding boxes. In this workflow, every single pixel must be accounted for, because even minor inconsistencies or "noisy" labels in your training set will directly degrade your model's precision and its ability to generalize in the real world.

This makes your dataset selection a critical early decision that directly impacts performance. If the dataset's labels are misaligned, your model will never achieve the "pixel-perfect" accuracy required for production-grade AI.

What Is Semantic Segmentation and How Is It Used in Computer Vision?

To understand why high-quality data is the primary bottleneck, we first need to define the task. Semantic segmentation is the process of partitioning an image into meaningful regions by assigning a specific class label to every individual pixel. Unlike other computer vision tasks that provide a general summary, semantic segmentation delivers a complete, pixel-by-pixel map of an environment.

In practice, this allows computer vision systems to understand the exact shape and boundaries of objects. It is used in high-precision applications like autonomous driving, where a car must distinguish between the "drivable surface" of a road and the "non-drivable" sidewalk, or medical imaging, where surgeons need to identify the exact margins of a tumor.

Example of an image segmented semantically in CVAT.

Here is how it differs from the standard tasks you likely already know:

Image Classification: The model outputs a single label for the entire image, such as "Street Scene". It identifies what is in the photo but provides no information on location.
Object Detection: The model outputs rectangular bounding boxes around specific objects. It identifies where a car is, but the box includes "noise" like bits of the road or sky in the corners.
Semantic Segmentation: The model outputs a dense pixel mask where every pixel is labeled. In the car example, every pixel belonging to the car is labeled "car," while the surrounding "noise" pixels are labeled "road" or "sky". Note: all objects of the same class share the same label.
Instance Segmentation: Like semantic segmentation, this provides pixel-level masks, but it treats multiple objects of the same class as distinct individual entities (e.g., "Car 1," "Car 2").
Panoptic Segmentation: The most complete version, combining semantic and instance segmentation. It provides unique IDs for individual objects ("things") while also labeling amorphous backgrounds like grass or sky ("stuff").

How Semantic Segmentation Datasets Are Structured and Used

When building a dataset for semantic segmentation, a raw image (the visual data captured by a camera or sensor) is only the starting point. To be "production-ready," that image must be paired with one or more segmentation masks that provide the ground truth for every pixel.

The Relationship Between Images and Masks

In tasks like object detection, the "labels" are simply coordinates for a bounding box. However, because semantic segmentation requires defining the exact shape of an object, the labels must be as high-resolution as the image itself. Instead of a single text file with coordinates, a semantic segmentation dataset typically consists of:

The Raw Image: Standard visual data, such as an RGB photo.
The Segmentation Mask: A pixel-perfect "map" (usually an indexed PNG or grayscale image) where the value of each pixel represents a specific class ID rather than a color. For example, in a medical dataset, a pixel value of "1" might represent a tumor, while "0" represents healthy tissue.
Multiple Masks (Optional): In complex projects, a single image may have multiple mask files to separate different categories of labels or to manage instance-level data.

The Role of Specialized Annotation

These masks are not generated automatically; they are the result of a rigorous annotation process. Because the model learns the spatial relationship between visual textures (like the edge of a road) and the labels in the mask, the two files must be perfectly synchronized.

To create these high-fidelity datasets, teams use specialized annotation tools like CVAT. Instead of manually coloring millions of pixels, annotators use these tools to draw precise polygons (connected dots) around objects. The software then converts these shapes into the dense, pixel-by-pixel masks required for training, ensuring the sharp object boundaries necessary for production-grade AI.

Common Semantic Segmentation Dataset Formats

To ensure your data is compatible across different training frameworks (like PyTorch, TensorFlow, or Detectron2), you need to use the following standardized technical formats.

Indexed PNGs: These are preferred because they are lightweight and preserve exact integer values for every pixel. Unlike JPEG, they don't suffer from "compression artifacts" that could accidentally shift a pixel's label from "road" to "sidewalk" at the boundary.
Class ID Mappings (JSON): Because a model only sees numbers (0, 1, 2), a companion JSON file acts as the "legend". It maps those integers to human-readable categories, such as {"7": "road", "8": "sidewalk"}.
Polygon Metadata: Most annotators don't draw pixel-by-pixel. They draw polygons (a series of connected dots), which are easier to edit. Tools like CVAT then convert these polygons into the dense pixel masks required for model training.

By standardizing these formats early in the pipeline, teams prevent "data rot," ensuring that masks created today remain fully interoperable with future model architectures or different training workflows as the project scales.
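To make the image-plus-legend pairing concrete, here is a small sketch of reading an indexed PNG mask alongside its JSON class map. The file names are hypothetical placeholders, not part of any specific dataset.

```python
# Minimal sketch: an indexed PNG mask stores a class ID per pixel, and a
# companion JSON "legend" maps those IDs to human-readable names.
import json
import numpy as np
from PIL import Image

mask = np.array(Image.open("frame_0001_mask.png"))   # uint8 array, values are class IDs
with open("class_map.json") as f:
    id_to_name = {int(k): v for k, v in json.load(f).items()}  # e.g. {7: "road", 8: "sidewalk"}

# Inspect which classes appear in this frame and how much area each covers.
ids, counts = np.unique(mask, return_counts=True)
for class_id, count in zip(ids, counts):
    name = id_to_name.get(int(class_id), "unknown")
    print(f"{name:>12}: {100 * count / mask.size:.1f}% of pixels")
```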
Training, Benchmarking, and Fine-Tuning

A dataset's role changes depending on where you are in the development lifecycle.

Training: This is the "heavy lifting" phase where the model consumes thousands of image-mask pairs to learn the foundational features of a scene.
Benchmarking: This acts as a standardized test to measure your model's real-world readiness. Teams use structured public datasets like Cityscapes or COCO to run "test sets," comparing their model's Mean Intersection over Union (mIoU), a metric that measures how well the predicted mask overlaps with the ground truth (a small code sketch follows below), against global State-of-the-Art (SOTA) performance.
Fine-Tuning: In production environments, few teams build from scratch. Instead, they take a "foundation model" already pre-trained on a massive, general-purpose dataset (like ADE20K) and specialize it on their own niche, structured data.

This structured lifecycle allows teams to leverage the broad knowledge of public datasets while using their own custom-labeled masks to push past the "performance ceiling" and achieve production-grade accuracy.
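Since mIoU comes up repeatedly on these leaderboards, here is a small, framework-agnostic sketch of the metric itself: per class, IoU is the overlap between predicted and ground-truth pixels divided by their union, and mIoU averages those IoUs over the classes that actually occur. The toy masks are invented for illustration.

```python
# Mean Intersection over Union (mIoU) for label-map masks, written with plain NumPy.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        pred_c, gt_c = (pred == c), (gt == c)
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        intersection = np.logical_and(pred_c, gt_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy example: two tiny 2x3 masks with classes 0 (background) and 1 (road).
pred = np.array([[0, 1, 1], [0, 0, 1]])
gt   = np.array([[0, 1, 1], [0, 1, 1]])
print(f"mIoU: {mean_iou(pred, gt, num_classes=2):.3f}")  # (2/3 + 3/4) / 2 ≈ 0.708
```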
4 Common Datasets for Semantic Segmentation Compared

While many public datasets exist, the following options have become the industry standard.

Cityscapes Dataset

Source: https://www.cityscapes-dataset.com

The Cityscapes Dataset is arguably the most influential benchmark for urban scene understanding and autonomous driving. Recorded across 50 different cities, it provides a diverse look at high-resolution street scenes captured in various seasons and daytime weather conditions. What makes Cityscapes a "gold standard" is the sheer complexity of its labels. It doesn't just identify objects; it captures the intricate interactions between vehicles, pedestrians, and infrastructure in dense urban environments.

Key Features:
Dual Annotation Quality: The dataset includes 5,000 frames with "fine" pixel-level annotations and an additional 20,000 frames with "coarse" (rougher) polygonal labels.
High-Resolution Data: Images are typically captured at 2048 x 1024 resolution, providing the granular detail necessary for identifying small objects like traffic signs or distant pedestrians.
Comprehensive Class List: It features 30 distinct classes grouped into categories like flat (road, sidewalk), human (person, rider), and vehicle (car, truck, bus, etc.).
Benchmark Leaderboard: It maintains a global State-of-the-Art leaderboard where models like VLTSeg and InternImage-H currently push Mean IoU scores above 86%.

A notable example is NVIDIA's Applied Deep Learning Research team, which utilized Cityscapes to benchmark architectures derived from DeepLabV3+, achieving top-tier performance by optimizing how the model extracts hierarchical information from complex urban landscapes.
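If you want to experiment with Cityscapes locally, torchvision ships a ready-made wrapper. The sketch below assumes you have registered on cityscapes-dataset.com, downloaded the image and fine-annotation packages, and unpacked them under ./data/cityscapes in the expected folder layout.

```python
# Sketch: loading Cityscapes "fine" semantic masks with torchvision's built-in dataset class.
from torchvision import datasets

train_set = datasets.Cityscapes(
    root="./data/cityscapes",
    split="train",
    mode="fine",              # the 5,000 finely annotated frames
    target_type="semantic",   # per-pixel class-ID label maps
)

image, mask = train_set[0]    # PIL images: the 2048x1024 frame and its label map
print(image.size, mask.size, len(train_set))
```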
ADE20K Dataset

Source: https://ade20k.csail.mit.edu

The ADE20K Dataset is the gold standard for large-scale scene parsing and indoor/outdoor environmental understanding. Spanning over 25,000 images, it provides a densely annotated look at complex everyday scenes with a massive, unrestricted open vocabulary. While Cityscapes focuses strictly on the road, ADE20K challenges models to understand the entire world, from the layout of a kitchen to the architectural details of a skyscraper.

Key Features:
Exhaustive Dense Annotation: Unlike datasets that only label foreground objects, every single pixel in ADE20K is assigned a semantic label, covering 150 distinct object and "stuff" categories (like sky, road, and floor) in its standard benchmark.
Hierarchical Labeling: It is one of the few datasets to include annotations for object parts and even "parts of parts," such as the handle of a door or the cushion of a chair.
Extreme Diversity: The dataset captures 365 different scene categories, ensuring models are exposed to a wide variety of lighting conditions, spatial contexts, and object occlusions.
Competitive Benchmark: The MIT Scene Parsing Benchmark, built on ADE20K, is a primary proving ground for global SOTA models like BEiT-3, which currently pushes Mean IoU scores to approximately 62.8%.

A notable use case is the development of Microsoft's BEiT-3 (Image as a Foreign Language), which utilized ADE20K to demonstrate the power of unified vision-language pre-training. By benchmarking on ADE20K's complex scene parsing task, the team achieved state-of-the-art performance, proving that their model could successfully "read" and segment the intricate relationships between hundreds of object classes in a single frame.

PASCAL VOC Segmentation Dataset

Source: https://www.kaggle.com/datasets/gopalbhattrai/pascal-voc-2012-dataset

The PASCAL VOC (Visual Object Classes) Dataset is the classic, foundational benchmark for object recognition, detection, and semantic segmentation. While it is significantly smaller than modern massive datasets like COCO, its high-quality, standardized annotations have made it the primary entry point for researchers and engineers testing new model architectures.

Key Features:
Diverse Object Categories: The dataset covers 20 distinct classes, categorized into vehicles (cars, buses, trains), household items (sofas, dining tables), animals (dogs, cats), and persons.
Standardized Evaluation Metrics: It popularized the Mean Intersection over Union metric, providing a robust mathematical way to compare the accuracy of different segmentation models.
Beginner-Friendly Structure: Its XML annotation format and relatively small size (roughly 11,500 images in the 2012 version) make it compatible with almost all standard computer vision tools and ideal for educational tutorials.
Historic SOTA Benchmark: It has hosted annual challenges that led to the development of legendary architectures like Faster R-CNN, SSD, and DeepLab, which continue to influence the industry today.

A notable example is the evaluation of DeepLabV3+, one of the most successful semantic segmentation models to date. The research team used PASCAL VOC to demonstrate the model's superior ability to capture multi-scale contextual information through atrous (dilated) convolutions, achieving a Mean IoU of 82.1% and setting a new standard for how models refine object boundaries.

COCO Stuff Dataset

Source: https://github.com/nightrome/cocostuff

The COCO Stuff Dataset is an extension of the massive Microsoft Common Objects in Context (COCO) benchmark, designed to provide a "panoptic" or complete view of an image. While the original COCO focuses on countable "thing" objects with distinct shapes, like cars, people, or dogs, COCO Stuff adds labels for "stuff": amorphous background regions like grass, sky, and pavement.

By labeling both objects and their background surroundings, it forces models to understand how a "thing" relates to the "stuff" around it.
This means recognizing, for instance, that a metal object is likely an airplane if it is surrounded by "sky," but likely a boat if it is surrounded by "water".

Key Features:
Massive Category Count: The dataset features 172 distinct categories, including the original 80 "thing" classes from COCO and 91 "stuff" classes, providing a comprehensive vocabulary for daily scenes.
Dense Pixel-Level Annotations: Every pixel in its 164,000 images is accounted for, offering a total of 1.5 million object instances across diverse, complex environments.
Complex Spatial Context: It captures the intricate relationships between foreground objects and background materials, such as a train (thing) traveling on a track (stuff) beneath a bridge (stuff).
Universal Benchmark: It is the primary training ground for "universal" architectures like Mask2Former and OneFormer, which currently push Mean IoU (mIoU) scores to approximately 45% on the full 172-class challenge.

A notable use case for the COCO Stuff dataset is the development of Facebook AI's Mask2Former, a "universal" segmentation model that achieved state-of-the-art results by training on COCO Stuff.

How to Choose the Right Semantic Segmentation Dataset to Start With

Now that we've highlighted four options, we want to stress that there is no single best dataset. The "best" dataset isn't necessarily the largest one, but the one that most closely mirrors the visual domain and label granularity of your production environment.

When evaluating your options, use these five criteria to determine if a dataset aligns with your project goals:

Domain Alignment: Does the imagery match your camera's perspective? A model trained on a bird's-eye view will struggle with the first-person, ego-vehicle perspective of Cityscapes.
Label Complexity vs. Scale: Are you prioritizing a massive variety of classes (like ADE20K's 150 categories) or a smaller, more precise set? High label complexity often requires more training data to achieve convergence.
Annotation Fidelity: Does your use case require "pixel-perfect" boundaries (e.g., medical surgery), or are "coarse" polygonal labels sufficient for general object localization?
Licensing and Commercial Usage: Many public datasets are restricted to non-commercial research (Creative Commons BY-NC). Always verify that the license allows for private or commercial redistribution.
Data Diversity: Ensure the dataset covers the "long-tail" scenarios of your industry, such as varied weather, lighting conditions, or rare object occlusions.

Challenges with Common Semantic Segmentation Datasets

Public datasets are essential for research, but they are rarely "plug-and-play" solutions for production-grade AI. Scaling a model from a benchmark to a real-world application reveals several structural and technical hurdles that teams must navigate.

Inconsistent Class Definitions

There is no universal standard for "what is a car" or "where does a sidewalk end." For example, the Cityscapes dataset might include a vehicle's side mirrors in its mask, while COCO Stuff might exclude them. When teams attempt to combine multiple datasets to increase their training pool, these conflicting definitions create "label noise" that confuses the model and degrades its accuracy.

Annotation Noise and Boundary Ambiguity

Even in "gold standard" datasets, human error is inevitable. At the pixel level, determining the exact boundary between a tree's leaves and the sky is subjective. This ambiguity leads to "fuzzy" edges in the ground truth, making it difficult for the model to learn sharp, precise object boundaries, which is a major hurdle in fields like medical imaging or high-precision manufacturing.

The High Cost of Pixel-Level Annotation

While bounding boxes for object detection typically take only a few seconds per object, the labor involved in semantic segmentation is on a completely different scale. To understand the sheer effort required, look at the Cityscapes dataset, where labeling a single, complex urban image with high-quality, pixel-level annotations takes an average of 1.5 hours. For a dataset of 5,000 images, that translates to 7,500 hours of manual tracing, a workload that causes many projects to stall before they even reach the training phase.

This massive time investment is why industry leaders are pivoting toward AI-assisted workflows. Instead of drawing every boundary by hand, teams are using platforms like CVAT to automate the process. By leveraging integrated AI and foundation models (like SAM) to generate initial masks, CVAT allows users to annotate data up to 10x faster than traditional manual tracing.
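To show the general idea behind that kind of mask pre-generation outside of any particular annotation tool, here is an illustrative sketch using Meta's segment-anything package (pip install segment-anything). The checkpoint path, image file, and click coordinates are placeholders; CVAT wires the same interaction into its UI for you.

```python
# Illustrative sketch: generate a candidate mask from a single foreground click with SAM,
# which a human annotator can then refine instead of tracing from scratch.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("street_scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 600]]),  # one positive click on the object
    point_labels=np.array([1]),           # 1 = foreground
    multimask_output=False,
)
print(masks.shape, scores)                # (1, H, W) boolean mask and its confidence score
```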
Creating and Building Your Own Semantic Segmentation Datasets

To move from raw benchmark scores to a high-performing model in specialized environments, professional teams follow an iterative, data-centric AI pipeline.

Taxonomy & Requirement Engineering

Before a single pixel is labeled, you must define the ground truth. This involves creating an exhaustive annotation manual that dictates how to handle edge cases, like whether a "person" includes the backpack they are wearing or how to label semi-transparent objects like glass. Inconsistency in this step is the #1 cause of model failure.

Strategic Data Sourcing & Curation

A production dataset requires a "Golden Distribution": a strategic balance of data that keeps the model highly accurate in common scenarios and resilient when facing rare ones. To achieve this, your dataset must consist of:

A Massive Foundation: This is your "bread and butter" data, consisting of representative, everyday scenarios that the model will encounter most frequently.
Targeted Diversity: To prevent the model from overfitting to a single environment, you must intentionally source data across different sensors, geographical locations, and times of day.

By curating a balanced dataset, you ensure the model can handle the "long-tail" scenarios of your industry, such as varied weather, lighting conditions, or rare object occlusions.

Production-Scale Annotation

This is where the bulk of the work happens. To stay efficient, teams use AI-assisted labeling (with SAM, Ultralytics YOLO, or other models) to generate initial masks. This allows human annotators to act as "editors" rather than "illustrators," drastically increasing throughput.

Multi-Stage Quality Assurance (QA)

Production-grade data requires a "reviewer-in-the-loop" system to ensure the high precision required for semantic segmentation. Because even minor label inconsistencies can degrade model performance, teams should implement a multi-layered validation process that includes:

Manual Review: Every segmentation mask is checked by a second, more senior annotator to verify boundary accuracy and class consistency.
Consensus Scoring: In high-stakes fields like medical imaging or autonomous driving, multiple annotators label the same image independently. Their results are compared, and only masks with a high degree of agreement are used for training.
Honeypots: Teams insert "gold standard" images with known, perfect labels into the workflow to secretly test annotator accuracy and maintain high standards throughout the project.
Automated Validation: Using programmatic checks to ensure that all pixels are accounted for and that no impossible class combinations exist (e.g., a "car" label appearing inside a "sky" region). A minimal example follows this list.
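As a minimal sketch of such a programmatic check, the snippet below verifies that a mask matches its image in size and contains only known class IDs. The file names and the set of valid IDs are illustrative assumptions.

```python
# Minimal automated validation: every pixel must carry a known class ID, and the
# mask must match its source image in resolution.
import numpy as np
from PIL import Image

VALID_IDS = {0, 1, 2, 7, 8}   # e.g. void, car, person, road, sky (project-specific)

def validate_pair(image_path: str, mask_path: str) -> list[str]:
    image = Image.open(image_path)
    mask = np.array(Image.open(mask_path))
    problems = []
    if image.size != (mask.shape[1], mask.shape[0]):      # PIL size is (W, H)
        problems.append(f"{mask_path}: mask resolution does not match the image")
    unknown = set(np.unique(mask).tolist()) - VALID_IDS
    if unknown:
        problems.append(f"{mask_path}: unknown class IDs {sorted(unknown)}")
    return problems

print(validate_pair("frame_0001.png", "frame_0001_mask.png"))
```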
Looking to Turn a Dataset Into a Production-Ready Model?

Bridging the gap between a common dataset and a high-performance specialized one requires a platform built for precision and scale. Whether you are a solo researcher or an enterprise-scale engineering team, CVAT provides the infrastructure to build your own "gold standard" datasets through:

AI-Assisted Efficiency: Leverage 2025's most advanced foundation models, including SAM 3, to automate the heavy lifting of pixel-level tracing.
Scalable Enterprise Workflows: Manage global teams with robust role-based access controls, detailed project analytics, and multi-stage review loops that ensure every mask is verified before it hits the training server.
Seamless Integration: Export your data in any industry-standard format (from indexed PNGs to COCO JSON) to maintain total interoperability with your existing PyTorch or TensorFlow pipelines.

Plus, CVAT is available in two different formats to fit your needs.

With CVAT Online, you can start your project immediately in your browser without managing any infrastructure. CVAT Online gives you instant access to SAM 3 for text-to-mask segmentation and SAM 2 for automated video tracking. With native, browser-based integrations for Hugging Face and Roboflow, you can pull in pre-trained models and push your annotated datasets to your training pipeline without a single line of infrastructure code.

With CVAT Enterprise, you can bring the power of CVAT to your own infrastructure. Benefit from dedicated AI Agents that run on your own GPUs, custom model hosting for proprietary taxonomies, and advanced Quality Assurance (QA) tools, including honeypots and consensus scoring, designed for the most demanding production-scale workflows.
Industry Insights & Reviews
January 27, 2026

Top 4 Datasets for Semantic Segmentation

Blog
We're excited to announce another special addition to our automatic and model-assisted labeling suite: Hugging Face Transformers.

Hugging Face Transformers is an open-source Python library that provides ready-to-use implementations of modern machine learning models for natural language processing (NLP), computer vision, audio, and multimodal tasks. The library includes thousands of pre-trained models, including a broad selection of computer vision models that you can now connect to CVAT Online and CVAT Enterprise for automated data annotation.

The current integration supports the following tasks:

Image classification
Object detection
Object segmentation

All you need to do is pick the model you want to label your dataset with from the Transformers library, connect it to CVAT via the agent, run the agent, and get fully labeled frames or even entire datasets, complete with the right shapes and attributes, in a fraction of the time.

Annotation possibilities unlocked

Just like the Ultralytics YOLO and Segment Anything Model 2 integrations, this addition opens up multiple workflow optimization and automation opportunities for ML and AI teams.

(1) Pre-label data using the right model for the task

Connect any supported Hugging Face Transformers model that matches your annotation goals—whether it's a classifier, detector, or segmentation model—and run it directly in CVAT to pre-label your data. Each model can be triggered individually, enabling you to generate different types of annotations for the same dataset without scripts or external tools.

(2) Label entire tasks in bulk

Working with a large dataset? Apply a model to an entire task in one step. Open the Actions menu and select Automatic annotation. CVAT will send the request to your agent and automatically annotate all frames across all jobs, reducing manual effort and repetitive work.

(3) Share models across teams and projects

Register a model once and make it instantly available across your organization in CVAT. Team members can use it in their own tasks with no local setup, ensuring consistent labeling workflows at scale.

(4) Validate model performance on real data

Evaluate any custom or fine-tuned Hugging Face Transformers model directly on annotated datasets in CVAT. Compare model predictions with human labels side-by-side, identify mismatches, and spot edge cases—all within the same environment.

How it works

Step 1. Register the function
Create a native Python function that loads your Hugging Face model (e.g., ViT, DETR, or segmentation transformers) and defines how predictions are returned to CVAT, then register it via the CLI. A rough sketch of such a function appears after these steps.
Note: The same function works for both CLI-based and agent-based annotation.

Step 2. Start the agent
Launch an agent through the CLI. It connects to your CVAT instance, listens for annotation requests, runs your model, and returns predictions back to CVAT.

Step 3. Create or select a task in CVAT
Upload your images or video and define the labels, depending on your evaluation needs and model output.

Step 4. Choose the model in the UI
Open the AI Tools panel inside your job and select your registered Hugging Face model under the corresponding tab.

Step 5. Run AI annotation
CVAT sends the request to the agent, which performs inference and delivers predictions back in the form of annotation shapes tied to the correct label IDs.
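The snippet below is a rough sketch of the kind of native function described in Step 1, pairing a Hugging Face object-detection pipeline with the CVAT SDK's auto-annotation helpers. The label names, IDs, confidence threshold, and model choice are illustrative assumptions; treat the setup guide linked below as the authoritative reference for the exact interface and CLI registration commands.

```python
# Sketch of a native auto-annotation function: a Hugging Face detector wrapped so an
# agent can return rectangles to CVAT. Labels/IDs here are placeholders.
import PIL.Image
import cvat_sdk.auto_annotation as cvataa
from transformers import pipeline

_detector = pipeline("object-detection", model="facebook/detr-resnet-50")

# Labels this function can produce; CVAT maps them onto the task's own labels.
spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec("person", 0), cvataa.label_spec("car", 1)],
)
_NAME_TO_ID = {"person": 0, "car": 1}

def detect(context: cvataa.DetectionFunctionContext, image: PIL.Image.Image):
    shapes = []
    for pred in _detector(image):
        label_id = _NAME_TO_ID.get(pred["label"])
        if label_id is None or pred["score"] < 0.5:
            continue  # skip classes we don't annotate and low-confidence hits
        box = pred["box"]
        shapes.append(cvataa.rectangle(
            label_id, [box["xmin"], box["ymin"], box["xmax"], box["ymax"]],
        ))
    return shapes
```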
Get started

Ready to enhance your annotation workflow with Hugging Face Transformers? Sign in to CVAT Online and try it out.

For more information about Hugging Face Transformers, visit the official documentation. For more details on CVAT AI annotation agents, read our setup guide.
Product Updates
January 12, 2026

CVAT Integrates Hugging Face Transformers Model Library for Automatic Image and Video Annotation

Blog
Note: This is the first part of a three-step integration. At the moment, SAM 3 is available only for segmentation tasks in a visual-prompt mode (clicks/boxes), not via text prompts. Free tiers (Community and Online Free) get demo-mode access to SAM 3 suitable for evaluation, not high-volume labeling. For regular production labeling at scale, we recommend using SAM 3 from the Online Solo, Team, or Enterprise editions, where it's part of the standard AI tools offering.

We're excited to announce the integration of Meta's new Segment Anything Model 3 (SAM 3) for image and video segmentation in CVAT. Since we introduced SAM in 2023, it has quickly become one of the most popular methods for interactive segmentation and tracking, helping teams label complex data much faster and with fewer clicks. SAM 3 takes this even further by introducing a completely new segmentation approach, and we couldn't wait to bring it to CVAT as soon as it was publicly released. The current SAM 3 integration is already available across all editions of CVAT and continues our commitment to bringing state-of-the-art AI tools into your annotation workflows.

What's New in SAM 3?

Released in November 2025, SAM 3 is not just a better version of SAM 2; it's a new foundation model built for promptable concept segmentation. Unlike its predecessor, which segments (and tracks) specific objects you explicitly indicate with clicks, SAM 3 can detect, segment, and track all instances of a visual concept in images and videos using text prompts or exemplar/image prompts.

Say you want to label all fish in an underwater video dataset. With SAM 2, you would typically have to initialize the object manually (for example, click or outline fish one by one), and then keep re-initializing or correcting the model whenever new fish appear, fish overlap, or the scene changes. In other words, the workflow is object-by-object and sometimes frame-by-frame. While tracking can propagate the masks or polygons to subsequent frames, every new object or missed instance still requires manual initialization.

With SAM 3, the goal is closer to concept-first labeling: you provide a prompt that describes the concept (like "fish") or show an example, and the model attempts to find and segment all matching instances across the video, and then track them as the video progresses.

This shift toward concept-first segmentation opens up new possibilities for automated data annotation workflows. Instead of repeatedly initializing individual objects with clicks or boxes, annotators can focus on defining what needs to be labeled, while the model handles identifying matching instances.
This can significantly reduce manual effort on large or visually complex datasets.

Key Capabilities of SAM 3

Concept-level prompts: SAM 3 takes short text phrases, image exemplars, or both, and returns masks with identities for all matching objects, not just one instance per prompt.
Unified image + video segmentation and tracking: A single model handles detection, segmentation, and tracking, reusing a shared perception encoder for both images and videos.
Better open-vocabulary performance: On the new SA-Co benchmark ("Segment Anything with Concepts"), SAM 3 reaches roughly 2× better performance than prior systems on promptable concept segmentation, while also improving on SAM 2's interactive segmentation quality.
Massive concept coverage: SAM 3 is trained on the SA-Co dataset, with millions of images and videos and over 4M unique noun phrases, giving it wide coverage of long-tail concepts.
Open-source release: Meta provides code, weights, and example notebooks for inference and fine-tuning in the official SAM 3 GitHub repo.

SAM 3 for Image Segmentation in CVAT

As noted above, this first stage of the integration exposes SAM 3 in visual-prompt mode only, with demo-mode access on the free tiers. CVAT currently surfaces the visual side of SAM 3 through point- and box-based interactive segmentation, because that's what fits naturally into the existing AI tools UX and doesn't force you to change your labeling pipelines overnight.

Text prompts, open-vocabulary queries, and SAM 3's native video tracking API are not wired into the UI yet, so it doesn't behave as a full concept-search engine. Even so, now that it's in CVAT, it remains a very strong interactive segmentation tool compared to other deep learning models, including its predecessor. So, while our engineering team works on adding textual-prompt annotation, our in-house labeling team decided to test-drive SAM 3's labeling capabilities against SAM 2 on real annotation tasks and see in which use cases and scenarios each model performs best.

SAM 3 vs. SAM 2 Head-to-Head Test

To understand how SAM 3 performs in real labeling workflows, our in-house labeling team compared it with SAM 2 on 18 images covering different object types, sizes, textures, colors, and scene complexity.
Both models were tested using interactive visual prompts (points and boxes), with and without refinement. As expected, the results show that there is no single "better" model; each performs best in different scenarios.

Where SAM 2 still performs better

SAM 2 tends to produce cleaner, more stable masks with fewer edge artifacts when:

Working with simple to medium-complexity objects
Objects have clear, well-defined boundaries and stable shapes
Smooth, clean edges are important
Annotating people on complex backgrounds
Minimal refinement and predictable behavior are required

Where SAM 3 shows advantages

SAM 3 starts to outperform SAM 2 in more challenging conditions, such as:

Complex scenes with many objects, noise, or motion blur
Objects where a fast initial shape is more important than perfect boundaries
Small, dense, or touching objects (for example, bacteria)
Low-contrast imagery or objects with subtle visual cues, such as soft or ambiguous boundaries

Where both models perform similarly

In many common cases, both models deliver comparable results:

Simple, high-contrast objects
Large numbers of similar objects annotated individually (for example, grains)
Common objects such as flowers, berries, or sports balls when pixel-perfect accuracy is not required

Key takeaway

There is no universal winner here, at least in the current integration setup. SAM 2 is more stable and predictable, especially around boundaries, while SAM 3 is more flexible and often better suited for complex scenes and hard-to-separate objects. In practice, the best results come from having both tools available and choosing based on the specific task.

Get Started with SAM 3 in CVAT

To try SAM 3 in CVAT:

Create a segmentation task (images or video frames).
Open a job in the CVAT Editor.
In the right panel, go to AI tools → Interactors.
Select Segment Anything Model 3.
Use positive and negative clicks or boxes to generate a mask and accept the result.
If needed, convert the mask to a polygon and refine it manually.

SAM 3 is available in all CVAT editions. In the Community and Online Free plans, it runs in demo mode for evaluation purposes.

What's Next

This post covers the first stage of the SAM 3 integration in CVAT: interactive image segmentation. Looking ahead, SAM 3 opens up several directions we're actively working toward:

Text-driven object discovery and pre-labeling
More advanced video object tracking built on SAM 3's internal tracking capabilities

We'll introduce these features incrementally and share updates as they become stable and ready for production annotation workflows. For now, we encourage you to try SAM 3 in your next segmentation task and compare it with SAM 2 on your own data.

Have questions or feedback? Please reach out via our Help Desk or open an issue on GitHub. Your input helps shape the next steps of the integration.
Product Updates
January 5, 2026

Segment Anything Model 3 in CVAT, Part 1: Image Segmentation Support

Blog
Video annotation is the backbone of many modern artificial intelligence (AI) and machine learning (ML) systems, yet it remains one of the most labor-intensive tasks in the AI lifecycle. If you've ever manually drawn bounding boxes frame-by-frame, you know the struggle: it is painstakingly slow and prone to human error, often leading to inconsistent tracks that require hours of cleaning.

Thankfully, there is a solution: leveraging state-of-the-art (SOTA) ML/AI models to automate the heavy lifting and ensure frame-to-frame consistency, allowing you to annotate entire video sequences with just a few clicks. In this article, we'll explore the top-performing models for different video tracking tasks, from high-speed object detection to pixel-perfect segmentation, and show you how to choose the right one for your specific use case.

The Hierarchy of Video Tracking Tasks

Before picking a model, you must determine which tracking sub-task fits your data. Most video annotation projects fall into one of two categories based on how the objects are identified and followed:

Single-Object Tracking (SOT) & Video Object Segmentation (VOS)

Single-Object Tracking (SOT) and Video Object Segmentation (VOS) focus on maintaining a relentless lock on one specific target provided by the user. SOT provides a focused bounding box, while VOS generates a high-fidelity, pixel-level mask that adapts to the object's changing shape.

Best For: Scenarios requiring extreme precision, such as robotic surgery, medical imaging, or analyzing the complex movements of mechanical parts.
Example: VOS is often used to track the exact geometry of a surgical instrument or a robotic gripper. By using models like XMem or the newly released SAM 3, researchers can maintain publication-quality masks across long video sequences, ensuring the model captures complex shape analysis that a simple bounding box would miss.

Multi-Object Tracking (MOT)

Multi-Object Tracking (MOT) is designed to detect and track every instance of a specific class, like every vehicle in a traffic feed, by generating bounding boxes with persistent ID numbers that follow each unique object throughout the video.

Best For: High-throughput video annotation where you need to quantify large numbers of moving parts.
Example: High-throughput aerial datasets often use MOT to handle hundreds of moving targets. A prime example is the M3OT dataset (Nature, 2025), which provides over 220,000 bounding boxes for multi-drone tracking in RGB and Infrared modalities, labeled with CVAT.

Pipeline Anatomy: How Tracking Works in Practice

Before diving into specific models, it is helpful to understand the tracking-by-detection pipeline, which is the industry standard for most production environments. This workflow typically involves three distinct stages (a minimal code sketch follows this list):

Per-frame Detector: A model like YOLOv12 or RT-DETR scans individual frames to identify objects.
The Tracker: A secondary algorithm, such as ByteTrack or DeepSORT, links those detections across frames to maintain unique IDs.
Optional Segmentation Head: If your task requires more than a box, models like XMem or SAM 3 are used to generate precise pixel-level masks.
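As a compact sketch of this detector-plus-tracker pairing, the ultralytics package bundles ByteTrack behind a single call. The checkpoint name and video path below are placeholders; swap in whichever supported detector weights fit your project.

```python
# Tracking-by-detection sketch: a per-frame YOLO detector linked across frames by ByteTrack.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # per-frame detector (any supported checkpoint works)

# The tracker associates detections frame to frame and assigns persistent track IDs.
results = model.track(source="traffic.mp4", tracker="bytetrack.yaml", stream=True)

for frame_result in results:
    if frame_result.boxes.id is None:
        continue  # no confirmed tracks in this frame yet
    for box, track_id in zip(frame_result.boxes.xyxy, frame_result.boxes.id):
        x1, y1, x2, y2 = box.tolist()
        print(f"track {int(track_id)}: ({x1:.0f}, {y1:.0f}) -> ({x2:.0f}, {y2:.0f})")
```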
Multi-Object Tracking (MOT) ModelsWhen your goal is high-throughput annotation, such as tracking every vehicle on a highway or every person in a retail store, these are the SOTA models worth checking out.RT-DETR (Real-Time DEtection TRansformer)Source: https://github.com/bharath5673/RT-DETRRT-DETR (Real-Time DEtection TRansformer) is the first transformer-based detector to achieve real-time speeds, providing a high-accuracy alternative to the traditional YOLO family. By treating detection as a direct set-prediction problem, it avoids the typical complexities of grid-based scanning.Key aspects of this model include:Primary Function: Utilizes "object queries" to predict all objects simultaneously rather than searching the image grid-by-grid.Practical Use: Ideal for production environments like autonomous robotics or surveillance where precision is non-negotiable but lag is not an option.Standout Feature: Eliminates the Non-Maximum Suppression (NMS) bottleneck, ensuring smooth, consistent performance with no post-processing delays.This end-to-end architecture delivers a superior balance of accuracy and stability for complex, high-stakes visual tasks.ByteTrackSource: https://github.com/FoundationVision/ByteTrackByteTrack is a high-performance association algorithm that acts as the essential "glue" of a tracking pipeline, linking detections across frames to maintain consistent identities. It is renowned for its efficiency, relying on motion and geometry rather than heavy visual computation.Key aspects of this model include:Primary Function: It "rescues" low-score or blurry detections that other trackers might discard, ensuring tracks don't break when objects become fuzzy.Standout Feature: Extremely lightweight because it tracks based on logic and movement patterns rather than "remembering" what an object looks like.It is currently the industry standard for real-time applications like traffic monitoring or crowd counting when paired with a YOLO detector.DeepSORTSource: https://arxiv.org/pdf/1703.07402DeepSORT is a sophisticated tracking model that incorporates deep learning to "recognize" objects through unique visual profiles. Key aspects of this model include:Primary Function: Creates a unique visual "fingerprint" for every object, allowing the model to recognize them even after they disappear and reappear.Practical Use: The premier choice for complex scenes with long occlusions, such as tracking a specific person walking behind obstacles in a security feed.Standout Feature: Excels at re-identification (Re-ID), making it highly robust against identity swaps during long periods where an object is hidden.While more computationally demanding, it provides superior reliability in crowded environments where objects frequently overlap.2. Single-Object Tracking Models (SOT)These models are designed to "lock onto" a single target and follow it relentlessly, regardless of how it moves or how the camera shifts.OSTrackSource: https://github.com/botaoye/OSTrackOSTrack is a cutting-edge "one-stream" tracker that utilizes a transformer architecture to unify feature extraction and relation modeling into a single step. 
Key aspects of this model include:Primary Function: Integrates feature learning and matching in parallel, allowing the model to understand the target within its environment more effectively than traditional two-stream pipelines.Practical Use: The current "State-of-the-Art" for benchmarks like LaSOT and GOT-10k, making it perfect for high-value targets like drones or wildlife.Standout Feature: Extremely efficient single-stream approach that delivers faster convergence and higher accuracy than older Transformer trackers.The focused architecture provides relentless accuracy for single-target missions where precision is the absolute priority.TrackingNetTrackingNet is a reliable framework and large-scale benchmark designed to bridge the gap between academic theory and real-world performance. Key aspects of this model include:Primary Function: Focuses on generic object tracking by following a target's position, scale, and motion dynamics with high temporal consistency.Practical Use: Widely used in industrial robotics and high-speed assembly lines where a camera must track a specific part without fail.Standout Feature: Offers "reasonable speed," providing a balanced performance that runs effectively on standard hardware without requiring elite-level GPUs.By prioritizing robustness over sheer complexity, TrackingNet remains a staple for industrial-grade tracking where reliability is king.3. Video Object Segmentation (VOS) ModelsWhen bounding boxes aren't enough, VOS models provide "pixel-level" masks, allowing you to track the exact shape and boundaries of an object as it moves.XMem (eXtra Memory Network)Source: https://hkchengrex.com/XMem/XMem (eXtra Memory Network) is a high-fidelity segmentation model designed to maintain mask quality over long video sequences. It solves the problem of "forgetting" by utilizing a sophisticated memory architecture to store object features over time.Key aspects of this model include:Primary Function: Utilizes a long-term memory module to maintain mask quality and identity even in very long videos.Practical Use: The go-to model for producing publication-quality masks for high-end visual analysis and research.Standout Feature: Extremely consistent across long sequences, preventing the "drift" or loss of detail that plagues simpler models.This focus on long-term temporal consistency makes XMem the premier choice for complex projects where every pixel matters from the first frame to the last.4. Promptable Segmentation & Interactive Tracking (SAM / SAM 2 / SAM 3)The SAM Family (Segment Anything Models) represents a shift toward "promptable" AI, allowing you to track and segment any object with a simple click or text command. 
These foundation models eliminate the need for project-specific training, working "out-of-the-box" for almost any category.Key aspects of these models include:Primary Function: SAM 3 introduces "promptable concept segmentation," enabling users to track objects using descriptive text rather than manual clicks.Practical Use: SAM 2 and 3 natively handle video tracking, drastically reducing user interactions for complex VOS.Standout Feature: SAM 3 features native video handling and advanced identity retention to prevent the model from confusing similar objects.By moving from frame-by-frame clicking to high-level concept tracking, the SAM family has set the new standard for interactive video annotation efficiency.How Tracking and Segmentation Models Are EvaluatedFor high-throughput annotation projects, the industry relies on three core metrics to measure how well a tracker maintains order in a crowded scene.1. Multi-Object Tracking MetricsMOTA (Multi-Object Tracking Accuracy)This is the most established metric, focusing heavily on detection quality. It counts three types of errors: False Positives (ghost detections), False Negatives (missed objects), and Identity Switches (when ID 1 suddenly becomes ID 2). While useful, MOTA is often criticized because it prioritizes finding every object over keeping their identities consistent for long periods.IDF1 (ID F1 Score)Unlike MOTA, IDF1 focuses almost entirely on identity consistency. It measures how long a tracker can follow the same object without an error, making it the superior metric for tasks like long-term surveillance or player tracking in sports. It is calculated by finding the longest possible match between a predicted track and the ground truth across the entire video.HOTA (Higher Order Tracking Accuracy) Developed to solve the "tug-of-war" between MOTA and IDF1, HOTA is now considered the most balanced SOTA metric. It splits evaluation into three distinct sub-scores: Detection Accuracy (DetA), Association Accuracy (AssA), and Localization Accuracy (LocA). This allows engineers to see exactly where a model is failing, whether it's failing to "see" the object or failing to "link" it across frames.2. Video Object Segmentation (VOS) MetricsWhen evaluating pixel-level masks instead of bounding boxes, researchers use the J and F metrics popularized by the DAVIS Challenge and YouTube-VOS benchmarks.Region Similarity (J / Jaccard Index)J is essentially the Intersection over Union (IoU) for masks, which measures the static overlap between the predicted pixels and the ground truth pixels. A high J score means the model has captured the bulk of the object's body accurately.Contour Accuracy (F - Measure)While J looks at the "body," F looks at the "edges". It evaluates how precisely the model has traced the boundary of the object. This is critical for high-fidelity tasks like rotoscoping, where a mask that is "mostly correct" but has jagged, incorrect edges is unusable.How to Choose the Right Model For Your Use CaseSelecting the ideal model for video tracking involves balancing raw processing speed against the need for high-precision identification. 
To do this, you must weigh critical factors such as real-time latency for high-speed feeds, identity persistence for crowded environments, and mask fidelity for complex scientific or medical data.As a general rule of thumb, we suggest following these industry-standard pairings for your specific project:For real-time traffic or urban surveillance: Use YOLOv12 paired with ByteTrack to maximize frames-per-second while tracking hundreds of objects simultaneously.For crowded scenes with long occlusions: Use DeepSORT to leverage visual "fingerprints" that prevent ID switching when objects are temporarily hidden.For pixel-perfect masks and complex shape analysis: Use SAM 3 or XMem to achieve high-fidelity, consistent segmentation across long sequences.For tracking a single high-value target (e.g., a robot gripper): Use an OSTrack-style SOT model for a relentless, focused lock on one entity.By prioritizing the correct architecture from the start, you ensure high-throughput consistency, reduce downstream errors in safety-critical scenarios, and establish a reliable foundation that scales as your dataset grows.Where Models Can Still StruggleEven though you choose an industry-standard model, you can still run into "edge cases" where the logic can falter. This can happen because even SOTA AI/ML models still struggle with:Occlusion: This occurs when an object is fully or partially hidden (e.g., a pedestrian walking behind a tree). While models like DeepSORT use visual "fingerprints" to recover these tracks, simpler models may lose the ID entirely.ID Switching: In crowded scenes, the model may confuse two similar objects, like two white cars and swap their unique ID numbers as they cross paths.Scale and Perspective Changes: Models often struggle when an object moves from the far background (appearing very small) to the close foreground (appearing very large), as the rapid change in pixel size can break the tracker’s "lock".Motion Blur: Fast movements or low-shutter-speed footage can cause objects to appear as a "smear," making it difficult for the detector to identify features and resulting in lost or erratic tracks.How You Can Mitigate These Issues in AnnotationTo address the inherent limitations of AI, you can apply these four mitigation strategies to your workflow:1) Address Domain Shift through Strategic Data SelectionThe Strategy: Use a mix of real-world "edge cases" and synthetic data to expose the model to the specific lighting, angles, and object scales of your project.CVAT Advantage: Instead of starting from scratch, you can plug in specialized models from platforms like Roboflow or Hugging Face that are already pre-trained for niche domains like manufacturing or healthcare.2) Implement a Human-in-the-Loop (HITL) ReviewThe Strategy: Use AI to do the first 80% of the work while humans handle the difficult 20%, specifically at points where objects overlap or disappear.The CVAT Advantage: CVAT’s Track Mode uses powerful temporal interpolation. You only need to set "keyframes" before and after an occlusion, and CVAT automatically calculates the smooth path in between, significantly reducing manual frame-by-frame adjustments.3) Leverage the Power of Fine-TuningThe Strategy: Use a small, carefully curated subset of your own data to fine-tune a model, which "internalizes" the specific motion patterns of your targets.The CVAT Advantage: CVAT allows you to export your corrected labels in dozens of formats (like YOLO or COCO) to immediately fine-tune your model. 
You can then re-upload your improved model via a custom AI Agent or Nuclio function to auto-annotate the rest of your dataset with much higher accuracy.4) Identity Management & Track RefinementThe Strategy: In complex scenes, models often suffer from "ID switches" or "fragmented tracks" when objects cross paths. Instead of manually redrawing these, use a workflow that treats objects as continuous "tracks" rather than a collection of individual shapes.The CVAT Advantage: If a model loses a track during a crowded scene, you can use CVAT’s Merge feature to unify separate fragments into a single, persistent ID. This, combined with the ability to "Hide" or "Occlude" shapes while maintaining the track’s metadata, ensures that your final dataset preserves the object’s identity from the first frame to the last.It’s Never Been Easier to Integrate ML Video Tracking into Your WorkflowWe are witnessing a significant shift toward unified and any-modality tracking models, with recent research, such as the "Single-Model and Any-Modality" demonstrating how general-purpose trackers can handle different tasks and sensor types (like RGB, Thermal, or Depth) within a single architecture. The best part about this shift is that you don't need to rebuild your entire infrastructure or design a custom UI to start.CVAT is designed to bridge the gap between cutting-edge research and production-grade annotation, allowing you to plug these advanced models directly into your existing workflow without writing a single line of interface code. With our high-performance environment, you can deploy the industry's best models and manage them within a single, streamlined interface that includes:Native SAM 2 & SAM 3 Tracking: Leverage the world’s most advanced segment-and-track models integrated by default. Use built-in algorithms to automatically calculate the path of bounding boxes and masks between keyframes, drastically reducing manual clicks.Seamless Model Integration: Whether your model is a custom-built proprietary stack or a public favorite on Roboflow or Hugging Face, you can integrate it directly as an AI agent.Integrated Review & QA: Ensure data integrity with specialized workflows that allow human supervisors to quickly identify, "merge," and correct any model errors before they reach training.If you’re ready to move on from the grind of manual tracking and start leveraging the latest in AI-assisted annotation, we have a solution that fits your needs:Get started today with CVAT Online and begin labeling in seconds. CVAT Online gives you immediate access to SAM 2, SAM 3, and seamless integrations with Roboflow or Hugging Face directly in your browser.Scale your production with CVAT Enterprise and bring CVAT to your own infrastructure. With CVAT Enterprise, you can deploy a secure, self-hosted instance with dedicated AI agents, custom model hosting, and full ecosystem integration for large-scale production.
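To make the tracking-by-detection pipeline described earlier concrete, here is a minimal sketch of the detector-plus-tracker stage using the Ultralytics package, which bundles a YOLO detector with a ByteTrack association step. The weights file and video path are placeholders, and this is an illustration of the general pattern rather than a production setup.

```python
from ultralytics import YOLO

# Placeholder weights and footage -- swap in your own detector and video.
model = YOLO("yolo11n.pt")

# track() runs the per-frame detector and links detections across frames;
# tracker="bytetrack.yaml" selects the ByteTrack association algorithm.
results = model.track(
    source="traffic.mp4",
    tracker="bytetrack.yaml",
    stream=True,  # yield results frame by frame instead of loading everything
)

for frame_idx, result in enumerate(results):
    if result.boxes.id is None:  # no confirmed tracks in this frame
        continue
    for box, track_id in zip(result.boxes.xyxy, result.boxes.id):
        x1, y1, x2, y2 = box.tolist()
        print(f"frame {frame_idx}: ID {int(track_id)} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

Swapping in a different tracker configuration or detector is how this pattern is tuned toward the pairings recommended above; the segmentation head (XMem or SAM 3) would be applied on top of the resulting boxes when masks are required.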
Industry Insights & Reviews
January 1, 2026

The Top ML/AI Models To Use for Object Tracking in Video Annotation 

Blog
Instead of our usual month-by-month roundup, this final Digest of 2025 looks back at the core product features and notable CVAT moments we shipped and celebrated this year.From agentic automation to faster video tracking, stronger QA, better analytics, safer API access, and smoother 3D workflows—here’s the 2025 highlight reel.#1 Agentic automation: AI Agents and ML models integration2025 marked a shift in how automation works in CVAT. Instead of treating ML as an add-on, CVAT introduced AI Agents—a way to plug your own models directly into annotation workflows.AI Agents allow teams to run auto-annotation using models hosted on their own infrastructure, while keeping annotators inside CVAT. This makes it easier to automate labeling, reuse existing ML pipelines, and scale annotation without changing tools.We also shared practical examples of how AI Agents work with popular model ecosystems, including SAM 2 and Ultralytics YOLO, showing how agentic labeling fits into real-world workflows.Related posts:Announcing CVAT AI AgentsCVAT AI Agents: What's NewUltralytics YOLO Agentic Labeling#2 SAM 2 for automated video object trackingVideo annotation saw a major upgrade in 2025 with the introduction of SAM 2–based tracking.For CVAT Enterprise, SAM 2 is available as a native tracking solution designed for production-scale video annotation. It significantly reduces manual work when tracking objects across frames and is optimized for large video datasets.For CVAT Online users, SAM 2 tracking is available via AI Agents. This gives teams flexibility to run the tracking model in their infrastructure while keeping annotation workflows centralized in CVAT.Related posts:SAM 2 for object tracking in videos (Enterprise)SAM 2 tracking via AI Agents (Online)#3 Faster QA with honeypots and immediate job feedbackAs annotation teams scale, quality control needs to happen continuously, not only at the final review stage. In 2025, CVAT introduced features that help teams detect issues earlier and reduce rework.Honeypots use pre-annotated reference data mixed into regular jobs to automatically evaluate annotation quality. This makes it possible to monitor quality across the pipeline using a limited but reliable set of ground-truth samples.Immediate Job Feedback complements this by providing quality insights right after a job is completed. Annotators can fix issues immediately, without waiting for delayed review cycles.Related posts:Annotation QA with honeypots‍Immediate Job Feedback#4 Advanced annotation analytics for teams who want visibilityTo give teams better visibility into how work gets done, CVAT introduced advanced analytics. It provides metrics such as working time, annotation speed, object counts, label usage, and productivity trends. Reports can be refreshed when needed, making it easier to monitor performance, identify bottlenecks, and plan resources.Related post:Advanced annotation analytics#5 Safer automation with Personal Access Tokens for API/SDK/CLIAutomation is a core CVAT use case, and in 2025 we made it more secure. Personal Access Tokens (PATs) are a modern way to authenticate to CVAT’s API/SDK/CLI without reusing passwords or relying on legacy keys. 
You can scope permissions, set expirations, and revoke tokens without disrupting other integrations.Related post:Personal Access Tokens#6 UX and usability improvementsAlongside major features, 2025 included a series of usability improvements aimed at saving time in everyday work.Bulk actions make it possible to manage multiple resources at once, reducing repetitive operations in large workspaces.Project copying workflows are now supported through project backup and restore, allowing teams to reuse configurations, templates, and structures across projects.Keyboard shortcut customization gives power users full control over shortcuts, with support for global and workspace-specific settings and built-in conflict detection.Related posts:Bulk actions documentation‍Keyboard shortcut customization#7 3D / point cloud annotation improvementsPoint cloud annotation workflows received focused improvements in 2025, aimed at stability and precision during long annotation sessions.Updates included smoother navigation, more predictable zoom behavior, improved object focusing, better control point handling, and more stable camera behavior across views.Related post:Point cloud annotation updates#8 CVAT Academy launchIn 2025, CVAT launched CVAT Academy, a dedicated learning space for anyone who wants to build annotation skills faster.CVAT Academy offers hands-on training covering core annotation concepts as well as more advanced workflows, helping individuals and teams onboard more efficiently and work more consistently.Explore CVAT Academy‍And that's a wrap on 2025. Thank you for choosing CVAT for your dataset labeling this year. Whether you use CVAT Online or CVAT Enterprise in production, contribute to the CVAT Community open-source edition, or build your own workflows on top of CVAT, thank you for being with us.See you in 2026!
Product Updates
December 23, 2025

CVAT Digest, 2025 Wrap Up: The Biggest Product Releases and Milestones of the Year

Blog
Point cloud annotation is fundamentally more demanding and complex than 2D labeling. Annotators have to work in sparse, multi-view environments, constantly switch perspectives, and maintain spatial context while dealing with small or partially occluded objects. Our labeling services team knows better than anyone that even minor issues in navigation, zoom behavior, or camera stability can significantly slow down the workflow.That’s why, in our recent release, we’ve focused on practical updates that remove everyday friction and make 3D annotation faster, smoother, and more predictable.#1 Quick object focus with double-clickFinding and focusing on objects in 3D space used to require manual camera rotation and careful positioning. Now, a simple double-click on any object automatically centers the camera on it. This seemingly small change dramatically speeds up annotation, especially when working with smaller objects that are difficult to locate manually. The feature works consistently across both 3D and 2D annotation tasks.#2 Consistent zoom behavior across modes Previously, zoom settings were inconsistent across different annotation modes, requiring constant readjustment. We've standardized the Attribute annotation zoom margin setting to work uniformly across all annotation modes, including 3D side view projections. Even better, zoom levels now persist when switching between objects—no more losing your carefully adjusted view every time you move to the next annotation.#3 Stable and configurable control pointsControl points are essential for precise annotation, but they were behaving erratically in 3D views. They would change size, become transparent, or scale incorrectly with camera distance. We've fixed this completely: the Control point size setting now works properly in 3D, and control points maintain consistent, readable size regardless of camera position or zoom level.#4 Camera stability across projections and view changesSwitching between the main view and side views used to reset the camera position, forcing annotators to re-navigate each time they wanted to check an object from different angles. The camera now maintains its position when moving between projections, allowing smooth multi-angle verification without repetitive navigation.#5 Improved UI readabilityWe've improved the contrast and readability of contextual menus and reduced transparency in areas where it was obscuring text. Small visual improvements like these reduce eye strain and cognitive load during long annotation sessions.#6 Improved zoom limits and touchpad supportWe've increased the maximum zoom level and completely reworked the zoom algorithm to work properly with trackpads. Navigation is now smoother and more predictable, particularly on laptops where trackpad gestures are the primary input method.These updates are available across all CVAT editions: CVAT Online, Enterprise, and Community. Improvements like these come directly from our hands-on work with real customer data and have immediate impact on annotation speed and quality, especially for complex workflows like 3D and point cloud annotation. If you work with point cloud data, we hope these updates will streamline your labeling workflow. Have feedback or suggestions? We'd love to hear from you! Contact our team via HelpDesk or submit an issue on GitHub.
Product Updates
December 17, 2025

Point Cloud Annotation Updates: Faster Navigation, Better Stability, Cleaner UX

Blog
We're excited to announce Personal Access Tokens (PATs), a new and improved authentication method for CVAT's API, SDK, and CLI. If you're building integrations, running automated scripts, or working with CVAT programmatically, this feature is designed to make your workflow more secure and convenient.

Why Personal Access Tokens?

Until now, authenticating with CVAT's API required using your username and password or legacy API keys. While functional, this approach had some limitations:

- Security risks: Sharing your password with multiple applications increases exposure if one gets compromised
- Limited control: No way to restrict what specific applications can do
- Manual management: Changing your password meant updating it everywhere

Personal Access Tokens solve these problems by giving you better control over API access.

What Are Personal Access Tokens?

Think of PATs as specialized keys for your CVAT account. Instead of using your password in scripts or third-party tools, you create a unique token for each use case. Each token can have:

- Custom permissions: Choose between read-only or read/write access
- Expiration dates: Set tokens to automatically expire after a specific time
- Individual management: Revoke any token instantly without affecting others

This means if you suspect a token has been compromised or simply don’t need a specific token anymore, you can revoke just that one token without disrupting your other integrations or changing your password.

Key Benefits

- Enhanced Security: Use separate credentials for each application and eliminate the need to embed passwords in your code.
- Better Control: Configure exactly what each token can do. Need a script that only reads data? Create a read-only token.
- Easy Management: Create, edit, and revoke tokens anytime directly from your CVAT user settings.
- Automatic Cleanup: Unused tokens are automatically removed after a period of inactivity, reducing security risks from forgotten credentials.

Getting Started

Creating your first Personal Access Token is simple:

1. Navigate to your Profile page
2. Go to the "Security" section
3. Click the "+" button to create a new token
4. Configure the name, expiration date, and permissions
5. Save and securely store the token value (it's only shown once!)

Once you have your token, use it in your API requests with the Authorization header:

```python
import requests

token = "your_token_value"
response = requests.get(
    "https://app.cvat.ai/api/tasks",
    headers={"Authorization": f"Bearer {token}"},
)
```

Or, if you prefer working with the CLI, set the token as an environment variable:

```
export CVAT_ACCESS_TOKEN="your_token_value" && cvat-cli task ls
```

Important Security Reminders

- Store tokens securely: Treat them like passwords
- Set expiration dates: Always configure tokens to expire
- Use minimal permissions: Grant only the access level needed
- Revoke immediately: If you suspect a token is compromised, revoke it right away
- Never share tokens: Each user and application should have its own token

Learn More

Personal Access Tokens are available now for all CVAT users across Community, Online, and Enterprise editions. We recommend migrating your existing integrations to use PATs for improved security and control.

Ready to start using Personal Access Tokens? Check out our complete documentation for detailed instructions on creating, managing, and using PATs in your CVAT workflows.

Have questions or feedback? Join the conversation on our Discord or open an issue on our GitHub repository.
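As a further, unofficial illustration of the same pattern, you can keep the token out of source code entirely by reading it from an environment variable and reusing it across calls with a session. The endpoint and pagination fields below follow CVAT's standard REST API, but treat the details as assumptions to verify against your own instance.

```python
import os
import requests

# Read the PAT from an environment variable so it never appears in source code.
token = os.environ["CVAT_ACCESS_TOKEN"]

session = requests.Session()
session.headers["Authorization"] = f"Bearer {token}"

# List tasks; the API returns paginated results under the "results" key.
response = session.get("https://app.cvat.ai/api/tasks", params={"page_size": 10})
response.raise_for_status()
for task in response.json().get("results", []):
    print(task["id"], task["name"])
```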
Product Updates
November 25, 2025

Introducing Personal Access Tokens: A More Secure Way to Work with CVAT API

Blog
What Is Data Annotation in Robotics? A Complete Beginner’s Guide

When we picture a fully operational robot, we imagine a machine that can see, move, and react to its surroundings with precision. But behind that capability lies one crucial process: data annotation. As a robot moves around and interacts with the world, it is receiving massive streams of information from cameras, LiDAR sensors, and motion detectors, but this data is meaningless until it’s labeled and structured. That’s where data annotation comes in, turning those raw inputs into recognizable patterns that machine learning models can interpret and act upon. Think of annotation as the bridge between raw data and intelligent behavior, with each bounding box, 3D tag, or labeled sound clip teaching a robot how to move, grasp, or navigate.

Why Is Data Annotation in Robotics Unique?

First and foremost, data annotation in robotics is distinct because it must handle multiple sensor inputs and fast-changing environments. The uniqueness comes from three main factors:

- Variety of data types: A warehouse robot can capture RGB images, LiDAR depth maps, and IMU motion data at the same time. Annotators need to align these streams so the robot understands what an object is, how far it is, and how it is moving (a small alignment sketch follows at the end of this section).
- Environmental complexity: A robot navigating a factory floor may see drastically different lighting between welding zones, shadowed aisles, and outdoor loading bays. It also must track moving forklifts, shifting pallets, and workers walking across its path. Annotated data must represent all these variations so models do not fail when conditions change.
- Safety sensitivity: If an obstacle in a 3D point cloud is labeled incorrectly, a mobile robot may misjudge clearance and strike a shelf or worker. For instance, Amazon’s warehouse AMRs rely on accurately labeled LiDAR data to avoid collisions when navigating between racks. This means that even a small annotation error, like misclassifying a reflective surface, can cause a robot to stop abruptly or make unsafe turns.

Beyond this, robotics is also unique because it is scaling at a pace few industries match. According to the International Federation of Robotics (IFR)’s World Robotics 2025 Report, the total global installations of industrial robots in 2024 were 542,076 units, more than double what was installed a decade earlier.

The capabilities of general-purpose robots have also advanced rapidly in 2024–2025. A clear example is the Figure 01 robot, which in 2024 successfully performed a warehouse picking task using OpenAI-trained multimodal models. Achieving this required large volumes of annotated 3D point clouds, motion trajectories, and force-feedback data, illustrating how teams like Figure and Agility Robotics rely on richly labeled sensor inputs to enable robots to operate safely in human environments.

These innovations show how dynamic robotics has become, and why precise, context-aware annotation is essential. As robots expand into more real-world tasks, their performance depends directly on the quality of the data used to train them.
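To make the stream-alignment point above concrete, here is a minimal, tool-agnostic sketch that pairs each camera frame with the nearest LiDAR sweep by timestamp. The sensor rates, tolerance, and timestamps are illustrative assumptions, not values from any particular robot.

```python
import bisect

def align_streams(camera_ts, lidar_ts, max_gap=0.05):
    """Pair each camera frame with the nearest LiDAR sweep (timestamps in seconds).

    Frames with no sweep within max_gap seconds are skipped so they can be
    flagged for manual review instead of being annotated against stale data.
    """
    pairs = []
    for cam_idx, t in enumerate(camera_ts):
        pos = bisect.bisect_left(lidar_ts, t)
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(lidar_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(lidar_ts[i] - t))
        if abs(lidar_ts[best] - t) <= max_gap:
            pairs.append((cam_idx, best))
    return pairs

# Illustrative timestamps: a 10 Hz camera and a slightly offset LiDAR.
camera_ts = [0.00, 0.10, 0.20, 0.30]
lidar_ts = [0.02, 0.12, 0.21, 0.33]
print(align_streams(camera_ts, lidar_ts))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```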
Key Use Cases of Data Annotation in Robotics Robots rely on labeled sensor data to understand the world around them and make safe, reliable decisions. In practice, data annotation powers several core capabilities that appear across nearly every modern robotics system: Autonomous navigation, where labeled images, depth maps, and point clouds help robots detect obstacles, follow routes, and adjust to changing layouts. Robotic manipulation, which depends on annotated grasp points, object boundaries, textures, and contact events so arms can pick, sort, or assemble items accurately. Human–robot interaction, supported by labeled gestures, poses, and proximity cues that allow robots to move safely around people and predict their intentions. Inspection and quality control, where annotated visual and sensor data help robots detect defects, measure alignment, or identify early signs of wear. Semantic mapping and spatial understanding, where labels on floors, walls, doors, racks, and equipment help robots build structured maps of their environment. A widely referenced example comes from autonomous driving research. The Waymo Open Dataset provides millions of labeled camera images and LiDAR point clouds showing vehicles, pedestrians, cyclists, road edges, and environmental conditions. These annotations train Waymo’s autonomous systems to detect obstacles, anticipate movement, and navigate safely in dense urban environments. These use cases demonstrate how annotation turns raw sensor streams into actionable understanding, forming the backbone of safe and capable robotic behavior. Types of Data Annotation in Robotics and Their Applications Robotics relies on diverse data sources, each requiring a specific annotation method. From vision to motion, labeling ensures robots can interpret and act safely in dynamic environments. These annotated datasets power navigation, manipulation, and autonomous decision-making across industries. Visual and Image Annotation Cameras give robots their sense of sight, but the data needs structure. That’s where annotators outline and classify objects using bounding boxes, polygons, and segmentation masks. Common visual annotation tasks include: Object detection and classification Pose and gesture tracking Instance segmentation in complex scenes One example of visual/image annotation is Ambi Robotics, which uses large annotated image datasets to train sorting arms to recognize boxes and barcodes. Platforms like CVAT streamline this process with AI-assisted labeling, frame interpolation, and attribute tagging. 3D Point Cloud and Depth Data Annotation LiDAR and depth sensors capture spatial geometry, forming dense 3D point clouds that need precise labeling. In this case, annotators define objects, surfaces, and distances so robots can move safely through real-world spaces. Autonomous systems like Nuro’s delivery vehicles depend on this annotation to recognize pedestrians and road edges. For those using CVAT, we support 3D point-cloud annotation with cuboids, allowing teams to mark and classify objects. CVAT also provides an interactive 3D view, frame navigation, and integrated project management features, giving robotics teams a unified platform within a single environment. Sensor and Motion Data Annotation Robots depend on sensor feedback to monitor movement and performance. For example, annotating accelerometers, gyroscopes, or torque data helps identify start points, motion shifts, and anomalies. In industrial settings, this labeling improves precision and predicts faults. 
Which is why companies like Boston Dynamics and ABB Robotics use it to refine robot motion and detect wear. Audio and Tactile Data Annotation Sound and touch give robots environmental awareness. So annotators will label motor noises, ambient sounds, and tactile pressure data so systems can detect irregularities or adjust handling. Key annotation areas include: Detecting mechanical or ambient sound changes Labeling tactile feedback and slip events Linking sensory data with visual or motion inputs Robots from Rethink Robotics and RightHand Robotics use such annotations to refine grip and texture handling. Integrated platforms like CVAT, combined with audio or haptic plugins, allow engineers to visualize and align these sensory streams for more adaptive learning models. Best Practices for Effective Annotation in Robotics Robotics annotation requires accuracy, consistency, and clear processes. These best practices help teams create safer, more capable robotic systems. Create Detailed, Robotics-Specific Annotation Guidelines Robotics data varies widely across sensors, environments, and behaviors. Clear guidelines ensure annotators understand how to handle occlusions, fast-moving objects, and overlapping sensor data. For example, a warehouse robot may record an object partially hidden behind a pallet. Guidelines should specify how to label the object’s visible area and how to tag occlusion attributes so the model learns realistic edge cases. Consistent rules across image, LiDAR, and depth frames prevent drift as datasets scale. Use Tools Designed for Multi-Sensor and 3D Workflows General-purpose labeling tools often struggle with robotics workloads that include point clouds, video frames, IMU data, and synchronized sensor streams. Tools like CVAT support 3D cuboids, interpolation for long sequences, and multi-modal annotation, which reduces errors when labeling spatial data. For instance, with CVAT, annotators can mark a forklift in a 3D point cloud and maintain label continuity across hundreds of frames using interpolation rather than redrawing boxes manually. Combine Automation with Human Oversight Automation accelerates labeling but cannot replace expert review. AI-assisted pre-labeling is useful for repetitive tasks such as drawing bounding boxes or segmenting floor surfaces, but humans must refine edge cases, ambiguous objects, or safety-critical scenes. A typical workflow might use automated detection to pre-label all pallet locations in a warehouse scan, while a human annotator fixes misclassifications, handles cluttered areas, or corrects mislabeled reflections or shadows. Label with Deployment Conditions in Mind Robots operate in environments that shift constantly. Annotations must include examples of glare, motion blur, low light, irregular terrain, and human interaction and other edge cases to make sure the model behind the robot can perceive and act reliably in unconstrained, safety-critical settings . For example, an autonomous robot navigating a distribution center should be trained with labeled images captured during daytime, nighttime, and transitional lighting so it performs reliably across all shifts. Leverage Synthetic and Simulated Data When Real Samples are Limited Simulated data allow teams to create rare or dangerous scenarios that are difficult to capture safely in the real world. Synthetic point clouds, simulated collisions, or randomized lighting conditions help models generalize more effectively. 
A good use case is robotic grasping: simulation can generate thousands of synthetic grasp attempts on objects of varying shapes and materials, providing a foundation before fine-tuning on real-world captures. Maintain Strong Versioning and Traceability Robotics datasets evolve over months or even years. Version control allows teams to track which annotations led to which model behaviors, helping diagnose regressions or drift. For example, if a navigation model starts failing on reflective surfaces, teams can trace the issue back to a specific dataset version where labeling rules changed, and correct the underlying annotation pattern. Taken together, these practices make robotics datasets more reliable and scalable, ensuring models continue to perform accurately as environments, hardware, and operating conditions evolve. The Biggest Challenges of Robotic Data Annotation Robotic data annotation is uniquely demanding because robots operate in fast-changing, unpredictable environments. Unlike static image datasets, robotics data combines multiple sensor types that must be labeled in sync. A single scene may include RGB video, LiDAR point clouds, IMU readings, and depth data, all captured at different rates. Aligning and annotating these streams accurately is one of the biggest technical hurdles. Scale is another challenge. Even a simple warehouse AMR requires millions of frames and 3D scans. Reviewing, labeling, and validating that volume of data requires careful workflows, automation, and strong quality control. Plus, safety sensitivity raises the stakes further. Mislabeling an obstacle in a point cloud or marking an incorrect grasp point can lead to collisions, dropped objects, or erratic behavior. Because robots act on these labels in the real world, the cost of an annotation error is significantly higher than in many other AI domains. Together, these challenges make robotic annotation a complex, high-precision task that demands specialized tools, domain expertise, and rigorous verification. How Does CVAT Support Robotics Data Annotation? With CVAT, teams can annotate bounding boxes, polygons, segmentation masks, and 3D cuboids all within one environment. Plus, features like advanced 3D visualization, and collaborative project management make it suitable for both research teams and large enterprises handling complex robotic datasets. CVAT is available in two deployment models. With CVAT Online, teams can scale projects instantly without managing infrastructure. For enterprises with strict data control or compliance needs, CVAT Enterprise offers the same power within a secure, private environment. Both are built on CVAT’s open-source foundation and share the same high-performance tools for labeling and review. Beyond software, CVAT also offers data labeling services for robotics companies that need expert annotation but lack internal capacity. These services help teams accelerate development by delivering high-quality labeled data quickly, without building an in-house annotation team. Together, CVAT’s flexible platform options and professional services make it a complete solution for robotics companies that need reliable, scalable annotation pipelines to power next-generation automation. Annotation is Teaching Robots to See the World Clearly The future of robotics depends on how well machines can interpret their surroundings, and data annotation is what turns streams of raw information into actionable insight, teaching robots to perceive, decide, and adapt. 
As robotics expands into new sectors like logistics, healthcare, and manufacturing, the need for accurate annotation grows just as quickly. This means that building reliable, safe, and intelligent systems requires structured workflows, skilled teams, and the right technology. In the end, every intelligent robot starts the same way: with precise, thoughtful annotation that helps it see the world as clearly as we do. Ready to build smarter, safer robots? Explore how CVAT can streamline your annotation workflow and accelerate your robotics projects today.
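As a companion to the 3D cuboid workflows mentioned above, one simple quality check teams often run after labeling is counting how many LiDAR points actually fall inside each cuboid, since a near-empty box usually signals a misplaced label. The sketch below assumes axis-aligned cuboids described by a center and full extents, which omits the yaw rotation that real annotation exports typically include.

```python
import numpy as np

def points_in_cuboid(points, center, size):
    """Count points inside an axis-aligned cuboid.

    points: (N, 3) array of x, y, z coordinates.
    center: cuboid center (3 values); size: full extent along each axis.
    """
    half = np.asarray(size, dtype=float) / 2.0
    low = np.asarray(center, dtype=float) - half
    high = np.asarray(center, dtype=float) + half
    inside = np.all((points >= low) & (points <= high), axis=1)
    return int(inside.sum())

# Illustrative data: a random cloud and one labeled box.
rng = np.random.default_rng(0)
cloud = rng.uniform(-10, 10, size=(1000, 3))
count = points_in_cuboid(cloud, center=[1.0, 2.0, 0.5], size=[4.0, 2.0, 1.5])
print(f"{count} points fall inside the cuboid")
```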
Industry Insights & Reviews
November 19, 2025

Data Annotation for Robotics AI: Unique Challenges, Key Methods, and Best Practices

Blog
Data annotation serves as the foundation of every successful machine learning project, because without accurately labeled datasets, even the most advanced AI models cannot detect objects, classify images, or interpret text with real-world precision. To label and manage these datasets for AI and computer vision applications, data scientists and engineers have begun to rely on open-source data annotation tools.

Unlike proprietary platforms that often lock users into closed ecosystems or restrict data access, open-source tools offer transparency, control, and the freedom to customize workflows, making them increasingly attractive for teams prioritizing privacy and long-term scalability. These annotation tools give developers the flexibility to handle everything from bounding box labeling in computer vision to semantic segmentation, OCR annotation, and complex medical imaging workflows. With so many tools available, though, choosing the right one can be a bit tricky.

How to Choose the Right Open Source Annotation Tool

Each open-source annotation tool is different (from features, to usability, to documentation), and choosing the right one starts with understanding your project’s scope, data types, and workflow requirements. So, before selecting a platform, evaluate how well it aligns with your technical goals and the size of your annotation team. The following factors should guide your decision:

- Supported Data Types: Ensure the platform supports your required formats, such as images, videos, 3D point clouds, or text documents. A tool that handles multimodal data will save you from migrating later.
- Quality Control Tools: Look for built-in review features, annotation comparisons, and consensus scoring. Quality assurance prevents mislabeled data that can degrade model performance.
- Collaboration and Workflow Management: For larger teams, choose a data labeling platform with task assignment, role-based access, and progress tracking to streamline coordination.
- Automation and AI Assistance: AI-assisted labeling and auto labeling reduce manual effort by pre-labeling data with AI tools and models like Mask R-CNN or Faster R-CNN. This accelerates annotation and helps scale to enterprise workloads.
- Dataset Compatibility and Integration: A tool that integrates with AWS S3, Microsoft Azure, or TensorFlow OD API allows seamless movement of annotation data between your storage, model training, and MLOps stack.

Data scientists and machine learning teams should test a few open-source platforms to see which supports their annotation workflows most efficiently, ensuring consistent data quality and faster model training cycles.

The 6 Top Open Source Data Annotation Tools Compared

There are many open-source data annotation tools available today, but not all are built for the same purpose. Some focus on simplicity and speed for quick labeling tasks, while others deliver advanced automation, collaboration, and dataset management for large machine learning workflows. To help you make an informed decision, we have closely compared the top 6 open source data annotation tools below. By understanding the strengths and trade-offs of each, you can select the right platform to streamline your data labeling process and produce the high-quality datasets your AI models depend on.

| Tool | Overview | Key Features | Best For | Limitations |
| --- | --- | --- | --- | --- |
| CVAT (Computer Vision Annotation Tool) | Advanced open-source tool built for high-precision computer vision projects; developed by Intel and maintained by CVAT.ai. | Supports bounding boxes, polygons, polylines, keypoints, and 3D cuboids; AI-assisted labeling with Mask R-CNN, YOLO, SAM. | Large-scale image, video, and LiDAR projects in autonomous driving, robotics, and medical imaging. | Requires setup and server management; complex for beginners. |
| Label Studio | Multi-modal annotation platform by Heartex supporting image, text, audio, video, and time-series labeling. | Flexible interface configuration; REST API and Python SDK; collaboration and review tools. | Teams working on cross-domain projects combining computer vision, NLP, and audio. | Complex setup for non-technical users; limited 3D support; some enterprise features paid. |
| LabelMe | MIT-developed, browser-based image annotation tool designed for simplicity and accessibility. | Polygon and bounding box tools; community-shared datasets; lightweight, quick to start. | Academic, research, and educational projects. | No AI-assisted labeling; limited scalability and data type support. |
| Diffgram | Enterprise-grade open-source data annotation and management platform built for large-scale, multi-modal AI workflows. | End-to-end dataset management with version control; supports images, videos, text, and 3D data; AI-assisted and active learning labeling. | Large AI teams needing automation, governance, and MLOps integration for scalable annotation pipelines. | Requires server setup and technical management; may be overkill for small projects. |
| Doccano | Open-source text annotation tool built for NLP projects and language model training. | Sequence labeling, text classification, and NER; multi-user collaboration with roles; easy Docker-based deployment. | NLP researchers and teams building datasets for sentiment analysis, chatbots, and translation models. | Limited to text-based annotation; no model-assisted labeling or multi-modal support. |
| WEBKNOSSOS | Open-source 3D annotation and visualization platform originally built for connectomics and neuroscience research. | Handles terabyte-scale volumetric datasets efficiently; 3D tracing and segmentation tools for cells and neurons; tile-based data streaming for large volumes. | Neuroscience, biomedical imaging, and any project requiring high-resolution 3D segmentation and analysis. | Interface designed for scientific use; limited support for general-purpose ML labeling formats. |

CVAT (Computer Vision Annotation Tool)

CVAT is an open-source data annotation tool built for computer vision projects that require high precision and scalability. Developed by Intel and now maintained by CVAT.ai, it’s widely used by machine learning teams to prepare training data for object detection, image classification, and video annotation tasks.

Key Features

- Comprehensive Annotation Support: Bounding boxes, polygons, polylines, keypoints, and 3D cuboids for LiDAR and point cloud data.
- AI-Assisted Labeling: Integrations with models like Mask R-CNN, YOLO, and SAM help automate labeling for faster dataset creation.
- Video & Object Tracking: Interpolation and object tracking simplify video annotation workflows.
- Dataset Management: Supports popular export formats like COCO, Pascal VOC, and YOLO.
- Collaboration & Storage: Multi-user projects with role-based access and direct links to AWS S3 or Azure Blob Storage.

Use Cases

CVAT is ideal for large-scale projects in autonomous driving, robotics, and defense. It supports both manual and semi-automated labeling, fitting seamlessly into MLOps and Active Learning pipelines.
Pros Advanced automation and customization Supports multiple data types and formats Strong collaborative tools for teams Cons Requires setup and server maintenance What Users Say: “We have a dedicated annotation team within our company, comprising over 50 annotators. For the past four years, we have been using self-hosted CVAT, which has been functioning exceptionally well. Recently, we acquired a project that requires annotating approximately 1 million images and videos monthly. We tried various tools, such as Supervisely, Label Studio etc, especially for video annotation, but CVAT remains the best option.” Source. Label Studio Label Studio is an open-source data labeling platform developed by Heartex that supports text, image, audio, video, and time-series annotation. It stands out for its flexibility, allowing users to design custom labeling interfaces for different data types. This makes it ideal for teams working across multiple AI applications such as computer vision, NLP, and speech recognition. Key Features Multi-Modal Annotation: Supports labeling for text, images, videos, and audio within the same platform. Custom Interface Builder: Users can design annotation templates for specific workflows using a simple configuration format. Model-Assisted Labeling: Integrates with AI models to suggest pre-labels for human review, enabling active learning and faster project completion. API and SDK Integration: Offers REST API and Python SDK for automation, pipeline integration, and dataset export. Collaboration Tools: Teams can assign roles, review annotations, and track performance metrics. Use Cases Label Studio is used for text classification, sentiment analysis, Named Entity Recognition, document tagging, and multimodal research combining images and text. It is also useful in audio projects like transcription or sound event detection, supporting teams training speech and language models. Pros Works with many data types in one environment Highly customizable labeling interface Active learning and model integration capabilities Strong API for MLOps workflows Cons Configuration can be complex for new users Limited optimization for 3D or high-frame-rate video Some enterprise collaboration features are paid What Users Say: “I used label studio with a custom script to auto label data, manually corrected parts, retained the model, and repeated. It takes some work to learn the model API but it's free and works really well!” Source. LabelMe LabelMe is a long-standing open-source image annotation tool that remains one of the simplest and most accessible options for computer vision projects. Its web-based interface makes it easy for anyone to start labeling without complex setup, making it especially popular in academic and research environments focused on image classification and object detection. Key Features Web-Based Interface: No installation required, allowing immediate access and collaboration through a browser. Polygon and Bounding Box Tools: Designed for accurate segmentation and region-based labeling. Community Dataset Access: Users can contribute to and download from a large shared library of labeled images for training and benchmarking. Simple Data Export: Supports standard formats such as JSON and compatible outputs for training ML models. Lightweight Setup: Minimal system requirements and quick onboarding for teams or students. Use Cases LabelMe is widely used in education, research, and early-stage AI experiments. 
It is ideal for small to mid-sized datasets where efficiency and accessibility matter more than complex integrations. Common applications include image classification, semantic segmentation, and bounding box labeling for computer vision models. Pros Extremely easy to set up and use Ideal for quick projects and academic research Free and fully open-source with public dataset access Lightweight interface with minimal dependencies Cons Lacks advanced automation or AI-assisted labeling Limited support for video or 3D data Not suited for large enterprise-scale annotation projects What Users Say: “The tool is a lightweight graphical application with an intuitive user interface. It’s a fairly reliable app with a simple functionality for manual image labeling and for a wide range of computer vision tasks.” Source. Doccano Doccano is an open-source text annotation tool widely adopted in natural language processing (NLP) projects. It enables users to label data for tasks like sentiment analysis, named entity recognition (NER), and text classification through a simple, browser-based interface. Key Features Text-Centric Annotation: Supports sequence labeling, document classification, and span-based annotation. Collaborative Labeling: Multi-user support with role management for team projects. Flexible Export Formats: Outputs data in JSON, CSV, and fastText for seamless integration into NLP pipelines. Ease of Use: Simple to install and run via Docker; ideal for both developers and researchers. Language Support: Unicode-compatible, making it suitable for multilingual annotation tasks. Use Cases Doccano is best suited for NLP research teams, data scientists, and machine learning engineers labeling text datasets for chatbots, translation models, or AI-powered content moderation systems. Pros Purpose-built for NLP projects Intuitive, web-based interface Supports multiple export formats Lightweight and easy to deploy Cons Limited support for non-text data types Lacks advanced automation or model-assisted labeling What Users Say: “I used Doccano. Easy to setup with Docker compose. Kind of disliked that the only way to import data was from JSON, CSV or CoNLL format. Other than that, no issues. The UI is simple, it works fine. It's free.” Source. Diffgram Diffgram is a powerful open-source data annotation and management platform designed for production-scale machine learning workflows. It combines labeling, automation, and data governance in one unified system, making it suitable for enterprise-grade projects that require both flexibility and collaboration. Key Features End-to-End Data Pipeline: Handles dataset versioning, task management, and annotation tracking from a single dashboard. Multi-Modal Annotation: Supports image, video, text, and 3D data, with advanced tools for object detection, segmentation, and classification. AI-Assisted Labeling: Integrates with pretrained models for auto-labeling and supports active learning loops. Collaboration and Security: Offers role-based permissions, activity logs, and dataset audit trails for team-based annotation. Cloud and On-Prem Support: Works seamlessly with AWS, GCP, or self-hosted environments for secure data control. Use Cases Diffgram is ideal for AI and MLOps teams working on large, complex datasets where automation and version control are essential. It’s often used in autonomous driving, medical imaging, and industrial inspection where precise labeling and reproducibility are key. 
Pros Scalable for enterprise and research use Strong automation and AI integration Robust data governance and tracking Multi-user collaboration with granular controls Cons Requires setup and server infrastructure May be overkill for small or simple projects What Users Say: “Diffgram is hands down the best annotation tool I've ever worked with. I'm really impressed by the graphical output it provides, and their customer support is always quick and responsive whenever I need help” Source. WEBKNOSSOS WEBKNOSSOS is an open-source 3D annotation and visualization platform primarily developed for neuroscience research and volumetric data analysis. It allows users to explore, segment, and annotate large-scale 3D image datasets, such as brain scans or microscopy volumes, with precision and efficiency. Originally built to support connectomics projects, it has evolved into a flexible tool for any 3D labeling and reconstruction workflow. Key Features Scalable 3D Visualization: Designed to handle terabyte-scale volumetric datasets efficiently, enabling detailed navigation through dense 3D imagery. Annotation and Segmentation Tools: Provides intuitive tracing and labeling tools for neurons, cells, and other structures across 3D volumes. Tile-Based Data Management: Streams only the data needed for visualization, making it suitable for very large datasets stored remotely or locally. Cross-Platform Support: Runs on Windows, macOS, and Linux, with an interface optimized for both scientific and general 3D annotation tasks. Community and Extensibility: Open-source under GPL license with active contributions from the neuroscience and open data communities. Use Cases WEBKNOSSOS is widely used in connectomics, neuroimaging, and other fields requiring detailed 3D segmentation. Its ability to visualize dense biological structures at microscopic resolution makes it a preferred tool for labs mapping neural circuits or reconstructing biological tissue samples. Pros Handles extremely large 3D datasets efficiently Specialized for neuroscience and volumetric data Free and open-source with active community input Supports detailed tracing and cell segmentation workflows Cons Limited support for non-scientific annotation formats Interface may feel complex for general-purpose ML labeling What Users Say: “webKnossos has all the tools to immediately view and more importantly annotate (large) volume datasets already built-in. Any modification to annotations/segmentations made in webKnossos will show up in third-party tools.” Source. Emerging Trends in Open Source Data Annotation After the Scale AI data leak and subsequent investment by Meta, many organizations have begun reevaluating how they handle sensitive datasets. One way they are doing this is through open-source tools. According to Data Insight Markets, the current open-source data labeling market size is approximately $500 million in 2025, but will grow at a compound annual growth rate (CAGR) of 25% from 2025 to 2033, reaching approximately $2.7 billion by 2033. This clearly highlights how valuable these tools will be for both generative AI and agentic AI. Plus, open-source annotation software is evolving fast, and it’s not just about drawing boxes anymore. Today’s tools are smarter, more flexible, and ready to support complex AI workflows. For example, the integration of AI-assisted labeling powered by models like Segment Anything (SAM). 
With SAM, CVAT annotators can now generate segmentation masks or bounding boxes automatically, then refine them instead of drawing every shape manually. The pace of innovation doesn’t stop there. CVAT has also introduced an auto-annotation feature powered by Ultralytics YOLO models, expanding its toolkit for AI-assisted labeling. Through the new agentic integration, annotators can automatically detect and tag objects within images using pretrained YOLO weights, then refine the results alongside models like Segment Anything (SAM) for precise segmentation and bounding boxes. This blend of automation and human input has made labeling significantly faster, especially in complex datasets such as autonomous driving, and 3D point cloud annotation. Beyond this, there are other key trends emerging in open source data annotation. These include: Multimodal annotation support: tools handling images, video, text, audio, and 3D point clouds in the same platform Plugin ecosystems and custom modules: community-built extensions for domain needs (e.g., pathology annotation, geospatial overlays) Stronger dataset governance: versioning, audit logs, role permissions, and integration with cloud storage Active learning and pre-labeling loops: the system picks the hardest samples for human review to improve efficiency These changes make open-source annotation tools far more than “free alternatives.” They’re becoming core infrastructure for AI development, helping teams accelerate labeling, maintain quality, and scale data pipelines. Our Final Thoughts on Choosing the Right Tool for Your Needs Now that we've made it to the end of the article, it's time to share our key takeaways. Choosing the right data annotation tool comes down to knowing your goals and workflow. Each platform serves a different purpose, and the best one for you depends on how complex your datasets are, how much automation you need, and how your team collaborates. To keep things short and sweet, always ask these questions: Ease of setup: How quickly can you start annotating? Data type coverage: Does it support your images, text, audio, or 3D point clouds? Automation tools: Can AI help speed up repetitive tasks? Dataset management: How easily can you organize and export your labeled data? If you’re labeling text data, Doccano is one of the best open-source options. It’s built for tasks like text classification, sequence labeling, and sentiment analysis, making it ideal for NLP-focused projects. For image-based datasets, LabelMe offers a lightweight interface that works well for small or academic projects where setup speed and simplicity matter most. Lastly, CVAT and Label Studio are better suited for larger or multi-format projects. They support images, video, and point clouds, and include automation, AI-assisted labeling, and integrations with machine learning pipelines. These platforms are ideal for enterprise or research teams working across computer vision, medical imaging, or multimodal AI. If you want to experience how professional-grade open-source data labeling should feel, give CVAT’s Community edition a try today and see how it can simplify your next annotation project.
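One of the trends above, active learning and pre-labeling loops, is easier to reason about with a concrete sketch. The snippet below is a minimal, library-agnostic illustration in Python: the softmax probabilities are stand-ins for whatever model you run, and least-confidence sampling is just one of several common selection rules.

```python
# A minimal sketch of the "pick the hardest samples" idea behind
# active-learning pre-labeling loops. The probabilities array is a stand-in
# for your own model's outputs; the rule here is least-confidence sampling.
import numpy as np

def select_for_review(probabilities: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the samples the model is least confident about."""
    confidence = probabilities.max(axis=1)   # top-class probability per sample
    return np.argsort(confidence)[:budget]   # lowest confidence first

# e.g. 5 unlabeled images, 3 classes; rows are softmax outputs
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],   # hard: the model is unsure
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],   # hardest
    [0.70, 0.20, 0.10],
])
print(select_for_review(probs, budget=2))  # -> [3 1]
```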
Industry Insights & Reviews
November 6, 2025

The 6 Best Open Source Data Annotation Tools in 2026

Blog
Welcome to the October edition of the CVAT Digest, your monthly roundup of the latest features, improvements, and fixes in Computer Vision Annotation Tool (CVAT).This month’s updates make CVAT smoother for teams working with large cloud-based datasets, 3D projects, and automation workflows. Whether you’re managing storage, creating tasks, or connecting external tools via API, you’ll notice a faster, more flexible, and more secure experience across the board.NewToken‑based Authentication Across the PlatformYou can now use API access tokens everywhere:Server API supports API access tokens.CLI accepts tokens via the CVAT_ACCESS_TOKEN environment variable.SDK can authenticate with an API token via login()/make_client().Create Tasks Without LabelsSpin up tasks first and finalize taxonomy later. Great for quick intake from cloud or bulk uploads.Cloud Dataset HandlingCVAT now reliably supports related images for both 2D and 3D tasks from cloud storage, and the Dataset Manifest tool handles 3D datasets across all supported layouts.Admin Control: Max Jobs per TaskSet a maximum number of jobs allowed per task to keep workloads tidy and predictable.Configurable Disk‑Usage Health CheckTune the threshold that triggers the disk‑usage health check to better fit your environment.UpdatesSecurity & OperationsRedis upgraded to address a reported CVE.FFmpeg 8.0 is now used for modern codec support.Helm compatibility restored with Kubernetes pre‑release versions; charts now pull images from the public repo.Clarified behavior: CVAT_ALLOW_STATIC_CACHE only affects new tasks (existing tasks keep their configured chunking).SDK & CLI Quality of LifeThe SDK now automatically retries certain transient server errors.SDK supports server URLs that include ports (e.g., https://example.com:8443).Performance ImprovementsFaster task creation from the cloud, even when you don’t have a manifest.Lower memory usage when counting objects in tracks during annotation updates and analytics.Manifest Requirements & ClarityFrame width and height are now required in dataset manifests (2D & 3D).Manifests can include an optional original_name field, and error messages at task creation are clearer.Documentation around supported layouts with related images has been improved.Cleanup and DeprecationsYou can no longer upgrade directly from releases prior to v2.0.0—plan a staged path through a 2.x release.Removed legacy, non‑functional API URL signing code.Upcoming change: overly broad filtering of files that merely contain the string related_images is deprecated. CVAT will filter only actual related‑image files according to the input layout.‍FixesExport integrity: Fixed an issue where tracks could leak between jobs on export.Cloud storage workflows: Bulk delete now removes all selected storages; project/task transfers retain the correct storage reference.Related images: Detection is more reliable across all supported layouts for both 2D and 3D media.UI stability & polish: Resolved a crash when loading annotation format metadata.Fixed a sporadic UI error related to reading points.Corrected model card clipping on small displays.Organization description updates now save correctly.Server resilience: The backend starts correctly even when analytics are disabled, and a packaging quirk that produced a misleading pkg_resources error message is handled.Have suggestions or requests for what you'd like to see next? Open an issue on GitHub.
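If you script against CVAT, the SDK and CLI changes above are easy to try. Below is a minimal connection sketch with the Python SDK; it uses the long-standing username/password credentials path, and the CVAT_USER / CVAT_PASSWORD environment variables are our own placeholders. The new API-token path goes through the same login()/make_client() entry points, so check the SDK reference for the exact token argument in your version.

```python
# Minimal CVAT SDK connection sketch. CVAT_USER / CVAT_PASSWORD are
# placeholder environment variable names, not CVAT-defined ones; see the
# SDK docs for the token-based variant introduced in this release.
import os
from cvat_sdk import make_client

with make_client(
    "https://app.cvat.ai",  # a host with an explicit port also works, e.g. https://example.com:8443
    credentials=(os.environ["CVAT_USER"], os.environ["CVAT_PASSWORD"]),
) as client:
    for task in client.tasks.list():
        print(task.id, task.name)
```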
Product Updates
October 30, 2025

CVAT Digest, October 2025: Smarter Cloud Workflows and Token-Based Automation

Blog
The success of every modern computer vision system relies on one thing: data. Specifically, it relies on computer vision datasets that are well-annotated, diverse, and representative of the real world. These datasets are the fuel that drives object detection, semantic segmentation, visual recognition, and other tasks in AI.But in 2026, computer vision is entering a new stage. The rise of generative adversarial networks, synthetic data pipelines, and 3D object detection has changed how teams think about data altogether. Systems are no longer trained on simple labeled images, they now rely on dynamic, multimodal datasets that capture texture, movement, and depth.That's why choosing the right dataset has become less about quantity and more about context, structure, and how well it mirrors the world your model is meant to understand.In this guide, we’ll break down the most influential and widely used computer vision datasets in 2026. Our goal is to compare them based on format, task coverage, relevance, and how well they support emerging use cases like autonomous driving, image captioning, scene recognition, and multimodal AI so that you can make an informed decision.Criteria for Choosing a Computer Vision DatasetChoosing the right computer vision dataset isn’t just about finding the largest collection of images. It’s about aligning the dataset with your task, architecture, and domain constraints.In our opinion, there are four core factors that determine how useful a dataset will be. Let’s walk through each one so you can make confident, well-informed decisions.Scale and StructureLarge datasets are essential for training deep learning models, but volume alone isn’t enough. A high-quality dataset should include:Well-balanced class distributionClearly defined training, validation, and test setsDetailed annotations like bounding boxes, image-level labels, or segmentation masksDatasets like COCO and Open Images V7 offer strong structure and multi-label annotations, making them effective for object detection and visual recognition tasks.Diversity and RealismDiversity improves generalization, and a model trained on narrow or biased data won’t perform well in production. That’s why we suggest you look for datasets with:Variation in environments, weather, lighting, and anglesRepresentation across different demographics, geographies, and object typesRealistic examples that match your deployment settingFor example, Cityscapes is known for capturing a wide range of urban driving scenarios, making it ideal for autonomous vehicles and pedestrian detection.Use Case FitThe dataset must support your specific application. A project focused on face verification requires different annotations than one focused on optical flow or handwriting recognition.Before committing to a dataset, check:Are the right annotations included? (e.g., segmentation masks, temporal data, point clouds)Does the format align with your tooling? (COCO JSON, Pascal VOC XML, TensorFlow TFRecords, etc.)Is the level of detail sufficient for your model type?The more aligned the dataset is with your use case, the less time you’ll spend converting formats or creating custom labels.Adoption and EcosystemA well-adopted dataset benefits from mature documentation, tooling support, and community contributions. 
When a dataset is widely used, it’s easier to integrate with frameworks like YOLO. Highly adopted datasets often come with: active GitHub communities, prebuilt loaders and evaluation scripts, and long-term maintenance and version tracking. High adoption also signals trust. If other teams are using the dataset for training ML models or benchmarking Vision AI systems, it’s more likely to fit into your pipeline without friction.
Computer Vision Datasets Compared
Every dataset plays a different role in how teams build, test, and refine machine learning models. Some focus on broad image classification, while others capture depth, motion, or real-world context for 3D object detection and scene understanding. The table below gives a quick overview of each dataset’s strengths and best uses.

Dataset | Key Strengths | Best Used For
ImageNet | Over 14 million labeled images across 21,000 categories. Strong benchmark for classification and transfer learning. | Image classification, object recognition, pretraining ML models, face recognition.
COCO (Common Objects in Context) | 330K+ images with detailed bounding boxes, segmentation masks, and captions. Context-rich, multi-object scenes. | Object detection, instance segmentation, pedestrian detection, scene recognition, optical flow validation.
Open Images Dataset (by Google) | 9M+ images with 600+ categories, 15M bounding boxes, and 2.8M segmentation masks. Cloud-scale and diverse. | Large-scale model training, 3D object detection, object recognition, handwriting recognition, transfer learning.
Pascal VOC | 20-class dataset with bounding boxes and segmentation masks in VOC XML format. Simple and lightweight. | Model prototyping, educational projects, small-scale image segmentation and detection tests.
LVIS | Over 1,000 fine-grained categories with long-tail coverage and 2M+ masks. COCO-compatible JSON format. | Instance segmentation, fine-grained classification, long-tail recognition, rare-object detection.
ADE20K | 25K+ images with pixel-level annotations for 150 categories, covering both “stuff” and “object” classes. | Semantic segmentation, scene parsing, AR/VR model training, 3D face recognition, synthetic data validation (Unreal Engine).

ImageNet
ImageNet is the cornerstone of modern computer vision. Introduced in 2009 by researchers at Princeton and Stanford, it provided the foundation for nearly every major breakthrough in deep learning and visual recognition over the last decade. Containing over 14 million labeled images across more than 21,000 categories, it became the standard benchmark for training and evaluating image classification models.Data FormatEach image in ImageNet is annotated with an image-level label corresponding to a WordNet hierarchy concept. The dataset also includes bounding boxes for over one million images, allowing it to support object detection and localization tasks.
The files are typically organized by category folders, making them easily exportable for formats like COCO JSON, TensorFlow TFRecords, or Pascal VOC XML.Key FeaturesLarge-scale dataset covering diverse object categoriesHierarchical labeling system aligned with WordNetAvailability of both classification and detection subsetsSupported by nearly all modern frameworks (PyTorch, TensorFlow, MXNet)Used as a pretraining source for transfer learning in downstream tasksBest Use CasesPretraining for deep learning models in classification and object recognitionTransfer learning for custom datasets and domain adaptationBenchmarking model performance against established standardsFine-tuning tasks like scene recognition, face verification, or image captioningProsExtremely large and diverse datasetUniversally supported across frameworksStrong benchmark for visual recognition modelsEnables faster convergence during model trainingConsLacks domain-specific or multimodal annotationsSome images are outdated or low-resolutionLimited segmentation or 3D data supportLicensing restrictions for certain research usesCurrent RelevanceWhile newer datasets have emerged, ImageNet continues to hold immense value. Its influence is evident in how most Vision AI and generative model pipelines still begin with ImageNet pretraining. Even synthetic datasets are often validated against ImageNet accuracy benchmarks.ImageNet also continues to be cited in thousands of academic papers annually and has appeared in over 40,000 research papers and 250 patents, reflecting its ongoing importance across academia and industry.Even as models evolve toward multimodal and generative architectures, it continues to serve as the baseline reference for training, validation, and performance benchmarking in the field.COCO (Common Objects in Context)The COCO dataset, or Common Objects in Context, is one of the most widely used computer vision datasets for object detection, instance segmentation, and keypoint tracking. Released by Microsoft in 2014, it set a new benchmark for real-world image understanding by emphasizing the importance of context. Rather than focusing on isolated objects, COCO captures how multiple objects interact within complex scenes, making it far more representative of real-world environments.Data FormatCOCO contains over 330,000 images, with more than 1.5 million object instances labeled across 80 core categories. Each image is annotated using COCO JSON format, which supports detailed metadata including segmentation masks, keypoints, and bounding boxes. 
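To make the COCO JSON structure more tangible, here is a small reading sketch using pycocotools; the annotation file path is only an illustrative placeholder.

```python
# Reading COCO-style annotations with pycocotools (pip install pycocotools).
# The file path below is an illustrative placeholder.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")

cat_ids = coco.getCatIds(catNms=["person"])      # category IDs for "person"
img_ids = coco.getImgIds(catIds=cat_ids)         # images containing that category
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids, iscrowd=None)

for ann in coco.loadAnns(ann_ids):
    # each annotation carries a bounding box [x, y, width, height],
    # a segmentation mask, and a category_id
    print(ann["category_id"], ann["bbox"])
```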
It also includes captions and labels for image captioning and visual relationship tasks, expanding its utility beyond detection.Key FeaturesRich annotations for object detection, keypoint estimation, and segmentationContext-driven images showing multiple overlapping objectsBuilt-in captions for image captioning and visual recognition tasksFine-grained instance segmentation masksBest Use CasesObject detection and instance segmentationImage captioning and visual question answeringKeypoint estimation and human pose detectionScene recognition and relationship modelingBenchmarking performance for Vision AI and autonomous driving modelsProsHigh-quality, richly annotated datasetComprehensive support for multiple vision tasksStrong compatibility with open-source pipelines and frameworksRemains a universal benchmark across research and industryConsLimited category set compared to datasets like LVIS or Open ImagesFocuses primarily on everyday objects, lacking domain-specific scenesComputationally demanding for model training due to annotation densityCurrent RelevanceAs of 2025, COCO remains one of the most cited and actively used computer vision datasets worldwide, appearing in over 60,000 academic papers in a single year. Its structured format, visual diversity, and consistent annotation standards make it an indispensable resource for anyone developing deep learning models in vision-related tasks.From YOLO and Faster R-CNN to newer architectures like SAM and Ultralytics YOLO11, nearly every major object detection and segmentation benchmark is measured on COCO.Open Images Dataset (by Google)The Open Images Dataset, developed by Google, is one of the largest and most comprehensive computer vision datasets available today. First released in 2016 and continually expanded through multiple versions, it was designed to bridge the gap between image-level classification and fine-grained object detection, segmentation, and visual relationship understanding. Its goal was to create a dataset that could support every stage of modern computer vision development from pretraining and model validation to object recognition and scene analysis.Data FormatThe dataset contains over 9 million images, each annotated with image-level labels and, for a subset, bounding boxes and segmentation masks. It supports a wide range of file formats, including COCO JSON and TensorFlow TFRecords, making it compatible with most ML frameworks. 
The Open Images V7 release added detailed object relationships, human pose annotations, and localized narratives for image captioning.Key FeaturesOver 600 object categories with bounding boxes for 15 million objects2.8 million instance segmentation masksImage-level labels for over 19,000 visual conceptsAnnotations for object relationships and human posesPublicly hosted through Google Cloud for large-scale accessBest Use CasesLarge-scale model training for image classification and object detectionInstance segmentation and visual relationship modelingBenchmarking Vision AI or multimodal model performanceData augmentation and transfer learning across multiple visual domainsProsMassive dataset with rich, multi-level annotationsCovers a wide range of visual categories and contextsExcellent interoperability with standard formats and frameworksSupported by cloud-hosted infrastructure and community toolsConsHigh storage and computational requirementsAnnotation inconsistencies in certain object categoriesLess suitable for domain-specific or specialized use casesSome subsets require Google authentication for accessCurrent RelevanceOpen Images continues to play a crucial role for teams developing large-scale AI and Vision AI pipelines. Its scale and variety make it ideal for training deep learning models that require high visual diversity and balanced label distribution. Because it integrates both instance segmentation and image-level labeling, it remains useful for general-purpose computer vision tasks and multimodal pretraining.As of 2025, it has appeared in over three thousand research papers and remains a reference point for deep learning and computer vision research across both academia and enterprise.Pascal VOCThe Pascal Visual Object Classes (VOC) dataset is one of the earliest and most influential benchmarks in computer vision. Released between 2005 and 2012 as part of the PASCAL Visual Object Challenge, it helped standardize how researchers evaluate tasks like object detection, classification, and segmentation. Although smaller in scale compared to modern datasets like COCO or Open Images, Pascal VOC remains a cornerstone for model benchmarking and algorithm development.Data FormatPascal VOC includes roughly 20 object categories across thousands of labeled images. Each file comes with annotations in the Pascal VOC XML format, which defines bounding boxes, segmentation masks, and image-level labels.
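Because the VOC XML layout is so simple, a single annotation file can be inspected with nothing beyond Python's standard library. The sketch below assumes a typical VOC-style path, which is only an example.

```python
# Parsing one Pascal VOC XML annotation with the standard library.
# The path is an example; adjust it to your local VOC layout.
import xml.etree.ElementTree as ET

root = ET.parse("Annotations/000001.xml").getroot()
for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (
        int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")
    )
    print(f"{name}: ({xmin}, {ymin}) -> ({xmax}, {ymax})")
```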
It’s widely supported by frameworks such as TensorFlow, PyTorch, and Keras, and it remains a go-to dataset for educational and prototype-level projects due to its simplicity and accessibility.Key Features20 well-defined object classes for detection and segmentationClear annotation standards in XML formatIncludes both image classification and pixel-level segmentation tasksConsistent train, validation, and test splits for fair benchmarkingLightweight dataset size for fast experimentationBest Use CasesTraining and benchmarking small to medium-sized modelsEducational and academic computer vision researchModel prototyping and pretraining before large-scale deploymentObject detection and semantic segmentation experimentsProsEasy to download, interpret, and integrateLightweight, making it ideal for rapid testingCompatible with a wide range of frameworks and export formatsHistorically important for evaluating visual recognition systemsConsLimited scale and class diversityLacks the contextual depth of modern datasetsNo support for complex relationships or 3D objectsOutdated for large-scale model training and evaluationCurrent RelevanceDespite its age, Pascal VOC remains one of the most recognized names in the field. Its influence extends to nearly every major dataset released since, and its simple, structured annotations continue to teach new generations of data scientists the fundamentals of computer vision dataset design.It’s widely used in academic settings for introducing new architectures or validating lightweight models before scaling up to larger datasets like COCO. The Pascal VOC format also remains foundational, with many modern datasets, such as Cityscapes and Open Images that borrow its structure and export compatibility.While it may no longer set state-of-the-art benchmarks, its influence persists in transfer learning, model validation, and open-source frameworks. Many real-world projects still use Pascal VOC as a quick and reliable dataset for initial model training or small-scale proof-of-concept experiments.LVISThe Large Vocabulary Instance Segmentation (LVIS) dataset was introduced to address a critical limitation in earlier benchmarks like COCO: the lack of diversity and long-tail representation. Developed by researchers from Facebook AI Research, LVIS builds on the COCO dataset but dramatically expands the number of object categories and annotations, making it ideal for fine-grained object detection and instance segmentation.Data FormatLVIS includes over 1,000 object categories across approximately 160,000 images. Each image contains detailed instance segmentation masks, bounding boxes, and object-level annotations stored in JSON format. The dataset structure is COCO-compatible, allowing seamless use with the same APIs, frameworks, and annotation tools. 
It is also organized to capture both frequent and rare object classes, enabling balanced model training for long-tail distributions.Key FeaturesOver 1,200 object categories, from common to rare classesMore than 2 million segmentation masks with precise boundariesCompatibility with COCO APIs and annotationsInclusion of long-tail and fine-grained object categoriesDesigned to improve generalization in real-world visual recognition tasksBest Use CasesInstance segmentation and object detectionLong-tail recognition and fine-grained classificationTransfer learning and domain adaptation researchBenchmarking generalization and model robustnessProsLarge number of detailed object categoriesExcellent representation of rare and fine-grained classesCompatible with existing COCO tools and pipelinesIdeal for testing generalization and open-vocabulary modelsConsMore complex and computationally intensive than COCOImbalanced category distribution can complicate trainingLimited support for non-visual or 3D dataAnnotation errors may appear in rare classesCurrent RelevanceBy 2026, LVIS will be one of the most important datasets for training deep learning models that need to handle a wide variety of object types. It’s widely used for research in instance segmentation, open-vocabulary detection, and fine-tuning models for edge cases in autonomous vehicles, robotics, and scene understanding.LVIS’s structure also makes it particularly useful for transfer learning, as models trained on LVIS tend to perform better on datasets with rare or domain-specific objects. With the rise of open-vocabulary and multimodal AI systems, LVIS continues to be a standard dataset for evaluating how well models generalize beyond high-frequency object classes.ADE20KThe ADE20K dataset, short for the ADE (Annotated Dataset) for Scene Parsing, is one of the most comprehensive resources for semantic segmentation and scene understanding. Developed by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), it focuses on parsing complex scenes with pixel-level precision. Unlike datasets centered on individual objects, ADE20K provides a holistic understanding of both foreground and background elements within an image.Data FormatADE20K contains over 25,000 images with detailed pixel-level annotations across 150 object and stuff categories. Every image is manually annotated to include all visible objects and regions, ensuring accurate scene segmentation. The dataset is distributed in a format compatible with COCO JSON and Pascal VOC XML, making it easy to integrate with popular frameworks for training deep learning models. 
It also includes pre-defined train, validation, and test splits for reproducibility and benchmarking.Key FeaturesPixel-level semantic segmentation for 150 object categoriesCovers both “object” and “stuff” classes for complete scene parsingManually annotated by trained professionals for precisionCompatible with major frameworks like TensorFlow, PyTorch, and Detectron2Frequently used for training segmentation models such as DeepLab and PSPNetBest Use CasesSemantic segmentation and scene parsingTraining and evaluation of segmentation and panoptic modelsBenchmarking transformer-based architectures for visual understandingValidation for synthetic or domain-adapted datasetsProsHigh-quality, dense annotations with strong accuracyBalanced coverage of object and environmental categoriesMaintained by a reputable academic research groupIdeal for developing and benchmarking segmentation modelsConsLimited dataset size compared to Open Images or COCOFocuses mainly on segmentation, with no instance or 3D dataComputationally heavy due to detailed pixel-level labelingCurrent RelevanceAs of 2025, ADE20K remains a top benchmark for semantic segmentation and scene recognition research. Its fine-grained annotations make it essential for developing models that must interpret complex, multi-object environments, particularly in fields like robotics, autonomous driving, and aerial image segmentation. Even as new datasets emerge, it continues to define what “high-quality segmentation data” looks like in the era of large-scale Vision AI and multimodal learning.Examples of Use Cases for Computer Vision DatasetsEvery team uses computer vision datasets differently. For some, it’s about training models that recognize products on a shelf. For others, it’s about helping a car see the road or a robot understand its surroundings. So to shed a bit more light on how they’re used, let’s look at some of the most common and emerging applications shaping the future of computer vision today.1. Image Classification and Object RecognitionThis remains the entry point for most computer vision systems. Datasets like ImageNet, COCO, and Open Images have become industry benchmarks for training models to recognize objects, people, and scenes in real-world contexts.These datasets are ideal for applications such as:Product recognition in retail and e-commerceVisual search and image taggingQuality inspection in manufacturingFace or gesture recognition in security systemsImageNet provides broad visual diversity for general classification, while COCO adds contextual depth through overlapping objects and captions. LVIS and Pascal VOC are excellent choices for refining recognition models on more detailed or long-tail object categories.2. Object Detection and Instance SegmentationFor models that need to locate and classify multiple objects in a single frame, COCO, LVIS, and Open Images remain the gold standard. These datasets feature dense annotations, segmentation masks, and bounding boxes that teach AI to interpret complex, multi-object environments.Key applications include:Autonomous retail checkout and shelf monitoringCrowd and pedestrian detectionIndustrial defect and anomaly detectionWildlife tracking and environmental monitoringLVIS extends COCO’s capabilities with over 1,000 categories, supporting fine-grained detection and rare-object recognition, while Open Images helps scale detection to millions of diverse scenes.3. 
Scene Understanding and Semantic SegmentationWhen the goal is to help AI understand the entire scene, datasets like ADE20K and Cityscapes are indispensable. They include pixel-level labels for every region of an image, allowing models to learn spatial relationships and contextual awareness.Use cases include:Smart city infrastructure and traffic analyticsAR/VR environment mappingInterior design and robotics navigationAerial and satellite imagery analysisADE20K covers both “stuff” (background) and “object” classes for scene parsing, while Cityscapes provides high-resolution street-level data ideal for autonomous vehicles.4. 3D Object Detection and Spatial MappingDatasets like KITTI, nuScenes, and Matterport3D power AI systems that must understand depth, motion, and geometry. These are critical for self-driving cars, drones, and robots operating in 3D space.They support tasks such as:LiDAR and sensor fusion for autonomous drivingDrone-based warehouse mappingRobotics path planning and obstacle detectionDepth estimation and 3D reconstructionKITTI and nuScenes combine LiDAR, radar, and camera data to train robust perception models, while Matterport3D provides high-quality 3D scans for indoor spatial analysis.5. Medical Imaging and Healthcare AIIn medical data annotation, computer vision datasets help accelerate diagnosis and automate complex visual analysis. Datasets such as LIDC-IDRI, BraTS, and CheXpert provide expertly labeled scans across radiology and pathology disciplines.Common applications include:Tumor segmentation and lesion detection3D organ modeling and reconstructionDisease classification and triage systemsAutomated medical image review and workflow optimizationThese datasets mirror the structure of segmentation datasets like ADE20K, but focus on medical-specific modalities such as CT and MRI.What You Need to Know to Choose the Right Dataset for Your ProjectEvery great computer vision model starts with a decision: what data should it learn from? And that choice matters more than most people realize, as the dataset shapes how your model sees the world, what it pays attention to, and how well it performs once it faces the real thing.If you’re building something practical, start with the classics. ImageNet and COCO are perfect for object recognition, pedestrian detection, or face recognition projects where variety and accuracy matter. But as models grow more specialized, many teams are moving beyond general-purpose datasets to ones built for specific challenges, like Open Images V7 for large-scale training, KITTI for 3D object detection, or ADE20K for scene understanding. And for projects where no ready-made dataset quite fits, the next step is to collect and label their own data. That’s where CVAT can really make a difference.With CVAT, teams can turn raw data into structured, ready-to-train datasets tailored to their exact use case. You can upload images or videos, organize them into datasets, and apply consistent, high-quality annotations using tools like bounding boxes, polygons, segmentation masks, and keypoints. Once complete, datasets can be exported in formats compatible with TensorFlow, PyTorch, and other ML frameworks, making it easy to move from data preparation to model training without friction.If you’re ready to start building, CVAT gives you everything you need to manage, label, and refine your datasets in one place. 
Use CVAT Online if you prefer a managed cloud solution with no setup required, offering access to advanced labeling and automation features.Set up CVAT Community if you want a self-hosted, open-source version that provides full control and customization.Or, choose CVAT Enterprise if your organization needs a secure, scalable, feature-rich, self-hosted solution with professional support and tailored integrations.
Industry Insights & Reviews
October 29, 2025

The Most Popular Datasets for Computer Vision Applications in 2026

Blog
Announcing the new Ultralytics YOLO support for automatic annotation via CVAT agents. Powerful computer vision libraries such as Ultralytics YOLO, Detectron2, and MMDetection have made it easier to train high-performing models for a wide variety of tasks. However, using these models for automated annotation often requires custom code, format conversions, and one-off integrations, especially when labeling workflows span multiple tasks. As a result, many teams fall back on manual labeling because they find automation too complex to adopt at scale. Ultralytics YOLO is one of the most widely used model families in the computer vision community. Until now, CVAT included a single built-in YOLO model for auto-annotation, but expanding beyond that required manual setup. That's why we're excited to announce our new integration with Ultralytics YOLO via the CVAT AI annotation agent. Introducing the new Ultralytics YOLO and CVAT integration With this new integration, you can use native Ultralytics models (YOLOv5, YOLOv8, YOLO11) and third-party YOLO models with Ultralytics compatibility (YOLOv7, YOLOv10, etc.) for automatic image or video annotation for a wide range of computer vision tasks, including: Classification Object detection Instance segmentation Oriented object detection Pose estimation Just pick a YOLO model you want to label your dataset with, connect it to CVAT via the agent, run the agent, and get fully labeled frames or even entire datasets, complete with the right shapes and attributes, and all, in a fraction of the time. Annotation possibilities unlocked This integration opens up multiple workflow optimization and automation opportunities for ML and AI teams. Here are just a few. Pre-label data using the right model for the task Connect the YOLO models that match your annotation goals and run them sequentially to pre-label your data. Each model can be triggered individually through the CVAT interface, allowing you to generate different types of labels for the same dataset without custom scripts or external tools. This works for any YOLO model, out-of-the-box or fine-tuned. Label entire tasks in bulk Working with a large dataset? You don’t have to annotate each frame manually. Apply a YOLO model to the entire task in one step. Just open the Actions menu in your task and select Automatic annotation. CVAT will send the job to the agent and automatically annotate all frames across all jobs in a task, saving you time and reducing repetitive work. Share models across teams and projects Register a model once via a native function and agent, and make it instantly available across your organization in CVAT. Team members can use it in their own tasks without any local setup. Validate model performance on real data Test your fine-tuned YOLO model directly on annotated datasets and compare its predictions side-by-side with human labels in CVAT. Spot mismatches, edge cases, or underperforming classes, all without leaving your annotation environment. How it works Here’s what a typical YOLO auto-annotation setup via agents looks like: Step 1. Write and register the function Start by implementing a native function–a Python script that loads your YOLO model (e.g., yolov8n, yolo11m-seg) and defines how predictions will be generated and returned to CVAT. Then register this function in CVAT using the CLI. Note: You can reuse the same native function both in CLI-based annotation and agent-based mode. Step 2. Start the agent Once the function is registered, launch an agent using the CLI command. 
This starts a local service that automatically connects to your account in CVAT Online or Enterprise, and listens for annotation requests from CVAT. The agent then runs the model (inside your function), generates predictions, and sends them back to CVAT. For more in-depth information about how to set up automated data annotation with a YOLO or any custom model using a CVAT AI agent, read this article. Step 3. Create or select a task in CVAT Log into your CVAT instance and create a new task (or select an existing one). Upload your images or video, and define the labels you want to annotate (e.g., "person", "car", "helmet"). Depending on your use case, you can define different types of labels such as bounding boxes, polygons, or skeletons to match the expected output from your model. Step 4. Choose the model in the UI Once your task and the job are created and the agent is running, go to the AI Tools panel inside your job. Select the Detector tab and the YOLO model you registered earlier. Step 5. Run AI annotation on selected frames After selecting the model, CVAT sends a request to the running agent. The agent runs the model and returns predictions in the form of shapes (e.g., boxes, polygons, or keypoints), each associated with a label ID. Get started now Ready to speed up your annotation workflow with YOLO? Sign in to your CVAT Online account and try it out yourself. For more information about Ultralytics YOLO models and the tasks they support, check the Ultralytics documentation page. For more information about CVAT AI annotation agents, visit Announcing CVAT AI Agents: A New (and Better) Way to Automate Data Annotation using Your Own Models
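To make Step 1 above more concrete, here is a rough sketch of what a native function can look like when it wraps an Ultralytics detector. Treat it as an illustration rather than a drop-in file: it follows the cvat_sdk.auto_annotation interface, but exact signatures can vary between SDK versions, and the weights file name is only an example.

```python
# func.py -- a sketch of a CVAT "native function" wrapping Ultralytics YOLO.
# Illustrative only: check the cvat_sdk.auto_annotation docs for the exact
# interface in your SDK version. The weights file is an example.
import PIL.Image
import cvat_sdk.auto_annotation as cvataa
from ultralytics import YOLO

_model = YOLO("yolov8n.pt")  # any Ultralytics-compatible detection weights

# Advertise the labels this function can produce; CVAT maps them onto task labels.
spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec(name, class_id) for class_id, name in _model.names.items()],
)

def detect(context, image: PIL.Image.Image):
    """Run the detector on one frame and return rectangle shapes for CVAT."""
    results = _model(image, verbose=False)
    return [
        cvataa.rectangle(int(box.cls[0]), box.xyxy[0].tolist())
        for box in results[0].boxes
    ]
```

Once registered via the CLI, the same function can serve both one-off CLI annotation and the agent-based workflow described above.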
Product Updates
October 23, 2025

CVAT Integrates Ultralytics YOLO Models, Unlocking Scalable Auto-Annotation for ML Teams

Blog
The 10 Biggest AI & Computer Vision Conferences in 2026For business leaders, data engineers, and researchers, attending an AI or computer vision conference in 2026 should be a top priority. These gatherings showcase breakthroughs in generative AI models, deep learning, and annotation, but the biggest value of attending a computer vision or AI conference lies in their ability to connect you with industry leaders. Through expert sessions, workshops, and hands-on demonstrations, attendees get the chance to work and learn AI strategies from the top thought leaders in the space.In the next section, we break down the 10 biggest computer vision and AI conferences in 2026, giving you a roadmap to maximize your learning, networking, and innovation opportunities.‍The Top AI & Computer Vision Conferences at a GlanceWe will go over each conference in more detail later in the article. But if you want a quick overview of the 10 biggest conferences, the table below will help.‍‍The Top Computer Vision ConferencesComputer vision sits at the heart of today’s most exciting advances in artificial intelligence. From powering autonomous driving to enabling smarter recommendation engines and behavior prediction, this field is shaping how we see and interact with the world.Below, we highlight the most influential gatherings that set the direction for research and business applications alike.CVPR (Conference on Computer Vision and Pattern Recognition)CVPR is the world’s premier computer vision and artificial intelligence conference, attracting thousands of researchers, engineers, and business leaders each year.Organized by the IEEE and CVF, it serves as a launchpad for groundbreaking work in deep learning, behavior prediction, motion planning, remote sensing, and autonomous driving. The conference balances academic rigor with applied innovation, making it one of the most influential AI conferences globally.Date: June 3-7, 2026.Location: Denver, Colorado.Virtual Access: Live streamed keynotes and access to recorded tutorials, workshops, and proceedings, though not all sessions are streamed.Focus / Who Should Attend:CVPR is ideal for researchers, data engineers, and companies investing in computer vision, generative AI, and AI strategies.Attendees gain exposure to expert sessions, tutorials, and workshops covering high-performance computing, edge computing, and data analytics, with direct applications in fields like autonomous driving, network optimization, and customer experience.International Conference on Computer Vision (ICCV)ICCV is one of the most prestigious global gatherings in computer vision and machine learning, held every two years. It brings together academics, researchers, and industry innovators to present cutting-edge work on topics like deep learning models, motion planning, and autonomous driving.Date: October 2026.Location: To be announced (previous editions have rotated worldwide, including Paris, Seoul, and Venice).Virtual Access: access to live streams and replays, but paper presenters are generally required to attend in person.Focus / Who Should Attend:ICCV is best suited for researchers, PhD students, and organizations seeking to explore theoretical advancements alongside applied breakthroughs.Expect intensive paper presentations, tutorials, and workshops designed for those shaping the next generation of AI strategies.European Conference on Computer Vision (ECCV)ECCV is Europe’s flagship AI conference for computer vision, alternating years with ICCV. 
It emphasizes both foundational theory and practical applications in natural language processing, remote sensing, and data analytics.Date: September 8-13, 2026.Location: Malmö, Sweden.Virtual Access: Includes virtual passes with online session access and recordings, while accepted authors must usually register in person.Focus / Who Should Attend:ECCV is ideal for European researchers, startups, and business leaders looking to collaborate on projects that link vision with AI-driven business value. ECCV offers a strong mix of academic sessions, poster presentations, and networking opportunities.Embedded Vision SummitThe Embedded Vision Summit focuses on practical applications of computer vision and edge computing in real-world products. It highlights how vision systems are transforming industries like robotics, healthcare, and autonomous driving, with a strong emphasis on deployment at scale.Date: May 11-13, 2026.Location: Santa Clara, California (Silicon Valley).Virtual Access: Offers a virtual pass that streams keynotes, selected technical sessions, and on-demand recordings.Focus / Who Should Attend:The Embedded Vision Summit is perfect for engineers, product managers, and business leaders building vision-enabled solutions. Attendees gain insights into system design, AI strategies, and deep learning innovations that deliver measurable business value.The event also features an expo showcasing the latest hardware and software tools for accelerating development.‍The Top AI ConferencesAI conferences provide a front-row seat to breakthroughs in machine learning, generative AI, and high-performance computing, while also showcasing the latest tools and AI strategies for real-world impact.The following conferences connect industry leaders, researchers, and business leaders, offering both technical depth and strategic insight.NVIDIA GTCNVIDIA GTC is one of the most influential global events for artificial intelligence and high-performance computing. It covers everything from generative AI and deep learning to quantum computing and data engineering.With a mix of expert sessions, hands-on labs, and keynote addresses from leaders at NVIDIA and partners like Google DeepMind, the conference highlights both research breakthroughs and real-world business applications.Date: March 16-19, 2026.Location: San Jose, California.Virtual Access: Fully supports virtual attendance with live keynotes, technical talks, training, and on-demand replay access.Focus / Who Should Attend:NVIDIA GTC was designed for data engineers, developers, researchers, and business leaders seeking to translate AI into measurable business value.It’s particularly valuable for those working in cloud conferences, AI strategies, and industries pushing the limits of autonomous driving, network optimization, and scientific insights.AAAI Conference on Artificial Intelligence (AAAI)AAAI is one of the longest-running and most respected gatherings in artificial intelligence. 
It serves as a bridge between academic research and applied innovation, featuring sessions on machine learning, natural language processing, generative AI, and emerging areas like agentic AI.The conference emphasizes both technical rigor and societal impact, including ethics and public trust.Date: January 20-27, 2026.Location: Singapore.Virtual Access: Runs as a hybrid event where virtual attendees can join livestreams of talks and panels and access recordings.Focus / Who Should Attend:AAAI offers deep dives into theory, workshops on applied systems, and panels connecting industry leaders with academia to shape the future of intelligent systems. Making it best suited for researchers, graduate students, and business leaders looking to understand the future of AI strategies.Data + AI SummitThe Data + AI Summit, hosted by Databricks, is the leading event at the intersection of data engineering, machine learning, and artificial intelligence. It showcases advances in data analytics, generative AI models, and real-world use cases that demonstrate the direct business value of unified data and AI strategies.Date: June 15-18, 2026.Location: San Francisco, California.Virtual Access: Provides both in-person and virtual registration, with streaming of sessions, training, and on-demand access.Focus / Who Should Attend:The Data + AI Summit is ideal for data engineers, analysts, scientists, and business leaders who want to harness the power of AI in enterprise settings.The event combines technical workshops, expert sessions, and keynote talks, making it a must-attend for organizations aiming to scale AI strategies and drive innovation in cloud and enterprise systems.Rise of AIRise of AI is Europe’s flagship artificial intelligence conference, centered on the societal, business, and ethical dimensions of AI adoption.While it includes technical tracks on machine learning and deep learning, its main strength lies in connecting industry leaders, policymakers, and entrepreneurs to discuss the long-term impact of AI strategies on business and society.Date: May 5-6, 2026.Location: Berlin, Germany.Virtual Access: Offers hybrid participation with livestreamed talks and recordings available to virtual ticket holders.Focus / Who Should Attend:Rise of AI is a strong fit for business leaders, policymakers, and executives exploring the strategic and regulatory aspects of AI.Attendees benefit from thought-leadership keynotes, panels on public trust and governance, and networking sessions designed to link startups, enterprises, and government stakeholders.HumanXHumanX is a high-impact AI conference where enterprise leaders, innovators, and capital converge to turn AI into real outcomes. It emphasizes bridging the gap between research, product strategy, and business deployment—anchoring programming by job function rather than just technology.Date: April 6‑9, 2026.Location: San Francisco, California.Virtual Access: No.Focus / Who Should Attend:HumanX is well suited for senior executives, product leaders, AI strategists, and technical decision‑makers who aim to translate AI from concept to large-scale business impact. Sessions and experiences focus on function‑driven tracks (e.g. builders, command desk, customer engine, ecosystem) and facilitate buyer‑seller matchmaking, peer exchange, and 1:1 networkingIBM ThinkThink is IBM’s flagship technology conference, with artificial intelligence, cloud computing, and high-performance computing at its core. 
The event blends thought leadership with technical depth, highlighting use cases in generative AI, network optimization, and quantum computing.Date: May 2026.Location: Las Vegas, Nevada.Virtual Access: Provides a strong virtual option with live streams of keynotes, breakout sessions, and replay libraries.Focus / Who Should Attend:Think is ideal for enterprise executives, IT decision-makers, and technical teams. Attendees gain insights from IBM and partner experts on AI strategies, data analytics, and the Power of Networks.The event also offers hands-on labs and solution showcases for those seeking enterprise-grade applications of AI.‍What Conference Do We Suggest You Attend?Each of these conferences brings something unique, whether it’s the research depth of CVPR, the enterprise focus of Data + AI Summit and Think, or the strategic dialogue at Rise of AI. Together, they highlight how fast artificial intelligence and machine learning are moving from theory into practice, creating real business value across industries.The key takeaway? Pick conferences that match your goals. If you’re a researcher, events like ICCV or AAAI will sharpen your technical expertise. If you’re a business leader, forums such as NVIDIA GTC or Embedded Vision Summit will show you how to turn innovation into strategy. And if you’re looking for cross-disciplinary insights, gatherings like AI Con and ECCV bridge the gap between academia and applied impact.In short, the question isn’t which AI conference should you attend, it’s which one will best accelerate your growth, network, and vision for the future.
Industry Insights & Reviews
October 6, 2025

The 10 Biggest AI & Computer Vision Conferences in 2026

Blog
Artificial Intelligence is only as good as the data it learns from. In the world of machine learning, “garbage in, garbage out” isn’t just a cliché; it’s a business risk. That's because poorly labeled data doesn’t just slow down your models; it can quietly sabotage entire AI initiatives. This is where data annotation quality metrics like Precision, Recall, and Accuracy come in. Far more than academic formulas, these metrics help you avoid costly annotation errors and produce the kind of ground truth data that keeps AI models reliable at scale. For beginners, mastering these three concepts is a gateway into understanding both the performance of your models and the integrity of your training datasets. What Are Data Annotation Quality Metrics? Data annotation quality metrics provide a structured way to evaluate how well your data labeling efforts are performing. They highlight whether labels are consistent, comprehensive, and aligned with the ground truth needed for model training. For annotation teams, these metrics act as both a diagnostic tool and a roadmap for improvement, ensuring that data quality issues are identified early. Think of it like this: whenever a job boils down to making repeatable decisions with verifiable outcomes, a human effectively functions like a classifier or detection model. That means we can evaluate the quality of those decisions with model metrics: precision (how often actions are right), recall (how many true cases we catch), and accuracy (overall hit rate). Precision Precision measures the proportion of correctly labeled items among all items labeled as positive, where "positive" refers to the items that the model has predicted as belonging to the target class (e.g., spam emails, diseased patients, or images containing a specific object). It serves as a measure of label trustworthiness, making it central to quality assurance in any annotation project. It is measured through the following formula: Precision = True Positives (TP) ÷ [True Positives (TP) + False Positives (FP)] Example: Imagine you labeled 10 objects as cats, but only 8 truly are cats. Your precision is 80%, meaning 2 of your labels were false positives. Note, however, that if you label only 1 cat as a cat, your precision is 100% even if you missed the other 9 cats, which is why precision alone never tells the whole story. Why It Matters: In the annotation process, high precision shows that annotators are applying labels accurately and avoiding unnecessary noise. For tasks like autonomous vehicle perception, this translates into fewer mistakes (false positives) in detecting pedestrians, vehicles, or obstacles and greater trust in the models that guide navigation and safety decisions. Recall Recall measures the proportion of actual positive items that were correctly labeled. It reflects how complete your annotations are, ensuring important objects or signals are not overlooked during the annotation process. It is measured through the following formula: Recall = True Positives (TP) ÷ [True Positives (TP) + False Negatives (FN)] Example: Suppose there are 10 cats in an image, but you labeled only 8 of them. Your recall is 80%, meaning 2 cats were missed. Why It Matters: High recall indicates that annotators are capturing the majority of relevant items, reducing the risk of missing critical data. In fields like autonomous vehicles or sentiment analysis, missing positive cases can severely undermine model performance and introduce hidden data quality issues.
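For readers who prefer code to formulas, the cat example above translates into a few lines of Python; the numbers mirror the scenario described in the text.

```python
# The cat-labeling example from above, expressed as code.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

# You drew 10 "cat" labels; 8 are real cats (TP) and 2 are not (FP).
# The image actually contains 10 cats, so 2 real cats were missed (FN).
tp, fp, fn = 8, 2, 2
print(f"precision = {precision(tp, fp):.0%}")  # 80%
print(f"recall    = {recall(tp, fn):.0%}")     # 80%
```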
Accuracy Accuracy measures the overall proportion of correctly labeled items, including both true positives and true negatives, across the entire dataset. A true negative refers to an item that is not the target class and is correctly left unlabeled. For example, if you are labeling cats, a dog that remains unlabeled counts as a true negative. Overall, accuracy provides a broad view of data annotation quality but can sometimes mask problems when working with imbalanced classes. It is measured through the following formula: Accuracy = (True Positives (TP) + True Negatives (TN)) ÷ (TP + TN + False Positives (FP) + False Negatives (FN)) Example: If you label 100 items and 90 of them are correct, whether labeled or unlabeled, your accuracy is 90%. However, this might hide issues. For instance, in object detection, a model might miss 10 out of 100 objects (10 false negatives) while also mislabeling another 10 regions (10 false positives); a headline accuracy of around 90% can hide both kinds of error, and the single number doesn’t reveal which one is hurting you. Why It Matters: Accuracy gives a quick, high-level snapshot of annotation performance, but it can be misleading if one class dominates the dataset. In highly imbalanced scenarios, like when one class intentionally makes up 99% of examples, skipping annotations for rare classes won’t drastically hurt overall accuracy, yet the resulting dataset can be highly unreliable. To get a clearer picture, accuracy should be paired with class-level metrics like precision and recall for each category. Connecting the Metrics Precision, Recall, and Accuracy each highlight different aspects of data annotation quality. Precision emphasizes trustworthiness, showing whether labels avoid unnecessary false positives. Recall emphasizes completeness, ensuring important objects or signals are not missed. Accuracy offers a big-picture view, showing the overall correctness of annotations. Used together, these metrics provide a balanced perspective on how well your annotation process is performing and create the foundation for annotation QA, linking day-to-day labeling decisions to long-term model performance. Why These Metrics Matter for Annotation QA Formulas alone don’t capture the real value of Precision, Recall, and Accuracy. Their importance lies in how they connect day-to-day labeling decisions with long-term model reliability. Direct Impact on Model Performance Annotation quality metrics directly determine whether an AI system performs reliably in production. Precision, Recall, and Accuracy quantify how annotation decisions ripple through the entire ML pipeline, exposing weaknesses before they undermine results. When these metrics are ignored, the consequences quickly surface in real-world applications: Low Precision → Too many false positives. In retail and ecommerce, this could mean mistakes in product classification, shelf monitoring, logo detection, and attribute tagging for catalogs. Low Recall → Missed signals. In autonomous vehicles, failing to detect pedestrians or traffic signs poses serious safety risks. Misleading Accuracy → Inflated performance. In sentiment analysis, accuracy may appear high simply because neutral reviews dominate, while misclassified positives and negatives go unnoticed. By monitoring these metrics, data scientists and annotation teams can identify weaknesses in the annotation workflow early.
This ensures training data does not silently introduce bias, inflate the error rate, or degrade model performance. Consistency and Objectivity High-quality data annotation is not just about labeling correctly; it’s about labeling consistently across annotation teams, projects, and time. Without consistency, even well-labeled data can become unreliable, creating hidden biases that degrade model performance. Precision, Recall, and Accuracy provide the objective lens needed to measure and maintain this consistency. These metrics reduce reliance on subjective reviewer opinions and bring structure to quality assurance. Key ways these metrics improve consistency include: Standardizing Quality Checks → Metrics provide a common benchmark for all annotators, ensuring alignment with the intended annotation scheme. Reducing Subjectivity → Instead of relying on gut feeling, teams can use numbers to decide whether an annotation meets the gold standard. Comparing Performance → Metrics reveal differences in accuracy between annotators or teams, highlighting where additional QA steps or manual inspections are needed. Enabling Reproducibility → With consistent measurement, results can be validated across projects, supporting reproducibility checklists and reducing the risk of data drift. By embedding these metrics into the annotation workflow, organizations gain both transparency and control, making data annotation quality measurable, repeatable, and scalable. Ground Truth Validation Even with strong annotation processes, you need a reliable way to measure new data against a gold standard dataset. This is where Precision, Recall, and Accuracy become essential. They provide the framework for comparing fresh annotations to trusted ground truth labels, ensuring that new data is not drifting away from established quality benchmarks. Ground truth validation acts as a safeguard, keeping the annotation process aligned with project goals. Key benefits of applying metrics to ground truth validation include: Error Detection → Quickly identifies false positives and false negatives in new annotations. Confidence Intervals → Provides reliable quality estimates, ensuring that data annotations meet required sample sizes for validation. Bias Monitoring → Highlights systematic issues, such as ambiguous annotation guidelines or skill gaps among annotators. Long-Term Quality Control → Detects data drift by checking if new labels remain consistent with established ground truth data. Specification Validation → Reveals gaps or ambiguities in the data annotation specification itself, helping teams improve guidelines and reduce repeated errors. By embedding ground truth validation into the annotation workflow, teams transform QA from a one-time checkpoint into an ongoing quality management process that scales with industry-level AI applications. How CVAT Online & Enterprise Helps You Measure These Metrics Both CVAT’s SaaS and On-prem editions elevate annotation QA from manual guesswork to a streamlined, metrics-driven workflow. Built with high-scale, mission-critical applications in mind, it empowers teams with automated tools, modular validation options, and analytics that map directly to precision, recall, and accuracy. Consensus-Based Annotation CVAT Enterprise enables the creation of Consensus Replica jobs, where the same data segment is annotated by multiple people independently. These replica jobs function just like standard annotation tasks: they can be assigned, annotated, imported, and exported separately. 
Why it Matters: Reduces Bias & Noise: By merging multiple perspectives, consensus helps eliminate outlier annotations and noise. Improves Ground Truth Reliability: Ideal for validating especially important samples within your dataset. Cost-Efficient Quality Control: Achieve robust ground truth with minimal additional annotation workload. Consensus matters because high-quality machine learning models require trusted data to benchmark against. When ground truth annotations are unavailable or too costly to produce manually, consensus provides a practical path forward. By merging multiple annotations with majority voting, CVAT creates reliable ground truth that strengthens precision, recall, and accuracy scores in automated QA pipelines. Learn more about consensus-based annotation here. Automated QA Using Ground Truth & Honeypots CVAT Enterprise enables automated quality assurance through two complementary validation modes: Ground Truth jobs and Honeypots. Ground Truth jobs are carefully curated, manually validated annotation sets that serve as reliable reference standards for measuring accuracy. Honeypots build on this by embedding Ground Truth frames into regular annotation workflows without the annotator’s awareness, enabling ongoing quality checks across the pipeline. The process of using this in CVAT is simple and scalable: A Ground Truth job is created within a task, annotated carefully, and marked as “accepted” and “completed,” after which CVAT uses it as the quality benchmark. In Honeypot mode, CVAT automatically mixes validation frames into annotation jobs at task creation, allowing continuous and unobtrusive QA sampling. By comparing annotator labels against the trusted Ground Truth, CVAT calculates precision, recall, and accuracy scores automatically. This ensures quality is monitored in real time while reducing the need for manual spot checks, giving teams confidence in both speed and consistency at scale. Learn more about automated QA with ground truth and honeypots here. Manual QA and Review Mode CVAT also includes a specialized Review mode designed to streamline annotation validation by allowing reviewers to pinpoint and document errors directly within the annotation interface. Why It Matters: Focused Error Identification: The streamlined interface ensures that annotation flaws aren’t overlooked amidst complex tools. Structured Feedback Loop: QA findings are clearly documented and addressed systematically through issue assignment and resolution. Improved Annotation Reliability: By resolving human-reported mistakes, your dataset’s precision, recall, and overall trustworthiness are actively enhanced. This mode elevates quality assurance beyond metrics, adding a human-in-the-loop layer for nuanced judgment. Learn more about Manual QA and Review Mode in CVAT Enterprise here. Summary & Takeaways Precision, Recall, and Accuracy are more than academic metrics, they are the backbone of annotation quality assurance. Together, they provide a balanced view of trustworthiness, completeness, and overall correctness in labeled data. High annotation quality directly translates into more reliable, fair, and high-performing AI systems, reducing costly errors and mitigating risks like data drift. If you want to track and measure annotation quality at scale, CVAT provides the consensus workflows, automated QA, and analytics dashboards to make it happen. Try it for free today and see how we make training data more trustworthy, annotation teams more efficient, and AI models more reliable.
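To illustrate the majority-voting idea behind consensus in a tool-agnostic way, here is a minimal sketch that merges per-frame class labels from several annotators and flags frames without a clear majority for manual review. It is a simplified stand-in for the merging described above, not a description of CVAT's internal algorithm.

from collections import Counter

def merge_by_majority(annotator_labels):
    # annotator_labels: list of dicts, each mapping frame_id -> class label for one annotator.
    merged = {}
    all_frames = set().union(*(labels.keys() for labels in annotator_labels))
    for frame_id in sorted(all_frames):
        votes = Counter(labels[frame_id] for labels in annotator_labels if frame_id in labels)
        label, count = votes.most_common(1)[0]
        # Accept the label only when a strict majority of annotators agree;
        # otherwise mark the frame as needing manual review.
        merged[frame_id] = label if count > len(annotator_labels) / 2 else None
    return merged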
Annotation 101
September 12, 2025

Is Your Training Data Trustworthy? How to Use Precision & Recall for Annotation QA

Blog
Far too often, AI and ML projects begin with a model-first mindset. Teams pour talent and compute into tuning architectures and experimenting with deep learning techniques, while treating the dataset as fixed. But this quickly leads to a painful realization: all the time spent and all the GPUs used to build the model are often wasted, because model performance is determined less by design and more by the quality of the labels. This is where data-centric AI changes the conversation. Instead of assuming data is static, it treats annotation quality (consistent & accurate labels), dataset creation, and ongoing curation as the real drivers of reliable outcomes. Data-Centric AI vs. Model-Centric AI: Understanding the Difference AI has long been built on a model-centric foundation in which researchers optimized architectures and fine-tuned parameters while treating datasets as static. That mindset produced progress in controlled benchmarks, but it often fails in production. What is Model-Centric AI? Model-centric AI is the traditional way many teams have tried to improve artificial intelligence systems. Progress in this model-centric era was possible largely because massive datasets like ImageNet became available, enabling deep learning models to achieve breakthroughs despite imperfections in the data. Some common traits of a model-centric approach include: Focusing on algorithm tweaks, not annotation quality Relying on compute power rather than improving training data Producing models that look strong in testing but fail under distribution shift One key thing to note in model-centric AI is that data is treated as fixed, so whatever annotations exist are simply taken as-is. The downside of this approach is that real-world datasets are rarely clean. In computer vision applications, for example, label errors, class imbalance, and noise in the dataset often cause failures. A good illustration of these failures comes from medical image classification. In datasets like OrganAMNIST and PathMNIST, researchers found that mislabeled images and class imbalance significantly lowered accuracy. This resulted in the model failing to distinguish between known conditions, which is especially harmful in healthcare settings. Because of that, a shift is underway to a data-centric approach, which prioritizes annotation quality and dataset curation to provide more sustainable results. What is Data-Centric AI? Data-centric AI flips the priority: instead of pushing models harder, it focuses on improving the dataset itself. In data-centric AI, datasets are never fixed. Labels are audited, refined, and expanded to capture real-world variation. Subtle issues, like a scratch on a phone screen or a flaw in a medical device, are annotated precisely so they are not missed in deployment. The advantage is that models trained on curated, high-quality data perform more reliably. They continue to work when real-world data looks different from the training data, reduce false rejections in inspection tasks, and achieve higher accuracy even in smaller data regimes.
Some common traits of a data-centric approach include: Prioritizing annotation accuracy and consistency across annotators Using active learning to surface uncertain samples for review Applying confident learning to detect and correct mislabeled data Treating data curation and dataset creation as ongoing, iterative processes Comparing datasets over time to measure progress in quality In short, data-centric AI does not replace models but strengthens them. By prioritizing annotation quality instead of model performance, organizations create a stronger foundation for their AI models. We recommend watching this video from Andrej Karpathy for added information. Why Label Quality Often Beats Model Complexity The contrast between model-centric and data-centric AI makes one thing clear: model performance ultimately depends on the quality of the labels it learns from. This is because every AI system is only as strong as the data it learns from. While deep learning and advanced model architecture get attention, the reality is simple: if the training data is flawed, performance will suffer. High-quality AI models depend on accurate labels and the quality of the images themselves. Poor image resolution, inconsistent samples, or unrepresentative datasets can weaken even the strongest models. At CVAT, we focus on delivering precise annotations, while our partners like FiftyOne help teams select and curate the right data, ensuring that both images and labels contribute to stronger, more reliable AI systems. Garbage Labels = Garbage Predictions When annotations are incorrect, the model has no way to understand it is incorrect, so mislabeled or inconsistent data becomes baked into the predictions. This means a model trained on poor labels doesn’t just perform badly, it scales those mistakes across every deployment. In computer vision tasks, these consequences are costly. For example, a mislabeled defect in a medical device inspection dataset can lead to false rejections, while confusing scratches with cracks in consumer electronics creates unreliable quality checks. This is why no amount of deep learning complexity or hyperparameter tuning can undo bad data. Poor labels also increase the risk of overfitting, where a model learns noise instead of useful patterns. High-Quality Labels Enable Reliable Feedback Loops If bad labels lock errors into every prediction, high-quality annotations do the opposite. They create a foundation for models to improve. When labels are consistent and precise, they allow models to signal where they are uncertain, enabling teams to act on that feedback. This is where feedback loops become powerful, as accurate datasets make it possible to: Run active learning to identify ambiguous samples for re-labeling Apply confident learning to detect hidden annotation mistakes Each cycle of annotation, model training, and review builds stronger data assets, which means that instead of scaling mistakes, teams scale improvements. The result is an AI system that adapts better to distribution shifts, reduces false rejections in inspection tasks, and delivers more trustworthy outcomes in production. Practical Steps to Improve Labeled Data Quality If poor labels create unreliable models and high-quality labels enable feedback loops, the obvious question is, “How can you improve your data label quality?” Well, it often comes down to an ongoing process of review, refinement, and curation. To get started, here are three practical steps any team can take. 
1 - Use Active Learning to Target Low-Confidence Labels If annotation errors are the root of unreliable AI, the solution lies in focused review. The key here is targeting the data labels most likely to contain errors. This may sound like a challenging manual process, but with CVAT, teams can integrate active learning to automatically surface low-confidence samples for re-labeling. CVAT also takes this one step further by: Flagging low-confidence samples generated by model outputs or heuristic checks Providing “Immediate job feedback” via validation-set (ground truth) jobs to catch errors early. These practices help teams surface and correct mistakes efficiently, making human effort count where it matters most (a minimal sketch of confidence-based sample selection follows at the end of this article). 2 - Balance and Expand The Dataset Even perfectly labeled data loses value if one category dominates. This is because class imbalance creates blind spots that models can’t recover from. CVAT helps by making it straightforward to ingest new data into active projects. You can import full datasets or annotation files, including images, videos, and project assets, directly in the UI, then continue labeling within the same workspace. If your imbalance involves spatial understanding, CVAT also supports 3D point cloud annotation with cuboids, so you can increase coverage for classes that are underrepresented in 3D scenarios like robotics or logistics. 3 - Track Progress Through Accurate Analytics Improving annotation quality only matters if you can measure it. Without reliable metrics, teams can’t see whether label consistency is improving, class balance is shifting, or error rates are dropping. This lack of visibility makes it difficult to justify investments in annotation or to understand why models still fail in production. Thankfully, CVAT provides built-in analytics and automated QA to make progress transparent and actionable. These include: Ground truth jobs and auto-QA tools to compare annotations against reference subsets and catch errors early Analytics dashboards to monitor annotation speed, class distribution, and annotator agreement over time By embedding measurement into everyday workflows, CVAT turns datasets into evolving assets. Data-Centric AI Helps You Get the Most from Your AI Investments The reality of artificial intelligence is clear: most failures don’t come from model design, they come from inaccurate data. And while chasing improvements through a model-centric approach of grid search, hyperparameter tuning, and ever-larger architectures may boost results on benchmarks, it rarely solves real-world problems caused by poor label quality. This is why a data-centric approach is more sustainable. By investing in annotation quality, ongoing data management, and techniques like data augmentation or active learning, teams strengthen their foundation and avoid scaling mistakes into production. With CVAT, you can put a data-centric approach into practice. From annotation to data management and quality measurement, CVAT gives your team the tools to fix label errors, improve datasets, and build AI systems that actually perform in the real world. Want to see how CVAT Works? Try CVAT Online, CVAT Enterprise, and the CVAT Community version now.
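As a concrete companion to step 1 above, here is a minimal, framework-agnostic sketch of confidence-based sample selection: predictions from your current model are ranked by confidence and the least certain ones are queued for human review. The threshold, budget, and input format are illustrative assumptions, not CVAT functionality.

def select_for_relabeling(predictions, threshold=0.5, budget=100):
    # predictions: iterable of (sample_id, confidence) pairs from your current model.
    # Returns the lowest-confidence sample ids, capped at the review budget.
    uncertain = sorted(
        (confidence, sample_id)
        for sample_id, confidence in predictions
        if confidence < threshold          # only items the model is unsure about
    )
    return [sample_id for _, sample_id in uncertain[:budget]]

# Example: queue these ids for review in your annotation tool.
to_review = select_for_relabeling(
    [("img_001.jpg", 0.93), ("img_002.jpg", 0.41), ("img_003.jpg", 0.18)]
)
# -> ["img_003.jpg", "img_002.jpg"]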
Annotation 101
September 8, 2025

Why Data-Centric AI Leads to Better Results Than Model-Centric AI

Blog
TL;DR One-step user off-boarding: remove a user with all their data via a new Django CLI command. Smarter label schema checks: the Raw Labels Editor now blocks invalid configs before they hit the server. Smoother reviews: Issue dialogs auto-reposition to stay fully visible near frame edges. Cleaner APIs: deleted_frames in job meta now only reports frames from the current job segment. Lower backend load: preview requests for Projects/Tasks/Jobs are sent sequentially to avoid spikes. Added Delete a user together with all their resources (admin CLI) Context: Adds a safe server-side flow to remove a user and all associated data in one go. Impact: Admins can fully deprovision accounts without manual cleanup. Built-in validation prevents unsafe deletions (e.g., community users still in orgs with other members; SaaS users with active subscriptions). Example: python manage.py deleteuser <user_id> Changed Raw Labels Editor: stronger validation Context: UI now catches malformed/unsupported label configurations earlier. Impact: Fewer bad payloads reaching the backend; clearer inline feedback for editors. No action needed unless your current config is invalid. Preview requests sent sequentially to reduce server load Context: Fetching previews for Projects/Tasks/Jobs could spike concurrency. Impact: More stable performance under load; slightly lower peak concurrency for preview calls. Fixed Issue dialog never opens off-screen Context: In review mode, dialogs near frame corners could render outside the viewport. Impact: Dialogs now auto-reposition and remain fully visible. No more zooming out to find them. deleted_frames in job meta constrained to the job segment Context: API could return deleted_frames outside the job’s segment in /jobs/{id}/data/meta. Impact: Client code relying on this field now receives only relevant frames. If you worked around the previous behavior, you can simplify your client logic. Notes: Includes a test for video tasks with frame step > 1.
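If your client code consumes the deleted_frames field, a request along the following lines is one way to read it after this change. The sketch uses basic HTTP authentication and the standard /api prefix; check the exact response fields and auth method against your instance's API schema, as only deleted_frames is taken from the note above.

import requests

def get_deleted_frames(base_url, job_id, username, password):
    # Fetch the job's media metadata; after this change, deleted_frames
    # only lists frames that belong to the job's own segment.
    response = requests.get(
        f"{base_url}/api/jobs/{job_id}/data/meta",
        auth=(username, password),
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("deleted_frames", [])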
Product Updates
September 1, 2025

CVAT Digest, August 2025: One-Step Account Cleanup, Smarter Labels, and More

Blog
In the world of machine learning and neural networks, annotations are more than labels. They are the ground truth that shapes how models learn, adapt, and perform. For example, in an autonomous driving system, correctly labeling trees and other road hazards helps the neural network distinguish between safe and unsafe obstacles, allowing it to make accurate, split-second decisions when navigating complex traffic scenarios. At their core, high-quality annotations guide the learning algorithm, reduce bias, and improve inference. And with the right annotation strategies and smart tools like CVAT to streamline and refine the process, teams can create training datasets that lead to better model performance. Why High-Quality Data is Critical for Model Training Annotation High-quality annotations are essential because they directly shape how a machine learning model understands its training data. Without precise and consistent labels, even the most advanced architecture, whether it is a convolutional neural network for image recognition or an object detection model for computer vision, will misinterpret patterns. This leads to lower accuracy, poor inference, and unreliable results when the model is deployed. Impact on Model Training When annotations are precise, consistent, and aligned with the project’s objectives, the model can learn the correct patterns and relationships within the dataset. This results in stronger generalization, faster convergence, and higher performance in production. However, when annotations are flawed, the model’s ability to interpret data is compromised. This is where the Garbage In, Garbage Out (GIGO) principle comes into play. No matter how sophisticated the architecture or training algorithm, if the labeled data fed into the model is inaccurate or inconsistent, the output will be equally unreliable. Poor annotations can lead to overfitting, underfitting, or skewed model bias, all of which degrade predictive performance. Key effects of annotation quality on model training include: The accuracy of learned patterns and feature recognition Model confidence scores during inference Overall training time and convergence speed Increasing or reducing the risk of overfitting to mislabeled data Enhancing or limiting the model’s ability to generalize to unseen scenarios Ultimately, investing in high-quality annotations is the foundation for a model’s ability to deliver reliable, actionable insights. Without this, even large datasets and advanced algorithms will fail to produce dependable results; for example, models trained on the refined COCO-ReM annotations converge faster and score higher than those trained on the original COCO labels. Examples of Errors from Poor Annotation In machine learning, even small annotation mistakes in the training dataset can have significant downstream effects on an ML model’s behavior. During data preparation, these errors often remain hidden, only to surface during the inference stage when the machine learning algorithm must evaluate and predict in real-world conditions. For instance, in medical classification tasks, an incorrectly annotated MRI scan, such as tagging a benign tumor as malignant, could trigger unnecessary treatment, harming the patient and burdening the healthcare system the model is deployed in. Other impacts of annotation errors include: Model confusion on edge cases – Neural network architectures misinterpret visually similar features or overlapping classes.
Reduced confidence scores – The training model produces low model confidence scores during evaluation, weakening performance metrics. Propagation of bias – Systematic labeling errors introduce biases into the learning algorithm and affect correlation patterns. Misclassification in critical applications – Incorrect predictions in safety-critical domains such as predictive maintenance, financial fraud detection, or medical diagnostics. Poor generalization – The ML model fails to adapt to new datasets, validation sets, or unseen scenarios in production. Addressing these risks requires rigorous validation systems, manual review of annotations, and re-trainings supported by quality-controlled data pipelines. This can be done through platforms like CVAT, which have built-in audit trails and annotation consistency checks and which can be integrated with TensorFlow or PyTorch workflows to ensure the training set remains accurate, helping the learning algorithm reach optimal performance in both classification and regression tasks. The Role of High-Quality Data in Neural Network Training Neural networks are only as strong as the data that teaches them. In machine learning, annotations transform raw data into meaningful training signals, giving the learning algorithm the context it needs to recognize patterns, classify inputs, and make accurate predictions. Annotation as a Source of Truth In supervised learning, high-quality annotations serve as the ground truth that a machine learning algorithm depends on to learn from its training set. Whether the task involves image recognition, binary classification, or regression, accurate labels guide the training process, allowing the model to identify features, optimize hyperparameters, and improve predictive performance. When your annotated data is accurate, the outputs are more reliable, the loss function converges faster, and performance metrics improve in both training and validation sets. Key benefits of treating annotations as a source of truth include: Providing a reliable foundation for ML model training and re-trainings Reducing confusion in decision trees, random forest, and neural network architectures Improving generalization in deep learning models, from CNNs to large language models Supporting effective cross-validation for more robust predictions Enabling reproducible workflows for distributed training across GPUs and cloud platforms By incorporating manual review, annotation guidelines, and consistent quality checks into the workflow, teams can ensure annotations remain accurate over time. Bias and Noise Reduction Even the most advanced deep learning architectures can fail if the training dataset is filled with bias or noise. Poorly annotated data skews the learning algorithm’s understanding of correlations, causing systematic errors that can harm both model accuracy and customer trust. High-quality annotation reduces these risks by ensuring consistency across the training set and minimizing human error in the labeling process. Whether using supervised learning for classification or unsupervised learning methods like k-means clustering, accurate labels help the model predict with greater confidence and adapt to varied test data in production systems. 
Here are some ways that precise annotations help reduce bias and noise: Maintaining consistent class definitions across the entire training dataset Eliminating mislabeling that introduces false correlations into the regression model or clustering algorithms Preventing performance degradation in models used for critical tasks like predictive maintenance or spam detection Improving gradient descent convergence by reducing variability in the training process Supporting re-trainings and model evaluation cycles that catch emerging biases early By combining automated machine learning pipelines with manual review, federated data strategies, and scalable annotation tools, teams can deliver ground truth data that is free from systemic bias. How High-Quality Annotation Workflows Are Built — and How CVAT Helps High-quality annotation workflows don’t happen by accident. They are the result of structured processes, clear guidelines, and reliable tools. CVAT supports these workflows by providing a collaboration platform that enables accuracy, scalability, and quality control for every stage of the machine learning data labeling process. Redundancy and Consensus for Reliability Using multiple annotators per task helps achieve consensus on labels, reducing the likelihood of errors in the training dataset. CVAT allows for configurable redundancy so ML model training benefits from diverse perspectives while maintaining accuracy through agreement-based labeling. The benefits of this include: Fewer mislabeled examples in the training set Stronger ground truth accuracy for supervised learning Reduced bias in machine learning model outputs Annotation Consistency Over Time Consistency ensures that features are labeled the same way across the training set. CVAT supports predefined label sets (such as object categories like car, pedestrian, traffic light) and annotation guidelines (which define how these labels should be applied, including attributes or tagging rules). These guidelines make it easier for distributed teams to maintain uniformity in supervised learning workflows. Built-In Quality Control CVAT provides several automated QA and control mechanisms that help catch annotation errors early and streamline the training process. These include: Ground Truth (GT) jobs: A curated validation set is used as a benchmark, allowing statistical evaluation of annotation quality across the dataset. Honeypots: Hidden validation frames are randomly inserted into jobs, helping monitor annotation accuracy in real time without alerting annotators. Immediate Job Feedback: Once a job is completed, CVAT automatically evaluates it against GT or honeypots and shows the annotator a quality score with the option to correct mistakes immediately. Traceability and Audit Trails Traceability is critical for large-scale datasets, which is why CVAT has an analytics page for all your project data, ensuring teams can track events, annotations and more. This transparency is essential for model evaluation, regulatory compliance, and maintaining customer trust. Flexible Workflows for Diverse Data From image recognition to 3D sensor data, CVAT adapts to different data types and annotation styles. Flexible task management, support for multiple annotation formats, and integration with ML pipelines make it suitable for varied AI and deep learning applications. Best Practices for Ensuring High-Quality Data The structured workflows above provide the foundation for producing reliable training data at scale. 
But building a strong annotation pipeline is only part of the equation. Maintaining quality over time requires discipline, clear standards, and continuous evaluation. By following these proven best practices, teams can ensure that every dataset, whether used for image recognition or predictive modeling, remains accurate, consistent, and free from bias. Follow Annotation Guidelines Clear, documented annotation guidelines are essential for ensuring that every label in a training set is applied consistently, regardless of who is doing the work. Without them, inconsistencies can creep in, creating noise in the training data and reducing the accuracy of the machine learning model. How to implement clear annotation guidelines: Define each label class with clear descriptions and examples. Specify rules for handling edge cases and overlapping classes. Document attributes that need to be captured (for example, color, orientation, or object state). Ensure guidelines are regularly updated and accessible to all annotators. For a deeper breakdown of what makes strong labeling specifications, see CVAT’s guide on creating data labeling specifications. Conduct Annotation Review Cycles Even experienced annotators make mistakes, which is why conducting annotation review cycles is so critical. Review cycles help catch errors early, before flawed data is used in ML model training. How to conduct annotation review cycles: Schedule periodic reviews of completed annotations. Use CVAT’s Immediate Job Feedback to automatically evaluate annotations against ground truth or honeypots and provide annotators with instant quality scores. Assign multiple reviewers for critical or complex datasets. Use feedback loops to train annotators on corrections. CVAT’s built-in review mode allows reviewers to approve, reject, or edit annotations in real time. Task assignment tools ensure the right people review the right data, while commenting features make feedback easy to share. Perform Continuous Model Evaluation A dataset is never truly “finished.” Models improve when their training data is updated and re-evaluated. Continuous model evaluation measures whether annotation improvements are actually boosting accuracy, reducing loss, and improving performance metrics (a short sketch of this kind of ground-truth comparison appears below). How to perform continuous model evaluation: Benchmark model performance before and after annotation updates. Track changes in accuracy, precision, and recall over time. Re-train models when new patterns or edge cases are identified. CVAT makes model evaluation easy. We integrate with ML pipelines so teams can quickly export labeled datasets, test them in TensorFlow or PyTorch, and compare results. Plus, our version control ensures teams can roll back to previous annotations and measure improvement with confidence. The Data-Driven Path to Smarter Annotations Adopting best practices like clear guidelines, structured review cycles, and continuous model evaluation helps you build a feedback-driven workflow where data constantly informs better decisions. CVAT makes this easier by giving organizations the tools to embed their standards into the workflow, collaborate effectively across teams, and integrate seamlessly with existing ML pipelines. Because in a world where machine learning models are judged by their ability to perform in real-world scenarios, data labeling quality is non-negotiable. The more accurate your ground truth, the more reliable your predictions, and the faster your AI projects deliver real value.
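Review cycles and continuous evaluation both come down to comparing annotations against a trusted reference. As a rough, tool-agnostic illustration (not CVAT's internal quality algorithm), the sketch below matches predicted boxes to ground-truth boxes by IoU and tallies true positives, false positives, and false negatives for one image; the (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions.

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def compare_to_ground_truth(predicted, ground_truth, iou_threshold=0.5):
    # Greedy one-to-one matching; returns (tp, fp, fn) counts for one image.
    unmatched_gt = list(ground_truth)
    tp = 0
    for box in predicted:
        best = max(unmatched_gt, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            tp += 1
            unmatched_gt.remove(best)
    fp = len(predicted) - tp
    fn = len(unmatched_gt)
    return tp, fp, fn

From these counts, per-image or per-dataset precision and recall follow directly.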
Don’t let poor annotations hold your models back. Join the thousands of teams using CVAT to build better datasets, streamline labeling, reduce errors, and deliver higher-performing AI faster. Get started now.
Industry Insights & Reviews
August 22, 2025

How Data and Annotation Quality Improves ML Model Training

Blog
In early June, Business Insider and other outlets reported that Scale AI had left at least 85 Google Docs publicly accessible, which exposed thousands of pages of confidential AI (Artificial Intelligence) and ML (Machine Learning) project materials tied to clients like Meta, Google, and xAI. These documents included internal training guidelines, proprietary prompts, audio examples labeled "confidential," and even contractor performance data and private email addresses. The leak wasn’t just an operational lapse, it highlighted a growing risk that AI and ML teams can no longer ignore. When confidential data is exposed, it can compromise AI model integrity, violate data agreements, and erode your competitive advantage. Scale AI’s Leak Is a Cautionary Tale for the Industry The Scale AI breach serves as a stark warning for enterprises across the AI and ML landscape. When a third-party vendor entrusted with high-value training data leaves sensitive documents publicly accessible, it reveals systemic lapses in access control and data security. While no malicious breach occurred, the scope of exposure points to deep systemic risks that extend far beyond simple oversight. Four Key Risks Surfaced From the Leak: Confidential Client Data Exposure: Documents related to AI model training for clients like xAI and Google (Bard) were accessible to anyone with the link. These files detailed labeling instructions and dataset structures, exposing sensitive technical workflows and proprietary methodologies. Private Contractor Data Leaked: Spreadsheets with names, performance metrics, and work history of global annotation contractors were included in the leak. This not only violates privacy laws like GDPR but also risks long-term reputational harm and a breakdown of trust with the human workforce underpinning AI development. Editable Files Created Tampering Risks: Several documents were not only viewable but editable. This opened the door to potential sabotage, from altering instructions to inserting malicious data or deleting critical content, all without any authentication barrier. IP Leakage from Client Datasets: When proprietary data structures, labeling schemas, and annotation logic become public, they reveal how a client frames its machine learning problems. This can offer competitors rare insight into AI training projects, domain assumptions, and even model behavior. In short, this incident highlights a glaring truth: as AI systems scale, so do the stakes of even minor security lapses. Meta–Scale AI: When Your Labeling Vendor Becomes Your Competitor The story doesn’t stop there, though. On June 12, 2025, Meta announced a $14.3 billion investment in Scale AI that sent shockwaves through the AI industry, not just for its size, but for what it implies. By taking a 49% stake in one of the most widely used data labeling vendors, Meta didn’t just acquire a service provider. It embedded itself deep within the data supply chains of competing AI labs and enterprise teams. This immediately raised uncomfortable questions. If you previously entrusted Scale with sensitive internal data, how confident are you that none of that institutional knowledge, annotation strategy, or model-adjacent metadata could now inform Meta’s roadmap? Even with supposed conflict-of-interest firewalls in place, the optics are difficult to ignore. What happens when your annotation pipeline is owned, in part, by the company you're trying to out-innovate?
Clients like Google, OpenAI, and xAI reportedly began distancing themselves from Scale AI within days of the deal’s announcement. That reaction speaks volumes. And for many other enterprise AI leaders, this is a moment to re-evaluate who they trust at the most sensitive layers of their model development process. Why Most Annotation Pipelines Are a Breach Waiting to Happen The Scale AI leak revealed just how easily annotation workflows can become a liability when basic security principles are overlooked. Despite handling sensitive datasets and proprietary model inputs, most annotation workflows remain poorly secured. Here are some of the most common and critical vulnerabilities. Orphaned Credentials from Ex-Contractors The use of rotating contractor pools is standard in annotation, but many teams fail to offboard users properly. Scale AI’s leak reportedly included internal documentation still accessible long after project completion, a sign of credential sprawl and poor deactivation hygiene. Common failure points include: No automated credential expiration or deactivation Shared logins reused across projects Lack of audit trails to flag old or unused accounts Inadequate Identity Verification Across Roles and Regions In many global labeling operations, the pressure to scale quickly and cut costs often comes at the expense of trust and traceability. Identity verification is frequently treated as an afterthought, with teams relying on manual invites or generic user accounts to onboard annotators. For example, Scale AI’s documents were accessible through public links, some editable by anyone, which clearly highlights the lack of verified, accountable user access. This leads to several security gaps, including: Weak or absent identity checks before access is granted No multi-factor authentication Inability to map user actions to verified individuals No Fine-Grained Project-Level Access Without project-level controls, users may see more than they should. The Scale AI leak showed internal materials were accessible by too many annotators, contractors, and managers. This lack of compartmentalization creates several common security gaps, including: Broad access across unrelated client projects Inability to isolate users to specific datasets No role-based constraints on actions like exporting or editing Why Treating Annotation Like “Just Labeling” Is a Mistake Annotation is often seen as a routine step in the ML pipeline, but the Scale AI leak shattered that assumption. It showed that the labeling layer is not just vulnerable, but deeply entangled with a company’s intellectual property, strategic intent, and model logic. And when organizations treat annotation as a disposable service, they often neglect essential safeguards around identity verification, access control, and data governance. This mindset is risky. In Scale AI’s case, loosely controlled labeling environments reportedly left project-specific instructions, confidential prompts, and internal guidelines open to the public. That level of exposure can’t be undone, and the consequences extend far beyond a single document. For enterprise AI teams, the message is clear. Labeling workflows must be treated like any production-critical environment: monitored, locked down, and governed by principle, not convenience. Anything less invites unnecessary risk. What Enterprise ML Leaders Should Do to Protect Themselves The Scale AI leak made one thing painfully clear: the weakest part of your AI pipeline can compromise everything else. 
For enterprise ML teams who want to prioritize data security, that means treating the annotation environment as a high-risk, high-value component of the stack. So how can you prevent your own data from becoming the next headline? It starts with a security-first approach grounded in the following enterprise principles: Centralized Identity Management One of the clearest failures in the Scale AI leak was access control. Documents were left open, sometimes editable, with no clear identity attribution. In an ideal environment, every annotator, reviewer, and admin should authenticate through a centralized identity provider. This reduces credential sprawl, enables automatic deactivation, and ensures access can be instantly revoked when roles change or contracts end. Role-Based Access Controls (RBAC) The Scale AI leak exposed files including sensitive labeling guidelines and client-specific project data, material that should never be visible across teams or contractors. RBAC enforces boundaries by ensuring users only access the specific projects, tools, and data required for their role. This limits unnecessary exposure and contains potential damage if a breach occurs. Auditable Activity Every action within the annotation environment should be logged and traceable. Without audit trails, you can't know when a breach occurred or who was responsible. In the Scale AI case, the trail of accountability was murky at best. Enterprise annotation environments must track all user activity, from file access to export actions, to support compliance, detect threats, and respond quickly to any incident. These security pillars are not optional for enterprise-grade AI projects. They’re the baseline for responsible, resilient machine learning pipelines, especially as the value and sensitivity of training data continue to rise. How CVAT Enterprise Supports Secure Annotation at Scale If you want to protect yourself, CVAT Enterprise is the perfect option. The first critical advantage of CVAT Enterprise is that it removes the risk of vendor lock-in. If a supplier disappears or changes its terms, customers do not have to search for a new data annotation platform. The reason? Around 90% of CVAT’s features are available in the open-source version under the permissive MIT license, ensuring long-term access and control. Plus, for CVAT Enterprise customers, even private modules are fully inspectable. This transparency allows organizations to verify code security, meet internal compliance requirements, and maintain complete control over their data workflows. Getting started with CVAT is equally simple. There’s no need to contact procurement teams or wait for a sales process to run its course. Companies can download and test the open-source version immediately, evaluate its capabilities, and determine if it meets their needs. Beyond the advantages listed above, CVAT Enterprise offers numerous security features to meet the demanding operational needs of modern AI and ML teams. Scalable Identity Management CVAT integrates with enterprise-grade identity systems including SSO, LDAP, and SAML. This allows teams to manage access through their existing identity infrastructure, eliminating siloed logins and reducing the risk of outdated or orphaned accounts. It also enables consistent enforcement of security policies such as password strength, session limits, and multi-factor authentication across the entire organization.
Granular Role-Based Access Control (RBAC) With CVAT’s fine-grained RBAC and group-level permissions, organizations can tailor access by role, team, and project. Internal staff, external contractors, and QA reviewers can each be granted exactly the level of access they require. This limits the spread of sensitive information and protects high-value datasets from accidental exposure or unauthorized use. Flexible, Secure Deployment Options CVAT supports both on-premise and air-gapped deployments, giving security-conscious teams complete control over their infrastructure. For organizations operating in regulated industries or with strict data residency requirements, this means training data can remain fully contained within internal networks and compliance zones. These features make CVAT not just a powerful annotation tool, but a secure foundation for enterprise-scale AI development. Recommended Actions to Protect Your Annotation Pipeline The Scale AI leak is a warning about the fragility of unsecured data privacy in ML workflows. Prudent enterprise AI teams must take these proactive steps to secure their annotation environments before exposure turns into damage. Audit Your Annotation Pipeline The first step to securing your pipeline is performing a comprehensive review of your current workflows, tools, and access points. Understand where your vulnerabilities lie and address them systematically. Key steps include: Take inventory of all annotation platforms, datasets, and projects in use List all active and inactive users, including third-party contractors Identify accounts with excessive or outdated access rights Map how data flows between teams, tools, and storage systems Mandate SSO and IAM Integration Next, you need to tightly control who can access your annotation systems by enforcing centralized identity management. This ensures consistent access policies and faster response to personnel changes. Recommended actions: Require SSO, LDAP, or SAML for all annotation tools Disable platforms that do not support enterprise IAM Integrate access provisioning with your IT team’s existing workflows Automatically revoke credentials upon contract termination or offboarding Treat Annotation Like a Production System As you move forward, it’s a good idea to treat annotation environments similar to production environments, as they both contain sensitive data that powers production models. This data must be secured, monitored, and governed accordingly. The best practices for this include: Enable full activity logging and keep detailed audit trails Monitor for anomalies such as unusual login times or data exports Enforce role-based access and restrict permissions by project Conduct periodic access reviews and compliance checks These steps aren’t just for damage control, they are core to building resilient, secure, and trustworthy AI systems. Training Smarter Means Securing Sooner For AI and ML leaders, the lesson is clear: waiting until a breach occurs is too late. If your labeling workflows are open, unmonitored, or loosely governed, then your entire AI system is vulnerable. So don’t wait until another data breach occurs, now is the time to act. That means rethinking how you manage, secure, and monitor every part of the annotation process. This is where CVAT Enterprise comes in. As an enterprise-ready platform, CVAT gives teams the tools they need to protect high-value training data without slowing down the annotation process. 
With support for centralized identity, role-based permissions, and secure deployment options, CVAT Enterprise helps organizations label smarter and safer. Don’t wait for your training data to become tomorrow’s headline. Secure your annotation workflow today with CVAT Enterprise.
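As a small, purely illustrative companion to the audit checklist above: if your annotation platform can export a user list, a few lines of scripting are enough to flag accounts that have gone unused and should be reviewed or deactivated. The CSV columns below are hypothetical, so adapt them to whatever your platform actually exports.

import csv
from datetime import datetime, timedelta, timezone

def find_stale_accounts(path, max_idle_days=90):
    # Flag accounts whose last login is missing or older than max_idle_days.
    # Assumes a hypothetical export with "username" and "last_login" (YYYY-MM-DD) columns.
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_idle_days)
    stale = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            last_login = row.get("last_login", "").strip()
            if not last_login:
                stale.append(row["username"])
            elif datetime.strptime(last_login, "%Y-%m-%d").replace(tzinfo=timezone.utc) < cutoff:
                stale.append(row["username"])
    return stale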
Industry Insights & Reviews
August 13, 2025

What ML & AI Teams Should Learn from the Scale AI Data Leak

Blog
Summer is in full bloom, and we haven’t slowed down. Here’s a quick roundup of our team’s July deliveries across CVAT Online, Enterprise, and Community platforms, and more. CVAT Academy After months of preparation, we’re excited to launch CVAT Academy! CVAT Academy is a hands-on online course to help you and your team master key annotation tools and techniques in CVAT, from your first bounding box to advanced workflows. We’ve published the first module on core CVAT annotation tools like bounding boxes, polygons, and AI tools. You can watch them on the CVAT Academy page or YouTube and leave your likes and comments. Your feedback is valued, and we’ve already made some tweaks based on user input. The best part: after talking to our customers and soon-to-be annotators, we decided to make this course completely free! So, whether you want to improve your annotation skills or onboard new members faster, you can do it at no cost with CVAT Academy. New features (Releases 2.41 + 2.42) SAM2 Object Tracking via AI Agents (CVAT Online) CVAT Online users can now automatically track shapes across video frames with SAM2 or any other tracking model, custom or pre-trained, using AI agents. Read more Improved Navigation on Listing Pages (All Platforms) On listing pages like Tasks, Jobs, Projects, and Models, users can choose how many items to display per page: 10, 20, 50, or 100, and quickly jump to a specific page by entering its number. These updates enhance browsing large datasets and make navigation more efficient. Quick Edit from List Views (All Platforms) We’ve added a dynamic Edit option for Task, Job, and Project list pages. Users can now update key fields directly from the list view without opening each item: Assignee can be changed for tasks, jobs, and projects. State and Stage can be updated for jobs. This simplifies routine updates and reduces extra clicks. Other Changes & Fixes (All Platforms) Enabled multi-threaded image downloading from cloud storage when preparing chunks, enhancing performance. Resolved COCO keypoints export issues with absent keypoints. Updated the organization Actions menu to match the style of other menus. Enabled shortcuts configuration in tag annotation mode. Ensured “Automatically go to the next frame” setting applies correctly when adding the first tag in the tag annotation workspace. Improved performance of GET /api/jobs/<id>/annotations and GET /api/tasks/<id>/annotations with many tracks containing mutable attributes. Enforced email verification when using Basic HTTP authentication, if ACCOUNT_EMAIL_VERIFICATION is set to mandatory, preventing unverified access. Updated Python runtime for the Segment Anything Nuclio function from 3.8 to 3.10 to ensure compatibility and support. API: Deprecated legacy token authentication. Updated API token and session/CSRF token auth schemas for simplified and more secure access. The PATCH and PUT endpoints at /api/tasks/<id>/annotations and /api/jobs/<id>/annotations now enforce that annotation IDs are present when updating and absent when creating. This improves consistency and prevents mismatched operations. Updated API schema for session authentication. To get into all the nitty-gritty details, visit our GitHub changelog.
Product Updates
August 1, 2025

CVAT Digest, July 2025: CVAT Academy, SAM2 Tracking via AI Agents, and More

Blog
SAM2 Object Tracking Comes to CVAT Online Through AI Agent Integration Previously on this blog, we described the use of the Segment Anything Model 2 (SAM2) for quickly annotating videos by tracking shapes from an initial frame. However, this feature was limited to self-hosted CVAT Enterprise deployments. We have also covered using arbitrary AI models via agents and auto-annotation functions to annotate a CVAT task from scratch. Today we’ll talk about a new CVAT feature that combines the benefits of the two approaches: tracking support in auto-annotation (AA) functions. This enables each user of CVAT Online to make use of an arbitrary tracking AI model by writing a small wrapper (AA function) around it, and running a worker process (AI agent) on their hardware to handle requests. In addition, we have implemented a ready-to-use AA function based on SAM2, so that users who want to make use of that particular model can skip the first step and just run an agent. In this article we will explain how to use the SAM2-based AA function, as well as walk through some of the implementation details. Quick start Let’s get started. You will need: Installed Python (3.10 or a later version) and Git. An account at either CVAT Online or an instance of CVAT Enterprise version 2.42.0 or later. First, clone the CVAT source repository into some directory on your machine. We’ll call this directory <CVAT_DIR>: git clone https://github.com/cvat-ai/cvat.git <CVAT_DIR> Next, install the Python packages for CVAT CLI, SAM2 and Hugging Face Hub: pip install cvat-cli -r <CVAT_DIR>/ai-models/tracker/sam2/requirements.txt If you have issues with installing SAM2, note that the SAM2 install instructions contain solutions to some common problems. Next, register the SAM2 function with CVAT and run an agent for it: cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \ function create-native "SAM2" \ --function-file=<CVAT_DIR>/ai-models/tracker/sam2/func.py -p model_id=str:<MODEL_ID> cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \ function run-agent <FUNCTION_ID> \ --function-file=<CVAT_DIR>/ai-models/tracker/sam2/func.py -p model_id=str:<MODEL_ID> where: <CVAT_BASE_URL> is the URL of the CVAT instance you want to use (such as https://app.cvat.ai). <USERNAME> and <PASSWORD> are your CVAT credentials. <FUNCTION_ID> is the number output by the function create-native command. <MODEL_ID> is one of the SAM2 model IDs from Hugging Face Hub, such as facebook/sam2.1-hiera-tiny. Optionally: Add -p device=str:cuda to the second command to run the model on your NVIDIA GPU. By default, the model will run on the CPU. Add --org <ORG_SLUG> to both commands to share the function with your organization. <ORG_SLUG> must be the short name of the organization; it is the name displayed under your username when you switch to the organization in the CVAT UI. The last command should stay running, indicating that the agent is listening to annotation requests from the server. This completes the setup steps. Now you can try the function in action: Open the CVAT UI. Create a new CVAT task or open an existing one. The task must be created either from a video file or from a video-like sequence of images (all images having the same dimensions). Open one of the jobs from the task. Draw a mask or polygon shape around an object. Right-click the shape, open the action menu and choose “Run annotation action”. Choose “AI Tracker: SAM2” in the window that appears. 
Enter the number of the last frame that you want to track the object to and press Run. Wait for the annotation process to complete. Examine the subsequent frames. You should now see a mask/polygon drawn around the same object on every frame up to the one you selected in the previous step. Instead of selecting an individual shape, you can also track every mask & polygon on the current frame by opening the menu in the top left corner and selecting “Run actions”. Implementation Now let’s take a peek behind the curtains and see how the SAM2 tracking function works. This will be useful if you need to troubleshoot, or if you want to implement a tracking function of your own. Unfortunately, the source of the module is too long to explain in its entirety in this article, but we’ll cover the overall structure and key implementation features. First, let’s look at the top-level structure of func.py:

@dataclasses.dataclass(frozen=True, kw_only=True)
class _PreprocessedImage:
    ...

@dataclasses.dataclass(kw_only=True)
class _TrackingState:
    ...

class _Sam2Tracker:
    ...

create = _Sam2Tracker

Since we wanted to support multiple model variants, as well as multiple devices, with a single implementation, we did not place the function’s required attributes directly in the module. Instead, we put them inside a class, _Sam2Tracker, which we want to be instantiated by the CLI with the parameters passed via the -p option. To tell the CLI which class to instantiate, we alias the name create to our class. There are also two auxiliary dataclasses, _PreprocessedImage and _TrackingState. These are not part of the tracking function interface, but an implementation detail. We will see their purpose later. Let’s now zoom in on _Sam2Tracker. __init__ and spec Similar to detection functions that we’ve covered before, in the constructor we load the underlying model (SAM2VideoPredictor). We also create the PyTorch device object and create an input transform.

def __init__(self, model_id: str, device: str = "cpu") -> None:
    self._device = torch.device(device)
    self._predictor = SAM2VideoPredictor.from_pretrained(model_id, device=self._device)
    self._transform = torchvision.transforms.Compose([...])

Also similar to detection functions, our tracker must define a spec, although it has to be of type TrackingFunctionSpec:

spec = cvataa.TrackingFunctionSpec(supported_shape_types=["mask", "polygon"])

In a tracking function, the spec describes which shape types the function is able to track. However, the other attributes of _Sam2Tracker are entirely unlike those of detection functions. On a high level, a tracking function must analyze an image with a shape on it, then predict the location of that shape on other images. However, to allow more efficient tracking of multiple shapes per image, as well as to enable interactive usage, this functionality is split across three methods. preprocess_image

def preprocess_image(
    self, context: cvataa.TrackingFunctionContext, image: PIL.Image.Image
) -> _PreprocessedImage:
    image = image.convert("RGB")
    image_tensor = self._transform(image).unsqueeze(0).to(device=self._device)
    backbone_out = self._predictor.forward_image(image_tensor)
    ...
    return _PreprocessedImage(
        original_width=image.width,
        original_height=image.height,
        vision_feats=...(... backbone_out ...),
        ...,
    )

This method is supposed to perform any processing that the function can do without knowing the details of the shape it’s tracking. In this way, the results can be reused for multiple shapes.
In our case, the underlying model has a dedicated method for doing this, so we transform our input image, and pass it to this method. We then return all information we’ll need later as a new instance of our class _PreprocessedImage. The agent does not care what type of object is returned by preprocess_image - it just saves that object so it can pass it to the other methods. Speaking of which… init_tracking_state def init_tracking_state( self, context: cvataa.TrackingFunctionShapeContext, pp_image: _PreprocessedImage, shape: cvataa.TrackableShape, ) -> _TrackingState: mask = torch.from_numpy(self._shape_to_mask(pp_image, shape)) resized_mask = ...(... mask ...) current_out = self._call_predictor(pp_image=pp_image, mask_inputs=resized_mask, ...) return _TrackingState( frame_idx=0, predictor_outputs={"cond_frame_outputs": {0: current_out}, ...}, ) def _call_predictor(self, *, pp_image: _PreprocessedImage, frame_idx: int, **kwargs) -> dict: out = self._predictor.track_step( current_vision_feats=pp_image.vision_feats, frame_idx=frame_idx, ... **kwargs, ) return ...(... out ...) This method is supposed to analyze the shape on the initial frame. Here we convert the input shape to a mask tensor (for brevity we’ll omit the definition of _shape_to_mask here), and then pass it, alongside the preprocessed image, to the underlying model (via a small wrapping function). The method then encapsulates all information that will be needed to track the shape on subsequent frames in a new _TrackingState object and returns it. Much like preprocess_image, the agent doesn’t care what type of object the method returns, so the tracking function can choose the type in order to best suit its own needs. The agent will simply pass this object into our final method… track def track( self, context: cvataa.TrackingFunctionShapeContext, pp_image: _PreprocessedImage, state: _TrackingState ) -> cvataa.TrackableShape: state.frame_idx += 1 current_out = self._call_predictor( pp_image=pp_image, frame_idx=state.frame_idx, output_dict=state.predictor_outputs, ... ) non_cond_frame_outputs = state.predictor_outputs["non_cond_frame_outputs"] non_cond_frame_outputs[state.frame_idx] = current_out ... output_mask = ...(... current_out["pred_masks"] ...) if output_mask.any(): return self._mask_to_shape(context, output_mask.cpu()) else: return None This method is supposed to locate the shape being tracked on another frame. Here we pass data from the state object and the preprocessed image to the model and get a mask back. If the mask has any pixels set, we return it as a TrackableShape object. The _mask_to_shape method (whose definition we’ll omit) will convert the mask to a shape of the same type as the original shape passed to init_tracking_state. If the mask is all zeros, we presume that we lost track of the shape, and return None. The model also returns additional data that can be used to better track the shape on subsequent frames. track adds it to the tracking state, as can be seen with the non_cond_frame_outputs update. This way, future calls to track are able to make use of this data. Agent behavior Now that we’ve examined the purpose of each method, we can see how they all fit together by looking at the tracking process from the agent’s perspective. Let’s say an agent has loaded tracking function F, and a user makes a request for shape S0 from image I0 to be tracked to images I1, I2, and I3. 
In this case, the agent will make the following calls to the tracking function: STATE = F.init_tracking_state(SC, F.preprocess_image(C, I0), S0) S1 = F.track(SC, F.preprocess_image(C, I1), STATE) S2 = F.track(SC, F.preprocess_image(C, I2), STATE) S3 = F.track(SC, F.preprocess_image(C, I3), STATE) It will then return resulting shapes S1, S2, and S3 to CVAT. Here C and SC are context objects, created by the agent. For more information on these, please refer to the reference documentation. Limitations There are a few things to keep in mind when using tracking functions (SAM2 included). First, agents currently keep the tracking states in their memory. This means that: Only one agent can be run at a time for any given tracking function. If you run more than one agent for a function, users may see random failures as agents try to complete requests referencing some other agent’s tracking states. If the agent crashes or is shut down, all tracking states are destroyed. If this happens while a user is tracking a shape, the process will fail. Second, tracking functions can only be used via agents. There is no equivalent of the cvat-cli task auto-annotate command. Third, tracking functions may be used either via an annotation action (as was shown in the quick start), or via the AI Tools dialog (accessible via the sidebar). However, the latter method only works with tracking functions that support rectangles - other functions will not be selectable. Fourth, skeletons cannot currently be tracked. Conclusion Tracking with SAM2 saves significant time compared to manually annotating each frame. If you are a user of CVAT Online, this feature is now available to you - sign in and try it out! If there is another model you’d like to use for tracking, you can likely do that as well, as long as you implement the corresponding auto-annotation function. For more details on that, refer to the reference documentation: https://docs.cvat.ai/docs/api_sdk/sdk/auto-annotation/ https://docs.cvat.ai/docs/api_sdk/cli/#examples---functions For more information on other capabilities of AA functions and AI agents, see our previous articles on the topic: https://www.cvat.ai/resources/blog/an-introduction-to-automated-data-annotation-with-cvat-ai-cli https://www.cvat.ai/resources/blog/announcing-cvat-ai-agents https://www.cvat.ai/resources/blog/cvat-ai-agents-update
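To make the three-method interface above more tangible, here is a minimal, do-nothing tracking function: it satisfies the same contract as func.py but simply re-emits the initial shape on every frame. Treat it as a sketch only; the type names are taken from the snippets above, while the import path and other details are assumptions to verify against the auto-annotation reference documentation.

# identity_tracker.py: a minimal sketch of a custom tracking AA function.
# It follows the three-method interface described above but does no real
# tracking: it simply re-emits the initial shape on every subsequent frame.
import dataclasses

import PIL.Image

import cvat_sdk.auto_annotation as cvataa


@dataclasses.dataclass(kw_only=True)
class _State:
    # The shape captured on the initial frame. A real tracker would also keep
    # model-specific data here, as _TrackingState does in func.py.
    shape: cvataa.TrackableShape


class _IdentityTracker:
    # Declare which shape types this function can track.
    spec = cvataa.TrackingFunctionSpec(supported_shape_types=["mask", "polygon"])

    def preprocess_image(
        self, context: cvataa.TrackingFunctionContext, image: PIL.Image.Image
    ) -> None:
        # Nothing to precompute for this toy tracker; a real implementation
        # would run the image through its model backbone here.
        return None

    def init_tracking_state(
        self,
        context: cvataa.TrackingFunctionShapeContext,
        pp_image: None,
        shape: cvataa.TrackableShape,
    ) -> _State:
        # Remember the shape from the initial frame.
        return _State(shape=shape)

    def track(
        self,
        context: cvataa.TrackingFunctionShapeContext,
        pp_image: None,
        state: _State,
    ) -> cvataa.TrackableShape:
        # Pretend the object never moves. Return None instead if the object
        # should be considered lost.
        return state.shape


# The CLI instantiates whatever is bound to the name "create".
create = _IdentityTracker

If your SDK version matches what this article describes, such a file could be registered and served the same way as the SAM2 function, i.e. with function create-native and function run-agent, passing --function-file=identity_tracker.py and no -p parameters.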
Product Updates
July 31, 2025

SAM2 Object Tracking Comes to CVAT Online Through AI Agent Integration

Blog
CVAT began as an open-source program at Intel in 2017 to accelerate the annotation of digital images and videos for training computer vision algorithms. Back then, our mission was to help internal data science teams obtain new annotated data to train deep neural networks. Little did we know that, eight years later, we’d be helping hundreds of thousands of teams build better AI models. Three years ago, we spun out into an independent company to build on our leadership position in visual data annotation for computer vision and machine learning applications. Since then, we’ve achieved some impressive feats as a team and as a business. One of them is our vibrant developer community on GitHub, which has helped shape CVAT’s roadmap and core architecture in powerful ways. To celebrate our three-year anniversary and 14,000 stars on the GitHub repository that started it all, we’re sharing 14+ key (and fun) milestones that brought us here: Milestone #1: January 2017 Intel software engineers Nikita Manovich (Team Lead of Data Infrastructure) and Andrey Zhavoronkov (Software Engineer) begin developing an internal annotation tool by enhancing the VATIC tool. They add image annotation, attribute support, and redesign the client-server architecture. Milestone #2: June 2018 The team decides to go open-source. Since the team also supported OpenCV, a popular computer vision library, they released the first version of CVAT (Computer Vision Annotation Tool), 0.1.0, on GitHub under the same organization. Because of that, the project quickly gained traction. Milestone #3: December 2019 Intel officially announces CVAT as its new open-source initiative to streamline digital image and video annotation for computer vision use cases. Milestone #4: September 2020 The CVAT open-source community grows rapidly, contributing new features, fixes, and integrations. We also launch cvat.org (no longer available) – a free online server for annotating data without installing CVAT locally. Milestone #5: April 2021 The GitHub project becomes one of the most-starred in its category, reaching 5,000 stars by late 2021. CVAT is also included in the GitHub Arctic Code Vault, preserving our code in a long-term Arctic archive of notable open-source projects. Milestone #6: February 2022 CVAT partners with HUMAN Protocol, a decentralized labor marketplace, to allow customers to scale annotation workflows with on-demand, crowdsourced annotators. Milestone #7: July 2022 CVAT officially spins out from Intel and becomes an independent company. Nikita Manovich and Boris Sekachev, who joined Intel as an intern during summer of 2017, become co-founders, ready to take CVAT to the next level. Milestone #8: December 2022 We launch CVAT Online—a cloud-hosted version of our open-source platform, giving teams and individuals access to scalable, collaborative annotation workflows without needing to self-host. Milestone #9: August 2023 CVAT Online reaches 50,000 users. We also sign our first enterprise contract, laying the foundation for CVAT Enterprise—a commercial version of our self-hosted annotation platform with support for automation, team workflows, and enterprise-grade integrations. Milestone #10: September 2023 We reach 10,000 stars on GitHub and secure our first paid labeling contract, launching our Labeling Services business – a natural extension of our mission to provide end-to-end annotation solutions. 
Milestone #11: February 2024 CVAT joins Google Summer of Code 2024, enabling students and new contributors around the world to work on real-world annotation challenges. Milestone #12: April 2024 We introduce annual plans for CVAT Online, helping our customers save up to 30% on premium features. Transparent, flexible pricing becomes a core part of our user experience, especially for teams with long-term annotation pipelines. Milestone #13: May 2024 CVAT’s labeling services scale to a global workforce of several hundred annotators. Our clients now include Fortune 100 companies across retail, logistics, robotics, and more. We’re also honored to be recognized as a top-choice annotation tool at Embedded Vision Summit 2024. Milestone #14: October 2024 We introduce SAM2-powered video tracking in CVAT Enterprise, enabling video annotation to be completed up to 10x faster than before – a massive leap in productivity for teams working with surveillance, autonomous driving, and motion datasets. Milestone #15: January 2025 AI agents come to CVAT. Customers can now integrate their own detection and segmentation models to automate labeling across CVAT Online and CVAT Enterprise. Milestone #16: June 2025 CVAT Analytics expands: teams can now track annotation efficiency, reviewer throughput, rework rates, and annotator performance trends in real time. This helps customers make data-driven decisions about scaling their workforce and improving dataset quality over time. What’s Next? As more companies embed AI into their operations, from warehouses and self-driving fleets to retail shelves and hospitals, supervised learning will remain the backbone of innovation. And supervised learning requires high-quality, structured annotated data. In our fourth year as an independent company, we’re more focused than ever on giving teams everything they need to produce that data fast, accurately, and at scale. We’re not just building an annotation platform. We’re building the foundation for better AI. Thank you for being part of the journey!
Company News
July 24, 2025

CVAT Celebrates 14K Stars on GitHub and Its Three-Year Anniversary!

Blog
The June edition of the CVAT Monthly Digest is here. We are happy to keep you updated with the latest improvements, fixes, and new features across both the SaaS and self-hosted versions of CVAT. What's new?
Status Page for CVAT Online
CVAT Status: Now you can check whether the CVAT Online platform is up and running in real time at https://status.cvat.ai/.
Self-Hosted Enhancements (CVAT Community, Enterprise)
Configurable Cache Size Limit: You can now define a maximum size for cached data to avoid oversized data chunks. This gives you more control over your server resources.
Grafana Username Filtering: Dashboards just got more intuitive. It's now possible to filter by usernames, not just internal user IDs, which makes monitoring and debugging much more user-friendly.
User Activity Tracking (CVAT Online, Community, Enterprise)
CVAT now records the last activity date for each user (updated daily).
Command-Line Interface Update
Clearer Auto-Annotation Errors: If a spec attribute is missing during auto-annotation, you'll now receive a clear, helpful error message so you can fix the issue quickly.
SDK Updates
New decode_mask Function: This handy addition lets you generate a bitmap from a mask's points array.
Improved encode_mask: You can now use this function without needing to define a bounding box, making it more flexible.
Other Improvements (All Versions)
Zoom Behavior: Navigation in the annotation view has been improved for both touchpad and mouse users; enjoy smoother and more responsive zooming.
Kvrocks Auto Compaction: CVAT now automatically schedules compaction to remove outdated data from disk, helping your system stay efficient.
Nuclio Functions: We've fixed an issue where shapes from previous frames were incorrectly passed during tracking. Now, tracking starts fresh from the current frame.
Annotation Input Validation: Endpoints that accept annotations now validate the shape data format to prevent issues during import.
File Import Checks (TUS Protocol): Filename validations have been added during imports for better reliability.
Job Frame Input Field: This field now automatically adjusts to match the maximum frame number, improving usability during annotation.
TUS Metadata Storage: Only declared fields are now saved—no more clutter from unnecessary data.
Grafana & Helm Fixes: We've resolved an issue that prevented connections to ClickHouse from Grafana when using Helm charts.
Lambda Request Performance: The GET /api/lambda/requests endpoint now performs much better and puts significantly less strain on your database.
Reduced Database Load: Dataset export is now much lighter on your database.
Small Fixes (All Versions)
Page Size Selector: This now works correctly on the organization page.
Webhook Setup UI: The project field width has been adjusted for better visibility.
Project Reports: These now reuse existing task quality data when available—saving time and resources.
3D Data Export: Exporting 3D data for projects now functions properly.
New in the Docs
SSO Documentation: We added a new article about CVAT integration with SSO providers: https://docs.cvat.ai/docs/enterprise/sso/.
We hope these updates help make your experience more seamless and productive. As always, your feedback is very valuable and drives our roadmap. If you have suggestions or run into friction, let us know through the usual channels. You can read the full changelog here: https://github.com/cvat-ai/cvat/releases
Product Updates
June 30, 2025

CVAT Digest, June 2025: Online Status Page, SDK & CLI Upgrades, and Self-Hosted Performance Boosts

Blog
Whether it's a small university research project or a large enterprise initiative, project owners often face similar challenges. They need to maintain consistent quality, track team productivity effectively, and avoid extra costs — no matter what tools they use. CVAT addresses these challenges by providing clear, detailed, and easy-to-understand analytics that include all the necessary metrics for annotation projects, tasks, and jobs. This allows managers to effortlessly monitor progress and pinpoint productivity bottlenecks, making annotation workflows smoother and more efficient. But what exactly does CVAT Analytics offer, how do you access Analytics data, and how can you practically use it in your projects? In this article, you'll discover how CVAT Analytics helps you approach these challenges by providing practical tools and actionable insights.
What is CVAT Analytics?
CVAT Analytics provides insightful metrics for project managers and annotation teams to monitor and improve their annotation workflows. The following types of metrics can be tracked:
Working time: See exactly how much time annotators spend on tasks.
Time allocation for job stages: Track how long each stage of annotation takes, helping identify slow stages.
Total objects annotated: Keep accurate counts of annotated objects to evaluate productivity.
Annotation speed: Monitor the pace at which annotations are completed and identify efficient annotators or potential issues.
Annotation activity and label usage: Gain insights into how labels are being used and into annotation patterns.
Accessing CVAT Analytics
Analytics is available only to users with paid CVAT Online plans or with CVAT Enterprise. The level of access users have depends on their role within CVAT:
Owners and Maintainers: Can access analytics for all projects, tasks, and annotation jobs across their workspace. For example, a project owner can review metrics of all team activities to estimate overall productivity and resource allocation.
Supervisors: Can access analytics data only for the projects, tasks, and jobs they have visibility over. For instance, a supervisor overseeing two specific tasks can see analytics for those tasks but not for other, unrelated projects.
Workers: Have access only to analytics related to tasks and jobs assigned directly to them. For example, an annotator will see metrics for their assigned job, allowing them to track their own productivity and performance.
Navigating CVAT Analytics
To access analytics data in CVAT, navigate to the overview page where all your projects, tasks, or jobs are listed. Find the specific project, task, or job for which you need analytics data, click the three-dot menu next to it, and select "Analytics." When you open the Analytics page for the first time, no data is displayed immediately. You'll need to click the "Request" button to load the data. This will gather analytics for the selected item and any associated tasks or jobs. Whenever updated analytics are needed, simply click the "Request report update" button to refresh the metrics.
Understanding the Analytics Page
The Analytics page in CVAT is divided into several tabs, each giving a different view of annotated data to help with tracking progress and improving performance. The Summary tab gives you a quick overview of key project metrics. By default, data is shown for the entire lifetime, but it can be filtered by specifying a UTC start and end date.
You can also view the number of created and deleted objects, along with the overall difference, in the Objects Diff section. Total Working Time displays the cumulative time spent across all events, and Average Annotation Speed indicates the average number of objects annotated per hour. Scroll down to view pie charts displaying annotation data for shapes and tracks, broken down by type and label. Hover over each segment to see a tooltip with additional information.
The Annotations tab breaks down annotation statistics further, depending on the type of annotation. The Detection tab shows counts by shape: there you'll find the number of objects labeled per category (polygons and masks). This is useful when you want to check whether the distribution of labels aligns with your dataset goals. In the Tracking tab, you'll get data on how many keyframes, interpolated frames, and object tracks were annotated. Both views come with searchable, filterable tables from which you can export annotation statistics or raw events if needed.
The Events tab gives a deeper look into what happened and when; this is the most important tab. It allows you to track how, when, and by whom each child job was changed, showing how everything evolved over time. The Total objects, Total images, Total working time, and Avg. annotation speed values are recalculated automatically depending on the selected filters. You can also see who the task and job were assigned to, as well as the annotation stage and current status. On all three tabs of the Analytics page, you can use the calendar to select the time period for which you want to view analytics. For example, you can see whether a particular annotator spent extra time in a specific stage or made an excessive number of edits. This level of detail helps identify inconsistencies or inefficiencies in the workflow. Events are grouped based on the job's status and who performed the actions, making it easier to follow the history of work done over time. The Export Events button downloads raw, non-aggregated event data for the selected dates, for users who need custom analytics beyond what's shown on the Events tab. Each table allows users to customize visible columns; note that not all columns are shown by default on the Events tab.
Best Practices for Using CVAT Analytics
To get the most out of CVAT Analytics, many teams apply a few simple habits that make a big difference in their workflow. These practices help ensure annotations are not only accurate but also completed efficiently. For instance, in a project focused on labeling traffic signs for autonomous vehicles, a small team of annotators works across multiple batches of city footage. The project manager downloads and reviews analytics reports every Friday, looking for patterns like a sudden drop in label volume or a spike in rejected tasks. Let's say one of the team members has consistently low object counts after a schema change. The weekly review helps the manager catch this early and leads to a quick clarification of the labeling rules. Without the analytics check, several batches could have gone out with missing data. In another scenario, a healthcare research group is annotating MRI scans with regions of interest. Different annotators handle different patient sets. Over time, the team notices that one annotator is completing far fewer images than the others.
Analytics shows they spend significantly more time per image because, as it turns out, they're unsure how to label edge cases in a new category. With that insight, the team arranges a brief retraining session and updates their labeling guide. Productivity improves, and uncertainty drops across the board.
Monitoring quality metrics can also prevent wasted effort on downstream tasks. In a project detecting damaged packages from warehouse photos, annotation speed and object counts are tracked closely. If an annotator suddenly doubles their speed but the object count per image drops, it may signal they're rushing or misunderstanding instructions, for example because a new guideline wasn't fully communicated, and several batches have to be rechecked. Access to annotation speed and density trends helps the lead catch the issue before model training begins.
How to Use CVAT Analytics: Step-by-Step Guide
CVAT Analytics helps teams keep their annotation projects on track by showing clear, useful data. It makes it easier to spot problems early, check the quality of work, and make sure tasks are shared fairly among team members. Whether the project is small or large, using analytics regularly can save time and improve results. Still have questions? Check out the Analytics documentation or watch a short video that explains everything in detail. Ready to explore the new analytics? Create a CVAT account to get started, or contact us to deploy CVAT on your own premises.
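If the built-in tabs don't answer your question, the file downloaded via Export Events can be analyzed with any data tool. Below is a rough sketch using pandas; the column names used (user_name, working_time) are hypothetical placeholders, so inspect the header of your own export and adjust them accordingly.

# A rough sketch of custom analysis on a file downloaded via Export Events.
# The column names used below are hypothetical placeholders; print the real
# header first and adjust the group-by and duration fields to match it.
import pandas as pd

events = pd.read_csv("events.csv")
print(events.columns.tolist())  # inspect the actual column names first

# Hypothetical columns: "user_name" and "working_time" (milliseconds).
working_hours = (
    events.groupby("user_name")["working_time"]
    .sum()
    .div(3_600_000)  # convert milliseconds to hours
    .sort_values(ascending=False)
)
print(working_hours.rename("working hours"))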
Product Updates
June 26, 2025

Advanced Analytics: In-Depth Labeling Metric Analysis for CVAT Online and Enterprise

Blog
Outsourcing data annotation is becoming increasingly widespread as more companies developing AI and ML-powered products or services realize they don't have the internal bandwidth to handle this job cost-effectively in-house. Building reliable, production-grade AI requires enormous volumes of data (often millions of examples) that need to be labeled accurately and consistently. In many cases, this labeling still has to be done manually or semi-manually, and having ML engineers or data scientists do it can be prohibitively expensive.
That's where annotation services come in. They enable ML and AI teams to delegate labeling tasks to a dedicated group of annotators who not only have the required expertise but can also scale up or down depending on the volume of data, timeline, and technical complexity of the project.
But when you outsource annotation, how you structure the collaboration matters just as much as who does the work. The engagement model you choose directly impacts your budget, timeline, and flexibility. Whether your dataset is already collected or still being assembled can make a big difference in which model is right for you.
For our labeling services at CVAT, we offer two engagement models:
A one-time annotation project
A subscription-based service
In this article, we'll break down how each model works, what financial and operational benefits they bring, and help you decide which one works best for your use case.
Model 1: One-Time Annotation Service
A one-time annotation project is exactly what it sounds like: a fixed-scope engagement where a pre-collected dataset is labeled once, according to well-defined specifications. This model best fits teams working on well-defined, self-contained projects, such as building a proof-of-concept, training a production model, or preparing labeled data for a grant or publication. If your dataset is static, your requirements are clear, and speed and cost transparency are key, a one-time project is the most efficient path.
When to choose this model
Your dataset is already fully collected.
You have clear annotation guidelines (classes, formats, tools).
You need to annotate the data once to feed it into a training pipeline, proof of concept (PoC), or minimum viable product (MVP).
You want to keep the collaboration transactional and short-term.
How it works
Project scoping. You share the dataset and annotation requirements, including task types, formats, and edge cases.
Proof of Concept (PoC). We annotate a small sample to confirm feasibility, estimate complexity, and define per-object pricing.
Proposal & agreement. We prepare a commercial offer and sign a fixed-scope contract covering delivery terms and specs.
Full data transfer. You provide the complete dataset for annotation.
Annotation execution. We assign a trained team, annotate the data according to your specs, and validate quality internally.
Final delivery. Results are shared in full or by batch for larger projects.
Review & approval. You validate the work; if needed, we handle corrections. Once approved, payment is processed.
Pricing & terms
We require a minimum project value of $5,000, regardless of dataset size.
That's because even the smallest project involves fixed overhead, including several levels of communication, the PoC, project management, team training, documentation, QA setup, etc. In most cases, cost is calculated per annotated object. This model is transparent: we count the actual number of objects in a dataset, multiply by the agreed rate determined during the PoC, and provide full stats upon delivery. However, if the per-object billing model isn't applicable in your case, we'll offer an alternative billing model.
For example: if your dataset contains 10,000 images and, based on a PoC, we estimate an average of 10 objects per image, that's 100,000 objects total. At a per-object rate of $0.10, the total project cost would be $10,000.
Deadlines & rules
One-time projects are designed to be executed quickly and predictably. By default, our standard delivery window is within 1 month from the start of work (i.e., once the contract is signed and data is received).
For smaller projects (near the $5,000 threshold), we typically deliver the full dataset in one batch.
For larger projects, we may break the work into milestones or batches, each with its own delivery timeline.
Each batch is reviewed by the client, and payment is made upon acceptance of the results.
If project complexity or data volume requires more time, we'll agree on an adjusted timeline during the scoping phase.
Model 2: Subscription-Based Annotation Service
A subscription model offers more flexibility for companies that are still collecting their data or expect to annotate data incrementally over time. Instead of scoping and billing a fixed project, you reserve our annotation capacity for a specific period and send data as it becomes available. This model is ideal for teams working in agile, R&D-heavy environments where the dataset evolves, the specs might change, and rapid feedback loops are essential.
When to choose this model
Your dataset is still being collected or updated regularly.
You want to start annotation before the full dataset is ready.
You need flexibility in timing, batch size, or annotation spec.
You're looking for a longer-term collaboration with predictable access to skilled annotators.
How it works
Initial discussion. You describe your project, timeline, and data format.
PoC. We annotate a small, representative sample to establish scope, complexity, and per-object pricing.
Subscription agreement. We sign a 6-month service agreement (or longer, if needed), including all technical details and annotation rules.
Data delivery begins. You send data as it becomes available — weekly, monthly, or in bursts.
Ongoing annotation. We label incoming data promptly and return results in batches for your review.
Continuous feedback loop. You can iterate on the spec or adjust priorities. For significant changes, we re-estimate the scope if needed.
Project tracking. We provide running stats so you always know how much of your quota has been used.
Pricing & terms
Unlike one-time projects, the subscription is prepaid, starts at $5,000 for 6 months, and includes reserved access to annotation resources throughout the subscription period. Subscription = one project: all data delivered under a subscription must follow the same annotation spec. Changes in scope (e.g. new classes or formats) may require a re-estimation.
If the client does not send the expected amount of data to cover the anticipated subscription cost, the unused amount will not be refunded. Thanks to prepayment and resource commitment, subscription plans come with built-in discounts: they are typically 20% to 50% cheaper than one-time pricing.
For example: a client plans to annotate around 100,000 objects but doesn't yet have the full dataset.
If they wait and come back later with all the data, they'll likely use a one-time project — at around $0.10/object, totaling $10,000.
If they prefer to start immediately and send data gradually, they can choose a 6-month subscription. With a prepayment and volume estimate, we can offer a reduced per-object rate of $0.05–0.075, bringing the total closer to $5,000–$7,500.
The result is the same, but the subscription allows them to start earlier, save money, and keep annotation continuous while their dataset grows.
Deadlines & rules
Unlike a one-time service, with a subscription-based model you're not waiting for the full dataset to be ready. You get annotated data continuously, supporting your model development in real time. However, to ensure we can process everything smoothly:
Within the last 90 days of your subscription, you can send up to 75% of your total quota.
In the final 60 days — up to 50%.
In the last 30 days — only 25%.
This helps us avoid last-minute overloads and ensures timely delivery.
One-Time vs. Subscription: A Side-by-Side Look
Now, let's take a quick look at how the two models compare:
Dataset: fully collected (one-time) vs. still being collected or growing (subscription).
Pricing: per-object rate with a $5,000 minimum project value (one-time) vs. prepaid from $5,000 for 6 months, with per-object rates typically 20–50% lower (subscription).
Delivery: within about 1 month of start (one-time) vs. continuous batches as data arrives (subscription).
Best for: fixed-scope, self-contained projects (one-time) vs. evolving datasets and longer-term collaboration (subscription).
Which Pricing Model Is Right For You?
So, how do you know which model is right for your project? Both one-time projects and subscription-based services are designed for different workflows and project stages. For instance, if you're a startup collecting traffic camera footage weekly, still experimenting with model architectures, and needing annotated data on an ongoing basis, a subscription gives you flexible access to annotation resources and helps you move faster while saving your budget. On the other hand, if you work at a robotics company with a completed dataset of indoor navigation footage, clear labeling rules, and a tight delivery deadline, a one-time project will get your data annotated quickly without any long-term commitment. In any case, the best choice depends on how far along you are with your dataset, your timeline, and how much flexibility you need. Still not sure where you fall? Tell us about your project and we'll help you scope it and recommend the best path forward, or visit our labeling services page to learn more about our process.
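For readers who like to see the numbers side by side, here is a tiny back-of-the-envelope calculator based on the illustrative figures above ($0.10 per object one-time, $0.05–$0.075 on subscription, $5,000 minimums). The rates are examples from this article, not a quote; actual pricing is always set after a PoC.

# Back-of-the-envelope comparison of the two engagement models, using the
# illustrative figures from this article. Real rates are set per project
# after a PoC, so treat these numbers as an example, not a quote.

def one_time_cost(num_objects: int, rate: float, minimum: float = 5_000.0) -> float:
    # Fixed-scope project: per-object pricing with a minimum project value.
    return max(num_objects * rate, minimum)

def subscription_cost(num_objects: int, discounted_rate: float, prepaid_minimum: float = 5_000.0) -> float:
    # 6-month subscription: prepaid, with a reduced per-object rate.
    return max(num_objects * discounted_rate, prepaid_minimum)

objects = 100_000
print(f"One-time project:         ${one_time_cost(objects, 0.10):,.0f}")
print(f"Subscription (low rate):  ${subscription_cost(objects, 0.05):,.0f}")
print(f"Subscription (high rate): ${subscription_cost(objects, 0.075):,.0f}")
# One-time project:         $10,000
# Subscription (low rate):  $5,000
# Subscription (high rate): $7,500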
Annotation Economics
June 23, 2025

Subscription or One-Off? How Smart Teams Choose Annotation Services

Blog
In the rapidly evolving field of computer vision, datasets used for model training can contain thousands or even millions of images, making manual data labeling a major bottleneck due to its time-consuming nature and high price. To address this challenge, automating annotation tasks through automated data labeling has become crucial, as it significantly improves efficiency without increasing costs. This article will help you identify automated data labeling techniques that are explicitly tailored to your project's specific needs and your team, or just you if you are annotating solo. And remember, properly implemented automation methods can drastically accelerate annotation tasks, consistently delivering high-quality labels efficiently and economically. CVAT provides several robust auto-annotation methods designed to streamline and enhance your data labeling workflows: Nuclio Functions: Made for real-time automated labeling without external dependencies. The serverless annotation models run within your self-hosted CVAT infrastructure and provide customizable, on-premises automation that integrates seamlessly into your existing machine learning workflows. External Service Integration (Hugging Face and Roboflow): CVAT supports importing models directly from cloud-based annotation platforms, including Hugging Face and Roboflow. This enables straightforward access to powerful, pre-trained models, expanding your automated data labeling capabilities with minimal setup. CLI Annotation: Execute annotation tasks locally through command-line interfaces (CLI). This method supports efficient batch processing and automation scripts for high-volume visual data labeling projects, providing you with complete control and flexibility. AI Agents: Acts as a seamless integration bridge between your AI models and the CVAT platform. By selecting a suitable model, you can quickly establish a direct connection, leveraging your custom-trained models for precise automated labeling in real-time. Let's dive into details. Nuclio (CVAT Community and Enterprise) Nuclio is an integrated serverless function framework that enables you to run deep learning (DL) models within your CVAT environment. It’s beneficial when your use case involves objects or categories that generic, pre-trained models can’t recognize. For example, rare defects in industrial components or specialized instruments used in lab research. Public models may not be trained on these specifics, and in cases like this, custom-trained deep learning (DL) models become necessary. In CVAT, custom DL models can be connected as serverless functions through Nuclio. Once deployed, they can automatically generate annotations, such as bounding boxes, masks, or tracks. However, the quality of these auto-generated labels depends on the model. For niche or complex tasks, the predictions often need refinement. Still, if the model can handle even 50% of the workload, it significantly reduces the time spent on manual annotation. How Data Labeling with Nuclio Works Nuclio functions are serverless annotation models that operate directly within your CVAT infrastructure. Annotation models (such as YOLOv11 or SAM2) are encapsulated as Nuclio functions via specific metadata and associated implementation code. After deployment, models become accessible through CVAT’s internal model registry for immediate auto-labeling use. 
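To give a feel for what "wrapping a model as a Nuclio function" involves, here is a rough sketch of the Python handler such a function typically exposes. The request and response fields follow the pattern used in CVAT's serverless examples (a base64-encoded image in, a JSON list of labeled shapes out), but the exact contract and the accompanying metadata file should be checked against the serverless tutorial in the CVAT docs; the model itself is replaced by a stub here.

# main.py: a minimal sketch of the handler side of a CVAT + Nuclio detector.
# The field names below follow CVAT's serverless examples; verify them against
# the serverless tutorial before relying on them. The model is a stub.
import base64
import io
import json

from PIL import Image


def init_context(context):
    # Load your model once per function container and cache it.
    # Stub: returns a list of (label, score, (x1, y1, x2, y2)) tuples.
    context.user_data.model = lambda image, threshold: []


def handler(context, event):
    data = event.body
    image = Image.open(io.BytesIO(base64.b64decode(data["image"])))
    threshold = float(data.get("threshold", 0.5))

    detections = context.user_data.model(image, threshold)

    # One dict per detected object, in the format CVAT expects from a detector.
    results = [
        {
            "confidence": str(score),
            "label": label,
            "points": [x1, y1, x2, y2],
            "type": "rectangle",
        }
        for label, score, (x1, y1, x2, y2) in detections
    ]
    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type="application/json",
        status_code=200,
    )

The labels such a function can emit are declared in the function.yml metadata file described below, which is what makes the model appear in CVAT's model registry.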
Supported Annotation Models and Data Formats with Nuclio Nuclio functions are set up by administrators using Docker Compose (docker-compose.serverless.yml) and configured with a metadata file (function.yml) that defines the model behavior and expected labels. Once deployed, models are immediately accessible through CVAT's internal model registry, enabling quick and seamless auto-labeling tasks. Nuclio functions support various annotation models, including object detection using bounding boxes, masks, and polygons, frame-by-frame tracking, re-identification, and interactive mask generation. You will find one that fits your machine learning needs. Automated Annotation with Nuclio: Pros and Cons Nuclio functions offer several advantages, including support for diverse and advanced annotation types, direct integration with CVAT for seamless operation, and suitability for customized and complex workflows. The main drawbacks are that setting up and managing Nuclio functions require administrative access and technical expertise, which can be challenging for users without specialized knowledge. Additionally, these functions are only available in CVAT On-Prem installations, meaning they cannot be used in cloud-based or managed versions of CVAT. When to Use Nuclio for Automated Data Labeling In practice, Nuclio is a strong choice for teams working in tightly controlled environments or tackling highly specialized tasks. For instance, an automotive supplier might use it to detect microcracks on engine parts during quality control, a task too specific for public models to handle reliably. That said, Nuclio isn't limited to niche use cases. You can use it to deploy any deep learning model your team needs. Nuclio's flexibility makes it a good fit not just for rare or complex tasks, but also for common, high-volume annotation workflows where having control over the model and infrastructure matters. Integration with Roboflow & Hugging Face (CVAT Online & Enterprise) For teams that don't have the infrastructure to host their models or want to move quickly, CVAT supports integration with external AI model platforms, such as Roboflow and Hugging Face. These platforms host a variety of pre-trained models and also let you upload your own. How Data Labeling with Hugging Face and Roboflow Works Setting up a third-party model in CVAT is straightforward. You navigate to the Models page, paste in the model's URL and access token, and the model becomes available for use. No administrative privileges are required, and any team member can start using it immediately. This makes it ideal for collaborative environments. For guidance, users can refer to tutorials and demonstration videos that show how to add models and use them for annotation tasks.
The convenience of using pretrained or custom models directly from platforms like Hugging Face and Roboflow significantly speeds up the annotation process. There are some trade-offs. Performance can be slower since data is sent frame by frame to remote servers for processing, and overall availability depends on the uptime and responsiveness of the external platform's API. When to Use Hugging Face or Roboflow for Automated Data Labeling This method is an excellent fit for startups, distributed teams, or individual users who need to annotate large volumes of common, well-understood data without setting up their own infrastructure. For example, a logistics company could quickly deploy a Roboflow model to detect package types across thousands of warehouse images, or an agricultural monitoring team might use a Hugging Face model to classify crop health in drone imagery, using a pretrained model from an external provider. Auto-Annotation with CVAT CLI (CVAT Community) CVAT's Command Line Interface (CLI), powered by its Python SDK, lets you run annotations entirely on your local machine without the need to deploy models on the server or connect to external services. You define custom annotation logic in a Python script, specifying which labels the model should detect and how it should process the input data. Once the function is ready, you can run a CLI command to apply the model to a task. How Data Labeling with CVAT CLI Works You begin by writing a simple script tailored to your model and task. Then, using the CVAT CLI, you run the script locally to annotate your dataset. This method is well-documented, with step-by-step tutorials available to guide users through the process. Supported Annotation Models and Data Formats with CVAT CLI Using the CLI, you can automatically generate annotations for complete tasks for several standard annotation types, including object detection, pose estimation, and oriented bounding boxes, based on your local model's capabilities. With the CLI approach, you can utilize any model to meet your annotation needs. Automated Annotation with CVAT CLI: Pros and Cons This CLI-based method requires no server configuration and is ideal for solo users who want to experiment with models locally. Since everything runs on your machine, it offers maximum data privacy. Another key advantage is cost: because there are no external API calls or cloud services involved, there are no additional usage fees. You only pay for your hardware and resources, making it a highly economical option for small-scale or exploratory projects. However, you need sufficient machine resources to run the model, and annotation is done entirely via scripts; there is no graphical interface. Only whole-task annotation (not frame-by-frame) is supported, and the current implementation is limited to detection-type models, such as bounding boxes, pose, and oriented bounding boxes. When to Use CVAT CLI for Automated Data Labeling This method is available to everyone. For example, it can be utilized by AI research teams conducting experiments with various object detection models, or by institutions such as hospitals and banks that handle sensitive data and must comply with stringent privacy regulations. AI Agent-Based Functions: Scalable and Shareable Annotation (CVAT Online & Enterprise) AI Agents are CVAT's newest auto-annotation method, designed to connect your custom AI models with CVAT through a dedicated service that acts as a bridge between the model and CVAT.
Unlike Nuclio, agents don't require server-side deployment; instead, they operate independently and communicate with the platform to handle annotation tasks. How Data Labeling with AI Agents Works To set up an AI agent, you first create a native Python function that wraps your model's inference logic using the CVAT SDK. You then register this function with CVAT using the CLI—only metadata, such as function names and label definitions, is required; only this metadata is uploaded, not the model itself. After registration, you launch a local or cloud-based AI agent that "listens" for incoming annotation tasks. When a task is queued, the agent retrieves the relevant data, runs inference using your function, and sends the results back to CVAT for review. You can also scale your operations by running multiple agents simultaneously, enabling distributed processing across machines or teams. Supported Annotation Models and Data Formats with CVAT AI Agents At launch, AI agents support only detection-based annotation types, including object detection with bounding boxes and oriented boxes. While interactive features and support for more complex tasks are still in progress, the current capabilities make them suitable for a wide range of automated data labeling workflows. Automated Annotations with CVAT AI Agents: Pros and Cons One of the key advantages of AI agents is their ease of deployment, as no server integration or administrator access is required, which reduces friction for both individuals and teams. The deployment is flexible, working equally well on local machines or in the cloud. Agents can be shared and reused across different teams or organizations, helping streamline operations. They also build on CVAT's existing CLI functions, reducing the need for additional setup and accelerating the onboarding process. From a cost perspective, this method avoids external API fees and scales effectively with your hardware, giving you control over both performance and expenses. However, AI agents are still under active development, meaning some features—such as interactive annotations—are not yet available. Current functionality is limited to detection tasks, and users need a basic understanding of the command-line interface to set up and operate agents effectively. Despite these limitations, the flexibility and extensibility of this method make it a compelling option for teams building custom automation workflows. When to Use CVAT AI Agents for Automated Data Labeling AI agents are ideal for anyone who needs both flexibility and scalability. For example, a company building autonomous vehicles might run multiple agents to annotate thousands of driving scene images in parallel. Or a retail chain could deploy an internal model as an agent and share it with staff across different locations to ensure consistent product labeling. Let's Compare the Automated Data Labeling Approaches Choosing the proper auto-labeling method in CVAT depends on what you're working with—your team size, available tools, privacy needs, and the type of annotations you need. If you want something quick and easy, third-party models from Hugging Face or Roboflow are straightforward to integrate and use, but they rely on external services and may be slower. For complete control and flexibility, Nuclio functions or AI agents enable you to run your models inside CVAT; however, they require some setup and technical knowledge.
If you're working solo or want to keep everything local, CVAT's CLI-based annotation is lightweight, private, and cost-free—but it's best suited for simpler tasks and lacks a UI. Hybrid approaches are great if you want to mix speed with accuracy. They use automation for the easy parts and let humans handle the tricky bits—ideal when your dataset has both repetitive patterns and edge cases. The table below breaks down the primary methods, allowing you to quickly find what fits your workflow.
Feature: Nuclio | 3rd Party Services | CLI | AI Agent
Availability: CVAT On-Prem (Community & Enterprise) | CVAT Online & Enterprise | CVAT Community | CVAT Online & Enterprise
Model Hosting: On your server or infrastructure | Hosted on Roboflow or Hugging Face | Local machine | Any environment (local, cloud, custom)
Admin Access Needed: Yes | No | No | No (only maintainer role for org-level sharing)
Annotation Types Supported: Detection, Tracking, Segmentation, etc. | Detection, Tracking, Segmentation, etc. | Detection | Detection (more types coming soon)
Scalability: High for large deployments | Moderate | Low (single-machine) | High (can run multiple agents in parallel)
Limitations: Requires infra + admin setup | Limited model formats | Limited to full-task runs, requires local resources | New feature: limited to detection; CLI setup required
Conclusion
The best auto-annotation method in CVAT depends on your specific needs, whether it's local control, ease of setup, support for complex annotations, or the ability to scale up at no additional cost. Use Nuclio for advanced workflows and full model customization in self-hosted environments. Choose third-party integrations like Hugging Face or Roboflow for quick access to pretrained models with minimal setup. Use CVAT CLI for lightweight, local automation without server dependencies, and when you need to use any model for your machine learning needs. Deploy AI agents to scale annotations flexibly across teams using your models. Each option is built to fit different teams, infrastructures, and project sizes. Start now: Log in or sign up at CVAT Online, or contact us to explore CVAT Enterprise for full-featured, scalable deployments.
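As a concrete starting point for the CLI and AI-agent routes, here is a sketch of a detection auto-annotation function. The cvataa helpers follow the CVAT SDK's auto-annotation interface (DetectionFunctionSpec, label_spec, rectangle); the torchvision detector and the 0.5 confidence threshold are illustrative choices only, so double-check the names against the SDK documentation for your version.

# my_function.py: a sketch of a detection auto-annotation function that can be
# run locally via the CLI or served through an AI agent. The torchvision model
# and threshold are illustrative; the cvataa helper names follow the CVAT SDK
# auto-annotation interface and should be verified against the SDK docs.
import PIL.Image
import torch
import torchvision.models.detection as detection
from torchvision.transforms.functional import to_tensor

import cvat_sdk.auto_annotation as cvataa
from cvat_sdk import models

_model = detection.fasterrcnn_resnet50_fpn_v2(weights="DEFAULT").eval()

# Labels this function can produce; the IDs are local to the function.
spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec("person", 0)],
)


def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    with torch.no_grad():
        (output,) = _model([to_tensor(image)])

    # COCO class 1 is "person"; emit one rectangle per confident detection.
    return [
        cvataa.rectangle(0, box.tolist())
        for box, label, score in zip(output["boxes"], output["labels"], output["scores"])
        if label == 1 and score >= 0.5
    ]

With a function file like this, you could annotate a whole task from your machine with the cvat-cli task auto-annotate command, or register it with function create-native and serve it through an agent, as described in the SAM2 tracking article above. The exact CLI arguments may differ between versions, so see the CLI examples in the docs.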
Tutorials & How-Tos
June 10, 2025

Four Ways to Automate Your Labeling Process in CVAT

Blog
Data labeling is the process of assigning meaningful tags or annotations to raw data, such as images, text, audio, or videos, to make it usable for training AI and machine learning models. It can be done manually by human annotators or automatically using pretrained models integrated in data annotation tools. As AI adoption grows, so does the demand for large, high-quality labeled datasets. However, manual labeling is often slow and resource-intensive. Automated and hybrid approaches address this challenge by accelerating the annotation process and enabling organizations to stay within timelines and budgets. The goal of automated labeling isn't to replace humans but to accelerate workflows and reduce costs by automating the most repetitive or straightforward tasks. This allows annotators to dedicate more time and effort to tricky or edge cases, leaving the routine work to algorithms.
How Manual Labeling and Automated Data Labeling Work
Automated data labeling services utilize machine learning models trained on previously labeled datasets. Once trained, these models can be seamlessly integrated into data labeling platforms like CVAT, automating most of the annotation workload. While automation offers speed and scalability, the most effective strategy often combines it with human oversight — a method known as "human-in-the-loop." This hybrid approach strikes a balance between accuracy and relatively low cost, making it well-suited for real-world applications. That said, not all scenarios are the same. Depending on the project requirements, data characteristics, and resource availability, there are three main approaches to data labeling: manual, automated, and hybrid.
Manual Labeling: Human annotators manually review and assign labels to each data point.
Automated Labeling: Software tools or algorithms automate the labeling process, eliminating the need for human intervention.
Hybrid Approach: Combines manual and automated labeling methods: human annotators label a subset of data to create a high-quality, relatively small training dataset, which automated methods then use to extend labeling to larger datasets.
Manual Data Labeling
In this approach, human annotators manually review and assign labels to each data point, ensuring high accuracy and quality through careful judgment and attention to detail. This method is ideal when working with novel or sensitive data, edge cases, or tasks that require specialized domain expertise. It is especially suited for projects where ground truth accuracy is paramount. This approach is commonly used in domains where precision is critical, such as medical imaging, autonomous driving, aerospace, and other fields where even minor errors can have serious consequences.
Automated Data Labeling: ML, Active Learning, Programmatic
Some tools and techniques allow data to be labeled automatically by machines, with little to no human involvement. Here are some examples:
Machine Learning & Deep Learning Pre-Trained Models: Used exclusively for predictive labeling based on learned patterns without human validation.
Active Learning (Automated Variant): In semi-automated setups, the model auto-labels data when it's highly confident, reducing the need for constant human input. However, for uncertain cases, human help is still essential, and therefore this approach sits on the fence between automated and hybrid techniques.
Programmatic Labeling: Uses rule-based labeling logic implemented through scripts to handle clear-cut annotation tasks systematically.
While by itself it is designed to work fully automatically, there are some concerns regarding this approach, and humans still intervene, primarily in ambiguous cases or edge scenarios, to ensure labeling quality. These methods are effective for large-scale, repetitive tasks where patterns are clear and confidence is high, such as in e-commerce, content moderation, industrial inspection, and others.
Human-in-the-Loop: Hybrid Data Annotation Approach
Human-in-the-loop (HITL) annotation combines the strengths of both automated and manual labeling. The process starts with a small, manually labeled dataset, which is used to train an initial machine learning model. The amount of manual data required depends on your specific task and the complexity of your dataset; it often takes some experimentation to determine the optimal volume. Once trained, the model begins labeling new, unlabeled data. These labels are then reviewed by human annotators, who identify and correct any mistakes. The corrections made by humans are then fed back to the model and used to refine and improve it further. As the cycle repeats, the model becomes increasingly accurate and reliable, enabling automation to assume a larger share of the labeling work over time. Human-in-the-loop annotation is particularly useful in domains where automation can accelerate labeling but human judgment remains essential, such as medical diagnostics, financial document processing, and automotive systems. It strikes a balance between efficiency and accuracy, making it ideal for evolving datasets or complex tasks where fully automated methods fall short.
Manual vs. Automated vs. Hybrid: When to Choose Each
Understanding when to select manual or automated labeling depends on several factors, including data complexity, scale, and the desired level of accuracy. So, how do you know which approach is right for the job?
Use Manual Labeling When:
You're working with new or sensitive data that contains many edge cases requiring special attention. Human intervention is critical in your case, and all data must be manually checked.
The task is subjective or requires specialized domain expertise, and no suitable model is available.
Ground truth accuracy is critical, for example, in safety-critical applications where lives may depend on the outcome.
Use Auto Labeling When:
You're dealing with large volumes of data that follow consistent and repetitive patterns.
You have a reliable, pre-trained, or previously developed model, and its performance is good.
The goal is to maximize efficiency and output with minimal human involvement.
Use Hybrid Labeling When:
You want to achieve a balance between accuracy, scalability, and costs.
You can afford to invest time and resources upfront to create a high-quality labeled dataset that supports model training.
The dataset is diverse—some parts are repetitive and straightforward, while others contain edge cases or require nuanced judgment.
You plan to improve model performance continuously through iterative human feedback.
Auto-Labeling Annotation Techniques
In fully or partially automated data labeling pipelines, various strategies exist to minimize manual effort while maintaining high-quality labels. The choice of technique depends on the level of model maturity, data complexity, and project constraints.
Pretrained Models
Pretrained models are based on large, diverse datasets and are capable of delivering high-quality labels out of the box or with minimal fine-tuning.
Examples like Meta's Segment Anything Model 2 (SAM 2) are particularly valuable for image segmentation.
[Image: Annotating an image with SAM 2 in CVAT]
Benefits:
Delivers high-quality annotation with minimal setup.
Useful in domains where labeled data is scarce or expensive.
Accelerates annotation significantly when well-matched to the domain.
Challenges:
May require fine-tuning or engineering for domain-specific tasks.
Performance may degrade in niche applications or where the model's training data lacks relevant examples.
Use case: In radiology or pathology, pretrained models can help segment organs or anomalies in scans and x-rays.
Active Learning
Active Learning is a machine learning approach where a model identifies the most informative or uncertain data points and asks a human annotator to label them. The idea is to improve model performance efficiently by prioritizing which examples to label, rather than labeling a random sample.
Benefits:
Lets human annotators prioritize labeling the most uncertain or valuable samples, while the model handles the routine work.
Improves model performance with fewer labeled examples.
Reduces overall human labeling effort.
Challenges:
Without human review, confidence thresholds alone may not prevent the propagation of errors.
Can mislabel edge cases if the model's uncertainty estimation is poor.
Use case: Training an object detection model for self-driving cars by automatically selecting and labeling frames where the model is least confident, thereby reducing the need to label every frame manually.
Programmatic Labeling
Programmatic labeling utilizes rules or scripts to assign labels to data automatically. It works best for straightforward cases where the logic is clear (e.g., keywords, patterns, etc.). While it's mostly automated, humans may still intervene to handle edge cases or review uncertain results to maintain high quality.
Benefits:
Speeds up labeling for repetitive or clear-cut tasks.
Scales easily with large datasets.
Reduces the need for manual annotation in well-defined scenarios.
Challenges:
Only works well when the labeling logic is consistent and straightforward.
Struggles with ambiguous, messy, or context-heavy data.
May need human oversight to handle exceptions or improve accuracy.
Use case: A system labels emails as "spam" or "not spam" using simple rules, such as checking for specific phrases ("win money", "free offer"), suspicious domains, or formatting patterns. This labeling is done automatically, but human reviewers may intervene to correct mistakes or update rules when spammers modify their tactics.
CVAT and Automated Labeling
CVAT offers a comprehensive range of automatic and hybrid labeling options, designed to meet different infrastructure needs, control levels, and user types. Below is an overview of the four primary automation methods supported in CVAT.
Nuclio (CVAT Community and Enterprise)
CVAT integrates with Nuclio, a serverless framework for running machine learning models as functions. This framework is available in CVAT Community and Enterprise versions.
How It Works:
Requires a Docker Compose setup with a specific metadata file.
Models (e.g., YOLOv11 or SAM 2) are wrapped as Nuclio functions using a metadata file and implementation code.
Once deployed, models are added to CVAT's model registry for use in auto-annotation.
Supported:
Object detection (bounding boxes, masks, polygons)
Tracking across frames
Re-identification and interactive mask generation
Pros:
Highly flexible; supports multiple model types
Fully self-hosted and customizable
Cons:
Requires some technical experience and Docker-level deployment
Not available in CVAT Online
Third-Party Platform Integration (CVAT Online and Enterprise)
CVAT supports model integration from the third-party platforms Hugging Face and Roboflow, enabling annotation using externally hosted models.
How It Works:
Add your models on the CVAT Models page; a helpful tutorial walks you through the setup.
Run the models to annotate your data, as shown in the accompanying video.
Pros:
Easy integration if your models are already hosted on Hugging Face or Roboflow
Easy to set up and use, even for non-tech-savvy users
Cons:
Relies on third-party service availability and APIs
Slower, because data is sent frame by frame to remote servers
Auto-Annotation with CVAT CLI (CVAT Community)
Using CVAT's Python SDK and CLI, you can implement and run custom auto-annotation functions locally.
How It Works:
Write the script.
Run the script locally with the CVAT CLI to annotate a task.
A helpful step-by-step tutorial shows how to do it.
Pros:
No server configuration needed
Ideal for solo users and individual experiments
Cons:
Requires local execution and sufficient machine resources
No interactive annotation; everything is done through the CLI
Task-wide only
Currently limited to detection models
Agent-Based Functions (CVAT Online and Enterprise)
CVAT AI agents are a powerful and flexible way to integrate your custom models into the annotation workflow, available on both CVAT Online and CVAT Enterprise, v2.25 and above.
How It Works:
You start by creating a Python module ("native function") that wraps your model's logic using the CVAT SDK.
Register the function with CVAT using the command-line interface (CLI). This only sends metadata (such as function names and labels), not the model code or weights.
Then, run a CVAT AI agent that uses your native function to process annotation requests from the platform.
When users request automatic annotation, the agent retrieves the task data, runs the model, and returns the results to CVAT.
Pros:
Custom models: Use models tailored to your specific datasets and tasks.
Collaborative: Share models across your organization without requiring users to install or run them locally.
Flexible deployment: Run agents on local machines, servers, or cloud infrastructure.
Scalable: Deploy multiple agents to handle concurrent annotation requests.
Cons:
Only detection-type models (bounding boxes, masks, keypoints) are supported.
Requires tasks to be accessible to the agent's user account.
Conclusion
Automated data labeling is a powerful tool in the machine learning (ML) workflow arsenal. Used wisely, it can reduce costs and expedite labeling projects without compromising quality. The key lies in understanding your data, your goals, and the capabilities of the automation tools at your disposal. Want to automate your annotation workflow with CVAT? Sign in or sign up to explore the automation features in CVAT Online, or contact us if you'd like to try CVAT Enterprise's automation features on your own server.
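To make the CLI- and agent-based workflows above more concrete, here is a minimal sketch of the kind of Python module ("native function") they both consume. It follows the pattern documented for the CVAT SDK's auto-annotation interface, but exact names and signatures can vary between SDK versions, and run_my_model() is a hypothetical stand-in for your own inference code, so treat this as a sketch rather than copy-paste code.

```python
# my_function.py: a sketch of a "native function" for CVAT auto-annotation.
# Interface modeled on the CVAT SDK's auto-annotation module; check the
# documentation for your SDK version for the exact names and signatures.
import PIL.Image
import cvat_sdk.auto_annotation as cvataa


def run_my_model(image: PIL.Image.Image):
    # Hypothetical placeholder for your own inference code.
    # Returns (label_id, x1, y1, x2, y2) tuples in image coordinates.
    return [(0, 10.0, 20.0, 110.0, 220.0)]


# Labels this function can produce; the IDs are local to the function.
spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec("car", 0),
        cvataa.label_spec("person", 1),
    ],
)


def detect(context, image: PIL.Image.Image):
    # Convert raw detections into CVAT bounding-box shapes.
    return [
        cvataa.rectangle(label_id, [x1, y1, x2, y2])
        for (label_id, x1, y1, x2, y2) in run_my_model(image)
    ]
```

Once such a module exists, the same file can be run task-wide through the CVAT CLI or registered and served by an AI agent, as described above.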
Annotation 101
June 3, 2025

Automated Data Labeling: What It Is and When to Use It

Blog
In computer vision, machine learning, and spatial analysis, 3D point cloud annotation is often used to help convert raw 3D data into structured, meaningful information. In doing so, the transformation enables algorithms to recognize objects, environments, and spatial relationships and powers the use of real-world applications from autonomous navigation to industrial inspection.But what exactly is a point cloud, what applications use them, and what are the best ways to annotate them?This article explains what point clouds are, where they’re used, and how CVAT streamlines the annotation process, from raw scan to labeled dataset.‍What is a Point Cloud?A point cloud is a digital map of an object's surface, made up of individual points captured by a scanning system. These datasets form the foundation for most 3D computer vision tasks and are produced using methods such as LiDAR, photogrammetry, stereo vision, laser light or structured light-based systems. There are subtle variations between how each type of 3D scanner works, but fundamentally, they all use light to capture surface geometries, resulting in a 3D map.Depending on the size of the object, the number of points can range from just a few up to trillions of points. For example, a point cloud of a cube could be created from just 8 points (one for each corner), while the recent 3D scan of the Titanic wreck resulted in a 16 terabyte dataset comprising billions of points. The Titanic scan (made using photogrammetry) is so detailed that individual bolts on the ship and even a necklace are visible in the scan data.Titanic 3D scan (Source)Urban planning and planetary LiDAR scans are even bigger and can feature trillions of points in the point clouds.Once the raw scan data has been acquired by the scanning hardware of choice, it can be processed into a usable format. This is typically achieved by aligning, filtering, and converting the raw data into a structured point cloud for visualization, measurement, modeling…or annotation.‍Applications of Point Clouds Data in Computer VisionFrom autonomous vehicles and robotics to drones and geospatial analysis, point clouds enable machines to interpret real-world geometry. 
Annotated 3D data enables detection, reconstruction, and spatial reasoning across a broad swath of AI applications such as:Autonomous DrivingLiDAR-generated point clouds are used for 3D object detection, lane and road mapping, obstacle avoidance, and real-time vehicle localization.RoboticsRobots need point clouds to map their surroundings, stay clear of crashes, spot objects, and move around in dynamic environments.Augmented and Virtual Reality (AR/VR)Used to reconstruct real-world environments for immersive experiences, enabling realistic interaction with virtual objects.Industrial Inspection and Quality ControlCapture precise 3D models of manufactured parts to detect defects, verify tolerances, and ensure conformity with design specifications.Construction and ArchitectureUsed for as-built documentation, site surveys, clash detection, and the creation of accurate digital twins for buildings and infrastructure.Geospatial Analysis and MappingPoint clouds from aerial LiDAR or drones are used for terrain modeling, land classification, flood simulation, and urban planning.AI and Machine LearningAnnotated point clouds are used by researchers to train machine learning models for segmentation, object classification, and scene understanding in 3D.Each application relies on the precision and richness of point cloud data to bridge the gap between raw spatial input and actionable digital insight.‍Point Cloud Raw Data Point clouds are typically represented using Cartesian (XYZ) coordinates, but they aren't always captured that way at the source. Many 3D scanners (LiDAR systems in particular) initially collect data in spherical coordinates, recording each point’s distance from the scanner (r), horizontal angle (θ), and vertical angle (φ). In other cases, such as tunnel inspection or pipe mapping, scanners may use cylindrical coordinates. Range imaging systems often store depth as pixel intensity in a 2D grid. These native coordinate systems reflect the scanner’s internal geometry and sensing method, optimized for capturing specific environments. However, for consistency and compatibility (whether in CAD, simulation, or AI pipelines), these formats must be converted. Using trigonometric transformations, spherical and cylindrical data are recalculated into standard XYZ coordinates, where each point is defined by its position along three perpendicular axes. The XYZ format is universally supported by common point cloud file types like .ply, .pcd, .xyz, and .las, making it essential for downstream processing. So, while point clouds may originate in various coordinate systems, they are almost always converted into XYZ for storage, visualization, and further analysis.‍Readily Available 3D Datasets2D datasets are quite laborious to obtain manually, so you can imagine how much of a resource-intensive task gathering training data for 3D applications is.Thankfully, there is a wide range of publicly available point cloud datasets available. Many are free and open access, and some require the purchase of a license. 
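As a quick aside before looking at ready-made datasets: the spherical-to-Cartesian conversion described in the Point Cloud Raw Data section above comes down to a few lines of trigonometry. Here is a minimal sketch; angle conventions differ between scanners (some measure the vertical angle from the zenith rather than the horizon), so treat the exact formula as an assumption to check against your sensor's documentation.

```python
import math

def spherical_to_xyz(r, theta, phi):
    """Convert a spherical measurement to Cartesian XYZ.

    r     -- distance from the scanner
    theta -- horizontal (azimuth) angle, in radians
    phi   -- vertical (elevation) angle above the horizontal plane, in radians
    """
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z

# A point 10 m away, 45 degrees to the left, 10 degrees above the horizon.
print(spherical_to_xyz(10.0, math.radians(45), math.radians(10)))
```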
The table below shows a selection of popular datasets that you may wish to use for your model training.
<table class="table-class">
  <tr> <th>Dataset</th> <th>Application Type</th> <th>Environment</th> <th>Open Access?</th> </tr>
  <tr> <td>KITTI</td> <td>Autonomous Driving, SLAM</td> <td>Outdoor (Urban)</td> <td>Yes</td> </tr>
  <tr> <td>nuScenes</td> <td>Autonomous Driving</td> <td>Outdoor (Urban)</td> <td>Yes</td> </tr>
  <tr> <td>Waymo Open</td> <td>Autonomous Driving</td> <td>Outdoor (Urban/Suburban)</td> <td>Yes (non-commercial use)</td> </tr>
  <tr> <td>ApolloScape</td> <td>3D Scene Parsing</td> <td>Outdoor (Urban)</td> <td>Yes</td> </tr>
  <tr> <td>ScanNet</td> <td>3D Reconstruction, Semantic Segmentation</td> <td>Indoor</td> <td>Yes</td> </tr>
  <tr> <td>ShapeNet</td> <td>Object Classification, Segmentation</td> <td>Object-Level</td> <td>Yes</td> </tr>
  <tr> <td>ObjectNet3D</td> <td>2D-3D Alignment, Pose Estimation</td> <td>Object-Level</td> <td>Yes</td> </tr>
  <tr> <td>Toronto-3D</td> <td>Urban Scene Segmentation</td> <td>Outdoor (Street-Level)</td> <td>Yes</td> </tr>
  <tr> <td>DALES</td> <td>Aerial Mapping, Segmentation</td> <td>Outdoor (Aerial/Suburban)</td> <td>Yes</td> </tr>
  <tr> <td>NPM3D</td> <td>Mobile Mapping, Localization</td> <td>Mixed (Indoor/Outdoor)</td> <td>Yes (CC-BY-SA License)</td> </tr>
</table>
3D Point Cloud Annotation in CVAT
CVAT contains a variety of tools for annotating a range of data types, from static images to moving video. Support for annotating 3D scan data in the form of point clouds was added later, and it is particularly important for those who wish to train their models on three-dimensional data.
Cuboids are used for point cloud annotation in CVAT
While 2D image and video data come with a large selection of annotation tools (such as skeleton, ellipse, square, and mask), annotation of point cloud data in CVAT is done exclusively with cuboids. Cuboids provide a balance between simplicity and spatial context. Cuboids are:
Easy to manipulate in 3D space
Sufficient for common tasks like object detection and tracking
Compatible with widely used datasets like KITTI and with formats used in frameworks like OpenPCDet, MMDetection3D, etc.
Understanding Labeling Tasks and Techniques
Before beginning the annotation workflow in CVAT, it helps to understand how labeling tasks are structured and which techniques can improve accuracy and consistency.
What Is a Labeling Task in CVAT?
In CVAT, a labeling task defines the scope of your annotation project. Each task includes:
A name and description
A set of labels or object classes (e.g., “car,” “pedestrian,” “tree”)
Optional attributes (e.g., “moving,” “occluded”)
A dataset to be annotated (images, video frames, or 3D point clouds)
For 3D point clouds, tasks support both static scans and sequential frames, allowing for temporal annotation (e.g., tracking objects across time). Each task can contain multiple jobs, which are the individual segments of data assigned to annotators; this allows the annotation work to be parallelized.
Defining Clear Labels and Attributes
Before uploading your dataset, define your label structure carefully according to your business or research requirements. Avoid vague labels, and keep class names consistent.
For example, use “vehicle” consistently instead of alternating with “car” or “van.” Add attributes to capture additional information, such as:
Object state: moving, stationary, partially_visible
Physical characteristics: damaged, open, closed
Attributes can be marked as mutable (they change over time) or immutable (they stay constant), which helps simplify the annotation interface and improve training consistency.
Techniques for Effective 3D Annotation
To annotate efficiently:
Use Track mode to maintain object IDs across frames.
Place cuboids in the main 3D viewport and refine them in the Top, Side, and Front orthogonal views. Two projections are usually enough to place a cuboid correctly across all axes.
Use contextual 2D images (if available) to support difficult annotations.
Apply interpolation for objects in motion across multiple frames.
Flag ambiguous or occluded annotations with appropriate attributes.
Proper task setup and labeling discipline not only make the process smoother but also ensure that the resulting dataset is accurate, structured, and ready for downstream AI training.
Annotating Point Clouds in CVAT: Overview
CVAT’s 3D point cloud annotation workflow is straightforward. The user simply creates a task, loads the dataset, places and adjusts cuboids, and optionally propagates or interpolates the cuboids across frames. Here’s an abbreviated overview of the full process, from start to finish:
Create a 3D annotation task.
Open the task and explore the interface layout.
Navigate the 3D scene using mouse and keyboard controls.
Create and adjust cuboids for annotation.
Copy and propagate annotations across frames.
Interpolate cuboids between frames.
Save, export, and integrate annotated data into your pipeline.
Before training begins, it’s good practice to run a validation script to check for label inconsistencies, misaligned cuboids, or frame mismatches. Ensuring clean, well-structured annotation data is just as critical as the model architecture itself. For a more detailed tutorial on how to annotate 3D point clouds, head over to our official guide.
Challenges in 3D Point Cloud Annotation
Annotating 3D point clouds comes with a distinct set of challenges, both technical and human, that can significantly affect the quality of your dataset.
One of the biggest issues is occlusion. Since point clouds are generated from specific sensor perspectives, any surfaces not visible to the scanner (the back of an object, or areas blocked by other objects, for example) simply don’t appear. This missing data makes it difficult to annotate complete geometries with confidence: object boundaries are harder to interpret, shapes harder to identify, and overlapping items harder to distinguish. In dense or cluttered scenes, occlusion can lead to under-representation of key objects and introduce ambiguity during annotation.
Example of occluded and non-occluded 3D objects annotated in CVAT
Point density is another problem. Objects close to the sensor may be richly detailed, while distant objects can appear sparse or fragmented. Low-density regions often result in uncertainty when drawing precise cuboids or estimating object boundaries.
Add to that sensor noise, which can result from misfired points, ghosting from reflective surfaces, or jitter from moving elements in a scan, and the result is a lot of visual clutter that annotators must mentally filter out.
Then there’s annotation fatigue.
Unlike 2D image annotation, working in 3D often involves constant panning, zooming, and adjusting the scene from different angles. This level of interaction increases the mental load and can lead to inconsistency across sessions.To help mitigate this, CVAT allows the use of contextual 2D images alongside point clouds, displayed in separate windows within a 3D annotation task.‍Best Practices for High-Quality Point Cloud Data AnnotationGetting 3D point cloud annotation right isn’t complicated, but it does require discipline. The most common issues come from inconsistency and lack of structure, both of which are easy to avoid if you put the right systems in place from the start.Be consistent: Start with label consistency. Stick to a fixed label set. Don’t call something a “car” in one frame and a “vehicle” in another. CVAT’s label constructor locks this down, so use it. It stops annotators from improvising with naming conventions.Attribute properly: If you’re annotating objects with different states (like “open/closed” or “damaged”), don’t create separate labels. Add a mutable attribute. For fixed traits (like color or make), use immutable ones. It keeps the label space clean and keeps your training data flexible.Establish Annotation Guidelines: Whether you're working solo or with a team, define clear rules for edge cases, like how to handle partial occlusion, ambiguous shapes, or overlapping objects. A short internal guideline document can eliminate confusion and reduce rework later on.Quality assurance: Apply some basic QA. Do a second pass. Spot-check frames. Use annotation guidelines. If multiple annotators are involved, establish consensus rules for edge cases. You don’t need a formal pipeline, you just need to try to avoid leaving junk labels, floating cuboids, or inconsistent tags in the data.Don’t Over-Label: Last, but not least, it can be tempting to annotate every object in the scene, but not all data is equally useful. Focus on what your model actually needs to learn. Prioritize annotation quality over quantity, especially when resources are limited.Clean data is trainable data. Anything else just provides more work unnecessarily down the line.‍CVAT Labeling ServiceWhile point cloud data annotation with CVAT is relatively straightforward, not everybody has the luxury of time or other resources to commit to the data annotation process - it can be time-consuming after all, particularly when dealing with huge datasets.If you fit into this category and would rather outsource your data labeling needs, you will be pleased to know that CVAT offers our own services for such tasks.Our professional data annotation services offer expert annotation of your computer vision data at scale, regardless of if the data is point cloud, image- or video-based. Our team of experts ensures high-quality annotations and provides detailed QA reports so you can focus on your core experience and computer vision algorithms. ‍Why You Should Consider CVAT for Point Cloud Data Annotation3D point cloud annotation isn’t glamorous, but it’s a vital step in building reliable 3D perception systems. Whether you’re working on autonomous vehicles, machine learning, robotics, or spatial AI, well-structured annotations make the difference between a model that just runs and a model that performs.CVAT offers annotation tools such as cuboids, multi-view layouts, contextual 2D images, and export formats compatible with common frameworks like OpenPCDet, TensorFlow, and MMDetection3D. 
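As mentioned earlier, it's good practice to run a simple validation pass before training. Below is a minimal, generic sketch of the kind of sanity checks such a script might perform; the record layout (label, frame, dimensions) is a hypothetical example for illustration, not a specific CVAT export format.

```python
ALLOWED_LABELS = {"car", "pedestrian", "tree"}

def validate_annotations(records, num_frames):
    """Flag common issues in exported cuboid annotations: unknown labels,
    out-of-range frame indices, and degenerate (zero-sized) cuboids."""
    issues = []
    for i, rec in enumerate(records):
        if rec["label"] not in ALLOWED_LABELS:
            issues.append(f"record {i}: unknown label '{rec['label']}'")
        if not 0 <= rec["frame"] < num_frames:
            issues.append(f"record {i}: frame {rec['frame']} out of range")
        if any(d <= 0 for d in rec["dimensions"]):  # (width, length, height)
            issues.append(f"record {i}: degenerate cuboid {rec['dimensions']}")
    return issues

# Example: the second record has an out-of-range frame and a zero height.
records = [
    {"label": "car", "frame": 3, "dimensions": (1.8, 4.5, 1.5)},
    {"label": "van", "frame": 120, "dimensions": (2.0, 5.0, 0.0)},
]
for issue in validate_annotations(records, num_frames=100):
    print(issue)
```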
Getting high-quality annotations means doing the basics right: using consistent labels, applying attributes carefully, and maintaining coherence across frames. CVAT’s propagation and interpolation tools help speed that up while reducing manual error. And before you push your dataset into training, take the time to review and validate it, because annotation mistakes are a lot cheaper to fix before the model starts learning from them.In short, clean data leads to cleaner results. The effort you put into annotation shows up later in model accuracy, stability, and generalization. And CVAT gives you the foundation to build clean datasets. What you choose to do with the annotated data afterwards is down to your own ingenuity!To try CVAT for your own workflows, you can sign up for a free account here.
Annotation 101
June 2, 2025

Point Cloud Annotation: A Complete Guide to 3D Data Labeling

Blog
CVAT Digest, May 2025: Smarter Data Tracking, Improved Data Import and Export, and an Expanded Analytics Suite
Over the past month, the CVAT team has focused on three themes: richer analytics, clearer control over data volume, and a cleaner, more predictable API. Below you’ll find an overview of the most noticeable changes, along with a closer look at improvements that make daily annotation work faster and more reliable.
Improved Analytics
The most significant change you’ll notice lives in the new Analytics report, available from the “Actions” menu on every project, task, and job. The report now includes three tabs, each helping you understand project progress better:
The Summary tab assembles high-level charts: total objects, time spent, and average annotation speed, as well as a breakdown of shapes per label.
The Annotations tab lists every label alongside counts for polygons and other shapes, all searchable and filterable so large projects remain manageable.
The Events tab brings together log-level detail: task and job IDs, event type, frame count, object count, assignee, and timestamp.
Because search and filter tools are built into each view, you can move from a bird’s-eye perspective to a single mislabeled frame without exporting data. If you need to share findings, you can export the data with all the filters applied, so there is no need to clean up spreadsheets.
3D Annotation (Online & On-Prem)
We’ve added a quick way to annotate 3D objects with cuboids when object sizes are known in advance. Annotators can now create one (or several) correctly sized objects and simply copy/paste them into place, saving time and ensuring consistency.
Automatic Data-Size Tracking
CVAT now keeps an eye on the storage footprint of every image, video, and guide file you upload. For cloud users the measurement happens in the background; on self-hosted installations an administrator can initialize the scan with a single python manage.py initcontentsize command. Continuous tracking means you’ll see the true scale of a project before downloads, migrations, or backup operations begin.
Improved Import and Better Export
File-based annotation imports now tag their source as “file” by default, leaving no ambiguity. On the export side, event cache files move into a dedicated /data/cache/export/ directory, where they follow an automatic cleanup schedule so stale archives don’t accumulate.
An API That Gets Out of the Way
Several endpoints have been retired or consolidated to reduce duplication. Event logs can now be exported through POST /api/events/export, and the results can be fetched with a GET request to /api/requests/{rq_id}; the old endpoints were deprecated. Status checks for background requests were moved to the /api/requests/{rq_id} path, replacing the older quality-report endpoint. On the SDK side, classes such as DatasetWriteRequest and TaskAnnotationsUpdateRequest have been removed, and failed background tasks now raise a dedicated BackgroundRequestException rather than a catch-all ApiException.
Project Management and Project Quality Settings
Project-level quality settings now propagate automatically to every task under the project unless a task has its own custom rules. The Project Quality page has been tightened up so that all tasks render consistently, and a new job filter helps large teams isolate outliers without digging through spreadsheets. Orientation checks now save properly, so reviewers can apply the setting once and move on.
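For teams scripting against the API, the new export flow follows a two-step asynchronous pattern: start the export, then poll the background request. Below is a rough sketch using the Python requests library; the instance URL, token header, filter parameter, and the rq_id field in the response body are all assumptions to verify against your CVAT version's API schema.

```python
import time
import requests

CVAT_URL = "https://app.cvat.ai"  # assumption: your CVAT instance URL
session = requests.Session()
session.headers["Authorization"] = "Token <your-api-token>"  # assumption: token auth

# Start an event-log export (endpoint per the note above); the filter is illustrative.
resp = session.post(f"{CVAT_URL}/api/events/export", params={"job_id": 42})
resp.raise_for_status()
rq_id = resp.json()["rq_id"]  # assumption: response body carries the request id

# Poll the background request until it completes, then inspect the result.
while True:
    status = session.get(f"{CVAT_URL}/api/requests/{rq_id}").json()
    if status.get("status") in ("finished", "failed"):
        break
    time.sleep(2)
print(status)
```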
Faster Search and Lower Memory Imports During annotation you can jump directly to a frame by typing part of the file name, which is handy for long video sequences. Behind the scenes, YOLO and COCO imports have been refactored to lower peak memory use, which in practice allows larger datasets to load without timeouts. Ultralytics YOLO archives import correctly even when image information is absent, and the Datumaro engine now supports ellipse shapes, improving a workflow for circular objects. Security and Reliability Fixes A browsable-API issue that exposed certain resource names and IDs has been closed. Restored backups now preserve the original asset owner, AI model tracking auto-starts as intended, and several corner-case crashes involving 3D cuboids and track interpolation have been addressed. Looking Ahead All the features described here are already available on CVAT Online and in the latest CVAT On-Premises release. As always, feedback is very valuable and drives our roadmap; if you have suggestions or run into friction, let us know through the usual channels. Until next month! You can read the full changelog here: https://github.com/cvat-ai/cvat/releases
Product Updates
May 30, 2025

CVAT Digest, May 2025: Smarter Data Tracking, Improved Data Import and Export, and an Expanded Analytics Suite

Blog
You thought you had everything covered. But when your data annotation vendor gets back with error-filled datasets, you’re left with nothing but frustration. With deadlines to meet, you’re forced to seek a different, and hopefully more competent, data labeling service provider. Sounds horrifying, but that’s what some of our clients went through before they turned to us. Like it or not, partnering with a credible vendor can make a difference in your project. But with many vendors to choose from, how do you decide which one suits your project needs? You’ll need to do your due diligence. And this guide will show you:
Things to look for.
Red flags to avoid.
Questions to ask when interviewing vendors.
Let’s start.
Why It’s Worth Being Picky
Data annotation, as you know, is a laborious process that demands precision, collaboration, and consistency. Not only does it require a sizeable team of annotators, but it also calls for coordination amongst project managers, machine learning engineers, domain experts, and annotators. On paper, you might find certain data annotation vendors attractive, particularly if they’re offering their service at a low price. However, not all vendors are equipped with an internal system that satisfies the project's requirements. For example, some of our clients initially chose the cheapest vendor, but they ended up with quality issues in their dataset. Likewise, vendors who charge an expensive fee may not guarantee a favorable outcome. So, it’s better to spend more time assessing vendors before making a choice. Otherwise, you risk costly reworks and project delays. Or worse, deploying a flawed machine learning model that compromises users.
What to Look For in a Data Annotation Company
We know finding the right data labeling company is tough, considering the number of them you’ll find in the market. Still, by vetting the candidates based on several criteria, you can narrow down the list and find a reliable one.
Accuracy & QA Processes
The first thing to look for in any data labeling service provider is a strong quality check mechanism. If the annotated dataset is compromised, so is the resulting machine learning model. Imagine a medical imaging system struggling to identify a malignant tumor due to inconsistent annotations. The outcome? Disastrous. So, find out how a potential vendor validates their work before contracting them. For example, at CVAT, we train our annotators to adhere strictly to the client’s labeling guidelines. Then, we assess the annotated datasets with quality assurance techniques like Ground Truth, Consensus, and Honeypots. We provide a detailed QA report to our clients and are open to refining the dataset. Don’t stop at understanding the annotation process; go a step further by requesting a proof of concept (PoC) from the vendor. In fact, we strongly recommend this move, as it gives you stronger confidence in the vendor’s annotation quality. If the vendor can’t produce quality annotations from a small sample, chances are it won’t be able to in the actual project.
Workforce Setup (Outsourcing vs. In-House)
The way a data labeling company recruits, trains, and manages its annotators can affect your ML project. Some data labeling companies don’t have an internal annotation team. Instead, they rely on outsourced labelers, whom they don’t have control over.
All they do is act as an intermediary, pass on the jobs, and make a profit out of it.On the other hand, data annotation companies with an in-house labeling team can better adapt to changing project requirements. Such companies also have tighter control over who they hire as annotators, as well as the training that labelers undergo. At CVAT, we don’t outsource annotation jobs to others. Instead, we implement every annotation job we take and directly communicate the outcomes to our clients. Moreover, we thoroughly vet each annotator we hire. They’re put to tasks with test projects before we onboard them to our global annotation team. A professional team spread across the world is how we can offer 24/7 project execution across time zones. So, go for a data annotation company that operates with a professional in-house team, particularly if you’re training a complex model. Otherwise, be prepared to deal with noisy datasets, delays, or both. Security and ComplianceThe last thing you need is to suffer a data incident when you trust the vendor to keep your datasets safe. But such a scenario could happen if the vendor you appoint isn’t well-equipped with data security measures. Likewise, partnering with data annotation companies that fail to comply with data privacy laws like GDPR, CCPA, or HIPAA can invite legal troubles.So, the next time you’re evaluating data labeling vendors, find out how they handle data. At the bare minimum, they need to implement compliant measures to protect datasets from intentional or unintentional breaches. For example, we protect our clients’ data by: signing an NDA before commencing the project.complying with data privacy laws like GDPR in our workflow.applying security measures such as secure cloud integration.imposing controlled access on datasets to authorized personnel. Domain ExpertiseSome data annotation projects require domain experts to be part of the annotation workflow. Otherwise, the dataset they deliver might not be precisely labeled. For example, if you’re working on a medical imaging system that trains on medical datasets, you need trained annotators capable of differentiating tumors, fractures, and other anomalies. Domain expert input improves annotation accuracyA quick way to check if the vendor has the required expertise is through their portfolio and case studies. If they’ve worked on a similar project in your industry, they are most likely a good fit compared to others. Otherwise, follow what our clients do — assess the vendor through a PoC. Then, decide if they live up to their marketing pitch. Scalability & Turnaround TimeMost companies innovating with AI/ML models start with a simple prototype, which their vendor has no issue annotating. But as they grow, they need to annotate objects with diverse complexities and types. And that’s where operational limits, if any, start to show. With changing requirements, some data labeling vendors struggle to cope, resulting in costly delays. Worse, if they fail to adapt to new requirements, you will need to seek a different provider and adapt to a new workflow all over again. So, how do you spot scalability issues BEFORE you start a project?One giveaway is vendors who delay starting a project because they lack resources. Also, you might want to reconsider your option if the vendor charges more to prioritize your project. 
‍Andrey Chernov, Head of Labeling at CVAT, stressed, “Make sure your vendor can clearly explain how they run their process and back up any promises they made.”As a precaution, find out if the vendor can cope with growing annotation workloads as your project scales. On top of that, you can also ask for the typical turnaround time that the vendor can commit to. At CVAT, most projects take 1 month to complete, but we strive to deliver faster. Pricing We’ve mentioned that pricing shouldn’t be a deciding factor when choosing a vendor. That said, price can be a useful guide, especially if vendors demonstrate quality in their pilot test. Another consideration is your budget, which the vendor’s price must fit into. And that’s where transparency comes into play. The vendor you choose should be upfront about the fee they charge, because no one enjoys hidden surprises. At CVAT, we price a project based on the following models. {{service-provider-table-1="/blog-banners"}}We don’t rush into a contract straightaway. Instead, we will work through your project requirements, list the tasks involved, and offer a transparent pricing model. Once we finalize the price, we’ll honor it throughout our engagement. Yes, no rude surprises for our clients. On top of that, we also provide volume discounts to clients as we encourage long-term partnerships. ‍Tip: To protect your interest, we strongly recommend that you finalize the price with the vendor and lock it with a contract. That’s the practice we do at CVAT to prevent misaligned expectations with our clients.Red Flags to Avoid The harsh reality is that not all data labeling service providers are committed to delivering high-quality results. Thankfully, you can call them out by some obvious traits they show. Lack of process transparency The vendor should be able to clearly explain their data annotation workflow. From data storage to how they distribute labeling tasks to annotators, a good vendor will take you through the stages — patiently. So, if all you get are vague responses, be wary about engaging that particular vendor.Unrealistic promisesEver met vendors that promise 100% annotation accuracy before understanding your project requirements? Well, that’s a major red flag. Any vendor worth collaborating with will take the time to ask questions, ask for representative data, and run a PoC before promising anything. Communication barrier If the vendor struggles to provide feedback to your development team, you may want to consider other options. Clear and timely communication, as we know, is pivotal to delivering quality datasets. Lack of expertiseSome vendors are adept in a specific industry, such as automotive, but unproven in others, like medical and agriculture. If you choose to go ahead, despite knowing the mismatch, you’re risking your project. In-House vs. Outsourced: Pros & Cons Amidst frustration, you might consider setting up an in-house data annotation team. But before you do that, consider the pros and cons of doing so. {{service-provider-table-2="/blog-banners"}}Having your own in-house labelers naturally provides greater control, but you’ll need to invest in setting up and scaling the team. Not all companies, especially smaller ones, can afford to invest in a team of labelers and annotation tools. Outsourcing, meanwhile, is more affordable, flexible, and allows you access to highly trained experts. When you outsource, you save resources and time that you can allocate to your core business area. 
Of course, we don’t deny the risks of outsourcing, such as data security, compliance, and quality control. However, with careful deliberation, you can find a service provider that addresses your concerns. For example, our data labeling pipeline is designed to be secure, expert-led, and scalable. Plus, all our feedback goes directly to your project team. Questions to Ask a Vendor Ideally, ask clarifying questions before you collaborate with a data labeling service provider. They’ll help you resolve doubts, along with questionable vendors. Below are some questions we thought helpful. What industries and types of data have you annotated before (e.g., images, video, text, audio)?Can you handle projects with large datasets? Are your annotators in-house or crowdsourced?What quality control processes do you have in place?What is your average annotation accuracy rate?What is your average turnaround time for similar projects?How do you protect sensitive or proprietary data?Do you have your own annotation platform, or do you work with client platforms?Do you support domain-specific expertise? How is pricing structured (per task, per hour, per dataset)?How often will we receive progress updates?Do you offer a small paid or unpaid pilot project before full engagement?Getting Started with a Trusted Provider Quality annotation is extremely important to ensure that ML models make accurate inferences. But not all data labeling service providers can live up to their promise. We hope you’ve learned how to find one with this guide. Otherwise, consider partnering with us. CVAT helps companies of all sizes produce accurate, consistent and efficient data annotation. Led by data annotation experts, here is what CVAT has to offer: High-quality annotations - We impose strict quality controls to ensure that the annotated datasets meet your requirements.Scalable workforce - We vet, hire, and train annotators worldwide to take on annotation projects of all sizes. Timeliness - We respect the time we agreed on with our clients and strictly adhere to deadlines. Platform expertise - We annotate with the annotation tools we created, putting our team at an advantage against others. Seasoned professionals - Our team doesn’t just label data; we know the intricacies of data and communicate directly with our clients. Brands like OpenCV.ai, Carviz, and Kombai trusted us with their annotation projects. And we hope you will, too. Want to skip the trial and error? Check out CVAT’s labeling services or get in touch.‍
Annotation Economics
May 26, 2025

How to Choose a Data Annotation Service Provider (and Not Regret It)

Blog
We're thrilled to introduce Immediate Job Feedback—a powerful new feature in CVAT that takes annotation quality to the next level. Immediate Job Feedback provides annotators with real-time quality insights right after completing a job, helping ensure every annotation meets your project quality standards.This new update streamlines workflows, boosts annotation accuracy, and gives you confidence that your labeling meets the expected quality—every single time.What Is Immediate Job Feedback?Immediate Job Feedback is a smart CVAT capability that delivers instant feedback on annotation quality after a job is completed. While annotators work, CVAT automatically evaluates their performance using its built-in quality control tools – specifically the Ground Truth and Honeypots (explained here).Once the task is finished, CVAT compares the results against a predefined quality threshold and immediately displays a pop-up summary to the annotator: the message clearly states whether the quality meets the required standard or needs improvement.How to Set It UpNote: ​​Immediate Job Feedback is available with a paid subscription to CVAT Online and with the Enterprise edition of CVAT On‑Premises.To enable it:Go to Actions → Quality Control → SettingsDefine the following parameters: Target metricTarget metric threshold (your quality bar)Max validations per job (the maximum number of job validations per assignee)Once configured, annotators will automatically receive a quality summary window after every completed job.Instant Rework for Better ResultsIf the annotation quality falls below your predefined threshold, annotators can immediately re-annotate the job to meet the necessary standards – no delays, no guesswork.Detailed instructions on setting up Immediate Job Feedback can be found in our documentation.Drive Accuracy, Speed, and Insight – Right NowImmediate Job Feedback brings CVAT one step closer to a smarter, more responsive annotation environment.Ready to integrate Immediate Job Feedback into your pipeline? Sign up or log in to your CVAT Online account to try Immediate Job Feedback today, or contact us to enable it on your self-hosted Enterprise instance.
Product Updates
May 8, 2025

Boost Your Annotation Accuracy with CVAT Immediate Job Feedback

Blog
Introducing Honeypots: Smarter Quality Checks, Smaller Validation SetsAt CVAT, we know how important quality is when it comes to labeling large volumes of data. A small mistake can have serious consequences, such as a misclassified traffic light impacting autonomous vehicle safety or mislabeled medical images impacting patient care. That's why validation is crucial to ensure annotations meet quality standards before training AI models.To help our customers maintain high labeling standards, CVAT already supports multiple validation approaches, including manual reviews and Ground Truth (GT) jobs. While both methods deliver excellent accuracy, they come with significant trade-offs: manual reviews require a lot of expert time, and GT jobs need carefully curated validation sets — making them expensive, time-consuming, and hard to scale for large volumes of data.That’s why we’re excited to introduce Honeypots — a powerful new addition to CVAT’s automated quality assurance workflow. Designed for scalability, honeypots make it possible to validate large datasets more efficiently and cost-effectively, especially when traditional methods become impractical.What are Honeypots?Honeypots are a smart way to monitor annotation quality without disrupting your team’s workflow. It works by randomly embedding extra validation frames—so-called “honeypots”—directly into your labeling tasks.With this validation mode, annotators don’t know which frames are being checked, so you can measure attentiveness, consistency, and accuracy in a completely natural and unobtrusive way. Why and When Use Honeypots?Quality assurance in data labeling traditionally requires significant resources — either extensive manual reviews or large validation sets. Honeypots offer a more scalable and efficient solution that maintains high standards while reducing overhead.Consider a medical imaging project with 10,000 images. Traditional validation might use 100 expert-annotated images (1% of the dataset) for quality checks—far from ideal when patient care is at stake. This is where Honeypots transform the validation process. Instead of using those 100 validated images just once, Honeypots let you embed them multiple times throughout your annotation pipeline, achieving 5-10% validation coverage without requiring additional expert input.By embedding a small, independent validation set randomly throughout your project, Honeypots' approach ensures that even the largest datasets stay under consistent quality control without requiring expensive and time-consuming Ground Truth validation for every batch. And, since Honeypots reuse the same validation set across multiple jobs, they scale automatically as your project grows, so you don’t need to generate new validation data for each task. This dramatically reduces both validation time and cost while maintaining high-quality standards.Combined with Immediate Feedback, this validation technique creates a highly reliable yet cost-effective validation system that scales effortlessly with your projects.How Honeypots WorkAs Honeypots are based on the idea of a validation set — a sample subset used to estimate the overall quality of a dataset—you’ll need to create a Honeypots validation set first when creating your task and select your validation frames. The whole validation set is available in a special Ground Truth job, which needs to be annotated separately.Note: It’s not possible to select new validation frames after the task is created, but it’s possible to exclude “bad” frames from validation. 
Honeypots are available only for image annotation tasks and aren’t supported for ordered sequences such as videos. Next, these validation frames are randomly mixed into regular annotation jobs. While your annotators work on the project, CVAT tracks how accurately annotators handle these ‘hidden’ honeypot frames. After the job is done, you can go to the analytics page to see all the errors and inconsistencies within the project. There, you get quality scores and error analyses for each job, the validation frame, and the whole task. No need for extra tools or manual comparison. Just label, review, and improve.For more detailed setup instructions, shape comparison notes, and analytics explanations, read our full guide.Try Automated QA and Honeypots TodayThe Honeypots QA mode is available out of the box in all CVAT versions, including the open-source edition. However, using it for quality analysis requires a paid subscription to CVAT Online or the Enterprise edition of CVAT On‑Premises.Ready to catch bad labels before they bite? Sign up or log in to your CVAT Online account to try honeypots today, or contact us to enable them on your self-hosted Enterprise instance.‍
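As a back-of-the-envelope illustration of the coverage math above (the job size and honeypots-per-job numbers are assumed purely for illustration, not CVAT defaults):

```python
# Rough coverage estimate for a honeypot setup (illustrative numbers only).
dataset_size = 10_000     # images to annotate
validation_pool = 100     # expert-labeled frames in the Ground Truth job
frames_per_job = 100      # size of a regular annotation job
honeypots_per_job = 5     # validation frames mixed into each job

jobs = dataset_size // frames_per_job
checks = jobs * honeypots_per_job
print(f"{jobs} jobs, {checks} honeypot checks "
      f"({checks / dataset_size:.0%} coverage), "
      f"each validation frame reused ~{checks // validation_pool} times")
# -> 100 jobs, 500 honeypot checks (5% coverage), each validation frame reused ~5 times
```

The same 100 expert-labeled frames keep working as the project grows: more jobs simply mean more honeypot checks, not more Ground Truth annotation.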
Product Updates
May 6, 2025

Introducing Honeypots: Scalable Quality Checks, Smaller Validation Sets

Blog
We’re excited to announce one of our most requested features — Keyboard Shortcut Customization. This update puts you in full control of how you interact with CVAT, making your workflow more comfortable, personalized, and highly efficient.
What’s New
With this new update, you can:
Customize a wide range of shortcuts: create your own button combinations, both globally for the whole application and scope-limited for specific sections or workspaces.
Never worry about setting up anything incorrectly: CVAT will automatically warn you about any detected shortcut conflict, helping you configure everything smoothly.
Choose any button combination: customize to what feels right to you. Shortcuts can be any combination of modifiers (Ctrl, Shift, or Alt) and up to one non-modifier key (e.g., Ctrl+Shift+F1). Only a few browser-linked combinations are limited.
Assign shortcuts for almost every action — from drawing and switching tools to managing playback and annotation, you have the freedom to set up shortcuts that fit your workflow.
Why We Built It
Until now, CVAT users worked with a fixed set of shortcuts. While functional, they weren’t always the most intuitive or comfortable for everyone. We realized that giving you the power to customize shortcuts — and fully tailor your workspace to your preferences — was not just a nice-to-have, but absolutely essential.
How to Set Up Your Custom Shortcuts
Setting up your shortcuts is simple and quick. You’ll find the shortcut customization panel under Settings → Shortcuts. From there, browse workspaces from the category menu, choose the action you want to remap, press your preferred key combo — and you're all set! Your custom shortcuts are saved directly in your browser, so they’ll remain set even after you reload the page. They’ll only reset if you clear your browser’s cache. If you use a different browser or device to access CVAT, you’ll need to set your shortcuts again there. And if you ever change your mind about a shortcut, you can easily update it or click "Restore Defaults" to reset everything. Check out this article for detailed step-by-step instructions.
Start Customizing Your Shortcuts Today
Sign up or log in to your CVAT Online account, or reach out to us to get expert assistance on starting your annotation journey.
Product Updates
April 30, 2025

All Shortcuts, Your Way: Keyboard Shortcut Customization Is Here!

Blog
Picture a self-driving car avoiding oncoming vehicles, turning at the right junction, and gently stopping at the traffic light. It waits patiently until the light turns green, then gracefully rejoins the traffic. All without the driver lifting a finger. We know that’s artificial intelligence (AI) technology at its finest. Or, specifically, computer vision, a term that machine learning engineers are familiar with. Advanced computer vision systems can identify objects in much the same way humans do. But how? The answer: potentially hundreds of hours of video, carefully annotated so that the computer vision model can learn to perceive the vehicle’s surroundings just like humans would. Video annotation isn’t only helpful for innovating self-driving cars. Today, computer vision is at the heart of various applications, which underscores the importance of video annotation. In this guide, we’ll explore:
What video annotation is and why it matters.
Video annotation benefits and applications.
Types of video annotation.
Challenges and best practices when annotating videos.
Let’s start.
What is Video Annotation?
Video annotation is the process of labeling or masking specific objects in videos based on their types or categories. A human annotator or labeler highlights specific parts of a video frame and tags them with a label. The annotated video dataset then becomes the ground truth used to train computer vision models, often through supervised learning. By learning from each of these labels or masks, the machine learning algorithm becomes more adept at associating visual data with real-life objects the way humans see them. Video annotation is laborious: human labelers patiently identify and classify multiple objects frame after frame. Often, they’ll use automated video annotation software to accelerate the annotation process.
Why does Video Annotation Matter?
Startups and global enterprises are in a race to market state-of-the-art computer vision systems. By 2031, the computer vision market is predicted to hit US $72.66 billion. But to compete and thrive in this industry, relying on state-of-the-art computer vision models isn’t enough. By itself, a computer vision model cannot interpret objects from video data correctly. Like other machine learning algorithms, it needs to learn from datasets curated and annotated for a specific application. Let’s take a traffic monitoring system as an example. Without learning from an annotated dataset, the computer vision model can’t identify cars, pedestrians, and other objects the camera captured. Instead, the system sees only raw pixel data: contrasts, hues, and brightness for each frame that passes through. But that changes when you annotate the video. For example, you can place a bounding box on a car to teach the computer vision model to identify it as such. Likewise, you can train the model to identify pedestrians by drawing key points on the people. We’ll cover more of this later. But the point is: video annotation makes computer vision smarter by training it to interpret video data just as we interpret what we see in real life. AI models operate on the “garbage in, garbage out” principle: if you feed the model low-quality datasets, it produces inaccurate results. Therefore, what’s equally critical is the dataset the model trains from, which calls for improved annotation quality.
Video Annotation vs. Image Annotation
Video annotation is a subset of data annotation, which also includes image annotation.
Some people might draw similarities between both types of annotation. The common argument: video is made up of sequential frames of images. Just like how you can draw a bounding box on an image, you can do so on a still frame in a video. But that’s where the similarity ends. Video annotation is more suitable in certain use cases, especially those that require more contextual data like layers and movements. Caption: Annotating an image vs. video. That said, video annotation is also more complex. And that’s why annotators use automated data labeling tools like CVAT to assist their efforts. Meanwhile, image annotation is simpler, as annotation is limited to a static visual. So, how do you decide if video annotation is more suitable in your application? Simple. Consider the checklist below.InterpolationLabeling an image is straightforward. But what happens when you need to label dozens of sequential images? In that case, video annotation is better. Advanced video annotation lets you interpolate between frames. All you need to do is label the starting and ending images, and the tool will automatically annotate the rest.Motion detectionSome computer vision applications require temporal context or time-related information to learn more effectively. Image annotation cannot deliver such information, while video annotation can. For example, you can mark a tennis ball across its trajectory to train a motion detection system.Data accuracyAn image can only convey information limited to what was captured on a static visual plane. Video, on the other hand, has more depth, which is more helpful for AI applications that require advanced tracking, monitoring, or identification. For example, you can help a computer vision model identify a partially obscured object by masking it with a brush tool. Cost efficiency Annotating a handful of images seems effortless. But do so for dozens of them, and you’ll start feeling the strain of committing time, labor, and money. Instead, video annotation is often more cost-effective for complex applications. For example, you can annotate several keyframes and automatically label in–between frames to save time.How is Video Annotation Applied in Various Industries As computer vision evolves, so does adoption amongst industries. We share real-life applications where computer vision models, trained with annotated video, are making an impact.Autonomous vehiclesAt the heart of the vehicle is an AI-powered system that processes streams of video information in real life. The car can differentiate vehicles, pedestrians, buildings, and other objects accurately in real time. And this is only possible because the computer vision model that guides the vehicle was trained with annotated datasets. HealthcareDoctors, nurses, and medical staff benefit from imaging systems trained with annotated video datasets. Conventionally, they rely on manual observation to detect anomalies like polyps, cancer, or fractures. Now, they’re aided by computer vision to diagnose more accurately.AgricultureComputer vision technologies trained with properly annotated datasets can improve yield in agriculture. Farmers, challenged by monitoring acres of land and crops manually, leverage AI to optimize land usage, combat pests, fertilize plants, and more.Caption: Tracking harvester movement across farms. ManufacturingProduct defects, left unnoticed, can negatively impact manufacturers both financially and reputationally. 
Installing a visual-inspection system trained with annotated datasets allows for more precise quality checks. In addition, such systems also create a safer workspace by proactively detecting abnormal or unsafe situations.Security surveillanceAnother area where video annotation is sought after is security surveillance. CCTV cameras allow security officers to oversee people's movement in real time. However, they might need help in identifying suspicious behavior, especially when monitoring multiple feeds. With computer vision, untoward incidents can be prevented as the computer vision system picks up patterns it was trained to identify and promptly alerts the officers. Traffic managementTraffic rules violations, congestions, and accidents are concerns that governments want to resolve. With computer vision, the odds of doing so are greater. Upon training, the AI model can analyze traffic patterns, recognize license plates, and identify accidents from camera feeds. Disaster responseFirst responders need to make prompt and accurate decisions to save lives and property during large-scale emergencies. Computer vision technologies, coupled with aerial imagery, can help responders strategize rescue operations. For example, emergency teams send drones augmented with computer vision algorithms to locate victims affected by wildfires. Types of Video Annotation Video annotators use different techniques when labeling datasets. Depending on the project requirements, they might label an object with techniques like bounding boxes. Or if they need to train the model to capture pose or movement, they’ll use keypoint annotation.A skilled annotator knows how to use and combine various techniques according to the labeling requirement. Bounding boxesA bounding box is the simplest type of annotation you can make on a video. The annotator would draw a rectangle over an object, which is then tagged with a label. It’s suitable when you need to classify an object and aren’t concerned about separating background elements. For example, you can draw a rectangular box over a dog and tag it as an animal.‍Bounding box on a moving vehiclePolygonsLike bounding boxes, polygons enclose an object in a video frame. However, you can remove unwanted background information by drawing the polygon according to the object’s outline. Usually, we use polygons to label complex, irregular objects.Caption: Polygon annotation of a car.PolylinesPolylines are sequences of continuous lines drawn over multiple points. They are helpful when you’re annotating straight-line objects across frames, such as roads, railways, and pathways.Caption: Polyline annotation for railway.EllipsesEllipses annotations are oval-shaped and drawn across objects with similar geometrical outlines. For example, you can use ellipses when annotating eyes, balls, or bowls.Caption: Ellipses annotation for a tennis ball.Keypoints & skeletonsSome video annotation projects require pose estimation and motion tracking. That’s where keypoint and skeleton annotation come in handy. Keypoints are tags assigned to specific parts of the object. For example, you assign keypoints to body joints and facial features. Then, the machine learning algorithm could track how they move relative to each other. On top of that, you can join various keypoints to form skeletons, which helps track body movement more precisely. 
Skeleton annotation for tracking a horse’s movement
Cuboids
Cuboids allow annotators to label 3D objects with a fairly uniform structure, such as furniture, buildings, or vehicles, for computer vision models. You can add spatial information, such as orientation, size, and position, to cuboids to train computer vision models.
3D object annotation
Based on cuboids, this type of annotation further expands on the properties of the labeled object, enabling depth perception and volume estimation. For example, annotators use 3D object annotation when training a traffic surveillance system to identify vehicles.
Automated annotation
Even with a diverse range of annotation tools, labeling dozens of hours of video can be painstaking. Instead of manually tagging objects, you can automatically label them with a video annotation tool like CVAT. Once configured, our software automatically finds the objects that you want to label in the video and tags them accordingly. Then, you review the results to ensure they’re accurate and make changes if necessary. Learn more about how automated annotation works in CVAT.
How to do Video Annotation for Computer Vision?
Whether you want to train an autonomous vehicle or identify human faces, you start by labeling the datasets. If you’ve never annotated video before, follow these steps that experienced video annotators use.
Step 1 - Define your annotation requirements
Every annotation project is different. Know what you need to label in the video and consider the complexity of doing so. For example, categorizing people in public areas is relatively simple, but tracking individual movements requires more effort. From there, decide which data annotation tool to use.
Step 2 - Choose the right video annotation tool
Not all data annotation tools are suitable for video annotation tasks. Some lack advanced features, such as automatic or semantic annotation, that help you save labeling time. Besides annotation features, pay attention to project management tools, a user-friendly interface, and data security when choosing data labeling software.
Step 3 - Upload video data
Next, prepare the video that you want to annotate. In some cases, you might need to resize, denoise, or extract certain frames to improve the video quality. After that, import the video file into the annotation software.
Step 4 - Annotate the video dataset
Create a class for the object you want to label. Then, use appropriate tools, such as bounding boxes, polygons, or skeletons, to label the objects. You can identify keyframes, tag objects in them, and interpolate the frames in between. This allows you to annotate faster without going through every single frame.
Step 5 - Review the annotation
Mistakes can happen during annotation, even with automated labeling tools. So, it’s important to check the annotated frames thoroughly before using the dataset to train the computer vision model. Look for mistakenly annotated objects, missing labels, and other inconsistencies.
Key Challenges in Video Annotation
Video annotation is key to enabling state-of-the-art computer vision applications. But creating accurate and consistent datasets remains challenging, even for experienced annotators. If you’re starting a video annotation project, be mindful of these challenges.
Labeling inconsistency
Human labelers play a vital role in video annotation, regardless of the tools you use. Therefore, annotation results are subject to individual interpretation. For example, one annotator may classify a dog as a Poodle, while another may label it a Toy Poodle.
Both are similar but not the same as far as machine learning algorithms are concerned. So, to avoid misinterpretations, provide your team with clear labeling specs. You can read more about it here.
Inadequate training
Before they annotate, labelers must receive proper training to ensure they’re familiar with the video annotation process, tools, and expectations. Otherwise, you risk compromising the outcome with inaccurate labeling, rework, and costly delays.
Immense datasets
Video data is far larger than its textual and image counterparts. So, the time and effort spent on annotating video frames might take up considerable resources that not all companies can spare. Explore the advanced data annotation course we offer to upskill your annotators.
Data security and privacy
Video annotation requires collecting, storing, and processing large volumes of videos, some of which might contain sensitive information. You need ways to secure datasets throughout the entire labeling pipeline and comply with data privacy laws.
Project timeline
Time to market is another concern that puts additional pressure on annotators. By itself, video annotation is a laborious process. And if annotators use manual tools, delays can happen as they’ll need to spend time addressing labeling issues. We know that video labeling can be very tedious, even if you’re equipped with the right tool. That’s why we help companies save time and costs with professional video annotation services.
Best Practices when Annotating Videos
Don’t be discouraged by the hurdles that might complicate video annotation. By taking precautions and smarter approaches, you can improve annotation quality without committing excessive resources. Here’s how.
Automate when you can
Don’t hesitate to automate the labeling process. Sure, automatic annotation is not perfect. You’ll likely need to review all the frames to ensure they’re correctly labeled. But don’t forget, automatic annotation saves tremendous time that you can better spend on strategizing the computer vision project. If you use CVAT, you can take automated labeling further with SAM-powered annotation. We integrate SAM 2, or Segment Anything Model 2, with our data labeling software to enable instant segmentation and automated tracking of complex objects.
Prioritize video quality
We know that annotators have little or no control over the video they annotate. But on your part, try to ensure the recordings are high quality to start with. Also, the annotation software you use matters, as some tools might unknowingly degrade the video quality.
Keep labels and datasets organized
Video annotation can get out of hand quickly if you don’t stick to an organized annotation workflow. Overlapping classes, misplaced datasets, and other confusion can limit your video annotators’ productivity. Thankfully, these issues can be addressed if you’re using a user-friendly data annotation tool.
Interpolate sequences with keyframes
You don’t need to label every single frame in a video. Instead, you can assign keyframes in between predictable sequences and interpolate them. Trust us; this will save you lots of time (a short interpolation sketch follows this section).
Set up a feedback system
Annotators need feedback from domain experts and machine learning engineers to know if they’re labeling correctly. Likewise, any updates to labeling requirements must be communicated to the entire team. Usually, good data annotation software is equipped with a feedback mechanism that streamlines communication.
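As a rough illustration of the keyframe-interpolation tip above, the sketch below linearly interpolates a bounding box between two labeled keyframes. Tools like CVAT handle this (and more sophisticated variants) automatically, so treat it only as a mental model of what happens to the in-between frames.

def interpolate_box(box_start, box_end, frame, start_frame, end_frame):
    """Linearly interpolate a box (x, y, width, height) between two keyframes."""
    t = (frame - start_frame) / (end_frame - start_frame)
    return tuple(a + t * (b - a) for a, b in zip(box_start, box_end))

# The annotator labels only frame 0 and frame 30; the rest are filled in.
start = (100, 120, 50, 80)   # x, y, width, height at frame 0
end = (220, 110, 55, 85)     # x, y, width, height at frame 30
boxes = {f: interpolate_box(start, end, f, 0, 30) for f in range(1, 30)}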
Caption: Annotation feedback in CVAT.
Import shorter videos
Long videos clog up bandwidth if you’re uploading them to an online annotation tool. If you don’t want to spend hours waiting for the video to load, break it into smaller ones. Preferably, keep the videos below the 1-minute mark (a short splitting sketch follows the conclusion).
Conclusion
Video annotation is key to creating accurate computer vision models. Compared to image annotation, video labeling provides more detail and is more practical in most real-world applications. That said, video annotation isn’t without challenges. If you want to annotate video, you need to address concerns like dataset volume, labeling consistency, and annotator expertise. Hopefully, you found useful tips in this guide to improve your annotation quality and reduce time to market. Remember, a data annotation tool equipped with automated features goes a long way in video annotation. Explore CVAT and annotate video more effectively now.
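Following up on the tip about importing shorter videos: one way to split a long recording into roughly one-minute chunks before upload is to call ffmpeg's segment muxer from a small script. This is a sketch only; it assumes ffmpeg is installed, and because streams are copied rather than re-encoded, cuts land on keyframes and part lengths are approximate.

import subprocess

def split_video(path: str, segment_seconds: int = 60) -> None:
    """Split a video into fixed-length parts without re-encoding."""
    subprocess.run(
        [
            "ffmpeg", "-i", path,
            "-c", "copy",                    # copy streams, no re-encoding
            "-map", "0",                     # keep all streams
            "-f", "segment",                 # write numbered segments
            "-segment_time", str(segment_seconds),
            "-reset_timestamps", "1",        # each part starts at t=0
            "part_%03d.mp4",
        ],
        check=True,
    )

split_video("long_recording.mp4")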
Annotation 101
April 10, 2025

Video Annotation Guide (Applications, Techniques & Best Practices)

Blog
Picture this: A radiologist stares at a chest X-ray at 3 AM in a busy emergency room. In the corner of her screen, an AI assistant highlights a tiny shadow she might have missed—a small tumor caught early enough to save a life. This is the power of medical AI, but it's only as good as the data used to train it.Behind every life-saving AI detection lies thousands of hours of painstaking work: medical data annotation. It's the crucial bridge between raw medical data and artificial intelligence that can spot diseases faster than human eyes. But here's the challenge: annotating medical data isn't just about drawing boxes around obvious features—it requires the precision of a surgeon, the knowledge of a specialist, and the attention to detail of a forensic investigator.The stakes? They couldn't be higher. A single poorly annotated data point could mean the difference between an AI system catching or missing a critical diagnosis.In this article, we explore major types of healthcare data annotation, their practical implications, and the challenges they present.Source: UnsplashMedical Data Annotation: Why Is It Special?Data annotation is becoming increasingly common across various fields, but medical data annotation presents some unique challenges. The most important idiosyncrasies of raw medical data are its amount, complexity, and privacy.The sheer amount of raw, completely unstructured healthcare records that is generated every second—from X-rays, MRIs, and patient records during diagnosis, treatment, and prevention—requires highly efficient and accurate annotation tools.The complexity of the raw material is the second biggest challenge for the industry: not only is the file format limited to the standardized one that includes multiple layers and high bit depths, but such files require a very skilled labeler, who must be both a specialized healthcare professional and also well-trained to work with data annotation tools.Given the sensitive nature of personal health information and medical data, its annotation must be handled securely and privately at all times.These unique challenges demand specialized capabilities from annotation tools. When selecting these tools, it's essential to remember that efficiency and high precision are critical priorities for each medical data annotation type discussed below.Types of Medical Labelling and Their ApplicationsEvery day, hospitals generate a tsunami of data in various formats—from crystal-clear 3D brain scans to hurried emergency room dictations. Each type of data requires its own specialized annotation approach. Let's dive into the four critical domains where annotation is revolutionizing healthcare:Images: The foundation of medical AI, from X-rays to cellular microscopyAudio: Capturing everything from heart murmurs to emergency callsVideo: Dynamic data from surgical procedures to patient monitoringText: The hidden goldmine in medical records and research papersLet's explore how each type is transforming patient care.Source: UnsplashMedical Image AnnotationWhen you hear "medical imaging," you might think of simple X-rays. But today's medical imaging spans a mind-boggling range: from atomic-level electron microscope scans to full-body 3D MRIs. Each requires its own annotation approach:For disease prevention, the most critical part of annotation is to label the object on the image as ‘normal’ or ‘abnormal.’ This annotation type is vital in detecting abnormalities and producing correct diagnoses. 
This is usually called image classification—as the name suggests, classifying the image into a predefined category or class.For other purposes, such as pre-surgery scanning, it is vital to locate and mark the exact object position—this type of annotation is called object detection.Another type of medical image annotation is connected to labeling all the image components presented—image segmentation. One can label each separate object as a whole (so-called instance segmentation) or go as deep as labeling each pixel with a category label (so-called semantic segmentation).Each of these types of annotation can be further divided into specific ways of annotating the data based on the technique used, for example, creating a figure around the object in the image (bounding box or polygon), locating the specific part/feature of the image (keypoints), marking objects in the picture for multiple image alignment (landmarking), or even creating a collection of points to mark the 3D coordinates of the image (point cloud).All these methods are aimed at recognizing and annotating key parts of the raw medical image.Medical Audio AnnotationWhile medical images are widespread, medical data is not limited to visuals—another important type is medical audio annotation. For example, the already mentioned emergency care field relies heavily on distinguishing keywords in inbound emergency calls to identify the issue correctly and dispatch the right team.Here we can distinguish the two most important types of data annotation: Conversation recordings (or, more generally, spoken language recordings) that include all the doctor-patient talks and dictations, and Physiological sounds (or Auscultation sounds)—heartbeats, lung sounds, bowel movements, and other varieties of sounds recorded on medical devices.In addition to disease prevention and research, similar to other types of annotation, spoken language recordings are an extremely valuable asset to train AI to correctly recognize speech and medical jargon and then compile comprehensive notes from each conversation or fill in necessary forms—extremely valuable for the efficiency and accuracy of note-taking.Source: UnsplashMedical Video AnnotationMedical video annotations include a multitude of data, from surgical procedure videos to surveillance of patient behavior. It must be mentioned that medical videos play an enormous role in teaching young medical professionals in their specialized fields.This raw data can be annotated frame-by-frame or segment-by-segment. The most common types include:Annotation of video diagnostics—similar to image annotation—can help in the first stages of treatments to produce a correct diagnosis for the patient. These videos can include videos from any in-body camera footage (colonoscopies, laparoscopic surgeries, etc.) as well as ultrasounds and echocardiograms.By labeling and annotating anomalies and pathologies in each frame of the footage, we can assist in teaching AI to detect anomalies, minimizing the manual work for the clinicians and being able to go through a much larger amount of footage in a shorter time.Annotating surgical videos by labeling them with timestamps to determine each stage of the surgery (e.g., incision, dissection, suturing, closure, etc.) 
or marking the object’s location in the video with a bounding box (tools, tumors, anatomical elements, etc.)—this has an enormous practical value not only in providing insights into correct surgical procedures and best field practices but also in real-time flagging errors and possible surgery complications.Patient monitoring includes a variety of data from motion analysis in rehabilitation to patient surveillance in their rooms. While rehabilitation footage is great for tracking patients’ progress toward recovery, surveillance footage can assist in greater safety for patients that pose a risk to themselves or others. Here we must emphasize the privacy of the recording and its ethical handling.Medical Text AnnotationA huge amount of data in the healthcare sector is created manually in the form of notes, records, prescriptions, research papers, and so on. This raw data can and should be labeled and annotated as well. Similarly to image and video annotation, this data can be labeled as a whole document or on the sentence/tag level.By classifying the documents according to their contents, medical specialty, or even labeling them by symptoms, test values, etc., mentioned within, one can create a huge database that can be used by ML to interpret and cross-reference, which can refine multiple medical fields, such as diagnostics, medication prescription, and medical research, to name a few.By labeling any specific information within the document, such as specific diseases, links between symptoms and medication, findings, etc., one can fine-tune the annotation and make interpretations more specific and accurate.Source: UnsplashSearching for the Right Medical Annotation ToolsWe’ve discussed all the different types of medical data annotation and highlighted its areas of usage and importance in the healthcare industry. All of the above further solidifies that choosing the correct annotation tool is a crucial decision for any medical project.CVAT data annotation software stands out as a powerful solution for medical annotation projects. It works with most file formats, and its open-source nature enables the possibility to be customized to handle DICOM or NIfTI medical imaging files. Unlike some image-only annotation tools, CVAT handles video annotation, which is great for medical projects with cine loops or surgical videos. CVAT can also handle large-scale projects and big datasets, which makes your project easily scalable.And, most importantly, CVAT has an active community and is continuously improving. There are plugins and scripts available (for automation, pre-processing, etc.), and if a needed feature isn’t there, one can modify the code.In summary, CVAT offers a strong combination of flexibility, scalability, and cost-effectiveness for medical data annotation. It may require a bit more setup, compared to some turnkey commercial solutions, but it gives you full control.ConclusionThe role of data annotation in healthcare continues to expand as AI applications become more sophisticated and widespread. The success of these applications relies on the quality and precision of annotated medical data, whether it's images, audio, video, or text. As we've explored, each type of annotation presents unique challenges and requirements.The synergy between professional medical expertise and advanced annotation tools is reshaping healthcare delivery. 
When implemented effectively, these tools enhance the accuracy of AI-driven diagnostics and treatments and allow healthcare professionals to focus more on patient care rather than routine data interpretation.As medical technology continues to evolve, the importance of high-quality data annotation will grow. This will make it a crucial component in the future of healthcare delivery and improved patient outcomes.Next StepsFor organizations looking to implement or improve their medical data annotation processes:Try CVAT Online to explore our flexible and customizable medical imaging annotation solution.Learn about CVAT On-prem for teams requiring additional security and control.If you're looking for expert assistance, discover our professional Annotation Services.Visit cvat.ai to learn more about CVAT's comprehensive medical annotation capabilities.
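A practical note on the DICOM point above: many teams pre-convert DICOM slices to ordinary images before importing them into a general-purpose annotation tool. Below is a minimal sketch using pydicom and Pillow; it assumes single-frame grayscale slices and skips windowing and de-identification, both of which matter in real medical projects.

import numpy as np
import pydicom
from PIL import Image

def dicom_to_png(dicom_path: str, png_path: str) -> None:
    """Convert one single-frame DICOM slice to an 8-bit PNG."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)

    # Rescale raw intensities to the 0-255 range expected by PNG viewers.
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels /= pixels.max()
    Image.fromarray((pixels * 255).astype(np.uint8)).save(png_path)

dicom_to_png("slice_001.dcm", "slice_001.png")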
Industry Insights & Reviews
April 1, 2025

Medical Data Annotation: Improving AI Accuracy in Healthcare

Blog
In January, we announced AI agents—a new feature for integrating your machine learning models with CVAT. Since then, we’ve been working hard on improving this feature. In this post, we’d like to share our progress.
But first, a quick recap: an auto-annotation (AA) function is a Python object that acts as an adapter between the CVAT SDK and your ML model. Once implemented, you can either:
Use the cvat-cli task auto-annotate command to annotate a CVAT task, or
Register it with CVAT and then start an agent via the cvat-cli function create-native and cvat-cli function run-agent commands. After that, use the CVAT UI to select a task to annotate and its parameters. If you register the function in an organization, other members can use it as well.
Now, let’s see what’s new.
Skeleton support
An AA function may output any CVAT-known shapes. However, when agents debuted, AA functions couldn’t output skeletons with the agent-based workflow. In CVAT and CVAT CLI version 2.32.0, this has been rectified.
Let’s implement an AA function with skeleton outputs using a YOLO11 pose model from the Ultralytics library and see how it works.
1) We’ll start with an empty source file yolo11_pose_func.py and add the necessary imports:

import PIL.Image

from ultralytics import YOLO

import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models

2) Then, we need to create an instance of the YOLO model:

_model = YOLO("yolo11n-pose.pt")

3) To turn our file into an AA function, we’ll need to add two things. First, a spec – a description of the labels that the function will output.

spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.skeleton_label_spec(name, id, [
            cvataa.keypoint_spec(kp_name, kp_id)
            for kp_id, kp_name in enumerate([
                "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
                "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
                "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
                "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
            ])
        ])
        for id, name in _model.names.items()
    ],
)

For each class that the model supports, we use skeleton_label_spec to create the corresponding label spec and keypoint_spec to create a sublabel spec for each keypoint. The Ultralytics library doesn’t provide a way to dynamically determine the supported keypoints, so we have to hardcode them in our function. Our hardcoded list is taken from Ultralytics’s pose estimation documentation.
Note that each label and sublabel spec requires a distinct numeric ID. For the skeleton as a whole, we use the class ID that Ultralytics gives us, whereas for the sublabels we just assign sequential IDs.
The other thing we need to add is a detect function that performs the actual inference.

def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    conf_threshold = 0.5 if context.conf_threshold is None else context.conf_threshold

    return [
        cvataa.skeleton(
            int(label.item()),
            [
                cvataa.keypoint(kp_index, kp.tolist(), outside=kp_conf.item() < 0.5)
                for kp_index, (kp, kp_conf) in enumerate(zip(kps, kp_confs))
            ],
        )
        for result in _model.predict(source=image, conf=conf_threshold)
        for label, kps, kp_confs in zip(result.boxes.cls, result.keypoints.xy, result.keypoints.conf)
    ]

The first thing detect does is determine the confidence threshold the user has specified (defaulting to 0.5 if they didn’t specify any). Then it calls the model and creates a CVAT skeleton shape for each detection returned by the model.
Keypoints with low confidence are marked with the outside property so that CVAT hides them from view.
Having implemented our function, we can integrate it with CVAT in the usual way. First, we’ll need to install the CVAT CLI and the Ultralytics library:

pip install cvat-cli ultralytics

Then, we can register our function with CVAT and run an agent for it:

cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function create-native "YOLO11n-pose" --function-file yolo11_pose_func.py

cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function run-agent <FUNCTION_ID> --function-file yolo11_pose_func.py

where:
<CVAT_BASE_URL> is the URL of the CVAT instance you want to use (such as https://app.cvat.ai).
<USERNAME> and <PASSWORD> are your credentials.
<FUNCTION_ID> is the number output by the first command.
Now we can try our function in action. To do so, we’ll need to open CVAT and create a new task with a skeleton label (or add such a label to an existing task). To this label, we’ll add keypoints corresponding to the keypoints in the model.
Now we can open the task page, click Actions → Automatic annotation, and select our model. Pressing “Annotate” will begin the annotation process. Once it’s complete, we can open a frame from the task and see the results.
Attribute support
CVAT allows shapes to include attributes: extra pieces of data that pertain to a given object. The set of allowed attributes is configured per label, as are each attribute’s type and allowed values. Skeleton keypoints may have their own individual attributes as well.
As of CVAT and CVAT CLI version 2.31.0, AA functions may define labels with attributes and output shapes with attributes. This works with both the direct-annotation and agent-based workflows.
To demonstrate this feature, we’ll implement a function that recognizes text via the EasyOCR library. The function will output rectangle shapes, each with an attribute containing the recognized text string. Here’s the function code:

import PIL.Image

import easyocr
import numpy as np

import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models
from cvat_sdk.attributes import attribute_vals_from_dict

_reader = easyocr.Reader(['en'])

spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec("text", 0, type="rectangle", attributes=[
            cvataa.text_attribute_spec("string", 0),
        ])
    ],
)

def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    conf_threshold = 0.5 if context.conf_threshold is None else context.conf_threshold
    input = np.array(image.convert('RGB'))[:, :, ::-1]  # EasyOCR expects BGR

    return [
        cvataa.rectangle(
            0,
            list(map(float, [*points[0], *points[2]])),
            attributes=attribute_vals_from_dict({0: string}),
        )
        for points, string, conf in _reader.readtext(input)
        if conf >= conf_threshold
    ]

It has the same basic elements as our previous function.
To output our attribute, we declare it in the spec (via the attributes argument of label_spec) and specify its value for each output rectangle (via the attributes argument of rectangle). Note that since we have only one label and one attribute, we just hardcode 0 as the ID for both of them.
To see our function in action, we’ll need to install the dependencies:

pip install cvat-cli easyocr

Then, as before, register the function and run an agent:

cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function create-native "EasyOCR" --function-file easyocr_func.py

cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function run-agent <FUNCTION_ID> --function-file easyocr_func.py

We’ll need to create a task with a label and an attribute that match our function. Then press Actions → Automatic annotation, select the model, confirm, and see the results.
Label types
The last improvement we’d like to discuss is subtle. Let’s revisit the function spec from the previous section:

cvataa.label_spec("text", 0, type="rectangle", ...

The ability to specify a label type in a spec has been present since the introduction of the AA function interface, but it was previously ignored. However, since CVAT and CVAT CLI 2.29, specifying this type has small but beneficial effects:
When you select which task label to map a function’s label to in the “Automatic annotation” dialog, CVAT will only offer task labels whose type is compatible with the function’s type. For example, if the function label has type “rectangle,” then CVAT will only offer task labels of types “rectangle” and “any.” This prevents you from accidentally adding shapes of an unwanted type via automatic annotation.
When you use the “From model” button to copy labels from a function to a task, the specified label type will be set on the copied label.
If a function outputs a shape whose type is inconsistent with the declared type of the shape’s label, the CVAT CLI will abort the automatic annotation. This catches function implementation mistakes.
Label type declarations are optional. By default, each label spec is assumed to have type “any,” except for labels created with skeleton_label_spec, whose type will be “skeleton.” However, we recommend you declare your label types, as it is easy to do and will help prevent implementation and usage mistakes.
Conclusion
With these changes, agent-based functions are catching up to the capabilities of Nuclio-based functions. However, unlike Nuclio-based functions, they can be integrated with CVAT Online and other CVAT instances without server control.
We’re continuing to work on this feature to add more capabilities, so stay tuned for updates.
Product Updates
March 26, 2025

CVAT AI Agents: What's New?

Blog
Say goodbye to manual frame-by-frame labeling. Speed up video annotation workflows with SAM 2 Tracker by CVAT. *** Note: This feature is available only to CVAT Enterprise Basic and Premium accounts. We're excited to announce that CVAT now supports automated video annotation with Segment Anything Model 2 (SAM 2) Tracker. SAM 2 is the successor to Meta AI's advanced foundation model, designed for real-time, comprehensive object segmentation in images and video. SAM 2, released in July 2024, allows users to detect and segment any object in an image or video based on specific input prompts, like interactive points, bounding boxes, or masks. Once segmented, the model tracks them across video frames in real-time, ensuring accuracy and consistency. Evolution of SAM-powered annotation in CVAT When SAM's first edition was released in 2023, we integrated it into CVAT's SaaS and on-premises versions within weeks, allowing customers to enhance their image annotation tasks. When SAM 2 was released in 2024, we quickly followed suit to provide faster and more accurate segmentation. An aerial view demonstrating automated segmentation of agricultural fields and crop health conditions using SAM 2 in CVAT. We’ve extended its object segmentation and tracking capabilities to video, building on the success of these integrations. Let's examine how SAM 2 Tracker works and how it can improve your video annotation workflows. Bringing the power of SAM 2 to video annotation Labeling videos is essential for training AI models in industries relying on video data, like autonomous vehicles, sports analysis, and robotics. Those models need a large amount of accurately labeled data—from hundreds to millions of videos—to function reliably. Labeling vast data is a challenging task. Video annotation, unlike image labeling, adds a temporal dimension, increased data volume, and the need for frame consistency. Automated annotation tools like CVAT’s new SAM 2 Tracker are essential to alleviating these challenges by streamlining the process and reducing manual effort. In CVAT, there are a few methods to annotate videos: The old-school manual, frame-by-frame labeling which requires drawing annotations on every frame. And, interpolation-based labeling, where annotations are placed on keyframes and automatically propagated across intermediate frames. While those two options remain viable for simpler scenarios or limited-scope projects, SAM 2 Tracker significantly enhances the convenience and speed of object segmentation and tracking in videos, especially for complex scenes with rapid movements or frequent obstructions. Key features of CVAT’s SAM 2 Tracker Instant segmentation: The SAM 2 Tracker outlines the contours of an object in a single frame when you click on it. Automatic tracking: The tracker preserves the object's shape and position as it moves across frames. Support for complex objects: Works effectively with partial overlaps or changes in the background. Interactive refinement: Adjust annotations at any stage. How to Use SAM 2 Tracker in CVAT Follow these steps to segment and track objects in your videos with SAM 2: Open your CVAT account and select the video you want to annotate from the list of the annotation tasks. In the annotation toolbar, select the "Magic Wand."
Then, use the Interactor tab to choose the label and SAM 2 to generate a segmentation mask for your object on the first (zero) frame. Important: For the Tracker to work, don't forget to turn on the "Convert the mask into a polygon" slider, because the mask cannot be further converted into a "Track" mode, and the Tracker will not be able to track an object annotated with a mask as a single element across multiple frames. In the right-hand Objects panel, click the three-dot menu (⋮) next to your polygon, then select "Run annotation action." Select "SAM 2 Tracker" from the pop-up menu and set the number of frames to track. Important: If you annotated the object with a polygon “Shape,” don’t forget to convert them into “Track” mode before running the Tracker. SAM 2 will track your polygon across subsequent frames. Note: Due to deployment requirements, SAM 2 Tracker is currently available exclusively for CVAT’s On-Prem paid accounts (Enterprise Basic and Premium). Getting started For more information about SAM 2 Tracker, visit our documentation. For SAM 2 details, visit its site and GitHub. If you have an Enterprise account and want to install SAM 2 Tracker, contact our support team. If you don’t have a CVAT On-prem account or use CVAT Online and want to try SAM 2 for video annotation, contact our sales team.
Product Updates
March 21, 2025

CVAT Now Supports Video Annotation with SAM 2

Blog
If an artificial intelligence model was an engine, then its training data would be its fuel. And like an engine of an automobile, the quality of its functionality is largely dependent on the quality of what is fed into it. To paraphrase a well-known computer science phrase, if you put garbage in, you get garbage out.The key to ensuring that your models are providing accurate outputs lies in the training data, and data annotation is vital in providing the structure and context to datasets that allow AI models to learn effectively. Without well-structured context, those datasets are just noise.But annotating huge datasets, whether images or videos, can be a never-ending (and tedious) task, and the size of these tasks are set to grow significantly as AI models demand larger datasets. Thankfully, there are a range of tools designed to make image and video data annotation an efficient process. In this article, we will examine what is image annotation, who is using it, how to use it, and how CVAT might be a good choice for your organization’s computer vision or machine learning projects.What is Image Annotation?Image annotation (or image data labeling) is the process of adding labels and tags to image datasets for training computer vision models. Doing so provides context for the machine learning model to understand, and make predictions.With image data annotation software, the annotations generally come in the form of a shape such as a bounding box, polygon, or segmentation mask, along with a textual tag, or label. The geometric shape helps to visually and spatially define the object of interest in the image, while the textual tags help the AI model to identify and classify the object(s) in the image. Example of an image with bounding box annotations.‍Because in computer vision, identification and classification are key processes that help machines interpret and understand visual data, and image annotation is required for achieving this goal.Image annotation can be a huge undertaking in terms of the amount of time and resources needed. With datasets ranging in size from a few thousand images, to several million images, it’s important to determine the best strategy for both acquiring datasets, and for annotating them. Such strategies can involve usage of public versus proprietary datasets, and the choice to use in-house annotation versus professional annotation services. We will examine these strategies in more depth later.‍Where and What Image Annotation Is Used For?Autonomous vehicles, medical imaging, facial recognition, and satellite imagery, all use computer vision and artificial intelligence to perform tasks such as detection, classification and segmentation. All of these industrial applications make use of labeled datasets, and image annotation plays a significant role in the transformation from a dataset (for example, a collection of drone imagery), to a labelled dataset, that provides context for the AI model to use.Data annotation is mostly used for training machine learning models. And by training, we mean that we are teaching a computer vision model to identify objects in various kind of images. This is analogous to how a child learns by pointing at things, and calling them out by name. In short, image annotation is providing ground truth labels for the computer vision model. Image annotation is also used for supervised learning, in which the model learns from annotated examples in the form of input-output pairs. 
In this case, the annotated data assists the model in understanding how an object should be identified.Finally, image annotation can also be used with performance evaluation to assess the accuracy of a model’s learning. The accuracy can be tested by comparing the model’s output to the annotated data (ground truth).As mentioned in the introduction, quality training data for these systems is largely dependent on quality image annotation. A proper image annotation process can help to improve model accuracy/data consistency, reduce time, costs, and biases, and lead to generally more efficient model training.‍How Are Companies Doing Image Annotation?Any company or organization looking to develop their own computer vision AI models will require high-quality image datasets. These datasets tend to be quite huge, so consideration must be made as to how the datasets are obtained in the first place.This begs the question, should an entity opt for proprietary or open datasets? Let’s examine both cases in more detail.Creating Your Own Proprietary Image DatasetsThe benefits of creating a proprietary dataset include having complete control over the subject matter in the images, as well as the quality of the image. Many open datasets lack the selectivity of a particular subject matter, meaning that they might not be well suited for specialized models.Imagine that you wanted to train an AI to detect a particular type of defect on a manufacturing process developed in-house. By definition of the process’ proprietary nature, it would be close to impossible to find an open dataset with the images needed for training. The datasets, in this case, would necessarily need to be produced in house.A Google Street view car, capturing real world images as it drives around. Source: Wikimedia Commons‍The downside of creating such a dataset, is that the task requires vast resources in terms of image data collection (taking photos), data cleaning and preprocessing, annotation and labelling, quality control, storage and management. It is a time-consuming, and costly exercise.Some examples of companies making use of their own proprietary datasets include Tesla and Google. Tesla collects footage from their own vehicle sensors and makes use of this image data to train its self-driving AI feature (also known as FSD). Similarly, Google uses images gleaned from their own image assets for training Google Lens and Street View AI.Using Open DatasetsOne alternative (often used by smaller companies) is to use open image datasets, which reduce costs and speed up development of models. Many such datasets tend to be created by universities and government institutes, and are often freely available for research use and non-commercial use. Some are available for commercial use, but they are dependent on license conditions, and may require some fee to be paid.The downside of using open datasets is that they may lack specialization for specific tasks, as per the manufacturing example in the section on proprietary datasets.Panoptic segmentation datasets from COCO. Source: https://cocodataset.org.Think of proprietary and open datasets as bespoke tailored clothing versus clothing bought “ready to wear” from a shopping mall. With the tailored garment, you get to choose the material, the fit, the exact color, and any other features that you desire, but the customization comes at a premium price. 
With ready to wear clothing, you are restricted to whatever is on the shelf in terms of size, style and color, but you save a lot of money compared to the bespoke option. The table below shows several open datasets that can all be imported into CVAT image annotation tool for data labeling.{{image-annotation-table-1="/blog-banners"}}To summarize, your organization’s choice of proprietary or open datasets will depend largely on the resources available and the level of specialization needed for your training data.Proprietary datasets allow a high level of specialization, at the cost of time and financial resources, while open datasets allow faster development at a more cost-effective price point, but may take a penalty when it comes to specialization.Some companies might benefit from a hybrid approach, where open data is used for initial training, and then switch to real data to fine tune their own models.Each method has its own merits and trade-offs, and should be well considered before embarking on the task of developing an AI model.‍What Are Different Types of Image Annotation Tasks?So, you’ve decided exactly where your dataset is coming from, and are ready to begin the process of adding context to the data for training your model. This is where the image annotation phase (and image annotation software like CVAT) comes into play.At its core, the process of image annotation involves highlighting the item of interest in an image or video data, and adding context via text-based notes to the item in question. The type of annotation would depend on the intended use of the data.For example, if you wanted to train a model to recognize the presence of a cat in an image (image classification), you would upload image data consisting of cats in various scenes. You could then instruct your in-house or third-party image annotation services team to sort through the images, and add a text description indicating if a cat is present in the image, or not.More advanced tasks (such as “detection”) would require a bounding box to be drawn around the cat in the image, with various other descriptions (such as color or breed) added as tags.CVAT provides a variety of different image annotation tools for all of these tasks. Such tools include cuboid annotation (for objects with depth or volume), attribute annotation, tag annotation, and a plethora of different shape annotation tools for 2D objects.Image Annotation TasksThe image annotation process requires applying various labels to images (or video) in order to add structure and context to datasets.Image annotation tasks can generally be divided into three different categories, which are classification, detection, and segmentation. CVAT has a number of drawing tools which are aimed at each category. Before we delve into the drawing tools, let’s take a look at the three categories in more detail. Image ClassificationImage classification is the most basic of image annotation categories. It involves applying a label (or labels) to a singular image and simply helps the AI model to identify if such an object is present. With the image classification method, the object location is not specified, only its presence in the image. The label will then aid the computer vision model in identifying similar objects throughout the whole dataset. As an example, your team might be training an AI to recognize images of cats. With the image classification method, each image could be labelled as “cat” (if present) or “no cat” (if not present). 
Additional tags could be added to classify each image by breed, or by color. But the classification model would not be able to identify where exactly the cat is located within the image. With image classification, it is not necessary to use the shape-drawing annotation tools, as the labels/tags are applied to the entire image. To indicate where the object is located in the image, you need to use a detection model.
Object Detection
Detection expands upon classification by adding a localization element. Detection not only identifies the presence of an object in an image, but adds spatial information, which helps to identify the object’s location in the image. Such tasks require the use of drawing tools (such as a rectangle/bounding box, polygon, or ellipse) to be added during annotation to highlight where the object of interest is positioned. These drawing tools help the AI model understand both the object’s presence and its position in the image.
Additionally, if there are multiple objects in the image, the detection model can specify how many there are, and more advanced models can even assign a confidence score, which indicates the likelihood that the identification is correct. Finally, more advanced models can also detect interactions and relationships between multiple objects.
So, going back to our cat example, a detection model could identify that there are two cats, classify them according to breed (with a confidence score), and then infer one cat’s position in relation to the other.
Image segmentation
Segmentation annotation is the most advanced of the three categories, and divides an image into discrete areas, providing pixel-level accuracy. There are three main subcategories of segmentation in computer vision: semantic, instance, and panoptic segmentation.
Semantic Segmentation
With semantic segmentation, each pixel of the object of interest is assigned a class label. Whereas the detection model will use a bounding box to assign a general class and location within the image, defining the object at a pixel level allows the model to detect the shape with more precision.
For example, an image might have a cat drinking out of a bowl while another cat sleeps nearby. During the annotation process, the annotator could use a brush tool to paint the pixels of both of the cats, or use a polygon tool. All the pixels within the masks would be classed as “cat”. Similarly, the bowl could also be annotated, with all the bowl pixels labeled as “bowl”.
With semantic segmentation, the model does not distinguish between multiple objects of the same class. Both cats, despite being in their own discrete regions, would simply be classified as “cat”. To distinguish multiple instances of the same class, you need to use instance segmentation.
Instance Segmentation
An instance segmentation model also uses masks to assign pixel-level classification to objects, but unlike semantic segmentation, it can identify different instances of the same class. For example, it could distinguish between two cats in an image, each with a different label. During the annotation process, the annotator would create a mask around each cat, showing the exact shape and boundaries of each cat. Unlike detection, which only provides a bounding box, instance segmentation gives a detailed pixel-by-pixel representation of each cat.
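One way to see the difference between semantic and instance masks is in the arrays themselves: a semantic mask stores a class ID per pixel, while an instance mask also separates individual objects. A toy sketch with NumPy (the class and instance IDs below are arbitrary):

import numpy as np

CLASS_IDS = {"background": 0, "cat": 1, "bowl": 2}

h, w = 4, 6
semantic = np.zeros((h, w), dtype=np.uint8)    # one class ID per pixel
instances = np.zeros((h, w), dtype=np.uint16)  # one instance ID per pixel (0 = none)

# Two cats: the same class in the semantic mask, different IDs in the instance mask.
semantic[1:3, 0:2] = CLASS_IDS["cat"]
instances[1:3, 0:2] = 1
semantic[1:3, 4:6] = CLASS_IDS["cat"]
instances[1:3, 4:6] = 2

print(np.unique(semantic))   # [0 1]   -> "what is here?"
print(np.unique(instances))  # [0 1 2] -> "which one is it?"

A panoptic mask, described next, effectively carries both pieces of information at once.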
Panoptic SegmentationThe final category is panoptic segmentation, which combines the benefits of both instance segmentation and semantic segmentation to create a more complete understanding of an image. In this approach, annotators categorize both background elements (such as a wall, or carpet) as well as countable objects such as people, cars, or cats.In this case, if there was an image of three cats lounging on a patterned rug, using the panoptic segmentation method, we would treat the rug as a single background element, applying one uniform label to it. Each cat would be identified individually, with separate segmentation masks, distinguishing them even if they are curled up together or partially overlapping (occluded). This method gives AI a more complete understanding of a scene, allowing it to recognize both the setting and the objects within it.‍Annotation Category SummaryBefore we take a more in-depth look at the various drawing tools, let’s just summarize the annotation categories in terms of their function, along with some non-feline related applications.{{image-annotation-table-2="/blog-banners"}}‍Types of Image Annotation TechniquesAn effective image annotation software should be capable of annotating objects that are both static (shape mode) and objects in motion, across multiple frames (frame mode). And it should be able to use any kind of common image file, such as JPEG, PNG, BMP, GIF, PPM and TIFF.To make shape annotation tasks a cinch, CVAT allows users to annotate with rectangles, polygons, polylines, ellipses, cuboids, skeletons, and with a brush tool.While the various shape annotation tools can be used interchangeably in many situations, each tool works optimally for specific types of task.RectanglesAnnotating with rectangles is one of the easiest methods of image annotation. Also known as a “bounding box”, this shape is best suited for the detection of uncomplicated objects such as doors on a building, street furniture, packing boxes, animals, and faces. They can even be used for notation of people both static and in motion. This is particularly useful for surveillance or tracking projects, although if pose estimation is required, more detailed annotations such as skeleton or polygons could be a better option.‍When multiple objects obstruct each other, the "Occluded" label can be used.Overall, notation with rectangles is an easy and computationally efficient method well-suited for quick object detection of a broad range of subjects. If you want a quick way to identify the general presence and location of the object, then notation with rectangles is a great place to start.Image annotation with rectangles is incredibly straightforward. In CVAT, simply select the rectangle icon in the controls sidebar, choose a label, and select a Drawing Method (2-point or 4-point). Click Shape (or Frame, if annotating video) to enter drawing mode.2-Point Method: Click two opposite corners (top-left and bottom-right) to create a rectangle.4-Point Method: Click the top, bottom, left, and right-most points of the object for a more precise fit. The rectangle is completed automatically after the fourth click.Users can adjust the boundaries of the resulting rectangle using the mouse, and rotate it to best fit the object of interest. 
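To illustrate the 4-point drawing method described above, the sketch below turns four extreme clicks (the top-most, bottom-most, left-most, and right-most points of an object) into an ordinary axis-aligned bounding box. This is just the underlying geometric idea, not CVAT's internal implementation.

def box_from_extreme_points(points):
    """Turn four extreme clicks [(x, y), ...] into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

# Clicks on the top, bottom, left, and right edges of a pedestrian, in pixels.
clicks = [(412, 90), (405, 310), (380, 200), (440, 215)]
print(box_from_extreme_points(clicks))  # (380, 90, 440, 310)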
Polygons Offering a higher level of precision than rectangles, annotating with polygons is better suited for objects with irregular shapes requiring a more accurate boundary delineation.Drawing a polygon allows a much higher level of detail, as it can closely follow the curves and shape of an object, making it well suited for tasks that require pixel-level analysis. Polygons can also be used for creating masks for semantic segmentation, instance segmentation and panoptic segmentation.Polygon annotation can be used for the detection of objects such as geographical features on satellite images, tumors in medical imagery, types of plants in plant identification, and pretty much anything where an object’s shape is too complex to be captured by a rectangle.If a rectangular notation is best suited for broad object detection tasks, polygons are more optimally used for tasks such as image localization, segmentation, or detailed recognition. To put it another way, while a rectangle annotation is fine for detecting faces, polygons are better for detecting facial features such as mouths, eyes, and noses.Like the rectangle annotation function, drawing polygons is uncomplicated. To draw a polygon in CVAT, locate the polygon option in the controls sidebar, and choose a label. Click Shape to enter drawing mode, then the polygon can be drawn with either of two methods. With the first method, the user can simply use the mouse to draw dots around the outline of the object. With the second method, the user can hold the shift key down, and trace the object with the mouse as a continuous contour. Dots will appear around the object automatically. You can see an example of this in the graphic below.‍Manual drawing of a polygon annotation‍Once the polygon is completed (with either method), the user can adjust the polygon by clicking on the dots, and dragging them until they are happy with the result.‍EllipsesAnnotating with ellipses is the method most useful for the detection of round objects, either elliptical, circular or spherical. If you want to quickly annotate objects such as wheels, various fruits, or even the eyes on a face, then ellipses are the perfect shape for the task.Applications where you might wish to use elliptical annotations include cell detection in medical imaging, pupils in eye tracking, astronomical objects, circular craters in geospatial mapping, or egg monitoring in a hatchery.‍‍In CVAT, ellipses are created in much the same way as rectangles. Simply specify two opposite points, and the ellipse will be inscribed in an imaginary rectangle. And like the rectangular notations, ellipses can also be rotated about a point. You can see how easy it is to annotate with ellipses in the video above.‍PolylinesThe previously mentioned notation types have focused on objects with enclosed regions. Polylines also allow for the notation of elongated, thin enclosed shapes, but also permit the notation of non-enclosed linear, continuous objects.To that end, polyline notation is the most optimal choice for objects with long boundaries and contours that do not need to be fully enclosed, such as railways lines or roads.It is also extremely handy when it comes to tasks requiring path-based analysis, such as object tracking, and for connecting key points in pose estimation tasks. 
Specific examples of applications using polylines include footpaths and rail lines in aerial mapping, general linear infrastructure inspection, text lines and paragraphs in OCR, animal and human skeletons in pose estimation, and moving objects in video sequences.‍A polyline on a continuous road marking‍To sum it up, polylines are at their most useful when the goal is to track, detect, or measure linear features.Drawing polylines in CVAT is similar to drawing a polygon. Simply select the polyline tool from the control panel, select the shape (or track), and set the number of points needed for the polyline. The drawing will complete when the specified number of points has been reached. Also, like the polygon tool, there are two ways in which a polyline can be drawn - it can be drawn with dots, or it can be traced along manually by holding down the shift key.‍Brush ToolThe brush tool is a free-form tool that allows the manual painting of objects, and the creation of masks. Masking is particularly useful for annotating singular objects that may appear split in two, such as a vehicle with a human standing in front of it. You can see an example of this in the graphic below.Example of image annotation with a brush tool‍The brush tool in CVAT features various modes such as brush shape selection, erase pixels, and polygon-to-mask. Polygon-to-mask mode enables quick conversion of polygon selections into masks. Annotations can be saved and modified via the Brush Tool menu, enhancing efficiency in detailed image segmentation tasks.Annotating with the brush tool is ideal for applications that require a high level of precision, such as medical imaging, object detection, or autonomous driving.‍SkeletonsAnnotating with skeletons is the best option when dealing with tasks requiring the analysis of complex and consistent structures, such as human figures. It’s also a little more involved than the other annotation processes we have looked at in this article, which is why we have saved it until the end!‍Example of a skeleton annotation‍A Skeleton consists of multiple points (also referred to as elements), which may be connected by edges. Each point functions as an individual object, with its own unique attributes and properties such as color, occlusion, and visibility.Skeleton annotations can be used for both static and moving images, although they are used in different ways for each type. When using skeleton notation with static images, they are best used when analyzing a single pose, whereas in video, they can be used for more dynamic applications (such as tracking movement over time).Other specific applications of skeleton-based annotations include gait analysis, workplace ergonomics assessments, gesture recognition for sign language, crime scene analysis, and avatar posture recognition in AR/VR environments.Out of the various other methods we have looked at for notating static images in this article, notating with skeletons is generally the most complex. However, the whole process of annotating with skeletons is made much more user-friendly with CVAT.If you wish to annotate with skeletons with CVAT, then the process is summarized as follows:There are two main methods of annotating with skeletons. The first is to do it manually, and the second is to load a skeleton from a model. 
The Skeleton Configurator allows users to add, move, and connect points, upload/download skeletons in .SVG format, and configure labels, attributes, and colors for each point.To use the Skeleton Configurator, set up a Skeleton task in the configurator, and click Setup Skeleton to enable manual creation. To create the skeleton, simply add points and edges in the drawing area, configure the required attributes, upload the files, and submit the task. ‍AI-Assisted Image Data AnnotationAs seen in the previous section, annotating with various shapes is a straightforward experience. But these tasks can be made easier still, thanks to various automation features.AI-assisted image annotation makes use of pre-trained ML models for the detection, classification, and segmentation of objects within image datasets. CVAT can use pre-installed models, and can also integrate with Hugging Face and Roboflow for cloud-hosted instances. For organizations using a self-hosted setup, custom models can be used with Nuclio. AI models in CVAT, such as YOLOv3, YOLOv7, RetinaNet, and OpenVINO-based models, provide accurate object detection, facial recognition, and text detection. CVAT’s automated shape annotations and labeling features can significantly accelerate the complex image annotation process, potentially improving speed by up to 10 times. These features leverage various machine learning algorithms for tasks like object detection, semantic segmentation, and tracking. Automatic Labeling using pre-trained deep learning models (e.g., OpenVINO, TensorFlow, PyTorch).Semi-Automatic Annotations (e.g., interactive segmentation).Automatic Mask Generation: AI models can generate segmentation masks for complex objects.Smart Polygon Tool: Automatically refines polygon shapes around detected objects.Pre-Trained Object Detectors: Detects and labels objects using AI models like YOLO, Mask R-CNN, or Faster R-CNN.We will do a deep dive into the automation side of image annotation in another post - we just thought we would draw your attention to its existence, just in case you wanted to know how AI itself can be used to make the model training process even more efficient.‍Easy Annotation & Labeling of Images with CVAT Annotation SoftwareAs you have seen in this article, there are numerous techniques in image annotation specific for a range of different computer vision projects and use cases. The good news is that CVAT offers all the aforementioned tools in a handy and easily accessible solution.So whether your team is training a computer vision model, engaging in supervised learning, or conducting a performance evaluation, then the CVAT platform can help with all of your data annotation needs.CVAT takes away the headaches of creating annotated datasets with its innovative and user-friendly approach to annotation and task allocation. With its image annotation tool , your organization can upload datasets of visual assets, break the sets down into smaller chunks, and distribute them to team members anywhere on Earth. Once the team members receive their tasks, they are able to use the intuitive image annotation engine to quickly add context to both image and video datasets.CVAT also integrates seamlessly with HUMAN Protocol’s innovative task distribution and compensation system, creating a seamless, efficient workflow for crowdsourcing annotations. 
And if you don't have enough resources to do annotation in-house, CVAT's professional annotation services team is available to provide high-quality, expertly labeled datasets, ensuring your machine learning models receive the precise training data they need.

So, to summarize: CVAT's image annotation platform can be used for any visual object, whether it's flat or three-dimensional, static or dynamic, and the drawing tools are fundamentally the same in every scenario. Naturally, there are more advanced features for power users; if you would like to know more about those, you can learn more at this link. And if you haven't yet got to grips with the basics of image annotation and would like to get started with the features in this article, you can try out the free SaaS version of CVAT right here. For those wanting to try the on-premise community version, you can find that over on GitHub.
Annotation 101
February 20, 2025

Introduction to Image Annotation for Computer Vision and AI Model Training

Blog
Whether you're developing precision agriculture systems to detect crop diseases, creating AI-powered tools for early lung cancer detection from CT scans, or building theft detection systems for convenience stores, the success of your AI project hinges on one crucial element: high-quality annotated data. Even the most sophisticated AI models are only as good as the data they're trained on.

“OK, but how do I make sure the data we get from our in-house annotation team or data labeling agency is actually good?”, you ask. And we answer: data labeling specifications.

What are data labeling specifications? And, why does your project need them?
Data labeling specifications (or annotation specifications) are documentation that provides clear instructions and guidelines for annotators on how to annotate or label data. Depending on the project, these guidelines may include class definitions, detailed descriptions of labeling rules, examples of edge cases, and visual references such as annotated images or diagrams.

Labeling specifications serve several critical purposes:

Ensure all annotators follow the same standards
Maintain consistency across large datasets
Enable quality control
Help achieve the required accuracy for model training
Serve as a reference document for both the client and annotation team

The lack of well-thought-out specifications leads to all sorts of issues for all stakeholders involved—clients, labeling service providers, annotation teams, and ultimately, the end users of the data:

#1 Inconsistent annotation results
Poor specifications result in inconsistent annotation outcomes, as annotators are left to make assumptions and interpret tasks as they see fit. For example, if the guidelines don't specify how to handle occluded objects (e.g., a pedestrian behind a car), one annotator might use a bounding box while another uses a polygon. These inconsistencies can make the dataset unusable for model training, and often require its complete re-annotation.

Source: https://cocodataset.org

#2 Wasted time and money
Inconsistent annotation results inevitably trigger a costly cycle of revisions and rework, with each iteration requiring additional time from annotators, reviewers, and project managers. The result? Blown budgets and missed deadlines that could have been avoided with clear specifications from the start.

#3 Frustrated annotation team
Nothing kills team morale faster than having to redo work that's already been done. When annotators spend hours labeling data only to learn that the requirements weren't clear or complete, it's more than just frustrating—it's demoralizing. Productivity drops, attention to detail suffers, and the entire project enters a downward spiral.

#4 Project management overhead
Unclear specifications turn project managers into full-time firefighters. Instead of focusing on strategic tasks, they're stuck in an endless cycle of retraining annotators, clarifying instructions, and double-checking work. Every vague requirement creates a ripple effect of questions, corrections, and additional reviews. This translates into more management hours, higher costs, and project managers who can't focus on what really matters—delivering quality results on time.

So, what makes a good specification?
A well-crafted specification is like a detailed roadmap—it guides annotators to their destination without leaving room for wrong turns. Based on our experience working with hundreds of clients, here's what separates great specifications from the rest:

Project Context.
Don't just tell annotators what to do—help them understand why they're doing it. Whether your AI will be scanning crops for disease or monitoring store security, this context helps annotators make better decisions when they encounter tricky cases.

Comprehensive Class Definitions. Think of this as your annotation dictionary. Every object class should be clearly defined, along with its key characteristics. For instance, what exactly counts as a "ripe tomato" in your agricultural dataset? What specific visual indicators should annotators look for?

Clear Annotation Rules. Spell out exactly how you want things labeled. Should that partially visible car be marked with a bounding box or a polygon? How precise should segmentation masks be? Leave no room for guesswork.

Edge Case Playbook. Every dataset has its tricky cases. Maybe it's a car hidden behind a tree or a disease symptom that's barely visible. Document these scenarios and provide clear instructions on how to handle them consistently.

Red Flags and Common Pitfalls. Show annotators what not to do. By highlighting common mistakes upfront, you can prevent errors before they happen and save countless hours of revision time.

Visual Examples (That Actually Help). A picture is worth a thousand words, and this is true for labeling specs too. Include plenty of annotated examples showing both perfect and poor annotations. These real-world references are often more valuable than written descriptions alone.

When you nail your specifications, the benefits cascade throughout your entire project:

Every annotator follows the same playbook, delivering uniform results that your AI models can actually learn from. No more dealing with a mishmash of annotation styles that confuse your training process.
Clear instructions mean fewer mistakes and less back-and-forth. Your team can work confidently and efficiently, keeping your project timeline on track.
Every round of corrections burns through your budget. With crystal-clear specifications, you slash the need for revisions and keep costs under control.

Plus, modern annotation platforms like CVAT come with built-in specification support, making it even easier for your team to stay on track.

Now, let's put it to the test and see how good vs. bad labeling specs play out with a real-world dataset.

"Good vs. Bad" Labeling Specifications: A Head-to-Head Test

Source: https://cocodataset.org

The setup
An image of a parking lot with different cars, road signs, people, trees, and fences.
Two annotators.
Two different specs.

The specs
The first annotator was given very basic instructions:

Annotate the road, signs, people, and vehicles using masks. Transportation must additionally be annotated with boxes.

That's it. No quality guidelines, no examples, no nothing.

The second annotator was a bit luckier and received a few more details:

Annotate only the driveway and exclude the sidewalk from the annotation.
Annotate signs together with their posts.
Use only a mask, not a bounding box, for vehicles with less than 50% visibility.

The results
The results speak for themselves. Without extra clarification, the first annotation is less accurate, missing some attributes such as signposts and incorrectly labeling the sidewalk as part of the street.
The second annotation is 100% accurate.

A super-simple example, but when applied to a real use case, leaving out extra details can lead to thousands of inconsistent annotations, missed deadlines, unhappy annotators, and, worst of all, AI models that fail to perform reliably in production.

Build better AI with better specifications
Creating thorough labeling specs takes time and effort, but it's an investment that pays off many times over through consistent results, faster delivery, and significant cost savings.

To help you get started, we've created a comprehensive data labeling specification template based on our experience with hundreds of successful annotation projects. It covers all the essential elements we discussed and includes practical examples you can adapt for your specific needs.

Free Data Labeling Specs Template
Download our free template and set your AI project up for success from day one.
Tutorials & How-Tos
February 5, 2025

How to Create Data Labeling Specifications for Your Annotation Project: A Client's Guide (+ Free Template)

Blog
Manual data labeling can be a real slog, especially when you're working with massive datasets. That's why automated annotation is such a lifesaver—it speeds up the process, ensures consistency, and frees you up to focus on building smarter machine learning models. CVAT OGs know that both our platforms (SaaS and on-premises) support a number of options for automated annotation using AI models, including:

the Nuclio platform,
Roboflow and Hugging Face integrations, and
CLI-based annotation on your own hardware.

These methods are used and loved by thousands of users, but because data annotation projects come in all shapes and sizes, they may not work for everyone. Nuclio functions, for example, are currently managed by the CVAT administrator and are limited to CVAT On-prem installations. Roboflow and Hugging Face support a limited range of model architectures. CLI-based annotation requires users to set up and run models only on their own machines, which can be hardware-intensive and time-consuming for some teams.

Today, we're excited to share that CVAT is addressing all these limitations with the launch of AI agents.

What is a CVAT AI agent?
An AI agent in CVAT is a process (or service) that runs on your hardware or infrastructure and acts as a bridge between the CVAT platform and your AI model. Its main role is to receive auto-annotation requests from CVAT, transfer data (e.g., images) to your model for processing, retrieve the resulting annotations (e.g., object coordinates, masks, polygons), and send these results back to CVAT for automatic inclusion in your task. In other words, CVAT AI agents connect your custom model to the CVAT platform, enabling seamless integration of your model into the auto-annotation process.

How are CVAT AI agents different from other automation methods?

Customization and accuracy: Unlike the Roboflow and Hugging Face integrations, you can now use your own AI models, tailored specifically to your datasets and tasks, to produce precise annotations that meet your exact training requirements.
Collaboration and accessibility: Unlike CLI-based annotation, AI agents allow you to centralize your model setup and share it across your organization. Team members can access and use the models without any additional setup.
Flexibility across platforms: AI agents don't require CVAT administrator control and are available on both CVAT Online and On-prem (Enterprise, version 2.25 or later), giving you the freedom to deploy and manage your models in any environment.

These features make CVAT AI agents a powerful tool for scaling your annotation processes while maintaining accuracy, collaboration, and control.

How to annotate data with a CVAT AI agent
Now, let's see how to set up automated data annotation with a custom model using a CVAT AI agent. For that, you will need:

An account on a CVAT instance. In this tutorial we'll use CVAT Online, but you can use your own CVAT On-prem instance if you wish; just substitute your instance's URL in the commands.
A CVAT task with labels from the COCO dataset (or a subset of them) and some images.

You will also need to install Python and the CVAT CLI on your machine.

Refresher: CLI-based annotation
Let's first briefly review how CLI-based annotation works, since the agent-based method has a lot in common with it. First, you need a Python module that implements the auto-annotation function interface from the CVAT SDK. These modules serve as bridges between CVAT and whatever deep learning framework you might use.
For brevity, we will refer to such modules as native functions.

The CVAT SDK includes some predefined native functions (using models from torchvision), but for this article, we'll use a custom function that uses YOLO11 from Ultralytics. Here it is:

import PIL.Image

from ultralytics import YOLO

import cvat_sdk.auto_annotation as cvataa

_model = YOLO("yolo11n.pt")

spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec(name, id) for id, name in _model.names.items()],
)

def _yolo_to_cvat(results):
    for result in results:
        for box, label in zip(result.boxes.xyxy, result.boxes.cls):
            yield cvataa.rectangle(int(label.item()), [p.item() for p in box])

def detect(context, image):
    conf_threshold = 0.5 if context.conf_threshold is None else context.conf_threshold
    return list(_yolo_to_cvat(_model.predict(
        source=image, verbose=False, conf=conf_threshold)))

Save it to yolo11_func.py, and then run:

cvat-cli --server-host https://app.cvat.ai --auth "<user>:<password>" task auto-annotate <task id> --function-file yolo11_func.py --allow-unmatched-labels

This will make the CLI download the images from your task, run the model on them, and upload the resulting annotations back to the task.

Note: long-time readers might notice a few changes since the last time we talked about CLI-based annotation on this blog. In particular, we changed the command structure of the CVAT CLI, so you now have to use task auto-annotate rather than just auto-annotate. In addition, native functions can now support custom confidence thresholds, so our YOLO11 example reflects that.
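Before moving on, it is worth noting that this interface is not tied to one particular model or label set. As a quick illustration (our own sketch, not part of the official CVAT tutorial; the class subset and the 0.7 default threshold are arbitrary choices), the hypothetical variant below publishes only a handful of COCO classes to CVAT, reusing only the cvat_sdk.auto_annotation helpers and Ultralytics calls shown above:

# yolo11_subset_func.py - illustrative sketch only; not from the official tutorial
from ultralytics import YOLO

import cvat_sdk.auto_annotation as cvataa

_model = YOLO("yolo11n.pt")

# Hypothetical choice: expose only these COCO classes to CVAT.
_WANTED_NAMES = {"person", "car", "truck", "bicycle"}
_wanted_ids = {id for id, name in _model.names.items() if name in _WANTED_NAMES}

spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec(name, id)
        for id, name in _model.names.items()
        if id in _wanted_ids
    ],
)

def detect(context, image):
    # Stricter default than the tutorial's 0.5; users can still override it from CVAT.
    conf_threshold = 0.7 if context.conf_threshold is None else context.conf_threshold
    shapes = []
    for result in _model.predict(source=image, verbose=False, conf=conf_threshold):
        for box, label in zip(result.boxes.xyxy, result.boxes.cls):
            if int(label.item()) not in _wanted_ids:
                continue  # skip detections for classes we did not declare in the spec
            shapes.append(cvataa.rectangle(int(label.item()), [p.item() for p in box]))
    return shapes

Everything described next (registering the function, running an agent, cleanup) works the same way for a function like this.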
Registering the function with CVAT
Now, let's see how we can integrate the same model as an agent-based function. An important thing to know is that the agent-based functions feature also uses native functions. In other words, if you already have a native function you've used with the cvat-cli task auto-annotate command, you can use the same function as an agent-based function, and vice versa. So let's reuse the yolo11_func.py file we just created.

First, we must let CVAT know about our function. Use the following command:

cvat-cli --server-host https://app.cvat.ai --auth "<user>:<password>" function create-native "YOLO11" --function-file yolo11_func.py

The string "YOLO11" here is just a name that CVAT will use for display purposes; you can use any name of your choosing. Now, if you open CVAT and go to the Models tab, you will see our model there, looking something like this:

You can click on it and check that it has all the expected properties, such as label names. However, if you actually try to use this model for automatic annotation, it will not work. The request will stay "queued", and after a while, it will automatically be aborted. That's because we need to do one final step.

Note: At no point in the process does the function itself (such as the Python code or weights) get uploaded to CVAT. The only information the registration process transfers to CVAT is metadata about the function, such as the name and the list of labels.

Powering the function with an agent
We must now run an agent that will process requests for the function. Use the following command:

cvat-cli --server-host https://app.cvat.ai --auth "<user>:<password>" function run-agent 58 --function-file yolo11_func.py

Instead of 58, substitute the model ID you see in the CVAT UI. You can also find the same ID in the output of the function create-native command. This command starts an agent for our function, which runs indefinitely. The job of the agent is to process all incoming auto-annotation requests involving the function.

While the agent is running, open your task in CVAT and click Actions -> Automatic annotation. You'll be able to select the YOLO11 model and set the auto-annotation parameters, just like for any other type of model CVAT supports. Click "Annotate". After a short delay, you should see the agent start printing messages about processing the new request. Once it's done, CVAT should notify you that the annotation is complete. You can then examine the jobs of your task to see the new bounding boxes. The agent will keep running, ready to process more requests.

Cleanup
Now that we're done testing the function, we can remove it from CVAT. First, interrupt the agent by pressing Ctrl+C in the terminal. Second, delete the function by running the following command:

cvat-cli --server-host https://app.cvat.ai --auth "<user>:<password>" function delete 58

Alternatively, you can do this through the UI: find the model in the Models tab, click the ellipsis, and select Delete.

Working in an organization
In the preceding tutorial, you added the function to your personal workspace, so only you can annotate with it. Now let's discuss what's needed to share a function with an organization.

First, you'll need to add an --org parameter to all of your CLI commands:

cvat-cli --org <your organization slug> ...

Second, you should be aware of the permission policy when you work in an organization. A function can be…

… added by any organization supervisor;
… removed by its owner or any organization maintainer;
… used to auto-annotate a task by any user that has write access to that task.

These rules are the same as for Roboflow and Hugging Face functions. In addition, to power a function, an agent must run as that function's owner or as any organization maintainer. An agent must also be able to access data for the tasks it's requested to process, so if you want to make it possible to use the function on any task in the organization, you should run the agent as a user with the maintainer role.

Technical details
The following diagram shows the major components involved in agent-based functions. In the general case, the agent can run in completely separate infrastructure from the CVAT server. The only requirement is that it's able to connect to the CVAT server via the usual HTTPS port. The agent does not need to accept any incoming connections. Of course, if you run your own CVAT instance, you can run the agent in the same infrastructure, even on the same machine if you'd like.

While so far we've been talking about "the agent", you're not actually limited to running one agent per function. If you'd like to be able to annotate more than one task at a time, you can run multiple agents. All annotation requests coming from users are placed in a queue and distributed to agents on a first-come, first-served basis. If one agent crashes or hangs while processing a request, that request will eventually be reassigned to another agent.

What AI agents can't do (yet)
AI agents are still pretty new, so there are a few things they can't do just yet (but don't worry, we're on it and will roll out updates soon):

Annotate just one frame,
Work with skeletons,
Handle videos or 3D data tasks, or
Support shapes with attributes.

Get started with CVAT AI agents
CVAT AI agents are here to level up how teams automate data annotation. Now, you can use models trained just for your unique datasets or tasks, no matter if you're on CVAT Online or On-Prem.
This means: ‍✅ more precise annotations that are better aligned with your requirements, ✅ less manual fixing, and ✅ datasets that are ready to go for AI training or deployment. ‍And, with a centralized setup, your whole team can easily access the model, speeding up workflows and improving collaboration.‍Ready to take your automated annotation to the next level? Sign up or log in to your CVAT Online account or contact us to get CVAT with AI agents support on your server.‍
Product Updates
January 16, 2025

CVAT AI Agents Guide: A New Way to Automate Data Annotation using Your Own Models

Blog
IntroductionChoosing the right data annotation service is a key step in any AI or machine learning project. High-quality labeling services are essential for training algorithms and ensuring accurate predictions. CVAT (Computer Vision Annotation Tool) and Clarifai are two leading platforms offering various annotation services. These platforms cater to a wide range of users, from individual researchers to large companies.In this comparison, we’ll examine the strengths and weaknesses of both. We will focus on performance, scalability, and ease of use. We will also consider the target audience and suitability for specific industries. This will help you make the best choice for your project.‍Performance and capabilitiesCVAT is an open-source tool designed for teams that need more control and customization over their annotation workflows. It offers the following annotation types.Annotation types2D Image Annotations: Support for detailed annotations like bounding boxes, polylines, points, skeletons and polygons for more intricate data.Video Annotations: Capabilities for object tracking, recognition, and event detection in video-based tasks.3D Sensor Fusion: Provides support for annotations involving 3D sensor data, making it ideal for applications like autonomous driving, robotics, and LiDAR tasks.‍One of CVAT's key strengths is its ability to handle complex annotations, like instance and semantic segmentation with high precision. This makes it ideal for industries like healthcare, automotive, and surveillance, where detailed accuracy is very important.Clarifai is a comprehensive platform that focuses on automating data annotation processes to improve efficiency. Its main features include:‍2D Image Annotations: Efficient handling of large-scale image classification tasks using AI-driven automation, including bounding boxes and polygons.Text Classification: Support for natural language processing (NLP) initiatives, making it suitable for text-based projects.Video Annotations: Offers video object tracking to automate and simplify video analysis.Document Analysis: Named entity recognition (NER) for processing and analyzing large volumes of text efficiently.‍Clarifai is highly adaptable for different annotation tasks due to its AI tools. This makes it a good fit for industries like e-commerce, finance, and media. These industries handle a large amount of data, but the annotations are less complex.‍Ease of UseCVAT provides an easy-to-use platform that doesn't require technical expertise. Users can quickly sign up on the CVAT cloud platform and start labeling process right away. Data scientists and AI researchers value its powerful customization features. However, smaller teams or individuals without much technical knowledge can also use it effortlessly. The platform also supports complex project setups and allows for collaboration among multiple users, making it suitable for team-based projects.‍Clarifai is also designed for ease of use, requiring minimal setup. Its intuitive platform includes many automated features that help reduce manual effort. This makes it a great choice for project managers or companies looking to outsource data labeling without getting into the technical details. Teams can quickly start using the platform, even if they don’t have extensive technical knowledge in data annotation.‍‍Scalability and FlexibilityScalability is crucial for teams and organizations looking to expand their AI projects. CVAT excels in this area, primarily because it is open-source. 
This allows teams to enhance their annotation operations by improving infrastructure, adding custom plugins, or adjusting workflows to fit specific needs. Such flexibility is particularly beneficial for large organizations and AI research teams. These teams are involved in complex projects that require tailored workflows or intricate annotations. Examples include projects in the autonomous driving or aerospace sectors.‍‍On the other hand, Clarifai offers a simple approach to scalability. With its global workforce and AI automation, it excels in projects that require quick deployment. Companies in sectors like retail, healthcare, and marketing can easily scale their annotation needs. They can do this using Clarifai’s fully managed services. These services help reduce operational burdens. This is particularly advantageous for businesses looking for fast results without the need to establish a dedicated in-house annotation team.‍Industry-Specific SuitabilityClarifai and CVAT are versatile tools that can be applied across various industries, though they approach data annotation differently. Clarifai emphasizes automated data labeling, ideal for large datasets requiring speed and efficiency. Its AI-driven labeling is fast, yet it also supports manual annotation when needed for flexibility. On the other hand, CVAT focuses on manual labeling. This makes it better suited for tasks that demand high accuracy and human oversight. CVAT also offers automated and semi-automated annotation options. This allows CVAT to adapt to projects where repetitive or simpler tasks can be handled by AI. More complex tasks are left for human annotators.‍The decision between manual and automated annotation depends on the complexity of the data and specific project requirements. Automated annotation excels with large, straightforward datasets, whereas manual annotation is essential for more precise and intricate work. Both tools successfully cater to the unique data annotation requirements of various sectors, ensuring high-quality results across industries, including:‍HealthcareAnnotation helps analyze medical images like X-rays and MRIs. It is important for diagnosing tumors and other diseases.‍Surveillance and Security In this field, annotation is used for video tasks like event detection and facial recognition. It improves accuracy in important situations.‍Autonomous VehiclesAnnotation is key for object tracking and 3D sensor fusion. It trains models for lane detection, pedestrian tracking, and obstacle recognition.‍E-commerceAnnotation assists in classifying images and tagging products. This makes it easier to handle large data volumes and enhances user experience.‍Retail and MarketingIn these areas, annotation analyzes customer data. It helps businesses gain insights and make predictions.‍RoboticsAnnotation trains robots for tasks like object recognition and navigation. It creates reliable models for complex environments, such as automated warehouses and factories.‍Pricing ModelData Labeling ServicesA labeling service is a data annotation service used to train artificial intelligence models. Specialists manually mark objects in images or text so that the AI can learn to recognize and categorize them. This process is crucial for creating high-quality training datasets. These datasets allow AI to accurately perform tasks such as facial recognition, object detection, or text analysis. CVAT and Clarifai offer data labeling services. 
Below, we will review their data annotation offerings:‍CVAT· Discussion of Requirements: First, you contact the CVAT team or your contacts to discuss the details of your project. This helps them understand your specific needs and goals.· Proof of concept (POC) annotation: CVAT will request a data sample and an initial specification. This will allow CVAT to demonstrate its expertise. It will also help prepare an accurate project quote and estimate the time required to complete the project. This phase is completely free for a customer!· Team Formation: Depending on the scope and complexity of the project, CVAT may form a specialized team of annotators. This team will be responsible for carrying out the annotations according to your requirements.· Project and Task Creation: CVAT creates a project on their platform, including tasks for annotation. These tasks contain instructions and examples to guide the annotators on how to work with your data.· Data Preparation and Upload: You provide your data (images, videos, etc.), which are then uploaded into the system. CVAT supports various formats, making the upload process easier.· Annotation Process: The annotators begin working on annotating the data. CVAT offers powerful annotation tools, allowing the team to perform their tasks efficiently.· Quality Control: During and after the annotation, quality control is conducted. This may include reviewing the annotators' work and using automated tools to ensure accuracy.· Documentation: CVAT provides documentation for the project, including reports on completed work, quality metrics, and any important comments. This is useful for analysis and reporting.· Delivery of Annotated Data: Once the project is completed, you receive the annotated data in the agreed format, ready for use in your project.· Feedback and Support: The CVAT team remains in contact to gather your feedback on the process and provide support for any questions that may arise.‍Clarifai· Easy Execution: Users can effortlessly upload data in various formats to the Clarifai platform. The labeled data will be returned to the specified format for continued training, whether on Clarifai or another platform.· Expert and Flexible Workforce: The platform reduces the daily management burden of data labeling pipelines by allocating a specialized team based on expertise. A single team will manage the entire project to ensure consistency.· Quality Assurance Checkpoints: Clarifai conducts tests against data samples to ensure quality before finalizing the labeling of the complete training dataset. Users receive regular updates and transparency regarding quality metrics and turnaround times.· More Secure: The platform offers a secure environment for handling image, video, and document data. It adheres to strict security standards and data privacy principles. This allows users to select teams with background checks. The annotation takes place in secure facilities.· Flexible Pricing: Clarifai provides flat-rate pricing, making it easier to outsource data labeling needs and reduce operational overhead. Pricing scales with project growth.· Speed Time to Production: The team utilizes a state-of-the-art platform. This platform employs AI automation to expedite dataset annotation and project completion. It ensures high levels of accuracy.CVAT’s flexible pricing includes options like per-object, per-image, or hourly billing based on project demands. 
The only limitation for CVAT is that the project cost cannot be less than $5,000.‍Clarifai offers a more fixed project evaluation system, but there is also the option for a customized approach to the project.‍‍Suggestions for self-service on the platform.There are also plans available for independent work on the platform. Below is a comparison.CVAT‍Clarifai‍Additional Areas of ComparisonTo assist you in making an informed choice, here are five distinctions between CVAT and Clarifai:‍Integration with Existing Tools:CVAT's open-source architecture allows for seamless integration with third-party tools and custom pipelines. This makes it a suitable choice for teams with established AI ecosystems. This flexibility enables organizations to tailor their workflows to specific needs. While Clarifai also provides integration options, its emphasis on ready-to-use AI models may limit customization for teams with advanced technical skills.Project Management:CVAT offers robust project management features. These features allow team leaders to assign tasks, monitor progress, and collaborate in real time. This can be particularly beneficial for complex projects involving larger teams. Clarifai provides managed services for annotation and project management, which can streamline processes and support team coordination.Annotation Accuracy:CVAT is equipped with comprehensive annotation tools that are ideal for tasks demanding high precision, such as autonomous driving or medical imaging. Its capabilities allow for detailed data management. Clarifai utilizes AI-driven automation to enhance efficiency. This may be sufficient for many applications. However, it may face challenges with highly complex datasets.Turnaround Time:Clarifai's AI automation and distributed workforce are recognized for delivering faster turnaround times, making it suitable for projects that prioritize speed. Conversely, CVAT focuses on meticulous manual and semi-automated annotation. This ensures a high quality of results. This can be particularly important for complex datasets, even if it may take longer.Security and Data Privacy:CVAT's open-source nature allows for on-premise hosting. This grants organizations full control over data privacy. This is an essential feature for businesses handling sensitive information. Clarifai provides cloud-based solutions with strong security measures. This may appeal to companies that prioritize data security. However, it may not offer the same level of direct data control as CVAT.‍ConclusionCVAT and Clarifai are both powerful data annotation platforms, each serving different needs and applications. CVAT is well-suited for those requiring customizable, precise, and scalable solutions, particularly in sectors like robotics, autonomous driving, healthcare, and surveillance. Its open-source nature allows for easy installation and project management, especially for teams with the technical expertise to handle complex annotation tasks.‍On the other hand, Clarifai is designed for teams that value user-friendliness, automation, and rapid scalability. Its focus on AI features and managed services makes it a strong contender across various industries.‍Are you ready to make your choice? Explore both CVAT and Clarifai to determine which platform aligns best with your project's unique needs and objectives!‍
Industry Insights & Reviews
October 8, 2024

CVAT vs. Clarifai: Which Data Annotation Service Is Right for You?

Blog
CVAT, your go-to computer vision annotation tool, now supports the YOLOv8 dataset format.

Version 2.17.0 of CVAT is currently live. Among the many changes and bug fixes, CVAT also introduced support for YOLOv8 datasets for all open-source, SaaS, and Enterprise customers. Starting now, you can export annotated data in a format compatible with YOLOv8 models.

What is the YOLOv8 Dataset Format?
YOLOv8, developed by Ultralytics, is the latest version of the YOLO (You Only Look Once) object detection series of models. YOLOv8 is designed for:

Classification: Classifying or organizing an entire image into a set of predefined classes;
Object Detection: Detecting, locating, and identifying the class of objects in the image or visual data;
Pose Estimation: Identifying the location and orientation of a person or object within an image by recognizing specific keypoints (also referred to as interest points);
Oriented Bounding Boxes: Going a step further than object detection by introducing an extra angle to locate objects more accurately in an image;
Instance Segmentation: Pixel-accurate segmentation of objects or people in an image or visual data.

With the help of CVAT's data labeling and annotation tools, YOLOv8 models can be trained to perform these functions as accurately as possible.

What are the Benefits of Using the YOLOv8 Model for Computer Vision?
Ultralytics has used the knowledge and experience garnered from previous iterations of their AI models to create the latest and most advanced YOLOv8. The benefits of using YOLOv8 include, but are certainly not limited to:

Highly accurate object detection;
Versatility when it comes to detecting multiple objects, classifying and segmenting them, and detecting keypoints within images;
Efficiency, as YOLOv8 has been optimized for efficient hardware usage and doesn't require much computing power to run;
Open-source development, meaning YOLOv8 is always evolving, is built by a passionate community of developers, and all its features are easily accessible;
And a lot more that would require a much longer list than this.

Which Industries Can Benefit from Training YOLOv8 Models?
A trained YOLOv8 model can be used for a variety of tasks. The functionality that YOLOv8 computer vision models provide will benefit the following industries.

Computer vision and AI models trained to detect various automotive-related objects are the way of the future in the automotive industry. Self-driving vehicles and traffic management are just a few of the ways that YOLOv8 models will benefit this sector.

Automotive use case

The YOLOv8 object detection model can also offer significant functionality for security. Thanks to highly accurate object tracking and pose estimation, YOLOv8 models can detect intrusions and monitor for unregistered activities or prohibited objects within a given area.

Security use case

Using computer vision in retail and logistics will improve the efficiency with which stores maintain their supply and stock. They can also use YOLOv8's powerful object detection function to detect which shelves need to be restocked to improve customer experience.

Naturally, the robotics industry greatly benefits from AI models with accurate computer vision, as it helps significantly when it comes to problem-solving.
With each advancement in computer vision, problem-solving robots get more and more sophisticated as a result.

Robotics use case

In construction and architecture, computer vision can identify weak supports, foundational problems, and other structural errors. This can help construction crews detect potentially disastrous errors before any serious problems occur. On top of that, visual surveillance can be paired with AI to help construction managers detect safety hazards before they take place.

Safety hazards use case

There are a ton of functions for many other industries when it comes to Ultralytics' YOLOv8 model. For now, these are among the most popular use cases for this tech.

Understanding the Technical Details of the YOLOv8 Dataset Format
The YOLOv8 dataset format uses a text file for each image, where each line corresponds to one object in the image. For detection tasks, each line includes five values: class_id, center_x, center_y, width, and height. These coordinates are normalized to the image size, ensuring consistency across varying image dimensions.

For tasks like pose estimation, the YOLOv8 format also includes additional keypoint coordinates. Segmentation tasks require the use of polygons or masks, represented by a series of points that define the object boundary. Additionally, oriented bounding boxes can be rotated, which helps in annotating objects not aligned with the image axes.

Dataset Structure
The YOLOv8 dataset typically includes the following components:

<dataset directory>/
├── data.yaml          # configuration file
├── train.txt          # list of train subset image paths
│
├── images/
│   ├── train/         # directory with images for train subset
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   ├── image3.jpg
│   │   └── ...
├── labels/
│   ├── train/         # directory with annotations for train subset
│   │   ├── image1.txt
│   │   ├── image2.txt
│   │   ├── image3.txt
│   │   └── ...

Images Folder: This folder contains the images you are training the model on. These images are referenced by the corresponding annotation files.

Annotations: Each image has a corresponding .txt file with the same name located in the annotations folder. The file structure for the different tasks looks like this:

# <image_name>.txt:
# content depends on format

# YOLOv8 Detection:
# label_id - id from names field of data.yaml
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1

# YOLOv8 Oriented Bounding Boxes:
# xn, yn - relative coordinates of the n-th point
# label_id x1 y1 x2 y2 x3 y3 x4 y4
1 0.3 0.8 0.1 0.3 0.4 0.5 0.7 0.5
2 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6

# YOLOv8 Segmentation:
# xn, yn - relative coordinates of the n-th point
# label_id x1 y1 x2 y2 x3 y3 ...
1 0.3 0.8 0.1 0.3 0.4 0.5
2 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6 0.7 0.5

# YOLOv8 Pose:
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# xn, yn - relative coordinates of the n-th point
# vn - visibility of the n-th point: 2 - visible, 1 - partially visible, 0 - not visible
# if the second value in kpt_shape is 3:
# label_id cx cy rw rh x1 y1 v1 x2 y2 v2 x3 y3 v3 ...
1 0.3 0.8 0.1 0.3 0.3 0.8 2 0.1 0.3 2 0.4 0.5 2 0.0 0.0 0 0.0 0.0 0
2 0.3 0.8 0.1 0.3 0.7 0.2 2 0.3 0.1 1 0.4 0.5 0 0.5 0.6 2 0.7 0.5 2

# if the second value in kpt_shape is 2:
# label_id cx cy rw rh x1 y1 x2 y2 x3 y3 ...
1 0.3 0.8 0.1 0.3 0.3 0.8 0.1 0.3 0.4 0.5 0.0 0.0 0.0 0.0
2 0.3 0.8 0.1 0.3 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6 0.7 0.5

# Note that if there are several skeletons with different numbers of points,
# smaller skeletons are padded with points with coordinates 0.0 0.0 and visibility = 0

data.yaml: This configuration file defines the dataset structure for training. It includes paths to the images and annotation files and lists all class names. An example of a data.yaml file looks like this:

path: ./          # dataset root dir
train: train.txt  # train images (relative to 'path')

# YOLOv8 Pose specific field
# The first number is the number of points in a skeleton.
# If there are several skeletons with different numbers of points, it is the greatest number of points.
# The second number defines the format of point info in the annotation txt files.
kpt_shape: [17, 3]

# Classes
names:
  0: person
  1: bicycle
  2: car
  # ...

This lightweight and modular format allows for flexibility and scalability in your machine-learning pipeline. It also means that the format can cover a wide range of computer vision tasks, including object detection, pose estimation, segmentation, and oriented bounding boxes. For more technical details and in-depth usage, you can explore the full YOLOv8 format documentation.
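To make the detection flavor of the format concrete, here is a small, self-contained sketch (our own illustration, not CVAT or Ultralytics tooling; the file paths and image size are hypothetical) that reads one YOLOv8 detection label file and converts the normalized boxes back to pixel coordinates:

from pathlib import Path

def load_yolov8_detections(label_path, image_width, image_height):
    """Parse one YOLOv8 detection .txt file into pixel-space (class_id, x1, y1, x2, y2) boxes."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        class_id, cx, cy, rw, rh = line.split()
        # Normalized center/size -> absolute corner coordinates.
        w = float(rw) * image_width
        h = float(rh) * image_height
        x1 = float(cx) * image_width - w / 2
        y1 = float(cy) * image_height - h / 2
        boxes.append((int(class_id), x1, y1, x1 + w, y1 + h))
    return boxes

# Hypothetical usage:
# print(load_yolov8_detections("labels/train/image1.txt", 1920, 1080))

This is purely a sanity-check helper; for actual training, Ultralytics' own tooling consumes the data.yaml file directly, as described below.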
How to Use the YOLOv8 Dataset Format in CVAT

Exporting YOLOv8 Datasets
After completing annotations in CVAT, exporting them in the YOLOv8 format is straightforward. Here's how you can do it:

Export Your Dataset: Once your annotations are ready, CVAT allows you to export them in YOLOv8 format, ensuring they are perfectly structured for use in YOLOv8 models. This includes annotations for detection, pose, oriented bounding boxes, and segmentation tasks. For detailed instructions on exporting your dataset, you can refer to the Exporting Annotations Guide.

Train Your YOLOv8 Model: With your annotations exported, you can now directly integrate them into Ultralytics' YOLOv8 training pipeline. The dataset will be ready to train your model for detection, pose estimation, or segmentation tasks without the need for conversion. For further guidance on training your YOLOv8 models using Python, check out the Ultralytics YOLOv8 Python Usage Guide.

Importing YOLOv8 Datasets
In addition to exporting datasets, CVAT also supports importing datasets that are already in the YOLOv8 format. This feature allows you to bring external datasets and annotations into CVAT for further refinement or use in different projects. You can import both annotations and images for detection, oriented bounding boxes, segmentation, and pose estimation. To learn more about how to import YOLOv8 datasets and annotations, follow the detailed instructions in our Dataset Import Guide.

F.A.Q.

Which CVAT users have access to YOLOv8 support?
All CVAT users, including open-source, SaaS, and Enterprise, have access to annotation tools for the YOLOv8 computer vision model.

How good is YOLOv8 object detection?
A YOLOv8 computer vision model trained with data annotated through CVAT can be very accurate in identifying various objects in visual data. YOLOv8 models can identify object borders down to the pixel, making them incredibly powerful when it comes to object detection.

What functions do YOLOv8 models perform in computer vision?
As listed above, YOLOv8's functions include classification, object detection, pose estimation, oriented bounding boxes, and instance segmentation.

Start Using YOLOv8 in CVAT Today!
The additional support for YOLOv8 dataset formats is a major milestone for CVAT.
All open-source, SaaS customers and Enterprise clients are welcome to try out CVAT to help you train a YOLOv8 model for all manner of computer vision uses.‍For more information, visit our YOLOv8 format documentation. ‍Not a CVAT.ai user? Click through and sign up here.‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Product Updates
September 17, 2024

CVAT Adds YOLOv8 Format Support for Seamless Dataset Import and Export

Blog
‍We are excited to announce a new feature for our enterprise clients: we've added Security Assertion Markup Language Single Sign-On (SAML SSO) support into the CVAT platform. This addition underscores our commitment to providing secure and flexible solutions tailored to the needs of large organizations.‍What is SAML SSO and Why Does It Matter?‍SAML is a well-established and trusted SSO standard widely adopted by many companies due to its robust security features. It allows users to authenticate across multiple applications using a single set of credentials, significantly simplifying the login process while enhancing security. ‍CVAT.ai SSO Proposal‍Better User Experience: SAML SSO simplifies the login process for users by allowing them to access multiple applications with a single set of credentials. This reduces the time spent on managing various logins and enhances overall productivity.‍‍Improved Security: SAML is known for its rigorous security standards, making it the preferred choice for many large organizations. ‍We understand that every enterprise has unique requirements. That is why CVAT.ai supports both SAML and OIDC (OpenID Connect), another popular SSO protocol. Enterprises can choose the protocol that best fits their infrastructure and security policies.‍Get Started Today‍With the new SAML SSO integration, your enterprise can enjoy a more secure, streamlined, and flexible authentication process. Whether you already use CVAT or consider it part of your enterprise's workflow, this new feature ensures you have the best tools to manage security and access effectively.‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍
Product Updates
August 29, 2024

CVAT On-prem Enterprise Clients Can Now Benefit from Enhanced Security with SAML SSO Integration

Blog
In a significant update for computer vision enthusiasts and professionals, the powerful Segment Anything 2 model has been integrated into the Computer Vision Annotation Tool (CVAT.ai). This cutting-edge technology, developed by Meta, improves the image segmentation speed and accuracy and streamlines the annotation process. ‍So, what's new in the SAM 2?‍SAM 2 dramatically improves over earlier methods in image annotation without prior training on 17 different datasets. It also reduces the need for human involvement by about three times, making the process much more efficient.SAM 2 performs better than its predecessor, SAM, on a suite of 23 datasets without prior training and operates six times faster.Using SAM 2 feels like real-time processing, as it can handle about 44 frames per second.Using SAM 2 for video segmentation annotation in the loop is 8.4 times quicker than manual per-frame annotation with the original SAM.‍CVAT.ai Cloud: Segment Anything Model v2 Now Available for Image Segmentation‍CVAT has integrated "Segment Anything 2" into its SaaS version, improving the platform's capabilities for image segmentation.Integrating Meta AI's advanced machine learning models transforms CVAT into a more powerful tool for various users, ranging from academic researchers to industry professionals. This integration highlights a mutual commitment to advancing the field of computer vision. For now, in CVAT.ai, SAM 2 works only for images today, but video support will be added soon!We've Added Bounding Box Input‍The public version of CVAT.ai now supports optional bounding box input for Segment Anything 2. This feature allows users to define areas to annotate more quickly and accurately, enhancing the efficiency of model training processes for various applications.‍‍CVAT.ai Enterprise Edition: Added Segment Anything Model v2 CVAT has stepped up its game for Enterprise users by integrating Segment Anything 2 interactor support. This edition is tailored to meet the high demands of corporate environments where precision and scalability are critical. Enterprises can leverage this feature to handle complex segmentation tasks more effectively, ensuring higher accuracy and productivity in machine learning projects.‍‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍
Product Updates
August 15, 2024

Meta's SAM 2 is Now Available in CVAT Online for Image Segmentation

Blog
In the dynamic world of computer vision, staying current with technology advancements is not just beneficial—it's critical. This is particularly true for organizations that use self-hosted installations of the Computer Vision Annotation Tool (CVAT.ai). Regular updates to such a tool are essential for several reasons: security, improved functionality, compatibility, and operational efficiency. This article explores why regularly updating your self-hosted CVAT.ai solution is crucial for maintaining a competitive edge and operational reliability.

This article is divided into two parts: the first addresses 'why' regular updates are necessary, and the second explains 'how' to implement these updates effectively.

Why is it Necessary to Update CVAT.ai Regularly?

Improved Security: One of the most compelling reasons to regularly update your self-hosted CVAT is to enhance security. Although the latest version of CVAT.ai is secure, the threat landscape constantly evolves. New vulnerabilities are discovered daily, and the CVAT.ai team releases patches to mitigate these risks. By staying updated, you safeguard your system against vulnerabilities that malicious actors could otherwise exploit. Regular updates are crucial for maintaining the integrity of your data and ensuring the privacy of the information processed by CVAT.

Access to Latest Features: CVAT is continuously improved by a community of developers who add new functionality and enhancements. These updates can include everything from improved annotation algorithms, support for new formats, and enhanced user interfaces to integration capabilities with other tools and platforms.

Compatibility and Integration: As your IT environment evolves, new versions of dependent software and hardware are introduced. Regularly updating CVAT ensures compatibility with other software tools and infrastructure changes. For example, updates may be needed for CVAT.ai to operate smoothly with newer versions of browsers, operating systems, or integrations with third-party APIs and services. Maintaining an updated system prevents disruptions caused by compatibility issues, which can be costly and time-consuming to resolve after the fact.

Operational Reliability: Regular updates introduce new features and improvements, including optimizations that enhance CVAT's performance and stability. These optimizations can lead to faster load times, improved response times, and more efficient data processing, enhancing the system's overall reliability. For businesses relying heavily on computer vision technologies, operational reliability is non-negotiable.

How to Update CVAT?

Before we delve into the procedure, it's important to note that the steps described here apply only to standard CVAT.ai public images. If you have created a custom image, we assume you are technically proficient and can handle the necessary updates tailored to your image.

Step 1: Back Up Your Data

Before making any changes to your CVAT installation, it's essential to back up your data.
This ensures you can restore your system to its previous state if something goes wrong during the update. For more information, see the CVAT.ai Backup Guide.

Step 2: Stop the Old Version

You need to stop the currently running version of the application to avoid potential conflicts. Use Docker Compose to stop the running CVAT.ai containers.

Step 3: Pull Updates from the Repository

Once the system is halted, you can safely update the software by pulling the latest changes from the CVAT GitHub repository. You must download the entire source code, not just the Docker Compose configuration file. To see whether a new version has been released and to check the latest changes, use the CVAT.ai Changelog. You must also check and update any additional components at this stage.

Step 4: Handle Personal Customizations

If you have custom configurations, such as a database managed outside Docker, you must ensure these are compatible with the new version. Review your configurations and make the necessary adjustments so they work with the new version of CVAT. In some cases, you will need to build images locally; see this Guide for details.

Step 5: Run the New Version

After updating the software and adjusting your customizations, you can start the new version of CVAT. Use Docker commands to run the new CVAT containers; see the Upgrade Guide for details.

Step 6: Manual Updates Where Needed

Sometimes, you may need to update custom external components or manually handle migration scripts.

And that's it! You now have an updated CVAT.ai with all the necessary security improvements and features!

Looks Too Complicated?

Updating and managing CVAT can sometimes feel complex, especially when you're focused on annotating and training models for your work or research. If you'd prefer to leave the sysadmin and DevOps tasks to someone else, CVAT offers installation support and help managing Enterprise self-hosted solutions. Explore our enterprise proposals and plans to find the right level of support for your needs. Alternatively, consider using our online version—it's always up-to-date and secure, so you can focus solely on annotating without hassle.

Not a CVAT.ai user? Click through and sign up here.

Do not want to miss updates and news? Have any questions? Join our community:
Facebook
Discord
LinkedIn
Gitter
GitHub
August 8, 2024

Why is it Essential to Keep CVAT Updated?

Blog
Computer Vision Annotation Tool (CVAT) was started by Intel in 2017 and launched publicly on GitHub in the middle of 2018. In 2022, the platform became the core IP of the independent CVAT.ai Corporation, and we consider 2022 our founding year. With over seven years of experience behind the platform, CVAT.ai has embarked on a mission to transform the field of data annotation and image labeling. We are proud of our remarkable journey and the milestones we have achieved.

Our platform has become a cornerstone for data scientists, machine learning engineers, researchers, and students striving for excellence in artificial intelligence. An anniversary is more than just a date; it symbolizes our growth, achievements, and the vibrant community we have fostered. The following post will outline our achievements from last year and revisit the company's history!

Best Moments of The Year
There were some ups and downs, but we are here to celebrate the results of our efforts. No hard feelings—lessons were learned and will not be forgotten. Today, we focus on the good parts and celebrate the fruits of our hard work:

September 2023: We've reached 10,000 stars on GitHub, and we're still going strong—today, we have nearly 12,000 stars! We want to thank every one of you for your support. We also welcome stars as a gift, so if you'd like to cheer us up and help make our data labeling tool even more popular, please visit our GitHub and give us a star.

November 2023: CVAT.ai plays a crucial role in the crowdsourcing annotation of computer vision datasets; therefore, in collaboration with Human Protocol, we have successfully launched the crowdsourcing data annotation project in several iterations:
February 2023: We aired the first experiment in crowdsourcing annotation with CVAT and Human Protocol.
November 2023: We continued to push forward, and through a combined effort with our Human Protocol partners, we made the integration more user-friendly for annotators and for clients whose data needs to be annotated.
November 2023: With Human Protocol, we warmly welcomed speakers at the Newconomics 2023 Conference.
This initiative makes data annotation more affordable for AI companies needing annotated data. We are continuing to collaborate with Human Protocol to unlock and democratize AI.

February 2024: CVAT.ai joined Google Summer of Code 2023, and we are still actively working on the project, which we consider a success!

April 2024: We've introduced Annual Plans, helping our loyal and devoted users save up to 30% on data annotation tools. We have maintained transparent pricing, which significantly aids in budget planning!

May 2024: The CVAT.ai Labeling Service is stellar and thriving. We have several hundred annotators who work across various fields, consistently meeting deadlines and maintaining high-quality standards. Our client base includes large enterprises in retail and other sectors, featuring customers from the top 100 enterprises worldwide. Their satisfaction with our services brings us great joy.

May 2024: CVAT.ai was recognized as a top-choice data annotation tool at the Embedded Vision Summit 2024 (EVS 2024).

Looking Forward
As we celebrate this milestone, we are more committed than ever to pushing the boundaries of what CVAT.ai can achieve. We extend our heartfelt thanks to our users, contributors, and partners who have been part of this incredible journey.
Your support and collaboration have been instrumental in our success.‍Here's to more years of innovation, growth, and success with CVAT.ai!‍Stay connected with us, be curious, keep annotating!‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍
Company News
August 1, 2024

CVAT.ai Birthday is Here: See Our Achievements in the Field of Data Annotation and Image Labeling

Blog
In the first two parts of this article series, we discovered the cost of annotating images and videos yourself or with an in-house team. This part investigates the finances and resources you need to outsource the data annotation to the labeling service.‍‍However, let's first revisit our practical scenario: Imagine a leading robotics scientist developing a smart home assistant to distinguish between dirt and valuable objects in a home environment. Life's chaos often includes scattered toys, misplaced glasses, pet fur, and god knows what else. The proposed robot aims to clean efficiently and assist in locating misplaced items. Such functionality could benefit the elderly by helping them keep track of their possessions, for example. So, there is a niche for such products.‍As the project's lead, you are instrumental in guiding a compact research team that has gathered a dataset of 100,000 images, each depicting different room settings with items scattered across the floor. According to publicly available data, this dataset size is typical for robotics projects, ranging from thousands to millions of images. ‍With an average of 23 objects per image, the task involves annotating approximately 2.3 million objects. This series of articles explores various strategies for managing this large-scale annotation challenge, including do-it-yourself approaches, forming an in-house team, outsourcing, and utilizing crowdsourcing techniques.‍‍Welcome to the third part of our series, which explores the costs of outsourcing data annotation to cover the scientist's labeling needs.‍Case 1: You handle the task yourself or with minimal colleague help.Case 2: You hire annotators and annotate with your team.Case 3: You outsource the task to professionals.Case 4: Crowdsourcing.‍Case 3: You Outsource the Task to Professionals‍Let's start with a brief introduction and a statement that all data labeling companies operate similarly, with some variations that can significantly impact the quality of their labeling services. The devil is in the details. And CVAT.ai is not an exception. If the named scientist comes to us before jumping into the work, we will request some information from him and his team.‍Time and Stages‍To be precise, this is how the whole workflow will look, separated by stage and with time estimations. It might differ for different companies, so we are talking from our experience.‍We are not shy to state that our experience is vast and one of the best in the market, as we not only provide data labeling services but also own our data annotation platform. For clients, this means that we are flexible and can continuously adjust CVAT to make the annotation and validation process more efficient. Our clients can use the same platform internally and easily extend annotations. They also just log in to see how the data annotation process is going for them. ‍Without paying for anything., just try to annotate something, like millions of data scientists worldwide do. ‍But enough about us, let's see what annotation stages are there. ‍Stage 1: Annotation Proof of Concept (PoC)‍We will sign a Non-Disclosure agreement with the client to protect the data if necessary. We will request actual data samples (50-100 images or 1-2 videos) to start investigating it and see how it should be annotated.We will need the client's approved annotation specifications. At this stage, we will work together closely and ask questions to clarify corner cases and quality requirements. 
Following the efforts above, we will create a PoC and offer the precise project costs and durations.We will then send the client our proposal.‍We commit to initiating a PoC within one day of data reception and will provide detailed estimates and calculations within 3-5 days, depending on the project scope. Our initial project budget assessment is conducted with a high degree of accuracy. According to our experience, the final project cost typically deviates from the initial estimate by at most 10%.‍Stage 2: Documentation & Preparation‍Based on the conducted Proof of Concept (PoC), we will propose the most effective method for data annotation, refine and supplement the initial specification, and agree on the quality requirements and project annotation timelines.We will develop all the necessary documentation and sample agreements, including comprehensive information about our collaboration's terms and payment conditions. The client should only review the documentation and suggest any necessary revisions.Training the data annotation team is also entirely our responsibility. We will assign a dedicated manager who will be the direct and constant point of contact for resolving all operational issues and gathering all the necessary information about the project to build the training process for the annotation team.‍Document processing on our end will be completed within a week, barring any delays from the client. We immediately begin training and data annotation for expedited projects, bypassing bureaucratic delays.‍Stage 3: Annotation‍At this stage, we perform data annotation strictly following the instructions. However, we understand that requirements may change during the process, so we are always ready to be flexible and accommodate minor changes to the initial documentation.Since we understand that developing an AI model is a multi-step process, for large projects, we advocate delivering annotated data in batches without waiting for the entire dataset to be annotated. This approach allows our clients to conduct relevant experiments and adjust the process. The dedicated manager, responsible for the interim progress, will oversee the project from start to finish.We welcome regular feedback from the client and are ready to make additional revisions to the documentation as the project progresses to ensure the expected result.Typically, the most critical stage is annotating the first batch of data, during which all processes are fine-tuned, and the client's final requirements are understood. 
After successfully delivering the first batch of data, our team operates like a well-oiled machine, delivering high-quality results within the expected timelines.

Most projects reach completion within one month.

Stage 4: Validation

We guarantee high-quality results to our clients because, before committing to specific obligations, we conduct experiments that help us understand the results we can deliver and how to improve them.

We take full responsibility for quality checks and can offer the following services for better results:

Conduct manual and Cross Quality Assurance (QA), and automated QA against Ground Truth (GT) annotations covering 3-10% of the dataset.
Execute any final amendments at no additional cost and deliver a conclusive quality report.
Compute and report quality metrics such as Accuracy, Precision, Recall, the Dice coefficient, and others, and provide a confusion matrix.

Final validation and the conclusive report from our end will be completed within one week.

Stage 5: Acceptance

This is the final and best stage, where the client gets the final results. All that is left is to process payments and provide feedback regarding our labeling service.

Following our previous article, provided there are no client delays or unexpected events, the whole process for the described project will take approximately 50 work days, 10 weeks, or 2.3 months. Of course, it depends on each case's requirements and circumstances.

By entrusting us with your project, you commission a high-quality service with a pre-defined, documented, and guaranteed outcome. The client's role is limited to observing the process, accepting recommended changes from our side, reviewing the delivered data, and providing feedback on the results of the validated work. We take on all internal processes and guarantee the project's quality and timely delivery.

Data Labeling Price

Well, that's a tricky question, because the price heavily depends on the amount of data and the specific needs: the quality, the type of annotation, the deadlines, and more.

Let's use data publicly available online to estimate the cost of annotating 2,300,000 objects, or 100,000 images. However, here's the issue—labeling service providers often lack transparency, and there aren't many published prices. Thus, we can only rely on fragments of information from sources like KILI Technology or Mindkosh to make our estimates. The number will usually be above $300,000, because semantic segmentation, used for this task, is currently one of the most expensive annotation types.

But how much will it cost if the client comes to CVAT.ai? We use a flexible approach when this amount of data needs to be annotated. Our pricing is built on the following assumptions:

Estimation and Payment Models

Per Object: This primary model charges for each data unit annotated—whether a frame, object, or attribute within an image or video.
It suits projects with clearly defined unit sizes and quantities.Per Image/Video: Charges apply per image or video file processed, ideal for projects with consistent complexity or time demands per file.Per Hour: Costs are calculated based on the time annotators spend on the project, offering flexibility for projects with varying complexities or scope changes.‍Expected Project Budget Ranges‍$5K - $9.9K for Annotation Only, Manual, and Cross-Validation: This range is typical for projects focused on manual annotation, including thorough cross-validation for accuracy.Above $10K for Comprehensive Services: For budgets exceeding $10K, services extend beyond basic annotation to include AI engineer involvement, automated quality assurance, and potential custom AI solution development.The final cost of annotating 2,300,000 objects in CVAT.ai depends on the chosen approach. Using the "Per Object" method, the initial pricing begins at a set rate per unit. Due to the large volume, discounts ranging from 5% to 30% will be applied, reflecting our commitment to building long-term partnerships. By utilizing the highest discount tier, the total cost for annotating all objects will be approximately $225,400. This is an approximation, and the final price may vary based on the client's specific needs. Regardless of the exact cost, the results will be of the highest quality and delivered promptly.‍In general, you should expect the outsourcing price to be more than 1.5 times the cost of a potential in-house data annotation team. Hiring your data annotation team is one of the ways to achieve a better price while maintaining high quality. Read You hire annotators and annotate with your team for tips.‍Conclusion‍In summary, outsourcing your data annotation tasks to a professional service offers significant benefits in terms of time efficiency, quality assurance, and overall project management. While costs can vary based on project specifics, CVAT.ai provides a flexible pricing model that caters to different needs, ensuring high-quality results within a reasonable budget. With discounts available for larger volumes, we can offer competitive pricing without compromising quality.‍Next steps?‍Ready to label data with CVAT.ai? Email us: labeling@cvat.ai!Ensure you have all the necessary information—download our detailed takeaway now!‍‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍
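As a back-of-the-envelope check of the "Per Object" figures above, here is a small sketch of how a volume discount changes the total. The $0.14 base rate and the tier thresholds are illustrative assumptions, not our published price list; only the 5%-30% discount range and the 2,300,000-object volume come from this article.

```python
# Illustrative per-object pricing with a volume discount.
# ASSUMPTIONS: the $0.14 base rate and the tier thresholds are made up for the
# example; only the 5-30% discount range and the object count come from the post.

BASE_RATE_PER_OBJECT = 0.14   # USD, hypothetical starting rate
DISCOUNT_TIERS = [            # (minimum object count, discount)
    (2_000_000, 0.30),
    (1_000_000, 0.20),
    (250_000, 0.10),
    (50_000, 0.05),
    (0, 0.00),
]

def quote(num_objects: int) -> float:
    """Return an approximate project cost for a per-object engagement."""
    discount = next(d for threshold, d in DISCOUNT_TIERS if num_objects >= threshold)
    return num_objects * BASE_RATE_PER_OBJECT * (1 - discount)

if __name__ == "__main__":
    objects = 2_300_000
    print(f"Estimated cost for {objects:,} objects: ${quote(objects):,.0f}")
    # -> Estimated cost for 2,300,000 objects: $225,400
```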
Annotation Economics
July 25, 2024

How Much Does it Cost to Outsource Annotation to a Data Labeling Service?

Blog
In the first part of this series of articles, we emphasized the need for precise annotation of images and videos, essential for developing AI products capable of performing accurate analyses, making predictions, and delivering reliable outcomes. We focused on how time-consuming and money-consuming solo annotation might be.‍In this article, we will explore the costs and resources required to maintain an in-house team of data annotators.‍But before we jump into the topic, here's a reminder of our use case:‍A lead robotics scientist is creating a smart home assistant robot to differentiate between dirt and valuable items in a household setting. Life's chaos often includes scattered toys, misplaced glasses, pet fur, and god knows what else. The proposed robot would clean efficiently and help locate lost items, adding a layer of functionality beyond standard home cleaning devices. This can help elderly people keep track of their belongings, for example.‍As the lead scientist, you play a crucial role in this project. Along with your small research team, you've compiled a dataset of 100,000 images showing various room settings with items scattered on the floor. According to publicly available data, this dataset size is typical for robotics projects, which can range from thousands to millions of images.‍Each image features an average of 23 objects, so the task involves annotating approximately 2.3 million objects. This series presents various strategies to tackle this significant annotation task, including DIY methods, building an in-house team, outsourcing, and crowdsourcing.‍‍Welcome to part two of our series on the costs of data annotation. This article describes the cost of hiring annotators and building the annotation team yourself.‍Case 1: You handle the task yourself or with minimal help from colleagues.Case 2: You hire annotators and annotate with your team.Case 3: You outsource the task to professionals.Case 4: Crowdsourcing.‍Case 2: You hire annotators to annotate with your team‍Now, as with anything else on this planet, there are pros and cons to having an annotation team. Let's start with the advantages and address questions about the time required for annotation and the cost-effective impact of this approach.‍Here, we will calculate only the monthly expenses and costs. The minimum team to annotate 2.3 million objects consists of 35 annotators, supported by management personnel involved in onboarding, offboarding, and upskilling annotators.‍For these 35 annotators, one manager and 3-4 senior annotators are necessary to guide the team.‍Contracts and Team SizeData annotation teams vary in size from small (up to five members) to large groups, with larger teams requiring more coordination and management.‍Recruiting is straightforward for small teams, but complex for larger groups. Annotators may be full-time employees with fixed salaries or contractors. Contractors, however, pose challenges in retention and engagement due to their involvement in multiple projects and expectation for workload-aligned compensation.‍When working with contractors, as we do, extra effort is necessary to ensure availability. 
For instance, if you need 35 annotators, consider hiring between 60 to 70 to account for potential unavailability.Time and Costs‍From our experience the hiring process will take as much time as:‍Time to find a data annotation manager: 1 month or moreTime to find one annotator: Up to 1 monthTime to onboard one annotator: Up to 1 month‍You can conduct job interviews and onboarding concurrently. If you're fortunate, you might be able to hire between 5 to 10 annotators per month. But to hire and train a big data annotation team you need to have at least 3-4 months.‍Expenses wise it will be:Manager salary (per month): Up to $6000 (data from Indeed, June 2024)Annotators Salary (per hour): It depends on whether you can afford to hire abroad. If yes, starting from $1/h and up to $40 if you hire in the US or high level of qualification is required. Where to look for them? On Upwork, Indeed, LinkedIn—you name it. Again, the job posting price ranges from $0 to $500, in rare cases, with the help of the recruitment agency.Yes, if your service is as popular as platforms like CVAT.ai in the data annotation area, you can significantly reduce time and costs. Annotators will eagerly respond to your posted vacancies as soon as they are advertised.Set Up Time‍Next step is to prepare data: the dataset is the foundation of any robotics project.‍For this project, the scientist must sift through a vast collection of video footage to select relevant frames and then craft a comprehensive data annotation specification. In our case, this specification is planned to cover 40 different classes, each to be annotated with polygons individually. ‍On average, the complete guideline is 30-50 pages. It will include detailed instructions for annotating each class, examples of correct and incorrect annotations, and edge cases. Drafting this detailed specification is time-consuming; it might take several weeks. The data annotation specification will be updated during the project because it isn’t possible to describe all corner cases from the beginning.‍The time it takes to annotate each object with polygons will later be calculated, considering factors such as the object's complexity and size, the image's clarity, and the annotator's skill level.‍Simple Object (e.g., a rectangular object): 5-10 secondsModerately Complex Object (e.g., a car): 30-60 secondsHighly Complex Object (e.g., a human with detailed limb annotations): 1-3 minutes or more‍Operational Costs‍In addition to onboarding and training costs, the expenses for data annotation projects also include licenses and instance costs per annotator. Each annotator may require a license for the annotation software used, which can vary significantly in price depending on the complexity and capabilities of the software. ‍In the case of CVAT it will cost you $33 per seat or you can use the free open-source tools. ‍Remember that even free tools require time and resources to set up and support; time is money. 
So, while we say "free," it means that you can download and install the open-source tool, but the rest depends on your time, expertise, and effort (and how much of your paid time will be spent on this).

Operational costs also include accounting and contract management; they cannot be approximated here, as they are company-specific.

Final calculations

To calculate the total time required for 35 professional annotators to annotate 2,300,000 objects, where each object takes approximately 40 seconds on average to annotate, you can follow these steps (a small script reproducing these numbers appears at the end of this article):

Calculate the total time for all objects:
Total time = 2,300,000 objects × 40 seconds per object = 92,000,000 seconds, or 25,555.56 hours

Divide by the number of annotators to find the time per annotator:
Time per annotator = 25,555.56 hours / 35 annotators = 730.16 hours

So, if all annotators work simultaneously and efficiently, each annotator will need about 18.25 work weeks, which is approximately 4.2 months, to complete the annotation of all 2,300,000 objects.

To calculate the costs for the scenario described, let's break it down into its components and sum them up for the 4.2 months required for the project. We'll assume each annotator earns $550 per month and that the license cost varies from free to $33 per month. Additionally, management and validation cost $6,000 per month plus 20% of the total annotator cost.

Total salary costs for annotators (4.2 months):
Total annotator costs = $2,310 per annotator × 35 annotators = $80,850

Management and validation fees (for 4.2 months):
Total cost for a data annotation manager = $25,200
Management and validation fees = $80,850 × 20% = $16,170

Conclusion: Annotating 100,000 images, that is, 2,300,000 objects, will take 4.2 months and cost $122,220.

To this number, you need to add the cost of the software licenses.

Hidden and One-Time Costs

When calculating how much an annotation team costs, it is a good idea to take into account one-time costs such as hiring time and effort.

As we've mentioned before, assembling a data annotation team starts with recruiting, a crucial step that sets the tone for the team's development and effectiveness. Organizations typically choose between outsourcing recruitment or handling it internally.

Time and cost estimates:

Outsourcing recruitment
Time: Recruitment agencies can expedite the process, typically taking 2 to 6 weeks to fill a position.
Cost: Agencies charge a fee based on the position's annual salary, usually 15% to 30%.

Internal recruitment
Time: This method can take 4 to 8 weeks, depending on the efficiency of HR processes and candidate availability.
Cost: Costs include job posting fees ($0 to $500) and internal HR labor (approximately $55,000 annually, or $26 per hour).

The numbers provided are approximate and based on data from Indeed and LinkedIn; actual costs may vary and should be aligned with the company's internal processes. For example, at CVAT.ai, we have automated our hiring process, enabling us to recruit the best annotators on the market at competitive prices. We use Remote.com for onboarding candidates and are quite satisfied with this HR platform. Our annotators come from various countries, including Kenya, India, Nigeria, Ghana, Nepal, and Indonesia.

Considerations for Hiring Relatives

Small teams might consider hiring relatives for data annotation tasks. While this can add value in terms of trust and loyalty, it often leads to challenges such as a lack of professionalism and cost issues.
Performance might not meet professional standards if the hiring criteria are not aligned with the job's technical demands.

Management Overhead

Post-recruitment, managing a data annotation team involves handling administrative tasks essential for maintaining AI development standards:

Paperwork and Compliance: Managing contracts and compliance with labor laws.
Financial Management: Overseeing accounts and payment systems.
Work Environment Management: Providing training, managing workloads, and fostering a supportive work atmosphere.

Additional Considerations

Technology and Tools: Investments in data management and annotation tools can enhance efficiency.
Team Dynamics: The interaction between team members and the management style significantly impacts productivity.
Market Conditions: Economic factors and labor availability can influence recruitment and operational costs.

These elements are often seen as "hidden costs" and vary significantly by organization. They should be included in the final budget because of their potential to shift overall expenses considerably.

Conclusion

What will be the total duration and cost of the entire project? Here, we are discussing the baseline minimal price, excluding hiring and hidden costs:

Total Duration: 18.25 work weeks, which is approximately 4.2 months.
Cost Range: The costs vary. They start from around $122,000 and may go up indefinitely, depending on team capacity and other factors, such as where you are located and whether you hire locally or worldwide.

What else should you take into account when reading this article?

The calculation for the hiring process assumes linear and consistent recruitment and onboarding, which might not reflect real-world variations. Realistic scenarios may need buffer time for unexpected delays and additional budget for unplanned issues.
The provided time and costs assume maximum efficiency. They may not account for variables such as sick leave, training efficacy, and turnover rates, which could significantly impact both time and cost estimates.
Empirical data from similar past projects could further refine the estimates of annotation time and onboarding costs.

Overall, the presented figures are reasonable but should be treated as approximations with potential for variation based on real-world execution.

And that's all for today. See you in the next article, where we will discuss how much it costs to outsource data annotation to professionals.

Not a CVAT.ai user? Click through and sign up here

Do not want to miss updates and news? Have any questions? Join our community:

Facebook
Discord
LinkedIn
Gitter
GitHub
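If you want to adapt the baseline estimate above to your own project, here is a small sketch that reproduces this article's arithmetic. All inputs (40 seconds per object, 35 annotators, $550 per month per annotator, a $6,000-per-month manager, a 20% management and validation fee) are the assumptions stated in the article; swap in your own numbers.

```python
# Reproduces the in-house team estimate from this article.
# All inputs are the article's stated assumptions; adjust them for your project.

OBJECTS = 2_300_000
SECONDS_PER_OBJECT = 40
ANNOTATORS = 35
HOURS_PER_WEEK = 40
WEEKS_PER_MONTH = 4.33               # rough number of work weeks in a month

ANNOTATOR_SALARY_PER_MONTH = 550     # USD
MANAGER_SALARY_PER_MONTH = 6_000     # USD
MANAGEMENT_FEE = 0.20                # 20% of the total annotator cost

total_hours = OBJECTS * SECONDS_PER_OBJECT / 3600       # ~25,556 hours
hours_per_annotator = total_hours / ANNOTATORS          # ~730 hours
weeks = hours_per_annotator / HOURS_PER_WEEK            # ~18.25 work weeks
months = round(weeks / WEEKS_PER_MONTH, 1)              # ~4.2 months, rounded as in the article

annotator_cost = ANNOTATOR_SALARY_PER_MONTH * months * ANNOTATORS
manager_cost = MANAGER_SALARY_PER_MONTH * months
management_fee = annotator_cost * MANAGEMENT_FEE

print(f"Duration: {weeks:.2f} work weeks (~{months} months)")
print(f"Annotators: ${annotator_cost:,.0f}  Manager: ${manager_cost:,.0f}  "
      f"Fee: ${management_fee:,.0f}")
print(f"Baseline total (before licenses and hidden costs): "
      f"${annotator_cost + manager_cost + management_fee:,.0f}")
```

Running it with the article's inputs prints 18.25 work weeks, $80,850 for annotators, $25,200 for the manager, $16,170 in fees, and a $122,220 baseline total.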
Annotation Economics
July 11, 2024

How Much Does It Cost to Annotate Data with an In-House Team?

Blog
Creating computer vision AI systems requires meticulous training and fine-tuning of deep learning (DL) models using annotated images or videos. These annotations are crucial for developing AI products capable of accurate analysis, prediction, and generating reliable results. However, the process of image annotation significantly contributes to the overall cost of developing such systems. ‍"Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic. In other words, companies need to move from a model-centric approach to a data-centric approach."— Andrew Ng, CEO and Founder of Landing AI‍How can you calculate the optimal price for image annotation to include in your budget? ‍We'll explore the various factors that influence the cost of image and video annotation. More importantly, we'll discuss why the price of image annotation should not be your only consideration when training and fine-tuning computer vision models.‍Prerequisites‍To better understand the dynamics of daily life, let’s consider a common scenario: life at home. ‍Most of us live in houses, often not alone but with families. These families can vary in size and composition—ranging from small units to large, bustling households with children, pets, and elderly members who require special attention and care.‍This variety can lead to issues that are relevant to all living areas: children might leave toys like LEGO pieces scattered on the floor, elderly individuals may misplace their glasses or other medical devices and struggle to find them, and pets could shed fur or leave other surprises around. All of these factors contribute to a household's everyday chaos.‍Certainly, several solutions are already available on the market, such as automatic vacuum cleaners and electric mops. However, let's consider the possibility that these devices might not be as smart as we need them to be.‍As a scientist leading a small research team, you aim to introduce an innovative product to the market—a smart home assistant robot. This advanced robot will differentiate between actual dirt and valuable items. It will clean up the former and signal the latter's presence, aiding in retrieving lost items. This functionality will not only keep homes cleaner but also make it easier to find misplaced objects.‍For research purposes, the scientist and their team have gathered a dataset comprising 100,000 images of various rooms with items scattered on the floor.‍‍The volume of 100,000 images comes from the average batch size we typically see in robotics projects. This number is supported by the available datasets in the public domain, where the quantity of images usually ranges from 10,000 to several million per dataset.‍Let’s assume that one image on average has 23 objects. So you need to annotate an average of 2,300,000 objects in total (or slightly fewer or more).‍This series of articles describes four cases on how to deal with such tasks:‍Case 1: You handle the task yourself or with minimal colleague help.Case 2: You hire annotators and try to build a team yourself. Case 3: You outsource the task to professionals. Case 4: Crowdsourcing‍Case 1: You handle the task yourself or with minimal help from colleagues‍A small disclaimer: annotating solo is fine for small amounts of data, but doesn’t work for big datasets. 
And here is why.The Annotation StageFor the robotics project, the scientist needs to select useful frames from the extensive video collection and create a detailed data annotation specification. Accurate and precise polygon annotations will be used to label objects in the images.‍Let’s assume, that according to the data annotation specification, 40 classes will be annotated using polygons, with each instance annotated separately. A basic description of how to annotate is necessary, noting that the full specification can take 30-50 pages and will include detailed instructions on how to annotate each class correctly with good, bad examples and corner cases. Writing a specification also requires time estimated in days and weeks.‍The time required to annotate an object using polygons can vary depending on several factors, including the complexity and size of the object, the clarity of the image, and the expertise of the annotator.On average, it can take anywhere from a few seconds to several minutes per object. Here are some general estimates:‍Simple Object (e.g., a rectangular object): 5-10 secondsModerately Complex Object (e.g., a car): 30-60 secondsHighly Complex Object (e.g., a human with detailed limb annotations): 1-3 minutes or more‍Detailed polygon annotations can take significantly longer for precise tasks, especially for objects with intricate details and irregular shapes.If the quality requirements permit, AI tools like the Segment Anything Model can be used to speed up the annotation process. However, for some tasks, these models often lack the precision needed and require extensive manual corrections.Let's focus on the task at hand. We are dealing with an image of a room scattered with small objects. Typically, a skilled annotator can label each object in about 40-50 seconds. However, since our scientists do not perform annotations daily, the expected speed of annotation in our case will be approximately 60 seconds (or 1 minute) per object.‍Now let’s talk about money and costs. It's important to note that sometimes people think that annotating themselves is cheap because they do not account for their time, which is paid time unless the annotation is done outside of working hours.Let's assume the robotics engineer is from the USA and annotation is done during working hours. We will research job postings on Indeed, the well-known job aggregator site, and then check the average salary before taxes.The average salary calculated from the data provided is approximately $42 per hour (for June 2024).‍All that's left is to add the cost of the annotation tool. This cost can be zero if the scientist is tech-savvy and can install a self-hosted solution. However, if that's not the case, the scientist will need a tool that may be free or cost some money.‍If you plan to annotate yourself or ask a colleague(s) to help you, so you can annotate as a small Team, in the case of CVAT it will cost you $33 per seat. ‍Here is a list of the most popular open-source data annotation tools that you can use for free*. ‍Remember that even tools that are free to download and install require time and resources to set up and support, and time is money. 
So, while we say "free," it means that you can download and install the tool, but the rest depends on your time, expertise, and effort (and how much of your paid time will be spent on this).

Let's sum it up.

First, we calculate the total number of hours the scientist will need to annotate all objects:

2,300,000 objects × 60 seconds = 138,000,000 seconds
138,000,000 seconds / 3,600 = 38,333 hours (rounded to the whole number)

In the best-case scenario, it will take:

4,792 working days
240 months, or 20 years, for one person

And that is if the scientist drops all other duties and dedicates 8 hours daily solely to annotation.

The cost of the annotation will be:

38,333 hours × $42 = $1,609,986, plus the cost of the tool on top.

Note that the described approach lacks scalability. In the future, maintaining the dataset and addressing any emerging issues will be necessary. Additionally, deployment in a production environment typically requires a significantly larger volume of data. Of course, the engineer can ask colleagues to help; that may reduce the time, but not the cost.

The Quality Assurance Stage

To ensure quality assurance when annotating data independently, an automated system known as a "Honeypot" can be used (a minimal sketch of such a check follows at the end of this post).

The Honeypot method is cost-effective but fairly time-consuming. It involves setting aside approximately 3% of your dataset, or about 3,000 images from a set of 100,000, specifically for quality checks.

You will need to use the previously created specification that outlines your annotation requirements and standards. Annotate this selected subset of images yourself to serve as a benchmark. While this method saves time in the long run, it still requires an initial investment of time and resources to set up and perform these annotations, which translates to a monetary cost.

***

And that's it. Feel free to leave any comments on our social networks, and we'll gladly respond. In our next update, we will answer the question of how much an in-house annotation team costs.

Not a CVAT.ai user? Click through and sign up here

Do not want to miss updates and news? Have any questions? Join our community:

Facebook
Discord
LinkedIn
Gitter
GitHub
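To make the Honeypot idea above concrete, here is a minimal sketch of how such a check could work: sample roughly 3% of the images as a benchmark set, then measure how well an annotator's masks agree with the benchmark masks using IoU. The mask format (boolean NumPy arrays keyed by image id), the 3% sample size, and the 0.9 agreement threshold are illustrative assumptions, not a prescribed workflow.

```python
# A minimal honeypot-style QA sketch: compare an annotator's binary masks
# against benchmark masks on a ~3% sample of the dataset.
# ASSUMPTIONS: masks are boolean NumPy arrays of equal shape, keyed by image id;
# the 3% sample size and 0.9 IoU threshold are illustrative choices.
import random
import numpy as np

def iou(benchmark: np.ndarray, predicted: np.ndarray) -> float:
    """Intersection-over-Union between two boolean masks."""
    intersection = np.logical_and(benchmark, predicted).sum()
    union = np.logical_or(benchmark, predicted).sum()
    return float(intersection) / union if union else 1.0

def honeypot_check(benchmark_masks: dict, annotator_masks: dict,
                   sample_fraction: float = 0.03, threshold: float = 0.9):
    """Sample image ids, compare masks, and report mean IoU plus failing images."""
    ids = list(benchmark_masks)
    sample = random.sample(ids, max(1, int(len(ids) * sample_fraction)))
    scores = {i: iou(benchmark_masks[i], annotator_masks[i]) for i in sample}
    failed = [i for i, score in scores.items() if score < threshold]
    return sum(scores.values()) / len(scores), failed

if __name__ == "__main__":
    # Toy data: 100 "images" with random 64x64 masks standing in for real annotations.
    rng = np.random.default_rng(0)
    benchmark = {i: rng.random((64, 64)) > 0.5 for i in range(100)}
    annotator = {i: mask.copy() for i, mask in benchmark.items()}  # a perfect annotator, for the demo
    mean_iou, failed = honeypot_check(benchmark, annotator)
    print(f"Mean IoU on honeypot sample: {mean_iou:.2f}, failed images: {failed}")
```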
Annotation Economics
June 20, 2024

Calculating the Cost of Image Annotation for AI Projects: Annotating Solo

Blog
Let's start with an official explanation of the term and what's behind it. Don't worry if you don't understand it yet; we will explain it further.

Semantic segmentation is a computer vision technique that uses deep learning algorithms to assign class labels to pixels in an image. This process divides an image into different regions of interest, with each region classified into a specific category.

Now, let's break down this concept step by step with a simple example of use.

Meet Alex, a young and enthusiastic urban planner. He has a big dream: to design smarter, more efficient cities. To understand what makes a city "smarter" or "more efficient," Alex needs to study how cities function. For example, he needs to distinguish between different types of land cover, such as buildings, roads, water bodies, and green spaces. This helps him assess the amount of green space, evaluate vegetation health, and plan for the creation or preservation of parks and natural areas. Additionally, he can classify areas based on pedestrian usage, identify heavily used and underutilized spaces, and plan interventions to improve accessibility and safety, like adding benches, lighting, or pedestrian crossings.

These are just a few ways Alex can use data to make informed decisions and design better cities.

After understanding the goals, Alex starts thinking: Now what? How can he analyze a city? How can he make informed decisions about best practices and areas for improvement?

So his journey begins.

Step 1: Understanding Semantic Segmentation

Semantic segmentation gives a computer the ability to see and understand images the way humans do. Instead of just recognizing an entire image as a "cityscape" or "street view," it breaks down the image into tiny parts and labels each one. Every pixel in the image is assigned a category: this pixel is part of a road, that one is a building, and those over there are trees.

By using this technique, Alex can automatically categorize and label every pixel in each image. He can then use the labeled dataset to train a machine learning algorithm, which can gather valuable insights into how a city works. The algorithms analyze the labeled data, identify common city patterns based on the labels, and return actionable results. These insights can inform urban planning decisions, optimize traffic management, and enhance public space design.

Step 2: Preparing the Data

Alex learns that to teach a computer to understand images, it needs a lot of examples. So, he gathers a dataset of city images like the example above and organizes them into a folder. This process is known as the data collection step. Note that at this step Alex might face some challenges:

It can be difficult to collect sufficient data, as privacy issues may arise when using images from certain sources. Additionally, finding the most useful data for training a deep learning model requires careful consideration. Alex also needs to filter out duplicated data to ensure the dataset's quality. With the data ready, the next step is to add labels to the objects in the images.

We will discuss these steps and their challenges in more detail in future articles.

Step 3: Labeling the Data

Alex uploads the folder with data into the Computer Vision Annotation Tool (CVAT.ai). He can upload the data manually or connect cloud storage. Then he carefully labels each pixel in the images, categorizing elements like roads, buildings, and trees. For this task, Alex can use various tools, such as Polygons or the Brush tool.
Here is how it looks when he adds buildings to one category and pools to another.

And here is what is going on behind the curtains at this very moment: semantic segmentation models create a detailed map of an input image by assigning a specific category to each pixel. This process results in a segmentation map where every pixel is color-coded according to its category, forming segmentation masks.

A segmentation mask highlights a distinct part of the image, setting it apart from other regions. To achieve this, semantic segmentation models use complex neural networks. These networks group related pixels together into segmentation masks and accurately identify the real-world category for each segment.

For example, all the pixels that belong to the object "pool" now belong to the "pool" category, and all the pixels that belong to the object "building" are assigned to the "building" category.

One key point to understand is that semantic segmentation does not distinguish between instances of the same class; it only identifies the category of each pixel. This means that if there are two objects of the same category in your input image, the segmentation map will not differentiate between them as separate entities. To achieve that level of detail, instance segmentation models are used. These models can differentiate and label separate objects within the same category.

Here is a video showing different types of segmentation applied to the same image:

Step 4: Training the Model with Annotated Data

Once the annotation is complete, Alex exports the annotated dataset from CVAT.ai. He then feeds this labeled data into a deep learning model designed for semantic segmentation. Such models range from architectures like DeepLab to the segmentation variants of the very popular YOLOv8, and they are commonly benchmarked on datasets such as Cityscapes and PASCAL VOC. Models are usually evaluated with the Mean Intersection-Over-Union (Mean IoU) and Pixel Accuracy metrics.

After selecting and training the model, Alex runs it on new, unseen images to test its performance. The model, now trained with Alex's labeled data, can automatically recognize every object in the images and provide detailed segmentation results (a minimal inference sketch follows at the end of this post).

Here are some examples of how it may look:

Step 5: Gathering Insights

By analyzing the results from the model, Alex gathers valuable insights:

Traffic Patterns: Improved traffic flow and reduced congestion by optimizing traffic light timings and road designs.
Green Space Distribution: Identification of areas needing more green space and better urban planning for environmental health.
Public Space Utilization: Enhanced public space planning to increase accessibility and usage.
Infrastructure Development: Efficient monitoring of construction projects and better planning of new infrastructure.
Urban Heat Islands: Implementation of cooling strategies to mitigate heat island effects.

Now Alex can make informed decisions, because he has the processed data at hand.

Conclusion

Thanks to semantic segmentation, Alex can transform raw images into valuable insights without spending countless hours analyzing each one manually. The technology not only saves time but also enhances the accuracy of Alex's work, making the dream of designing smarter cities a reality. In the end, semantic segmentation turns complex visual data into actionable knowledge and helps create a better urban environment for everyone. And Alex couldn't be happier with the results.

Not a CVAT.ai user? Click through and sign up here

Do not want to miss updates and news? Have any questions? Join our community:

Facebook
Discord
LinkedIn
Gitter
GitHub
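For readers who want to see what the inference step might look like in code, here is a minimal sketch that runs a pretrained semantic segmentation model (torchvision's DeepLabV3, used purely as an example stand-in, not Alex's actual model) on one image and summarizes the fraction of pixels per class, which is the kind of statistic behind the land-cover insights above. It assumes torch, torchvision, and Pillow are installed and that a street-scene image named city_scene.jpg is available locally.

```python
# A minimal semantic-segmentation inference sketch using a pretrained
# torchvision DeepLabV3 model (an example stand-in, not a specific recommendation).
import torch
from PIL import Image
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("city_scene.jpg").convert("RGB")   # assumption: a local test image
batch = preprocess(image).unsqueeze(0)                # shape: [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]                      # shape: [1, num_classes, H, W]
labels = logits.argmax(dim=1).squeeze(0)              # per-pixel class ids

# Summarize "land cover" as the fraction of pixels per predicted class.
class_names = weights.meta["categories"]
total = labels.numel()
for class_id in labels.unique():
    share = (labels == class_id).sum().item() / total
    print(f"{class_names[int(class_id)]:>15}: {share:.1%} of pixels")
```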
Annotation 101
June 5, 2024

What is Semantic Segmentation?

Blog
It's always great to receive feedback, and it's even better when that feedback is positive. So this week starts off with some good news: CVAT.ai (which took part in the conference) was acknowledged as one of the most popular annotation tools, outpacing direct competitors and ranking just behind in-house and custom solutions.

Here's the proof:

What Is the Embedded Vision Summit 2024 (EVS 2024)?

The Embedded Vision Summit 2024 is a conference focused on the latest technologies in embedded vision. It brings together engineers, researchers, and business leaders to explore advances in computer vision and AI technologies that are designed to be implemented in hardware such as cameras, robots, and sensors.

This event highlights innovations that enable machines to visually interpret and understand the world around them, demonstrating practical applications and trends across various industries. EVS is an essential platform for networking, learning about the newest technologies, and discovering practical techniques for implementing vision capabilities in real-world applications.

Why CVAT.ai Stands Out

CVAT.ai is well known among professionals as one of the most used tools in the field for managing and automating training data. CVAT.ai is open-source and has two versions:

The self-hosted version offers unmatched flexibility and customization, allowing users to tailor it perfectly to fit their specific project needs. It can be easily integrated into existing infrastructures, making it a favorite among developers and corporate tech teams aiming to keep their annotation workflows in-house.
The cloud version offers the same powerful features as the self-hosted version, with added convenience and scalability. This version provides users with quick setup and no maintenance concerns. It's an ideal solution for teams that require immediate access to annotation tools without the complexity of managing their own server infrastructure.

This recognition at the Embedded Vision Summit 2024 not only validates the effectiveness of CVAT but also highlights its essential role in the ongoing evolution of computer vision technologies.

Interested? Don't hesitate to try CVAT.ai now! Contact us if you want to host CVAT.ai on-prem, need professional support, or want to start using CVAT.ai Cloud for your immediate data annotation needs. And remember, we're here to assist you every step of the way.

Not a CVAT.ai user? Click through and sign up here

Do not want to miss updates and news? Have any questions? Join our community:

Facebook
Discord
LinkedIn
Gitter
GitHub
Company News
May 29, 2024

Embedded Vision Summit 2024: CVAT.ai Recognized as a Top-Choice Tool

Blog
In the rapidly advancing field of digital annotation, Computer Vision Annotation Tool (CVAT.ai) and Dataloop have become prominent annotation tools, each serving crucial roles in facilitating computer vision and AI projects. To better understand their utility and impact, this analysis explores their features, business models, and primary client base. We also solicited insights from independent annotators who have employed both tools in their professional workflows.‍This article summarized our findings.‍Comparing CVAT.ai vs Dataloop: Understanding Key User Demographics‍Dataloop and CVAT.ai are both platforms designed for data annotation, each catering to different needs in the fields of machine learning and artificial intelligence.‍CVAT.ai is an open-source platform, making it freely available for individual users, developers, and companies. It supports a variety of annotation tools and is highly customizable, allowing you to modify and extend its capabilities according to your specific needs. CVAT.ai's open-source nature makes it an attractive option for those looking to implement a cost-effective and adaptable annotation solution.‍Dataloop is a closed-source platform that provides a comprehensive suite of tools for annotating images and videos, managing datasets, and automating data workflows. It operates on a subscription-based model, offering tailored services and support for enterprise-level needs.‍Let’s talk about online, ready to use versions of both platforms.CVAT Cloud stands out due to its accessibility for individual users, professional teams, and organizations. It provides a free version, so you can start annotating without any upfront cost. The registration process is simple and the interface is designed to be user-friendly, so you can start annotating within minutes after registration.CVAT.ai includes the wide array of the features needed by businesses and organizations and team collaboration. Furthermore, its straightforward, flat-rate pricing makes it a favorite among many labeling companies that choose CVAT.ai for annotating visual data due to its excellent cost-to-quality ratio.‍The Dataloop is designed only for teams and organizations working on AI projects. Dataloop does not offer flat-rate pricing. It offers a free plan with limitations, but the specifics regarding the limitations of this plan or the details of the paid plans are not clearly outlined.‍The website and documentation provide only high-level information: all available purchases are described in terms of quotas without specifying the details or grades. To gain a better understanding of potential expenses, you would need to contact their sales team. This lack of transparency can complicate budget planning. This strategy renders Dataloop less practical for casual users or those in need of immediate annotation solutions.‍CVAT.ai vs Dataloop: Comparative Analysis of Features‍Let's explore the annotation process and review the features of each service as we set up and annotate a dataset. We won't dive into every minor detail or time each step exactly. Instead, our aim is to understand the annotation workflows of both platforms from the perspective of an average user.Registration and Authentication‍The CVAT.ai registration process is straightforward, and will take only a few minutes. ‍The Dataloop registration process is similar to CVAT.ai's. ‍Both CVAT.ai and Dataloop have a SSO feature, on CVAT Cloud it is a default feature that doesn’t need additional activation. 
On CVAT Self-Hosted solution it is a paid feature.‍Shared workspace‍CVAT.ai and Dataloop both offer features for creating shared workspaces, allowing you to organize projects by team, department, or product line. This setup ensures that annotators and other team members can access only the workspaces relevant to them, facilitating focused collaboration and improved security.‍CVAT.ai offers shared workspaces for organizations, with options for both cloud-based and self-hosted configurations. This versatility enables organizations to select the solution that aligns best with their operational preferences, whether they favor the ease of cloud accessibility or the autonomy of a self-hosted setup.‍Dataloop provides shared workspaces only in the cloud as it doesn’t have a self-hosted option.‍The other difference is, that in CVAT, creating an Organization is optional and treated as a distinct step. Conversely, for Dataloop, establishing an Organization is mandatory, as the platform does not support personal use.Projects‍Both platforms offer effective ways to manage projects, tailored to suit various organizational frameworks. This structure improves workflow efficiency and promotes teamwork.‍In CVAT, the procedure begins with transitioning to an Organization that you’ve created at the previous stage. To begin collaborating with the rest of the team, you need to subscribe to the Team plan and invite users to join the Organization. ‍Then you can create a Project. To do this, just click on the button to get started and fill out the form:‍‍You now have an Organization set up and ready for work.‍For Dataloop it is impossible to register as a solo user and you will need to follow the process and create an organization in the process of registration.‍‍We’ve selected Labeling Services for the sake of this article.‍The registration process ends with creating a Project, this is a mandatory step.‍Now let’s move forward and try to upload data, invite team members, create tasks and annotate it.‍Data types‍Prior to data upload, it is crucial to familiarize yourself with the types of data that each platform can handle.‍CVAT.ai is tailored for image annotation (including PDF and PCD files) and video annotation, making it ideal for Computer Vision projects. For comprehensive insights into the data formats CVAT.ai supports, see the documentation on CVAT.ai supported formats. In terms of diversity, CVAT.ai excels in handling a variety of image and video formats, leveraging the Python Pillow library. Supported image formats include JPEG, PNG, BMP, GIF, PPM, TIFF, among others, and it supports video formats like MP4, AVI, and MOV.‍Dataloop is proficient in managing multiple data formats. This includes image formats such as JPG, JPEG, PNG, TIFF, and video formats like WEBM, MP4, MOV.‍Additionally, it supports audio files including WAV, MP3, OGG, FLAC, M4A, AAC, and point cloud data in PCD format. For textual data in NLP/NER projects, it accommodates TXT, JSON, EML, and PDF.‍As we’ve mentioned before, this article does not aim to delve into a detailed comparison of CVAT.ai and Dataloop. Instead, we will provide a broad overview of how these platforms compare and contrast. Our discussion will be limited to image and video data, and data annotation processes supported by both platforms. Creating Annotation Task‍On both platforms before starting working, you need to create an annotation task. 
This includes loading the data and adding labels.

Data Import/Export

Both CVAT.ai and Dataloop provide features for data import and export, so you can manage diverse datasets effectively. Each platform, however, has its distinct capabilities and potential restrictions in this area.

In CVAT.ai, data can be imported and exported in formats widely used for computer vision projects. You can import data from cloud storage or from your own PC/laptop via drag and drop, and add data to the project at any time.

The process in CVAT.ai is designed to be simple and intuitive:

1. Create a project.
2. Define labels and attributes for the project.
3. Add a task to the project.
4. Upload your data.
5. Submit the task.

The system automatically generates jobs based on the data provided. The user-friendly design ensures that everything can be managed from a single interface without the need to switch between windows.

After annotations are done, you can download annotated data in commonly used formats such as COCO, Pascal VOC, and YOLO, among others.

Like CVAT.ai, Dataloop offers the flexibility to upload data directly to the platform or connect to external cloud storage.

To manually upload data to the Dataloop platform, follow these steps:

1. Create a dataset.
2. Navigate to the dataset page and upload your data.
3. Proceed to the labels and attributes page to add labels and attributes.
4. Invite team members to join the project.
5. Configure and initiate tasks for the project.

So there is a lot of switching between screens, and note that Dataloop requires you to invite at least one team member to the organization before creating a task. This is a mandatory condition.

You might need to complete several additional steps; the full process is detailed in the Dataloop documentation.

In summary, initiating a project and uploading data in Dataloop takes a bit longer, as the process lacks transparency.

Cloud Storage Integration

You can also import and export data from cloud storage, as both CVAT.ai and Dataloop connect to cloud services like AWS, GCP, and Azure for read and write access.

CVAT.ai allows you to connect to cloud storage platforms such as AWS, GCP, and Azure. This functionality is especially beneficial for organizations that depend on these services to store and access extensive datasets.

Dataloop also supports cloud storage integration with AWS, GCP, and Azure.

Labels and Tools

Both platforms naturally support labels and attributes.

In CVAT.ai, labels can be added at both the Project and Task levels. This procedure is simple and fully managed via the UI, where attributes can also be added to the labels.

You can create tasks and add labels at any moment; there is no need to take additional actions. For the task you've created, all annotation tools will be available at any time by default, unless you intentionally restrict them.

In Dataloop, you cannot add labels while creating a task; therefore, you need to add labels before creating one and assigning annotators. This can be done from the Data Management page.

Like CVAT.ai, Dataloop supports attributes.

Annotator Assignment

You can assign tasks and jobs to annotators in both CVAT.ai and Dataloop. CVAT offers a streamlined system for organizations, allowing managers or team leads to invite workers and assign specific tasks and dataset samples to annotators.
When inviting users, you can assign specific roles, designating them as either simple annotators or as managers and supervisors.

After inviting users, you can distribute one task among several annotators.

In Dataloop, you must first invite and assign annotators before you can create a task. The invitation process is straightforward: you need to specify the email address of the invitee and send out an email.

After the invited person accepts the invitation, you can finish creating a task and assign it to annotators.

Annotation Process

The annotation processes in CVAT.ai and Dataloop are quite similar, except that more tools are available in CVAT.ai from the user interface. To illustrate this, we've annotated the same image using both platforms.

In CVAT.ai, you have the flexibility to use different tools at any time, for various objects as needed.

In Dataloop, you can do pretty much the same thing. On both platforms, all tools are readily available at any time, ensuring flexible annotation capabilities.

Automatic Annotation

Aside from very useful tools and practices, there are additional options to speed up the annotation process, such as automatic and semi-automatic annotation. In CVAT Cloud, you can do this with pre-installed models and models from Hugging Face and Roboflow. Dataloop also offers AI-powered tools that can automate parts of the annotation process. This includes features for auto-labeling, which can significantly speed up the data annotation workflow by automatically identifying and labeling objects within images or videos.

Verification & QA

Both CVAT.ai and Dataloop include Verification and Quality Assurance (QA) features, essential for upholding high quality in annotation projects. Nonetheless, the availability and particular features of these functions vary.

CVAT.ai offers Verification and QA tools in both its self-hosted and cloud versions, providing flexibility for different user preferences.

Key features include:

Review and Verification: CVAT allows for the review and verification of annotations and automatic QA results.
Assign Reviewer: Project managers can assign individual users to review specific annotations, enabling focused and efficient QA processes.
Annotator Statistics: CVAT provides metrics and statistics to monitor annotator performance, which is vital for tracking quality and productivity.
And more.

Dataloop offers Verification and QA features akin to those found in CVAT.ai:

Review and Verification: Like CVAT, Dataloop provides functionality for reviewing the annotations made by other users. You can do it manually or automatically.
Assign Reviewer: This feature allows managers to allocate specific annotations to designated reviewers for quality checks.
Management Reports & Analytics: Dataloop offers statistics for analyzing team performance.
And more.

Analytics

In CVAT.ai, the analytics are designed to deliver insights into the annotation workflow, tracking the time invested in annotations and evaluating performance.
This feature is vital for project managers aiming to streamline processes and maintain quality assurance.‍Dataloop offers analytics and performance control features, to better understand your team performance and workflow efficiency.‍Single Sign-On‍Single Sign-On is supported on both CVAT and Dataloop.‍For CVAT Self-Hosted solution it is a paid feature.‍API Access‍Both CVAT.ai and Dataloop offer API access, providing programmatic capabilities that greatly enhance the flexibility and integration of these platforms with other systems.‍CVAT.ai’s API access allows the automation of various tasks and integration with external systems. Users can interact with CVAT through API to upload datasets, retrieve annotations, and manage projects. Similarly, Dataloop offers API Access, emphasizing seamless embedding of its functionalities into other systems.‍***‍To put it succinctly, CVAT.ai is an excellent tool suitable for anyone, whether you are working solo on a minor project or managing a large team with extensive projects. Its user-friendly design and scalability make it ideal for any size of organization.‍Dataloop shares many functional similarities with CVAT.ai, but it is specifically designed for organizational use. Additionally, some aspects of its interface logic may be perplexing to users.‍CVAT vs Dataloop: Annotation ToolsExamining the annotation capabilities of Dataloop and CVAT.ai reveals that each platform provides distinct features suited for different project needs. ‍Notably, Dataloop accommodates a wider variety of annotations, including audio, which are absent in CVAT.ai as it specializes in image annotation and video annotation. ‍As our analysis is based solely on the image and video annotation functionalities available in both platforms, if you map the tools, you will get the following picture:‍‍* The difference is that 3D Semantic Segmentation is only available in Dataloop. On the other hand, CVAT.ai features OpenCV and AI Tools with preinstalled models for semi-automatic annotation.‍‍CVAT.ai vs Dataloop: Annotators Opinion on Tools and Ease of Use‍We went out and asked independent annotators about their experience with CVAT.ai and Datallop. ‍Let’s start with an overall impression. We asked annotators what they generally think about both tools.‍For CVAT.ai, we received mixed responses with suggestions for improvement.‍“What I like most about CVAT is the ability to copy annotations and paste them in the next frame as well as propagating. CVAT can load on most machines easily and can work on the dataset easily without hanging or requiring a huge processor.”“CVAT is very easy to use as the tools in CVAT are easy to understand. The use of polygons to annotate is a bit difficult as we need to annotate every point individually.”‍Dataloop also received some feedback:‍“Dataloop is good in labeling 3D images as you can rotate the scene and another advantage is that you can increase and decrease the pixels you want to label. What I dislike about dataloop is that it takes forever to load and requires you to have a powerful processor and large RAM so that it doesn't hang when working”‍“In Dataloop, there are not much tools. So using Dataloop is easy, but there are certain tools that doesn't allows us to annotate the objects as required. So, for simple use it is better.”‍Conclusion: Both tools are easy to use, but CVAT.ai has a bit more options and tools while Dataloop is more suitable for 3D annotations. 
‍When asked which tool was easier to configure and start using, CVAT.ai or Dataloop:‍“CVAT is easier to configure”‍“It is easier to get familiar with CVAT. Also to configure, we can easily export to required formats.”‍When asked about specific features in the interfaces of CVAT.ai and Dataloop that stood out, the feedback varied:‍For CVAT.ai:‍“The interface and usability of CVAT is really simple and can be understood quite easily since the interface is straight to the point. you can easily pick the correct tools to use.”‍“Labeling with overlay features is easy here. It saves a lot of time creating layers. Pipeline tools and management is difficult.”‍For Dataloop:‍“This one's a bit complex and requires a bit of training to get used to the tool”‍“Labeling the objects is very fast in Dataloop. Pipelines can easily be created there.”‍Conclusion: In conclusion, feedback indicates that CVAT.ai and Dataloop offer distinct user experiences and features. CVAT.ai is appreciated for its clear, user-friendly interface, though some find its pipeline management challenging. Conversely, Dataloop is seen as more complex, but still a comfortable tool to use.When it comes to the most useful functionalities or features of CVAT.ai and Dataloop, users have highlighted specific aspects that stand out in each tool:‍For CVAT.ai:‍“Mostly all features, depending on project requirements.”‍“The 'ctrl' button really helps when you want to label faster and more precisely.”‍“Drawing mask polygons seems to be very useful in CVAT.”‍For Dataloop:‍“The ability to use you mouse and rotate the whole scene while zooming in and out was really nice”‍“Here also the polygons are easy to create and mask.”‍Conclusion: These insights emphasize the unique functionalities that each tool offers, catering to different aspects of user requirements and project types.‍When comparing the annotation tools of CVAT.ai and Dataloop in terms of variety and efficiency, users provided varied insights:‍“In CVAT I would mostly annotate 2D datasets while on dataloop I annotated 3D datasets.”‍“CVAT has download option where the masks can be covered properly without leaving any bits.”‍Conclusion: While CVAT.ai and Dataloop are generally seen as comparable in terms of the variety of annotation tools they offer, CVAT.ai is preferred for its speed and quality. Meanwhile, Dataloop excels with its features for 3D point annotation.‍When asked about the limitations or challenges encountered with the annotation tools in CVAT.ai and Dataloop, users shared specific experiences:‍“Not really”‍Was the only answer! :) ‍Conclusions‍In conclusion, both CVAT.ai and Dataloop provide strong options for data annotation, but CVAT.ai is particularly notable for its open-source nature, which suits specific user needs and project scales. It is designed for individual developers, organizations, and research teams, offering a customizable and cost-effective platform for image and video annotation.‍Dataloop provides a commercial solution tailored for enterprise-level deployments, offering comprehensive services and support. In contrast, CVAT.ai appeals to users seeking greater control and minimal spending, thanks to its unmatched flexibility and customization potential. Its absence of licensing fees significantly benefits budget-conscious teams and small to medium enterprises. 
Moreover, community-driven updates and improvements ensure that CVAT.ai remains a leader in annotation technology, making it ideal for projects where innovation, customization, and cost-efficiency are crucial.‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
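A practical footnote on the API access compared above: for both platforms, programmatic access ultimately means plain HTTP calls. As a rough illustration only, the sketch below lists tasks from a CVAT instance using Python's requests library; the endpoint path, token header, and response shape are assumptions based on CVAT's public REST API and should be verified against your server's interactive /api/docs page before use. Dataloop exposes its own SDK and REST interface, so consult its documentation for the equivalent calls.

import requests

CVAT_HOST = "https://app.cvat.ai"           # or your self-hosted CVAT URL
API_TOKEN = "<your personal access token>"  # placeholder, not a real credential

# Assumption: tasks are exposed at GET /api/tasks with token authentication.
resp = requests.get(
    f"{CVAT_HOST}/api/tasks",
    headers={"Authorization": f"Token {API_TOKEN}"},
    params={"page_size": 10},
)
resp.raise_for_status()
for task in resp.json().get("results", []):
    print(task["id"], task["name"])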
Industry Insights & Reviews
May 20, 2024

CVAT.ai vs. DataLoop: Which one to choose?

Blog
In the dynamic field of computer vision, the data annotation process is fraught with challenges starting with selecting the right approach: ‍Outsource: When you are looking for the external team to annotate your data and often face issues with uncertain quality and the potential for scams, making it difficult to know whom to trust. Additionally, the quality of annotations from unknown providers frequently fails to meet expectations. It is also difficult to fix issues within annotations because third-party platforms are usually closed and require an expensive subscription for access.Internal team: On the other hand, relying on an internal team can also pose problems: onboarding, preparing required infrastructure, training of your data annotation team, takes considerable time, and without proper expertise, the process can be inefficient and prone to errors. Many teams rely on the Computer Vision Annotation Tool (CVAT) as the tool of choice, but only a few know how to use the data annotation platform properly.‍In both scenarios, there is the risk of missed deadlines, and these approaches can become costly without delivering guaranteed results.‍CVAT.ai Labeling Service stands out as a highly recognized data annotation service. CVAT.ai effectively addresses these issues, offering a reliable solution for your annotation tasks without the drawbacks typically associated with either of two approaches. By choosing CVAT.ai, you benefit from the expertise of a team that has developed one of the most popular open-source data annotation platforms for computer vision domain and has over 10 years of experience in the field. ‍This article answers four question:‍Why is CVAT.ai Labeling Service the best solution for your needs?Short and simple description of Labeling Project Stages with real numbers and timelines.What are the payment estimation models?Next steps?Why CVAT.ai?‍CVAT.ai labeling services are available to everyone, whether you are a small team requiring a little assistance within limited resources or a large company with extensive data to annotate.Below, we outline why CVAT.ai stands out among other annotation and labeling services.‍Securing Your Projects with Excellence‍We proudly own and develop the CVAT.ai, renowned as a leading data annotation platform in the computer vision field. Our modern and efficient tool supports all major data annotation scenarios and is compatible with a variety of data import/export formats.‍Mature Team and Flawless Project Management‍CVAT.ai team consists of seasoned professionals, each bringing years of expertise in data annotation to ensure that your projects are handled with the utmost proficiency and care. We prioritize direct communication, allowing you to engage with a dedicated manager for personalized service. ‍Qualified Annotators‍Our team consists of highly skilled annotators, trained and certified directly by us.They are distributed worldwide, and we select the best annotators from various countries, including Kenya, India, Vietnam, Pakistan, Nigeria, and others. With their extensive experience in data annotation, they handle your projects with expertise and precision, tailored to meet your specific needs and requirements.Scalability of the TeamOur infrastructure allows us to rapidly expand our team of annotators to meet the demands of any project size. 
Whether your project needs 5 or 200 annotators, we can adjust our team size to deliver high-quality results on time.
We are qualified to train new annotators, ensuring they meet our standards of quality and precision. This flexible scalability means we can efficiently handle significant increases in workload, guaranteeing that we always deliver high-quality results within your project timelines, regardless of the project's scope.

High Quality of Annotations
At CVAT.ai, we are committed to maintaining the highest quality of annotation across all projects. Our strict quality control measures ensure that every annotator achieves and upholds the specified standards.
We use advanced tools and methodologies to deliver precise, accurate, and consistent data annotations.

Automated QA
To ensure top quality in our labeling services, we use Automated QA (Quality Assurance). The CVAT.ai platform uses algorithms to check the annotated data automatically, comparing it against a set of correct answers ("honey pots") to spot errors quickly and evaluate annotation quality statistically for the whole dataset.
This method boosts the accuracy of data annotation, cuts down on the time and cost of manual checks, and is especially useful for large projects where checking everything by hand isn't practical.

Commitment to Timeliness
At CVAT.ai, we maintain high-quality standards and strict adherence to deadlines, which helps us manage urgent projects effectively. For perspective on our timelines: small projects usually take less than one month to annotate. For larger projects, we can adjust to requested deadlines by mobilizing a bigger team of annotators when necessary.

Stages of the Annotation Project
These stages represent a workflow designed to ensure high-quality results in data annotation projects. The workflow is based on effective communication and collaboration between the customer and CVAT throughout the process.

Stage 1: Annotation Proof of Concept (PoC)
From you:
Provide and sign an NDA with us before we even start working, so your data and information are secure.
Provide samples of real data (50-100 images or 1-2 videos).
Provide initial specifications and any additional useful information.
From us:
Conduct a precise PoC annotation.
Clarify any corner cases.
Provide accurate estimates of project costs and timelines.
Present a formal proposal.
We are ready to launch a Proof of Concept (PoC) within one day after receiving the data and can provide an accurate estimate and calculations within 3-5 days, depending on the project. Typically, the final budget deviates from the initial estimate by no more than 10% in either direction.

Stage 2: Documentation & Preparation
From you:
Correct and approve the Statement of Work (SoW).
Send the data.
From us:
Prepare the final SoW.
Finalize all payment terms and annotation requirements.
Calculate and agree on quality metrics.
Assign and train the Data Annotation (DA) team.
It will take up to one week to process documents from our side, assuming there are no delays from your side.
For urgent projects, we can begin training the team and annotating data at this stage, without waiting for the completion of bureaucratic procedures.

Stage 3: Annotation
From you:
Address any concerns and communicate with a dedicated manager for any questions.
From us:
Perform the annotation in accordance with the approved specifications and deadlines.
Provide intermediate reports through a dedicated manager.
Most projects are completed within one month.

Stage 4: Validation
From you:
Check the provided data, collect comments to fix issues, and review the provided metrics.
From us:
Conduct manual and cross Quality Assurance (QA) via tools, plus automated QA against Ground Truth (GT) annotations covering 3-5% of the dataset.
Make any final corrections for free and deliver the final quality report.
Calculate metrics such as Accuracy, Precision, Recall, Dice coefficient, Intersection over Union (IoU), and others, and provide a confusion matrix report.
From our side, we will conduct the final validation and provide a final report within one week.

Stage 5: Acceptance
From you:
Accept the annotations and reports.
Make payments (for large projects, payments are preferred in multiple batches, after completing each batch).
Leave us feedback about the labeling service.

Payment estimation models
Here is a detailed description of the various estimation and payment models for CVAT.ai labeling services, elaborating on the methods and conditions.
Different Estimation and Payment Models:
Per Object (the main model): Billing is based on each unit of data annotated, such as per annotated frame, object, or attribute within an image or video. It is most effective for projects with well-defined unit sizes and quantities.
Per Image/Video: Billing is based on each image or video file processed. This model is straightforward and suitable for projects where the complexity or time required per image/video is relatively uniform.
Per Hour: Billing is based on the amount of time annotators spend on the project. This method is flexible and can adapt to varying project complexities and unexpected changes in scope.
Expected Project Budget Limits:
From $5K - $9.9K for Annotation Only, Manual and Cross Validation: This budget range is typically for projects focusing solely on manual annotation services, including detailed cross-validation to ensure accuracy and consistency.
>$10K for Comprehensive Services Including AI Engineer Engagement and Automated QA: Projects exceeding $10K not only involve basic annotation but also include the engagement of AI engineers, who contribute to more complex tasks such as setting up automated quality assurance processes and potentially developing custom AI solutions.
Discounts:
5-30% Depending on Data Volume: We are always open to offering significant discounts to both new and loyal customers, as we are committed to fostering long-term collaborations.
These payment models offer the flexibility to accommodate a wide range of projects, from straightforward image annotation to complex projects involving advanced AI technologies and extensive quality assurance. The pricing structure and discounts incentivize larger and longer-term engagements by providing cost benefits as project scopes increase.

Next steps?
Ready to label data with CVAT.ai? Click here to book a call and get started, or send us an email!
Ensure you have all the information you need at your fingertips: download our detailed takeaway now!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions?
Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
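As an aside on the validation metrics listed in Stage 4, Intersection over Union (IoU) is the easiest one to picture: the overlap area of two regions divided by the area of their union. The snippet below is a minimal, generic sketch for two axis-aligned boxes given as (xmin, ymin, xmax, ymax); it is illustrative only and is not taken from the CVAT codebase.

def box_iou(a, b):
    """Compute IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    # Intersection rectangle
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: two partially overlapping 10x10 boxes
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143

The same idea extends to masks and polygons by counting overlapping pixels instead of rectangle areas.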
Annotation Economics
May 16, 2024

Why Choose CVAT as Your Data Labeling Service

Blog
In today's fast-paced digital environment, the efficiency of team collaboration can make or break project success, especially when it comes to complex tasks like data annotation, team management and workflow setup. ‍For businesses and research teams that depend on precise image and video annotation, Computer Vision Annotation Tool (CVAT) offers a powerful solution to improve team productivity and accuracy. One of the standout features of CVAT.ai is the Organization feature, designed specifically for teamwork. ‍Here’s a practical guide on how to use Organizations for image annotations, structured around a common use-case scenario.‍BackgroundImagine you are responsible for a data annotation project that requires organizing and labeling large volumes of images or videos. You have a team ready to do the work, and your goal is to ensure that everyone operates efficiently and cohesively to deliver high-quality results. To help achieve this, you've chosen CVAT.ai as your preferred tool.‍Or you are a student leading a research project, working with your peers on a similar task of data annotation. This project involves organizing and labeling images or videos for academic research purposes. Your main objective is to make sure that your classmates understand the tasks clearly, ensuring that everyone is on the same page, which will allow you to annotate the dataset effectively and proceed with your research. For this, CVAT.ai is your tool of choice.‍This article will guide you through using CVAT.ai effectively with your team to ensure the best possible results. From setting up your project to managing tasks and collaborating. Whether you're annotating data for commercial use or an academic study, these guidelines will help you and your team succeed in your efforts.‍Step 1: Setting Up Your Organization in CVAT.ai‍Setting up CVAT.ai for optimal team collaboration involves a number of necessary steps: from registering in CVAT.ai to subscribing to the Team plan. Below is a guide outlining each action you need to take for a successful start:‍1. Create an Account and Log In: Begin by going to the CVAT.ai website and creating an account. Once you've registered, log in with your credentials.‍2. Create an Organization: In CVAT.ai, an Organization acts as a central hub where all projects, team members, and tasks can be managed under a single umbrella. Once logged in, create an Organization.‍3. Switch to the Organization Account: After creating your Organization, switch from your individual account to your newly created organization account. Switching to the Organization account is mandatory for the next step, where you need to subscribe to a Team plan if you want to collaborate and annotate without any limits.‍4. Subscribe to the Team Plan: To lift all the limitations of the Free plan and start working on the project with your team you need to subscribe. Before subscribing, check if you need to add any additional information to your invoices. Also, note that you, as an organization owner, are also part of the team. So, if you have three annotators working, you’ll need to pay for 4 seats (3 annotators + 1 organization owner (you!)).Now all done and you are ready to invite team members and start working on the project. Let’s move on to the next step and invite team members for collaboration. ‍Step 2: Adding Team Members‍Once your organization within CVAT.ai is established, the next step is to add your team members, ensuring that each participant has the appropriate access and tools needed to annotate. 
Here's how to manage this process smoothly:
1. Invite Members: Go to Organization > Settings. You will see an Organization page with a list of members and an Invite member button. Click on it to proceed. A dialogue box will appear where you can enter the email addresses of the people you want to add; these could be your annotators, reviewers, and any supervisory staff.
2. Assigning Roles and Responsibilities: As you invite each member, you'll have the option to assign specific roles. Assigning roles is crucial for establishing a clear hierarchy and division of responsibilities within the team. Depending on their role, users will have access only to the functionalities necessary to perform their specific tasks.
Once you've added members and assigned roles, you can create projects, add tasks, and assign jobs to annotators.

Step 3: Creating a Project and Uploading Data
Once your organization in CVAT is up and running and team members have accepted invitations to join, you'll need to create projects, add tasks, and assign jobs to the annotators. Here's how you can proceed:
1. Creating Projects: In CVAT.ai, projects serve as broad categories that organize related tasks under a specific theme or goal. Any labels or specifications added at the project level will automatically apply to all tasks and jobs within that project, ensuring consistency and saving time.
To create a project, go to the Projects section within your organization's dashboard and click + to create a new Project. You'll be prompted to enter the project details, such as the project name, description, and so on.
2. Adding Tasks to Projects: Tasks are the specific assignments that annotators work on within a project. Each task involves annotating a particular set of images or videos according to predefined guidelines and objectives.
To add a task to a project in CVAT.ai, first navigate to the project page. Then, click on + > Create a new task. Have your dataset ready, as you will need to upload it for the task to be successfully created. When you create a task, CVAT.ai automatically generates jobs within that task. You can divide a single task into several jobs, allowing multiple annotators to work on different parts of the task simultaneously.
3. Specifications for Annotators: Clear specifications with guidelines for annotators help maintain consistency across annotations, which is crucial for training machine learning models. They also ensure that all team members are aligned with the project's standards, which helps in achieving high-quality outputs. You can easily create specifications within CVAT and add them at the Project or Task level, so all annotators can be on the same page.
4. Quality Assurance (QA): In CVAT.ai, you can ensure the quality of annotations through two methods: by creating a specific job known as a Honeypot for automatic QA, or by assigning a dedicated worker for manual QA. If you opt for the Honeypot, it's important to create this job before beginning the annotation process.

Step 4: Assigning Jobs/Tasks to Annotators
Once tasks are created and specifications set, assign them to individual annotators or, at a later stage, to reviewers. To do this, click on the Task; you will see a list of jobs, each with an Assignee field.
Click on it and select the annotator's name and the Job's stage.
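If you prefer to script this step instead of clicking through the UI, the assignment can usually be done through CVAT's REST API as well. Treat the sketch below as a hedged starting point: the PATCH endpoint, field names, and token header are assumptions based on the public API reference, so confirm them against your server's /api/docs for your CVAT version.

import requests

CVAT_HOST = "https://app.cvat.ai"           # or your self-hosted instance
API_TOKEN = "<your personal access token>"  # placeholder, not a real credential
JOB_ID = 12345                              # hypothetical job id
ANNOTATOR_USER_ID = 67                      # hypothetical user id

# Assumption: jobs are updated with PATCH /api/jobs/<id>, and accept
# "assignee" and "stage" fields; verify against your server's API docs.
response = requests.patch(
    f"{CVAT_HOST}/api/jobs/{JOB_ID}",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"assignee": ANNOTATOR_USER_ID, "stage": "annotation"},
)
response.raise_for_status()
print("Job updated:", response.json().get("id"))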
And with that, you're all set!

Step 5: Configure Webhooks
This step is optional, but we recommend setting up webhooks for a seamless workflow. Webhooks are a powerful tool within CVAT that allow for real-time notifications and automated reactions to specific events within the platform. By configuring webhooks, you can set up CVAT to send instant alerts or perform automated tasks whenever certain actions occur within your projects.

Step 6: Annotation
After you assign the jobs, annotators will see them and proceed with the annotation of the images or videos. This step is critical, as it involves the direct application of data labeling based on the project guidelines.
Note that after annotation is done, annotators need to save their work and change the job state to completed.

Step 7: Quality Assurance and Issue Resolution
After annotations are completed, it is essential to verify the quality of the work before acceptance. In CVAT.ai, you can do this in two ways: with automatic or manual quality assurance.
1. Honeypot for Automatic QA: If you have set up a Honeypot (also known as the Ground Truth job), allow some time for the CVAT platform to accumulate data. This setup helps in checking the accuracy and quality of the annotations by comparing them with pre-validated 'ground truth' data.
2. Assign Jobs to Validators for Manual Validation: You can manually validate annotations by assigning jobs to validators. Enter the validator's name in the 'Assignee' field and change the 'Job stage' to 'Validation'. Validators will review the assigned jobs and report any issues found; in CVAT.ai, validators can easily report any discrepancies or errors in the annotations.
3. Correction of Issues: Review the issues reported by validators. If further improvements are needed, reassign the jobs to the original annotator. Once annotators receive the reports, they can review and address any identified issues. Validators may also correct issues directly. This dual role of validating and correcting improves the quality control process, ensuring more accurate outcomes in the annotation project. However, the best process ultimately depends on your preference.

Step 8: Analytics and Performance
The steps above cover the annotation and quality assurance process. To streamline these stages, CVAT.ai offers analytics tools that help monitor the progress and performance of your team. These analytics provide valuable insights into task completion rates and annotator performance, and can highlight areas that may require additional attention or adjustment.

Step 9: Export Data
Once the annotation and validation stages are complete and all quality checks are satisfied, export the annotated data. The data is now ready for use in machine learning models or for any other required purpose.

Conclusion
CVAT.ai Organizations were designed for team collaboration on annotation projects, making it easier to handle complex tasks. By following the steps described above, businesses can improve their data annotation processes, which in turn helps speed up the development of dependable and effective machine learning models.
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
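To make Step 5 more concrete, here is a minimal sketch of a webhook receiver built with Flask. The header name and payload fields are assumptions modeled on how CVAT documents its webhook deliveries (an HMAC-SHA256 signature of the body when a secret is configured, and an event name in the JSON payload); double-check both against the webhooks page of the CVAT docs before relying on this.

import hashlib
import hmac

from flask import Flask, abort, request

app = Flask(__name__)
WEBHOOK_SECRET = b"<the secret you configured in CVAT>"  # hypothetical placeholder

@app.route("/cvat-webhook", methods=["POST"])
def cvat_webhook():
    # Assumption: CVAT sends an HMAC-SHA256 of the raw body in this header
    # when a secret is configured; verify the exact header name in the docs.
    signature = request.headers.get("X-Signature-256", "")
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        abort(403)

    payload = request.get_json(silent=True) or {}
    # Assumption: the payload carries an "event" field such as "update:job".
    print("Received CVAT event:", payload.get("event"))
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)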
Tutorials & How-Tos
May 8, 2024

Annotate Images and Videos in CVAT.ai as a Team: A Step-by-Step Guide

Blog
We are happy to announce the selection of our contributors for Google Summer of Code 2024! ‍After a careful review process, involving numerous impressive proposals, we have finalized our list of participants who will be embarking on this exciting journey with us:‍Project: Quality Control: Consensus.Contributor: Vidit AgarwalProject: Keyboard shortcuts customization.Contributor: tahamukhtar20‍Project: Add or extend support for import-export formats. Contributor: Changbong‍First and foremost, congratulations to all the selected contributors! ‍Your proposals were remarkable for their innovation, clarity, and alignment with our project goals. We are excited about the potential impact your projects will have on the CVAT.ai open-source platform and community.We also want to extend a special thank you to Ritik Raj, who contributed significantly to the project. Although not selected this time, your efforts are highly appreciated. Additionally, we are grateful to everyone who submitted proposals and continues to contribute to CVAT. This collaboration has resulted in 26 merged GitHub pull requests. The CVAT community values each member deeply. You are essential to its growth and development. Your contributions and engagement strengthen the community and help achieve its goals. Whether you contribute by coding, testing, writing documentation, providing feedback, or simply by participating actively, your involvement is greatly appreciated. Let's keep supporting each other and build on our collective progress.‍‍What's Next for Google Summer of Code Contributors?‍‍Community Bonding (May 1 - May 26):This initial phase is crucial for building relationships with mentors, understanding community practices, and refining your project plans. We encourage all selected contributors to actively engage in discussions on our mailing lists, attend scheduled meetings, and establish regular check-ins with your mentors. This period is your opportunity to lay a solid foundation for the upcoming coding phase.‍You will also learn how the whole project will move on, will go through the GSoC 2024 timeline and answer any questions.‍Support and Resources‍We are committed to providing a supportive environment to help you succeed in your projects. You’ll have access to resources like documentation, development tools, and community expertise. Your mentors are here to provide guidance, support, and feedback throughout your project. Don’t hesitate to reach out to them or the community if you need help.‍Congratulations Again!‍We can't wait to see the accomplishments of this summer!. This is a fantastic opportunity to hone skills, contribute to meaningful projects, and make lasting connections within the open-source community. Let’s make this a productive and fun experience for everyone involved!‍For more updates, stay tuned to our blog and social media channels.‍Best of luck, and happy coding!Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Company News
May 1, 2024

Google Summer of Code 2024: Congratulations to Our Selected Contributors!

Blog
Using open-source datasets is crucial for developing and testing computer vision models. Here are 10 notable datasets that cover a wide range of computer vision tasks, including object detection, image classification, segmentation, and more.‍Common Objects in Context (COCO)Description: The Common Objects in Context (COCO) dataset is a large-scale dataset that includes such objects as cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment. It was created to overcome the limitations of existing datasets by including more contextual details, a broader range of object categories, and more instances per category.COCO dataset is commonly used for several computer vision tasks, including but not limited to object detection, semantic segmentation, superpixel stuff segmentation, keypoint detection, and image captioning (5 captions per image). Its diverse range of images and annotations includes 330K images (>200K labeled), 1.5 million object instances, 80 object categories, and 250,000 people with keypoints. ‍Be aware that although COCO annotations are famous and widely used, their quality can vary and sometimes may be restrictive for certain use cases.‍History: The COCO dataset was first introduced in 2014 to improve the state of object recognition technologies. While the dataset itself has not been updated regularly in terms of new images being added, its annotations and capabilities are frequently enhanced and expanded through challenges and competitions held annually.‍Licensing: The COCO dataset is released under the Creative Commons Attribution 4.0 License, which allows both academic and commercial use with proper attribution. ‍Official Site: https://cocodataset.org/‍‍ImageNet‍Description: ImageNet is a collection of images structured around the WordNet classification system. WordNet groups each significant idea, which might be expressed through various words or phrases, into units known as "synonym sets" or "synsets." With over 100,000 synsets, predominantly nouns exceeding 80,000, ImageNet's goal is to furnish roughly 1000 images for every synset to accurately represent each concept. The images for each idea undergo strict quality checks and are annotated by humans for accuracy. Upon completion, ImageNet aspires to present tens of millions of meticulously labeled and organized images, covering the breadth of concepts outlined in the WordNet system.‍ImageNet played a pivotal role in the evolution of computer vision technologies, particularly through the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has been important in pushing the boundaries of image recognition capabilities and deep learning techniques. It is widely recognized for its role in advancing machine learning and computer vision, particularly in areas such as object recognition, image classification, and deep learning research. ‍History: The ImageNet project, initiated in 2009 by researchers at Stanford University, was designed to create a vast database of labeled images to enhance the field of computer vision. ImageNet significantly influenced the growth of deep learning, especially through its yearly ImageNet Large Scale Visual Recognition Challenge, which was held until 2017. 
Although these challenges have ended, the ImageNet dataset remains a key resource in the computer vision field, even though it is not regularly updated with new images.‍Licensing: ImageNet does not own the copyright of the images, it only compiles an accurate list of web images for each synset of WordNet. For this reason, ImageNet is available for use under terms that facilitate both academic and non-commercial research, with specific guidelines for usage and attribution.‍Official Site: http://www.image-net.org/‍‍PASCAL VOC‍Description: PASCAL VOC is a well known dataset and benchmarking initiative designed to promote progress in visual object recognition. It offers a substantial dataset and tools for research and evaluation on its dedicated platform, serving as an essential resource for the computer vision community.The PASCAL VOC dataset was developed to offer a diverse collection of images that reflect the complexity and variety of the world, which is crucial for building more effective object recognition models. This dataset has become a cornerstone in the field of computer vision, driving significant advancements in image classification technologies. The challenges associated with PASCAL VOC played an important role in pushing researchers to improve the accuracy, efficiency, and reliability of computerized image understanding and categorization. PASCAL VOC's dataset played a huge role in such fields as instance segmentation, image classification, person pose estimation, object detection, and person action classification‍History: The PASCAL VOC project, initiated in 2005, was developed to offer a standard dataset for tasks related to image recognition and object detection. It gained recognition through its yearly challenges that significantly advanced the field until they concluded in 2012. Although these annual challenges have ended, the PASCAL VOC dataset remains an important tool for researchers in computer vision, even though it is not updated with new data anymore.Licensing: PASCAL VOC is made available under conditions that support academic and research-focused projects, adhering to guidelines that encourage the ethical and responsible use of the dataset. Also, the VOC data includes images obtained from the "flickr" website, for more information, see "flickr" terms of use.‍Official Site: http://host.robots.ox.ac.uk/pascal/VOC‍‍CityscapesDescription: The Cityscapes dataset was created to help improve how we understand and analyze city scenes visually. This dataset includes a varied collection of stereo video sequences captured across street scenes in 50 distinct cities. It boasts high-quality, pixel-precise annotations for 5,000 frames and also includes an extensive selection of 20,000 frames with basic annotations. Consequently, Cityscapes significantly surpasses the scale of earlier projects in this domain, offering an unparalleled resource for researchers and developers focusing on urban environment visualization.‍Cityscapes was developed with the ambition to close the gap in the availability of an urban-focused dataset that could drive the next leap in autonomous vehicle technology and urban scene analysis. Cityscapes offers a rich collection of annotated images focused on semantic urban scene understanding. 
This initiative has catalyzed significant advancements in the analysis of complex urban scenes, contributing to the development of algorithms capable of more nuanced understanding and interaction with urban environments.‍History: The Cityscapes dataset was launched around 2019 to aid research aimed at understanding urban scenes at a detailed level, especially for segmentation tasks that require precise pixel and object identification. This dataset is regularly updated and remains crucial in the field, assisting developers and researchers in enhancing systems like those used in autonomous vehicles.‍Licensing: The Cityscapes dataset is provided for academic and non-commercial research purposes. ‍Official Site: https://www.cityscapes-dataset.com/‍‍KITTI‍Description: The KITTI dataset is well-known in the field of autonomous driving research, offering a comprehensive suite for several computer vision tasks related to automotive technologies. The dataset is focused on real-world scenarios and encompasses several key areas: stereo vision, optical flow, visual odometry, and 3D object detection and 3D object tracking.‍Developed to bridge the gap in automotive vision datasets, KITTI was developed to improve the domain of autonomous driving by providing a dataset that captures the complexity of real-world driving conditions with a depth and variety unseen in previous collections. ‍History: The KITTI dataset was launched in 2012 to help advance autonomous driving technologies, concentrating on specific tasks such as stereo vision, optical flow, visual odometry, 3D object detection, and tracking. It was developed through a partnership between the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. While the KITTI dataset is not updated regularly, it remains an essential tool for researchers and developers in the automotive technology field.‍Licensing: The KITTI dataset is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License that supports academic research and technological development, promoting its use among scholars and developers in the autonomous driving community. ‍Official Site: http://www.cvlibs.net/datasets/kitti‍‍VGGFace2 ‍Description: VGGFace2 is made of around 3.31 million images divided into 9131 classes, each representing a different person identity. It is used for a multitude of computer vision tasks such as face detection, face recognition, and landmark localization. It boasts a rich collection of images featuring a wide demographic diversity, including variations in age, pose, lighting, ethnicity, and profession, thus ensuring a robust framework for developing and testing algorithms that closely mimic human-level understanding of faces.‍The dataset comprises images of faces ranging from well-known public figures to individuals across various walks of life, enhancing the depth and applicability of face recognition technologies in real-world scenarios.‍History: VGGFace2 developed by researchers from the Visual Geometry Group at the University of Oxford was introduced in 2017 as an extension of the original VGGFace dataset. There are no regular updates to the VGGFace2 as it was released as a static collection for academic research and development purposes.‍Licensing: VGGFace2 supports both academic research and non-commercial use, detailed on its website. 
‍Official Website: https://paperswithcode.com/dataset/vggface2-1‍‍CIFAR-10 & CIFAR-100‍Description: The CIFAR-10 and CIFAR-100 datasets are curated segments of the extensive 80 million tiny images collection, put together by researchers Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. These datasets were created to facilitate the analysis of real-world imagery. CIFAR-10 encompasses 60,000 color images of 32x32 pixels each, distributed across 10 categories, with each category featuring 6,000 images. This dataset is split into 50,000 images for training and 10,000 for testing, spanning a diverse array of subjects such as animals and vehicles.‍On the other hand, CIFAR-100 expands on this by offering 100 categories, each with 600 images, making for a total of the same 60,000 images but with a finer division. It allocates 500 images for training and 100 images for testing in each category. The CIFAR-100 dataset further organizes its categories into 20 supercategories, with each image tagged with both a "fine" label, identifying its specific category, and a "coarse" label, denoting its supercategory grouping.‍These datasets were created to push forward the study of image recognition by offering a detailed and varied collection of images that previous datasets lacked. They aid in developing algorithms that can distinguish and recognize a broad array of object types, bringing computer vision closer to human-like understanding.‍History: CIFAR-10 and CIFAR-100 were developed by researchers at the University of Toronto and released around 2009. They have not been regularly updated since their release, serving primarily as benchmarks in the academic community.‍Licensing: Both CIFAR-10 and CIFAR-100 are freely available for academic and educational use, under a license that supports their wide use in research and development within the field of image recognition (licensing information can be found on the official site).‍Official Site: https://www.cs.toronto.edu/~kriz/cifar.html‍‍IMDB-WIKI‍Description: To address the constraints of small to medium-sized, publicly available face image datasets, which often lack comprehensive age data and rarely contain more than a few tens of thousands of images, the IMDB-WIKI dataset was developed. Utilizing the IMDb website, the creators selected the top 100,000 actors and methodically extracted their birth dates, names, genders, and all related images.‍In a similar vein, profile images and the same metadata were collected from Wikipedia pages. Assuming images with a single face likely depict the actor, and by trusting the accuracy of the timestamps and birth dates, a real biological age was assigned to each image. Consequently, the IMDB-WIKI dataset comprises 460,723 face images from 20,284 celebrities listed on IMDb, along with an additional 62,328 images from Wikipedia, bringing the total to 523,051 images suitable for use in facial recognition training.History: The IMDB-WIKI was created by researchers at ETH Zurich in 2015. It has not received regular updates since its initial release.‍Licensing: The MDB-WIKI dataset can be used only for non-commercial and research purposes (licensing information can be found on the official site).‍Official Site: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/‍‍Open Images Dataset by Google‍Description: The Open Images Dataset by Google is recognized as one of the largest and most detailed public image datasets available today. 
It is designed to support the wide variety of requirements that come with computer vision applications. Covering a vast range of categories, from simple everyday items to intricate scenes and activities, this dataset strives to exceed the boundaries of previous collections by offering an extensive array of detailed annotations for a broad spectrum of subjects.
Integral to a host of computer vision tasks, including image classification, object detection, visual relationship detection, and instance segmentation, the Open Images Dataset is a treasure trove for advancing machine learning models.
Diving into specifics, the dataset includes:
15,851,536 bounding boxes across 600 object classes,
2,785,498 instance segmentations in 350 classes,
3,284,280 annotations detailing 1,466 types of relationships,
675,155 localized narratives that offer rich, descriptive insights,
66,391,027 point-level annotations over 5,827 classes, showcasing the dataset's depth in granularity,
61,404,966 image-level labels spanning 20,638 classes, highlighting the dataset's broad scope,
An extension that further enriches the collection with 478,000 crowdsourced images categorized into over 6,000 classes.
History: The Open Images Dataset by Google was initially released in 2016. The dataset has been updated regularly, with its final version, V6, released in 2020, including enhanced annotations and expanded categories to further support the development of more accurate and diverse computer vision models.
Licensing: The annotations are licensed by Google LLC under the CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. Both licenses support academic research and commercial use, promoting its application across a wide array of projects and developments in the field of computer vision.
Official Site: https://storage.googleapis.com/openimages/web/index.html

SUN Database: Scene Categorization Benchmark
Description: The SUN dataset is a large and detailed collection created for identifying and categorizing different scenes. It is notable for its wide range of settings, from indoor spaces to outdoor areas, filling the need for more varied scene datasets as opposed to those focusing just on detection. The SUN Database aims to improve how we understand complicated scenes and their contexts by offering a wide variety of scene types and detailed annotations.
This dataset is crucial for many computer vision tasks, such as sorting scenes, analyzing scene layouts, and object detection in various settings. It includes over 130,000 images covering more than 900 types of scenes, each with careful annotations to help accurately recognize different scenes.
History: The SUN dataset was developed by researchers at Princeton University and Brown University and first released in 2010. Unlike some other datasets, the SUN Database has not been regularly updated since its initial release but remains a pivotal resource in the field of computer vision.
Licensing: The SUN Database is distributed under terms that permit academic research, provided there is proper attribution to the creators and the dataset itself.
Official site: https://vision.princeton.edu/projects/2010/SUN/

Conclusion
Concluding this article, we sincerely hope you found it helpful and that it enhances your research in model training and your daily computer vision tasks. If you haven't found exactly what you're looking for, please stay tuned and follow our social media channels.
We plan to share our knowledge on how to create, annotate, and maintain your very own dataset tailored to your specific needs.
Stay curious, keep annotating!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
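If you would like to explore one of these datasets programmatically before committing to it, COCO is a convenient starting point, because its annotations ship as plain JSON and the pycocotools package can index them. A minimal sketch, assuming you have downloaded the 2017 validation annotations from the official site, could look like this:

from pycocotools.coco import COCO

# Path assumes you downloaded the 2017 validation annotations from cocodataset.org
coco = COCO("annotations/instances_val2017.json")

# Look up the category id for "dog" and fetch images containing it
dog_cat_ids = coco.getCatIds(catNms=["dog"])
img_ids = coco.getImgIds(catIds=dog_cat_ids)
print(f"{len(img_ids)} images contain dogs")

# Inspect the annotations (bounding boxes, segmentation) of the first such image
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=dog_cat_ids)
for ann in coco.loadAnns(ann_ids):
    print(ann["category_id"], ann["bbox"])  # bbox is [x, y, width, height]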
Industry Insights & Reviews
April 17, 2024

10 Best Known Open Source Datasets for Computer Vision in 2024

Blog
Data annotation is key to training machine learning models, especially in computer vision. As the CVAT.ai team, we recommend using CVAT for annotation. However, once annotation is done, you'll need to export the data in a suitable format.‍ One of the most commonly requested formats is YOLOv8. While not directly supported by CVAT, there's a straightforward workaround that allows you to convert data from the COCO format (which CVAT does support) to YOLOv8, a format that supports polygons.‍In this article, we’ll show how you can get the annotations needed from CVAT in a few simple steps and then convert them into YOLO8. For this, we’ll use another intermediate format, capable of representing the same annotations - COCO. ‍Let’s start with annotation, for this article we use a fraction of Cats and Dogs dataset with two classes: cats and dogs, we’ve selected 10 random images. ‍‍For the purpose of this article, we've annotated the dataset with polygons. You can do this manually or use automatic annotation if you're on the paid plan.‍ After annotation was done, we’ve exported the annotations in COCO format and named the resulting JSON file coco_annotations.json.‍The annotations in our JSON file look like this:‍ { "id": 9, "width": 359, "height": 269, "file_name": "dog.4121.jpg", "license": 0, "flickr_url": "", "coco_url": "", "date_captured": 0 }, { "id": 10, "width": 200, "height": 297, "file_name": "dog.4123.jpg", "license": 0, "flickr_url": "", "coco_url": "", "date_captured": 0 } ], "annotations": [ { "id": 1, "image_id": 1, "category_id": 1, "segmentation": [ [ 479.0, 63.0, 471.0, 63.0, 463.0, 69.0, 460.0, 75.0, 460.0, 86.0, 450.0, 101.0, 425.0, 110.0, 415.0, 116.0, 398.0, 120.0, 392.0, 120.0, 390.0, 118.0, 389.0, 106.0, 386.0, 101.0, 385.0, 69.0, 381.0, 63.0, 372.0, 66.0, 345.0, 86.0, 333.0, 87.0, 326.0, 90.0, 309.0, 90.0, 295.0, 86.0, 286.0, 86.0, 283.0, 89.0, 283.0, 100.0, 289.0, 111.0, 289.0, 116.0, 283.0, 123.0, 276.0, 143.0, 264.0, 159.0, 250.0, 172.0, 223.0, 213.0, 206.0, 247.0, 192.0, 286.0, 186.0, 324.0, 191.0, 335.0, 190.0, 353.0, 197.0, 362.0, 218.0, 370.0, 237.0, 374.0, 257.0, 375.0, 277.0, 380.0, 293.0, 377.0, 296.0, 369.0, 292.0, 357.0, 307.0, 342.0, 314.0, 331.0, 323.0, 308.0, 323.0, 286.0, 325.0, 284.0, 330.0, 288.0, 333.0, 307.0, 337.0, 316.0, 342.0, 321.0, 355.0, 323.0, 360.0, 318.0, 363.0, 309.0, 360.0, 277.0, 353.0, 262.0, 339.0, 250.0, 343.0, 235.0, 354.0, 222.0, 364.0, 193.0, 372.0, 183.0, 392.0, 168.0, 409.0, 163.0, 427.0, 152.0, 445.0, 135.0, 475.0, 112.0, 483.0, 87.0, 483.0, 70.0 ] ],‍To convert annotations from COCO to YOLOv8 format, we'll use the official COCO Dataset Format to YOLO Format tool provided by Ultralytics. ‍Follow these steps to achieve the result:‍Let's get your COCO annotations organized just right. The snippet below shows the folder structure: ‍coco/ └── annotations/ └── coco_annotations.json‍Next up, you're going to create a .py file with the COCO snippet inside. Call it whatever feels right to you; we went with coco_to_yolo.py. Any text editor will do the trick, but Visual Studio Code is a solid choice. Here's a peek at what your file should look like:‍from ultralytics.data.converter import convert_coco convert_coco(labels_dir='annotations', use_segments=True)‍When your file is all set, place it right next to the COCO annotations folder. Your setup should look something like this:‍coco/ └── annotations/ └── coco_annotations.json coco_to_yolo.py‍Time to create a virtual environment! 
Here's how:
Open PowerShell and head over to your project folder by running the cd command:
cd path\to\your\project\directory
Once you're in, create a virtual environment by running the following command:
python -m venv venv
Get that environment going with (you might need to allow scripts in PowerShell):
.\venv\Scripts\activate
If the command above for some reason doesn't work, try the following:
.\venv\Scripts\activate.ps1
You'll know it's ready when you see (venv) before your command prompt.
Now, let's install Ultralytics. Just type in:
pip install ultralytics
And wait; this part might take a little bit.
All set? Great! Now, just run:
python coco_to_yolo.py
And there you have it: done! Your converted annotations are now neatly organized in the coco_converted folder, located alongside the other files. Please note that the coco_annotations part might have a different name in your case if you named the exported .json file differently.
coco/
└── annotations/
    └── coco_annotations.json
coco_to_yolo.py
coco_converted/
├── images/
└── labels/
    └── coco_annotations/
        ├── image1.txt
        └── image2.txt
Inside this folder, you'll find a list of .txt files that have the same names as the images in the dataset.
Each .txt file contains annotations in YOLOv8 format.
Now that your data is in the YOLOv8 format, you're ready to use it in your models. This opens up exciting new possibilities and improves your machine learning projects. Keep an eye out for our next articles, where we'll go deeper into how to use these converted annotations in your models effectively. This is just the start of a journey toward more efficient, powerful, and flexible machine learning models.
And that's it for today. Let us know what you think!
Happy annotating!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
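For reference, each line in those .txt files follows the Ultralytics segmentation label convention: a class index followed by pairs of x y polygon coordinates normalized to the 0-1 range. The short sketch below reads one converted label and scales the points back to pixels as a quick sanity check; the file name and image size are taken from the example image in this article, so adjust them to your own data.

from pathlib import Path

label_file = Path("coco_converted/labels/coco_annotations/dog.4121.txt")  # example name
img_w, img_h = 359, 269  # width/height of the matching image

for line in label_file.read_text().splitlines():
    parts = line.split()
    class_id, coords = int(parts[0]), list(map(float, parts[1:]))
    # Coordinates come normalized; convert back to pixels in (x, y) pairs
    points = [(coords[i] * img_w, coords[i + 1] * img_h) for i in range(0, len(coords), 2)]
    print(f"class {class_id}: {len(points)} polygon points, first point {points[0]}")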
March 28, 2024

CVAT.AI: Exporting Annotations from CVAT to YOLOv8 Format on Windows

Blog
Annotating data for machine learning is notoriously time-consuming, and striving for precision doesn't make it any easier. Whatever the industry, be it retail, automotive, medical imaging, or any other, this fundamental need for quick, accurate data annotation doesn't change.
Now, let's explore a scenario (which might not be so hypothetical) where your ML model requires datasets annotated with masks. CVAT.ai has a tool for that, but there's an even cooler feature that streamlines the process: annotation actions, particularly the shape converter.
With this feature, you can initially use whatever annotation shape suits you best or is easiest for you, say, polygons. This approach saves time, especially if you're more familiar with a specific tool or are leveraging automatic annotation. Once you're done, you can easily convert all your annotations from polygons to masks with just a few clicks, ensuring both speed and accuracy in your work. You can also filter out and delete shapes that you no longer need.
To see how this works in action, check out our latest video.
The video covers the following topics:
You can use the shape converter in the retail sector. For example, ensuring that all products on shelves are monitored accurately for restocking requires uniform annotations. CVAT.ai allows quick conversion of different shapes to standardize data input, making it easier for models to learn and predict.
In the automotive industry, getting the details right when marking street signs, pedestrians, and vehicles is crucial. CVAT.ai's shape converter assists in creating precisely annotated sets for these needs, enhancing the training of autonomous systems to navigate and understand the real world with greater accuracy.
In the medical field, where the accurate analysis of diagnostic images can be a matter of life and death, CVAT.ai's shape conversion tool allows flexible, precise annotations. This is vital for developing medical tools that can accurately diagnose conditions from medical imagery, enhancing research outcomes and patient care.
Conclusion
CVAT Annotation Actions simplify and improve the annotation process, making it faster and more efficient to prepare datasets for machine learning across various industries. This not only saves time but also improves the quality of data, leading to more reliable and effective AI models.
Happy annotating!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
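Outside of CVAT's built-in converter, it can help to see what converting a polygon into a mask actually means for the data. Below is a tiny, generic sketch using NumPy and OpenCV; it is not CVAT code, and the polygon coordinates are invented purely for illustration.

import cv2
import numpy as np

# A made-up polygon in pixel coordinates (x, y)
polygon = np.array([[50, 40], [180, 60], [160, 170], [60, 150]], dtype=np.int32)

# Rasterize the polygon into a binary mask the size of the image
height, width = 200, 256
mask = np.zeros((height, width), dtype=np.uint8)
cv2.fillPoly(mask, [polygon], 1)  # fill the polygon region with 1s

print("foreground pixels:", int(mask.sum()))

The reverse direction, mask to polygon, is typically done by tracing contours, for example with cv2.findContours.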
Tutorials & How-Tos
February 28, 2024

CVAT.ai Annotation Actions: Perform Bulk Actions on Filtered Shapes

Blog
Introduction to Crowdsourcing
As dataset sizes grow, the demand for scalable and efficient data annotation methods increases. Crowdsourcing can be a solution, as it offers significant advantages like scalability and reduced costs, but it comes with challenges in management, communication, and technical requirements.
To address this, we recently introduced a crowdsourcing solution combining CVAT.ai and HUMAN Protocol, now available for use.
In this article, we demonstrate the benefits of our approach through a real-world dataset annotation experiment, shedding light on its efficiency for potential users. The experiment also revisits key platform features and highlights the roles of crowdsourcing participants.
When we speak about crowdsourcing, there are two key participants:
Requesters: ML model developers, researchers, and competition organizers seeking precise data annotation.
Annotators: Individuals eager to earn through data annotation, ranging from those seeking extra income to full-time professionals.
If you're a Requester, CVAT and HUMAN Protocol make the whole annotation process easy and automated for you, from setting up and managing tasks to checking the work and handling payments based on how well the job is done. To get an annotated dataset, you only need to create an annotation specification (how you want the data to be annotated), upload the data, and set your quality and payment expectations. Our platform does everything else, giving you back a dataset that meets your required quality standards without any hassle.
If you're an Annotator, starting to earn money is just a few simple steps away. Sign up, pick a task, and follow the clear instructions for annotating. We keep assignments short to fit your schedule and boost efficiency. Once you complete tasks, you'll receive your earnings (tokens) in your wallet after some time.
Compensation for annotators is in cryptocurrency, which necessitates a digital wallet. For Requesters, there is also the option of making payments via a bank card. The funds are earmarked at task initiation and disbursed after the task is completed and validated.
Now we're ready to look at our annotation experiment and investigate its outcomes.
Why did we do it?
We conducted an experiment to test the efficiency of using a crowd for data annotation in real-world tasks. Our goal was to evaluate several factors:
What's the time investment like?
What quality level can we realistically achieve?
How cost-effective is it?
If you're a Requester, understanding these factors is crucial to deciding whether crowd-sourced annotation is a good solution for your specific task, and whether it meets your needs for speed, cost, and quality.
The Dataset
For our experiment, we chose the Oxford Pets dataset, a publicly available collection of approximately 3.5k images featuring various types of annotations, such as classification labels, bounding boxes, and segmentation masks. While the dataset is moderately sized, it offers real-world, manually curated annotations for each image. Although the dataset originally encompasses over 30 classes, we simplified our task to focus solely on two categories: cats and dogs. Our goal was to have annotators precisely mark the heads of these animals with tight bounding boxes, which is critical for applications designed to distinguish between different pet species.
*In this context, a "class" refers to a category or type of object that the model is trained to identify.
Each class represents a distinct group, such as 'cat' or 'dog', allowing the model to categorize images based on the characteristics defined for each class during training.‍The ExperimentWe recruited 10 random annotators without previous experience and closely monitored their performance. The primary goal was to reach a quality level of 80%, a benchmark that, while challenging, is crucial for the precision needed by machine learning models. This standard is a starting point that may need adjustment based on the specifics of your dataset. Achieving this level of quality is vital for ensuring the efficiency of machine learning models.To guarantee annotation accuracy, our system employs Ground Truth (GT) annotations, also known as Honeypot. GT is a small subset of a dataset, typically 3-10% depending on its size, used for validating annotations. Usually, datasets lack annotations initially, requiring GT to be annotated as a separate task and manually reviewed and accepted. Since we had original annotations for each image, we used them for the GT.‍To ensure accuracy and consistency in our study, we meticulously prepared task descriptions and selected 63 Ground Truth (GT) images (2% of the total dataset) to assess annotation quality. Annotators were assigned small batches of images for labeling. After completion, their annotations were automatically compared to the GT to evaluate accuracy. This process allowed us to systematically verify the quality of the annotations provided.‍Execution and ResultsSo let’s go back to the questions we’ve asked in the first part of the article and answer them one by one, based on the experiment outcomes.What's the time investment like?Our experiment revealed that high-quality annotations can be achieved, and they can be achieved without significant delays. Initially, we estimated that an experienced team of annotators would complete the dataset in 1-3 days, including validation and assignment management. Interestingly enough, for a team with no prior knowledge, the actual time taken was 3-4 days. Here we’ve excluded some necessary adjustments on our part, but included the temporary unavailability of some annotators.‍We see this as a highly positive outcome, as with such a setup, it is not necessarily obvious that the full dataset can be completed at all. In the future, learning from the mistakes and adjustments made during the first run, we are expecting to reduce the time required, bringing it closer to our original expectations. ‍What quality level can we realistically achieve?‍When it comes to the quality of crowd-sourced annotation, we always expect that the quality is going to be lower than one from a professional team. Meanwhile, our experiment delivered some promising insights. We set a high bar with an accuracy target of 80% (surely, it can be higher), aiming for the level of precision that machine learning models need to function reliably. We achieved this quality!The resulting annotation quality is decent. There are certainly errors of different kinds, but overall, the results definitely can be used for model training. ‍‍‍Note, that in our case the full annotation was available and we were able to confirm our statistical estimations. We can see that there is some quality drift on the full dataset compared to only the Ground Truth portion, but it is expected, as there were only 2% of the images in the Ground Truth set.‍We can also see that our annotation quality surpassed that found on MTurk, where it typically ranges between 61% and 81%. 
According to research on Data Quality from Crowdsourcing, our results align with the highest standards for annotation quality.

This finding is crucial for anyone considering crowd-sourced annotation for their projects. It means that not only can you expect to get your visual data annotated affordably and swiftly, but you can also rely on the quality of the work being good enough for training sophisticated deep learning models.

How cost-effective is it?

Our examination of the cost-effectiveness of crowd-sourced annotation showed that the expense of annotating a dataset with bounding boxes was remarkably low: only $0.02 per bounding box or image (a bit below the market price). This pricing led to a total cost of $72 for the entire dataset, assuming most images featured just one object.

Here's a simple breakdown of the pricing we used: each task included up to 10 regular images that we paid for, plus 2 Ground Truth (GT) images that were not paid for. Every image cost 2 cents, so each task cost 20 cents, and the 360 tasks covering all 3,600 images added up to $72. This setup meant we only paid for work that met our quality checks, ensuring you only pay for accurate annotations.

The system uses HMT, a cryptocurrency, for payments, which makes the whole process fast and smooth. We don't use regular (fiat) money at all, but if annotators wish, they can always convert the funds they receive into any other cryptocurrency or into fiat money.

This shows that using CVAT.ai and HUMAN Protocol for crowd-sourced annotation is not just easy on your wallet but also effective, helping you get high-quality data labeled without spending a lot.

Summary: was the approach feasible?

Our experiment shows that crowdsourced annotation is both viable and effective, achieving the desired quality with minimal deviation. We identified potential improvements, significantly reducing workforce management to just onboarding and technical support. All tasks, from recruitment to payment, were fully automated. We encourage both requesters and annotators to try our service, which offers a streamlined, automated platform for high-quality data annotation tasks. If you need any help setting up the process, you can also drop us an email: contact@cvat.ai.

Happy annotating!

Not a CVAT.ai user? Click through and sign up here.

Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
Product Updates
February 23, 2024

Crowdsourcing Annotation with CVAT and Human Protocol: Real Data Experiment Showed Amazing Results

Blog
We are happy to announce that CVAT has been accepted into Google Summer of Code 2024 (GSoC 2024)! This marks a significant milestone for our project, highlighting our dedication to fostering innovation and collaboration in the computer vision and machine learning fields.‍‍What is GSoC?‍GSoC is a Google initiative that offers a unique opportunity for students and IT enthusiasts around the world to contribute to open-source projects while earning a stipend. It’s an exciting chance to work closely with mentors, develop technical skills, and contribute to projects that make a real impact. For more information, see GSoC FAQ.‍How Can You Join CVAT.ai as a GSoC Contributor?‍Becoming a GSoC contributor under CVAT involves a few crucial steps. Here’s a simplified roadmap:‍Explore CVAT Platform to get insights: Start by understanding CVAT and its objectives. Check the CVAT GitHub page (don’t forget to give us a star!) :). Read the Documentation, specifically the contribution guide.‍And we have a great CVAT.ai Youtube channel, highly recommended!Connect with the Community: If you have questions left, you are welcome to ask them in our Google group! Or you can use contacts mentioned on the CVAT GSoC Page:Prepare Your Proposal: Write a detailed proposal outlining your project idea, objectives, timeline, and how it aligns with CVAT’s tasks. Seek feedback from potential mentors and the community. To connect to the community, please join our Google Group.If needed, share your resume and other details: cvat-gsoc-2024@cvat.ai. Please note that this is NOT a formal application. You have to apply directly to GSoC (see next step).Mentors will review your application and in case of approval will contact you via email for further discussions, possibly including a video call and if everything is ok, you can proceed with the formal application to GSoC. Please note that while receiving a mentor's endorsement to formally apply to GSoC is encouraging, it does not secure a placement. Conversely, lacking such approval significantly diminishes one's prospects of acceptance.Submit Your Application: Follow GSoC’s guidelines to submit your application. Ensure it reflects your passion and commitment to open-source development.Contribute: While waiting for the selection results, start contributing to CVAT. Bug fixes, feature enhancements, or documentation improvements can all be good starting points.‍Why Join CVAT for GSoC 2024?‍CVAT is not just a tool; it’s a community-driven project aimed at solving real-world computer vision challenges. By joining us, you’ll gain practical experience, mentorship from industry experts, and the chance to contribute to a project used by researchers and companies worldwide.‍Conclusion‍CVAT’s inclusion in GSoC 2024 is more than just an opportunity for growth; it’s a call to action for young developers passionate about driving innovation in open-source software. Ready to make a difference? Join CVAT and take your first steps towards becoming a GSoC contributor!‍Happy annotating!Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Company News
February 22, 2024

CVAT Joins Google Summer of Code 2024!

Blog
In the evolving realm of digital annotation, Computer Vision Annotation Tool (CVAT) and Label Studio have emerged as significant open-source image annotation tools, each playing a pivotal role in computer vision and AI projects. To delve deeper into their practical applications and clientele, we examined their features, business models, and primary client base. We've also gathered feedback from independent annotators who have used both tools in their careers. This article summarizes our findings.

CVAT vs Label Studio: Identifying the Primary Users and Consumers

When considering the main users of CVAT and Label Studio, it's essential to acknowledge that both tools are open source, theoretically making them accessible to anyone. However, installing and using them requires a certain level of technical knowledge. This can present a challenge for individuals such as casual annotators or students from non-technical fields. There are also feature-wise limitations, but we will talk about them later.

That is why we are turning our focus to the online, ready-to-use versions, where CVAT Cloud stands out for its user-friendly approach, particularly for solo users. It offers a free version without any trial period, enabling you to start annotating at no cost. The straightforward registration process allows users to start immediately, the interface is friendly, and many users can start annotating data in minutes.

This user-friendliness doesn't mean CVAT overlooks the needs of companies and organizations. In fact, it includes numerous features designed for team collaboration, along with transparent, flat-rate pricing. Many labeling companies use CVAT to annotate visual data because it offers one of the best price-to-quality ratios.

Label Studio's cloud version (also known as Label Studio Enterprise) is tailored to professional teams and organizations involved in AI projects. While it offers a trial plan, it's not immediately apparent on the website: a good marketing strategy, but not very user-friendly. Individual users can use the free Label Studio self-hosted version, but it doesn't include many features, just some basic annotation options.

Positively, once registered, users receive comprehensive onboarding to navigate the platform's features. However, unlike CVAT, Label Studio Enterprise does not offer flat-rate pricing. To use the cloud version without limitations, potential users must contact sales. This approach makes Label Studio impractical for casual users or those seeking quick annotation solutions. Nonetheless, for technically adept teams requiring a versatile annotation tool, Label Studio stands out as a formidable option.

CVAT vs Label Studio: Comparative Analysis of Features

Let's explore the annotation process, comparing how each service stacks up in terms of the features we'll use to set up and annotate a dataset. We're not diving into every minute detail or timing each step with a stopwatch. Instead, we aim to understand, from the perspective of an average Joe, how the annotation workflows of both platforms function.

Registration and Authentication

The CVAT registration process is straightforward, while for Label Studio you need to talk to sales, as we've mentioned before.

Single Sign-On is supported on both CVAT and Label Studio, with some limitations.

On CVAT Cloud it is a default feature that doesn't need additional activation.
On the CVAT Self-Hosted solution, it is a paid feature.

Label Studio does not support SSO in the self-hosted version, only in the cloud.

Shared Workspace

Both CVAT and Label Studio offer shared workspace functionality, allowing projects to be organized by team, department, or product. This feature ensures that users can access only the workspaces with which they are associated, fostering an environment of focused collaboration and security.

CVAT provides shared workspaces for organizations, available in both the cloud-based and self-hosted solutions. This flexibility allows organizations to choose the option that best fits their operational needs, whether they prefer the convenience of cloud access or the control of a self-hosted environment.

Label Studio, on the other hand, offers shared workspace capabilities only in its Enterprise (Cloud/On-Prem) solution, which limits the options for the Community version.

The first step on both platforms is registration. In CVAT, creating an Organization is a separate, optional step, whereas it is the default state for Label Studio Cloud.

Projects

Both platforms provide efficient methods for organizing projects, designed to cater to the needs of different organizational structures. This setup facilitates a streamlined workflow and boosts collaboration within organizations.

In CVAT, the process starts with registration and switching to an Organization. From there, setting up a project is straightforward: simply click on the button to begin and fill out the form. Voilà, you have it! Note: there is no need to add data at this point.

For Label Studio, the process is much the same: just click on the button and follow the instructions on the screen. Note that it is probably better to add data at this point.

Data Types

Before uploading data, it's important to understand the types of data each platform supports.

CVAT stands out with its specialized focus on images and videos (including PDF and PCD), aligning perfectly with computer vision projects. For a detailed understanding of its supported data types, you can refer to the CVAT supported formats documentation. In terms of range, CVAT excels in image and video formats, based on the Python Pillow library. This includes formats such as JPEG, PNG, BMP, GIF, PPM, TIFF, and more, complemented by video formats including MP4, AVI, and MOV.

Label Studio, in contrast, showcases its versatility by supporting a wider array of data types. This not only covers image and video formats, but also extends to text, sound, and mixed types. This broad range reflects Label Studio's capability to handle diverse project requirements. When it comes to image and video formats, Label Studio supports BMP, GIF, JPG, PNG, SVG, and WEBP for images, and MP4 and WEBM for videos.

While the aim of this article isn't to provide a detailed comparison between CVAT and Label Studio, we'll focus on a high-level overview of their similarities and differences. Therefore, we'll only discuss image and video data, the types of data both platforms support.

Creating an Annotation Task

On both platforms, before starting work, you need to create an annotation task. This includes loading the data and adding labels.

Data Import/Export

Both CVAT and Label Studio offer functionality for data import and export, allowing users to handle various datasets efficiently.
However, each platform has its own capabilities and potential limitations in this regard.

In CVAT, you can import and export data in formats commonly used in computer vision tasks. For importing, it supports various image and video formats, including those from the Python Pillow library like JPEG, PNG, BMP, GIF, PPM, TIFF, and video formats such as MP4, AVI, and MOV. You can import datasets with annotations as well.

On the export side, CVAT allows users to download annotated data in popular formats like COCO, Pascal VOC, and YOLO, among others.

You can import data from cloud storage or from your own PC/laptop via drag and drop, and add data to the project at any time.

Label Studio, in contrast, provides a more versatile approach to data import and export simply because it supports a wider range of data. Note that you need to upload data when creating a Project. If you skip this step, the process can become confusing: later, images can only be imported from URLs, so even if you have them in a local folder, you first need to run a script or add the file directory as a source or target local storage connection in the Label Studio UI.

For other types of data the process may differ, while in CVAT it is the same for any data type.

Cloud Storage Integration

You can also import from and export to cloud storage, as both CVAT and Label Studio allow users to connect to major cloud services such as AWS, GCP, and Azure for read/write access.

CVAT allows users to connect with cloud storage platforms including AWS, GCP, and Azure. This feature is particularly useful for organizations that rely on these services for storing and accessing large datasets.

Label Studio supports cloud storage integration with AWS, GCP, Azure, and more. However, some steps of the process can be technically demanding and require a certain level of technical expertise.

Labels and Tools

Both platforms naturally support labels.

In CVAT, labels can be added at both the Project and Task levels. The process is straightforward and entirely conducted through the UI, where you can also add attributes to the labels. All annotation tools are available in the interface at any time by default, unless you intentionally restrict them.

In Label Studio, you first need to select one of the preset labeling setups, which limits the use of annotation tools to just one. Alternatively, you can configure a custom setup, which again requires some technical expertise.

Annotator Assignment

You can assign tasks in both CVAT and Label Studio.

CVAT offers a streamlined system for organizations, allowing managers or team leads to invite workers and assign specific tasks and dataset samples to annotators. This ensures that each annotator receives a clear set of jobs tailored to their role or expertise. Only manual assignment is available in CVAT.

Similarly, Label Studio provides a robust mechanism for distributing tasks and datasets among individual users in an organizational setting. You can invite users to an Organization and then assign annotation tasks to them.
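As a side note on the export formats mentioned above (COCO, Pascal VOC, YOLO), converting an exported dataset from one format to another can be scripted rather than redone by hand. The sketch below uses Datumaro, the open-source dataset framework developed alongside CVAT. This is an illustration rather than an official workflow of either platform; the directory names are placeholders, and exact format identifiers can vary between Datumaro versions.

```python
# pip install datumaro
import datumaro as dm

# Load a dataset exported from the annotation tool (here: COCO instances
# unpacked to ./export_coco) and re-save it in YOLO format for a different
# training pipeline. Format names may differ slightly by Datumaro version.
dataset = dm.Dataset.import_from("export_coco", format="coco_instances")
print(f"Loaded {len(dataset)} items")

dataset.export("export_yolo", format="yolo", save_media=True)
```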
Annotation Process

The annotation processes in CVAT and Label Studio are quite similar, with the key difference being that CVAT allows the use of various tools for different needs, whereas Label Studio does not.

To illustrate this difference, we've annotated the same image using both platforms. In CVAT, you have the flexibility to use different tools at any time, for various objects as needed. In Label Studio, you are limited to the tools that align with the initial Project configuration.

Automatic Annotation

But what if you want not only to invite annotators, but also to speed up the annotation process? For cases like this, both CVAT and Label Studio offer automatic annotation options in their cloud and self-hosted solutions.

In CVAT Cloud you can do this with pre-installed models and models from Hugging Face and Roboflow, while in Label Studio you need to add models first.

Verification & QA

CVAT and Label Studio both incorporate Verification and Quality Assurance (QA) features, which are crucial for maintaining high standards in annotation projects. However, their availability and specific functionality differ.

CVAT offers Verification and QA tools in both its self-hosted and cloud versions, providing flexibility for different user preferences. Key features include:

Review and Verification: CVAT allows for the review and verification of annotations and automatic QA results.
Assign Reviewer: Project managers can assign individual users to review specific annotations, enabling focused and efficient QA processes.
Annotator Statistics: CVAT provides metrics and statistics to monitor annotator performance, which is vital for tracking quality and productivity.
And more.

Label Studio, on the other hand, offers its Verification and QA features exclusively in its cloud version. This includes:

Review and Verification: Similar to CVAT, Label Studio allows for the review of other users' annotations and prediction results.
Assign Reviewer: This tool enables the assignment of specific annotations to individual reviewers.
Management Reports & Analytics: While Label Studio provides robust reports and analytics for dataset analysis, it may lack some of the more detailed annotator-specific metrics and statistics offered by CVAT.
And more.

Analytics

In CVAT, analytics are primarily focused on providing insights into the annotation process: you can monitor the time spent on annotations and review annotator performance. This functionality is crucial for project managers looking to optimize workflows and ensure quality control.

Label Studio offers analytics and reporting features, but the specifics remain somewhat of a mystery due to the lack of documentation. To gain a full understanding, it's likely necessary to contact their sales team.

Single Sign-On

Single Sign-On is supported on both CVAT and Label Studio, with some limitations. For the CVAT Self-Hosted solution it is a paid feature. Label Studio does not support SSO in the self-hosted version, only in Enterprise.

API Access

Both CVAT and Label Studio offer API access, providing programmatic capabilities that greatly enhance the flexibility and integration of these platforms with other systems.

CVAT's API access allows the automation of various tasks and integration with external systems. Users can interact with CVAT through the API to upload datasets, retrieve annotations, and manage projects.
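To give a feel for what that looks like in code, here is a minimal sketch using the cvat-sdk Python package's high-level client to create a task from local images and later download its annotations. It is only a sketch: the host, credentials, file names, and label are placeholders, and method parameters may differ between SDK versions, so check the SDK documentation for your CVAT release.

```python
# pip install cvat-sdk
from cvat_sdk import make_client
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client(host="https://app.cvat.ai", credentials=("user", "password")) as client:
    # Create a task with a single label and upload two local images.
    task = client.tasks.create_from_data(
        spec={"name": "pet-heads", "labels": [{"name": "head"}]},
        resource_type=ResourceType.LOCAL,
        resources=["cat_001.jpg", "dog_001.jpg"],
    )

    # Once labeling is done, download the annotations in a common format.
    task.export_dataset(format_name="COCO 1.0", filename="pet-heads-coco.zip")
```

The same client also exposes projects and jobs, so project and job management can be scripted in much the same way.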
Similarly, Label Studio offers API access, emphasizing seamless embedding of its functionalities into other systems.‍***‍To put it simply, CVAT is a great tool that everyone can use - whether you're working alone, doing a small project, or an owner of a big team handling large projects. It's easy to use and can grow with your needs, making it perfect for any organization, big or small.‍Label Studio, on the other hand, comes with a wide range of tools and is really good for big companies that work with many different types of data and have complex annotation tasks. However, if you're planning to set it up on your own servers, there might be some limits to keep in mind.‍In short: Use Label Studio for a variety of data types and CVAT for annotating images and videos.‍CVAT vs Label Studio: Annotation ToolsWhen we look at the annotation tools in Label Studio and CVAT, it's clear that each one offers different features for various types of projects. It's important to note that Label Studio handles a broader range of annotations, like text and audio, which CVAT doesn't offer since it's focused on image annotation. Therefore, our comparison will focus only on the tools related to image and video annotation in both platforms.‍‍Label Studio is like a versatile multitool, offering a wide array of annotation options for different types of data. It's not just limited to images and videos; it also includes tools for audio classification, emotion segmentation, text summarization, and even complex tasks like HTML NER tagging and dialog analysis. This makes Label Studio incredibly flexible, capable of handling various kinds of annotation tasks across different formats, much like a multitool that's useful in numerous situations.‍CVAT, on the other hand, is more like a sharp knife, specialized and highly effective in its domain. It focuses primarily on image and video annotations, offering tools specifically designed for detailed tasks in these areas. With functionalities like 3D Object Annotation, Annotation with Polygons, and Skeleton Annotation, CVAT is tailored for projects that need depth and precision in visual data, similar to how a sharp knife excels in precise cutting tasks.‍In essence, if you need a tool for a broad range of annotation tasks across different formats, Label Studio is your go-to. But if your work revolves around detailed image and video annotations, CVAT would be the more suitable choice.CVAT vs Label Studio: Annotators Opinion on Tools and Ease of Use‍We went out and asked independent annotators about their experience with CVAT and Label Studio. Some opinion could be found on reddit, for example:‍ [D] Best free image labeling tool? (Labelstudio vs CVAT)? AI auto labeled/supervised learning image labeling tools question‍And some in the responses below:‍Let’s start with an overall impression. We asked annotators what they generally think about both tools.‍For CVAT, we received mixed responses with suggestions for improvement.‍"I view CVAT as a good annotation tool for computer vision projects, but it could benefit from some improvements in navigation. For example, I'm not entirely certain if there is a panel displaying all shortcut keys, a feature crucial for speeding up annotators' work. Another issue is the 'undo' function; currently, it only removes layers rather than reversing the actual action just performed. 
Additionally, when using the brush polygon tool, there's no way to remove or delete incorrectly placed polygon points without restarting the entire drawing of that polygon, which can be quite time-consuming. The double-click zoom-out feature also poses a challenge, especially when drawing polygons rapidly. The tool often zooms out to fit the image to the screen, causing my next click to land in an unintended location and create an unnecessary polygon point, forcing me to restart the annotation. It would be beneficial if annotators could select two polygons and choose which one to subtract from the other. Although subtracting from the lower layer is helpful, the flexibility to subtract in either direction, or even later in the annotation process, would be preferable. A good model for this is V7's Darwin."‍While another opinion says:‍"Cvat is very user-friendly. It is very easy to annotate and the filter option is very useful for QA checking."‍Or even:‍"CVAT is easier to use for any new user who has never used CVAT before."‍Label Studio also received some feedback:‍"I believe LabelStudio is quite a good annotation tool. I think it may be on the same level as CVAT but probably on a bit of a higher level. The interface is much better."‍"Label studio platform is not user-friendly but it has so many features like text annotation, audio annotation, etc... The Annotation interface is good, it's very bad as compared to cvat.it is very difficult to configure."‍"Not as much easier to use for a new user who doesn't know much about the tool."‍Conclusion: For experienced annotators or labeling service owners willing to invest time and resources into configuration, LabelStudio, with its broad range of features, might be more appealing. On the flip side, CVAT, known for its user-friendliness and ease of use, particularly for new users, is an ideal tool for beginners, solo users, or teams looking for a straightforward image annotation solution without the need for extensive tool customization.‍When asked which tool was easier to configure and start using, CVAT or LabelStudio, the responses leaned towards CVAT:‍"I just use the online version of CVAT; I did not configure LS either."‍"Cvat is easy to configure. In Label Studio, the configuration is very different as we cannot use all the tools on one project."‍"CVAT"‍Conclusion: These responses highlight a key advantage for CVAT: its ease of configuration and usability. Users appreciate the straightforward setup process and the fact that, unlike Label Studio, CVAT allows access to all tools in a project by default. This flexibility, where tools can be either limited or unlimited based on project requirements, makes CVAT particularly user-friendly, especially for those who prefer not to delve into complex configuration settings. In contrast, Label Studio, while powerful, presents a steeper learning curve in terms of configuration, especially when it comes to utilizing multiple tools in one project.‍When asked about specific features in the interfaces of CVAT and Label Studio that stood out, the feedback varied:‍For CVAT:‍"The AI tools."‍"I like the polygon tool most. It's different from all other platforms. However, as the number of instances increases, the platform becomes slow."‍These responses indicate a preference for CVAT's AI and polygon tools, highlighting their uniqueness and utility. 
However, a concern was raised about the platform's performance slowing down as the workload increases, suggesting a potential area for optimization.

For Label Studio:

"None stood out for me."

"The user interface for annotation is not good, and the zooming option is not good at all."

"For new users in LabelStudio, you need to spend a lot of time to learn how the interface functions, what/which buttons are where, and also what they do. It's not as straightforward as CVAT."

Conclusion: Label Studio received comments indicating that its user interface could be challenging, especially for new users. The complexity of the interface and the learning curve required to understand its functionality were seen as drawbacks compared to the more intuitive interface of CVAT.

When it comes to the most useful functionality or features of CVAT and Label Studio, users highlighted specific aspects that stand out in each tool:

For CVAT:

"The subtraction from the lower layer; this option improves data quality by setting clear boundaries for different annotations."

"The filter option for quality check if there are a number of classes."

"Mostly all features, depending on project requirements."

Users appreciated CVAT's subtraction feature for its ability to enhance data quality through clear boundary setting. The filter option was also noted for its utility in quality checks, particularly when dealing with multiple classes. Overall, the responses indicate that CVAT's features are broadly useful, with their utility varying depending on the specific requirements of the project.

For Label Studio:

"None in particular."

"Text Annotation is very nice in Label Studio."

"For some projects, like NLP projects, I found it easier to work on text annotations using LabelStudio."

While one user did not single out any specific feature in Label Studio, others found the text annotation capabilities particularly useful, especially for NLP projects. This suggests that Label Studio's strength lies in its versatility, particularly in handling text-based annotations.

Conclusion: The feedback suggests that CVAT is highly valued for its specific annotation features, like subtraction from the lower layer and the filter option, which are crucial for improving data quality and ensuring accuracy in projects with complex class structures. Label Studio, on the other hand, is recognized for its strength in text annotation, making it a more suitable choice for projects that require extensive work with text, such as NLP tasks. The choice between CVAT and Label Studio thus depends on the nature of the annotation project: CVAT for projects requiring detailed image/video annotation, and Label Studio for text-heavy projects.

When comparing the annotation tools of CVAT and Label Studio in terms of variety and efficiency, users provided varied insights:

"I think they are both on the same level."

"Compared to Label Studio, the annotation speed and efficiency are very good at CVAT."

"I'd prefer CVAT due to its features, UI, customization & integration, and also its ease in importing and exporting data formats."

Conclusion: While CVAT and Label Studio are viewed as broadly similar in terms of the variety of annotation tools they offer, CVAT is favored for its speed, efficiency, and user-friendly experience. Its features, along with the UI, customization, and data handling capabilities, make it a preferred choice for users looking for a tool that is easy to use.
This makes CVAT particularly suitable for projects where speed and ease of annotation are priorities.‍When asked about the limitations or challenges encountered with the annotation tools in CVAT and LabelStudio, users shared specific experiences:‍Challenges in CVAT:‍"While annotating with the polygon tool, I drew one polygon on top of the other and subtracted from the lower layer. I realized I made a mistake and had to undo; this deleted the new or upper layer polygon, leaving a hole or empty space in the lower layer. That was not very helpful. It even made me work more and use up more time. I also noticed after the annotation that the analytics pane had not recorded anything on my work at all."‍"In CVAT, the opacity option needs to be changed. It needs to be adjusted every time."‍"Yes, in CVAT, some features only work for premium user accounts."‍Conclusions: While Label Studio offers a wide array of options suited for complex and diverse projects, it's crucial to recognize CVAT's capabilities in this context as well. CVAT is not only user-friendly for beginners and solo projects but is also robust enough to support large-scale projects and experienced teams. Its intuitive interface, coupled with powerful features, makes CVAT an outstanding choice for a variety of annotation needs.‍Conclusions‍Choosing between CVAT and Label Studio ultimately hinges on project specifics and user preferences. ‍However, as part of the CVAT corporation, we firmly believe that CVAT stands out as the premier Visual Data annotation tool globally. Its versatility, efficiency, and ease of use position it as the go-to option for anyone looking to leverage the power of computer vision annotation, from small-scale projects to large enterprise needs. Our commitment to continuous improvement and feature expansion ensures that CVAT remains at the forefront, catering to the evolving demands of the computer vision community and attracting potential customers with its unparalleled capabilities‍‍Happy annotating!Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Industry Insights & Reviews
February 8, 2024

CVAT vs. Label Studio: Which One to Choose?

Blog
In the realm of digital annotation, not all projects are created equal. Some are straightforward, while others present intricate challenges, like the annotation of crowded images with overlapping objects. Our latest video tutorial delves into this complex task, demonstrating effective strategies using CVAT's advanced features.

We tackled two types of tasks, one with still images and another with video content, to demonstrate CVAT's capabilities in managing complex scenarios. This groundwork allows us to showcase how CVAT's functionality can be leveraged to bring order and clarity to crowded scenes.

In the image annotation task, we encountered a scenario with numerous overlapping rectangles, posing the challenge of differentiating between objects. By adjusting settings like 'color by instance' and 'Selected opacity', we were able to enhance visibility and distinguish each object with ease.

Transitioning to video annotation, we explored how to efficiently track objects across multiple frames, even amidst crowded scenes. Key features like the 'Switch hidden property' and filtering options were used to maintain focus on specific objects, simplifying the tracking process.

While these tips might seem straightforward, their impact on the accuracy and speed of annotation is significant. We invite you to watch our tutorial to see these strategies in action and experience the efficiency of CVAT firsthand.

Happy annotating!

Not a CVAT.ai user? Click through and sign up here.

Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
Tutorials & How-Tos
February 6, 2024

Tips on how to annotate overlapping objects with CVAT

Blog
As we navigate through the digital age, Artificial Intelligence (AI) and Computer Vision (CV) stand out as two of the most rapidly evolving fields. Advancements in these areas are transforming industries, from healthcare to automotive technology. For professionals, academics, and enthusiasts looking to stay at the forefront of these innovations, conferences are the prime venues for exchanging ideas, networking, and witnessing cutting-edge research. Here are the must-attend AI and Computer Vision conferences of 2024.

Here is a list of the conferences covered in this article:

Conference tracker

Remember to check each conference's official website for the most current information, as dates and locations may change!

ICCV 2024: 18th International Conference on Computational Vision

ICCV 2024

Overview: The International Conference on Computational Vision (ICCV) stands as a landmark event in the world of computer vision. Renowned for its comprehensive coverage of the field, ICCV 2024 brings together the brightest minds in computational vision from around the globe. This conference is celebrated for its expansive scope, covering a wide range of topics from fundamental research to innovative applications in computer vision and related disciplines.

Why you should go: ICCV 2024 is an essential destination for anyone deeply invested in the future of computational vision. It's a unique platform where you can engage with leading experts from prestigious institutions and industry giants like Google, Apple, and more. Attendees will have the opportunity to delve into the latest research, witness groundbreaking technological advancements, and participate in shaping the future trajectory of computer vision. Whether you are an academic, a professional, or an enthusiast, ICCV 2024 is where you'll find the pulse of cutting-edge developments in the field.

Date: Feb 05-06, 2024 for Portugal; other dates worldwide.
Location: Lisbon, Portugal, and other countries.
Website: https://waset.org/computational-vision-conference-in-february-2024-in-lisbon

AAAI Conference on Artificial Intelligence

AAAI 2024

Overview: The AAAI Conference stands at the intersection of academic theory and practical AI application. It's a forum where the discourse goes beyond the theoretical to address real-world AI challenges and opportunities.

Why you should go: By participating in the AAAI Conference, you'll be at the pulse of AI's evolving landscape. The best speakers will take time to explain complicated concepts, making this event a gateway to understanding the latest AI strategies and technologies that you can apply to your current or future projects.

Date: Feb 22-25, 2024
Location: Vancouver, Canada
Website: https://aaai.org/aaai-conference/

AI WEEK

AI WEEK 2024

Overview: AI WEEK Italy emerges as the premier event of its kind in Europe, a conclave dedicated to showcasing cutting-edge innovation and fostering progress in the realm of artificial intelligence. Positioned amidst the historic and artistic grandeur of Italy, this conference is as much a tribute to the rich Italian legacy as it is a forward-looking summit on the future of AI.

Why you should go: AI WEEK Italy is your conduit to the latest and forthcoming developments in AI. It can be attended online and offline and hosts 150+ speakers from different fields of AI application.
This year's special guest is Abran Maldonado, the OpenAI ambassador.

Date: Apr 9-10, 2024 (offline); Apr 8 and 11-12, 2024 (online).
Location: Rimini, Italy, and online!
Website: https://www.aiweek.it

International Conference on Learning Representations (ICLR)

ICLR 2024

Overview: ICLR has a strong reputation for its focus on deep learning and its varied applications. The conference prides itself on diversity and inclusivity, providing a welcoming space for researchers from all over the world to present their innovative work.

Why you should go: This conference is ideal for those looking to dive deep into the nuances of learning representations. ICLR is a chance to explore the latest in deep learning research and to mingle with pioneers in the field.

Date: May 7-11, 2024
Location: Vienna, Austria
Website: https://iclr.cc/

Rise of AI

Rise of AI 2024

Overview: Since its inception in 2014, the Rise of AI conference has served as a pivotal gathering point for leading minds and influential figures in the realm of artificial intelligence. This esteemed event is a crucible where AI specialists, influential policymakers, and industry disruptors converge to deliberate on the impact of AI across various sectors, including society, governance, and economic landscapes.

Why you should go: Spearheaded by the dedicated duo of Veronika and Fabian Westerheide, the conference is more than an event; it's a cornerstone that actively sculpts and interlinks the AI community in Germany and worldwide; just look at the board of 50+ speakers! By participating in the Rise of AI Conference, you gain entry into an elite circle of C-suite innovators and key players in AI. It's an unmatched opportunity to engage with a network that's not just shaping the future of AI, but is also the driving force behind its implementation.

Date: May 15, 2024
Location: Berlin, Germany, or online!
Website: https://riseof.ai/

Super AI

SUPER AI 2024

Overview: Super AI is Asia's premier AI event, a symposium for showcasing innovation and fostering progress in the AI sphere. Set against the backdrop of Singapore's Marina Bay Sands, it's a conference that's as much about the setting as it is about the substance.

Why you should go: Super AI is your ticket to understanding where AI is headed next. It's an opportunity to network with the who's who of the AI industry and to witness the unveiling of AI technologies that could change the world.

Date: June 5-6, 2024
Location: Marina Bay Sands, Singapore
Website: https://www.superai.com/

The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR)

CVPR 2024

Overview: CVPR stands out as a cornerstone event for those in the field of computer vision. Its workshops and tutorials, led by industry experts, are unmatched in delivering both depth and breadth on current CV topics.

Why you should go: This conference is an essential stop for anyone looking to advance their knowledge and skills in computer vision and pattern recognition. It's also a place where future trends are shaped, offering a glimpse into what's next in CV technology.

Also! CVAT was presented at CVPR in 2019!

Date: Jun 17-21, 2024
Location: Seattle, WA, USA
Website: https://cvpr.thecvf.com/

International Conference on Machine Learning (ICML)

ICML 2024

Overview: ICML is a flagship conference that has established itself as a leading event for machine learning expertise.
Attendees are privy to cutting-edge research and the opportunity to connect with international ML leaders.

Why you should go: If you're seeking to deepen your machine learning acumen or to present your own innovative work, ICML is the place to be. You'll gain access to a wealth of knowledge that could be transformative for your career or research.

Date: Jul 21-27, 2024
Location: Vienna, Austria
Website: https://icml.cc/

European Conference on Computer Vision (ECCV)

ECCV 2024

Overview: ECCV is a leading European event that spans the gamut from fundamental CV research to advanced applications. Its reputation for excellence in computer vision attracts a global audience.

Why you should go: Attending ECCV means getting first-hand exposure to the forefront of computer vision research. It's a place to learn about the latest tools and techniques that are shaping the field.

Date: Sep 29 - Oct 1, 2024
Location: Milano, Italy
Website: https://eccv2024.ecva.net/

World Summit AI

World Summit AI 2024

Overview: Billed as the only AI summit that truly matters, World Summit AI is a melting pot for global AI minds. It is known for its inclusive and dynamic discussions that span the entire spectrum of AI and its impact on society.

Why you should go: For those looking to see AI through a global lens and to understand its societal implications, this summit is unparalleled. It's also a place to engage with AI thought leaders from Nvidia, Amazon, and many more. At this summit you will witness AI policy in the making.

Date: Oct 09-10, 2024
Location: Amsterdam, Netherlands
Website: https://worldsummit.ai/

IEEE International Conference on Image Processing (ICIP)

ICIP 2024

Overview: The IEEE International Conference on Image Processing (ICIP) is recognized as a premier global forum in the field of image processing and its myriad applications. This conference, organized by the IEEE Signal Processing Society, draws experts, researchers, and practitioners from around the world. ICIP offers a comprehensive program covering the latest developments and research in image processing, including theoretical aspects, practical applications, and emerging technologies in the field.

Why you should go: ICIP is an ideal conference for anyone seeking to deepen their understanding of image processing or to stay abreast of the latest trends and innovations. It's a platform where academics and industry professionals converge to exchange ideas, explore new research, and discuss challenges and solutions in image processing. The conference features a mix of keynote speeches, panel discussions, workshops, and tutorials led by renowned experts in the field. Whether you're presenting your research, looking to enhance your knowledge, or aiming to network with leading professionals, ICIP provides an enriching and collaborative environment to advance your expertise in image processing.

Date: Oct 27-30, 2024
Location: Abu Dhabi, UAE
Website: https://2024.ieeeicip.org/

International Conference on Pattern Recognition (ICPR)

ICPR 2024

Overview: The International Conference on Pattern Recognition (ICPR), hosted in India, is a prestigious event in the field of pattern recognition and related areas. ICPR is celebrated for its comprehensive exploration of both the theoretical and practical aspects of pattern recognition, including advancements in machine learning, data analysis, and artificial intelligence.
This conference attracts a global audience and presents an impressive array of research, ranging from fundamental studies to innovative applications impacting various industries.‍Why you should go: ICPR in India is a must-attend event for professionals, researchers, and enthusiasts keen on understanding and contributing to the evolving landscape of pattern recognition. It's a vibrant platform for networking with leading experts and pioneers from around the world. Attendees will have the opportunity to engage with cutting-edge research, explore new methodologies, and witness how pattern recognition technologies are being applied in diverse fields. Whether your interest lies in academic research or practical applications, ICPR offers rich insights and opportunities for collaboration, making it a cornerstone event for anyone invested in the future of pattern recognition and AI.‍Date: Dec 01-05, 2024Location: Kolkata, IndiaWebsite: https://icpr2024.org/‍Asian Conference on Computer Vision (ACCV)‍‍ACCV 2024Overview: The Asian Conference on Computer Vision (ACCV) stands as a prominent gathering in the field of computer vision within the Asian continent. Renowned for its focus on both academic and practical aspects of computer vision, ACCV provides a vibrant forum for experts, researchers, and practitioners to share their latest findings and innovations. The conference prides itself on showcasing a diverse array of topics, ranging from foundational research to groundbreaking applications in areas such as machine learning, image processing, and AI-driven technologies.‍Why you should go: Attending ACCV offers a unique opportunity to immerse yourself in the latest advancements in computer vision, particularly from an Asian perspective. It's an ideal platform for networking with top academics and industry leaders from companies and institutions across Asia and beyond. The conference facilitates a rich exchange of ideas, fostering collaborations that could shape the future of computer vision. Whether you're seeking insights into the latest research, looking to present your own work, or aiming to stay abreast of emerging trends, ACCV is the place to be for anyone passionate about the future of visual technology.‍Also! CVAT was presented at this conference in 2018!‍Date: Dec 8-12, 2024Location: Hanoi, VietnamWebsite: https://accv2024.org/‍Neural Information Processing Systems (NeurIPS)‍‍NeurIPS 2024Overview: NeurIPS is a prestigious beacon in the field of artificial intelligence, particularly known for its emphasis on neural networks and machine learning. This conference has consistently provided a platform for the exchange of groundbreaking ideas and the fostering of collaborations across academia and industry.‍Why you should go: Attending NeurIPS offers you the chance to immerse yourself in the latest AI research and applications. It's an exceptional opportunity to network with top-tier researchers, engage in thought-provoking discussions, and gain insights that could drive your own work in AI.‍Date: December, 2024 (for exact dates check site later in the year)Location: Vancouver, CanadaWebsite: https://nips.cc/‍‍‍‍‍ConclusionThe landscape of AI and Computer Vision is ever-changing, and attending these conferences can provide invaluable insights into the future of these technologies. They offer unique opportunities to engage with research communities, explore collaborative ventures, and stay updated with the cutting edge of AI and CV. 
Whether you are a seasoned professional or just starting out, these conferences are a gateway to the vast potential that AI and CV hold for the future.‍Happy annotating!Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Industry Insights & Reviews
January 31, 2024

Best Artificial Intelligence and Computer Vision Conferences of 2024

Blog
In the realm of computer vision, handling large-scale datasets can be a challenging task. That's where CVAT.ai steps in, offering a streamlined approach for simultaneous annotation and task distribution among several annotators. Our latest video provides hands-on guidance for teams and organizations eager to accelerate their annotation process with CVAT. ‍‍Consider the scenario of leading an organization swamped with a vast array of images and videos needing annotation. The efficiency of your project hinges on effectively sharing these tasks among your team members. It all starts with creating a project in CVAT and carefully adding the necessary labels, but then what? The real question is, how do you ensure that the workload is evenly split and managed efficiently?‍CVAT.ai offers a solution by allowing you to segment your image dataset into distinct parts, with each segment assigned as a separate job. This method enables team members to work on different segments at the same time, thereby greatly enhancing the speed of the annotation process.‍Our guide dives into the nitty-gritty of optimizing your workflow in CVAT.ai, highlighting the best practices for dividing annotation tasks in a collaborative setting. This approach not only fosters efficient team collaboration but also ensures quick turnaround times for your computer vision projects.‍To gain more insights into efficient data annotation and task distribution in CVAT, make sure to watch our video. And if you find it helpful, don't hesitate to like, subscribe, and share. Stay tuned for more tips and techniques to streamline your image annotation processes in the field of computer vision.‍‍Happy annotating!‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
January 25, 2024

Simultaneous Annotation in CVAT.ai: How to Distribute Dataset Among Several Annotators?

Blog
Computer vision has become an integral part of various industries, from autonomous vehicles to medical imaging. To train robust and accurate computer vision models, high-quality labeled datasets are essential, and open-source image annotation tools have emerged as powerful solutions to address this need. Such tools not only offer cost-effectiveness but also provide collaborative platforms for data labeling. In this article, we will explore the best open-source image annotation tools in 2024.

Computer Vision Annotation Tool (CVAT)

CVAT Annotation Interface

Computer Vision Annotation Tool (CVAT.ai) is an open-source video and image annotation tool, well regarded in the computer vision community. It supports key supervised machine learning tasks like object detection (including on 3D point cloud data), image classification, and image segmentation. CVAT is celebrated for its user-friendliness, comprehensive manual and automatic annotation features, collaborative capabilities, and strong community support. CVAT also offers a huge number of learning materials on YouTube, as well as official documentation, to help you dig into the tool.

Additionally, CVAT enables users to try its features online via the CVAT.ai Cloud platform, allowing access without local installation. The online platform provides all the features available in the open-source version, plus even more powerful capabilities to enhance the annotation process in a web-based environment, offering a convenient, accessible way to explore CVAT and assess its fit for specific annotation projects.

To sum it up: CVAT distinguishes itself as a highly comprehensive and user-friendly open-source annotation tool, making it a preferred choice for individual researchers, developers, and organizations.

Pros:

Advanced annotation capabilities for labeling tags, rectangles, polygons, polylines, ellipses, points, binary masks, skeletons, and 3D cuboids, including many automated features to accelerate the process.
Enables collaborative, role-based work with multiple users.
Features built-in annotation review mechanisms and automatic quality control based on ground truth annotations.
Supports integration with popular data storage services, like AWS S3, Microsoft Azure, and Google Cloud.
Provides a lot of learning materials and documentation.
Offers extensive support for 24+ popular annotation formats.
Benefits from regular updates and strong community support.
The CVAT.ai Cloud platform does not require any technical knowledge for setup; it is ready to use.

Cons:

The self-hosted solution requires a relatively high level of technical expertise for setup and configuration.
Processing of large datasets might require additional time.

LabelMe

LabelMe Annotation Interface

LabelMe is an open-source annotation tool for digital images, developed by the MIT Computer Science and Artificial Intelligence Laboratory in 2008. This freely accessible platform allows users to annotate images and contribute to its expanding dataset library.

It's designed to support various computer vision research and development projects, offering a collaborative environment for image labeling and dataset creation.
LabelMe is recognized for its user-friendly interface and its significant contribution to the computer vision community, facilitating accessible data for research and application development.‍Pros:‍Features a simple, intuitive user interface.Supports different annotation primitives including polygon, rectangle, and point.Offers an easy installation and setup process.Provides the ability to export annotations in multiple formats.‍Cons:‍Manual installation and setup are required.The potential lack of frequent updates and maintenance might result in compatibility issues with newer technologies.‍LabelImgLabelImg Annotation Interface‍LabelImg is a graphical image annotation tool designed for drawing bounding boxes around objects in images. ‍LabelImg is developed using Python and Qt, making it versatile and accessible across multiple operating systems including Windows, Linux, and macOS. This tool is useful for tasks in machine learning and computer vision that require precise object localization within images. Its compatibility with various platforms and ease of use for bounding box annotations make LabelImg a popular choice in the image annotation community.‍Pros:‍Lightweight and straightforward for deployment.Supports both bounding box and polygon annotations.Efficiently integrates with popular deep learning frameworks.Compatible with multiple platforms, including Windows, Linux, and macOS.‍Cons:‍Annotation capabilities are more limited compared to other tools.Lacks advanced features such as collaborative options and support for various annotation types.‍Label Studio‍‍‍Label Studio stands out as a comprehensive and adaptable open-source tool for data labeling. It caters to a variety of projects and users, handling diverse data types seamlessly on a single platform. The tool excels in offering a range of labeling options across different data formats and integrates smoothly with machine learning models. This integration enhances the efficiency and accuracy of the labeling process by providing predictive labeling and supporting ongoing active learning. Its modular design allows for easy integration into existing machine learning workflows, offering versatility for various labeling requirements. For more details, Label Studio's website provides extensive information.‍Pros:‍Supports a variety of projects, users, and data types on a single platform.Enables diverse types of labeling across numerous data formats.Integrates with machine learning models for label predictions and active learning.Offers an enterprise cloud service with advanced security, team management, data analytics, reporting, and SLA support.‍Cons:‍Requires technical knowledge for setup and usage.May not be ideal for smaller-scale projects.Might not be the easiest option for those seeking minimal setup and ease of use.‍Imagetagger‍Imagetagger Annotation Interface‍‍Imagetagger is an open-source image annotation tool that allows users to label images for object detection and image segmentation. It is written in JavaScript and is available for Windows, Linux, and macOS.‍Pros:‍User-friendly interface for quick annotation.Supports polygon and bounding box annotations.Easy integration with existing workflows.Export annotations in popular formats.‍Cons:Limited documentation and support resources.May have performance issues with large datasets.‍Deeplabel‍Deeplabel Annotation Interface‍Deeplabel is an open-source image annotation tool that allows users to label images for object detection and image segmentation. 
It is written in Python and is available for Windows, Linux, and macOS.‍Pros:‍Supports various annotation types, including bounding boxes, polygons, and keypoints.Customizable interface and workflow.Integration with popular deep learning frameworks.Active development and community support.‍Cons:‍Requires a certain level of technical expertise to use effectively.Lack of a graphical user interface may be less user-friendly for some users.‍Image annotation comparative table‍Image annotation comparative table‍In conclusion, the landscape of open-source image annotation tools in 2024 offers a diverse range of options tailored to different needs in the field of computer vision. From CVAT's advanced capabilities and robust community support to LabelImg's simplicity and multi-platform compatibility, each tool presents unique features and advantages. The choice of the right tool ultimately hinges on the specific requirements of your project, the scale of operations, and the desired ease of use. Whether you're an individual researcher or part of a larger organization, these tools provide cost-effective, flexible solutions to effectively label data, a critical step in developing accurate and efficient computer vision models. This array of tools underscores the dynamic nature of technology in the realm of AI and machine learning, offering promising avenues for innovation and progress.‍Stay abreast of the latest tools and techniques in the fast-evolving field of computer vision. ‍Happy annotating!‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Industry Insights & Reviews
January 21, 2024

Best Open-Source Image Annotation Tools in 2024

Blog
In a world where artificial intelligence (AI) is expanding at an unprecedented pace, the demand for accurately labeled data is at an all-time high. CVAT.ai is a well-known open-source platform in the data annotation field, specifically designed for visual data annotation tasks.‍HUMAN Protocol, on the other hand, is an innovative framework that facilitates job markets on the blockchain. It connects humans with machine-based requests, allowing for secure, decentralized job completion. By integrating with CVAT.ai, HUMAN Protocol unlocks the potential for a global workforce to contribute to data annotation projects, ensuring quality and scale like never before.‍Together, CVAT.ai and HUMAN Protocol are carving a new path in visual data annotation.‍Mastering Image Annotation with CVAT.ai and HUMAN Protocol‍Old-school ways of labeling data — using either your team or people from the crowd — are hitting their limits. They're often too expensive, not good enough, and they don't scale up well, which can lead to missing important deadlines. It's clear we require a big change.‍That's where the game-changing partnership between CVAT.ai and HUMAN Protocol comes in. We are shaking things up with a new solution that uses blockchain technology. By using “smart contracts”, this collaboration spreads out the workload among freelance annotators around the globe and makes sure everything runs smoothly.‍To learn more about the technical aspects, feel free to check our last article about Mastering Image Annotation with CVAT.ai and HUMAN Protocol.The reach of this project is huge, as it will connect with millions of workers all over the world, making sure big projects get done on time. This initiative transcends mere efficiency; it's about guaranteeing timely completion and changing the approach to managing project timelines. Moreover, it promises to make a significant impact on the industry by accurately annotating data, thereby preventing AI hallucination and enhancing overall system reliability.‍‍CVAT.ai and HUMAN Protocol aren't just joining the market; they're setting new rules for how visual data should be labeled. This move is changing the game, leading us to an exciting place where tech smarts and human skills come together to do amazing things.‍Remember to like, share, and subscribe for more updates!‍‍Happy Annotating!‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub
Company News
November 15, 2023

CVAT & HUMAN Protocol: A New Dawn in Visual Data Annotation

Blog
Introduction
Implementing annotation crowdsourcing right can be a challenging task, especially for Computer Vision problems. While crowdsourcing offers huge benefits, such as unlimited scaling and low overhead, it also has fundamental challenges, such as low resulting annotation quality, difficult validation, and overwhelming assignment management. Now, we'd like to introduce our solution for annotation at scale, based on the CVAT.ai and HUMAN Protocol integration. Read on to learn how you can use it for your datasets and for annotation.
Overview
The solution is an online platform based on HUMAN Protocol services, which uses CVAT for data annotation and Web3 technologies for payments in cryptocurrencies.
On the platform, we bring together two key user roles:
Requesters - individuals who need their data annotated. AI model developers, researchers, and AI challenge organizers all fall into this category.
Annotators (workers) - people who want to earn money by annotating data. This category includes people looking for a side hustle, freelancers, and professional data annotators.
For Requesters, the platform provides automatic data annotation assignment preparation and management, automatic annotation validation and merging, and payments based on the annotation quality. Together, these features create a place where you can simply ask for an annotated dataset - describe the task, provide the data, quality requirements, and reward details - and get the data annotated with the specified quality after some time. No more manual control and management is needed. Keep in mind that the platform currently supports only a few Computer Vision task types, but it can be extended in the future.
For Annotators, the platform is a place where they can start earning money in just a few clicks. All they need to do is register, select a task, and ask for an assignment. Once they get an assignment, they can learn the annotation requirements by reading the attached annotation guide and start annotating - using one of the most popular open-source annotation tools in the Computer Vision community. Each assignment is small enough to be completed in a matter of minutes, allowing flexible participation for the workers.
Bounty payments are made in cryptocurrency, so annotators are required to have a crypto wallet to receive the money. For requesters, however, there is also an option to pay with a bank card. Funds for payments are reserved at the time of task creation and are disbursed automatically after the entire dataset has been annotated and validated.
Quick start
If you want to get your dataset annotated
Before initiating a new annotation task, please be aware of the following platform requirements:
A crypto wallet and a configured browser extension for wallet access
An Amazon S3 data bucket configured for public access
A dataset with images in common formats (.jpg, .png, …), with at least 2 images
Support for only two task types: bounding box and single point annotations
Only the MS COCO Object Detection / Keypoint Detection dataset format is supported, which is one of the most widely used formats in the industry. Note that points are encoded as 1-keypoint skeletons in the COCO Keypoint Detection format. Annotations for ground truth (validation) images should be provided as a COCO .json file (a minimal sketch of such a file is shown below).
Please keep in mind that simpler tasks tend to be easier and quicker to annotate.
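For reference, here is a minimal, hypothetical sketch of a COCO-format ground-truth file for a bounding-box task, produced by a short Python script. The file name, image size, label, and coordinates are placeholders; see the MS COCO specification for the complete set of fields.

import json

# Minimal COCO object-detection ground truth: one image, one category,
# and one bounding-box annotation (bbox is [x, y, width, height]).
ground_truth = {
    "images": [{"id": 1, "file_name": "image_0001.jpg", "width": 1920, "height": 1080}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [410, 220, 150, 130], "area": 19500, "iscrowd": 0},
    ],
}

with open("gt_annotations.json", "w") as f:
    json.dump(ground_truth, f)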
If your annotations require complex work, it can be a good idea to split the whole task into simpler subtasks and annotate them separately for better quality and efficiency.
Now, let's go over the steps required to start an annotation task.
1) Create a crypto wallet using one of the supported wallet applications (e.g., MetaMask).
Example for MetaMask in Chrome:
1. Install the browser extension.
2. Open the extension page in the browser and create a password.
3. Select the Polygon Mainnet network or add it from the list.
4. Use an existing ETH-based wallet or click Add an account in MetaMask.
5. Rename the account to "Job Requester" or any other name you like.
6. Now you'll have an ETH-based wallet in the network; it should show 0 MATIC on the balance.
7. Transfer some MATIC from another ETH-based wallet or buy some (e.g., $5 worth). Make sure to buy it in the Polygon Mainnet network.
8. Click Import Tokens and add the HMT token: Id: 0xc748B2A084F8eFc47E086ccdDD9b7e67aEb571BF, Name: HMT
9. Now it should also show 0 HMT on the balance.
10. Convert some MATIC into HMT.
Now that you can perform payments in cryptocurrencies, it's time to prepare your data and create an annotation task.
2) An AWS S3 bucket with public access is required. Create an AWS S3 bucket and upload your images as separate files into the bucket. Make sure to note the link to the directory containing the files within the bucket. You can obtain this URL by clicking Copy URL (the directory URL is needed).
3) Select a small validation subset from the original images (e.g., 3%, or 30 images from 1000), annotate it, and upload the annotations in the COCO format to the bucket. You will only need a .json file. You can prepare such a dataset using CVAT or other tools. Remember to note the URL of this file as well.
Using 0.1-5% of the images is recommended, depending on the dataset size. It is recommended to select random images from across the whole dataset. These images will appear randomly during the annotation process to check the annotation quality. The annotations produced by workers for these images are not included in the resulting annotated dataset; instead, the ground truth annotations are used.
Please note that only 1-keypoint skeletons are supported for the COCO Keypoints format.
4) Proceed to the platform and register a new account, if you don't have one.
5) Click Create a Job.
6) Select the payment method you prefer. You can pay via a crypto wallet or with a bank card.
7) Once the payment method is selected, you'll see the job configuration page. Please select the CVAT job type and the Polygon Mainnet network.
Let's consider the fields one by one:
Type of job: Bounding Boxes / Points
Description: a brief task description in 1-2 sentences
Labels: the list of label names (classes, categories) to be used during annotation
Data URL: a link to the bucket directory with the dataset images
Ground truth URL: a link to the bucket file in .json format with GT annotations in the COCO format
User guide URL: a link to the full task description document with public access. Such documents describe the annotation rules for the workers. You can share this document via the same S3 bucket, Google Docs, or other similar services
Accuracy target: the required accuracy target for annotations. The typical value range is 50-95% (keep in mind that normal human image classification accuracy is ~94%, and values in the 75-93% range are the most applicable; the metric used is IoU).
8) At the next step, you'll need to enter the bounty details for workers.
The resulting value is the price for the whole dataset. The bounties for the annotators will be calculated automatically based on the number of images annotated.‍Here you can choose to pay in crypto currency or with a bank card. The money will be reserved until the whole dataset is annotated.‍ 9) Once you’re ready, click Pay now.‍ 10) If everything is set up correctly, after the next several minutes, the job will appear in the job list. ‍In the list, you can check the current annotation status and details of the jobs you have created. Now, the annotators will look for assignments and begin their work. The process is fully autonomous for you, so you will only need to wait for the process to finish. It can take some time, depending on the current market prices, available worker pool, and other conditions. The workers will be able to join the task and get an assignment via this platform link.‍If, at some point, you decide to cancel a job, here you can find a button for this. The money you reserved for the work will get back to you.‍‍‍ 11) Once the data is annotated, the final annotations will be validated and merged automatically, and the result will be available by the URL you receive in the job details section.‍If you want to earn money with image annotation and video annotation‍1. Create a crypto wallet (check the explanation in the requester part of this guide). 2. Go to the annotator platform and register. You’ll need to pass a mandatory KYC procedure to finish the registration process, this is required by the applicable laws. 3. Currently, we require users to explicitly request participation in CVAT labeling. Please email us at app@humanprotocol.org to express your interest.4. On the platform, open the list of available CVAT annotation tasks:‍‍You will see the list of open annotation tasks, and find assignment details such as bounty per assignment, brief task description, the size (i.e. the number of images) to be annotated, and the annotation type (bounding boxes. points, etc.):‍5. To join a task, press the button on the right side of the task entry:‍‍Once it is done, switch to the “My Jobs” tab, and you should see your assignment for this task:‍6. To start annotation, press the “Go to job” button in the “Job URL” column‍You will be navigated to the CVAT job, where you can draw annotations.‍7. Make sure to check the job requirements by clicking on the Guide button:‍‍‍‍‍‍8. Draw annotations as required in the task, and move between the job frames:‍‍‍9. When all the job frames are finished, click Save.10. Switch the job state to Completed.Implementation details‍In this section, we’ll discuss several implementation details to give you a clearer understanding of how everything works together. At the heart of the project lie three key components:‍A tool that allows annotators to create high-quality annotationsA platform that manages assignments, tasks, and dataA contract that specifies the job, acceptance criteria, and bounty‍All these components contribute to solving the crowdsourcing platform challenges, improving the overall efficiency of the system, and providing an effective solution, when combined. Now, let’s look closer at the specific problems and solutions we implemented in the platform.‍Task creation and assignment management. The platform handles the datasets and worker assignments in an automatic way, which is measurable, consistent, and reliable. The platform splits the dataset given into small chunks (the assignments), allowing to annotate data efficiently in parallel. 
Each annotator gets a fixed amount of work in an assignment, and each assignment can be validated with no human interaction. As there is no human factor in the assignment creation or management, the process of getting an assignment is quick and is protected from occasional errors. ‍Assignment validation. Once an assignment is annotated, it goes to validation. There are several validation strategies used in the industry, including consensus scoring, honey pots/ground truth, model validation, etc. In the current implementation, we decided to use ground truth-based scoring. With this approach, the task requester is required to provide a small number of ground truth annotations during task creation. Each assignment includes several images from the validation set so we can always check the quality of the annotations in the assignment. Then, we extrapolate the resulting quality for the whole assignment. If the assignment quality is below the required level, the annotations are discarded, and the assignment is sent for reannotation.‍‍Payments. When the task is created, the requester configures the bounty and the money is reserved from their wallet. The platform relies on Smart Contracts in blockchain to guarantee fair payments and to allow cancellation of tasks. Each contract gets a clear definition of the work required to receive the bounty, which can be checked automatically. The workers are rewarded for each assignment completed, after the whole dataset quality is accepted during validation.‍‍‍‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
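As a technical footnote to the guide above: the accuracy target described earlier is measured with IoU (intersection over union). The short, self-contained sketch below shows how that metric is computed for two bounding boxes in [x1, y1, x2, y2] format; the example boxes are arbitrary.

def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; IoU = intersection area / union area.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 100, 100], [50, 0, 150, 100]))  # 0.33: the boxes overlap by half of each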
Tutorials & How-Tos
November 13, 2023

Mastering Image Annotation Crowdsourcing for Computer Vision with CVAT.ai and HUMAN Protocol

Blog
Computer Vision Annotation Tool (CVAT) has been a game-changer for many businesses and researchers in the field of computer vision. Its intuitive interface and powerful features have made it a go-to solution for annotating images and videos. While the self-hosted version of CVAT has its merits, there's a compelling case to be made for transitioning to CVAT.ai Cloud with its paid plans.‍Here's why:‍Scalability and Performance‍One of the most significant advantages of CVAT.ai Cloud is its scalability. With the self-hosted solution, you're limited by your own infrastructure. As your projects grow, you will find yourself investing more in hardware and maintenance that, in the end, will result in bigger expenses. With CVAT.ai Cloud, you can easily scale up or down based on your needs, ensuring optimal performance without the hassle of managing the infrastructure yourself.‍Reliability and Uptime‍CVAT.ai Cloud is designed with a robust infrastructure, aiming to provide consistent availability for your needs. This means you can rely on the platform to be available whenever you need it. With a self-hosted solution, you're responsible for ensuring uptime, which can be challenging especially if you don't have a dedicated IT team.‍Enhanced Security‍Data security is paramount for us, especially when dealing with sensitive information. CVAT.ai Cloud is encrypted end-to-end. We perform automated as well as manual security audits, and are compliant with data protection laws. This ensures that your data is protected from potential threats and leaks.‍Automatic Updates and New Features ‍With CVAT.ai Cloud, you'll always have access to the latest features and updates without having to worry about manual upgrades or potential compatibility issues. This not only saves time but ensures that you're always using the most advanced and efficient version of the platform.‍Dedicated Support‍Paid plans on CVAT.ai Cloud come with dedicated customer support. This means that if you ever run into issues or have questions, you can quickly get the help you need. With a self-hosted solution, you're largely on your own unless you have an in-house expert (exception: if you’re a CVAT.ai Enterprise customer you get even more care from our support organization). ‍Collaboration and Accessibility‍CVAT.ai Cloud is accessible from anywhere there is an internet connection, making collaboration easier. Team members can work on projects from different locations, ensuring continuity and efficiency. With a self-hosted solution, remote access might require additional setup and could pose security risks.‍Paid features exclusive to CVAT.ai Cloud‍CVAT.ai Cloud stands out with its exclusive paid features, ensuring a seamless and enhanced user experience. ‍One of the standout offerings is Single Sign-On (SSO), which simplifies user management by allowing users to access multiple applications with a single set of credentials. This not only enhances security but also streamlines the user experience. ‍Furthermore, CVAT.ai Cloud's integration with platforms like Roboflow and Hugging Face elevates its capabilities. Users can effortlessly tap into the power of state-of-the-art machine learning models from Roboflow and Hugging Face for preprocessing and augmenting datasets. 
‍In essence, choosing CVAT.ai Cloud over self-hosted solutions means opting for convenience, advanced features, and a future-proof annotation environment.Conclusion‍While the self-hosted version of CVAT has served many users well, the benefits of switching to CVAT.ai Cloud with paid plans are undeniable. From scalability and performance to enhanced security and support, CVAT.ai Cloud offers a comprehensive solution that can cater to the evolving needs of businesses and researchers in the field of computer vision.‍If you're looking for a hassle-free, efficient, and secure annotation tool, it might be time to make the switch to CVAT.ai.‍‍Remember to like, share, and subscribe for more updates!Happy Annotating!‍‍Not a CVAT.ai user? Click through and sign up here‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
October 17, 2023

Improve your Workflow: Switch from CVAT.ai Free Self-Hosted Solution to CVAT Online with Paid Plans

Blog
Data annotation is a critical step in the development of machine learning models. However, manual annotation can be time-consuming, especially for big datasets. What if you could automate this process and make it faster? And with any model?
In CVAT.ai, this is possible with the CVAT CLI. But before diving into the technical details, let's establish a basic understanding.
The CVAT CLI leverages the CVAT Software Development Kit (SDK) to auto-annotate, or pre-annotate, your dataset, allowing you to focus more on model development and less on data preparation.
The SDK enables you to incorporate functionality from a variety of machine learning libraries, including torchvision, but you can use others. The SDK provides you with a range of options for automated annotation, also known as Auto-Annotation (AA) functions.
What are AA Functions?
Auto-Annotation (AA) functions are Python objects designed to perform specific annotation tasks. These functions translate your raw data into annotations.
A typical AA function generally includes the following components:
Code to load the machine learning model.
A specification outlining the types of annotations that can be generated.
Code to transform CVAT data into a format the machine learning model understands.
Code to run the model to obtain predictions.
Code to convert predicted annotations back into a CVAT-friendly format.
The CVAT SDK is built on a layered architecture comprising several parts:
The Interface: Defines the protocol that any AA function must implement.
The Driver: Manages the execution of AA functions and performs the actual annotation on the CVAT dataset.
Predefined AA Functions: Includes a set of predefined functions.
This is just a glimpse; the following article will walk you through the steps and specifics to get you started on your automated annotation journey.
There are two ways to auto-annotate using the CVAT CLI:
Annotating with predefined Auto-Annotation functions in the CVAT SDK.
Annotating with your own Auto-Annotation function.
Before starting the annotation
Before starting the annotation process, let's set up a task in CVAT Cloud. In this case, it is a simple dataset with animals and the labels "cat" and "dog".
CVAT screen with image for annotation
For both approaches, we first need to create an environment where we can run the function. Let's begin by installing a few Python packages on the local machine. Please note that the commands might vary for different operating systems. For the sake of this article, all the commands we use are for Windows.
Run the following command:
python -m venv venv
When the virtual environment is ready, you will need to activate it:
.\venv\Scripts\Activate.ps1
The next step is to install the CVAT.ai CLI. Execute the command and wait for the installation to complete.
pip install cvat-sdk[pytorch] cvat-cli
To allow the CVAT CLI access to CVAT, you'll need to store your CVAT password in the PASS environment variable. We'll utilize the Read-Host command here to prevent the password from being displayed.
$ENV:PASS = Read-Host -MaskInput
Enter your CVAT password and hit Enter. Now you are ready to run the automatic annotation.
Easy Guide to Using Predefined Auto-Annotation Functions in CVAT SDK
The CVAT SDK includes two predefined AA functions that utilize models from the torchvision library. Each function is implemented as a module to allow usage through the CLI auto-annotate command.
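To make the components listed above concrete, here is a minimal, hypothetical sketch of a custom AA function built around a torchvision detector. It follows the same interface as the YOLOv8 example later in this article (a module-level spec plus a detect(context, image) callable). The model choice, the use of weights.meta["categories"], and the handling of placeholder classes are assumptions, so treat this as a starting point rather than a drop-in implementation.

# sketch.py - a minimal custom AA function (assumes torchvision >= 0.13)
import torch
import torchvision.models.detection as detection

import cvat_sdk.auto_annotation as cvataa

_weights = detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
_model = detection.fasterrcnn_resnet50_fpn(weights=_weights)
_model.eval()
_transforms = _weights.transforms()  # converts a PIL image into the tensor the model expects

# The spec declares which labels this function can produce. The COCO category list
# includes placeholder entries such as "__background__"; --allow-unmatched-labels
# lets the CLI ignore any of them that are not present in the task.
spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec(name, i) for i, name in enumerate(_weights.meta["categories"])],
)

def detect(context, image):
    # CVAT passes a PIL.Image; convert each detection into a CVAT rectangle
    # (label id plus [x1, y1, x2, y2] coordinates).
    with torch.no_grad():
        results = _model([_transforms(image)])
    return [
        cvataa.rectangle(int(label), [float(p) for p in box])
        for box, label in zip(results[0]["boxes"], results[0]["labels"])
    ]

A file like this can be passed to the CLI with the --function-file option, just like the YOLOv8 example shown later. The rest of this section focuses on the predefined torchvision functions; the custom-function route is covered in the second half of the article.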
After Python is installed and the environment is ready, run automatic annotation from the CLI with the following command:
cvat-cli auto-annotate "<task ID>" --function-module cvat_sdk.auto_annotation.functions.torchvision_detection -p model_name=str:"<model name>" ...
Let's come back to the task that was created earlier. To run the function, you will need a host, a task ID, and a username. For the model name, check the torchvision documentation. In the example below, we'll use fcos_resnet50_fpn.
The score_thresh=float:0.7 parameter is used to specify the threshold for object detection confidence scores. In this case, it sets the confidence score threshold to 0.7, meaning that only object detections with a confidence score greater than or equal to 0.7 will be included in the results of the auto-annotation process. Objects with lower confidence scores will be filtered out.
CVAT screen showing where to get all parameters
With these elements added to the command, you will get the following result:
cvat-cli --server-host app.cvat.ai --auth mk auto-annotate 274373 --function-module cvat_sdk.auto_annotation.functions.torchvision_detection -p model_name=str:fcos_resnet50_fpn -p score_thresh=float:0.7 --allow-unmatched-labels
Where app.cvat.ai is the host, 274373 is the task ID, and mk is the username.
By default, the CLI will check that every label that the function can output exists in the task. In this case, our task only has "cat" and "dog" labels, while the function can output 80 labels in total; --allow-unmatched-labels tells the CLI to ignore all labels that don't exist in the task.
It's a good practice to start with a clean state, so if there are any annotations that were done before, you can add the --clear-existing option to the command, which will clear all existing annotations.
The annotation will start. Wait until it's over, then go back to the task. You might need to refresh the page for the annotations to be visible.
CVAT annotated image
It's time to check the quality. Go through the dataset to ensure that the annotations meet your requirements.
How to Auto-Annotate Your Dataset with a Model of Choice and the Command Line Interface
The second method is to use the auto-annotation feature not with predefined functions but with any model of your choice. In this guide, we'll walk through using YOLOv8 for auto-annotation via the Command Line Interface (CLI). Here is the task that will be annotated:
CVAT with image to be annotated
When the environment is ready, you can write a model function. Something like this:

import PIL.Image
from ultralytics import YOLO

import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models

_model = YOLO("yolov8n.pt")

spec = cvataa.DetectionFunctionSpec(
    labels=[cvataa.label_spec(name, id) for id, name in _model.names.items()],
)

def _yolo_to_cvat(results):
    for result in results:
        for box, label in zip(result.boxes.xyxy, result.boxes.cls):
            yield cvataa.rectangle(int(label.item()), [p.item() for p in box])

def detect(context, image):
    return list(_yolo_to_cvat(_model.predict(source=image, verbose=False)))

To move to the next step, you'll need to install the Ultralytics library, which houses the YOLO models. To do so, execute the following command and wait for the installation to finish.
pip install ultralytics
It's a good practice to start with a clean slate.
For this purpose, the --clear-existing option is added to the command, which will clear all existing annotations.
Note that you'll need to specify the path to the file implementing the function.
You can also exclude labels that you don't need.
Here's how you'd run the command in the CLI:
cvat-cli --server-host app.cvat.ai --auth mk auto-annotate 274373 --function-file .\yolo8.py --allow-unmatched-labels --clear-existing
Press Enter and wait for the auto-annotation by YOLOv8 to finish.
Once the auto-annotation is complete, it's time to check the quality. Go through the dataset to ensure that the annotations meet your requirements.
CVAT annotated image
There you have it! Now you know how to use any model, including YOLOv8, to auto-annotate your dataset via the CLI. Using auto-annotation can save you a tremendous amount of time and help you achieve consistent annotation across your datasets. If you have more questions, please see the Auto Annotation documentation.
And check the video to see the full process:
Remember to like, share, and subscribe for more updates! Happy Annotating!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
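As a closing note to the CLI walkthrough above, the same auto-annotation can also be driven from a Python script. The sketch below is an assumption: it relies on make_client and the annotate_task helper exposed by cvat_sdk.auto_annotation, whose exact signatures may vary between SDK versions, so verify the details against the Auto Annotation documentation before using it. The yolo8 module is the function file created above.

# sketch: driving auto-annotation from Python (assumed API, verify against your SDK version)
import cvat_sdk.auto_annotation as cvataa
from cvat_sdk import make_client

import yolo8  # the module defining `spec` and `detect` from the example above

with make_client("https://app.cvat.ai", credentials=("mk", "<password>")) as client:
    cvataa.annotate_task(
        client,
        274373,   # the task ID used earlier in this article
        yolo8,    # any object (or module) implementing the AA function protocol
        allow_unmatched_labels=True,
        clear_existing=True,
    )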
Tutorials & How-Tos
October 5, 2023

An Introduction to Automated Data Annotation with CVAT CLI

Blog
In the rapidly increasing field of computer vision, the acquisition of accurately annotated data is paramount. With CVAT.ai, this process is not just accelerated but also optimized to ensure high-quality outputs. The platform's seamless integration with Hugging Face models guarantees an efficient annotation experience, ideal for both image annotation and video annotation tasks.All starts with a task that we aim to annotate swiftly and precisely using a model. But let’s imagine that an appropriate model for your needs isn’t readily available within CVAT.ai. This is where Hugging Face comes to the rescue! It opens up a plethora of models, visibly arranged, allowing for easy selection of the most fitting model for your use case. All you have to do is to choose one and add it to CVAT.ai. To integrate your chosen model, you'll require the model URL and the API Key available in your Hugging Face profile. Once added, watch the model appear in the "Models" section of CVAT.ai interface, ready to facilitate your annotation endeavors.‍For those on the Free plan, semi-automatic annotation is available—just follow the above instructions and navigate to the task to start annotating. ‍For those seeking a more refined experience, the auto-annotation feature, available with our paid Solo and Team plans, is your go-to option. ‍Once annotations are complete, review the results and export the annotated data in your preferred format. And there you have it! Your annotated dataset ready to be used with your CV models.To witness the entire process in action and to glean more insights, watch our detailed tutorial video here! ‍‍Remember to like, share, and subscribe for more updates!‍Happy Annotating!Not a CVAT.ai user? Click through and sign up hereDo not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Product Updates
September 27, 2023

Unlock Swift Image and Video Annotation with CVAT.ai and Hugging Face Integration!

Blog
Annotation can be quite the task, but not with CVAT.ai! In collaboration with Roboflow, we've streamlined the video and image annotation process, making it faster and more efficient.
Let's say you've got a dataset waiting for some quick and precise annotations. Kick off by creating a task and uploading your dataset to it. Next, you need to choose the model to annotate all the images, but you discover that the model you need isn't pre-installed in CVAT.ai.
No worries at all!
Your solution lies with Roboflow. Just create an account, sign in, go to the Roboflow Universe, and look for the perfect model. Once found, adding the model to CVAT.ai via the Roboflow integration is a breeze. Simply add the Model URL and API Key to the Roboflow integration form, and voila, your chosen Roboflow model is seamlessly integrated with CVAT.ai.
Now, gear up for annotation! With CVAT.ai, you get to pick:
Semi-automatic Annotation: This option is available for everyone, even those on our Free plan. Just select your task, annotate, and don't forget to save.
Auto-Annotation: A feature that elevates your annotation game! Exclusive to our premium users, this feature lets you annotate with a single click from a drop-down menu.
Done with the annotation? Great! Preview your results and then export your annotated data in one of the available formats. It's that straightforward!
Ready to see all of this in action? Dive into our video tutorial and see the simplicity of the process for yourself!
Remember to like, share, and subscribe for more updates!
Happy Annotating!
Not a CVAT.ai user? Click through and sign up here
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
Product Updates
September 21, 2023

Effortlessly Annotate Videos and Images with CVAT.ai and Roboflow Integration!

Blog
Ensuring top-quality annotations in Computer Vision Annotation Tool (CVAT) is simpler than you think. Whether you're a project owner, an annotator, or a QA specialist, our platform makes the process seamless.‍Watch the tutorial and read on to discover how to navigate this critical aspect of machine learning and video annotation.‍‍‍‍Setting Up Your Project‍The first step is initializing your annotation project. After creating a Project and adding a Task, you assign Jobs to your Annotators. These jobs contain images for annotation. In our demonstration video, we've intentionally introduced errors for educational purposes—such as labeling "dogs" as "cats".‍Switching Roles for Quality Assurance (QA)‍When the annotator has completed their tasks, it's time for Quality Assurance. To show how this works, we'll switch back to the Project owner's account to initiate the QA process.Assigning a QA specialist to review the annotations is a breeze. Just invite the person to your project and assign them to the specific job. Then change the status of the Job to "Validation".‍Review and Issue Tracking‍The person assigned as QA will log in and have access to the QA interface which has been designed specifically for issues reporting and tracking. It lacks the typical annotation tools but includes an "Issue tracker" icon.‍QA will go through each annotation to identify errors. Once found, QA creates an issue and submits it. CVAT also provides predefined issues for common errors, saving time and ensuring consistency.‍Navigating and Resolving Issues‍After the QA specialist completes their review, we’ll go back to the annotator’s account and interface to see how the reported issues look. The annotator can easily navigate through the list of issues and correct the errors. After all is done, the annotator saves the work, making the annotations complete and ready for future use. And that’s it!‍Happy Annotating!Not a CVAT.ai user? Click through and sign up hereDo not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Tutorials & How-Tos
September 14, 2023

How to Ensure Quality in Image Annotation with CVAT.ai

Blog
Today marks a remarkable day for CVAT.ai, as we've reached an extraordinary milestone: 10,000 stars on GitHub! ‍This great achievement is a reflection of the hard work by our dedicated developers, but more importantly, it represents the growing community of users and contributors who believe in the potential and utility of the CVAT platform. ‍We're incredibly proud and grateful, and we want to take this opportunity to say‍Thank you!‍From the very beginning, CVAT was conceived as an open-source project aimed at making the complex task of Computer Vision data annotation simpler and more efficient. But the success that we are celebrating today is not ours alone; it's a success shared with each and every individual who has contributed to the project. Whether you've written code, reported bugs, suggested enhancements, or even just given us a star on GitHub, you've played a crucial role in getting us here.‍We're particularly proud of:‍Robust Annotation Features: Our focus on creating a powerful, yet user-friendly annotation tool has been met with overwhelming appreciation. Community Contributions: We've received contributions from developers around the globe, making CVAT.ai not just a tool but a community project. CVAT.ai SaaS in the Cloud: We've reached 50,000 subscribers in just one year and have added numerous high-quality features to expedite and improve the accuracy of annotations.CVAT.ai Enterprise Self-hosted: We're thrilled to see CVAT being adopted for complex, large-scale projects in industry settings.‍We are expressing our deepest gratitude to everyone who has supported us. The 10,000 stars is not just a number; it is a testament to the strength and commitment of a community that shares our vision. Thank you for believing in us and for contributing to our mutual success. We promise to continue earning each and every one of your stars.‍Here's to the next milestone!‍Not a CVAT.ai user? Click through and sign up here‍Share your opinion and stay tuned!Happy annotating!‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Product Updates
September 5, 2023

CVAT reached 10,000 Stars on GitHub. Yahoo!

Blog
In the fast-paced world of data annotation, striking the perfect balance between speed and accuracy is essential. With complex datasets and strict deadlines becoming the norm, annotation professionals are constantly seeking solutions to streamline their workflow without compromising the quality. ‍This is where the power of Layers in CVAT.ai comes into play.‍Understanding the Challenge‍Imagine having to annotate a dataset that features intricate objects or multiple subjects in each image. Traditional annotation methods might force you to choose between speed and accuracy – a decision that can have significant implications on the overall quality of your work.‍Introducing Layers ‍Layers in CVAT.ai improve the way you approach annotation tasks. Whether you're dealing with multi-object images, complex scenes, or projects with strict timelines.‍By allowing you to separate objects or subjects into distinct layers, CVAT.ai lets you focus on annotating individual elements without the clutter of overlapping annotations. This focused approach translates into increased efficiency as you no longer need to be worried about gaps between annotated objects and you also reduce the number of objects to be annotated overall.‍Want to know how to do it? Check out our latest video!‍‍Share your opinion and stay tuned!‍Happy annotating!‍Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Tutorials & How-Tos
August 17, 2023

Improve Annotation Speed and Accuracy with Layers in CVAT.ai

Blog
Today, we’re beyond excited to share a monumental achievement: CVAT.ai has officially welcomed its 50,000th user! ‍It’s been exactly one year since we launched the CVAT.ai SaaS platform! This significant milestone is more than just a number; it's a testament to the hard work, innovation, and strong community that fuels our tool.‍‍‍A Journey of Collaboration‍Since the start, CVAT.ai (Computer Vision Annotation Tool) has been committed to providing an efficient and user-friendly image and video annotation tool. Our community's insights and expertise have been vital in shaping CVAT.ai into a tool that's not only powerful but also accessible and intuitive.‍50,000 Users and Growing‍Reaching 50,000 users symbolizes the trust and confidence our community has in CVAT.ai. From researchers to data scientists, hobbyists to professionals, every single user brings a unique perspective that enriches our platform.‍‍‍‍What's Next for CVAT.ai?‍Our journey doesn't stop here. With 50,000 users behind us, we're more determined than ever to continue improving CVAT.ai adding new features, improving existing ones, and expanding our community reach.‍Stay tuned for upcoming updates, webinars, and more exciting content as we strive to make CVAT.ai the definitive tool for computer vision annotation. Your continued support is what drives us forward.‍A Big Thank You!Do not want to miss updates and news? Have any questions? Join our community:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Company News
August 8, 2023

CVAT.ai Turns 1 and Hits 50,000 Users: A Celebration of Community and Innovation

Blog
Image Annotation and Video Annotation play a vital role in various fields, from computer vision to machine learning. These annotations enable us to understand and interpret visual content. But what if we could add even more meaning to these annotations? That's where CVAT.ai Label Attributes come into play! In this article, we'll explore how CVAT.ai Label Attributes add an additional semantic layer, enhancing the data via a more specific set of annotations that improve context and insights.
What is CVAT.ai?
CVAT.ai is a powerful computer vision annotation tool used to mark objects and define their boundaries in images and videos. It simplifies the process of creating annotated datasets, making it easier for machines to understand visual data.
CVAT.ai Attributes
Imagine taking the already efficient CVAT annotation process to a whole new level with CVAT.ai Label Attributes. These Attributes are like descriptive tags or labels that provide additional information about each annotated object. They act as tiny pieces of context, guiding both humans and machines towards better comprehension.
Using CVAT.ai Label Attributes is as simple as it gets. For each annotated object, you can assign relevant Attributes to add more meaning to the Annotation. For instance, if you are annotating Traffic Signs, you can tag the primary Label, Traffic Signs, with attributes like "Stop Sign," "Yield Sign," or even "Damaged Sign" to provide precise details.
Why CVAT.ai Attributes Matter
1. Enhanced Context: Imagine annotating a dataset with just bounding boxes around cars that have a simple, single Label. With CVAT.ai Label Attributes, you can add labels like "SUV," "Sedan," or "Convertible." This additional context empowers ML algorithms to better recognize different car types, leading to more accurate model results.
2. Improved Analysis: When dealing with complex data, such as medical images, CVAT.ai Label Attributes enable annotators to include critical details like "Benign Tumor," "Malignant Tumor," or "Inflammation." This extra layer of information allows researchers and medical-centric ML algorithms to perform more insightful analysis and make better-informed decisions.
3. Enriched Machine Learning: ML models thrive on data diversity. With CVAT.ai Label Attributes, you can feed the model richer data that goes beyond simple annotations. This exposure to varied Attributes helps train the model to be more adaptive and versatile.
And many more! Want to know more about how to use them? Watch the video!
And share your feedback!
Happy annotating!
Do not want to miss updates and news? Have any questions? Join our community: Facebook, Discord, LinkedIn, Gitter, GitHub
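To complement the Traffic Signs example above, here is a hedged sketch of how such a label and its attribute can be described for CVAT's raw label editor (the JSON constructor shown when creating a project or task). The field names follow the format used by recent CVAT versions but are an assumption here, and the label and attribute values are purely illustrative.

import json

# A "Traffic Sign" label with a selectable "type" attribute, in the shape expected
# by CVAT's raw label editor (field names assumed; adjust to your CVAT version).
labels = [
    {
        "name": "Traffic Sign",
        "attributes": [
            {
                "name": "type",
                "input_type": "select",
                "mutable": False,
                "default_value": "stop",
                "values": ["stop", "yield", "damaged"],
            }
        ],
    }
]

print(json.dumps(labels, indent=2))  # paste the output into the "Raw" tab of the label editor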
August 2, 2023

CVAT.ai - Attributes...labels that make more sense!

Blog
Every day, the vast world of artificial intelligence (AI) becomes increasingly interconnected with our lives. In essence, the backbone of AI technology is data. More specifically, for AI to understand and interpret the visual world around us, we need data in the form of images. These images must be labeled or annotated, turning them from raw pixels into a language that AI can understand. This process, called image annotation, is an integral part of AI development. However, image annotation is not simply a case of attaching labels to pictures. It requires meticulous work to ensure the data is labeled correctly and consistently. This is where the importance of image annotation quality comes into play.
Imagine you have a dataset full of labeled images. But how do we know the labels are accurate? How can we be sure that these labels are reliable enough to train AI models? To answer these questions, we need to assess the quality of our data. Fortunately, in CVAT, an image labeling tool specifically designed for such tasks, there's a simple way to do this using a method known as the 'Honeypot'.
The Magic of CVAT Honeypot
In the world of image annotation, quality is king. That's where CVAT and its Honeypot method come in. The Honeypot method is all about comparing actual annotations with a 'ground truth', or known correct annotations. This ground truth is set up in a dedicated job within CVAT.
Worried about double-annotating an entire dataset for the ground truth? Fret not. Just a fraction of the images, say 5-15%, is enough to give an estimate of the overall quality. The size of the 'ground truth' job is flexible; you can set it as a specific number or a percentage of frames.
These 'ground truth' jobs are different from regular jobs. They don't mingle with your main dataset, whether it's exporting, importing, or automatic annotation. And if you ever need to tweak parameters, you can delete the ground truth job and create a fresh one.
Once your ground truth job is complete, annotate the dataset and let CVAT crunch the data. Once processed, all the information will appear on the Task Analytics page, which is dedicated to showing annotation quality results. There you'll find your task's quality score, including the average annotation quality, the number of conflicts and issues, and a per-job breakdown. For a closer look, you can always download the detailed report for the task or for each job.
And if you need to customize quality score requirements? CVAT's got you covered. You can set parameters, for example, what counts as a 'bad' overlap, among others. Once set, these will be applied in the next quality check. So, there you have it. Ensuring high-quality annotations is a breeze with CVAT and the Honeypot feature.
Check the video about this new feature:
Thank you for choosing CVAT!
Stay tuned and follow the news here: Facebook, Discord, LinkedIn, Gitter, GitHub
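As a rough illustration of the honeypot idea described above (a toy example, not CVAT's actual scoring code): the estimate boils down to measuring agreement on the hidden ground-truth frames and treating it as a proxy for the quality of the whole job.

# Toy illustration: agreement on hidden ground-truth frames as a quality proxy.
gt_labels = ["cat", "dog", "cat", "dog", "cat"]          # known correct labels on honeypot frames
annotator_labels = ["cat", "dog", "dog", "dog", "cat"]   # what the annotator produced on those frames

matches = sum(a == b for a, b in zip(gt_labels, annotator_labels))
quality = matches / len(gt_labels)
print(f"Estimated annotation quality: {quality:.0%}")    # 80%, extrapolated to the whole job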
Tutorials & How-Tos
July 20, 2023

Quality control of Image annotation in CVAT

Blog
Are you tired of the hassle involved in providing instructions to your annotation team? CVAT has introduced an exciting new feature that streamlines the annotation process by simplifying how you share instructions with your annotators.
In today's article, we'll explore this feature and how it can enhance your annotation workflow.
Introducing the built-in data annotation instructions
The feature eliminates the need for separate tools or external resources when providing data annotation specifications. Previously, you had to create specifications using tools like Google Docs and then link or send them separately via CVAT or email.
However, CVAT has made it easier by integrating markdown instructions for your data annotation team directly into the CVAT interface, available in one click.
Creating and modifying data annotation instructions
CVAT now has a built-in Markdown editor where you can easily create and modify a document with annotation instructions for your team. The editor provides a live preview of the formatted text, allowing you to see the results as you type. You can use the top toolbar for formatting options or write directly in raw Markdown. For more information, see the Markdown Cheat Sheet.
Adding Text, Visuals, and Code
The new feature supports various media types, including text, images, and links. You can add images by providing the URL, by drag-and-drop, or by pasting directly into the editor. Additionally, you can include links to external resources. If code examples are needed, you can seamlessly incorporate them into the instructions using Markdown's code formatting.
Enhanced Collaboration and Accessibility
With this feature, collaboration becomes more efficient, as data annotation instructions can be shared through CVAT projects and tasks.
But who can edit the created annotation instructions?
If you are an individual user and the instructions are linked to a project, only the project owner and assignee have the authority to modify the markdown document.
If the instructions are linked to a task, then the owner of the task and the assignee can modify it as needed.
For organizations, the owner, assignee, and maintainer have editing rights for both projects and tasks.
In all cases, annotators assigned to the job can view the data annotation instructions but cannot make changes.
Instant Access and Improved Workflow
The data annotation instructions can be easily accessed.
By clicking on the guide icon in the top right corner of the screen, annotators can quickly access all the necessary information without navigating through multiple tabs or external resources.
This streamlines the workflow and ensures a smoother annotation process.
Watch the Video for a Visual Overview
Watch the video for a visual demonstration of the feature and discover how it can improve your data annotation experience and collaboration.
Thank you for choosing CVAT!
Stay tuned and follow the news here: Facebook, Discord, LinkedIn, Gitter, GitHub
July 5, 2023

Built-in data annotation instructions in CVAT

Blog
Exciting News! CVAT.ai Joins the NVIDIA Inception Program! 🎉 What does it mean for you?‍Before diving into the list of benefits, lets cover some basics:‍NVIDIA Inception is a global program designed to accelerate the growth of innovative AI and data science start-ups.Computer Vision Annotation Tool (CVAT.ai) is a rapidly growing startup in the field of visual data labeling for AI models.‍The partnership between the NVIDIA Inception program and CVAT.ai opens up incredible opportunities for growth, innovation, and collaboration in the AI and computer vision industry, and it is beneficial for both parties. And also for you!‍And here we come to the main question: what does all this mean for you as a user?‍Reliability and performance: The collaboration between CVAT.ai and NVIDIA means that as a user, you can expect CVAT.ai to use NVIDIA's advanced hardware technologies, such as powerful GPUs. This will result in improved performance, faster processing, and more accurate results when using CVAT.ai for data labeling and AI model training.Recognition and trust: CVAT.ai's acceptance into the NVIDIA Inception program is a big deal! It means that NVIDIA, a well-known and respected company in technology, believes in CVAT.ai's potential. This acknowledgment highlights CVAT.ai's outstanding ability to succeed and grow in the AI and computer vision industry. When NVIDIA trusts us, you can trust us too!Getting things done faster and better: CVAT.ai now has access to helpful guidance, mentorship, and training from the NVIDIA Inception program. This support will supercharge the CVAT platform, allowing it to develop new and improved features. As a result, you'll enjoy even more innovative and advanced tools from CVAT.ai for annotating computer vision projects. Faster and more accurate features are on the way!Working together for a better future: By joining the NVIDIA Inception Program, CVAT.ai is teaming up with other startups, researchers, and industry experts. It means brilliant minds will come together to share ideas, work on exciting projects, and learn from each other. This teamwork will drive advancements in AI and computer vision, leading to better tools and resources for your data labeling needs. Get ready to benefit from the collective expertise and industry impact of this collaboration!‍So, prepare for improved technology, more innovation, and collaborations that will bring you amazing tools and resources. It's pretty cool, right? We'd love to hear your thoughts in the comments! Share your opinion with us! 🎉✨Stay tuned and follow the news here:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
Company News
June 29, 2023

CVAT joins the NVIDIA Inception program!

Blog
Have you ever sat back and wondered: “how productive my annotation team is?” or "How much time does an annotator take to complete a task?" Well, you're not alone in this, and we've got the perfect solution for you - CVAT Analytics!‍Picture this: you've got a magic telescope that lets you peek into the realm of your team's productivity, task completion rates, and even helps you spot the bottlenecks. Sounds awesome, right? This isn't magic, but it's the next best thing - it's the range of dashboards available in CVAT Analytics (powered by Grafana). We've made an easy-to-understand video just for you, which you can check out at the end of this post.‍CVAT Analytics is like your personal productivity wizard, and it comes with three magical lenses (dashboards): All Events, Management, and Monitoring. Let's dive into what each of them does.‍The "All Events" Lens‍The "All Events" lens, or dashboard, is like a bird's-eye view of your team's work landscape. It lets you see all events, when they happened, and who triggered them. You can think of it like your detective tool, keeping an eye on everything that's going on.‍There's a neat activity graph at the top and some handy filters that act like magnifying glasses, letting you zoom in on specific users, tasks, or projects. It's perfect for understanding your team's overall performance and making decisions that'll help everyone be more productive. Isn't that cool?‍The "Management" Lens‍Next up, we've got the "Management" lens. This one is like your team captain, helping you oversee your team's work in a simple and powerful way. There's another activity graph here, and you can click on a team member's ID to see what they've been up to.‍The best part? There's a handy table at the bottom that tells you who's been working on what and how long it took them. This lens helps you manage your projects efficiently, understand how your team works, and make decisions that'll lead to even better results.‍The "Monitoring" Lens‍Last but not least, there's the "Monitoring" lens. This one is like a health check-up for your projects. It gives you a snapshot of what everyone's up to, how active they are, and even shows you if any errors popped up.‍There's a graph for overall activity, one for the duration of events, and an "Exceptions" graph that's like your team's error weather forecast. Each error is described in detail, which is super useful for understanding what went wrong and how to fix it.‍This lens is great for keeping track of your annotators' work hours, getting weekly statistics for each annotator, and managing productivity across all tasks.‍All in all, these magical lenses of CVAT Analytics are your best friends for monitoring tasks, evaluating your team's productivity, spotting errors, and simplifying project management. ‍They're packed full of valuable insights that can help you make smarter decisions and solve problems quicker.‍And remember, we've only just scratched the surface here. The possibilities with CVAT Analytics are endless!Ready to see this in action? Here is the video:‍‍For more information, see CVAT Analytics and Monitoring.Happy annotating!Stay tuned and follow the news here:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
June 14, 2023

CVAT Analytics and Monitoring: Make Your Annotation Team More Productive

Blog
Today we're going to talk about something really cool and useful. It's called "webhooks". We will show where to find them in CVAT, how to configure them, and the final result of their work.‍‍But before we dive into technical stuff, let’s explain the basics in simple words.For example, imagine a situation: you're waiting for a friend to come over and play. You could sit by the window and keep looking outside until your friend shows up, right? But wouldn't it be easier if your friend just rang the doorbell when he or she arrived? That way, you could do other fun stuff instead of just waiting.That's pretty much how webhooks work. They're like the doorbell of the internet.‍Now, we're going to talk about a specific tool called CVAT (which stands for Computer Vision Annotation Tool). This tool is used for image annotation and video annotation for further processing of the data in the ML models.‍When you're using CVAT, you might be working solo or with a team. In the second case, the team might be working on a lot of different things at the same time. For example, they might teach the computer to recognize different types of dogs or to understand what's happening in a video. ‍While they are doing all of this, wouldn't it be nice to know when a specific task is done, or when something changes, without having to keep checking all the time? That's where webhooks come in.‍CVAT webhooks are like little messengers. You don't have to keep checking on your annotators by yourself anymore. Instead, these webhooks will let you know when something new happens in CVAT. So, if a task gets started or finished, or if there's a problem or a change, you'll get a message straight away. It's a simple and fluent way to stay updated!‍But how does this work? When you're setting up a CVAT webhook, you're basically giving it a special online address (a URL) where it can send its updates. You set the other details, hit the Submit button, and you're good to go! ‍Now, whenever something changes in CVAT, your webhook gets to work. It sends a message to the address you provided, telling you all about the event, like what happened and all the specific details.‍Curious about how it all works in action? Take a look at this short video that walks you through the whole process:‍‍And that's the basics of CVAT webhooks! They might seem a little complicated at first, but once you understand how they work, they're actually pretty simple and really useful.‍So remember, next time you're using CVAT and you want to know how things are going with the tasks, just set up a webhook. It's like setting up a doorbell so you know when your friend has come over. Happy annotating!‍Stay tuned and follow the news here:‍Facebook‍DiscordLinkedInGitterGitHub‍‍
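To make the doorbell analogy a bit more concrete, here is a minimal, hypothetical webhook receiver written with Flask. The payload field shown (the event name) is an assumption about the general shape of CVAT's webhook requests rather than an exact schema, so check the webhooks documentation for the real field names; a production receiver should also verify the signature secret configured in CVAT.

from flask import Flask, request

app = Flask(__name__)

@app.route("/cvat-webhook", methods=["POST"])
def cvat_webhook():
    payload = request.get_json(force=True)
    # A CVAT webhook delivers a JSON body describing what happened,
    # e.g. an event name such as "update:job" plus details of the changed object.
    print("Received event:", payload.get("event"))
    return "", 200

if __name__ == "__main__":
    # Expose this URL publicly (or through a tunnel) and paste it into the
    # target URL field of the CVAT webhook form.
    app.run(port=8080)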
June 6, 2023

What are CVAT Webhooks and how to create and use them

Blog
Animal classification is the process of categorizing different species of animals based on their physical and biological characteristics.
Here, when we say physical and biological characteristics, we mean:
1. Symmetry: radial like a starfish or bilateral like a butterfly?
2. Body plan: does it have a backbone? Is it covered in fur or scales?
3. Reproduction: does it lay eggs, or does it use some form of internal fertilization?
4. Metabolism: is it a herbivore or a carnivore? Or something else?
And many more.
By studying these characteristics, scientists can better understand the evolutionary relationships between different species and how they have adapted to different environments over time. They can also see changes in the behavior and appearance of animals by checking data from different time periods. Based on this information, scientists draw conclusions and provide recommendations for ecological improvements that can benefit both endangered and non-endangered species alike.
So animal classification is really important. And challenging, as it requires both the collection and the processing of information. For this very reason, it is also very time-consuming and costly: the animal kingdom is big, and so are the volumes of collected data.
Image annotation can help with animal classification by providing a way to analyze large amounts of visual data quickly and efficiently.
The procedure is straightforward: assign labels, such as bird, starfish, bear, or zebra. When necessary, add attributes like the presence of a backbone, radial or bilateral symmetry, or even the gender of the subject. Once completed, export the annotated dataset and apply the machine learning model to it. This will classify animals based on the provided labels, resulting in a quicker and more accurate animal classification procedure.
To make the process of adding classification labels easier, ecologists use different tools, and one of them is CVAT (Computer Vision Annotation Tool).
Here is a short video describing the whole process step-by-step:
We are waiting for your feedback here: Discord, LinkedIn, Gitter, GitHub
You can find more information at our YouTube channel
Tutorials & How-Tos
April 26, 2023

Accelerate image classification with CVAT

Blog
You asked and we delivered! Meta's Segment Anything Model (SAM) is now available in CVAT's self-hosted solution!

What is the Segment Anything Model (SAM)?

SAM is a revolutionary image segmentation model designed to improve annotation speed and quality in the world of computer vision.

In computer vision, segmentation plays a critical role: it is based on classifying pixels and defining which pixels belong to specific objects within an image. This technique has numerous applications, from analyzing scientific imagery to editing photos. Nonetheless, achieving accurate annotation through segmentation can be quite challenging, and building a segmentation model demands expertise, AI training infrastructure, and vast amounts of annotated data.

Meta's SAM project tackles all of these challenges head-on. The aim behind SAM's design was to boost image segmentation speed and precision by introducing a new general-purpose model, trained on a record-breaking dataset of more than one billion masks, the largest segmentation dataset ever released.

And the goal was accomplished. With SAM, there is no need for specialized knowledge, high computing power, or custom data. SAM finds objects and generates masks for them in any image or video frame, even ones it has not encountered before. It can be used for many applications without additional training, showcasing its impressive zero-shot transfer capability, and it can be employed for data annotation across fields from medical imaging to retail to autonomous vehicles. We eagerly anticipate discovering all the potential uses that have yet to be imagined.

Annotate with the Segment Anything Model (SAM) in CVAT

Now let's see how to use SAM in CVAT. The integration is currently available in the self-hosted solution and is coming soon to CVAT.ai cloud. Note that SAM is an interactor model, which means you annotate by placing positive and negative points. (A minimal sketch of how point-prompted SAM works outside CVAT is included at the end of this post.)

The process is easy and described in the following video. Or, if you prefer text to video, follow these instructions.

Deploy the model:

1. If necessary, follow the basic instructions to install CVAT with serverless functions support.
2. The model is available on both CPU and GPU. The GPU option is significantly quicker, but to install the GPU version you also need to set up the NVIDIA Container Toolkit.
3. To deploy the Segment Anything interactor, run one of the following commands from the root CVAT directory on your host machine:
   On GPU: cd serverless && ./deploy_gpu.sh pytorch/facebookresearch/sam/nuclio/
   On CPU: cd serverless && ./deploy_cpu.sh pytorch/facebookresearch/sam/nuclio/

Annotate using the model:

Open CVAT, create a task, open an annotation job, and go to AI Tools > Interactors. You will find the model in the drop-down list. Begin the annotation process by selecting the foreground with left mouse clicks and removing the background with right mouse clicks. Once the annotation is complete, save the job, and you will be able to export the annotated objects in various supported formats.

What's next?

We are currently working on adding the Segment Anything Model to CVAT.ai cloud! Stay tuned and follow the news here:

Discord
LinkedIn
Gitter
GitHub
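For readers curious about what the interactor does under the hood, here is a minimal sketch of point-prompted segmentation using Meta's reference segment-anything package, run locally and independently of CVAT. It assumes you have installed the segment_anything package and downloaded a ViT-H checkpoint; the checkpoint path, image path, and click coordinates are placeholders.

```python
# Point-prompted SAM sketch (outside CVAT), assuming:
#   pip install git+https://github.com/facebookresearch/segment-anything.git opencv-python
# and a downloaded checkpoint such as sam_vit_h_4b8939.pth (path is a placeholder).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label 1) on the object and one negative click (label 0)
# on the background -- the same idea as left/right clicks in the CVAT interactor.
point_coords = np.array([[320, 240], [50, 60]])
point_labels = np.array([1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
print("mask shape:", masks[0].shape, "score:", float(scores[0]))
```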
Product Updates
April 12, 2023

Meta's Segment Anything Model is now available in CVAT Community

Blog
Our previous article was an overview of what you can expect from our Enterprise plan. In this article, we'll explain how to request premium features and become a corporate client, dive deeper into the benefits, and outline what you gain in the long run.

Let's start with a simple use case: a medical research company uses annotated data for its studies. As the company grows, the amount of data to be labeled increases accordingly, making manual annotation time-consuming and error-prone. So they start using CVAT's self-hosted platform, as it can be customized to meet internal requirements and has integrated models for automatic labeling. The company starts with the free version but finds some important features missing, so they submit feature requests to CVAT's GitHub and wait for them to be added to the platform.

However, there is no guarantee that the requested features will be implemented, or when they will be delivered. Once you submit a feature request, you must wait and hope that it will be selected for development, keeping in mind the drawbacks of this approach:

The feature might not be selected for development.
The feature might not be implemented as requested.
There is no guarantee of how long development will take.

This is where requesting a Feature Boost can help. You can contact the CVAT developer team directly because, at CVAT.ai, we are eager to listen to your needs and prioritize your requests ahead of our roadmap.

To start working with CVAT.ai on a Feature Boost, there are a few requirements you need to meet:

Price: the agreement we make with you comes with a price of $50,000 and up per contract. The contract can include one or several big features, or many small ones.
Availability: as CVAT is an open-source platform, most of the features we develop are added to the public repository. We can include exclusive features in the contract as an exception, but they should not be core features and must be discussed and agreed upon first.
Deployment: we will help you deploy your features, but the CVAT team will not rework your company's infrastructure or adjust internal services.

If you meet all three requirements above, the process looks like this:

Step 1: You contact CVAT.ai through one of our communication channels. We recommend LinkedIn or Sales.

Step 2: We sign an NDA first. Then the CVAT team will listen to your request, discuss all the details, and ask for additional materials from your side if needed.

Step 3: CVAT.ai processes the request internally, breaking it down into stages and estimating the costs. Some additional information from you might be needed at this stage, so we will stay in touch with you.

Step 4: Once costs and stages are estimated internally, CVAT.ai makes a proposal and starts a discussion with you about pricing options, project stages, and timelines for implementing the project. If everything aligns with your needs, we proceed to sign the contract. Note: this step may also include procurement procedures and other additional activities on both sides.

Step 5: Development starts as agreed in the contract.

What will you get?

The finished project comes in two parts:

MVP: the first prototype, which we show to you so adjustments can be made if needed. At this stage you can also stop the development; in that case you pay only 20% of the agreed price.
The rest: after the MVP is approved, we continue with the remaining development and release the feature.

After the feature is deployed, we will help integrate it into your workflow. We provide support for a month to make sure the feature works as you expect and you feel comfortable using it. If there are any bugs, they will be fixed at no additional cost. If the feature is not exclusive and has gone into the CVAT open-source repository, it will also be kept up to date as CVAT evolves.

If you are curious about real-life examples, here is a list of features that were created through Feature Boost plans:

Annotation with skeletons.
Analytics and monitoring, also known as audit logs.
Webhooks for projects and organizations, used to notify your applications about changes in a specific project or organization.

Any questions left? Feel free to contact us!

Discord
LinkedIn
Gitter
GitHub
March 6, 2023

Feature Boost: Request features you need the most

Blog
IntroductionCVAT is a visual data annotation tool. Using it, you can take a set of images and mark them up with annotations that either classify each image as a whole, or locate specific objects on the image.But let’s suppose you’ve already done that. What now?Datasets are, of course, not annotated just for the fun of it. The eventual goal is to use them in machine learning, for either training an ML model, or evaluating its performance. And in order to do that, you have to get the annotations out of CVAT and into your machine learning framework.Previously, the only way to do that was the following:1. Export your CVAT project or task in one of the several dataset formats supported by CVAT.2. Write code to read the annotations in the selected format and convert them into data structures suitable for your ML framework.‍This approach is certainly workable, but it does have several drawbacks:The third-party dataset formats supported by CVAT cannot necessarily represent all information that CVAT datasets may contain. Therefore, some information can be lost when annotations are exported in such formats. For example, CVAT supports ellipse-shaped objects, while the COCO format does not. So when a dataset is exported into the COCO format, ellipses are converted into masks, and information about the shape is lost.Even when a format can store the necessary information, it may not be convenient to deal with. For example, in the COCO format, annotations are saved as JSON files. While it is easy to load a generic JSON file, data loaded in this way will not have static type information, so features like code completion and type checking will not be available.Dataset exporting can be a lengthy process, because the server has to convert all annotations (and images, if requested) into the new format. If the server is busy with other tasks, you may end up waiting a long time.If the dataset is updated on the server, you have to remember to re-export it. Otherwise, your ML pipeline will operate on stale data.All of these problems stem from one fundamental source: the use of an intermediate representation. If we could somehow use data directly from the server, they would be eliminated.So, in CVAT SDK 2.3.0, we introduced a new feature that will, for some use cases, implement exactly that. This feature is the cvat_sdk.pytorch module, also informally known as the PyTorch adapter. The functionality in this module allows you to directly use a CVAT project or task as a PyTorch-compatible dataset.Let’s play with it and see how it works.SetupFirst, let’s create a Python environment and install CVAT SDK. To use the PyTorch adapter, we’ll install the SDK with the pytorch extra, which pulls PyTorch and torchvision as dependencies. We won’t be using GPUs, so we’ll get the CPU-only build of PyTorch to save download time.‍$ python3 -mvenv ./venv $ ./venv/bin/pip install -U pip $ ./venv/bin/pip install 'cvat_sdk[pytorch]' \ --extra-index-url=https://download.pytorch.org/whl/cpu $ . ./venv/bin/activate‍Now we will need a dataset. Normally, you would use the PyTorch adapter with your own annotated dataset that you already have in CVAT, but for demonstration purposes we’ll use a small public dataset instead.‍To follow along, you will need an account on the public CVAT instance, app.cvat.ai. If you have access to a private CVAT instance, you can use that instead. 
Save your CVAT credentials in environment variables so CVAT SDK can authenticate itself:‍$ export CVAT_HOST=app.cvat.ai $ export CVAT_USER='<your username>' CVAT_PASS $ read -rs CVAT_PASS <enter your password and hit Enter>‍The dataset we’ll be using is the Flowers Dataset available in the Harvard Dataverse Repository. This dataset is in an ad-hoc format, so we won’t be able to directly import it into CVAT. Instead, we’ll upload it using a custom script. We won’t need the entire dataset for this demonstration, so the script will also reduce it to a small fraction.‍Get that script from our blog repository and run it:‍$ python3 upload-flowers.py‍The script will create tasks for the train, test and validation subsets, and print their IDs. If you open the Tasks page, you will see that the tasks have indeed been created:‍‍And if you open any of these tasks and click the “Job #XXXX” link near the bottom, you will see that each image has a single annotation associated with it: a tag representing the type of the flower.‍‍Interactive usage‍Note: the code snippets from this section are also available as a Jupyter Notebook.We’re now ready to try the PyTorch adapter. Let’s start Python and create a CVAT API client:‍$ python3 Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import logging, os >>> from cvat_sdk import * >>> # configure logging to see what the SDK >>> # is doing behind the scenes >>> logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(message)s') >>> client = make_client(os.getenv('CVAT_HOST'), credentials=( os.getenv('CVAT_USER'), os.getenv('CVAT_PASS')))‍Now let’s create a dataset object corresponding to our training set. To follow along, you will need to substitute the task ID in the first line with the ID of the Flowers-train task that was printed when you ran the upload-flowers.py script.‍>>> TRAIN_TASK_ID = 77708 >>> from cvat_sdk.pytorch import * >>> train_set = TaskVisionDataset(client, TRAIN_TASK_ID) INFO - Fetching task 77708... INFO - Task 77708 is not yet cached or the cache is corrupted INFO - Downloading data metadata... INFO - Downloaded data metadata INFO - Downloading chunks... INFO - Downloading chunk #0... INFO - Downloading chunk #1... INFO - Downloading chunk #2... INFO - Downloading chunk #3... INFO - Downloading chunk #4... INFO - All chunks downloaded INFO - Downloading annotations... INFO - Downloaded annotations‍As you can see from the log, the SDK has downloaded the data and annotations for our task from the server. All subsequent operations on train_set will not involve network access.‍But what is train_set, anyway? Examining it will reveal that it is a PyTorch Dataset object. Therefore we can query the number of samples in it and index it to retrieve individual samples.‍>>> import torch.utils.data >>> isinstance(train_set, torch.utils.data.Dataset) True >>> len(train_set) 354 >>> train_set[0] ( <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=320x263 at 0x7F4CFE7E52D0>, Target( annotations=FrameAnnotations(tags=[ { 'attributes': [], 'frame': 0, 'group': None, 'id': 426655, 'label_id': 431494, 'source': 'manual' } ], shapes=[]), label_id_to_index=mappingproxy({431492: 0, 431493: 1, 431494: 2, 431495: 3, 431496: 4}) ) )‍The sample format is broadly compatible with that used by torchvision datasets. 
Each sample is a tuple of two elements:The first element is a PIL.Image object.The second element is a cvat_sdk.pytorch.Target object representing the annotations corresponding to the image, as well as some associated data.The annotations in the Target object are instances of LabeledImage and LabeledShape classes from the CVAT SDK, which are direct representations of CVAT’s own data structures. This means that any properties you can set on annotations in CVAT — such as attributes & group IDs — are available for use in your code.In this case, though, we don’t need all this flexibility. After all, the only information contained in the original dataset is a single class label for each image. To serve such simple scenarios, CVAT SDK provides a couple of transforms that reduce the target part of the sample to a simpler data structure. For this scenario (image classification with one tag per image), the transform is called ExtractSingleLabelIndex. Let’s recreate the dataset with this transform applied:>>> train_set = TaskVisionDataset(client, TRAIN_TASK_ID, target_transform=ExtractSingleLabelIndex()) INFO - Fetching task 77708... INFO - Loaded data metadata from cache INFO - Downloading chunks... INFO - All chunks downloaded INFO - Loaded annotations from cache‍Note that the task data was not redownloaded again, as it had already been cached. The SDK only made one query to the CVAT server, in order to see if the task had changed.Here’s what the sample targets look like with the transform configured:>>> for i in range(3): print(train_set[i]) ... (<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=320x263 at 0x7F4CFE7E5720>, tensor(2)) (<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=320x213 at 0x7F4CFE7E56C0>, tensor(0)) (<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x330 at 0x7F4CFE7E5720>, tensor(4))‍Each target is now simply a 0-dimensional PyTorch tensor containing the label index. These indices are automatically assigned by the SDK. You can also use these indices without applying the transform; they are provided by the label_id_to_index field on the Target objects.‍‍ExtractSingleLabelIndex requires each sample to have a single tag. If a sample fails this requirement, the transform will raise an exception when that sample is retrieved. ‍Our dataset is now almost ready to be used for model training, except that we’ll also need to transform the image, as PyTorch cannot directly accept a PIL image as input. torchvision supplies a variety of transforms to convert and postprocess images, which can be applied using the transform argument. For example:‍>>> import torchvision.transforms as transforms >>> train_set = TaskVisionDataset(client, TRAIN_TASK_ID, transform=transforms.ToTensor(), target_transform=ExtractSingleLabelIndex()) INFO - Fetching task 77708... INFO - Loaded data metadata from cache INFO - Downloading chunks... INFO - All chunks downloaded INFO - Loaded annotations from cache >>> train_set[0] (tensor([[[0.5294, 0.5412, 0.5569, ..., 0.6000, 0.6118, 0.5804], [0.5255, 0.5373, 0.5529, ..., 0.6000, 0.6118, 0.5804], [0.5216, 0.5333, 0.5529, ..., 0.6000, 0.6078, 0.5725], [...snipped...] 
[0.1020, 0.1020, 0.1020, ..., 0.4980, 0.4980, 0.4980]]]), tensor(2))‍Full model training & evaluation exampleEquipped with the functionality that we just covered, we can now plug a CVAT dataset into a PyTorch training/evaluation pipeline and have it work the same way it would with any other dataset implementation.‍Programming an entire training pipeline in interactive mode is a bit cumbersome, so instead we published two example scripts that showcase using a CVAT dataset as part of a simple ML pipeline. You can get these scripts from our blog repository.‍The first script trains a neural network (specifically ResNet-34, provided by torchvision) on our sample dataset (or any other dataset with a single tag per image). You run it by passing the training task ID as an argument:‍$ python3 train-resnet.py 77708 2023-02-03 16:55:17,268 - INFO - Starting... 2023-02-03 16:55:18,623 - INFO - Created the client 2023-02-03 16:55:18,623 - INFO - Fetching task 77708... 2023-02-03 16:55:18,867 - INFO - Loaded data metadata from cache 2023-02-03 16:55:18,867 - INFO - Downloading chunks... 2023-02-03 16:55:18,869 - INFO - All chunks downloaded 2023-02-03 16:55:18,901 - INFO - Loaded annotations from cache 2023-02-03 16:55:19,103 - INFO - Created the training dataset 2023-02-03 16:55:19,104 - INFO - Created data loader 2023-02-03 16:55:20,407 - INFO - Started Training 2023-02-03 16:55:20,407 - INFO - Starting epoch #0... 2023-02-03 16:55:32,451 - INFO - Starting epoch #1... 2023-02-03 16:55:44,086 - INFO - Finished training‍It saves the resulting weights in a file named weights.pth. The evaluation script will read these weights back and evaluate the network on a validation subset—which you, again, specify via a CVAT task ID:‍$ # this script uses the torchmetrics library to calculate accuracy $ pip install torchmetrics $ python3 eval-resnet.py 77709 2023-02-03 16:58:32,745 - INFO - Starting... 2023-02-03 16:58:33,669 - INFO - Created the client 2023-02-03 16:58:33,669 - INFO - Fetching task 77709... 2023-02-03 16:58:33,887 - INFO - Task 77709 is not yet cached or the cache is corrupted 2023-02-03 16:58:33,889 - INFO - Downloading data metadata... 2023-02-03 16:58:34,107 - INFO - Downloaded data metadata 2023-02-03 16:58:34,108 - INFO - Downloading chunks... 2023-02-03 16:58:34,109 - INFO - Downloading chunk #0... 2023-02-03 16:58:34,873 - INFO - All chunks downloaded 2023-02-03 16:58:34,873 - INFO - Downloading annotations... 2023-02-03 16:58:35,166 - INFO - Downloaded annotations 2023-02-03 16:58:35,362 - INFO - Created the testing dataset 2023-02-03 16:58:35,362 - INFO - Created data loader 2023-02-03 16:58:35,749 - INFO - Started evaluation 2023-02-03 16:58:36,355 - INFO - Finished evaluation Accuracy of the network: 80.00%‍Since training involves randomness, you may end up seeing a slightly different accuracy number.‍Working with objectsNote: the code snippets from this section are also available as a Jupyter Notebook.‍The PyTorch adapter also contains a transform designed to simplify working with object detection datasets. First, let’s see how raw CVAT shapes are represented in the CVAT SDK.‍Open the Flowers-train task, click on the “Job #XXX” link, open frame #2, and draw rectangles around some sunflowers:‍‍Press “Save”. 
Now, restart Python and reinitialize the client:‍>>> import logging, os >>> from cvat_sdk import * >>> from cvat_sdk.pytorch import * >>> logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(message)s') >>> client = make_client(os.getenv('CVAT_HOST'), credentials=( os.getenv('CVAT_USER'), os.getenv('CVAT_PASS'))) >>> TRAIN_TASK_ID = 77708‍Create the dataset again:‍>>> train_set = TaskVisionDataset(client, TRAIN_TASK_ID) INFO - Fetching task 77708... INFO - Task has been updated on the server since it was cached; purging the cache INFO - Downloading data metadata... INFO - Downloaded data metadata INFO - Downloading chunks... INFO - Downloading chunk #0... [...snipped...] INFO - All chunks downloaded INFO - Downloading annotations... INFO - Downloaded annotations‍Note that since we have changed the task on the server, the SDK has redownloaded it.‍Now let’s examine the frame that we modified:‍>>> train_set[2] ( <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x330 at 0x7F98AD088220>, Target( annotations=FrameAnnotations( tags=[ { 'attributes': [], 'frame': 2, 'group': None, 'id': 426657, 'label_id': 431496, 'source': 'manual' } ], shapes=[ { 'attributes': [], 'elements': [], 'frame': 2, 'group': 0, 'id': 41000665, 'label_id': 431496, 'occluded': False, 'outside': False, 'points': [ 170.1162758827213, 158.9655911445625, 349.43134126663244, 329.23956079483105 ], 'rotation': 0.0, 'source': 'manual', 'type': 'rectangle', 'z_order': 0 }, [...snipped...] ] ), label_id_to_index=mappingproxy({431492: 0, 431493: 1, 431494: 2, 431495: 3, 431496: 4}) ) )‍You can see the newly-added rectangles listed in the shapes field. As before, the values representing the rectangles contain all the properties that are settable via CVAT.‍Still, if you’d prefer to work with a simpler representation, there’s a transform for you: ExtractBoundingBoxes.‍>>> train_set = TaskVisionDataset(client, TRAIN_TASK_ID, target_transform=ExtractBoundingBoxes( include_shape_types=['rectangle'])) INFO - Fetching task 77708... INFO - Loaded data metadata from cache INFO - Downloading chunks... INFO - All chunks downloaded INFO - Loaded annotations from cache >>> train_set[2] ( <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x330 at 0x7F98C414B7F0>, { 'boxes': tensor([ [170.1163, 158.9656, 349.4313, 329.2396], [255.2533, 59.5135, 458.6779, 256.9108], [117.3765, 115.2670, 240.9382, 253.8971] ]), 'labels': tensor([4, 4, 4]) } )‍The output of this transform is a dictionary with keys named “boxes” and “labels”, and tensor values. The same format is accepted by torchvision’s object detection models in training mode, as well as the mAP metric in torchmetrics. So if you want to use those components with CVAT, you can do so without additional conversion.‍Closing remarksThe PyTorch adapter is still new, so it has some limitations. Most notably, it does not support track annotations and video-based datasets. Still, we hope that even in its early stages it can be useful to you.‍Meanwhile, we are working on extending the functionality of the adapter. The development version of CVAT SDK already features the following additions:‍A ProjectVisionDataset class that lets you combine multiple tasks in a CVAT project into a single dataset.Ability to control the cache location.Ability to disable network usage (provided that the dataset has already been cached).If you have suggestions for how the adapter may be improved, you’re welcome to create a feature request on CVAT's issue tracker.‍‍
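If you would rather see the training pieces stitched together in one place, here is a minimal sketch of a training loop built on the classes shown above (TaskVisionDataset and ExtractSingleLabelIndex). It is a compressed illustration along the lines of the published train-resnet.py script, not a copy of it; the task ID, class count, image size, and hyperparameters are placeholders.

```python
# Minimal training-loop sketch using the PyTorch adapter (illustrative only;
# the task ID, class count, and hyperparameters below are placeholders).
import os
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

from cvat_sdk import make_client
from cvat_sdk.pytorch import TaskVisionDataset, ExtractSingleLabelIndex

TRAIN_TASK_ID = 77708  # replace with the ID printed by upload-flowers.py
NUM_CLASSES = 5        # the Flowers subset used in this post has 5 labels

client = make_client(os.getenv("CVAT_HOST"),
                     credentials=(os.getenv("CVAT_USER"), os.getenv("CVAT_PASS")))

# Resize so that samples of different sizes can be batched together.
train_set = TaskVisionDataset(
    client,
    TRAIN_TASK_ID,
    transform=transforms.Compose([transforms.Resize((224, 224)),
                                  transforms.ToTensor()]),
    target_transform=ExtractSingleLabelIndex(),
)
loader = DataLoader(train_set, batch_size=16, shuffle=True)

model = models.resnet34(weights=None, num_classes=NUM_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(2):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"finished epoch {epoch}")

torch.save(model.state_dict(), "weights.pth")
```

Saving the weights to weights.pth mirrors what the published training script does, so an evaluation step like the eval-resnet.py example above could pick them up afterwards.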
Tutorials & How-Tos
February 14, 2023

CVAT SDK PyTorch adapter: using CVAT datasets in your ML pipeline

Blog
CVAT already has impressive automatic annotation abilities with its built-in models. Today, we are announcing that we have advanced them further by adding third-party DL models from Hugging Face and Roboflow. Depending on the model, this integration can increase annotation speed by an order of magnitude. In this article, we show how these models can improve your annotation process.

Introduction to Hugging Face and Roboflow

Hugging Face and Roboflow are very popular online platforms providing ready-to-use services for artificial intelligence.

Hugging Face provides a large collection of pre-trained machine learning models for Natural Language Processing (NLP) and Computer Vision. It offers an API for the models and integrates with various deep learning frameworks. The platform lets developers and researchers quickly experiment with different models and use them for tasks such as sentiment analysis, text classification, language translation, and image classification, among others.

Roboflow is a comprehensive platform for managing and preprocessing datasets and models. It offers a wide range of models to meet different needs, including object detection, image classification, image segmentation, and many more. Whether you are working on a simple task or a complex project, you can use one of more than 7,000 pre-trained models and focus on your project instead of training models from scratch.

CVAT leverages the strengths of both Hugging Face and Roboflow by integrating their models into its platform, resulting in an efficient and smooth data annotation workflow.

CVAT integration with Hugging Face and Roboflow

The integration of Roboflow and Hugging Face models into CVAT unlocks a lot of potential for data annotation. With the convenience of the CVAT interface, you can now harness the power of leading models and annotate your data at lightning speed. Adding these models to CVAT is a breeze, thanks to a user-friendly interface designed for that purpose.

To add a model from Roboflow, a few requirements must be met. First, create an account; then you will need the model URL and the API key, both of which can be found in the Roboflow Universe. Simply locate the desired model, but keep in mind that the CVAT integration only supports image classification, object detection, and image segmentation models. For testing and experimenting with the new feature, we suggest trying the following models (or use the search):

License Plate Recognition: a pre-trained model for license plate recognition.
Hard Hats: a model designed to detect hard hats.
Face Detection: a pre-trained model for face detection.
Mask Wearing: a model designed to detect faces wearing masks.

Click on the model's name and scroll down to the Hosted API section. That is where you will find the model URL and the API key.

To integrate Hugging Face models with CVAT, first create an account on the Hugging Face website. Once you have logged in, access your User Access Token from the settings page. Choose a model from the list of available models. As with Roboflow, keep in mind that the CVAT integration only supports image classification, object detection, and image segmentation models. Once you have clicked on the model's name, you will get the model URL.

To add a model to CVAT, first log in and navigate to the Models section. From there, click the Add New Model (+) button and enter the model URL. Once the model URL has been added, CVAT will automatically detect the provider. The final step is to enter the API key (or User Access Token if you are using Hugging Face) and click Submit.

The model will appear on the Models page. Click on it to see the predefined labels. Now all that is left is to create a task with the model's predefined labels, and the model will be accessible from the CVAT tools for both manual annotation of individual images and automatic annotation of multiple images or videos. And you can start annotating! (If you want a quick feel for what such a hosted model predicts before wiring it into CVAT, see the small sketch at the end of this post.)

To wrap it up: the combination of Hugging Face, Roboflow, and CVAT is a game-changer for computer vision projects, offering the convenience and versatility of pre-trained models from both platforms. The integration of these tools results in a smooth and intuitive annotation process.
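As a quick way to preview what a detection model returns before connecting it to CVAT, here is a small sketch that runs a Hugging Face object-detection model locally through the transformers pipeline. The model name (facebook/detr-resnet-50) and the image path are just examples; this is a local preview, not part of the CVAT integration itself, which only needs the model URL and your access token.

```python
# Preview a Hugging Face object-detection model locally (illustrative only).
# Assumes: pip install transformers timm pillow torch
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")

# "street.jpg" is a placeholder path to any local image.
predictions = detector("street.jpg")
for pred in predictions:
    # Each prediction carries a label, a confidence score, and box coordinates.
    print(f'{pred["label"]:12s} score={pred["score"]:.2f} box={pred["box"]}')
```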
Product Updates
February 2, 2023

Streamline annotation by integrating Hugging Face and Roboflow models

Blog
In this blog post, we will provide you with insight into two annotation techniques that can significantly accelerate the process.

The first technique is interpolation mode. This method allows you to annotate multiple frames automatically by specifying a frame step. The process is easy to follow and comprises three basic steps, which are outlined in detail in this video.

The second technique introduces the use of polygons for annotation. This method is particularly useful for object annotation tasks that require precise boundaries. It's essential to note that these two techniques are not mutually exclusive and can be used in conjunction to achieve even more efficient annotation. For example, annotation with polygons makes use of interpolation mode where necessary to speed up the annotation process. The process itself is straightforward and comprises four steps, which are outlined in detail in this video.

In conclusion, annotation can be a time-consuming task, but with the right tools and techniques, you can make it faster. By utilizing interpolation mode and polygons for annotation, you will greatly speed up the process and achieve more precise results. Don't hesitate to check out the videos for more information and examples, and feel free to provide feedback to improve the process and the tools.

We are waiting for your feedback here:

Discord
LinkedIn
Gitter
GitHub

You can find more information on our YouTube channel
Product Updates
January 24, 2023

Annotating Smarter: Interpolation and Polygons in Practice

Blog
Object detection is a field within computer vision that involves identifying and locating objects within an image. Advances in object detection algorithms have made it possible to detect objects in real time, even as they are moving.

There are a number of object detection technologies available. One of the most popular to date is the YOLO object detector. Thanks to its high speed, YOLO is close to the gold-standard algorithm for object detection, and it finds widespread application in crucial areas like security and surveillance, traffic management, autonomous vehicles, and healthcare. In this article, we will learn how YOLO works and how you can use it to annotate images in CVAT automatically.

A brief history of YOLO

Early object detectors were mainly region-based. They used a two-step process to detect objects. In the first step, these algorithms proposed regions of interest that are likely to contain objects. In the second step, they classified the images in these proposed regions.

Some of the popular region-based algorithms include:

R-CNN
Fast R-CNN
Faster R-CNN
R-FCN
Mask R-CNN

Region-based detectors

R-CNN (Regions with CNN features) was the first region-based object detector, proposed in 2014. This detector used selective search to cluster similar pixels into regions and generate a set of region proposals. These regions were then fed into a Convolutional Neural Network (CNN) to generate a feature vector, which was used to classify the object and put a bounding box around it. Among other limitations, this algorithm proved to be quite time-intensive.

It was succeeded by the Fast R-CNN detector, which put the whole image through a CNN and used ROI pooling to extract the region proposals. The resulting feature vectors were passed through several fully connected layers for classification and bounding box regression. Although this was faster than R-CNN, it was still not fast enough, since it still relied on selective search.

The Faster R-CNN detector drastically sped up the detection process by getting rid of the selective search approach and using a Region Proposal Network instead. This network used an 'objectness score' to produce a set of object proposals; the objectness score indicates how confident the network is that a given region contains an object.

Another approach was the R-FCN detector, which used position-sensitive score maps.

All of the methods above require two steps to detect objects in an image:

Detect the object regions
Classify the objects in those regions

This two-step process made object detection quite slow. A more sophisticated approach was required if object detection was to be used in real-time applications.

Emergence of YOLO

The YOLO algorithm was first proposed by Joseph Redmon et al. in 2015. In contrast to earlier object detection algorithms, YOLO does not use region proposals to find objects in a given image, nor does it require multiple passes over the same image. It passes the entire image through a convolutional neural network that simultaneously locates and classifies objects in one go. That is how the algorithm gets its name: You Only Look Once. This approach enables the algorithm to run substantially faster than other object detection algorithms.

How YOLO works

The YOLO algorithm divides the image into an NxN grid of cells (typically 19x19). It then predicts B bounding boxes in each cell of the grid.
For each bounding box, the algorithm predicts three things:

The probability that it contains an object
The offset values for the bounding box corresponding to that object
The most likely class of the object

After this, the algorithm keeps only the bounding boxes that most likely contain an object.

IoU (Intersection Over Union)

The YOLO algorithm uses a measure called IoU to determine how close a detected bounding box is to the actual one. IoU is a measure of the overlap between two bounding boxes. During training, the YOLO algorithm computes the IoU between the bounding box predicted by the network and the ground truth (the bounding box that was pre-labeled for training). It is calculated as follows:

IoU = area of intersection of the overlapping boxes / area of union of the overlapping boxes

An IoU of 1 means that the two bounding boxes overlap completely, whereas an IoU of 0 means they are completely disjoint. A threshold for the IoU is fixed, and only bounding boxes with an IoU above the threshold are retained, while the others are ignored. This eliminates a lot of unnecessary bounding boxes, so that you are left with the ones that best fit the object. (A small Python sketch of IoU and non-maximum suppression appears at the end of this article.)

Non-Maximum Suppression

At inference time, since a number of cells may detect the same object, you can be left with several bounding boxes corresponding to the same object. YOLO takes care of this using a technique called non-maximum suppression. Non-max suppression first selects the bounding box with the highest probability score and removes (suppresses) all other boxes that have a high overlap with it. This again makes use of IoU, this time between all the candidate bounding boxes and the one with the highest probability score.

Bounding boxes that have a high IoU with the most probable bounding box are considered redundant and are removed. Those with a low IoU are considered to belong to a different object of the same class and are retained. In this way, the YOLO algorithm selects the most appropriate bounding box for each object.

The YOLO Architecture

YOLO is essentially a CNN (Convolutional Neural Network). The YOLOv1 network consists of 24 convolutional layers and 4 max-pooling layers, followed by 2 fully connected layers. The model resizes the input image to 448x448 before passing it through the CNN. The convolutional layers alternate 1x1 reduction layers with 3x3 convolutional layers to shrink the feature space as the image goes deeper into the network. The final layer uses a linear activation function, while all the other layers use leaky ReLU.

Limitations of YOLO

The YOLO algorithm was a great leap forward in the field of object detection. Since it can process frames much faster than traditional object detection systems, it is ideal for real-time object detection and tracking. However, it does come with some limitations. The YOLO model struggles when there are small objects in the image. It also struggles when objects are too close to one another: for example, in an image of a flock of birds, the model would not be able to detect the individual birds very accurately.

Popular YOLO Variations

To overcome the limitations of YOLOv1, many new versions of the algorithm have been introduced over the years. YOLOv2 was introduced in 2016 by the same author, Joseph Redmon.
It addressed the most important limitations of YOLOv1: localization accuracy and the detection of small, clustered objects. The new model allowed the prediction of multiple bounding boxes (anchor boxes) per grid cell, so more than one object could now be detected in a single cell. Moreover, to improve accuracy, the model used batch normalization in its convolutional layers. YOLOv2 uses the Darknet-19 network, which consists of 19 convolutional layers and 5 max-pooling layers.

The same work also introduced YOLO9000, which was trained jointly on the COCO detection dataset and the ImageNet classification dataset, allowing it to detect more than 9,000 object classes.

When YOLOv3 came about, it brought an architectural novelty that made up for the limitations of both YOLO and YOLOv2, so much so that it is still one of the most popular YOLO versions to date. This model uses a much more complex network, Darknet-53, which gets its name from the 53 convolutional layers that make up its backbone. The full model consists of 106 layers, with feature maps extracted at 3 different layers. This allows the network to predict at 3 different scales, which makes it especially good at detecting smaller objects.

Besides that, YOLOv3 uses independent logistic classifiers for each class instead of a softmax function (used in the previous YOLO models). This allows the model to assign multiple labels to a single object: for example, an object could be labeled as both a 'man' and a 'person'.

After YOLOv3, other authors introduced newer versions of YOLO. For example, Alexey Bochkovskiy introduced YOLOv4 in 2020. This version mainly increased the speed and accuracy of the model with techniques like weighted residual connections, cross mini-batch normalization, and more. Many other versions have followed YOLOv4, such as YOLOv5, YOLACT, PP-YOLO, and more. The latest version to date is YOLOv7; its paper was released in July 2022 and is already quite popular. According to its authors, YOLOv7 outperforms most conventional object detectors, including YOLOR, YOLOX, and YOLOv5. In fact, YOLOv7 is being hailed by its authors as the 'New State-of-the-Art for Real-Time Object Detectors'.

How you can use YOLO in CVAT

To train any object detection model on image data, you need pre-annotated images (containing labeled bounding boxes). There are a number of tools available, both online and offline, to help you do this. One such tool is CVAT (Computer Vision Annotation Tool): a free, open-source online tool that helps you label image data for computer vision algorithms. Using this tool, you can annotate your images and videos right from your browser. Here is a quick tutorial on how to annotate objects in an image using CVAT.

Using CVAT to Annotate Images

Let's say you have the following image and you want to put bounding boxes and labels on the two cars, the dog, and the pedestrians. To do so, go to cvat.ai, create an account, and upload an image. The upload process includes several steps. The first step is to set up a project and add a task with the labels of choice (in this case: 'pedestrian', 'dog', and 'car'). The second step is to upload one or more images you want to annotate and click 'Submit and Open'. Once everything is in place, you will see your task and all its details as a new job (with a new job number).
The window below is the Task dashboard. Click on the job number link, and it will take you to the annotation interface. Now you can start annotating.

How to Manually Annotate Objects in an Image

In this example we show annotation with rectangles. To add a rectangular bounding box manually, select the proper tool on the controls sidebar. Hover over 'Draw new Rectangle' and, from the drop-down list, select the label you want to assign to the annotated object. Click 'Shape'.

With a rectangle, you can annotate using either 2 or 4 points. If you chose 2 points, simply click on the top left corner and then the bottom right corner of the object. CVAT will put bounding boxes with the specified labels around the objects.

This method works well when you do not have too many objects in the image. But if you have a lot of them, the manual method can get quite tedious. For cases with many objects, CVAT has a more efficient tool to get the job done: YOLO object detection.

Using YOLO to Quickly Annotate Images in CVAT

CVAT incorporates YOLO object detection as a quick annotation tool. You can automate the annotation process by using a YOLO model instead of manually labeling each object. Currently, two YOLO versions are available in CVAT: YOLO v3 and YOLO v5. In this example, we will use YOLO v3. To use the YOLO v3 object detector, hover over AI Tools on the controls sidebar and go to the Detectors tab. You will see a menu with a drop-down list of available models. From this drop-down, select 'YOLO v3'.

The next step is label matching. This is needed because some models are trained on datasets with a predefined list of labels. 'YOLO v3' is such a model, and to start annotating you need to give CVAT a hint about how the model's labels correspond to the ones you've added to your task. For example, say you want to label all the people in the image and you added a 'pedestrian' label in CVAT. The closest YOLO label for this type of object is 'person', so you match the YOLO label 'person' to the CVAT label 'pedestrian' in the Detectors menu. Luckily, for the other objects there is no need to think twice, as YOLO already has 'dog' and 'car' labels.

Once you're done matching the labels, click 'Annotate'. CVAT will use YOLO to annotate all the objects for which you have specified labels. After the annotation is done, save your task by clicking the Save button, or export your annotations in the .xml format from Menu > Export Job Dataset.

Quickly Annotating Objects in Videos

You can use CVAT Automatic Annotation with the YOLO detector to label objects in videos directly from your Task dashboard in a few simple steps. The first step is to find the task with the required video. Once you've identified it, hover over the three dots to open the pop-up menu. In the menu, click Automatic Annotation to open the dialog box, and from the drop-down menu select 'YOLO v3'. The second step is to check the label matching and adjust it to fit your needs if necessary. When all is set and ready, click 'Annotate' to start labeling the objects in the video. Automatic annotation takes some time to complete; the progress bar will show the status of the process.

When it is done, you will see a notification box with a link to the task. Click on the link to open the task dashboard, and then click the job link to open the annotation interface, where you will see the video with objects automatically labeled in every frame. You can now go ahead and edit the annotations as needed if you find any false positives or false negatives.

Conclusion

YOLO is a specialized convolutional neural network that detects objects in images and videos. It gets its name (You Only Look Once) from its technique of localizing and classifying objects in an image in just one forward pass over the network. The YOLO algorithm brought a major improvement in inference speed over previous two-stage object detection algorithms like R-CNN and Faster R-CNN. In an attempt to increase the speed and accuracy of object detection even further, numerous versions of YOLO have been introduced over the years; the latest is YOLOv7.

Using YOLO on the CVAT platform, you can annotate images and videos within minutes, significantly reducing the amount of manual work that image annotation usually calls for. We hope this tutorial helped you understand the concept and architecture of YOLO, and that you can now use it to detect and annotate objects in your own image data.
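As promised above, here is a minimal sketch of the two box-filtering ideas discussed in this article: IoU between two axis-aligned boxes, and a simple greedy non-maximum suppression. It is a plain illustration of the formulas, not the exact implementation used inside any YOLO version.

```python
# Illustrative IoU and greedy non-maximum suppression (not YOLO's exact code).
# Boxes are (x1, y1, x2, y2); detections are (box, score) pairs.

def iou(a, b):
    # Intersection rectangle between the two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    # Keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept

if __name__ == "__main__":
    dets = [((10, 10, 100, 100), 0.9),
            ((12, 15, 105, 98), 0.75),   # near-duplicate of the first box
            ((200, 50, 260, 120), 0.8)]
    print(nms(dets))  # the near-duplicate is suppressed
```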
Tutorials & How-Tos
January 2, 2023

How to automatically detect objects with YOLO in CVAT

Blog
TL;DR: The quality of Deep Learning-based algorithms strongly depends on the quality of training data employed. This is especially true in the Computer Vision domain. Poor data quality leads to worse predictions, increased training times, and the need for bigger datasets. FiftyOne and CVAT can be used together to help you produce high-quality training data for your models. Keep reading to see how! ‍‍IntroductionRecently, the “Data-Centric movement” has been gaining popularity in the machine learning space. Over the last decade, improvements in machine learning primarily focused on models, while datasets remained largely fixed. As a community, we looked for better network architectures, created scalable models, and even implemented automatic architecture search. At present, however, the performance of our increasingly powerful models is limited by the datasets on which they are trained and validated. ‍In practice, datasets rarely stay fixed. They are constantly changing as more data is collected, annotated, and models are retrained. This iterative model improvement process is called Data Loop, illustrated in the image below.‍‍It is generally established that the more high-quality data you feed into the model, the better performance it achieves. The estimations are (eg. [1], [2]) that to reduce the training error by half, you need four times more data. But there’s a tradeoff: the more data you use in the training, the more time is needed for the training itself, as well for the annotation. And, unlike model training, because the annotation process is largely human-led, it can’t be simply sped up by more performant hardware.‍That’s why it is important to keep datasets just the right size to be able to annotate data quickly and with high quality. The smaller the dataset, the better the annotation quality required to achieve good training results. Annotations must not contradict each other and be accurate. Since the annotations are done by people, they require validation. And that’s where tools, like FiftyOne and CVAT, can greatly help.‍FiftyOne is an open-source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.‍CVAT is one of the leading open-source solutions for annotating Computer Vision datasets. It allows you to create quality annotations for images, videos and 3D point clouds and prepare ready-to-use datasets. It has an online platform and can be deployed on your computer or cluster. It is a scalable solution both for personal use and for big teams.‍In this blog post we will demonstrate how you can use these tools to create high-quality annotations for a dataset, validate the annotations, and detect and fix problems.‍Follow along with the code in this post through this Colab notebook.‍Dataset CurationTo demonstrate a data-centric ML workflow, we will create an object detection dataset from raw image data. We will use images from the MS COCO dataset. This dataset is available in the FiftyOne Dataset Zoo. You can easily download custom subsets of the dataset and load them into FiftyOne. 
This dataset does have object-level annotations, but we will avoid using them in order to show how to annotate a dataset from scratch.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types=[],
)

# Visualize the dataset in FiftyOne
session = fo.launch_app(dataset)

While it is easy to load a large dataset into FiftyOne for visualization, it is much harder (and often a waste of time and money) to annotate an entire dataset. A better approach is to find a subset of data that would be valuable to annotate and start with that, adding samples as needed. A subset of a dataset can be useful if it contains an even distribution of visually unique samples, maximizing the informational content per image. For example, if you are training a dog detector, it would be better to use images from a range of different dog breeds rather than only using images of a single breed.

With FiftyOne, we can use the FiftyOne Brain to find a subset of the unlabeled dataset with visually unique images.

import fiftyone.brain as fob

# Generate embeddings
model = foz.load_zoo_model("clip-vit-base32-torch")
embeddings = dataset.compute_embeddings(model)

results = fob.compute_similarity(
    dataset,
    embeddings=embeddings,
    brain_key="image_sim",
)
results.find_unique(500)

unique_view = dataset.select(results.unique_ids)
session.view = unique_view

Then we visualize these unique samples in the FiftyOne App.

Note: You can use the FiftyOne Brain compute_visualization() method to visualize an interactive plot of your embeddings to identify other patterns in your dataset.

These visually unique images give us a diverse subset of samples for training, while at the same time reducing the amount of annotation that needs to be performed. As a result, this procedure can significantly lower annotation costs. Of course, there is a lower bound on the number of samples needed to sufficiently train your model, so you will want to iteratively add more unique samples to your dataset as needed.

Dataset Annotation

Now that we've decided on the subset of samples in the dataset that we want to annotate, it's time to add some labels. We will be using CVAT, one of the leading open-source annotation tools, to create annotations on these samples. CVAT and FiftyOne have a tight integration, allowing us to take the subset of unique samples in FiftyOne and load them into CVAT in just one Python command.

results = unique_view.annotate(
    "annotation_key",
    label_type="detections",
    label_field="ground_truth",
    classes=["airplane", "apple", …],
    backend="cvat",
    launch_editor=True,
)

Since this annotation process can take some time, we will want to make sure that our dataset is persisted in FiftyOne so that we can load it again at some point in the future when the annotation process is complete.

dataset.persistent = True

# Optionally give it a custom name
dataset.name = "cvat-fiftyone-demo"

# In the future, in a new Python process
dataset = fo.load_dataset("cvat-fiftyone-demo")

After our data is uploaded into CVAT, a web browser page should open. We will see the main annotation window, where we can create, modify, and delete annotations. Different tools are available on this window's toolbar, so we can draw polygons, rectangles, masks, and several other shapes. In the object detection task, the primary annotation type is the bounding box. Let's draw one using the corresponding tool from the toolbar. Now, we can set the label and other attributes for the created rectangle.
In this case, we used the “cat” label. After we’ve finished annotating this object, we can continue annotating other objects and images the same way. After all objects are annotated, we save work by clicking the Save button. Then, we can click the Menu button above and the Open the task button in the menu to open the task overview page.‍In CVAT, the data is organized into Projects, Tasks, and Jobs. Each Project represents a dataset with multiple subsets (or splits) and can have one or many Tasks. You can manage tasks inside a project, join them into subsets, and export and import the data. A Task represents an annotation assignment for a person or several people. The Task can be treated as a dataset, but its primary role is to organize and split the big workload into smaller chunks. Each Task is divided into Jobs to be annotated.‍CVAT supports different scenarios. In typical scenarios the datasets are big - from hundreds to millions of images. Datasets like these are annotated in teams divided into squads with different assignments: annotating, and reviewing of the annotated data. In CVAT, we can do both these assignments. We can assign people to jobs using the Assignee field. If there is a person to review our work and we want the annotations to be reviewed, we need to change the job Stage to “validation”:‍Now, the reviewer can open the job and comment on the problems found. The user interface now will allow us to create Issues. The issues are just comments in the free form, though CVAT provides several options to mark common problems with annotation with just a single click.Once the review is finished, we click the Save button, and return back to the task page. If everything is annotated correctly, we can mark the job as accepted and move onto other tasks. If there are problems found during the review, we can switch the job back to the annotation stage and assign it back to the annotator again.‍Now, the annotator will be able to fix the problems and leave comments on the issues. This process can take several turns before the dataset is annotated correctly. ‍When we finish annotating this batch of samples we can again use the CVAT and FiftyOne integration to easily load the updated annotations back into FiftyOne.‍unique_view.load_annotations("annotation_key")‍Dataset ImprovementWith the annotations loaded into our FiftyOne dataset, we can make use of the powerful querying and evaluation capabilities that FiftyOne provides. You can use them in the FiftyOne Python SDK and the FiftyOne App to analyze the quality of the annotations and the dataset as a whole. Dataset quality is a fairly vague concept that can depend on several factors, such as the accuracy of labels, spatial tightness of bounding boxes, class hierarchy in the annotation schema, “difficulty” of samples, inclusion of edge cases, and more. However, with FiftyOne, you can easily analyze any number of different measures of “dataset quality”.‍For example, in object detection datasets, having the same object annotated multiple times with duplicate bounding boxes is detrimental to model performance. 
We can use FiftyOne to automatically find potential duplicate bounding boxes based on the IoU overlap between them, and then visually analyze if it actually is a duplicate or if it is just two closely overlapping objects.‍import fiftyone.utils.iou as foui from fiftyone import ViewField as F foui.compute_max_ious( dataset, "ground_truth", iou_attr="max_iou", classwise=True, ) dups_view = dataset.filter_labels( "ground_truth", F("max_iou") > 0.75 ) session.view = dups_view‍We can then tag these samples in the FiftyOne App as needing reannotation in CVAT.‍‍Note: Other workflows FiftyOne provides to assess your dataset quality include methods to evaluate the performance of you model, ways to analyze embeddings, a measure of the likelihood of annotation mistakes, and more.‍Using the FiftyOne and CVAT integration, we can send only the tagged samples over to CVAT and reannotate them.‍reannotate_view = dataset.match_tags("needs_reannotation") results = reannotate_view.annotate( "reannotation", label_field="ground_truth", backend="cvat", )‍‍‍We can then load these annotations back into FiftyOne from CVAT with more confidence in the quality of our dataset. We can also export the created dataset into any of the common formats, including MS COCO, PASCAL VOC, and ImageNet, to be used in a model training framework directly from CVAT:‍Next StepsNow that we have an annotated dataset of sufficiently high quality, the next step is to start training a model. There are many ways you can train models by integrating FiftyOne datasets into your existing model training workflows or using CVAT to create a dataset ready for use.‍However, the process doesn’t stop after the model is trained. This is just the beginning. As you evaluate your model performance, you will find failure modes of the model that can indicate a need for further annotation improvements or for additional data to add to your datasets to cover a wider range of scenarios. ‍This process of dataset curation, annotation, training, and dataset improvement is the heart of data-centric AI and is a continuous cycle that will lead to improved model performance. Additionally, this process is necessary for any production models to prevent them from becoming out of date as the data distribution shifts over time‍SummaryIn the current age of AI, and especially in the computer vision domain, data is king. Following a data-centric mindset and focusing on improving the quality of datasets is the most surefire way to improve the performance of your models. To that end, there are several open-source tools that have been built with data-centric AI in mind. FiftyOne and CVAT are two leading open-source tools in this space. On top of that, they are tightly integrated, allowing you to explore, visualize, and understand your datasets and their shortcomings, as well as to take action and efficiently annotate and improve your labels to start building better models.‍
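To round out the workflow, here is a small sketch showing one way to hand the finished dataset off to a training framework: exporting it from FiftyOne in COCO format. It reuses the dataset name and label field from the snippets above; the export directory is a placeholder, and exporting directly from CVAT, as mentioned in the post, works just as well.

```python
# Export the curated, annotated dataset to COCO format from FiftyOne
# (one possible hand-off point to a training framework; the export
# directory below is a placeholder).
import fiftyone as fo

dataset = fo.load_dataset("cvat-fiftyone-demo")

dataset.export(
    export_dir="./exports/coco",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
)
```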
Product Updates
November 29, 2022

CVAT <> FiftyOne: Data-Centric Machine Learning with Two Open Source Tools