SAM2 Object Tracking Comes to CVAT Online Through AI Agent Integration
Previously on this blog, we described the use of the Segment Anything Model 2 (SAM2) for quickly annotating videos by tracking shapes from an initial frame. However, this feature was limited to self-hosted CVAT Enterprise deployments.
We have also covered using arbitrary AI models via agents and auto-annotation functions to annotate a CVAT task from scratch.
Today we’ll talk about a new CVAT feature that combines the benefits of the two approaches: tracking support in auto-annotation (AA) functions. This lets any CVAT Online user apply an arbitrary tracking AI model by writing a small wrapper (an AA function) around it and running a worker process (an AI agent) on their own hardware to handle requests. In addition, we have implemented a ready-to-use AA function based on SAM2, so users who want that particular model can skip the first step and just run an agent.
In this article we will explain how to use the SAM2-based AA function, as well as walk through some of the implementation details.
Quick start
Let’s get started. You will need:
Python (3.10 or a later version) and Git installed.
An account at either CVAT Online or an instance of CVAT Enterprise version 2.42.0 or later.
First, clone the CVAT source repository into some directory on your machine. We’ll call this directory <CVAT_DIR>:

git clone https://github.com/cvat-ai/cvat.git <CVAT_DIR>

Next, install the Python packages for CVAT CLI, SAM2 and Hugging Face Hub:

pip install cvat-cli -r <CVAT_DIR>/ai-models/tracker/sam2/requirements.txt

If you have issues installing SAM2, note that the SAM2 install instructions contain solutions to some common problems.

Next, register the SAM2 function with CVAT and run an agent for it:
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function create-native "SAM2" \
    --function-file=<CVAT_DIR>/ai-models/tracker/sam2/func.py -p model_id=str:<MODEL_ID>

cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function run-agent <FUNCTION_ID> \
    --function-file=<CVAT_DIR>/ai-models/tracker/sam2/func.py -p model_id=str:<MODEL_ID>
where:
<CVAT_BASE_URL> is the URL of the CVAT instance you want to use (such as https://app.cvat.ai).
<USERNAME> and <PASSWORD> are your CVAT credentials.
<FUNCTION_ID> is the number output by the function create-native command.
<MODEL_ID> is one of the SAM2 model IDs from Hugging Face Hub, such as facebook/sam2.1-hiera-tiny.
Optionally:
Add -p device=str:cuda to the second command to run the model on your NVIDIA GPU. By default, the model will run on the CPU.
Add --org <ORG_SLUG> to both commands to share the function with your organization. <ORG_SLUG> must be the short name of the organization; it is the name displayed under your username when you switch to the organization in the CVAT UI.
The last command should keep running, indicating that the agent is listening for annotation requests from the server.
This completes the setup steps. Now you can try the function in action:
Open the CVAT UI.
Create a new CVAT task or open an existing one. The task must be created either from a video file or from a video-like sequence of images (all images having the same dimensions).
Open one of the jobs from the task.
Draw a mask or polygon shape around an object.
Right-click the shape, open the action menu and choose “Run annotation action”.
Choose “AI Tracker: SAM2” in the window that appears.
Enter the number of the last frame that you want to track the object to and press Run.
Wait for the annotation process to complete.
Examine the subsequent frames. You should now see a mask/polygon drawn around the same object on every frame up to the one you selected in the previous step.
Instead of selecting an individual shape, you can also track every mask and polygon on the current frame by opening the menu in the top left corner and selecting “Run actions”.
Implementation
Now let’s take a peek behind the curtain and see how the SAM2 tracking function works. This will be useful if you need to troubleshoot, or if you want to implement a tracking function of your own. The module’s source is too long to explain in its entirety in this article, but we’ll cover the overall structure and the key implementation features.
First, let’s look at the top-level structure of func.py:

@dataclasses.dataclass(frozen=True, kw_only=True)
class _PreprocessedImage: ...

@dataclasses.dataclass(kw_only=True)
class _TrackingState: ...

class _Sam2Tracker: ...

create = _Sam2Tracker
Since we wanted to support multiple model variants, as well as multiple devices, with a single implementation, we did not place the function’s required attributes directly in the module. Instead, we put them inside a class, _Sam2Tracker, which we want to be instantiated by the CLI with the parameters passed via the -p option. To tell the CLI which class to instantiate, we alias the name create to our class.
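To illustrate the effect of the -p options from the quick start, the CLI ends up doing something roughly like the following (a sketch of the effect, not the actual cvat-cli internals):

# Each "-p name=type:value" argument becomes a keyword argument
# when the CLI instantiates the callable aliased as "create".
function = create(model_id="facebook/sam2.1-hiera-tiny", device="cuda")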
There are also two auxiliary dataclasses, _PreprocessedImage and _TrackingState. These are not part of the tracking function interface, but an implementation detail. We will see their purpose later.
Let’s now zoom in on _Sam2Tracker.
__init__ and spec
Similar to the detection functions that we’ve covered before, in the constructor we load the underlying model (SAM2VideoPredictor). We also create the PyTorch device object and an input transform.

def __init__(self, model_id: str, device: str = "cpu") -> None:
    self._device = torch.device(device)
    self._predictor = SAM2VideoPredictor.from_pretrained(model_id, device=self._device)
    self._transform = torchvision.transforms.Compose([...])
Also similar to detection functions, our tracker must define a spec, although it has to be of type TrackingFunctionSpec:

spec = cvataa.TrackingFunctionSpec(supported_shape_types=["mask", "polygon"])
In a tracking function, the spec describes which shape types the function is able to track.
However, the other attributes of _Sam2Tracker are entirely unlike those of detection functions.
At a high level, a tracking function must analyze an image with a shape on it, then predict the location of that shape on other images. However, to allow more efficient tracking of multiple shapes per image, as well as to enable interactive usage, this functionality is split across three methods.
preprocess_image

def preprocess_image(
    self, context: cvataa.TrackingFunctionContext, image: PIL.Image.Image
) -> _PreprocessedImage:
    image = image.convert("RGB")
    image_tensor = self._transform(image).unsqueeze(0).to(device=self._device)
    backbone_out = self._predictor.forward_image(image_tensor)
    ...
    return _PreprocessedImage(
        original_width=image.width,
        original_height=image.height,
        vision_feats=...(... backbone_out ...),
        ...,
    )
This method is supposed to perform any processing that the function can do without knowing the details of the shape it’s tracking. In this way, the results can be reused for multiple shapes. In our case, the underlying model has a dedicated method for doing this, so we transform our input image, and pass it to this method. We then return all information we’ll need later as a new instance of our class _PreprocessedImage. The agent does not care what type of object is returned by preprocess_image - it just saves that object so it can pass it to the other methods.
Speaking of which…
init_tracking_state

def init_tracking_state(
    self,
    context: cvataa.TrackingFunctionShapeContext,
    pp_image: _PreprocessedImage,
    shape: cvataa.TrackableShape,
) -> _TrackingState:
    mask = torch.from_numpy(self._shape_to_mask(pp_image, shape))
    resized_mask = ...(... mask ...)
    current_out = self._call_predictor(pp_image=pp_image, mask_inputs=resized_mask, ...)
    return _TrackingState(
        frame_idx=0,
        predictor_outputs={"cond_frame_outputs": {0: current_out}, ...},
    )

def _call_predictor(self, *, pp_image: _PreprocessedImage, frame_idx: int, **kwargs) -> dict:
    out = self._predictor.track_step(
        current_vision_feats=pp_image.vision_feats,
        frame_idx=frame_idx,
        ...
        **kwargs,
    )
    return ...(... out ...)
This method is supposed to analyze the shape on the initial frame. Here we convert the input shape to a mask tensor (for brevity we’ll omit the definition of _shape_to_mask here), and then pass it, alongside the preprocessed image, to the underlying model (via a small wrapping function). The method then encapsulates all information that will be needed to track the shape on subsequent frames in a new _TrackingState object and returns it.
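As an aside, a shape-to-mask conversion of this kind could be sketched with Pillow roughly as follows. This is a hypothetical polygon_to_mask helper for illustration only, not the actual _shape_to_mask from func.py:

import numpy as np
import PIL.Image
import PIL.ImageDraw

def polygon_to_mask(points: list[float], width: int, height: int) -> np.ndarray:
    # Rasterize flat [x0, y0, x1, y1, ...] polygon points into a boolean mask
    # with the dimensions of the original image.
    mask_image = PIL.Image.new("L", (width, height), 0)
    PIL.ImageDraw.Draw(mask_image).polygon(
        list(zip(points[0::2], points[1::2])), outline=1, fill=1
    )
    return np.asarray(mask_image, dtype=bool)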
Much like preprocess_image, the agent doesn’t care what type of object the method returns, so the tracking function can choose the type in order to best suit its own needs. The agent will simply pass this object into our final method…
track

def track(
    self,
    context: cvataa.TrackingFunctionShapeContext,
    pp_image: _PreprocessedImage,
    state: _TrackingState,
) -> cvataa.TrackableShape:
    state.frame_idx += 1
    current_out = self._call_predictor(
        pp_image=pp_image,
        frame_idx=state.frame_idx,
        output_dict=state.predictor_outputs,
        ...
    )
    non_cond_frame_outputs = state.predictor_outputs["non_cond_frame_outputs"]
    non_cond_frame_outputs[state.frame_idx] = current_out
    ...
    output_mask = ...(... current_out["pred_masks"] ...)

    if output_mask.any():
        return self._mask_to_shape(context, output_mask.cpu())
    else:
        return None
This method is supposed to locate the shape being tracked on another frame. Here we pass data from the state object and the preprocessed image to the model and get a mask back. If the mask has any pixels set, we return it as a TrackableShape object. The _mask_to_shape method (whose definition we’ll omit) will convert the mask to a shape of the same type as the original shape passed to init_tracking_state. If the mask is all zeros, we presume that we lost track of the shape, and return None.
The model also returns additional data that can be used to better track the shape on subsequent frames. track adds it to the tracking state, as can be seen with the non_cond_frame_outputs update. This way, future calls to track are able to make use of this data.
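The reverse conversion performed by _mask_to_shape (a mask back to polygon points) is likewise omitted above. As a rough illustration of the idea, contours could be extracted with OpenCV, as in this hypothetical sketch (not the actual implementation):

import cv2
import numpy as np

def mask_to_polygon_points(mask: np.ndarray) -> list[float]:
    # Find the external contours of the binary mask and keep the largest one,
    # returning its vertices as flat [x0, y0, x1, y1, ...] coordinates.
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    if not contours:
        return []
    largest = max(contours, key=cv2.contourArea)
    return [float(c) for c in largest.reshape(-1)]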
Agent behavior
Now that we’ve examined the purpose of each method, we can see how they all fit together by looking at the tracking process from the agent’s perspective.
Let’s say an agent has loaded tracking function F, and a user makes a request for shape S0 from image I0 to be tracked to images I1, I2, and I3. In this case, the agent will make the following calls to the tracking function:
STATE = F.init_tracking_state(SC, F.preprocess_image(C, I0), S0)
S1 = F.track(SC, F.preprocess_image(C, I1), STATE)
S2 = F.track(SC, F.preprocess_image(C, I2), STATE)
S3 = F.track(SC, F.preprocess_image(C, I3), STATE)
It will then return resulting shapes S1, S2, and S3 to CVAT.
Here C and SC are context objects, created by the agent. For more information on these, please refer to the reference documentation.
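To make the interface described above concrete, here is a minimal, self-contained sketch of a hypothetical tracking function: an “identity tracker” that simply re-emits the original shape on every frame. It is useless in practice, but it shows the minimum structure an AA tracking function needs; the dataclass contents here are purely illustrative.

import dataclasses

import PIL.Image
import cvat_sdk.auto_annotation as cvataa


@dataclasses.dataclass(frozen=True, kw_only=True)
class _PreprocessedImage:
    # Nothing to precompute in this toy example; just remember the image size.
    width: int
    height: int


@dataclasses.dataclass(kw_only=True)
class _TrackingState:
    # The only "state" we keep is the shape from the initial frame.
    last_shape: cvataa.TrackableShape


class _IdentityTracker:
    spec = cvataa.TrackingFunctionSpec(supported_shape_types=["polygon"])

    def preprocess_image(
        self, context: cvataa.TrackingFunctionContext, image: PIL.Image.Image
    ) -> _PreprocessedImage:
        return _PreprocessedImage(width=image.width, height=image.height)

    def init_tracking_state(
        self,
        context: cvataa.TrackingFunctionShapeContext,
        pp_image: _PreprocessedImage,
        shape: cvataa.TrackableShape,
    ) -> _TrackingState:
        return _TrackingState(last_shape=shape)

    def track(
        self,
        context: cvataa.TrackingFunctionShapeContext,
        pp_image: _PreprocessedImage,
        state: _TrackingState,
    ) -> cvataa.TrackableShape:
        # A real tracker would predict a new location here; we just repeat
        # the original shape (or could return None if tracking were lost).
        return state.last_shape


create = _IdentityTracker

Such a function could be registered and served in the same way as the SAM2 one, by passing its file to the function create-native and function run-agent commands.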
Limitations
There are a few things to keep in mind when using tracking functions (SAM2 included). First, agents currently keep the tracking states in their memory. This means that:
Only one agent can be run at a time for any given tracking function. If you run more than one agent for a function, users may see random failures as agents try to complete requests referencing some other agent’s tracking states.
If the agent crashes or is shut down, all tracking states are destroyed. If this happens while a user is tracking a shape, the process will fail.
Second, tracking functions can only be used via agents. There is no equivalent of the cvat-cli task auto-annotate command.
Third, tracking functions may be used either via an annotation action (as was shown in the quick start), or via the AI Tools dialog (accessible via the sidebar). However, the latter method only works with tracking functions that support rectangles - other functions will not be selectable.
Fourth, skeletons cannot currently be tracked.
Conclusion
Tracking with SAM2 saves significant time compared to manually annotating each frame. If you are a user of CVAT Online, this feature is now available to you - sign in and try it out!
If there is another model you’d like to use for tracking, you can likely do that as well, as long as you implement the corresponding auto-annotation function. For more details on that, refer to the reference documentation:
https://docs.cvat.ai/docs/api_sdk/sdk/auto-annotation/
https://docs.cvat.ai/docs/api_sdk/cli/#examples---functions
For more information on other capabilities of AA functions and AI agents, see our previous articles on the topic:
https://www.cvat.ai/resources/blog/an-introduction-to-automated-data-annotation-with-cvat-ai-cli
https://www.cvat.ai/resources/blog/announcing-cvat-ai-agents
https://www.cvat.ai/resources/blog/cvat-ai-agents-update

