Control User Access and Permissions in CVAT with Open Policy Agent 🔒

This article examines the implementation of Identity and Access Management (IAM) in the Computer Vision Annotation Tool (CVAT), which is part of the OpenCV ecosystem. CVAT is used to annotate raw images and video files to produce a ready-to-use Computer Vision dataset in popular formats such as MS COCO, PASCAL VOC, YOLO, etc. Here we will cover:

  • How to implement IAM in an application, using an open source, general-purpose policy engine
  • The permission management system in CVAT
  • Frequently asked questions about roles and permissions in CVAT

When CVAT was released as an open-source project, it already had multi-user support. Each user could perform one or more roles such as observer, annotator, user, or admin. A primitive form of Attribute-Based Access Control (ABAC) was used to access resources like tasks, jobs, and projects. Thus, the owner or an assignee of the resource could view, modify, or delete it.

The system was poorly designed and inflexible, and users reported many issues with it. Overall, the most frequent request was to limit data annotators’ rights. Workers should not be able to export datasets, reassign jobs to others, arbitrarily change job statuses, finish jobs, etc. Many CVAT users are AI developers who delegate annotation tasks to other workers; however, they still need to control all the stages of the data annotation pipeline. Another part of CVAT’s user base is labeling companies that annotate data for external customers, and it is critical for them to avoid leaks.

Below is a code snippet that shows how protection of resources was implemented in a previous CVAT version. It was based on the django-rules package:

The snippet of code has a predicate that checks rights to change a job object and a permission class to enforce these rights. The complete code for the example can be found here. Note that this code presents several roadblocks for maintaining and extending this solution:

  • It is necessary to modify the Python code itself.
  • The code depends on the database structure.
  • The rights of different users are unclear. 

It was necessary to find an alternative solution to overcome these problems, and a number of open-source products cover many aspects of IAM. One such product, Keycloak, was found to be too heavy for our needs; the matchers in another product, pycasbin, are suitable only for simple use cases and otherwise require functions implemented in Python.

Ultimately the clear winner was Open Policy Agent (OPA), a light, powerful, and flexible solution. OPA (pronounced “oh-pa”) is an open-source, general-purpose policy engine. For more information, see their website. The rest of this article describes only the features that are used for integration with CVAT.

For CVAT, OPA is a microservice which has a REST API. Thus, CVAT sends HTTP requests (Query) to OPA that replies with “allow” or “deny” in the simplest case (Decision). Those who are acquainted with the fundamentals of authorization might recognize the PDP-PEP pattern. In this case, OPA is the Policy Decision Point (PDP) and makes decisions, while CVAT is the Policy Enforcement Point (PEP) and enforces these decisions by providing information about requested resources or responding with an error.
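The PDP-PEP exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not CVAT's actual code: it relies on OPA's Data API (`POST /v1/data/<path>` with the query wrapped in an `input` key), while the address `localhost:8181`, the policy path `users/allow`, and all function names are assumptions made for the example.

```python
import json
import urllib.request
from typing import Callable

# Assumed address of a local OPA instance and an assumed policy path.
OPA_URL = "http://localhost:8181/v1/data/users/allow"


def http_post_json(url: str, payload: dict) -> dict:
    """Send a JSON POST request and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def is_allowed(query: dict, post: Callable[[str, dict], dict] = http_post_json) -> bool:
    """Ask the PDP (OPA) for a decision; a missing result counts as a denial."""
    reply = post(OPA_URL, {"input": query})
    return reply.get("result", False) is True


def enforce(query: dict, post: Callable[[str, dict], dict] = http_post_json) -> None:
    """PEP side: respond with an error instead of serving the resource."""
    if not is_allowed(query, post):
        raise PermissionError("denied by policy")
```

Passing the transport as a parameter keeps the enforcement logic testable without a running OPA service. Treating an absent `result` as a denial is a deliberate fail-closed choice in this sketch.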

How does the policy engine know how to respond to our queries correctly? OPA provides a high-level declarative language that lets you specify the policy as code and simple APIs to offload policy decision-making from your software. CVAT controls what queries look like and implements policies to handle them. After a query is processed by OPA in accordance with implemented policies, OPA replies to CVAT with its decision.

For example, suppose that somebody wants to protect information about users. In the case of CVAT, this information can be accessed using the following operations:

GET /api/users - get the list of users

GET /api/users/self - get information about the current user

GET /api/users/1 - get information about the user with id #1

DELETE /api/users/1 - delete the user with id #1

Below is an example of a reply from GET /api/users/self:

Obviously, a newly registered user without admin permissions should not be able to delete another account. Our goal is to implement some policies to protect resources from undesirable actions.

# TODO: check that readers of the article will not be able to delete my account on

For the sake of practice, play with the example below in the Rego Playground. This code is simplified, but the general ideas are the same as those used in CVAT. We’ll start with a query:

Every query is an arbitrary JSON document, but in CVAT it consists of the scope, the resource, and information from the authentication system. In the example above, the scope is view, which means the query requests permission to view a specific resource. In general, the scope is an action that a user wants to perform on a resource, such as view, update, create, or import:dataset. In this case the resource contains information about a user record with one available attribute (id #3). The user on whose behalf the request is made has the user privilege and id #3.
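To make the three parts of a query concrete, here is a small Python sketch that assembles one. The key names (`scope`, `resource`, `auth`, `user`, `privilege`) and the helper `build_query` are assumptions chosen to match the description above; the exact layout used by CVAT may differ.

```python
import json


def build_query(scope: str, resource: dict, user_id: int, privilege: str) -> dict:
    """Assemble a query from the scope, the resource, and auth information.

    The key names are illustrative, not CVAT's exact query layout.
    """
    return {
        "scope": scope,
        "resource": resource,
        "auth": {"user": {"id": user_id, "privilege": privilege}},
    }


# User #3, who has the "user" privilege, asks to view the user record #3.
query = build_query("view", {"id": 3}, user_id=3, privilege="user")
print(json.dumps(query, indent=2))
```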

The policies are written in the Rego language and grouped into modules:

Modules consist of a package, optional import statements, and optional rules. The first line defines the package users, which contains the policy for the user resource. Next, the allow variable is defined with a default value of false. The policy consists of four rules that define the content of Virtual Documents in OPA. When OPA evaluates a rule, we say that OPA generates the content of the document. All expressions inside one allow rule are joined using AND, but different allow rules are joined by OR. Thus, the user policy in the code above means: deny by default, but allow if the user has the admin privilege, OR the query’s scope equals list, OR the scope is one of update, delete, or view AND the identifier of the resource matches the identifier of the user on whose behalf the request is made.
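For readers who are more comfortable with Python than Rego, the same decision logic can be mirrored as a plain function. This is only an illustration of the AND/OR semantics described above, not how CVAT evaluates policies; the query keys are the same assumed names (`scope`, `resource`, `auth`).

```python
def allow(query: dict) -> bool:
    """Python mirror of the user policy: deny by default, allow on any match."""
    scope = query["scope"]
    user = query["auth"]["user"]
    resource = query.get("resource") or {}

    # Rule: a user with the admin privilege may do anything.
    if user["privilege"] == "admin":
        return True

    # Rule: anybody may request the list of users.
    if scope == "list":
        return True

    # Rule: both conditions must hold (AND) for the rule to match.
    if scope in {"update", "delete", "view"} and resource.get("id") == user["id"]:
        return True

    # Default: no rule matched, so the request is denied.
    return False
```

Each `if` corresponds to one allow rule, so the early returns play the role of OR between rules, while the conditions inside a single `if` are joined with AND.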

Previously, policies were encoded in Python; now they are written in Rego. Why is the new approach better? There are multiple advantages:

  • The Policy Decision Point (PDP) is a separate microservice based on a popular, well-documented solution that’s maintained by experts in a corresponding area and scales well.
  • All policies are placed in a separate directory, and it is relatively easy to extend and modify them without changing the CVAT server itself. Of course, this is true only if the structure of queries to OPA does not change as well.
  • OPA has an approach for testing policies and this feature is used actively by CVAT. An example of generated tests can be found here.
  • The Policy Enforcement Point (PEP) is simple and concise. The whole logic is defined in one file.

However, there are also some disadvantages:

  • The entry level is relatively high, with lots of materials to understand and learn.
  • Although it is possible to play and experiment with OPA in the sandbox and enable decision logs, the debugging of complex policies can be a non-trivial procedure.
  • OPA is an additional dependency which must be considered during development. CVAT will not work if the OPA service isn’t running.
  • In our case the REST API had to be refactored because the previous version could not support the new security model. Keep in mind the extra overhead during integration.
  • Every user request to the CVAT server leads to one or several queries to the OPA service, which increases the response time.

Handling Requests for Lists of Objects

There is one more significant and non-obvious advantage of OPA. One of the biggest problems in the PDP-PEP pattern is the handling of lists of objects. Imagine that it is necessary to return a list of users (in this case the scope is list), but the list should contain only objects which the user has the right to view. In other words, the list must contain exactly those objects for which a query with the scope set to view would be allowed.

The simplest solution is to send one query with the scope set to view for each user in the system, and then return the users for which the corresponding queries were approved. Such a solution is slow because the number of queries grows with the number of users.

By the end of March 2022, the CVAT public server had more than 50 thousand registered users. The good news is that OPA can return an arbitrary JSON document. Hence there is another solution: it’s possible to replicate CVAT’s database into OPA as described in the documentation, and then use Rego to extract the filtered list of users from the data. Unfortunately, this approach leads to a lot of technical difficulties.

There is one more lightweight, elegant solution: instead of a list of users, OPA can return a filter expression to apply on the PEP side. The filter expression will depend on the request. For a user with the “admin” privilege, it will filter nothing. But for a regular user, it will extract a restricted number of objects. A filter expression is a list of Django Q objects and operations on these objects in the reverse Polish notation for building a filter on the CVAT server side. In this case the filter expression contains only one Django Q object:
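The idea of evaluating a filter expression in reverse Polish notation can be sketched as follows. To stay self-contained, the example uses a small stand-in class `Cond` instead of Django Q objects; the token format (single-key dicts as operands and the strings "&", "|", "~" as operators) is an assumption made for illustration and is not CVAT's exact wire format.

```python
class Cond:
    """Stand-in for a Django Q object: a predicate over a record (a dict)."""

    def __init__(self, test):
        self.test = test

    def __and__(self, other):
        return Cond(lambda record: self.test(record) and other.test(record))

    def __or__(self, other):
        return Cond(lambda record: self.test(record) or other.test(record))

    def __invert__(self):
        return Cond(lambda record: not self.test(record))


def build_filter(expression: list) -> Cond:
    """Fold a reverse Polish notation expression into a single condition."""
    if not expression:
        # An empty expression filters nothing (e.g. for an admin).
        return Cond(lambda record: True)
    stack = []
    for token in expression:
        if token == "&":
            right, left = stack.pop(), stack.pop()
            stack.append(left & right)
        elif token == "|":
            right, left = stack.pop(), stack.pop()
            stack.append(left | right)
        elif token == "~":
            stack.append(~stack.pop())
        else:
            # Operand: a {field: value} mapping; with Django this would be Q(**token).
            ((name, value),) = token.items()
            stack.append(Cond(lambda record, n=name, v=value: record.get(n) == v))
    if len(stack) != 1:
        raise ValueError("malformed filter expression")
    return stack[0]


users = [{"id": 1}, {"id": 2}, {"id": 3}]
regular_user_filter = build_filter([{"id": 3}])
visible = [user for user in users if regular_user_filter.test(user)]
```

Because Django Q objects support the same `&`, `|`, and `~` operators, the same stack-based fold would apply on the server side, with the resulting Q object passed to a queryset filter instead of being tested record by record.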

Now we hope it is clear how OPA is integrated into CVAT. The method described can be used as a guide to integrate this policy engine into other products.

But there is one more important question: How do we describe the implemented IAM system for the end user? The Rego language is a tool for developing and maintaining policies, but for end users the code in .rego files is unfriendly and unreadable. It took us a long time to find a suitable solution. All permissions are now described in a table with a predefined set of columns. In the CVAT repository there is a CSV file for each resource that describes all permissions in a simple and human-readable form.

For example, the users.csv file describes permissions for working with information about users. Every line in the file is a primitive rule that tells us who has rights to perform a specific action. But how is users.csv related to users.rego? Each line in the CSV file corresponds to a rule in the .rego file. This correspondence is checked automatically by tests that are generated from the CSV files (e.g., users_test.gen.rego). Committing generated files to a repository is a known bad practice, and we plan to fix this in the future. The script that was used to generate each *.gen.rego file can be found at the end of the file as a comment.

Let’s look at users.csv and understand what it contains. All CSV files that describe permissions have the same set of columns:

  • Scope is the action performed on the resource. For example, list – get the list of users, update – change information about a user, view – get information about a user and so on.
  • Resource describes the object on which the action is performed.
  • Context can take one of two values: sandbox or organization. An object is in the sandbox if it is created outside of any organization. In theory it is possible to make an object in the sandbox visible to other users, but let’s skip these details. An organization can have users with different roles and resources. Resources are shared between members of the organization, but each member has permissions in accordance with their role and ownership. If a user creates an object inside an organization, they delegate some rights for the object to members with the maintainer and owner roles in the organization.
  • Ownership describes how the user and the specific resource are connected. For example, the N/A value means that the property isn’t applicable to the query. Other possible values are self, owner, assignee, etc. The None value means that the user who is making the query doesn’t have any relationship with the resource.
  • Limit covers constraints for the query. Currently this field is not very user-friendly in the table, but its format simplifies code generation. This will probably be solved in future releases. Typical constraints are the number of tasks and projects that a regular user can create.
  • Method and URL contain data for information purposes only and are not used. They help to connect rules with the REST API. If a user makes a GET /api/users/1 call, it is easy to locate corresponding permissions in the table. Thus, the scope is view, the resource is user, and the context is sandbox.
  • Privilege corresponds to the group for the current user with the maximum level of permissions. It can be empty if the user doesn’t belong to any groups or equal to worker, user, business, or admin. The primary idea is to delimit the fundamental rights of the user on the platform. For example, users with the privilege less than or equal to worker cannot create tasks and projects. At the same time a user with the maximum privilege admin doesn’t have any restrictions.
  • Membership is the user’s role inside the organization, such as worker, supervisor, maintainer, or owner. The column makes sense only if a request is made in the context of an organization, and it makes it possible to delimit access to the organization’s resources. For example, if a user has the maintainer role in an organization but their privilege is worker, they will be able to see all resources in the organization but will not be able to create tasks and projects in it.
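The way such a table can be read as a set of primitive rules is sketched below. The rows are hypothetical and only follow the column layout described above; they are not copied from CVAT's users.csv. The sketch also assumes that the Privilege column holds the minimum privilege required and that privileges are ordered as empty < worker < user < business < admin.

```python
import csv
import io

# Hypothetical rows in the described column layout (not CVAT's real users.csv).
TABLE = """\
Scope,Resource,Context,Ownership,Limit,Method,URL,Privilege,Membership
list,user,sandbox,N/A,,GET,/api/users,,N/A
view,user,sandbox,self,,GET,/api/users/{id},,N/A
view,user,sandbox,none,,GET,/api/users/{id},admin,N/A
delete,user,sandbox,self,,DELETE,/api/users/{id},,N/A
delete,user,sandbox,none,,DELETE,/api/users/{id},admin,N/A
"""

# Assumed ordering of privileges, from the weakest to the strongest.
PRIVILEGE_ORDER = ["", "worker", "user", "business", "admin"]


def load_rules(text: str) -> list:
    """Parse the permission table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def is_permitted(rules, scope, resource, context, ownership, privilege) -> bool:
    """Return True if at least one row of the table grants the action."""
    rank = PRIVILEGE_ORDER.index(privilege)
    for rule in rules:
        if (
            rule["Scope"] == scope
            and rule["Resource"] == resource
            and rule["Context"] == context
            and rule["Ownership"] in ("N/A", ownership)
            and rank >= PRIVILEGE_ORDER.index(rule["Privilege"])
        ):
            return True
    return False


rules = load_rules(TABLE)
can_view_self = is_permitted(rules, "view", "user", "sandbox", "self", "user")
can_delete_other = is_permitted(rules, "delete", "user", "sandbox", "none", "user")
```

The Method and URL columns are carried along but never consulted by `is_permitted`, which matches their informational role in the table.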

At the moment it is difficult to say how suitable such an IAM system will be for all the needs of our community over time. It looks logical and allows us to solve all the related problems that were submitted by users in the past. If somebody needs to change the default behavior of their own CVAT instance, it is possible to modify the policies defined in .rego files and restart the OPA microservice. Additionally, sharing a resource with other users, a popular feature in many applications, can be easily implemented in the future because data and policies in OPA can be changed dynamically using the OPA REST API.

Note: The world is not perfect. The biggest challenge is a query that affects multiple resources at the same time, for example, when somebody moves an existing task from project #X to project #Y. In this case it is necessary to update the project field in the task. Obviously, the current user must have permission to both modify the task and update the field. But it is less obvious that the user must also have permission to create a task in project #Y. Such cases are handled in the CVAT server code directly. In this specific case two sequential requests to OPA are sent, and both must be allowed by the policy engine.
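The "both must be allowed" requirement can be expressed compactly. In this sketch the scope names (`update:project`, `create@project`), the query keys, and the helper names are hypothetical; `decide` stands for any function that sends one query to the policy engine and returns its decision.

```python
from typing import Callable, Iterable


def queries_for_task_move(user: dict, task_id: int, target_project_id: int) -> list:
    """Build the two queries needed to move a task into another project.

    Scope names and key names are hypothetical, chosen for illustration.
    """
    auth = {"user": user}
    return [
        # 1. May the user update the project field of the existing task?
        {"scope": "update:project", "resource": {"type": "task", "id": task_id}, "auth": auth},
        # 2. May the user create a task in the target project?
        {
            "scope": "create@project",
            "resource": {"type": "task", "project_id": target_project_id},
            "auth": auth,
        },
    ]


def all_allowed(queries: Iterable[dict], decide: Callable[[dict], bool]) -> bool:
    """Send the queries one after another; stop at the first denial."""
    return all(decide(query) for query in queries)
```

Because `all` consumes the generator lazily, the second query is not sent once the first one has been denied.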

To summarize: we’ve described how the new identity and access management system is implemented in CVAT. It was demonstrated how Open Policy Agent is integrated in accordance with the PDP-PEP pattern. The pros and cons of the approach were discussed. A solution was provided for how to describe user permissions in a readable tabular format (CSV). 

Hopefully, developers and community members who are trying to solve a similar problem got a helpful overview of how the permission management system works in CVAT. Please leave your comments to help improve future articles and tools. Thank you!

By Nikita Manovich

August 11, 2022