Architecture Overview
Sentry Components
There are components involved in the authorization process:
Sentry Server: The Sentry RPC server manages the authorization metadata. It supports interface to securely retrieve and manipulate the metadata;
Data Engine: This is a data processing application such as Hive or Impala that needs to authorize access to data or metadata resources. The data engine loads the Sentry plugin and all client requests for accessing resources are intercepted and routed to the Sentry plugin for validation;
Sentry Plugin: The Sentry plugin runs in the data engine. It offers interfaces to manipulate authorization metadata stored in the Sentry server, and includes the authorization policy engine that evaluates access requests using the authorization metadata retrieved from the server.
Key Concepts:
User Identity and Group Mapping
finance-department Alice
Bob finance-managers
Analyst grant SELECT on tables Customer and Sales to this role
The next step is to join these authentication entities (users and groups) to authorization entities (roles)
Role-Based Access Control
Role-based access control (RBAC) is a powerful mechanism to manage authorization for a large set of users and data objects in a typical enterprise. New data objects get added or removed, users join, move, or leave organisations all the time. RBAC makes managing this a lot easier. Hence, as an extension of the discussed previously, if Carol joins the Finance Department, all you need to do is add her to the
finance-department group in AD. This will give Carol access to data from the Sales and Customer tables
Unified Authorization
Another important aspect of Sentry is the unified authorization. The access control rules once defined, work across multiple data access tools. For example, being granted the Analyst role in the previous example will allow Bob, Alice, and others in the finance-department group to access table data from SQL engines such as Hive and Impala, as well as via MapReduce, Pig applications or metadata access via HCatalog.
Sentry Integration with the Hadoop Ecosystem
As illustrated above, Apache Sentry works with multiple Hadoop components. At the heart you have the Sentry Server which stores authorization metadata and provides APIs for tools to retrieve and modify this metadata securely.
Note that the Sentry server only facilitates the metadata. The actual authorization decision is made by a policy engine which runs in data processing applications such as Hive or Impala. Each component loads the Sentry plugin which includes the service client for dealing with the Sentry service and the policy engine to validate the authorization request.