Arrange federated entry to Amazon Athena for Microsoft AD FS customers utilizing AWS Lake Formation and a JDBC shopper

Tens of hundreds of AWS prospects select Amazon Easy Storage Service (Amazon S3) as their information lake to run large information analytics, interactive queries, high-performance computing, and synthetic intelligence (AI) and machine studying (ML) functions to achieve enterprise insights from their information. On prime of those information lakes, you should use AWS Lake Formation to ingest, clear, catalog, remodel, and assist safe your information and make it accessible for evaluation and ML. After you have setup your information lake, you should use Amazon Athena which is an interactive question service that makes it simple to research information in Amazon Easy Storage Service (Amazon S3) utilizing normal SQL.

With Lake Formation, you may configure and handle fine-grained entry management to new or present databases, tables, and columns outlined within the AWS Glue Information Catalog for information saved in Amazon S3. After you set entry permissions utilizing Lake Formation, you should use analytics companies equivalent to Amazon Athena, Amazon Redshift, and Amazon EMR with no need to configure insurance policies for every service.

A lot of our prospects use Microsoft Lively Listing Federation Companies (AD FS) as their identification supplier (IdP) whereas utilizing cloud-based companies. On this submit, we offer a step-by-step walkthrough of configuring AD FS because the IdP for SAML-based authentication with Athena to question information saved in Amazon S3, with entry permissions outlined utilizing Lake Formation. This allows end-users to log in to their SQL shopper utilizing Lively Listing credentials and entry information with fine-grained entry permissions.

Resolution overview

To construct the answer, we begin by establishing belief between AD FS and your AWS account. With this belief in place, AD customers can federate into AWS utilizing their AD credentials and assume permissions of an AWS Identification and Entry Administration (IAM) position to entry AWS sources such because the Athena API.

To create this belief, you add AD FS as a SAML supplier into your AWS account and create an IAM position that federated customers can assume. On the AD FS aspect, you add AWS as a relying occasion and write SAML declare guidelines to ship the appropriate person attributes to AWS (particularly Lake Formation) for authorization functions.

The steps on this submit are structured into the next sections:

  1. Arrange an IAM SAML supplier and position.
  2. Configure AD FS.
  3. Create Lively Listing customers and teams.
  4. Create a database and tables within the information lake.
  5. Arrange the Lake Formation permission mannequin.
  6. Arrange a SQL shopper with JDBC connection.
  7. Confirm entry permissions.

The next diagram supplies an outline of the answer structure.

The movement for the federated authentication course of is as follows:

  1. The SQL shopper which has been configured with Lively Listing credentials sends an authentication request to AD FS.
  2. AD FS authenticates the person utilizing Lively Listing credentials, and returns a SAML assertion.
  3. The shopper makes a name to Lake Formation, which initiates an inner name with AWS Safety Token Service (AWS STS) to imagine a task with SAML for the shopper.
  4. Lake Formation returns momentary AWS credentials with permissions of the outlined IAM position to the shopper.
  5. The shopper makes use of the momentary AWS credentials to name the Athena API StartQueryExecution.
  6. Athena retrieves the desk and related metadata from the AWS Glue Information Catalog.
  7. On behalf of the person, Athena requests entry to the information from Lake Formation (GetDataAccess). Lake Formation assumes the IAM position related to the information lake location and returns momentary credentials.
  8. Athena makes use of the momentary credentials to retrieve information objects from Amazon S3.
  9. Athena returns the outcomes to the shopper based mostly on the outlined entry permissions.

For our use case, we use two pattern tables:

  • LINEORDER – A reality desk containing orders
  • CUSTOMER – A dimension desk containing buyer data together with Personally Identifiable Data (PII) columns (c_name, c_phone, c_address)

We even have information shopper customers who’re members of the next groups:

  • CustomerOps – Can see each orders and buyer data, together with PII attributes of the client
  • Finance – Can see orders for analytics and aggregation functions however solely non-PII attributes of the client

To reveal this use case, we create two customers referred to as CustomerOpsUser and FinanceUser and three AD teams for various entry patterns: data-customer (buyer data entry excluding PII attributes), data-customer-pii (full buyer data entry together with PII attributes), and data-order (order data entry). By including the customers to those three teams, we are able to grant the appropriate degree of entry to totally different tables and columns.


To observe together with this walkthrough, you could meet the next conditions:

Arrange an IAM SAML supplier and position

To arrange your SAML supplier, full the next steps:

  1. Within the IAM console, select Identification suppliers within the navigation pane.
  2. Select Add supplier.
  3. For Supplier Sort, select SAML.
  4. For Supplier Title, enter adfs-saml-provider.
  5. For Metadata Doc, obtain your AD FS server’s federation XML file by coming into the next handle in a browser with entry to the AD FS server:

  6. Add the file to AWS by selecting Select file.
  7. Select Add supplier to complete.

Now you’re able to create a brand new IAM position.

  1. Within the navigation pane, select Roles.
  2. Select Create position.
  3. For the kind of trusted entity, select SAML 2.0 federation.
  4. For SAML supplier, select the supplier you created (adfs-saml-provider).
  5. Select Enable programmatic and AWS Administration Console entry.
  6. The Attribute and Worth fields ought to routinely populate with SAML:aud and
  7. Select Subsequent:Permissions.
  8. Add the required IAM permissions to this position. For this submit, connect AthenaFullAccess.

If the Amazon S3 location to your Athena question outcomes doesn’t begin with aws-athena-query-results, add one other coverage to permit customers write question outcomes into your Amazon S3 location. For extra data, see Specifying a Question End result Location Utilizing the Athena Console and Writing IAM Insurance policies: Learn how to Grant Entry to an Amazon S3 Bucket.

  1. Depart the defaults within the subsequent steps and for Position identify, enter adfs-data-access.
  2. Select Create position.
  3. Be aware of the SAML supplier and IAM position names to make use of in later steps when creating the belief between the AWS account and AD FS.

Configure AD FS

SAML-based federation has two participant events: the IdP (Lively Listing) and the relying occasion (AWS), which is the service or software that desires to make use of authentication from the IdP.

To configure AD FS, you first add a relying occasion belief, you then configure SAML declare guidelines for the relying occasion. Declare guidelines are the best way that AD FS types a SAML assertion despatched to a relying occasion. The SAML assertion states that the details about the AD person is true, and that it has authenticated the person.

Add a relying occasion belief

To create your relying occasion in AD FS, full the next steps:

  1. Log in to the AD FS server.
  2. On the Begin menu, open ServerManger.
  3. On the Instruments menu, select the AD FS Administration console.
  4. Beneath Belief Relationships within the navigation pane, select Relying Get together Trusts.
  5. Select Add Relying Get together Belief.
  6. Select Begin.
  7. Choose Import information concerning the relying occasion revealed on-line or on a neighborhood community and enter the URL

The metadata XML file is a typical SAML metadata doc that describes AWS as a relying occasion.

  1. Select Subsequent.
  2. For Show identify, enter a reputation to your relying occasion.
  3. Select Subsequent.
  4. Choose I don’t wish to configure multi-factor authentication.

For elevated safety, we suggest that you just configure multi-factor authentication to assist shield your AWS sources. We don’t allow multi-factor authentication for this submit as a result of we’re utilizing a pattern dataset.

  1. Select Subsequent.
  2. Choose Allow all customers to entry this relying occasion and select Subsequent.

This permits all customers in Lively Listing to make use of AD FS with AWS as a relying occasion. You need to contemplate your safety necessities and alter this configuration accordingly.

  1. End creating your relying occasion.

Configure SAML declare guidelines for the relying occasion

You create two units of declare guidelines on this submit. The primary set (guidelines 1–4) incorporates AD FS declare guidelines which can be required to imagine an IAM position based mostly on AD group membership. These are the principles that you just additionally create if you wish to set up federated entry to the AWS Administration Console. The second set (guidelines 5–6) are declare guidelines which can be required for Lake Formation fine-grained entry management.

To create AD FS declare guidelines, full the next steps:

  1. On the AD FS Administration console, discover the relying occasion you created within the earlier step.
  2. Proper-click the relying occasion and select Edit Declare Guidelines.
  3. Select Add Rule and create your six new guidelines.
  4. Create declare rule 1, referred to as NameID:
    1. For Rule template, use Remodel an Incoming Declare.
    2. For Incoming declare kind, select Home windows account identify.
    3. For Outgoing declare kind, select Title ID.
    4. For Outgoing identify ID format, select Persistent Identifier.
    5. Choose Move via all declare values.
  5. Create declare rule 2, referred to as RoleSessionName:
    1. For Rule template, use Ship LDAP Attribute as Claims.
    2. For Attribute retailer, select Lively Listing.
    3. For Mapping of LDAP attributes to outgoing declare varieties, add the attribute E-Mail-Addresses and outgoing declare kind
  6. Create declare rule 3, referred to as Get AD Teams:
    1. For Rule template, use Ship Claims Utilizing a Customized Rule.
    2. For Customized rule, enter the next code:
      c:[Type == "", Issuer == "AD AUTHORITY"]
      => add(retailer = "Lively Listing", varieties = ("http://temp/variable"), question = ";tokenGroups;{0}", param = c.Worth);

  7. Create declare rule 4, referred to as Roles:
    1. For Rule template, use Ship Claims Utilizing a Customized Rule.
    2. For Customized rule, enter the next code (enter your account quantity and identify of the SAML supplier you created earlier):
      c:[Type == "http://temp/variable", Value =~ "(?i)^aws-"]
      => subject(Sort = "", Worth = RegExReplace(c.Worth, "aws-", "arn:aws:iam::<AWS ACCOUNT NUMBER>:saml-provider/<adfs-saml-provider>,arn:aws:iam::<AWS ACCOUNT NUMBER>:position/"));

Declare guidelines 5 and 6 permit Lake Formation to make authorization selections based mostly on person identify or the AD group membership of the person.

  1. Create declare rule 5, referred to as LF-UserName, which passes the person identify and SAML assertion to Lake Formation:
    1. For Rule template, use Ship LDAP Attributes as Claims.
    2. For Attribute retailer, select Lively Listing.
    3. For Mapping of LDAP attributes to outgoing declare varieties, add the attribute Person-Principal-Title and outgoing declare kind
  2. Create declare rule 6, referred to as LF-Teams, which passes information and analytics-related AD teams that the person is a member of, together with the SAML assertion to Lake Formation:
    1. For Rule template, use Ship Claims Utilizing a Customized Rule.
    2. For Customized rule, enter the next code:
      c:[Type == "http://temp/variable", Value =~ "(?i)^data-"]
      => subject(Sort = "", Worth = c.Worth);

The previous rule snippet filters AD group names beginning with data-. That is an arbitrary naming conference; you may undertake your most popular naming conference for AD teams which can be associated to information lake entry.

Create Lively Listing customers and teams

On this part, we create two AD customers and required AD teams to reveal various ranges of entry to the information.

Create customers

You create two AD customers: FinanceUser and CustomerOpsUser. Every person corresponds to a person who’s a member of the Finance or Buyer enterprise models. The next desk summarizes the small print of every person.


FinanceUser CustomerOpsUser
First Title FinanceUser CustomerOpsUser
Person logon identify [email protected] [email protected]
Electronic mail [email protected] [email protected]

To create your customers, full the next steps:

  1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
  2. Within the navigation pane, select Customers.
  3. On the instrument bar, select the Create person icon.
  4. For First identify, enter FinanceUser.
  5. For Full identify, enter FinanceUser.
  6. For Person logon identify, enter [email protected].
  7. Select Subsequent.
  8. Enter a password and deselect Person should change password at subsequent logon.

We select this feature for simplicity, however in real-world situations, newly created customers should change their password for safety causes.

  1. Select Subsequent.
  2. In Lively Listing Customers and Computer systems, select the person identify.
  3. For Electronic mail, enter [email protected].

Including an e mail is necessary as a result of it’s used because the RoleSessionName worth within the SAML assertion.

  1. Select OK.
  2. Repeat these steps to create CustomerOpsUser.

Create AD teams to characterize information entry patterns

Create the next AD teams to characterize three totally different entry patterns and likewise the flexibility to imagine an IAM position:

  • data-customer – Members have entry to non-PII columns of the buyer desk
  • data-customer-pii – Members have entry to all columns of the buyer desk, together with PII columns
  • data-order – Members have entry to the lineorder desk
  • aws-adfs-data-access – Members assume the adfs-data-access IAM position when logging in to AWS

To create the teams, full the next steps:

  1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
  2. On the instrument bar, select the Create new group icon.
  3. For Group identify¸ enter data-customer.
  4. For Group scope, choose International.
  5. For Group kind¸ choose Safety.
  6. Select OK.
  7. Repeat these steps to create the remaining teams.

Add customers to acceptable teams

Now you add your newly created customers to their acceptable teams, as detailed within the following desk.

Person Group Membership Description
CustomerOpsUser data-customer-pii
Sees all buyer data together with PII and their orders
FinanceUser data-customer
Sees solely non-PII buyer information and orders

Full the next steps:

  1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
  2. Select the person FinanceUser.
  3. On the Member Of tab, select Add.
  4. Add the suitable teams.
  5. Repeat these steps for CustomerOpsUser.

Create a database and tables within the information lake

On this step, you copy information information to an S3 bucket in your AWS account by operating the next AWS Command Line Interface (AWS CLI) instructions. For extra data on tips on how to arrange the AWS CLI, discuss with Configuration Fundamentals.

These instructions copy the information that include information for buyer and lineorder tables. Substitute <BUCKET NAME> with the identify of an S3 bucket in your AWS account.

aws s3 sync s3://awssampledb/load/ s3://<BUCKET NAME>/buyer/ 
--exclude "*" --include "customer-fw.tbl-00*" --exclude "*.bak"

aws s3api copy-object --copy-source awssampledb/load/lo/lineorder-single.tbl000.gz 
--key lineorder/lineorder-single.tbl000.gz --bucket <BUCKET NAME> 
--tagging-directive REPLACE

For this submit, we use the default settings for storing information and logging entry requests inside Amazon S3. You’ll be able to improve the safety of your delicate information with the next strategies:

  • Implement encryption at relaxation utilizing AWS Key Administration Service (AWS KMS) and buyer managed encryption keys
  • Use AWS CloudTrail and audit logging
  • Limit entry to AWS sources based mostly on the least privilege precept

Moreover, Lake Formation is built-in with CloudTrail, a service that gives a report of actions taken by a person, position, or AWS service in Lake Formation. CloudTrail captures all Lake Formation API calls as occasions and is enabled by default once you create a brand new AWS account. When exercise happens in Lake Formation, that exercise is recorded as a CloudTrail occasion together with different AWS service occasions in occasion historical past. For audit and entry monitoring functions, all federated person logins are logged by way of CloudTrail beneath the AssumeRoleWithSAML occasion identify. It’s also possible to view particular person exercise based mostly on their person identify in CloudTrail.

To create a database and tables within the Information Catalog, open the question editor on the Athena console and enter the next DDL statements. Substitute <BUCKET NAME> with the identify of the S3 bucket in your account.

    c_custkey VARCHAR(10),
    c_name VARCHAR(25),
    c_address VARCHAR(25),
    c_city VARCHAR(10),
    c_nation VARCHAR(15),
    c_region VARCHAR(12),
    c_phone VARCHAR(15),
    c_mktsegment VARCHAR(10)
-- The information information include fastened width columns therefore utilizing RegExSerDe
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    "enter.regex" = "(.{10})(.{25})(.{25})(.{10})(.{15})(.{12})(.{15})(.{10})"
LOCATION 's3://<BUCKET NAME>/buyer/';

CREATE EXTERNAL TABLE salesdata.lineorder(
  `lo_orderkey` int, 
  `lo_linenumber` int, 
  `lo_custkey` int, 
  `lo_partkey` int, 
  `lo_suppkey` int, 
  `lo_orderdate` int, 
  `lo_orderpriority` varchar(15), 
  `lo_shippriority` varchar(1), 
  `lo_quantity` int, 
  `lo_extendedprice` int, 
  `lo_ordertotalprice` int, 
  `lo_discount` int, 
  `lo_revenue` int, 
  `lo_supplycost` int, 
  `lo_tax` int, 
  `lo_commitdate` int, 
  `lo_shipmode` varchar(10))
LOCATION 's3://<BUCKET NAME>/lineorder/';

Confirm that tables are created and you’ll see the information:

SELECT * FROM "salesdata"."buyer" restrict 10;
SELECT * FROM "salesdata"."lineorder" restrict 10;

Arrange the Lake Formation permission mannequin

Lake Formation makes use of a mix of Lake Formation permissions and IAM permissions to attain fine-grained entry management. The beneficial strategy consists of the next:

  • Coarse-grained IAM permissions – These apply to the IAM position that customers assume when operating queries in Athena. IAM permissions management entry to Lake Formation, AWS Glue, and Athena APIs.
  • High-quality-grained Lake Formation grants – These management entry to Information Catalog sources, Amazon S3 places, and the underlying information at these places. With these grants, you may give entry to particular tables or solely columns that include particular information values.

Configure IAM position permissions

Earlier within the walkthrough, you created the IAM position adfs-data-access and hooked up the AWS managed IAM coverage AthenaFullAccess to it. This coverage has all of the permissions required for the needs of this submit.

For extra data, see the Information Analyst Permissions part in Lake Formation Personas and IAM Permissions Reference.

Register an S3 bucket as a knowledge lake location

The mechanism to control entry to an Amazon S3 location utilizing Lake Formation is to register a knowledge lake location. Full the next steps:

  1. On the Lake Formation console, select Information lake places.
  2. Select Register location.
  3. For Amazon S3 path, select Browse and find your bucket.
  4. For IAM position, select AWSServiceRoleForLakeFormationDataAccess.

On this step, you specify an IAM service-linked position, which Lake Formation assumes when it grants momentary credentials to built-in AWS companies that entry the information on this location. This position and its permissions are managed by Lake Formation and may’t be modified by IAM principals.

  1. Select Register location.

Configure information permissions

Now that you’ve got registered the Amazon S3 path, you may give AD teams acceptable permissions to entry tables and columns within the salesdata database. The next desk summarizes the brand new permissions.

Database and Desk AD Group Title Desk Permissions Information Permissions
salesdata.buyer data-customer Choose c_city, c_custkey, c_mktsegment, c_nation, and c_region
salesdata.buyer data-customer-pii Choose All information entry
salesdata.lineorder data-order Choose All information entry
  1. On the Lake Formation console, select Tables within the navigation pane.
  2. Filter tables by the salesdata database.
  3. Choose the buyer desk and on the Actions menu, select View permissions.

You need to see following present permissions. These entries permit the present information lake administrator to entry the desk and all its columns.

  1. So as to add new permissions, choose the desk and on the Actions menu, select Grant.
  2. Choose SAML person and teams.
  3. For SAML and Amazon QuickSight customers and teams, enter arn:aws:iam::<AWS ACCOUNT NUMBER>:saml-provider/adfs-saml-provider:group/data-customer.

To get this worth, get the ARN of the SAML supplier from the IAM console and append :group/data-customer to the tip of it.

  1. Choose Named information catalog sources.
  2. For Databases, select the salesdata database.
  3. For Tables, select the buyer desk.
  4. For Desk permissions, choose Choose.
  5. For Information permissions, choose Column-based entry.
  6. For Choose columns, add the columns c_city, c_custkey, c_mktsegment, c_nation, and c_region.
  7. Select Grant.

You’ve gotten now allowed members of the AD group data-customer to have entry to columns of the buyer desk that don’t embody PII.

  1. Repeat these steps for the buyer desk and data-customer-pii group with all information entry.
  2. Repeat these steps for the lineorder desk and data-order group with all information entry.

Arrange a SQL shopper with JDBC connection and confirm permissions

On this submit, we use SQL Workbench to entry Athena via AD authentication and confirm the Lake Formation permissions you created within the earlier part.

Put together the SQL shopper

To arrange the SQL shopper, full the next steps:

  1. Obtain and extract the Lake Formation-compatible Athena JDBC driver with AWS SDK (2.0.14 or later model) from Utilizing Athena with the JDBC Driver.
  2. Go to the SQL Workbench/J web site and obtain the most recent secure package deal.
  3. Set up SQL Workbench/J in your shopper laptop.
  4. In SQL Workbench, on the File menu, select Handle Drivers.
  5. Select the New driver icon.
  6. For Title, enter Athena JDBC Driver.
  7. For Library, browse to and select the Simba Athena JDBC .jar file that you just simply downloaded.
  8. Select OK.

You’re now able to create connections in SQL Workbench to your customers.

Create connections in SQL Workbench

To create your connections, full the next steps:

  1. On the File menu, select Join.
  2. Enter the identify Athena-FinanceUser.
  3. For Driver, select the Simba Athena JDBC driver.
  4. For URL, enter the next code (substitute the placeholders with precise values out of your setup and take away the road breaks to make a single line connection string):
jdbc:awsathena://AwsRegion=<AWS Area Title e.g. ap-southeast-2>;
S3OutputLocation=s3://<Athena Question End result Bucket Title>/jdbc;
idp_host=<adfs-server-name e.g.>;
preferred_role=<ARN of the position created in step1 e.g. arn>;
[email protected]<Area Title e.g.>;

For this submit, we used a self-signed certificates with AD FS. This certificates isn’t trusted by the shopper, subsequently authentication doesn’t succeed. Because of this the SSL_Insecure attribute is about to true to permit authentication regardless of the self-signed certificates. In real-world setups, you’ll use legitimate trusted certificates and may take away the SSL_Insecure attribute.

  1. Create a brand new SQL workbench profile named Athena-CustomerOpsUser and repeat the sooner steps with CustomerOpsUser within the connection URL string.
  2. To check the connections, select Take a look at for every person, and make sure that the connection succeeds.

Confirm entry permissions

Now we are able to confirm permissions for FinanceUser. Within the SQL Workbench Assertion window, run the next SQL SELECT assertion:

SELECT * FROM "salesdata"."lineorder" restrict 10;
SELECT * FROM "salesdata"."buyer" restrict 10;

Confirm that solely non-PII columns are returned from the buyer desk.

As you see within the previous screenshots, FinanceUser solely has entry to non-PII columns of the buyer desk and full entry to (all columns) of the lineorder desk. This permits FinanceUser, for instance, to run mixture and abstract queries based mostly on market phase or location of shoppers with out accessing their private data.

Run the same question for CustomerOpsUser. You need to have the ability to see all columns, together with columns containing PII, within the buyer desk.


This submit demonstrated tips on how to configure your information lake permissions utilizing Lake Formation for AD customers and teams. We configured AD FS 3.0 in your Lively Listing and used it as an IdP to federate into AWS utilizing SAML. This submit additionally confirmed how one can combine your Athena JDBC driver to AD FS and use your AD credentials immediately to connect with Athena.

Integrating your Lively Listing with the Athena JDBC driver provides you the pliability to entry Athena from enterprise intelligence instruments you’re already acquainted with to research the information in your Amazon S3 information lake. This allows you to have a constant central permission mannequin that’s managed via AD customers and their group memberships.

In regards to the Authors

Mostafa Safipour is a Options Architect at AWS based mostly out of Sydney. Over the previous decade he has helped many massive organizations within the ANZ area construct their information, digital, and enterprise workloads on AWS.

Praveen Kumar is a Specialist Resolution Architect at AWS with experience in designing, constructing, and implementing trendy information and analytics platforms utilizing cloud-native companies. His areas of pursuits are serverless know-how, streaming functions, and trendy cloud information warehouses.

Leave a Reply

Your email address will not be published.