# Amazon Redshift Construct Library
<!--BEGIN STABILITY BANNER-->

---
![cdk-constructs: Experimental](https://img.shields.io/badge/cdk--constructs-experimental-important.svg?style=for-the-badge)
> The APIs of higher level constructs in this module are experimental and under active development.
> They are subject to non-backward compatible changes or removal in any future version. These are
> not subject to the [Semantic Versioning](https://semver.org/) model and breaking changes will be
> announced in the release notes. This means that while you may use them, you may need to update
> your source code when upgrading to a newer version of this package.
---
<!--END STABILITY BANNER-->
## Starting a Redshift Cluster Database
To set up a Redshift cluster, define a `Cluster`. It will be launched in a VPC.
You can specify a VPC; otherwise, one will be created. The nodes are always launched in private subnets and are encrypted by default.
```python
import aws_cdk.aws_ec2 as ec2
from aws_cdk.aws_redshift_alpha import Cluster, Login

vpc = ec2.Vpc(self, "Vpc")
cluster = Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc
)
```
By default, the master password will be generated and stored in AWS Secrets Manager.
A default database named `default_db` will be created in the cluster. To change the name of this database, set the `defaultDatabaseName` property in the constructor properties.
By default, the cluster will not be publicly accessible.
Depending on your use case, you can make the cluster publicly accessible with the `publiclyAccessible` property.
## Adding a logging bucket for database audit logging to S3
Amazon Redshift logs information about connections and user activities in your database. These logs help you to monitor the database for security and troubleshooting purposes, a process called database auditing. To send these logs to an S3 bucket, specify the `loggingProperties` when creating a new cluster.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_s3 as s3

vpc = ec2.Vpc(self, "Vpc")
bucket = s3.Bucket.from_bucket_name(self, "bucket", "logging-bucket")

cluster = Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc,
    logging_properties=LoggingProperties(
        logging_bucket=bucket,
        logging_key_prefix="prefix"
    )
)
```
## Connecting
To control who can access the cluster, use the `.connections` attribute. Redshift Clusters have
a default port, so you don't need to specify the port:
```python
cluster.connections.allow_default_port_from_any_ipv4("Open to the world")
```
The endpoint to access your database cluster will be available as the `.clusterEndpoint` attribute:
```python
cluster.cluster_endpoint.socket_address
```
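Once the stack is deployed, a client typically needs the endpoint's hostname and port to connect. The following is a minimal sketch, not part of this library: `build_redshift_url`, the example hostname, and the `postgresql://` URL scheme are assumptions for illustration, and 5439 is used because it is Redshift's default port. Inside the CDK app these values are unresolved tokens, so a helper like this only makes sense in client code that runs after deployment.

```python
from urllib.parse import quote

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default listener port


def build_redshift_url(hostname: str, database: str, user: str,
                       port: int = REDSHIFT_DEFAULT_PORT) -> str:
    """Assemble a PostgreSQL-style connection URL (hypothetical helper)."""
    # Percent-encode the user and database so special characters are URL-safe.
    safe_user = quote(user, safe="")
    safe_database = quote(database, safe="")
    return f"postgresql://{safe_user}@{hostname}:{port}/{safe_database}"


# The hostname below is a made-up example, not a real cluster.
url = build_redshift_url(
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    "default_db",
    "admin",
)
print(url)
```

The password is intentionally left out of the URL; it would normally be fetched from the generated secret at connection time.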
## Database Resources
This module allows for the creation of non-CloudFormation database resources such as users
and tables. This allows you to manage identities, permissions, and stateful resources
within your Redshift cluster from your CDK application.
Because these resources are not available in CloudFormation, this library leverages
[custom
resources](https://docs.aws.amazon.com/cdk/api/latest/docs/custom-resources-readme.html)
to manage them. In addition to the IAM permissions required to make Redshift service
calls, the execution role for the custom resource handler requires database credentials to
create resources within the cluster.
These database credentials can be supplied explicitly through the `adminUser` properties
of the various database resource constructs. Alternatively, the credentials can be
automatically pulled from the Redshift cluster's default administrator
credentials. However, this option is only available if the password for the credentials
was generated by the CDK application (i.e., no value was provided for [the `masterPassword`
property](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-redshift.Login.html#masterpasswordspan-classapi-icon-api-icon-experimental-titlethis-api-element-is-experimental-it-may-change-without-noticespan)
of
[`Cluster.masterUser`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-redshift.Cluster.html#masteruserspan-classapi-icon-api-icon-experimental-titlethis-api-element-is-experimental-it-may-change-without-noticespan)).
### Creating Users
Create a user within a Redshift cluster database by instantiating a `User` construct. This
will generate a username and password, store the credentials in an [AWS Secrets Manager
`Secret`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-secretsmanager.Secret.html),
and make a query to the Redshift cluster to create a new database user with the
credentials.
```python
User(self, "User",
    cluster=cluster,
    database_name="databaseName"
)
```
By default, the user credentials are encrypted with your AWS account's default Secrets
Manager encryption key. You can specify the encryption key used for this purpose by
supplying a key in the `encryptionKey` property.
```python
import aws_cdk.aws_kms as kms

encryption_key = kms.Key(self, "Key")
User(self, "User",
    encryption_key=encryption_key,
    cluster=cluster,
    database_name="databaseName"
)
```
By default, a username is automatically generated from the user construct ID and its path
in the construct tree. You can specify a particular username by providing a value for the
`username` property. Usernames must be valid identifiers; see: [Names and
identifiers](https://docs.aws.amazon.com/redshift/latest/dg/r_names.html) in the *Amazon
Redshift Database Developer Guide*.
```python
User(self, "User",
    username="myuser",
    cluster=cluster,
    database_name="databaseName"
)
```
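The identifier rules can be approximated with a quick local check before a name is passed to `username`. The sketch below is deliberately simplified and rests on assumptions rather than restating the full rules: it covers only ASCII names (a leading letter or underscore, then letters, digits, underscores, or dollar signs, up to 127 characters) and does not reject reserved words. `looks_like_valid_username` is a hypothetical helper, and the linked guide remains authoritative.

```python
import re

# Simplified, ASCII-only approximation of a standard identifier:
# a letter or underscore first, then letters, digits, underscores,
# or dollar signs, for at most 127 characters overall.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,126}")


def looks_like_valid_username(name: str) -> bool:
    """Return True if the name passes the simplified identifier check."""
    # fullmatch ensures the whole string fits the pattern, not just a prefix.
    return _IDENTIFIER.fullmatch(name) is not None


for candidate in ["myuser", "_etl$job", "1user", "my user", ""]:
    print(repr(candidate), looks_like_valid_username(candidate))
```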
The user password is generated by AWS Secrets Manager using the default configuration
found in
[`secretsmanager.SecretStringGenerator`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-secretsmanager.SecretStringGenerator.html),
except with password length `30` and some SQL-incompatible characters excluded. The
plaintext for the password will never be present in the CDK application; instead, a
[CloudFormation Dynamic
Reference](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/dynamic-references.html)
will be used wherever the password value is required.
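For orientation, a Secrets Manager dynamic reference is a placeholder string that CloudFormation resolves at deployment time, which is why the plaintext never appears in the template. The sketch below only formats such a string; `secrets_manager_reference`, the secret name, and the `password` JSON key are assumptions for illustration, and the CDK produces the actual reference for you.

```python
def secrets_manager_reference(secret_id: str, json_key: str) -> str:
    """Format a CloudFormation dynamic reference to one key of a secret (hypothetical helper)."""
    # Shape: {{resolve:secretsmanager:<secret-id>:SecretString:<json-key>}}
    return "{{resolve:secretsmanager:" + f"{secret_id}:SecretString:{json_key}" + "}}"


# "MyRedshiftUserSecret" is a made-up secret name used only for this example.
reference = secrets_manager_reference("MyRedshiftUserSecret", "password")
print(reference)
```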
### Creating Tables
Create a table within a Redshift cluster database by instantiating a `Table`
construct. This will make a query to the Redshift cluster to create a new database table
with the supplied schema.
```python
Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)"),
        Column(name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name="databaseName"
)
```
The table can be configured to have a `distStyle` attribute and a `distKey` column:
```python
Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)", dist_key=True),
        Column(name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name="databaseName",
    dist_style=TableDistStyle.KEY
)
```
The table can also be configured to have a `sortStyle` attribute and `sortKey` columns:
```python
Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)", sort_key=True),
        Column(name="col2", data_type="float", sort_key=True),
    ],
    cluster=cluster,
    database_name="databaseName",
    sort_style=TableSortStyle.COMPOUND
)
```
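To make the distribution and sort settings more concrete, the sketch below renders the kind of `CREATE TABLE` statement they correspond to in Redshift SQL. It is an illustration under assumptions, not the query this library actually submits: `ColumnSpec` and `create_table_sql` are hypothetical stand-ins, and identifiers are interpolated without quoting or validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for the library's Column, for illustration only."""
    name: str
    data_type: str
    dist_key: bool = False
    sort_key: bool = False


def create_table_sql(table: str, columns: List[ColumnSpec],
                     dist_style: Optional[str] = None,
                     sort_style: Optional[str] = None) -> str:
    """Render a Redshift CREATE TABLE statement from column specs."""
    col_defs = ", ".join(f"{c.name} {c.data_type}" for c in columns)
    statement = f"CREATE TABLE {table} ({col_defs})"
    if dist_style:
        statement += f" DISTSTYLE {dist_style}"
    dist_cols = [c.name for c in columns if c.dist_key]
    if len(dist_cols) > 1:
        # Redshift allows a single distribution key column per table.
        raise ValueError("only one column can be the distribution key")
    if dist_cols:
        statement += f" DISTKEY({dist_cols[0]})"
    sort_cols = [c.name for c in columns if c.sort_key]
    if sort_cols:
        # Sort key columns are listed in the order the columns were declared.
        prefix = f"{sort_style} " if sort_style else ""
        statement += f" {prefix}SORTKEY({', '.join(sort_cols)})"
    return statement


columns = [
    ColumnSpec("col1", "varchar(4)", dist_key=True, sort_key=True),
    ColumnSpec("col2", "float", sort_key=True),
]
sql = create_table_sql("mytable", columns, dist_style="KEY", sort_style="COMPOUND")
print(sql)
```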
Tables and their respective columns can be configured to contain comments:
```python
Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)", comment="This is a column comment"),
        Column(name="col2", data_type="float", comment="This is another column comment"),
    ],
    cluster=cluster,
    database_name="databaseName",
    table_comment="This is a table comment"
)
```
Table columns can be configured to use a specific compression encoding:
```python
from aws_cdk.aws_redshift_alpha import ColumnEncoding

Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)", encoding=ColumnEncoding.TEXT32K),
        Column(name="col2", data_type="float", encoding=ColumnEncoding.DELTA32K),
    ],
    cluster=cluster,
    database_name="databaseName"
)
```
Table columns can also contain an `id` attribute, which can allow table columns to be renamed.
**NOTE** To use the `id` attribute, you must also enable the `@aws-cdk/aws-redshift:columnId` feature flag.
```python
Table(self, "Table",
    table_columns=[
        Column(id="col1", name="col1", data_type="varchar(4)"),
        Column(id="col2", name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name="databaseName"
)
```
### Granting Privileges
You can give a user privileges to perform certain actions on a table by using the
`Table.grant()` method.
```python
user = User(self, "User",
    cluster=cluster,
    database_name="databaseName"
)
table = Table(self, "Table",
    table_columns=[
        Column(name="col1", data_type="varchar(4)"),
        Column(name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name="databaseName"
)

table.grant(user, TableAction.DROP, TableAction.SELECT)
```
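To give a feel for what such a grant amounts to in the database, the sketch below renders a Redshift `GRANT` statement from a list of actions. It is an illustration under assumptions, not the query this library actually submits: `render_grant` and the table and user names are hypothetical.

```python
from typing import Iterable


def render_grant(actions: Iterable[str], table: str, user: str) -> str:
    """Render a GRANT statement for table privileges (hypothetical helper)."""
    # Drop duplicate actions while keeping the order in which they first appear.
    unique_actions = list(dict.fromkeys(action.upper() for action in actions))
    if not unique_actions:
        raise ValueError("at least one action is required")
    # Identifiers are interpolated as-is; real code must validate or quote them.
    return f"GRANT {', '.join(unique_actions)} ON TABLE {table} TO {user}"


statement = render_grant(["DROP", "SELECT"], "mytable", "myuser")
print(statement)
```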
Take care when managing privileges via the CDK, as attempting to manage a user's
privileges on the same table in multiple CDK applications could lead to accidentally
overriding these permissions. Consider the following two CDK applications which both refer
to the same user and table. In application 1, the resources are created and the user is
given `INSERT` permissions on the table:
```python
database_name = "databaseName"
username = "myuser"
table_name = "mytable"

user = User(self, "User",
    username=username,
    cluster=cluster,
    database_name=database_name
)
table = Table(self, "Table",
    # Name the table explicitly so that application 2 can import it by name.
    table_name=table_name,
    table_columns=[
        Column(name="col1", data_type="varchar(4)"),
        Column(name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name=database_name
)

table.grant(user, TableAction.INSERT)
```
In application 2, the resources are imported and the user is given `INSERT` permissions on
the table:
```python
database_name = "databaseName"
username = "myuser"
table_name = "mytable"

user = User.from_user_attributes(self, "User",
    username=username,
    password=SecretValue.unsafe_plain_text("NOT_FOR_PRODUCTION"),
    cluster=cluster,
    database_name=database_name
)
table = Table.from_table_attributes(self, "Table",
    table_name=table_name,
    table_columns=[
        Column(name="col1", data_type="varchar(4)"),
        Column(name="col2", data_type="float"),
    ],
    cluster=cluster,
    database_name=database_name
)

table.grant(user, TableAction.INSERT)
```
Both applications attempt to grant the user the appropriate privilege on the table by
submitting a `GRANT USER` SQL query to the Redshift cluster. Note that the latter of these
two calls will have no effect since the user has already been granted the privilege.
Now, if application 1 were to remove the call to `grant`, a `REVOKE USER` SQL query would be
submitted to the Redshift cluster. In general, application 1 does not know that
application 2 has also granted this permission and thus cannot decide not to issue the
revocation. This leads to the undesirable state where application 2 still contains the
call to `grant` but the user does not have the specified permission.
Note that this does not occur when duplicate privileges are granted within the same
application, as such privileges are de-duplicated before any SQL query is submitted.
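The scenario above can be replayed with a toy in-memory model. This is only a sketch of the reasoning, with the cluster replaced by a hypothetical `FakeDatabase` class that tracks privileges in a dictionary; it does not reflect how the library's custom resources are implemented.

```python
from typing import Dict, Set, Tuple


class FakeDatabase:
    """Toy stand-in for a cluster: maps (user, table) to granted actions."""

    def __init__(self) -> None:
        self.privileges: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, user: str, table: str, action: str) -> None:
        # Granting an already-held privilege changes nothing.
        self.privileges.setdefault((user, table), set()).add(action)

    def revoke(self, user: str, table: str, action: str) -> None:
        # Revocation removes the privilege no matter who granted it.
        self.privileges.get((user, table), set()).discard(action)

    def has(self, user: str, table: str, action: str) -> bool:
        return action in self.privileges.get((user, table), set())


db = FakeDatabase()
db.grant("myuser", "mytable", "INSERT")   # application 1 deploys
db.grant("myuser", "mytable", "INSERT")   # application 2 deploys, no further effect
granted_after_both = db.has("myuser", "mytable", "INSERT")

db.revoke("myuser", "mytable", "INSERT")  # application 1 drops its grant() call
granted_after_revoke = db.has("myuser", "mytable", "INSERT")

print(granted_after_both, granted_after_revoke)
```

After the revocation, application 2 still declares the grant, but the database no longer holds it until application 2 is changed and redeployed.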
## Rotating credentials
When the master password is generated and stored in AWS Secrets Manager, it can be rotated automatically:
```python
cluster.add_rotation_single_user()
```
The multi-user rotation scheme is also available:
```python
user = User(self, "User",
    cluster=cluster,
    database_name="databaseName"
)
cluster.add_rotation_multi_user("MultiUserRotation",
    secret=user.secret
)
```
## Adding Parameters
You can add a parameter to a parameter group with `ClusterParameterGroup.addParameter()`.
```python
from aws_cdk.aws_redshift_alpha import ClusterParameterGroup

params = ClusterParameterGroup(self, "Params",
    description="desc",
    parameters={
        "require_ssl": "true"
    }
)

params.add_parameter("enable_user_activity_logging", "true")
```
Additionally, you can add a parameter to the cluster's associated parameter group with `Cluster.addToParameterGroup()`. If the cluster does not have an associated parameter group, a new parameter group is created.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk as cdk
# vpc: ec2.Vpc

cluster = Cluster(self, "Cluster",
    master_user=Login(
        master_username="admin",
        master_password=cdk.SecretValue.unsafe_plain_text("tooshort")
    ),
    vpc=vpc
)

cluster.add_to_parameter_group("enable_user_activity_logging", "true")
```
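The fallback described above (reuse the associated parameter group if there is one, otherwise create a new one) can be pictured with a small model. `ToyParameterGroup` and `ToyCluster` are hypothetical stand-ins written for this sketch; they only mirror the behaviour stated in the text and say nothing about the library's internals.

```python
from typing import Dict, Optional


class ToyParameterGroup:
    """Hypothetical stand-in for ClusterParameterGroup."""

    def __init__(self, parameters: Optional[Dict[str, str]] = None) -> None:
        self.parameters: Dict[str, str] = dict(parameters or {})

    def add_parameter(self, name: str, value: str) -> None:
        self.parameters[name] = value


class ToyCluster:
    """Hypothetical stand-in for Cluster, modelling only the parameter group."""

    def __init__(self, parameter_group: Optional[ToyParameterGroup] = None) -> None:
        self.parameter_group = parameter_group

    def add_to_parameter_group(self, name: str, value: str) -> None:
        # Create a parameter group on first use if none is associated yet.
        if self.parameter_group is None:
            self.parameter_group = ToyParameterGroup()
        self.parameter_group.add_parameter(name, value)


# A cluster without a parameter group gets a new one.
bare = ToyCluster()
bare.add_to_parameter_group("enable_user_activity_logging", "true")

# A cluster with an existing group has the parameter added to that group.
shared = ToyParameterGroup({"require_ssl": "true"})
configured = ToyCluster(shared)
configured.add_to_parameter_group("enable_user_activity_logging", "true")

print(bare.parameter_group.parameters)
print(shared.parameters)
```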
## Rebooting for Parameter Updates
In most cases, existing clusters [must be manually rebooted](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html) to apply parameter changes. You can automate parameter-related reboots by setting the cluster's `rebootForParameterChanges` property to `true`, or by using `Cluster.enableRebootForParameterChanges()`.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk as cdk
# vpc: ec2.Vpc

cluster = Cluster(self, "Cluster",
    master_user=Login(
        master_username="admin",
        master_password=cdk.SecretValue.unsafe_plain_text("tooshort")
    ),
    vpc=vpc
)

cluster.add_to_parameter_group("enable_user_activity_logging", "true")
cluster.enable_reboot_for_parameter_changes()
```
## Elastic IP
If you configure your cluster to be publicly accessible, you can optionally select an *elastic IP address* to use for the external IP address. An elastic IP address is a static IP address that is associated with your AWS account. You can use an elastic IP address to connect to your cluster from outside the VPC. An elastic IP address gives you the ability to change your underlying configuration without affecting the IP address that clients use to connect to your cluster. This approach can be helpful for situations such as recovery after a failure.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk as cdk
# vpc: ec2.Vpc

Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin",
        master_password=cdk.SecretValue.unsafe_plain_text("tooshort")
    ),
    vpc=vpc,
    publicly_accessible=True,
    elastic_ip="10.123.123.255"
)
```
If the cluster is in a VPC and you want to connect to it using the private IP address from within the VPC, it is important to enable *DNS resolution* and *DNS hostnames* in the VPC configuration. If these settings are not enabled, connections from within the VPC use the elastic IP address instead of the private IP address.
```python
import aws_cdk.aws_ec2 as ec2

vpc = ec2.Vpc(self, "VPC",
    enable_dns_support=True,
    enable_dns_hostnames=True
)
```
Note that if an existing, publicly accessible cluster has its VPC configuration changed to use *DNS hostnames* and *DNS resolution*, connections still use the elastic IP address until the cluster is resized.
### Elastic IP vs. Cluster node public IP
The elastic IP address is an external IP address for accessing the cluster outside of a VPC. It's not related to the cluster node public IP addresses and private IP addresses that are accessible via the `clusterEndpoint` property. The public and private cluster node IP addresses appear regardless of whether the cluster is publicly accessible or not. They are used only in certain circumstances to configure ingress rules on the remote host. These circumstances occur when you load data from an Amazon EC2 instance or other remote host using a Secure Shell (SSH) connection.
### Attach Elastic IP after Cluster creation
In some cases, you might want to associate the cluster with an elastic IP address or change an elastic IP address that is associated with the cluster. To attach an elastic IP address after the cluster is created, first update the cluster so that it is not publicly accessible, then make it both publicly accessible and add an Elastic IP address in the same operation.
## Enhanced VPC Routing
When you use Amazon Redshift enhanced VPC routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your virtual private cloud (VPC) based on the Amazon VPC service. By using enhanced VPC routing, you can use standard VPC features, such as VPC security groups, network access control lists (ACLs), VPC endpoints, VPC endpoint policies, internet gateways, and Domain Name System (DNS) servers, as described in the Amazon VPC User Guide. You use these features to tightly manage the flow of data between your Amazon Redshift cluster and other resources. When you use enhanced VPC routing to route traffic through your VPC, you can also use VPC flow logs to monitor COPY and UNLOAD traffic.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk as cdk
# vpc: ec2.Vpc

Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin",
        master_password=cdk.SecretValue.unsafe_plain_text("tooshort")
    ),
    vpc=vpc,
    enhanced_vpc_routing=True
)
```
If enhanced VPC routing is not enabled, Amazon Redshift routes traffic through the internet, including traffic to other services within the AWS network.
## Default IAM role
Some Amazon Redshift features require Amazon Redshift to access other AWS services on your behalf. For your Amazon Redshift clusters to act on your behalf, you supply security credentials to your clusters. The preferred method to supply security credentials is to specify an AWS Identity and Access Management (IAM) role.
When you create an IAM role and set it as the default for the cluster using the console, you don't have to provide the IAM role's Amazon Resource Name (ARN) to perform authentication and authorization.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_iam as iam
# vpc: ec2.Vpc

default_role = iam.Role(self, "DefaultRole",
    assumed_by=iam.ServicePrincipal("redshift.amazonaws.com")
)

Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc,
    roles=[default_role],
    default_role=default_role
)
```
A default role can also be added to a cluster using the `addDefaultIamRole` method.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_iam as iam
# vpc: ec2.Vpc

default_role = iam.Role(self, "DefaultRole",
    assumed_by=iam.ServicePrincipal("redshift.amazonaws.com")
)

redshift_cluster = Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc,
    roles=[default_role]
)

redshift_cluster.add_default_iam_role(default_role)
```
## IAM roles
Attaching IAM roles to a Redshift Cluster grants permissions to the Redshift service to perform actions on your behalf.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_iam as iam
# vpc: ec2.Vpc

role = iam.Role(self, "Role",
    assumed_by=iam.ServicePrincipal("redshift.amazonaws.com")
)

cluster = Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc,
    roles=[role]
)
```
Additional IAM roles can be attached to a cluster using the `addIamRole` method.
```python
import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_iam as iam
# vpc: ec2.Vpc

role = iam.Role(self, "Role",
    assumed_by=iam.ServicePrincipal("redshift.amazonaws.com")
)

cluster = Cluster(self, "Redshift",
    master_user=Login(
        master_username="admin"
    ),
    vpc=vpc
)

cluster.add_iam_role(role)
```