Creating a Documentation Site on S3 with Terraform
This explanatory tutorial will walk you through setting up a
password-protected static site, hosted on Amazon’s cloud
infrastructure, using Hashicorp’s Terraform tool. The contents
of the site will be stored in an S3 bucket and served by a
CloudFront distribution, using an SSL certificate. Access to
the site will be restricted by HTTP basic authentication,
implemented with CloudFront functions. Finally, DNS records will
be created with Route 53, so that the site can be accessed via
https://docs.<your-domain>
. This walk-through assumes a general
familiarity with the fundamental concepts of the web (e.g., HTTP, SSL,
DNS, etc.), but it does not assume any prior experience with AWS or
Terraform.
Though this walk-through is intended to be followed step by step, in order to better learn the concepts involved, here is an outline for future reference:
- Motivation
- Prerequisites
- Create Minimal Terraform Configuration
- Create an S3 Bucket for the Docs Site
- Configure the S3 Bucket as a Website
- Create an Alias Record to Point to the S3 Bucket
- Set Up SSL
- Create a CloudFront Distribution
- Restrict Access to the S3 Bucket
- Configure an Error Page
- Protect the Site with Basic Authentication
- Implement Default Directory Indexes
- Next Steps
Motivation
Civil and mechanical engineers have long kept notebooks to record ideas, observations, and details of their work. While these notebooks can sometimes be considered legal documents, with respect to patent law, they can also be regarded as potential sources of learning and growth. Reflecting on previously explored hypotheses, dead-ends, and solutions can help an engineer improve as an engineer.
Software and devops engineers can also benefit from keeping notes about their work. Such notes may be helpful to individuals over the course of their careers, encouraging them to think through potential solutions and trade-offs, instead of blindly copying and pasting code they find on the internet. These notes might also be useful for small teams working on a project, given that a central repository of documentation and ideas will likely be more meaningful than the ephemeral and scattered nature of chat applications and email threads.
It’s one thing to keep your documentation in a series of individual text files, strewn about your hard drive or some folder in the cloud, but it’s quite another to have a curated collection of interrelated notes, easily accessible by you or your team. This latter format is well-served by a password-protected static website, because: (1) a website can make it painless to navigate between notes, on a variety of devices; (2) a static site is both easy to generate, using any number of static site generators, and easy to host, using any kind of web server; and (3) password protection keeps private notes accessible to you and your team, and to no one else.
Though AWS offers a robust web interface for provisioning infrastructure on their platform, Terraform allows you to manage infrastructure just by writing configuration files, allowing you to avoid clicking through a website altogether. Because Terraform’s configuration files are human-readable, you can quickly and easily define resources in AWS, or multiple other cloud platforms. Because these files are plaintext, you can commit them into a source code repository and track the way your infrastructure changes over time. And because Terraform’s configuration language is declarative, running the tool multiple times is safe — it won’t create duplicate resources. These human-readable, plaintext, declarative configuration files are the embodiment of “Infrastructure as Code.”
Prerequisites
You’ll need:
- Familiarity with the command line
- The Terraform CLI installed locally
- The AWS CLI installed locally
- An AWS account, with credentials configured for Terraform and the AWS CLI
- A domain name
Create Minimal Terraform Configuration
Terraform configuration is written using HCL (Hashicorp
Configuration Language). Unless otherwise stated, the HCL presented in
this tutorial can be written in one file (e.g., main.tf
) or separate
files with the .tf
file extension.
To begin, create an initial terraform
block:
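The version constraints shown here are illustrative; pin them to whatever versions you actually run:

terraform {
  required_version = ">= 1.1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}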
Line 2 declares the required version of Terraform that can be used with this configuration. Running a previous version of Terraform against your configuration won’t work. This is a safety mechanism designed to prevent potentially incorrect provisioning of your infrastructure.
Lines 4–9 specify all of the providers required for your
infrastructure. Because this documentation site is being provisioned
using AWS resources, “aws” is declared as the name representing the
required AWS provider. The source
directive (line 6) specifies a
source address and the version
directive (line 7) specifies a
version constraint for the provider.
Terraform configurations must declare the providers they require, and each provider may require its own configuration. For example, the AWS provider requires the specification of a cloud region in which resources should be provisioned:
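(Any region will do for the S3 bucket; us-east-1 is only a default.)

provider "aws" {
  region = "us-east-1"
}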
Feel free to replace us-east-1 with whichever AWS region you’d like to use.
At this point, you can run terraform init
, which will initialize the
working directory containing your Terraform configuration files. (Given
the configuration specified so far, this command will download the AWS
provider plugin).
Create an S3 Bucket for the Docs Site
The next step is to create a bucket on S3 for storing the content of
your docs site (for the rest of the tutorial, replace <your-domain>
with your actual domain name):
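Only a couple of lines are needed:

resource "aws_s3_bucket" "docs" {
  bucket = "docs.<your-domain>"
}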
Line 1 specifies the kind of resource to be created (aws_s3_bucket) and
the local name given to that resource (docs). This name functions as a
unique identifier which can be referenced throughout your Terraform
configuration.
Line 2 defines the name of the bucket to be created on S3. (Bucket names on S3 must be globally unique.)
Run terraform apply
to create the bucket.
To ensure that the bucket was created, you can upload a test HTML file to the bucket and access it with an HTTP request. The file can be created and uploaded as follows:
$ echo "<p>Docs</p>" > index.html
$ aws s3 cp index.html s3://docs.<your-domain>/
Now, in order to access files in the bucket via an HTTP request, you need to know the bucket’s S3 path. You can use Terraform’s output values feature for this:
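One possible shape for this output, assembling the bucket’s S3 hostname (the exact pieces you join are a matter of taste):

output "docs_bucket_path" {
  value = join("", [
    aws_s3_bucket.docs.bucket,
    ".s3.amazonaws.com"
  ])
}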
(The join
function is used here simply for readability; the value
could easily be written on one line instead.) To get the value of
docs_bucket_path
, run:
$ terraform apply
$ terraform output docs_bucket_path
You can try to view the created file via its S3 URL, using the output of
terraform output docs_bucket_path
:
$ curl http://<docs_bucket_path>/index.html
This should result in a “403 Forbidden” error, because, by default, all S3 buckets and their objects are private.
┌───────┐        │ ┌─────────┐
│Request│------->│ │S3 Bucket│
└───────┘        │ └─────────┘
You can change this by attaching a specific policy to the docs bucket, which will allow all objects in the bucket to be publicly readable:
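Along with the policy document, an aws_s3_bucket_policy resource is needed to attach it to the bucket:

data "aws_iam_policy_document" "s3_docs_policy" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.docs.arn}/*"]
    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}

resource "aws_s3_bucket_policy" "docs" {
  bucket = aws_s3_bucket.docs.id
  policy = data.aws_iam_policy_document.s3_docs_policy.json
}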
Line 1 defines a Terraform data source, which can be accessed by other resources but is itself not a resource.
Lines 2–9 define a statement
which enables getting objects from S3
(line 3), for all objects in the specified bucket (line 4). A
principal
is an AWS user, account, or application to whom this
statement
applies (lines 5–8); in this case, anonymous access is
allowed to all the objects in the bucket, because the resources of a
website should be publicly accessible (lines 6–7).
Running terraform apply
will make the infrastructure look like this:
                 ┌─────────────┐
                 │Bucket Policy│
                 └─────────────┘
                        |
                        |
                        v
┌───────┐          ┌─────────┐
│Request│--------->│S3 Bucket│
└───────┘          └─────────┘
You should now be able to access the uploaded file:
$ curl http://<docs_bucket_path>/index.html
Configure the S3 Bucket as a Website
The index page of a website shouldn’t need to be accessed explicitly, as
in http://<docs_bucket_path>/index.html
. Typical web servers
automatically return the index.html
file for any request to a
directory on the server that contains such a file. Although not a
typical web server, an S3 bucket can be configured to behave like one:
resource "aws_s3_bucket" "docs" {
bucket = "docs.<your-domain>"
+
+ website {
+ index_document = "index.html"
+ }
}
The above website
directive instructs AWS to treat the S3 bucket as a
static website, available at a website endpoint URL. To find out the
value of this URL, you can create a new Terraform output value:
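The endpoint is exposed as an attribute on the bucket resource:

output "docs_bucket_website_endpoint" {
  value = aws_s3_bucket.docs.website_endpoint
}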
To get the value of docs_bucket_website_endpoint
, run:
$ terraform apply
$ terraform output docs_bucket_website_endpoint
Using the value of docs_bucket_website_endpoint
, you should now be
able to access the uploaded file as follows:
$ curl http://<docs_bucket_website_endpoint>
Create an Alias Record to Point to the S3 Bucket
You now have the ability to view your docs site via the internet. But the
goal is to access the site on your own domain (e.g.,
http://docs.<your-domain>
) rather than using the amazonaws.com
domain. In order to do this, you’ll need to modify your DNS records.
The first thing you’ll need to do is create a Route 53 hosted zone, which functions as a container for the domain’s DNS records. (Note that each hosted zone incurs a small monthly charge.)
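(The resource name main is arbitrary.)

resource "aws_route53_zone" "main" {
  name = "<your-domain>"
}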
The next step is to create an “A” record for the docs site:
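(This uses the main zone defined above; note that the bucket’s hosted_zone_id attribute refers to the zone of S3’s website endpoints, not to your own hosted zone.)

resource "aws_route53_record" "docs" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "docs.<your-domain>"
  type    = "A"

  alias {
    name                   = aws_s3_bucket.docs.website_domain
    zone_id                = aws_s3_bucket.docs.hosted_zone_id
    evaluate_target_health = true
  }
}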
An “A” record (“A” stands for “address”) is the most fundamental type of
DNS record, indicating the IP address of a given domain. The alias
block defined above (lines 6–10) represents a Route 53-specific
extension to DNS functionality, which allows traffic to be routed to a
specific AWS resource. In this case, traffic to docs.<your-domain>
will be routed to your docs S3 bucket. The name
of the alias
corresponds to the website_domain
attribute of the S3 bucket (line 7),
which itself is available because the bucket is configured as a static
website. The required evaluate_target_health
argument specifies
whether to check the health of the resource (in this case, making sure
that the S3 bucket responds to requests) when determining whether to
respond to DNS queries.
Finally, you’ll need to point your domain’s name servers to your Route 53 name servers (created by the hosted zone). You can get the Route 53 servers by using a Terraform output variable:
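The zone resource exposes the name servers directly:

output "name_servers" {
  value = aws_route53_zone.main.name_servers
}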
Then run:
$ terraform apply
$ terraform output name_servers
You can use the values of the previous command to update the name
servers via your domain registrar’s website. You’ll need to wait for
these changes to propagate throughout the internet, but eventually
you’ll be able to access your static docs site at
http://docs.<your-domain>
.
Set Up SSL
In order to protect your site from public consumption, you’ll need some
form of authentication, which will require encrypting traffic to and
from the site with an SSL certificate. Although S3 provides secure
access via a wildcard SSL certificate (*.s3.amazonaws.com
), it only
works for requests made to the amazonaws.com
domain; it won’t work for
requests made to a custom domain. In order to securely serve content
from your S3 bucket using your custom domain, you need to use a
CloudFront distribution. Though Amazon’s CloudFront is a content
delivery network, whose primary purpose is to speed up distribution of
your web content to your users, it also enables you to deliver your
content securely with an SSL certificate.
You can request a free SSL certificate using the Amazon Certificate Manager (ACM). ACM allows you to quickly request a certificate and deploy it on specific AWS resources, such as CloudFront distributions. ACM handles certificate renewals, so you don’t have to worry about the manual process of purchasing, deploying, and renewing SSL certificates.
You could request a certificate only for the docs
subdomain, but it’s
worth requesting a wildcard certificate, in case you want to use SSL for
a different subdomain in the future. If you’re using an AWS region other
than us-east-1, you’ll need to set up a second, aliased provider, because
ACM certificates used with CloudFront must be created in us-east-1:
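(The alias name us_east_1 is arbitrary.)

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}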
You can then request the certificate as follows:
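(If your default region is already us-east-1, the provider line can be dropped; the resource name cert is arbitrary.)

resource "aws_acm_certificate" "cert" {
  provider          = aws.us_east_1
  domain_name       = "<your-domain>"
  validation_method = "DNS"
  subject_alternative_names = ["*.<your-domain>"]
}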
Line 4 declares the validation method for this certificate. Before the Amazon certificate authority can issue a certificate for your site, ACM must prove that you own or control the relevant domain. You can prove this either with DNS validation or via email, when you request your certificate. “DNS” is chosen here in order to avoid extra manual steps (e.g., checking your email and clicking a verification link). “DNS” validation requires your domain name servers to include specific validation records that point to AWS. See below for how to do this.
Line 5 specifies any additional domains to be included in the
certificate. Together with the domain_name on line 3, the certificate
will be issued for <your-domain> and *.<your-domain>.
Finally, you can create the DNS validation records as follows:
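This follows the standard pattern from the AWS provider documentation, using the main zone from earlier:

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.cert.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.main.zone_id
}

resource "aws_acm_certificate_validation" "cert" {
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}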
Lines 2–8 make use of Terraform’s for_each
construct, which, in
this case, creates separate Route 53 records for <your-domain>
and
*.<your-domain>
.
Line 10 specifies that each of the Route 53 records can overwrite existing records (created outside of Terraform).
Line 11 defines the name of the record, supplied by the
domain_validation_options
for the ACM certificate.
Line 12 specifies the list of records, which is required for non-alias
records, such as these DNS validation records. This list is made up of
just one record, derived from the domain_validation_options
for the
ACM certificate.
Line 13 defines the TTL duration for the record. This is mandatory for non-alias records. The value here is set to 60, which is the same as the TTL for alias records (a value set by AWS which cannot be changed).
Line 14 specifies the type of the Route 53 record, which is derived from
the domain_validation_options
for the ACM certificate.
Lines 18–21 define an aws_acm_certificate_validation
resource, which
represents a successful validation of an ACM certificate. This resource
doesn’t represent a real-world entity in AWS; rather it simply
implements a part of the validation workflow. Make sure the domain’s
name servers point to Route 53 before “creating” this resource.
Line 20 specifies a list of Fully Qualified Domain Names that implement the DNS validation. Setting this will explicitly ensure that the certificate is valid for these domains.
Create a CloudFront Distribution
The next step for using SSL is to create the CloudFront distribution itself:
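In the sketch below, the resource name and the origin_id string are arbitrary (the latter just has to match target_origin_id), and the TTL and SSL protocol values are reasonable defaults rather than requirements:

resource "aws_cloudfront_distribution" "docs" {
  origin {
    domain_name = aws_s3_bucket.docs.website_endpoint

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
    origin_id = "docs.<your-domain>"
  }
  enabled             = true
  default_root_object = "index.html"

  aliases = ["docs.<your-domain>"]

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "docs.<your-domain>"
    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "redirect-to-https"
    default_ttl            = 86400
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.cert.arn
    ssl_support_method  = "sni-only"
  }
}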
Lines 2–12 specify an origin for the CloudFront distribution: the source location from where CloudFront will retrieve the content to be served to end users.
Line 3 defines the domain name for the origin. In this case, the name is the domain name for the S3 bucket.
Lines 5–10 contain the configuration for the “custom” origin. This
configuration is mandatory. Lines 6–7 specify the origin’s http and
https ports (S3 uses the standard ports for HTTP and HTTPS). Line 8
declares the (required) origin_protocol_policy
. “http-only” is the
default setting when the origin is an Amazon S3 static website
hosting endpoint (because S3 doesn’t support HTTPS connections for
static website hosting endpoints). Although line 8 specifies
“http-only,” the origin_ssl_protocols
attribute (line 9) must be
defined. In this case, the attribute specifies standard SSL protocols.
Line 13 specifies whether to accept end user requests for content. (The
enabled
directive is mandatory.)
Line 14 defines the object you want CloudFront to return when an end
user requests the root URL (e.g., <your-domain>/
).
Line 16 specifies the custom URL to use for this distribution. When you
create a distribution, CloudFront provides a domain name for it, such as
d111111abcdef8.cloudfront.net.
The aliases
directive allows you to
use your own domain. (In order to use this domain, you must have an SSL
certificate that validates your authorization to use the domain name;
if you’ve been following all the steps so far, you already have one.)
Lines 18–27 define the default cache behavior for the distribution.
Line 19 specifies which HTTP methods CloudFront processes and forwards
to the custom origin. (The allowed_methods
directive is mandatory.)
Line 20 controls whether CloudFront caches the response to requests using the specified HTTP methods.
Line 21 defines the value of the ID for the origin that you want
CloudFront to route requests to. (The target_origin_id
directive is
required.)
Lines 22–27 specify the forwarded_values
configuration, which
declares how CloudFront handles query strings, cookies, and headers. In
this instance, given that S3 won’t process query strings or cookies,
there’s no need to forward them.
Line 28 defines the viewer_protocol_policy
. In this case, if an end
user makes a plain HTTP request to the site, they will be redirected to
make an HTTPS request to the same resource.
Line 29 specifies the default TTL (in seconds) that an object is in the
CloudFront cache before CloudFront forwards another request (in the
absence of a Cache-Control max-age
or Expires
header). It’s
important to set the TTL, otherwise CloudFront might think your objects
are stale (even if it has a copy of them) and then make another request
to the origin to check for staleness.
Lines 32–35 declare the required restrictions
configuration. The
required geo_restriction
resource (lines 33–35) specifies any
restrictions you may wish to impose based on the location of end user
requests. In this case, no restrictions are present.
Lines 38–41 include the required viewer_certificate
resource, which
specifies the previously created ACM certificate and the
ssl_support_method
(line 40). The latter is set to sni-only
, as
recommended by AWS, given that most browsers and clients support
SNI.
The final step of this process involves changing the DNS A record for
docs.<your-domain>
, so that it points to the new distribution:
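Shown as a diff against the record created earlier:

resource "aws_route53_record" "docs" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "docs.<your-domain>"
  type    = "A"

  alias {
-    name                   = aws_s3_bucket.docs.website_domain
-    zone_id                = aws_s3_bucket.docs.hosted_zone_id
+    name                   = aws_cloudfront_distribution.docs.domain_name
+    zone_id                = aws_cloudfront_distribution.docs.hosted_zone_id
    evaluate_target_health = true
  }
}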
The two changes here involve pointing the alias record to the CloudFront distribution instead of pointing directly to the S3 bucket.
Running terraform apply
at this point will request the SSL
certificate, create the DNS validation records for the certificate,
create the CloudFront distribution, and point the A record to the new
distribution. Note that CloudFront distributions take about 15 minutes
to enter a deployed state after creation or modification. After the
distribution is live, a request to http://docs.<your-domain>
will
redirect to https://docs.<your-domain>
, which will return the index.html
from the S3 bucket.
Restrict Access to the S3 Bucket
At this stage, the infrastructure looks like this:
                               ┌─────────────┐
                               │Bucket Policy│
                               └─────────────┘
                                      |
                                      |
                                      v
┌───────┐      ┌──────────┐      ┌─────────┐
│Request│----->│CloudFront│----->│S3 Bucket│
└───────┘      └──────────┘      └─────────┘
One problem with this setup is that content in the S3 bucket can still be accessed independently of the CloudFront distribution, via HTTP:
┌───────┐                        ┌─────────────┐
│Request│----------.             │Bucket Policy│
└───────┘          |             └─────────────┘
                   |_______________.      |
                                   |      |
                                   v      v
┌───────┐      ┌──────────┐      ┌─────────┐
│Request│----->│CloudFront│----->│S3 Bucket│
└───────┘      └──────────┘      └─────────┘
In order to ensure that all requests to the S3 bucket are routed through the CloudFront distribution, via HTTPS, you’ll need to adjust the bucket policy so that it only allows the CloudFront distribution to access its contents.
The first step of this process is to create an Origin Access Identity (OAI). This is a virtual user identity that can be used to give a CloudFront distribution permission to fetch a private object from an origin server, such as an S3 bucket:
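(The comment is free-form text.)

resource "aws_cloudfront_origin_access_identity" "docs" {
  comment = "docs.<your-domain>"
}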
You can then update the bucket policy so that its contents can only be accessed by this OAI:
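The only changes are within the principals block:

data "aws_iam_policy_document" "s3_docs_policy" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.docs.arn}/*"]

    principals {
-      type        = "*"
-      identifiers = ["*"]
+      type        = "AWS"
+      identifiers = [aws_cloudfront_origin_access_identity.docs.iam_arn]
    }
  }
}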
The wildcard type ("*") in line 7 is replaced by the “AWS” type in line 9 to indicate that the statement applies to an AWS resource rather than all public users.
The OAI specified in line 10 replaces the wildcard identifier in line 8, to restrict access to the OAI.
Now that access to the contents of the S3 bucket will be restricted to the OAI, there’s no need to specify the static website hosting feature of the bucket:
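(This simply reverses the website change made earlier.)

resource "aws_s3_bucket" "docs" {
  bucket = "docs.<your-domain>"
-
-  website {
-    index_document = "index.html"
-  }
}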
And if the bucket is no longer configured as a static website, then you can simplify the CloudFront distribution origin configuration:
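Note that the origin’s domain_name also changes, from the website endpoint to the bucket’s regional domain name, because an OAI only works against the bucket’s REST endpoint:

resource "aws_cloudfront_distribution" "docs" {
  origin {
    domain_name = aws_s3_bucket.docs.bucket_regional_domain_name
    origin_id   = "docs.<your-domain>"
-    custom_origin_config {
-      http_port              = 80
-      https_port             = 443
-      origin_protocol_policy = "http-only"
-      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
-    }
+    s3_origin_config {
+      origin_access_identity = aws_cloudfront_origin_access_identity.docs.cloudfront_access_identity_path
+    }
  }

  # ... (the rest of the distribution is unchanged)
}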
Lines 11–12 specify the OAI to be used when accessing the S3 bucket
origin. Normally, when referencing an origin access identity in
CloudFront, you need to prefix its ID with the
origin-access-identity/cloudfront/ special path. Terraform’s
cloudfront_access_identity_path attribute produces the already-prefixed
path, so you don’t need to assemble it yourself.
After running terraform apply
, and after the CloudFront distribution
has been modified, all direct requests to the S3 bucket (via
http://<docs_bucket_website_endpoint>
) will be denied. All requests to
the bucket must now be routed through the CloudFront distribution, via
https://docs.<your-domain>
(any plain HTTP requests, to
http://docs.<your-domain>
will be redirected over HTTPS):
┌───────┐                      ┌─────────────┐
│Request│---.                  │Bucket Policy│
└───────┘   |                  └─────────────┘
            |_______.                 |
                    |                 |
                    v                 v
┌───────┐      ┌──────────┐      ┌─────────┐
│Request│----->│CloudFront│----->│S3 Bucket│
└───────┘      └──────────┘      └─────────┘
Configure an Error Page
It may happen that you or someone on your team requests a page on the
docs site that doesn’t exist. For such cases, a “404 Not Found” error
page should be returned. CloudFront makes this easy. Just upload a
404.html
page to the docs S3 bucket and then adjust the distribution
configuration as follows:
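(The error page path assumes the 404.html object uploaded above.)

resource "aws_cloudfront_distribution" "docs" {
  # ...

+  custom_error_response {
+    error_code         = 404
+    response_code      = 404
+    response_page_path = "/404.html"
+  }
}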
With this configuration, requesting a resource that doesn’t exist in the
S3 bucket will (surprisingly) return a 403 error, indicating that the
user doesn’t have permission to list the bucket contents. To fix this,
and return a 404, the s3_docs_policy
needs to be adjusted as follows:
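Two additions, shown here as a diff:

data "aws_iam_policy_document" "s3_docs_policy" {
  statement {
-    actions   = ["s3:GetObject"]
-    resources = ["${aws_s3_bucket.docs.arn}/*"]
+    actions   = ["s3:GetObject", "s3:ListBucket"]
+    resources = ["${aws_s3_bucket.docs.arn}/*", aws_s3_bucket.docs.arn]
    principals {
      type        = "AWS"
      identifiers = [aws_cloudfront_origin_access_identity.docs.iam_arn]
    }
  }
}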
Checking for the absence of a requested document in an S3 bucket
requires being able to list the entire contents of that bucket. This is
why you need the s3:ListBucket
action (line 5). But you also need
access to the ${aws_s3_bucket.docs.arn}
resource (line 6), so that the
s3:ListBucket
action can be applied to the bucket itself rather than
the items in the bucket (represented by the
${aws_s3_bucket.docs.arn}/*
resource).
Running terraform apply
will update the CloudFront distribution and
relevant bucket policy, so that “404 Not Found” errors are returned when
appropriate.
Protect the Site with Basic Authentication
In order to protect access to the docs site, you need to authorize end user requests. Basic authentication is the simplest of all authentication methods on the web. It works by sending an authorization HTTP header to the web service, which looks like this:
Authorization: Basic YWRtaW46YWRtaW4=
The header is made up of the string Basic
, followed by a space, followed
by the base-64 encoded value of the string username:password
, where
username
and password
are replaced by the relevant username and
password. In the above example, YWRtaW46YWRtaW4=
is the base-64
encoding of the value “admin:admin”. When the web service receives this
request, it can decode the username and password and check that they
correspond to the expected values. If they aren’t valid, the service
returns a “401 Unauthorized” response. (Because base-64 encoding is
trivially reversible, and is not a form of encryption, Basic
authentication should only ever be used over HTTPS connections.)
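You can generate (or verify) this encoded value with the standard base64 utility; the -n flag keeps echo from appending a newline, which would change the encoding:

$ echo -n "admin:admin" | base64
YWRtaW46YWRtaW4=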
You can implement Basic authentication using CloudFront functions. First, add the following variables to your Terraform configuration:
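(Marking the variables as sensitive keeps their values out of Terraform’s console output.)

variable "docs_auth_user" {
  type      = string
  sensitive = true
}

variable "docs_auth_pass" {
  type      = string
  sensitive = true
}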
Then, in a separate file, with the file extension .tfvars
(e.g.,
terraform.tfvars
), specify the corresponding variables and appropriate
values for those variables. For example:
docs_auth_user = "admin"
docs_auth_pass = "admin"
Then create the auth function for the CloudFront distribution, in a file
called viewer-request.js
:
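Here is one possible implementation. Note one assumption: it uses atob() for base-64 decoding, which the CloudFront Functions runtime may not provide; if it doesn’t, compare the still-encoded header value against a pre-encoded username:password string instead:

var response401 = {
  statusCode: 401,
  statusDescription: "Unauthorized",
  headers: {
    "www-authenticate": {
      value: "Basic"
    }
  }
};

function handler(event) {
  var authHeader = event.request.headers.authorization;

  if (!authHeader || !authHeader.value) {
    return response401;
  } else if (authHeader.value.indexOf("Basic ") !== 0) {
    return response401;
  }

  var expected = "${user}:${pass}";
  var encoded = authHeader.value.split(" ")[1];
  var decoded = atob(encoded); // atob() assumed available; see note above
  if (decoded !== expected) {
    return response401;
  }

  return event.request;
}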
Lines 1–9 define the 401 response object. Lines 11–28 represent the
handler
function which handles each incoming request.
Line 14 ensures that the authorization header is present with a non-blank value. If no authorization header is present, or if the header doesn’t start with the expected “Basic ” value (line 16), then the 401 response should be returned.
Line 20 represents the expected credentials (the specific values will be
populated by the values in terraform.tfvars
).
Line 21 derives the encoded username:password
string by splitting the
value of the authorization header around a space (" ") and returning the
second element of the resulting array.
Line 22 decodes the encoded username and password.
Line 23 checks to see if the decoded credentials match the expected credentials. If they don’t, the 401 response will be returned. Otherwise the request will be returned, unmodified, which will then be passed through to the S3 bucket.
Then register the function with Terraform:
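(The function name docs-auth is arbitrary.)

resource "aws_cloudfront_function" "docs_auth" {
  name    = "docs-auth"
  runtime = "cloudfront-js-1.0"
  code = templatefile("${path.module}/viewer-request.js", {
    user = var.docs_auth_user
    pass = var.docs_auth_pass
  })
}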
Lines 4–7 specify the code for the function, which is derived using
Terraform’s templatefile
function. The viewer-request.js
file
is treated as a template, and any variables are populated with the
relevant values. In this case, the ${user} and ${pass} placeholders are
replaced with the values of the user and pass template variables, which
are populated from var.docs_auth_user and var.docs_auth_pass (and so,
ultimately, from terraform.tfvars).
Finally, attach the function to the CloudFront distribution:
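The association is added to the default_cache_behavior block:

resource "aws_cloudfront_distribution" "docs" {
  # ...

  default_cache_behavior {
    # ...

+    function_association {
+      event_type   = "viewer-request"
+      function_arn = aws_cloudfront_function.docs_auth.arn
+    }
  }
}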
Line 8 declares the event_type
to be of type “viewer-request.” This is
appropriate for this function, which modifies the incoming request to
the CloudFront distribution rather than the response returned from the
origin.
Run terraform apply
to register the viewer-request.js
function with
Terraform and attach it to the CloudFront distribution.
Requesting the docs site should now result in a “401 Unauthorized” error:
$ curl -I https://docs.<your-domain>
But when using the defined username and password, you can access the site:
$ curl -u admin:admin -I https://docs.<your-domain>
The infrastructure now looks like this:
               ┌──────────┐      ┌─────────────┐
               │Basic Auth│      │Bucket Policy│
               └──────────┘      └─────────────┘
                    |                   |
                    |                   |
                    v                   v
┌───────┐      ┌──────────┐      ┌─────────┐
│Request│----->│CloudFront│----->│S3 Bucket│
└───────┘      └──────────┘      └─────────┘
Implement Default Directory Indexes
CloudFront allows the specification of a default root object for a
website. In the case presented here, a request to
https://docs.<your-domain>
will return the index.html
document in
the root of the S3 bucket. But this only works for the root of the
entire website, not for its subdirectories. So if the contents of the S3
bucket included a subdirectory called terraform
, which itself
contained an index.html
file, a request to
https://docs.<your-domain>/terraform/
would not return the
index.html
file, as a typical web server would do. This behavior can
be emulated by adjusting the viewer-request.js
function:
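Only the new handler logic is shown in this sketch; the 401 response object and the credential checks from the previous section stay exactly as they were:

// (the response401 object and the credential checks are unchanged)

function handler(event) {
  var request = event.request;
  // ... Basic authentication checks, as before ...

  var uri = request.uri;
  var parts = uri.split("/");
  var basepath = parts[parts.length - 1];

  // Static assets (anything with a file extension)
  // should pass through unmodified
  if (basepath.indexOf(".") !== -1) {
    return request;
  }
  request.uri = uri.charAt(uri.length - 1) === "/" ? uri + "index.html" : uri + "/index.html";

  return request;
}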
Lines 7–9 extract the final part of a given URL. For example, if the
URL is https://docs.<your-domain>/terraform
, then the basepath
is
terraform
.
Lines 11–14 determine whether the basepath
has a file extension. If
it does (e.g., if the URL points to a static asset such as an image or
CSS file), then the request should be returned unmodified.
Line 16 is only reached if the URL doesn’t end with a file extension.
This line modifies the request’s URI, so that CloudFront requests the
appropriate index.html
file from S3. For example, CloudFront will
change requests to /terraform/
or /terraform
to
/terraform/index.html
.
Next Steps
Authoring the documentation site is beyond the scope of this tutorial, but using the Hugo static site generator might be a good place to start. For more involved documentation sites, the docsy Hugo theme is recommended.
However you build your static site, having a secure, easily accessible place to maintain your engineer’s notebook, or your team’s internal documentation, can prove worthwhile.