AWS is the most popular of the major cloud providers, and getting started with its free tier is quite easy. My own experience is mainly with Azure, so here is the setup I have come up with to streamline using the AWS free tier to test and run services.
There are multiple ways of running code in AWS. The most cost effective is probably Lambda functions, if your runtime and memory requirements fit the function-as-a-service paradigm. In my case I wanted something more generic, where I could install and run anything, since I am planning on creating a mini data platform and will be testing multiple services. For that, EC2 seems the best fit: a general-purpose VM in AWS that you can use for anything. The trick is in making it easy to stand up and tear down the VM and its services, so we don't forget about them and get billed.
The goal here is to have an EC2 instance running whatever you want, with the setup as seamless as possible. In my case, for example, I am using it to run a series of pipelines every morning.
I am going to assume some basic knowledge of Terraform and at least an aws-cli setup; there are tons of guides for that. I would also recommend not jumping into this guide directly, or at least keeping in mind that you should learn more about AWS first (create budget alerts if you haven't!).
Infrastructure
As much as possible, we are going to leverage the AWS free tier to keep costs at a minimum. These are the free tier allowances used (as of December 2024):
- 1 year of 750 hours per month of EC2
- Parameter Store standard tier
- EventBridge Scheduler, up to 14,000,000 invocations per month
- Lambda, up to 1,000,000 requests per month
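As a quick back-of-the-envelope check, 750 hours per month is just enough for one instance running around the clock:

```python
# Does 750 free-tier hours cover one instance running 24/7?
hours_in_longest_month = 24 * 31  # 744
free_tier_hours = 750

print(free_tier_hours >= hours_in_longest_month)  # True: one always-on instance fits
print(free_tier_hours - hours_in_longest_month)   # 6 hours of slack, not enough for a second instance
```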
This may look a bit complicated for just running a VM, and you would be right: as mentioned, I am using this to schedule a workflow that runs daily, so if you don't need that you can drop those parts. This setup has the following advantages:
- IaC: easy to set up new services, just change the configuration and apply again
- Scheduled start of the service
- Pessimistic stop of the instance using a scheduled Lambda (in case the instance hangs)
- Configuration in Parameter Store, decoupled from the infrastructure
- Flexibility in the services or code to run (for example, I am downloading a repo as part of the instance setup to run my pipelines)
EC2 initialisation
Now to the main part: creating the EC2 instance. The Terraform code is pretty straightforward. It uses cloud-init to install dependencies and set up the instance, which lets us use a YAML configuration file to define which modules will run, a script, and even a shutdown delay. There are more options to customise EC2 instance initialisation; you can find all the information in the cloud-init documentation.
There are a few parts to customise the behaviour. First, to change what the instance does on startup, modify cloud-init.yml.
The main parts to change here are:
- packages: to set up the dependencies
- write_files: the script written here is the main code that will prepare and run the service
- power_state: to shut down after everything has run, or to set a delay so the instance automatically turns off after some time
The script has a few extra steps aside from starting the service that you can use as an example:
- It sets up a swap file. The reason is that free tier instances are very limited in memory and will hang if they run out. Adding swap is a cheap way to make sure the instance does not hang, at the cost of some performance
- It reads a configuration file from Parameter Store, which decouples the instance from its configuration
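For illustration, here is a minimal sketch of how a pipeline could consume that downloaded .env file (parse_dotenv is a hypothetical helper; in practice you might use a library like python-dotenv instead):

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments.
    (Hypothetical helper, only for illustration.)"""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config

sample = "# pipeline settings\nDB_PATH=/run/workdir/data.db\nLOG_LEVEL=info\n"
print(parse_dotenv(sample))
# {'DB_PATH': '/run/workdir/data.db', 'LOG_LEVEL': 'info'}
```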
The other part to adjust is in the Terraform. This example only creates the policy resource.aws_iam_role_policy.policy_ssm, but if you want the EC2 instance to have access to more services (for example an S3 bucket), you will need to create more policies to grant those permissions.
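For example, a read-only S3 policy could look something like this (a sketch; my-pipeline-bucket is a placeholder name):

```hcl
resource "aws_iam_role_policy" "policy_s3" {
  name   = "policy_s3"
  role   = aws_iam_role.role.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-pipeline-bucket",
        "arn:aws:s3:::my-pipeline-bucket/*"
      ]
    }
  ]
}
EOF
}
```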
Now, given this, here are the 2 main files:
ec2.tf:
//
// IAM role
//
resource "aws_iam_role" "role" {
  name               = "role"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "profile" {
  name = "profile"
  role = aws_iam_role.role.name
}

resource "aws_iam_role_policy" "policy_ssm" {
  name = "policy_ssm"
  role = aws_iam_role.role.id
  // Note: ssm:* is broad; consider scoping it down to the parameters you need
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ssm:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF
}

//
// EC2 Instance
//
data "cloudinit_config" "server_config" {
  gzip          = true
  base64_encode = true
  part {
    content_type = "text/cloud-config"
    content      = file("${path.module}/cloud-init.yml")
  }
}

resource "aws_instance" "ec2_instance" {
  ami                         = "ami-00385a401487aefa4"
  instance_type               = "t2.micro"
  key_name                    = "health_data_load"
  iam_instance_profile        = aws_iam_instance_profile.profile.name
  vpc_security_group_ids      = [aws_security_group.ec2_instance_security_group.id]
  user_data                   = data.cloudinit_config.server_config.rendered
  user_data_replace_on_change = true
  tags = {
    Name = "ec2_instance"
  }
}

//
// Security group
//
resource "aws_security_group" "ec2_instance_security_group" {
  name        = "pbzhdl-sg"
  description = "allow ssh port"
  vpc_id      = "vpc-0eb182d38a23b1b64"
  ingress {
    description = "allow ssh"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    Name = "pbzhdl-sg"
  }
}

//
// Parameter store
//
resource "aws_ssm_parameter" "instance_id" {
  name  = "instance_id"
  type  = "String"
  value = aws_instance.ec2_instance.id
}
cloud-init.yml:
#cloud-config
# The modules that run in the 'final' stage
cloud_final_modules:
  - package-update-upgrade-install
  - write-files-deferred
  - puppet
  - chef
  - mcollective
  - salt-minion
  - reset_rmc
  - refresh_rmc_and_interface
  - rightscale_userdata
  - scripts-vendor
  - scripts-per-once
  - scripts-per-boot
  - scripts-per-instance
  - scripts-user
  - ssh-authkey-fingerprints
  - keys-to-console
  - install-hotplug
  - phone-home
  - final-message
  - [power_state_change, always]
write_files:
  - content: |
      #!/bin/bash
      # Enable swapfile
      dd if=/dev/zero of=/swapfile bs=128M count=16 # 128M * 16 = 2 GB
      chmod 600 /swapfile
      mkswap /swapfile
      swapon /swapfile
      # Set up pipeline environment
      mkdir -p /run/workdir
      # Parameters
      aws ssm get-parameter --name workflow_dotenv --with-decryption --query "Parameter.Value" --output text > /run/workdir/.env
      cd /run/workdir
      # Any other commands to run the service
    path: /var/lib/cloud/scripts/per-boot/myScript.sh
    permissions: "0755"
packages:
  - python3.11
  - python3.11-pip
  - git
power_state:
  delay: "now" # Or a number of minutes, e.g. 30 for 30 minutes
  mode: poweroff
  message: Bye Bye
  timeout: 120
  condition: /bin/true
With these files, after running terraform apply, you should have an instance running that will shut down after running the service. With only this you can start playing, but in the next step we will create Lambdas to start and stop it.
The stop Lambda is important here: since the instances are so small, it is very easy to run out of memory. If that happens the instance will hang and might never stop itself, so the Lambda makes sure that, in the worst case, it is stopped externally.
Lambda for management
The Lambda functions for starting and stopping are standard from the AWS documentation, but they use Parameter Store to pick up the instance id, which is automatically stored there during the Terraform deployment.
You can adjust the schedule trigger in the Terraform. Right now it is set up to start the instance at 10:00 and force stop it at 11:00 if it has not shut down by itself. Here is the code:
stop_lambda.py:
import boto3
import os

region = os.environ["AWS_REGION"]
ssm = boto3.client("ssm", region_name=region)
ec2 = boto3.client("ec2", region_name=region)

def lambda_handler(event, context):
    get_response = ssm.get_parameter(Name="instance_id")
    instances = [get_response["Parameter"]["Value"]]
    ec2.stop_instances(InstanceIds=instances)
    print("stopped your instances: " + str(instances))
start_lambda.py:
import boto3
import os

region = os.environ["AWS_REGION"]
ssm = boto3.client("ssm", region_name=region)
ec2 = boto3.client("ec2", region_name=region)

def lambda_handler(event, context):
    get_response = ssm.get_parameter(Name="instance_id")
    instances = [get_response["Parameter"]["Value"]]
    ec2.start_instances(InstanceIds=instances)
    print("started your instances: " + str(instances))
start_lambda.tf:
//
// Schedule event
//
resource "aws_cloudwatch_event_rule" "start_lambda" {
  name                = "run_start_lambda"
  description         = "Schedule lambda function"
  schedule_expression = "cron(00 10 * * ? *)"
  tags = {
    "app" = "healthdata"
  }
}

resource "aws_cloudwatch_event_target" "start_lambda_target" {
  target_id = "start_lambda_target"
  rule      = aws_cloudwatch_event_rule.start_lambda.name
  arn       = aws_lambda_function.start_lambda.arn
  input     = "{\"ssm_parameter\": \"instance_id\"}"
}

resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.start_lambda.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.start_lambda.arn
}

//
// Lambda
//
data "archive_file" "start_lambda" {
  type        = "zip"
  source_file = "${path.module}/start_lambda.py"
  output_path = "start_lambda_function_payload.zip"
}

resource "aws_lambda_function" "start_lambda" {
  filename         = data.archive_file.start_lambda.output_path
  function_name    = "start_lambda"
  role             = aws_iam_role.iam_for_lambda.arn
  handler          = "start_lambda.lambda_handler"
  source_code_hash = data.archive_file.start_lambda.output_base64sha256
  runtime          = "python3.10"
  environment {
    variables = {
      instance_parameter = "instance_id"
    }
  }
  tags = {
    "app" = "healthdata"
  }
}

//
// IAM role
//
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "iam_for_lambda" {
  name               = "iam_for_lambda"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy" "iam_for_lambda_policy_ssm" {
  name   = "iam_for_lambda_policy_ssm"
  role   = aws_iam_role.iam_for_lambda.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ssm:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "iam_for_lambda_policy_ec2" {
  name   = "iam_for_lambda_policy_ec2"
  role   = aws_iam_role.iam_for_lambda.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Start*",
        "ec2:Stop*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
To keep it simple I have only added the start Lambda Terraform here, but the stop Lambda is exactly the same: just change start to stop in the code and adjust the schedule (or, if you want, add parameters and turn it into a module).
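If you go the parameterised route, one option is a single handler that takes the action from the event. This is only a sketch (make_handler is a hypothetical helper, and the clients are injectable so the logic can be tested without touching AWS):

```python
def make_handler(ssm=None, ec2=None):
    """Build a Lambda handler with injectable clients (handy for testing).
    The event decides the action, e.g. {"action": "start"} or {"action": "stop"}."""
    if ssm is None or ec2 is None:
        import boto3  # imported lazily so tests with fake clients don't need it
        ssm = ssm or boto3.client("ssm")
        ec2 = ec2 or boto3.client("ec2")

    def handler(event, context):
        action = event["action"]
        if action not in ("start", "stop"):
            raise ValueError("unknown action: " + action)
        # The instance id is stored in Parameter Store by the Terraform deploy
        instance_id = ssm.get_parameter(Name="instance_id")["Parameter"]["Value"]
        if action == "start":
            ec2.start_instances(InstanceIds=[instance_id])
        else:
            ec2.stop_instances(InstanceIds=[instance_id])
        return {"action": action, "instances": [instance_id]}

    return handler
```

The EventBridge target's input JSON can then carry the action, so one Lambda serves both schedules.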
To wrap up
This is a very technical post with a lot of code. In the future I will add a repository for these examples. I am using this code to run my own mini orchestration daily for my personal data platform and, after a few initial hiccups, it is running smoothly now. Hopefully it will be useful for anyone wanting to get started with AWS. I am running an orchestration, but you can use this to try any kind of service you can install and run on an EC2 instance.
Finally, I am not an expert in AWS, so there might be some best practices or other AWS details missing; please comment about any gap you find! This is a work in progress for me :)