Service Catalog Version 0.90.0 (last updated in version 0.87.0)

Amazon EKS Core Services

Overview

This service contains Terraform and Helm code to deploy core administrative services, such as Fluent Bit and the ALB Ingress Controller, onto Elastic Kubernetes Service (EKS).

EKS Core Services architecture

Features

Deploy a Fluent Bit DaemonSet to ship container logs to CloudWatch Logs
Deploy the ALB Ingress Controller to configure ALBs from within Kubernetes
Deploy external-dns to manage Route 53 DNS records from within Kubernetes
Deploy the Kubernetes cluster-autoscaler to scale ASGs based on Pod demand
Deploy the AWS CloudWatch Agent to collect container- and node-level metrics from worker nodes

Learn

Note: This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!

Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-eks repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.

Core concepts

For information on each of the core services deployed by this service, see the documentation in the terraform-aws-eks repo.

Repo organization

modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
examples: This folder contains working examples of how to use the submodules.
test: Automated tests for the modules and examples.

Deploy

Non-production deployment (quick start for learning)

If you just want to try this repo out for experimenting and learning, check out the following resources:

examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).

Production deployment

If you want to deploy this repo in production, check out the following resources:

examples/for-production folder: The examples/for-production folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.
How to deploy a production-grade Kubernetes cluster on AWS: A step-by-step guide for deploying a production-grade EKS cluster on AWS using the code in this repo.

Reference

Inputs
Outputs

Required

aws_region (string, required)

The AWS region in which all resources will be created

eks_cluster_name (string, required)

The name of the EKS cluster into which the core services will be deployed.

eks_iam_role_for_service_accounts_config (object(…), required)

Configuration for using the IAM Roles for Service Accounts feature to grant permissions to the applications. This expects a map with two properties: openid_connect_provider_arn and openid_connect_provider_url. The openid_connect_provider_arn is the ARN of the OpenID Connect Provider that EKS uses to retrieve IAM credentials, while openid_connect_provider_url is that provider's URL. Set to null if you do not wish to use IAM Roles for Service Accounts.

Type Details

object({
    openid_connect_provider_arn = string
    openid_connect_provider_url = string
  })

pod_execution_iam_role_arn (string, required)

ARN of the IAM Role to use as the Pod execution role for Fargate. Required if any of the services are scheduled on Fargate; set to null if none of the Pods are scheduled on Fargate.

vpc_id (string, required)

The ID of the VPC where the EKS cluster is deployed.

worker_vpc_subnet_ids (list(string), required)

The subnet IDs to use for EKS worker nodes. Used when provisioning Pods onto Fargate. Required if any of the services are scheduled on Fargate; set to an empty list if none of the Pods are scheduled on Fargate.
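A minimal sketch of calling this service as a Terraform module with only the required inputs set. The source path and ref are assumptions based on the Service Catalog repo name and the version at the top of this page; the region, cluster name, VPC ID, and the two OIDC variables are hypothetical placeholders. Because nothing is scheduled on Fargate here, the Fargate inputs are null/empty, and enable_fargate_fluent_bit is turned off since it defaults to true and otherwise expects fargate_fluent_bit_execution_iam_role_arns.

```hcl
# Hypothetical inputs holding the OIDC provider details of your EKS cluster.
variable "openid_connect_provider_arn" {
  type = string
}

variable "openid_connect_provider_url" {
  type = string
}

module "eks_core_services" {
  # Assumed module path and ref; confirm against the Service Catalog repo.
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/eks-core-services?ref=v0.90.0"

  aws_region       = "us-east-1"
  eks_cluster_name = "example-eks-cluster"   # hypothetical
  vpc_id           = "vpc-0123456789abcdef0" # hypothetical

  eks_iam_role_for_service_accounts_config = {
    openid_connect_provider_arn = var.openid_connect_provider_arn
    openid_connect_provider_url = var.openid_connect_provider_url
  }

  # No Pods are scheduled on Fargate in this sketch.
  pod_execution_iam_role_arn = null
  worker_vpc_subnet_ids      = []
  enable_fargate_fluent_bit  = false
}
```

If you deploy this service as a root module (for example, with Terragrunt), the same values would go in a tfvars file or inputs block instead of a module block.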

Optional

alb_ingress_controller_chart_version (string, optional)

The version of the aws-load-balancer-controller Helm chart to use.

Default: "1.4.1"

alb_ingress_controller_docker_image_repo (string, optional)

The repository of the aws-load-balancer-controller Docker image that should be deployed.

Default: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller"

alb_ingress_controller_docker_image_tag (string, optional)

The tag of the aws-load-balancer-controller Docker image that should be deployed.

Default: "v2.4.1"

alb_ingress_controller_pod_node_affinity (list(object(…)), optional)

Configure affinity rules for the ALB Ingress Controller Pod to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.

Type Details

list(object({
    key      = string
    values   = list(string)
    operator = string
  }))

Default: []
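As an illustration of the key/values/operator shape, the following terraform.tfvars fragment restricts the ALB Ingress Controller to nodes of one managed node group. eks.amazonaws.com/nodegroup is a label EKS applies to managed node group nodes; the node group name is hypothetical. The same shape applies to the other *_pod_node_affinity inputs on this page.

```hcl
alb_ingress_controller_pod_node_affinity = [
  {
    # Each item becomes one matchExpressions entry; all entries must match.
    key      = "eks.amazonaws.com/nodegroup"
    operator = "In"
    values   = ["core-services"] # hypothetical node group name
  },
]
```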

alb_ingress_controller_pod_tolerations (list(map(…)), optional)

Configure toleration rules to allow the ALB Ingress Controller Pod to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.

Type Details

list(map(any))

Default: []
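A sketch of a toleration entry, written as a terraform.tfvars fragment. It assumes each map is passed through as a standard Kubernetes toleration (key, operator, value, effect); the taint itself (dedicated=core-services:NoSchedule) is hypothetical. The same shape should work for the other *_pod_tolerations inputs.

```hcl
# Tolerates a taint created with, for example:
#   kubectl taint nodes <node-name> dedicated=core-services:NoSchedule
alb_ingress_controller_pod_tolerations = [
  {
    key      = "dedicated"
    operator = "Equal"
    value    = "core-services"
    effect   = "NoSchedule"
  },
]
```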

autoscaler_down_delay_after_add (string, optional)

Minimum time to wait after a scale-up event before any node is considered for scale-down.

Default: "10m"

autoscaler_log_level_verbosity (number, optional)

Log level verbosity for the cluster autoscaler. Lower numbers are less verbose; higher numbers are more verbose.

Default: 4

autoscaler_scale_down_unneeded_time (string, optional)

Minimum time a node must be unused before the autoscaler considers it for scale-down.

Default: "10m"

autoscaler_skip_nodes_with_local_storage (bool, optional)

If true, the cluster autoscaler will never delete nodes that run Pods with local storage (e.g., EmptyDir or HostPath volumes).

Default: true

aws_cloudwatch_agent_image_repository (string, optional)

The container repository to use for looking up the cloudwatch-agent container image when deploying the Pods. When null, uses the default repository set in the chart. Only applies to non-Fargate workers.

Default: null

aws_cloudwatch_agent_pod_node_affinity (list(object(…)), optional)

Configure affinity rules for the AWS CloudWatch Agent Pod to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.

Type Details

list(object({
    key      = string
    values   = list(string)
    operator = string
  }))

Default: []

aws_cloudwatch_agent_pod_resources (any, optional)

Pod resource requests and limits to use. Refer to https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ for more information.

Type Details

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.

Default: null

aws_cloudwatch_agent_pod_tolerations (list(map(…)), optional)

Configure toleration rules to allow the AWS CloudWatch Agent Pods to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.

Type Details

list(map(any))

Default: []

aws_cloudwatch_agent_version (string, optional)

Which version of amazon/cloudwatch-agent to install. When null, uses the default version set in the chart. Only applies to non-Fargate workers.

Default: null

cluster_autoscaler_pod_annotations (map(string), optional)

Annotations to apply to the cluster autoscaler Pod(s), as key-value pairs.

Default: {}

cluster_autoscaler_pod_labels (map(string), optional)

Labels to apply to the cluster autoscaler Pod(s), as key-value pairs.

Default: {}

cluster_autoscaler_pod_node_affinity (list(object(…)), optional)

Configure affinity rules for the cluster-autoscaler Pod to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.

Type Details

list(object({
    key      = string
    values   = list(string)
    operator = string
  }))

Default: []

cluster_autoscaler_pod_resources (any, optional)

Pod resource requests and limits to use. Refer to https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ for more information. This is most useful for configuring CPU and memory availability for Fargate, which defaults to 0.25 vCPU and 256MB RAM.

Type Details

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.

Default

{
  limits = {
    cpu = "250m",
    memory = "1024Mi"
  },
  requests = {
    cpu = "250m",
    memory = "1024Mi"
  }
}
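A terraform.tfvars sketch that overrides the default with more CPU, for example when the autoscaler runs on Fargate. The numbers are illustrative, not a recommendation. Keeping requests equal to limits for both CPU and memory gives the Pod the Guaranteed QoS class in Kubernetes.

```hcl
cluster_autoscaler_pod_resources = {
  limits = {
    cpu    = "500m"
    memory = "1024Mi"
  }
  requests = {
    cpu    = "500m"
    memory = "1024Mi"
  }
}
```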

cluster_autoscaler_pod_tolerations (list(map(…)), optional)

Configure toleration rules to allow the cluster-autoscaler Pod to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.

Type Details

list(map(any))

Default: []

cluster_autoscaler_release_name (string, optional)

The name to use for the Helm release for cluster-autoscaler. Changing this value is useful for forcing a redeployment of the cluster-autoscaler component.

Default: "cluster-autoscaler"

cluster_autoscaler_repository (string, optional)

The Docker repository from which to install the cluster autoscaler. See https://github.com/kubernetes/autoscaler/releases for valid repositories.

Default: "us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler"

cluster_autoscaler_scaling_strategy (string, optional)

Specifies an 'expander' for the cluster autoscaler, which determines which ASG to scale when additional resource capacity is needed.

Default: "least-waste"

cluster_autoscaler_version (string, optional)

Which version of the cluster autoscaler to install. This should match the major/minor version (e.g., v1.20) of your Kubernetes installation. See https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#releases for a list of versions.

Default: "v1.22.2"

enable_alb_ingress_controller (bool, optional)

Whether or not to enable the AWS Load Balancer Controller (formerly the ALB Ingress Controller).

Default: true

enable_aws_cloudwatch_agent (bool, optional)

Whether to enable the AWS CloudWatch Agent DaemonSet for collecting container and node metrics from worker nodes (self-managed ASGs or managed node groups).

Default: true

enable_cluster_autoscaler (bool, optional)

Whether or not to enable cluster-autoscaler for autoscaling EKS worker nodes.

Default: true

enable_external_dns (bool, optional)

Whether or not to enable external-dns for DNS entry syncing with Route 53 for Services and Ingresses.

Default: true

enable_fargate_fluent_bit (bool, optional)

Whether or not to enable fluent-bit on EKS Fargate workers for log aggregation.

Default: true

enable_fluent_bit (bool, optional)

Whether or not to enable fluent-bit for log aggregation.

Default: true

external_dns_batch_change_interval (string, optional)

Duration string (e.g., 1m) indicating the interval between changes that external-dns makes to Route 53. When null, uses the default defined in the chart (1s).

Default: null

external_dns_batch_change_size (number, optional)

The maximum number of changes that external-dns applies in a single batch. When null, uses the default defined in the chart (1000).

Default: null

external_dns_chart_version (string, optional)

The version of the external-dns Helm chart to use. Note that this is different from the app/container version.

Default: "6.2.4"

external_dns_pod_node_affinity (list(object(…)), optional)

Configure affinity rules for the external-dns Pod to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.

Type Details

list(object({
    key      = string
    values   = list(string)
    operator = string
  }))

Default: []

external_dns_pod_tolerations (list(map(…)), optional)

Configure toleration rules to allow the external-dns Pod to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.

Type Details

list(map(any))

Default: []

external_dns_poll_interval (string, optional)

Duration string (e.g., 1m) indicating the polling interval external-dns uses to sync domains. When null, uses the default defined in the chart (1m).

Default: null

external_dns_route53_hosted_zone_domain_filters (list(string), optional)

Only create records in hosted zones that match the provided domain names. An empty list (the default) matches all zones. Zones must satisfy all three constraints (external_dns_route53_hosted_zone_tag_filters, external_dns_route53_hosted_zone_id_filters, and external_dns_route53_hosted_zone_domain_filters).

Default: []

external_dns_route53_hosted_zone_id_filters (list(string), optional)

Only create records in hosted zones that match the provided IDs. An empty list (the default) matches all zones. Zones must satisfy all three constraints (external_dns_route53_hosted_zone_tag_filters, external_dns_route53_hosted_zone_id_filters, and external_dns_route53_hosted_zone_domain_filters).

Default: []

external_dns_route53_hosted_zone_tag_filters (list(object(…)), optional)

Only create records in hosted zones that match the provided tags. Each item in the list should specify a tag key and tag value as a map. An empty list (the default) matches all zones. Zones must satisfy all three constraints (external_dns_route53_hosted_zone_tag_filters, external_dns_route53_hosted_zone_id_filters, and external_dns_route53_hosted_zone_domain_filters).

Type Details

list(object({
    key   = string
    value = string
  }))

Default: []
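A terraform.tfvars sketch combining two of the three hosted zone filters. The domain and tag are hypothetical. Because a zone must satisfy every filter, external-dns would only manage zones for example.com that also carry the managed-by=external-dns tag; leaving the ID filter empty means it matches all zones and adds no constraint.

```hcl
external_dns_route53_hosted_zone_domain_filters = ["example.com"]

external_dns_route53_hosted_zone_tag_filters = [
  {
    key   = "managed-by"
    value = "external-dns"
  },
]

# Empty list: no restriction by hosted zone ID.
external_dns_route53_hosted_zone_id_filters = []
```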

external_dns_route53_zones_cache_duration (string, optional)

Duration string (e.g., 1m) indicating how long external-dns caches the hosted zones. When null, uses the default defined in the chart (0, meaning no caching).

Default: null

external_dns_sources (list(string), optional)

Kubernetes resource types that external-dns observes for new DNS entries.

Default

[
  "ingress",
  "service"
]

external_dns_trigger_loop_on_event (bool, optional)

When enabled, external-dns also triggers its run loop on create/update/delete events, in addition to the regular interval.

Default: false

fargate_fluent_bit_execution_iam_role_arns (list(string), optional)

List of ARNs of Fargate execution IAM Roles that should get permissions to ship logs using fluent-bit. This must be provided if enable_fargate_fluent_bit is true.

Default: []

fargate_fluent_bit_extra_filters (string, optional)

Additional filters that fluent-bit should apply to log output. This string should be formatted according to the Fluent Bit docs (https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file#config_filter).

Default: ""
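A terraform.tfvars sketch of an extra filter in the Fluent Bit classic configuration format. It uses the standard grep filter to drop records whose log field matches /healthz (a hypothetical health-check path). It assumes the module passes this string into the Fargate logging configuration unchanged, and that the filter type is one EKS Fargate accepts (grep is among the supported filters as far as we know).

```hcl
# <<- strips the common leading indentation from each line of the heredoc.
fargate_fluent_bit_extra_filters = <<-EOF
  [FILTER]
      Name    grep
      Match   *
      Exclude log /healthz
EOF
```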

fargate_fluent_bit_extra_parsers (string, optional)

Additional parsers that fluent-bit should use to parse log output. This string should be formatted according to the Fluent Bit docs (https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file#config_output).

Default: ""

fargate_fluent_bit_log_stream_prefix (string, optional)

Prefix string to use for the CloudWatch Log Stream that gets created for each Fargate Pod.

Default: "fargate"

fargate_worker_disallowed_availability_zones (list(string), optional)

A list of availability zones in the region that CANNOT be used to deploy the EKS Fargate workers. Use this to avoid availability zones that may not be able to provision the resources (e.g., because they have run out of capacity). If empty, all availability zones are allowed.

Default

[
  "us-east-1d",
  "us-east-1e",
  "ca-central-1d"
]

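fluent_bit_extra_filters (string, optional)

Additional filters that fluent-bit should apply to log output. This string should be formatted according to the Fluent Bit docs (https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file#config_filter).

Default: ""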

fluent_bit_extra_outputs (string, optional)

Additional output streams that fluent-bit should export logs to. This string should be formatted according to the Fluent Bit docs (https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/configuration-file#config_output).

Default: ""
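A terraform.tfvars sketch of an extra output that also sends logs to an Amazon Kinesis Data Firehose delivery stream, using Fluent Bit's kinesis_firehose output plugin. The region and stream name are hypothetical. The fluent-bit Pods would also need IAM permissions to write to that stream; this sketch does not set those up, and we have not confirmed whether this module grants them.

```hcl
fluent_bit_extra_outputs = <<-EOF
  [OUTPUT]
      Name            kinesis_firehose
      Match           *
      region          us-east-1
      delivery_stream example-log-stream
EOF
```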

fluent_bit_image_repository (string, optional)

The container repository to use for looking up the aws-for-fluent-bit container image when deploying the Pods. When null, uses the default repository set in the chart. Only applies to non-Fargate workers.

Default: null

fluent_bit_log_group_already_exists (bool, optional)

Set to true if the CloudWatch Log Group that fluent-bit uses for streaming logs already exists and does not need to be created.

Default: false

fluent_bit_log_group_kms_key_id (string, optional)

The ARN of the KMS key to use to encrypt the logs in the CloudWatch Log Group used for storing container logs streamed with fluent-bit. Set to null to disable encryption.

Default: null

fluent_bit_log_group_name (string, optional)

Name of the CloudWatch Log Group that fluent-bit should stream logs to. When null (default), uses eks_cluster_name as the Log Group name.

Default: null

fluent_bit_log_group_retention (number, optional)

Number of days to retain log events. Possible values are: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653, and 0. Select 0 to never expire.

Default: 0
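If you wrap this service in your own Terraform configuration, an invalid retention value could be caught at plan time with a validation block. This is a sketch of a wrapper variable of our own, not this module's actual definition; the allowed list simply mirrors the values documented above.

```hcl
variable "fluent_bit_log_group_retention" {
  description = "Number of days to retain log events (0 = never expire)."
  type        = number
  default     = 0

  validation {
    # Reject any value not in the documented list of retention periods.
    condition = contains(
      [0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653],
      var.fluent_bit_log_group_retention
    )
    error_message = "The fluent_bit_log_group_retention value must be one of the documented retention periods."
  }
}
```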

fluent_bit_log_group_subscription_arn (string, optional)

ARN of the Lambda function to trigger when events arrive at the fluent-bit log group.

Default: null

fluent_bit_log_group_subscription_filter (string, optional)

Filter pattern for the CloudWatch subscription. Only used if fluent_bit_log_group_subscription_arn is set.

Default: ""
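A terraform.tfvars sketch that forwards only matching log events to a Lambda function. The function ARN is hypothetical. In CloudWatch Logs filter pattern syntax, a single unquoted term such as ERROR matches log events that contain that term. CloudWatch Logs also needs permission to invoke the function; we have not confirmed whether this module creates that permission.

```hcl
fluent_bit_log_group_subscription_arn    = "arn:aws:lambda:us-east-1:111122223333:function:example-log-processor"
fluent_bit_log_group_subscription_filter = "ERROR"
```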

fluent_bit_log_stream_prefix (string, optional)

Prefix string to use for the CloudWatch Log Stream that gets created for each Pod. When null (default), the prefix is set to 'fluentbit'.

Default: null

fluent_bit_pod_node_affinity (list(object(…)), optional)

Configure affinity rules for the fluent-bit Pods to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.

Type Details

list(object({
    key      = string
    values   = list(string)
    operator = string
  }))

Default: []

fluent_bit_pod_tolerations (list(map(…)), optional)

Configure toleration rules to allow the fluent-bit Pods to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.

Type Details

list(map(any))

Default: []

fluent_bit_version (string, optional)

Which version of aws-for-fluent-bit to install. When null, uses the default version set in the chart. Only applies to non-Fargate workers.

Default: null

kubernetes_priority_classes (map(object(…)), optional)

A map of PriorityClass configurations, with the key as the PriorityClass name. https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass

Type Details

map(object({
    description    = string
    global_default = bool
    value          = number
  }))

Default: {}
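A terraform.tfvars sketch defining one PriorityClass; per the description above, the map key becomes the PriorityClass name. The name and value are hypothetical. Kubernetes reserves values above one billion for system classes, and at most one PriorityClass should set global_default to true. Pods opt in by setting priorityClassName to the created name.

```hcl
kubernetes_priority_classes = {
  "core-services-critical" = {
    description    = "Priority for core cluster services."
    global_default = false
    value          = 1000000
  }
}
```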

route53_record_update_policy (string, optional)

Policy for how DNS records are synchronized between sources and providers (options: sync, upsert-only).

Default: "sync"

schedule_alb_ingress_controller_on_fargate (bool, optional)

When true, the ALB Ingress Controller Pods will be scheduled on Fargate.

Default: false

schedule_cluster_autoscaler_on_fargate (bool, optional)

When true, the cluster autoscaler Pods will be scheduled on Fargate. Running the cluster autoscaler on Fargate is recommended, because it prevents the autoscaler from scaling down the node it is running on (and thus shutting itself down during a scale-down event). However, since Fargate is only supported in a subset of regions, this defaults to false.

Default: false
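A terraform.tfvars sketch of moving the cluster autoscaler onto Fargate. Per the required-inputs section, scheduling anything on Fargate means pod_execution_iam_role_arn and worker_vpc_subnet_ids must be set. The role ARN and subnet IDs are hypothetical; EKS Fargate only supports private subnets for Pods.

```hcl
schedule_cluster_autoscaler_on_fargate = true

# Required once any service runs on Fargate.
pod_execution_iam_role_arn = "arn:aws:iam::111122223333:role/example-fargate-pod-execution-role"
worker_vpc_subnet_ids      = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
```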

schedule_external_dns_on_fargate (bool, optional)

When true, the external-dns Pods will be scheduled on Fargate.

Default: false

service_dns_mappings (map(object(…)), optional)

Configure Kubernetes Services that look up external DNS records. This can be useful for binding friendly internal Service names to external domains (e.g., an RDS database endpoint).

Type Details

map(object({
    # DNS record to route requests to the Kubernetes Service to.
    target_dns = string

    # Port to route requests
    target_port = number

    # Namespace to create the underlying Kubernetes Service in.
    namespace = string
  }))

Default: {}
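A terraform.tfvars sketch mapping a friendly Service name to an RDS endpoint. The endpoint, port, and namespace are hypothetical. We assume the map key becomes the Service name and that the module creates a Service (likely of type ExternalName) pointing at target_dns; check the module source before relying on that behavior.

```hcl
service_dns_mappings = {
  "database" = {
    target_dns  = "example-db.abcdefghijkl.us-east-1.rds.amazonaws.com"
    target_port = 5432
    namespace   = "default"
  }
}
```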

use_exec_plugin_for_auth (bool, optional)

If set to true, use an exec-based plugin to authenticate and fetch tokens for EKS. This is useful because EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy', and since the native Kubernetes provider in Terraform has no way to fetch up-to-date tokens, we recommend using an exec-based provider as a workaround. Use the use_kubergrunt_to_fetch_token input variable to control whether kubergrunt or aws is used to fetch tokens.

Default: true

use_kubergrunt_to_fetch_token (bool, optional)

EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy'. To avoid this issue, we use an exec-based plugin to fetch an up-to-date token. If this variable is set to true, we'll use kubergrunt to fetch the token (in which case, kubergrunt must be installed and on PATH); if this variable is set to false, we'll use the aws CLI to fetch the token (in which case, aws must be installed and on PATH). Note that this functionality is only enabled if use_exec_plugin_for_auth is set to true.

Default: true
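To illustrate the exec-based approach, here is a sketch of a Terraform Kubernetes provider configured to fetch a fresh token through the aws CLI (the use_kubergrunt_to_fetch_token = false case). This shows the provider's standard exec block, not necessarily this module's internal code; the kubergrunt variant is omitted rather than guessing its flags. The api_version value is an assumption that matches recent AWS CLI releases.

```hcl
variable "eks_cluster_name" {
  type = string
}

data "aws_eks_cluster" "cluster" {
  name = var.eks_cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Run the command on each provider use, so an expired token is never reused.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
  }
}
```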

use_managed_iam_policies (bool, optional)

When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.

Default: true

Outputs

container_logs_cloudwatch_log_group_name

Name of the CloudWatch Log Group used to store the container logs.

kubernetes_priority_class_names

A list of names of Kubernetes PriorityClass objects created by this module.
