Zak's Notes

SRE Field Notes

Prometheus K8s

Here’s a tutorial on how to set up Prometheus auto-discovery for pods with the label app=beta.

Deploy Prometheus: To get started, you’ll need to deploy Prometheus in your Kubernetes cluster. There are several ways to do this, but one common approach is to use the Prometheus Operator, a tool that automates the deployment and management of Prometheus instances. If you don’t have the Prometheus Operator installed, you can find installation instructions in the official documentation.
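Once the Operator is running, the discovery rule itself is typically expressed as a PodMonitor that selects the labelled pods. Since the EC2 example later in these notes uses Terraform, here is a minimal sketch of that object created through the kubernetes provider's kubernetes_manifest resource; the monitoring namespace and the metrics port name are assumptions, so adjust them to your cluster.

# Sketch: a PodMonitor that tells the Prometheus Operator to scrape
# every pod labelled app=beta. Namespace and port name are assumptions.
resource "kubernetes_manifest" "beta_pod_monitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PodMonitor"
    metadata = {
      name      = "beta-pods"
      namespace = "monitoring" # assumed namespace
    }
    spec = {
      selector = {
        matchLabels = {
          app = "beta"
        }
      }
      podMetricsEndpoints = [
        {
          port = "metrics" # assumed name of the pods' metrics port
        }
      ]
    }
  }
}

For the scrape to actually happen, the Prometheus custom resource also has to select this PodMonitor through its podMonitorSelector.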

Building Event Driven Infrastructure with EventBridge

Purpose

EventBridge used to be called CloudWatch Events. As I understand it, custom events do cost money, but AWS service events are free as long as they stay within the same account. Sometimes when you build applications you end up requesting resources that you never use. If your use case is that you want to trigger something only when an event occurs in your system, then AWS EventBridge might be for you. It is essentially an event bus that captures all events in your AWS account.
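To make the "trigger something only when an event occurs" idea concrete, here is a minimal Terraform sketch of an EventBridge rule that matches EC2 instance state-change events and forwards them to a target; the Lambda ARN is a placeholder, and the EC2 event is just one example of a free AWS service event.

# Sketch: an EventBridge (CloudWatch Events) rule plus a target.
resource "aws_cloudwatch_event_rule" "ec2_state_change" {
  name        = "ec2-state-change"
  description = "Fires whenever an EC2 instance changes state"

  event_pattern = jsonencode({
    "source"      = ["aws.ec2"]
    "detail-type" = ["EC2 Instance State-change Notification"]
  })
}

resource "aws_cloudwatch_event_target" "notify" {
  rule = aws_cloudwatch_event_rule.ec2_state_change.name
  arn  = "arn:aws:lambda:us-west-2:123456789012:function:notify-on-state-change" # placeholder ARN
}

If the target is a Lambda function, it also needs an aws_lambda_permission that allows events.amazonaws.com to invoke it.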

DNS 101

Introduction to DNS

DNS (the Domain Name System) is the backbone of the internet: it converts human-readable domain names into machine-readable IP addresses so that internet users can reach websites and other online resources by typing in domain names instead of IP addresses. Here is a tutorial on how DNS works and how it can be used to configure domain names and websites.
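As a small illustration of that name-to-address mapping, here is what a single record looks like when the zone is managed with Terraform and Route 53; the hosted zone ID, domain, and IP address are placeholders.

# Sketch: an A record mapping a hostname to an IPv4 address.
resource "aws_route53_record" "www" {
  zone_id = "Z0123456789ABCDEFGHIJ" # placeholder hosted zone ID
  name    = "www.example.com"       # the human-readable name
  type    = "A"
  ttl     = 300
  records = ["203.0.113.10"]        # the machine-readable address it resolves to
}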

Knative Introduction

Introduction

The Knative project is very interesting to me because I work on an always-on ML service called the log anomaly detector. One idea that came to mind is to perform the machine-learning encoding on logs as they come in, have the service scale to zero when no data is streamed in, and have it scale back up as demand increases. If I were running something like this in the cloud, where I’m charged by the minute, I’d use serverless as a way to cut costs.
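For a sense of what that would look like, here is a minimal Terraform sketch of a Knative Service with scale-to-zero enabled through the Knative autoscaling annotations, again via kubernetes_manifest; the container image and namespace are placeholders rather than the real log anomaly detector deployment.

# Sketch: a Knative Service that scales to zero when idle and back up with demand.
resource "kubernetes_manifest" "log_anomaly_detector" {
  manifest = {
    apiVersion = "serving.knative.dev/v1"
    kind       = "Service"
    metadata = {
      name      = "log-anomaly-detector"
      namespace = "default" # placeholder namespace
    }
    spec = {
      template = {
        metadata = {
          annotations = {
            "autoscaling.knative.dev/min-scale" = "0"  # allow scale to zero when no logs stream in
            "autoscaling.knative.dev/max-scale" = "10" # cap replicas as demand increases
          }
        }
        spec = {
          containers = [
            {
              image = "quay.io/example/log-anomaly-detector:latest" # placeholder image
            }
          ]
        }
      }
    }
  }
}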

EC2 Autoscaling on CPU

Here’s an example Terraform configuration for automatically scaling an EC2 cluster based on CPU load:

provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_autoscaling_group" "example" {
  name_prefix         = "example-asg-"
  vpc_zone_identifier = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
  min_size            = 1
  max_size            = 5

  # launch_template is a nested block, not a map argument.
  launch_template {
    id      = aws_launch_template.example.id
    version = "$Latest"
  }

  # Publish group metrics at one-minute granularity.
  metrics_granularity = "1Minute"
  enabled_metrics     = ["GroupDesiredCapacity", "GroupInServiceInstances"]

  tag {
    key                 = "Name"
    value               = "example-asg"
    propagate_at_launch = true
  }
}

# Scaling policies are separate resources, not nested blocks of the group.
# A single target-tracking policy keeps average CPU near the target value,
# scaling out above it and back in below it, so it replaces the separate
# scale-up (80%) and scale-down (40%) thresholds.
resource "aws_autoscaling_policy" "cpu_target_tracking" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.example.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60
  }
}

resource "aws_launch_template" "example" {
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # placeholder; the original snippet is truncated at this resource
}