Job Type: Full Time
Job Category: IT
Job Description
Job Title: Site Reliability Engineer
Location: Chicago, IL
FTE Only
We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.
Must Have Technical/Functional Skills
- 5+ years of experience in SRE, DevOps, or Cloud Engineering
- Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
- Hands-on experience with Terraform, Ansible, or other IaC tools
- Strong scripting/coding skills (Python, Go, Shell, etc.)
- Experience with Kubernetes, containerization, and orchestration
- Deep knowledge of Linux systems and networking
- Experience with Service Meshes (e.g., Istio, App Mesh)
- Familiarity with AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures
Roles & Responsibilities
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace or Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the org
Required Skills
- DevOps Engineer
- Senior Email Security Engineer