Job Type: Full Time
Job Category: IT
Job Description
Role: SRE
Location: Chicago, IL (onsite)
Employment: FTE
Visa: USC, GC (U.S. citizen or green card holder)
Experience: 12+ years
We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.
Qualifications
- 5+ years of experience in SRE, DevOps, or Cloud Engineering
- Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
- Hands-on experience with Terraform, Ansible, or other IaC tools
- Strong scripting/coding skills (Python, Go, Shell, etc.)
- Experience with Kubernetes, containerization, and orchestration
- Deep knowledge of Linux systems and networking
- Experience with service meshes (e.g., Istio, AWS App Mesh)
- Familiarity with the AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures
Roles & Responsibilities
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using tools such as Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools such as Dynatrace or Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the organization
Required Skills
Site Reliability Engineering (Site Reliability/Resiliency)