Job Type: Full Time
Job Category: IT

Job Description

SRE / Observability Engineer

Toronto, ON - Onsite

Total Experience: 8-10 years
Required Skill Set:
• We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms.
• This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.
• You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Observability Implementation
• Implement and maintain metrics, logs, and traces for applications and infrastructure
• Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
• Configure dashboards, alerts, and basic anomaly detection
Application Support and Instrumentation
• Work with development teams to enable structured logging, basic distributed tracing, and core metrics
• Validate observability requirements during Production Readiness Reviews (PRR)
• Troubleshoot missing or low-quality telemetry
Monitoring and Alerting
• Configure alerts based on golden signals (latency, errors, traffic, saturation)
• Help reduce alert noise by tuning thresholds and alert logic
• Support incident response by gathering logs, metrics, and traces
Operations and Reliability
• Support root cause analysis using observability tools
• Maintain dashboards and documentation used by on-call and support teams
• Participate in on-call rotations (as applicable)
Automation and Continuous Improvement
• Assist in automating observability onboarding and validation tasks
• Create and maintain reusable dashboards and alert templates
• Follow established observability standards and best practices
Required Qualifications
• 2-4 years of experience in observability or SRE
• Working knowledge of metrics, logs, and basic tracing concepts
• Hands-on experience with at least one observability platform (Dynatrace, Elastic/ELK, Datadog, New Relic, etc.)
• Basic understanding of SLIs, SLOs, and service health indicators
• Experience with cloud platforms or hybrid environments
• Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
Preferred Qualifications
• Experience with OpenTelemetry or APM agents
• Familiarity with Kubernetes or containerized workloads
• Experience working with incident management tools (PagerDuty, ServiceNow)
• Exposure to Dynatrace, Kibana, ELK, or similar cloud-native monitoring tools
• Experience in regulated or enterprise environments

Required Skills
Technical Project Manager
