Job Type: Full Time
Job Category: IT

Job Description

AI SRE / AI Ops Engineer

Montreal, QC - Hybrid

Skills Required:
• Production experience in SRE, infrastructure, or operations for large-scale systems
• Strong programming/scripting skills (Python, Go, Java, or equivalent)
• Deep experience with containerization (Docker) and orchestration (Kubernetes, etc.)
• Experience with infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
• Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
• Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
• Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
• Solid experience in capacity planning, performance tuning, scaling, and incident response
• Demonstrated ability to lead root-cause analyses (RCAs), deploy fixes, and drive reliability improvements
• Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
• Excellent communication, documentation, and cross-team collaboration skills
• Proven track record of reducing operational toil via automation
