Job Type: Contract
Job Category: IT

Job Description

Job Title: SRE Lead – Banking Domain (Wealth Management Preferred)

Location: Toronto Downtown, ON (Onsite – 5 Days/Week)
Experience: 10+ Years


 

About the Role:

We are looking for a highly skilled Site Reliability Engineering (SRE) Lead with a strong background in the Banking domain, ideally within Wealth Management. The ideal candidate will lead the SRE function to ensure system reliability, scalability, and performance across mission-critical financial applications. This role combines hands-on technical work with leadership responsibilities to drive service excellence and operational efficiency.

 

Key Responsibilities:

·         Lead and mentor a team of SREs responsible for production stability, reliability, and availability of banking and wealth management systems.

·         Design and implement monitoring, alerting, and incident response strategies to proactively manage system health.

·         Collaborate with development and infrastructure teams to drive DevOps and automation initiatives, ensuring smooth CI/CD pipelines.

·         Define and implement SLIs, SLOs, and SLAs to measure and improve service performance.

·         Drive incident management, root cause analysis (RCA), and problem resolution to minimize downtime and business impact.

·         Lead capacity planning, performance tuning, and disaster recovery strategies.

·         Drive observability and resilience engineering best practices across all platforms.

·         Work closely with stakeholders in banking and wealth management domains to align reliability goals with business needs.

·         Establish governance processes and ensure compliance with financial regulatory and security standards.

·         Develop dashboards and reporting metrics to provide visibility into system performance and reliability.

·         Champion a culture of continuous improvement, automation, and a reliability-first mindset.

 

Required Skills & Experience:

·         10+ years of total IT experience, with at least 4 years in Site Reliability Engineering or Production Operations leadership roles.

·         Strong domain experience in Banking, with exposure to Wealth Management systems (highly desirable).

·         Expertise in Linux/Unix administration, networking, and cloud infrastructure (AWS, Azure, or GCP).

·         Strong scripting and automation experience (Python, Shell, or similar).

·         Proficiency in monitoring and observability tools such as Prometheus, Grafana, Splunk, ELK, AppDynamics, or Dynatrace.

·         Experience with CI/CD pipelines, Git, Jenkins, Ansible, Terraform, or equivalent tools.

·         In-depth understanding of incident, problem, and change management based on ITIL principles.

·         Proven track record in managing production systems supporting large-scale, high-availability financial applications.

·         Excellent communication, stakeholder management, and team leadership skills.

 

Required Skills
GCP Domain Architect
