Job Type: Contract
Job Category: IT
Job Description
Job Title: Triage Lead (Incident Triage Experience) – Banking Domain
Location: Toronto, ON (Hybrid – 3 Days/Week Onsite)
Type: Contract
Experience Required: 9+ Years (must include strong Incident Triage experience)
Job Summary
We are looking for a highly skilled Triage Lead with deep, hands-on Incident Triage experience to manage the initial assessment, prioritization, and routing of critical technology incidents. The Triage Lead will act as the central command point for evaluating incoming incidents, engaging the right technical teams, and ensuring rapid, accurate response in a fast-paced enterprise environment.
Key Responsibilities
Incident Triage – Core Focus
- Own the first-line triage of all high-impact incidents across applications, services, and infrastructure.
- Perform rapid incident assessment, validate symptoms, gather logs, analyze impact, and determine severity (P1–P4).
- Identify whether incidents belong to application, API, database, network, cloud, or security domains, and route accordingly.
- Use triage playbooks, runbooks, logs, and dashboards to quickly establish probable root-cause direction.
- Serve as the primary point of contact for incoming incidents from monitoring tools, L1/L2 support, and business teams.
Incident Management & Escalation
- Lead real-time incident coordination, ensuring stakeholders receive updates every few minutes during P1/P2 incidents.
- Activate war rooms and mobilize technical SMEs immediately based on triage outcomes.
- Ensure incidents meet SLA timelines and drive them to resolution.
- Maintain incident records, timelines, and communication logs.
Monitoring, Tools & Impact Analysis
- Monitor alerts using enterprise tools such as Splunk, Dynatrace, AppDynamics, Datadog, New Relic, and CloudWatch.
- Analyze logs, CPU/memory metrics, transaction failures, API latency, and system health indicators to guide the triage process.
- Work with ServiceNow, JIRA, or Remedy to log, update, and track incident lifecycle events.
Root Cause Direction & Problem Management
- Provide initial root-cause analysis (RCA) direction (not a full RCA) based on triage findings.
- Identify recurring issues and develop recommendations for Problem Management teams.
- Suggest improvements to triage workflows and incident handling processes.
Team Leadership & Collaboration
- Guide Triage Analysts and L1/L2 support teams.
- Communicate effectively with Engineering, Cloud, DevOps, Security, and Infrastructure teams.
- Provide executive-level summaries during and after critical incidents.
Required Skills & Qualifications
- 9+ years of IT experience, including at least 5 years of strong hands-on Incident Triage experience handling P1/P2 incidents.
- Proven experience triaging incidents across:
  - Application/API failures
  - Batch jobs / data pipelines
  - Database outages
  - Cloud / on-prem infrastructure
  - Networking & integration issues
- Expert in identifying symptoms, validating issues, reading logs, and determining probable root-cause direction.
- Strong command of ITIL-based incident management practices.
- Hands-on experience with monitoring & triage tools such as:
  - Splunk, Dynatrace, AppDynamics, Datadog, Grafana, Kibana, New Relic
- Strong experience with ServiceNow (required).
- Ability to remain calm, structured, and decisive under pressure.
- Excellent communication skills for real-time stakeholder updates.
- Experience in banking/financial services environments is a major plus.
Preferred Qualifications
- ITIL Foundation certification.
- Experience in SRE, DevOps, Production Support, or Command Center roles.
- Familiarity with cloud platforms (Azure/AWS).
Required Skills
Full-Stack Lead, Technical Lead