Job Type: Full Time
Job Category: IT

Job Description

Role: Kafka Operations Administrator

Location: Seattle, WA / St. Louis, MO / TX

FTE Only


Must Have Technical/Functional Skills

• Production-grade Apache Kafka operations experience: managing, maintaining and upgrading Kafka clusters in production environments with a focus on high availability, disaster recovery, failover and overall reliability

• Proficiency in installing and configuring monitoring systems using Grafana (building dashboards), Prometheus, Splunk and JMX metrics

• Automation and orchestration experience: Terraform, Ansible, Helm, Kubernetes (EKS/AKS/GKE)

• Strong Linux system administration experience, including troubleshooting, automation and scripting for efficient infrastructure management

• Experience in production support following ITIL processes, including participating in 24x7 on-call rotations and documenting incidents/postmortems

• Experience with JVM tuning, GC analysis, and network and disk I/O diagnostics

• Experience with TCP/IP, routing, switching and firewall configuration relevant to Kafka operations

 

Good to Have:

• Deep Kafka performance tuning and capacity planning experience

• Knowledge of message delivery semantics and guarantees (at-least-once, exactly-once)

• Cloud-native security/compliance experience (IAM, VPC, KMS, Security Groups)

• Certifications: Confluent Certified Administrator, AWS/Azure/GCP certifications

• Experience with Apache Kafka in KRaft mode, including setup, configuration, troubleshooting and cluster management

• Containerization and container orchestration experience: Docker, Kubernetes

• Experience with CI/CD pipelines and Git-based workflows

• Experience building custom Kafka Connect libraries and understanding of data serialization formats (e.g., Avro, JSON)

• Knowledge of networking concepts across on-prem VMs and cloud environments, ensuring seamless integration and communication between services

• Strong understanding of topic management and security best practices for streaming platforms: TLS, ACLs, RBAC, encryption at rest/in transit

• Kafka ecosystem tooling experience: Kafka Connect, Schema Registry

 

Roles and Responsibilities:

• Deploy, configure and manage Kafka clusters and related services to meet SLA requirements

• Participate in 24x7 on-call rotation to respond to incidents, alerts, and escalations

• Triage, diagnose, and remediate production incidents; coordinate with stakeholders, developers and infrastructure teams

• Implement automation for provisioning, scaling, server/data backups, and disaster recovery

• Maintain monitoring, alerting thresholds, dashboards, and Kafka ecosystem health

• Harden Kafka deployments: configure TLS, ACLs, RBAC, encryption, and vulnerability remediation

• Perform routine maintenance: Kafka ecosystem upgrades (controllers, brokers, Kafka Connect and Schema Registry), rolling restarts, etc.

• Create and maintain runbooks, runbook automation, and post-incident reports

• Optimize performance and resource utilization; benchmark and tune clusters

• Support Kafka Connect and Schema Registry services and troubleshoot connector issues

• Contribute to CI/CD pipeline improvements for infrastructure and deployment automation

Required Skills
Database Administrator
