Role: Senior Databricks Developer
Location: Remote
Full Time
Job Description
Must-Have Technical/Functional Skills
• 7+ years of experience in Data Engineering, with 3–5+ years on Databricks.
• Advanced proficiency in Apache Spark, PySpark, SQL, and distributed data processing.
• Strong experience with DBT (Core or Cloud) for building robust transformation layers.
• Hands-on expertise in data asset modeling, curation, optimization, and lifecycle management.
• Proven experience with job tuning, performance debugging, and cluster optimization.
• Experience implementing observability solutions for data pipelines.
• Solid understanding of Delta Lake, lakehouse architecture, and data governance.
• Experience with cloud platforms (Azure preferred; AWS/GCP acceptable).
• Strong Git-based development workflows and CI/CD experience.
Roles & Responsibilities
• Design, develop, and maintain scalable ETL/ELT pipelines using Databricks, PySpark, and Spark SQL.
• Optimize Spark jobs, including partitioning, caching, cluster sizing, shuffle minimization, and cost-efficient workload design (see the tuning sketch after this list).
• Build and manage workflows using Databricks Jobs, Repos, Delta Live Tables, and Unity Catalog.
• Develop and refine DBT models, tests, seeds, macros, and documentation to support standardized transformation layers.
• Implement modular, version-controlled DBT pipelines aligned with data governance and quality practices.
• Partner with data consumers to ensure models align with business definitions and support lineage and auditability.
• Create curated, reusable, and well-governed data assets (bronze/silver/gold layers) for analytics, reporting, and ML use cases; a medallion-layer sketch follows this list.
• Continuously refine and optimize data assets for consistency, reliability, and usability across teams.
• Drive standardization of data patterns, frameworks, and reusable components.
• Identify and implement engineering efficiencies across Databricks and Spark workloads—cluster optimization, code improvements, auto-scaling patterns, and job orchestration enhancements.
• Collaborate with platform engineering to enhance DevOps automation, CI/CD pipelines, and environment management.
• Improve cost governance through workload analysis, optimization, and proactive cost monitoring.
• Conduct Spark job tuning and pipeline performance optimization to improve processing speed and reduce compute spend.
• Troubleshoot production issues and deliver durable fixes that improve long-term reliability.
• Implement best practices for Delta Lake performance (ZORDER, auto-optimize, vacuum, retention tuning); a Delta maintenance sketch follows this list.
• Implement end-to-end observability for data pipelines, including logging, metrics, tracing, and alerting.
• Integrate Databricks with monitoring ecosystems (e.g., Azure Monitor, CloudWatch, Datadog).
• Ensure pipeline SLAs/SLOs are clearly defined and consistently met.
• Work closely with data architects, analysts, business SMEs, and platform teams.
• Provide technical leadership, review code, mentor junior engineers, and advocate for engineering excellence.
• Translate business requirements into scalable, production-quality data solutions.
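Illustrative Code Sketches
The sketches below are hedged illustrations of the kinds of work described above, not project code. All table names, paths, columns, and settings (events_silver, dim_event_type, and so on) are hypothetical placeholders chosen for the examples.

A minimal Spark tuning sketch, assuming PySpark on Databricks: adaptive shuffle-partition coalescing, a broadcast join to avoid shuffling the large side, caching only where a result is reused, and repartitioning on the write key.

```python
# Illustrative Spark tuning patterns; all table names and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Let Adaptive Query Execution coalesce shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

facts = spark.read.table("events_silver")    # hypothetical large fact table
dims = spark.read.table("dim_event_type")    # hypothetical small dimension table

# Broadcast the small dimension so the large table is not shuffled for the join.
joined = facts.join(broadcast(dims), on="event_type", how="left")

# Cache only because the joined result feeds two downstream aggregations.
joined.cache()
daily = joined.groupBy(F.to_date("event_ts").alias("event_date")).count()
by_type = joined.groupBy("event_type").count()

# Repartition on the write key to control file counts and enable pruning.
(daily.repartition("event_date")
      .write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("gold_events_by_date"))
by_type.write.format("delta").mode("overwrite").saveAsTable("gold_events_by_type")
```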
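A minimal medallion-layer (bronze/silver/gold) sketch, assuming Delta tables and a hypothetical JSON landing path.

```python
# Illustrative bronze -> silver -> gold flow; names and schema are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw data as-is, adding ingestion metadata.
bronze = (
    spark.read.json("/mnt/raw/events/")              # hypothetical landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("events_bronze")

# Silver: de-duplicate and apply basic quality filters.
silver = (
    spark.read.table("events_bronze")
    .dropDuplicates(["event_id"])
    .filter(F.col("event_ts").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("events_silver")

# Gold: curated aggregate for analytics, reporting, and ML features.
gold = (
    spark.read.table("events_silver")
    .groupBy(F.to_date("event_ts").alias("event_date"), "event_type")
    .agg(F.count("*").alias("event_count"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("events_gold_daily")
```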
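A minimal Delta Lake maintenance sketch, assuming a hypothetical table and the default 7-day deleted-file retention; actual intervals and table properties should follow the team's own policies.

```python
# Illustrative Delta maintenance; the table, column, and retention values are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
table = "events_silver"  # hypothetical Delta table

# Compact small files and co-locate data on a frequently filtered column.
spark.sql(f"OPTIMIZE {table} ZORDER BY (event_id)")

# Enable optimized writes and auto compaction, and tune file retention.
spark.sql(f"""
    ALTER TABLE {table} SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true',
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")

# Remove files no longer referenced by the table history, respecting retention.
spark.sql(f"VACUUM {table} RETAIN 168 HOURS")
```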