Job Title: AWS Data Engineer with Databricks
Location: Toronto, ON (Hybrid/Onsite)
Experience: 6–10 Years
Employment Type: Contract
Job Summary
We are looking for an experienced AWS Data Engineer with strong Databricks expertise to design, build, and optimize scalable data solutions in a cloud-native environment. The ideal candidate will have hands-on experience with AWS services, Databricks, Apache Spark, and modern data lake/lakehouse architectures.
You will play a critical role in building high-performance data pipelines, ensuring data reliability, and enabling analytics and machine learning use cases.
Key Responsibilities
Design and develop scalable data pipelines using AWS and Databricks.
Build ETL/ELT processes using PySpark/Scala in Databricks.
Implement and manage data lake/lakehouse architecture using AWS S3 and Delta Lake.
Develop batch and real-time data processing pipelines.
Integrate data from multiple sources (RDS, Redshift, APIs, Kafka, S3, etc.).
Optimize Spark jobs and Databricks clusters for performance and cost efficiency.
Implement data quality checks, monitoring, logging, and alerting mechanisms.
Work closely with Data Scientists and BI teams to support analytics workloads.
Apply data security, governance, and compliance best practices.
Support CI/CD pipelines and infrastructure automation.
Required Technical Skills
Cloud & AWS Services
Amazon S3
AWS Glue
AWS Lambda
Amazon Redshift
Amazon RDS
Amazon Athena
IAM & Security configuration
Amazon CloudWatch (monitoring)
Databricks & Big Data
Hands-on experience with Databricks (Workspace, Jobs, Clusters)
Apache Spark (PySpark/Scala)
Delta Lake
Structured Streaming
Data optimization & partitioning strategies
Programming & Querying
Python (mandatory)
Advanced SQL
Scala (preferred)
Data Engineering Concepts
Data Warehousing & Dimensional Modeling
ETL/ELT architecture
Lakehouse Architecture
Performance tuning
Data validation and quality frameworks
Preferred Skills
Experience with orchestration tools (Airflow, AWS Step Functions, Databricks Workflows)
Infrastructure as Code (Terraform/CloudFormation)
Kafka or Kinesis for streaming
Docker and containerization
Experience in CI/CD implementation
Exposure to BI tools (Power BI, Tableau)