Role: Hadoop Data Engineer
Location: Addison, TX
FTE only
Job Description
Must-Have Technical/Functional Skills
Primary Skills: Hadoop (HDFS, Hive, Spark), Big Data ETL/ELT, Distributed Processing, MS SQL Server
Secondary Skills: Artificial Intelligence / Machine Learning
Experience: Minimum 10 years
Roles & Responsibilities
• Design and implement scalable batch and/or streaming data pipelines using Hadoop ecosystem tools.
• Develop and optimize data ingestion processes from multiple sources (RDBMS, files, APIs, logs).
• Build and maintain datasets in HDFS/Hive and ensure data quality, lineage, and governance.
• Perform performance tuning for distributed workloads (partitioning, file formats, resource management).
• Create and optimize complex queries, stored procedures, and ETL workflows in MS SQL Server.
• Collaborate with data scientists/analysts to deliver feature-ready datasets for ML models.
• Implement monitoring and alerting for pipeline health and data SLAs.
• Document architecture, workflows, data dictionaries, and operational runbooks.
• Support production deployments, incident triage, and root cause analysis.
Required Skills & Qualifications
• Strong hands-on experience with Hadoop components (e.g., HDFS, Hive, YARN, MapReduce/Spark).
• Experience with data modeling and data warehousing concepts.
• Solid proficiency in MS SQL Server (T-SQL, query optimization, indexing, stored procedures).
• Experience with ETL/ELT design patterns and job scheduling (e.g., Oozie/Airflow/Control-M).
• Strong understanding of distributed computing concepts and performance tuning.
• Familiarity with Python, Scala, or Java for data processing (proficiency in at least one preferred).
• Bachelor’s degree in Computer Science or Engineering, or equivalent experience.