Position Title: Principal Palantir Foundry Engineer
Location: Nashville, TN or Remote
Contract
Role Summary:
Hands-on Foundry specialist who can design ontology-first data products, engineer high-reliability pipelines, and operationalize them into secure, observable, and reusable building blocks used by multiple applications (Workshop/Slate, AIP/Actions). You’ll own the full lifecycle: from raw sources to governed, versioned, materialized datasets wired into operational apps and AIP agents.
Core Responsibilities
• Ontology & Data Product Design: Model Object Types, relationships, and semantics; enforce schema evolution strategies; define authoritative datasets with lineage and provenance.
• Pipelines & Materializations: Build Code Workbook transforms (SQL, PySpark/Scala), orchestrate multi-stage DAGs, tune cluster/runtime parameters, and implement incremental + snapshot patterns with backfills and recovery.
• Operationalization: Configure schedules, SLAs/SLOs, alerts/health checks, and data quality tests (constraints, anomaly/volume checks); implement idempotency, checkpointing, and graceful retries.
• Governance & Security: Apply RBAC, object-level permissions, policy tags/PII handling, and least-privilege patterns; integrate with enterprise identity; document data contracts.
• Performance Engineering: Optimize joins/partitions, caching/materialization strategies, file layout (e.g., Parquet/Delta), and shuffle minimization; instrument with runtime metrics and cost controls.
• Dev Productivity & SDLC: Use Git-backed code repos, branching/versioning, code reviews, unit/integration tests for transforms; templatize patterns for reuse across domains.
• Applications & Interfaces: Expose ontology-backed data to Workshop/Slate apps; wire Actions and AIP agents to governed datasets; publish clean APIs/feeds for downstream systems.
• Reliability & Incident Response: Own on-call for data products, run RCAs, create runbooks, and drive preventive engineering.
• Documentation & Enablement: Produce playbooks, data product specs, and runbooks; mentor engineers and analysts on Foundry best practices.
Required Qualifications
• 7+ years in data engineering/analytics engineering with 4+ years hands-on Palantir Foundry at scale.
• Deep expertise in Foundry Ontology, Code Workbooks, Pipelines, Materializations, Lineage/Provenance, and object permissions.
• Strong SQL and PySpark/Scala in Foundry; comfort with UDFs, window functions, and partitioning/bucketing strategies.
• Proven operational excellence: SLAs/SLOs, alerting, data quality frameworks, backfills, rollbacks, blue/green or canary data releases.
• Fluency with Git, CI/CD for Foundry code repos, test automation for transforms, and environment promotion.
• Hands-on with cloud storage & compute (AWS/Azure/GCP), file formats (Parquet/Delta), and cost/perf tuning.
• Strong grasp of data governance (PII, masking, policy tags) and security models within Foundry.
Nice to Have
• Building Workshop/Slate UX tied to ontology objects; authoring Actions and integrating AIP use cases.
• Streaming/event ingestion patterns (e.g., Kafka/Kinesis) materialized into curated datasets.
• Observability stacks (e.g., Datadog/CloudWatch/Prometheus) for pipeline telemetry; FinOps/cost governance.
• Experience establishing platform standards: templates, code style, testing frameworks, domain data product catalogs.
Success Metrics (90–180 Days)
• ≥99.5% pipeline success rate, with documented SLOs and active alerting.
• ≥20% runtime/cost reduction via optimization and materialization strategy.
• Zero P1 data incidents and ≤4h MTTR with playbooks and automated remediation.
• 3+ reusable templates (ingestion, CDC, enrichment) adopted by partner teams.
• Ontology coverage for priority domains with versioned contracts and lineage.
Example Work You’ll Own
• Stand up incremental CDC pipelines with watermarking and late-arrival handling; backfill historical data safely.
• Define business-ready ontology for a domain and wire it to Workshop apps and AIP agents that trigger Actions.
• Implement DQ gates (null/dup checks, distribution drift) that fail fast and auto-open incidents with context.
• Build promotion workflows (dev → staging → prod) with automated tests on transforms and compatibility checks for ontology changes.
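To make the watermarking and late-arrival expectation concrete, here is a minimal sketch of the incremental-window idea in plain Python (not an actual Foundry transform; the field name `updated_at` and the one-hour lateness buffer are illustrative assumptions):

```python
from datetime import datetime, timedelta

def incremental_window(last_watermark, rows, ts_field="updated_at",
                       lateness=timedelta(hours=1)):
    """Pick rows newer than the last watermark, minus an allowed-lateness
    buffer so late-arriving records are re-read instead of silently dropped.
    Returns the selected rows and the new watermark to persist."""
    cutoff = last_watermark - lateness
    picked = [r for r in rows if r[ts_field] > cutoff]
    new_watermark = max((r[ts_field] for r in rows), default=last_watermark)
    return picked, new_watermark

# Usage: rows timestamped inside the lateness buffer are reprocessed,
# which is why downstream writes must be idempotent (merge/upsert by key).
wm = datetime(2024, 1, 1, 12, 0)
batch = [{"updated_at": datetime(2024, 1, 1, 11, 30)},
         {"updated_at": datetime(2024, 1, 1, 10, 0)}]
picked, new_wm = incremental_window(wm, batch)
```

The lateness buffer is the design choice that makes this safe: it trades a small amount of reprocessing for correctness on out-of-order arrivals.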
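The fail-fast DQ gate described above can be sketched as follows, again in plain Python rather than a Foundry expectation/check; the function and exception names are hypothetical, and a real implementation would also attach incident context and open a ticket:

```python
class DataQualityError(Exception):
    """Raised to stop a pipeline stage before bad data flows downstream."""

def check_nulls(rows, required):
    """Return the required columns that contain null values."""
    return [c for c in required if any(r.get(c) is None for r in rows)]

def check_dupes(rows, key):
    """Return True if any two rows share the same key tuple."""
    seen = set()
    for r in rows:
        k = tuple(r[c] for c in key)
        if k in seen:
            return True
        seen.add(k)
    return False

def dq_gate(rows, required, key):
    """Fail fast with context instead of propagating bad records."""
    null_cols = check_nulls(rows, required)
    if null_cols:
        raise DataQualityError(f"null values in required columns: {null_cols}")
    if check_dupes(rows, key):
        raise DataQualityError(f"duplicate rows for key {key}")
    return rows
```

Raising a typed error with the failing columns or key in the message is what lets the alerting layer auto-open incidents with useful context.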