Job Type: Full Time
Job Category: IT
Job Description
Job Title: ML Engineer - AI Operations
Location: Morristown, NJ (Onsite)
Full-time
Skill: ML Engineer - AI Operations
Key responsibilities:
- Own and operate CI/CD for existing ML services across dev/test/prod; standardize blue/green and canary releases with automated rollbacks.
- Run model/data drift and performance monitoring with SLAs; define alerts, thresholds, and retraining triggers.
- Build and maintain production dashboards, alerts, and incident workflows; codify on-call runbooks and escalation paths.
- Partner with onshore model owners to diagnose metric degradation and land mitigations aligned to governance and controls.
- Provide day-to-day L2/L3 support for production ML: triage, root-cause analysis, permanent fixes, and post-incident reviews.
- Own operational documentation: runbooks, standard operating procedures, and recurring health checks.
- Coordinate hotfixes and safe rollbacks with onshore teams; verify recovery via automated smoke tests.
- Harden and productionize research notebooks into maintainable, testable services with CI, unit/integration tests, and linting.
- Operate and evolve model-serving APIs and batch scoring jobs; integrate with enterprise schedulers and data platforms.
- Ensure models are fully integrated into CI/CD, observability, and monitoring stacks; enforce traceability with experiment and model registries.
- Validate successful delivery of model outputs to apps, chatbots, reports, and downstream systems with contract tests and data quality checks.
Required Skills:
- Git/GitLab, Python, SQL, MLflow, Power BI, Snowflake.
- OLAP/OLTP data modeling and architecture.
- API frameworks (FastAPI/Flask).
Nice to have:
- Modern ELT tools (Fivetran/Airbyte).
- Streaming/real-time data pipelines (e.g., Kafka, Kinesis, Redpanda).
- Production ML service operations experience (experience in broader full-stack environments is a plus).