Infrastructure / Leadership role
Reliability & AI Operations Lead
Infrastructure, observability, model operations, and eval systems
A senior infrastructure and AI operations role for someone who can make ambitious product workflows dependable.
Own the runtime, infrastructure, observability, and AI operations needed for trustworthy customer deployments.
A concise note is enough to start.
What you would own
- Production architecture for runtime, background jobs, integrations, data movement, and customer-facing reliability.
- Model operations for evaluation, observability, prompt and tool versioning, guardrails, cost control, and data quality.
- Runbooks, test harnesses, canaries, rollback paths, and automations that reduce toil without hiding risk.
What we would look for
- You have built reliable SaaS infrastructure for real customers.
- You can move between cloud architecture, DevOps, backend systems, data infrastructure, model operations, and application-level product judgment.
- You care about evals, reproducibility, observability, incident review, and human approval paths.
Questions you would help answer
- What infrastructure lets AI-assisted workflows run observably, safely, and cost-effectively?
- How should Zentrik evaluate and monitor model behavior, job pipelines, data freshness, and reliability?
- Where should automation remove operational toil without lowering the bar for review?

