5 days
Not Specified
Not Specified
Not Specified
<ol> <li>Technical Expertise </li></ol>
<ul> <li>Deep understanding of SRE principles, the SRE operating model, and DevOps methodologies. </li><li>Experience designing highly available, scalable, and resilient distributed systems. </li><li>Proficient in architectural design (microservices, cloud-native, and event-driven architectures). </li><li>Skilled in cloud platforms: Azure and GCP. </li><li>Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Relic, Splunk, and AppDynamics. </li></ul>
<ol start="2"> <li>Framework Design & Governance </li></ol>
<ul> <li>Define and validate SLOs, SLIs, SLAs, error budgets, and availability targets. </li><li>Design runbooks, escalation policies, and chaos testing frameworks. </li><li>Create reusable templates for observability, alerting, and logging. </li><li>Ensure compliance and audit readiness. </li></ul>
<ol start="3"> <li>Communication & Cross-Functional Leadership </li></ol>
<ul> <li>Collaborate with architects, designers, and platform and infrastructure teams. </li><li>Document frameworks and lead their adoption across teams. </li><li>Review designs and validate reliability criteria. </li></ul>
<p>Roles & Responsibilities:</p>
<ol> <li>Framework & Standardization </li></ol>
<ul> <li>Define and maintain the SRE operating model, framework, and onboarding guide. </li><li>Create templates and reference architectures for observability, alerting, and runbooks. </li><li>Standardize definitions of availability, reliability, latency, and performance. </li></ul>
<ol start="2"> <li>Architectural Integration </li></ol>
<ul> <li>Participate in application architecture reviews to validate SRE compliance. </li><li>Recommend design patterns for fault tolerance, failover, auto-scaling, and disaster recovery (DR). </li><li>Define observability-by-design principles. </li></ul>
<ol start="3"> <li>Governance, Audit & Optimization </li></ol>
<ul> <li>Establish and lead SRE councils or review boards. </li><li>Define SRE maturity models, scorecards, and compliance checks. </li><li>Perform SRE audits across product portfolios. </li><li>Guide teams on capacity modeling, load distribution, and cost-efficiency strategies. </li><li>Collaborate with platform teams on resource reservations and right-sizing. </li></ul>
<ol start="4"> <li>Tool Rationalization & Strategy </li></ol>
<ul> <li>Evaluate and recommend standard SRE toolchains for monitoring, logging, and tracing. </li><li>Own the integration strategy across observability platforms. </li></ul>
<ol start="5"> <li>Training, Leadership & Evangelism </li></ol>
<ul> <li>Conduct SRE bootcamps for application and infrastructure teams. </li><li>Champion a blameless culture and a continuous-improvement mindset. </li><li>Drive error budget policies and reliability trade-off discussions. </li><li>Mentor product teams on SRE integration strategies. </li><li>Bring an SRE perspective to architectural decisions. </li></ul>
<p>#LI-RJ2</p> <p>Salary range: $110,000 - $125,000 a year</p>