Lead Observability Engineer - Charlotte, NC - Irving, TX

Job Details

Lead Observability Engineer - Charlotte, NC - Irving, TX - Iselin, NJ

Mphasis Limited New York, NY

Days Posted: 2 days

Experience Level: Not Specified

Employment Type: Not Specified

Pay Range: Not Specified

Job Title: Lead Observability Engineer

Location: Charlotte, NC - Irving, TX - Iselin, NJ

Employment Type: Full-Time

About this Role

An observability engineer designs, implements, and maintains systems to monitor, analyze, and report on the health and performance of software applications and infrastructure, ensuring high availability, performance, and security. They are crucial in understanding complex IT systems and proactively addressing potential issues.

In this Role, You Will:

- Designing and Implementing Observability Pipelines: Observability engineers create robust pipelines to collect, aggregate, and analyze data from various sources.
- Monitoring and Alerting: They establish monitoring systems and alerts to detect anomalies and performance issues in real time.
- Metric & Instrumentation Standards: They define common metric standards for every stage of the application lifecycle, along with instrumentation and scripting standards aligned with OpenTelemetry (OTel).
- Data Analysis and Visualization: They analyze telemetry data (logs, metrics, traces) to gain insights into system behavior and identify trends.
- Incident Response: They investigate and troubleshoot incidents, using observability data to understand the root cause and implement solutions.
- Collaboration and Communication: They collaborate with development, SRE, and other teams to ensure observability practices are integrated into workflows and to share insights.
- Staying Up-to-Date: They stay current with the latest trends in observability, logging, monitoring, and cloud technologies.
- Documentation and Knowledge Sharing: They create comprehensive documentation for observability systems and processes and share knowledge with other teams.

Skills and Knowledge:

- Strong understanding of distributed systems: They need to understand the complexities of modern architectures, including microservices, cloud-native environments, and hybrid infrastructure.
- Proficiency in observability tools: They are familiar with tools for logging, metrics, and tracing, such as ELK Stack, Prometheus, Grafana, and distributed tracing systems.
- Data analysis and visualization skills: They can analyze telemetry data to identify trends and patterns and create visualizations to communicate insights.
- Scripting and automation: They can automate tasks and create scripts to manage observability infrastructure.
- Problem-solving skills: They can diagnose and troubleshoot system issues using observability data.
- Communication skills: They can effectively communicate technical information to both technical and non-technical audiences.
- Experience with cloud platforms: They have experience with cloud platforms like AWS, Azure, and GCP.
- Understanding of IT service management practices: They understand IT service management practices like change management, release management, incident management, and problem management.

Required Qualifications:

- Demonstrated experience in observability: monitoring, analyzing, and reporting on the health and performance of software applications and infrastructure.

Desired Qualifications:

- 8+ years of experience in observability, monitoring, and reliability engineering across large-scale enterprise or cloud-native environments.
- Strong expertise in observability tools and platforms such as Prometheus, Grafana, ELK/OpenSearch, Splunk, Dynatrace, AppDynamics, or equivalent.
- Hands-on experience designing and implementing observability pipelines for logs, metrics, and traces in distributed systems.
- Deep understanding of OpenTelemetry (OTel), including instrumentation standards, collectors, exporters, and vendor-neutral telemetry architectures.
- Strong analytical and troubleshooting skills, using telemetry data for incident investigation, root-cause analysis, and performance optimization.
- Proficiency in scripting and automation (Python, Go, Bash/PowerShell), with strong collaboration skills for working across Dev, SRE, and Platform teams.

Work Environment & Benefits:

- Hybrid Work Model: Combination of on-site and remote work, depending on business needs.
- Collaborative Culture: Work closely with cross-functional teams, vendors, and senior leadership.
- Professional Development: Access to training programs, certifications, and career advancement opportunities.
- Global Impact: Support a mission-critical network infrastructure serving millions of customers worldwide.
