Job Details

UnitedHealth Group

Senior Site Reliability Engineer - Noida, UP


Senior Network Engineer


Noida, Uttar Pradesh, India

Combine two of the fastest-growing fields on the planet with a culture of performance, collaboration and opportunity and this is what you get. Leading edge technology in an industry that's improving the lives of millions. Here, innovation isn't about another gadget, it's about making health care data available wherever and whenever people need it, safely and reliably. There's no room for error. Join us and start doing your life's best work.(sm)

This is a Lead Site Reliability Engineer position. The resource will primarily focus on driving high reliability into systems by working closely with software development and IT operations teams. Site reliability engineers apply software engineering practices to develop and implement services that improve IT and support teams. Services can range from production code changes to alerting and monitoring adjustments. The site reliability engineer job also includes tasks like building proprietary tools from scratch to mitigate weaknesses in incident management or software delivery.

Primary Responsibilities:

  • Automating every operational task is a core requirement for this role, for example package updates, configuration changes across all environments, and creating tools for automatic provisioning of user-facing services
  • Respond to platform emergencies, alerts, and escalations from Customer Support
  • Ensure systems exist to manage software life-cycles (e.g. Operating Systems) with a minimum of manual effort
  • Develop a fully automated multi-environment observability stack based on the existing SaaS system, and extend it to predict capacity needs based on the usage patterns
  • Plan for new service roll-outs, expansion and capacity management of existing services, and work with users to optimize their resource consumption
  • Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regards to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so

Success Factors:

  • Troubleshoot, evaluate, and resolve operational challenges and come up with solutions that make debugging issues easier for Ops and Dev teams
  • Design/Architect dashboards on Datadog/Grafana that comprehensively cover member journeys and make it easier for Ops and application teams to find the root cause (preferably within 5 minutes)
  • Act as a gatekeeper for changes rolling into Production and ensure zero downtime caused by Production changes
  • Identify performance gaps and contribute to improving the performance of APIs and services, as well as page load times
  • Work with other engineering stakeholders on resolving larger architectural bottlenecks and participate
  • Scale systems through automation, improving change velocity and reliability
  • Leverage technical skills to partner with team members and be comfortable diving into a problem as needed
  • Work to enable other teams to scale through automation, knowledge-sharing, and self-service activities

Required Qualifications:

  • Bachelor’s degree in computer science (or equivalent)
  • 8+ years in the software engineering profession across entire SDLC
  • Proven work experience as a Site Reliability Engineer or similar role
  • Strong experience in projects requiring application re-engineering and modernization
  • Strong knowledge of modern Digital technologies and landscape
  • Strong knowledge of scripting
  • Strong knowledge of monitoring and alerting concepts
  • Strong knowledge of monitoring tools like DTSaas, Datadog, Splunk, etc.
  • Strong knowledge of Grafana and how to hook it with various monitoring tools
  • Strong knowledge of application availability concepts
  • Strong knowledge of infrastructure and networking concepts
  • Advanced knowledge of cloud (preferably AWS) concepts
  • Ability to collaborate and communicate asynchronously

Careers with Optum. Here's the idea. We built an entire organization around one giant objective: make health care work better for everyone. So when it comes to how we use the world's large accumulation of health-related information, or guide health and lifestyle choices or manage pharmacy benefits for millions, our first goal is to leap beyond the status quo and uncover new ways to serve. Optum, part of the UnitedHealth Group family of businesses, brings together some of the greatest minds and most advanced ideas on where health care has to go in order to reach its fullest potential. For you, that means working on high performance teams against sophisticated challenges that matter. Optum, incredible ideas in one incredible company and a singular opportunity to do your life's best work.(sm)