Senior Site Reliability Engineer
When you join Verizon
Verizon is one of the world’s leading providers of technology and communications services, transforming the way we connect across the globe. We’re a diverse network of people driven by our shared ambition to shape a better future. Here, we have the ability to learn and grow at the speed of technology, and the space to create within every role. Together, we are moving the world forward – and you can too. Dream it. Build it. Do it here.
What you’ll be doing...
We are looking to hire a Senior Site Reliability Engineer for the VCG GTS Organization. In this role, you will lead a cross-functional team that develops the SRE Continuous Integration and Deployment framework. You will practice all tenets of SRE and provide the vision and technical leadership needed to execute best-in-class middleware engineering practices that improve the reliability of applications. You will help execute our vision for Site Reliability Engineering (SRE) by determining how systems relate to one another and, using a breadth of tools, building the CI/CD framework and automation that improve reliability for customers. You will also develop Source Code Management, centralizing and automating the configuration management process.
- Developing and maintaining pipeline configurations.
- Providing CI/CD environment build automation in a containerized environment that uses technologies such as EKS, Docker, Artifactory, Python, Shell, Ansible and GitLab.
- Performing implementation, configuration and ongoing performance enhancements for the ELK/EFK logging platform in on-prem and AWS environments.
- Delivering solutions towards automating, optimizing and supporting mission critical deployments in AWS, leveraging configuration management, CI/CD, and DevOps processes.
- Developing and implementing the next-gen monitoring solution for the enterprise based on SRE principles and practices.
- Developing an effective data-driven approach for monitoring and alerting that enables the SRE team to maintain high availability and deliver a high quality of service.
- Developing a log monitoring framework based on exceptions in access logs, server logs and platform logs.
- Analyzing data to understand customer experience and usage patterns to identify gaps in current monitoring.
- Working with SRE and dev engineers to fine-tune alert thresholds and increase alert effectiveness through event correlation and pattern recognition.
- Developing and onboarding new monitoring features and capabilities for critical metrics and transitioning them to operations.
- Defining standards, guidelines and templates for operational and business dashboards and metrics alerting.
- Developing a robust alerting system that identifies problematic anomalies and minimizes false alarms.
- Performing sustainable incident response.
Where you’ll be working...
In this hybrid role, you'll have a defined work location that includes work from home and assigned office days set by your manager.
What we’re looking for...
You’re a technical expert with solid credentials and a remarkable ability to find, break down, and troubleshoot problems. You work effectively with a wide range of internal and external stakeholders, and you’re great at partnering with the groups you support. You constantly look for ways to make our integration processes and deployments better, and you’re a natural mentor to junior engineers looking to develop their technical skills.
You’ll need to have:
- Bachelor’s degree or four or more years of work experience.
- Four or more years of relevant work experience.
Even better if you have one or more of the following:
- Bachelor's degree in Computer Science.
- Knowledge of SRE practices and principles to build resilient systems and to provide business continuity.
- Experience in Application Performance Management, Containerized Workloads, Logging, Alerting, Configuration Management Tools, API Management Tools, Cloud and Source Code Management.
- Experience in creating rich Grafana, Kibana and New Relic visualizations and dashboards that provide key metric monitoring information to users and support staff.
- Experience in developing a log monitoring framework based on exception parsing.
- Experience in installing, configuring and maintaining an Elasticsearch, Logstash, and Kibana logging platform.
- Experience in working with version control systems such as Git, Subversion or Mercurial, and in creating Jenkins pipelines to build code artifacts and deploy code.
- Experience in automation and ability to code or script at an advanced level.
If Verizon and this role sound like a fit for you, we encourage you to apply even if you don’t meet every “even better” qualification listed above.
Equal Employment Opportunity
We're proud to be an equal opportunity employer - and celebrate our employees' differences, including race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, and Veteran status. At Verizon, we know that diversity makes us stronger. We are committed to a collaborative, inclusive environment that encourages authenticity and fosters a sense of belonging. We strive for everyone to feel valued, connected, and empowered to reach their potential and contribute their best. Check out our diversity and inclusion page to learn more.