Oracle Site Reliability Developer 3 in Bengaluru, India
Site Reliability Developer 3
MySQLaaS is Oracle's MySQL Database as a Service, a managed PaaS offering that runs on Oracle Cloud Infrastructure (OCI). MySQL is the world's most popular open source database, and is used by many of the world's largest and fastest-growing organizations. Oracle Cloud is the industry's broadest and most integrated public cloud and helps organizations increase business agility, lower costs, and reduce IT complexity.
MySQLaaS delivers a next-generation managed PaaS cloud service for MySQL. Built, operated and supported by the Oracle staff responsible for the MySQL products, MySQLaaS offers secure, stable and performant MySQL services to users who require an enterprise-class experience.
The MySQLaaS Site Reliability Engineering (SRE) team partners with the MySQLaaS Control and Data Plane development teams to provide operational support through deployment, monitoring, change management, problem analysis, capacity planning and mitigation or remediation of problems within all deployed components of MySQLaaS, consistent with established processes and operating procedures, in order to help meet our industry-competitive SLAs.
We are looking for a strong Site Reliability Engineer who will help ensure the availability of MySQLaaS services 24x7x365. Your goal is to improve availability by reducing time to mitigate, ensuring we are measuring the right things, and automating tasks that impact development velocity, availability or productivity. You will leverage excellence in communication, technical/business analysis, problem solving and attention to detail to methodically resolve issues. Technically, you will understand the full stack of the services you support and be able to dig deep into a service to determine how best to mitigate customer impact.
Further, you will drive improvements through the development of tools and engage partner teams to drive down incident counts, reduce severity of events and minimize Time to Mitigate. We will look to you in SRE to continually review and enhance systems, methods and applications to enable the delivery of a positive customer experience to OCI and MySQLaaS.
Working and coordinating closely with colleagues in EMEA and the Americas, this team will ramp up staffing to eventually cover APAC daylight hours on both weekdays and weekends. The standard Oracle compensation rules for on-call, shifts and weekends will apply.
Perform proactive service checks and monitor/triage incoming system/application alerts, E-mails and phone calls to ensure appropriate priority and response.
Triage and troubleshoot service-impacting events from multiple signals including phone, E-mail, service telemetry and alerting.
Identify opportunities for automation, signal-noise reduction and elimination of recurring issues, and work with engineering to implement them, in order to reduce time to mitigate service-impacting events and increase the productivity of cloud operations and development resources.
Coordinate, document and track critical incidents, ensuring rapid and complete issue resolution and appropriate closed-loop communication with customers and other key stakeholders.
Participate in project delivery aimed at increasing capabilities around monitoring, notification, configuration and deployment of services within the Oracle Cloud Infrastructure.
Though we do not expect the Site Reliability Engineering team to take on shifts and weekend work initially, we do want this team to be open to working shifts and weekends. The standard Oracle compensation rules for on-call, shifts and weekends will apply.
Strong experience building highly scalable distributed systems.
Hands-on experience building and operating entire components or services.
Experience working with distributed teams.
Understanding of Linux operating systems, administration and networking.
Strong development experience in DevOps tooling, such as Python, Bash, Ansible and Chef.
Strong understanding of MySQL, NoSQL systems, storage and distributed persistence technologies.
Strong troubleshooting and performance tuning skills.
BS degree or equivalent experience relevant to functional area.
Experience in Java, C++, C#, or similar OO languages.
Detailed Description and Job Requirements
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security and compliance. Understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. A minimum of 5 years of experience running large-scale customer-facing web services.
Job: Product Development
Job Type: Regular Employee Hire