
Job Details


Oracle

Principal Site Reliability Engineer

Technology

Applications Engineer

No

Santa Clara, California, United States

**Principal Site Reliability Engineer**

**Preferred Qualifications**

**Who are we?**

Oracle Database-as-a-Service (DBaaS) brings the power and versatility of the Oracle Database to the cloud. As a set of service offerings built on top of Oracle Cloud Infrastructure, it integrates award-winning Oracle database technologies such as the Oracle Autonomous Database, Oracle Exadata, and Oracle Real Application Clusters (RAC), making them available to customers of all stripes for both Online Transaction Processing and Data Warehousing workloads.

More information:
https://www.oracle.com/database/what-is-a-cloud-database
https://www.oracle.com/database/autonomous-database.html
https://www.oracle.com/database/what-is-autonomous-database.html

**What do we do?**

The Oracle DBaaS group builds and operates Oracle's DBaaS service offerings on top of a massive fleet of highly available, enterprise-grade bare-metal, virtual machine, and Exadata infrastructure.

**What are we looking for?**

We are looking for experienced DevOps and Site Reliability Engineers to create highly reliable and ultra-scalable software systems to manage the operations of our services. An SRE spends a significant amount of their time doing operations work such as troubleshooting production issues and service on-call. When not working on operations, the SRE works on software engineering tasks such as the design and development of systems that increase our reliability and scalability and reduce operational overhead through automation.

The ideal SRE candidate is a programmer who also has broad knowledge and experience in areas such as networking, containers, virtual machines, databases, and Linux systems.

**Qualifications:**

+ Master's or Bachelor's degree in Computer Science or a related discipline
+ A minimum of 5 years of work experience as a software, site reliability, or customer support engineer
+ Top-notch troubleshooting skills in Oracle Database, Linux-based distributed systems, and microservices
+ Excellent verbal and written communication skills, with the ability to communicate with all levels during critical events
+ A solid understanding of one or more of the following:
  + Internet protocols and virtualization technologies (Xen, KVM)
  + Linux internals, especially performance monitoring tools such as sar, vmstat, and mpstat
  + Scripting/programming languages such as Python, Java, Perl, Bash, etc.
  + Metric and log aggregation tools (ELK stack, Grafana)
  + Infrastructure automation tools such as Terraform, Ansible, or Chef
+ Highly desirable skills (not a must-have):
  + Previous experience in DevOps or Site Reliability Engineering is a huge plus
  + Experience developing and operating REST APIs and Docker containers

**Responsibilities**

+ Design, build, develop, and maintain automation software, scripts, and tools
+ Work closely with the development team to maintain the operational health of core DBaaS services for API availability and low latency
+ Manage and triage tickets; drive prioritization and execution of work based on impact
+ Drive new run-books to help reduce mean triage time of incidents; prioritize and automate high-hit-count run-books
+ Configure and tune monitoring and instrumentation system(s) to efficiently detect existing issues and predict future issues based on trends
+ Collaborate with service owners to ensure important metrics are being monitored
+ Lead root cause analysis, effectively implement identified preventive measures, and use data to identify the scope and scale of impact
+ Handle incident response and on-call management, including being on call
+ Triage and troubleshoot service-impacting events from multiple signals, including phone, e-mail, service telemetry, and alerting
+ Communicate with professionalism and precision to internal and external customers during high-priority situations

**Detailed Description and Job Requirements**

Design, develop, troubleshoot, and debug software programs for databases, applications, tools, networks, etc.

As a member of the software engineering division, you will take an active role in the definition and evolution of standard practices and procedures. You will be responsible for defining and developing software for tasks associated with the developing, designing, and debugging of software applications or operating systems.

Work is non-routine and very complex, involving the application of advanced technical/business skills in the area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 7 years of software engineering or related experience.

**Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veterans status, age, or any other characteristic protected by law.**

**Job:** Product Development
**Location:** US-CA,California-Santa Clara
**Job Type:** Regular Employee Hire
**Organization:** Oracle