Oracle Monitoring Software Developer - Prometheus in Frisco, Texas
Monitoring Software Developer - Prometheus
Monitoring and Observability Services
Organization : GBU Cloud Development Services, Cloud Reliability Services
Cloud Reliability Services (CRS) Description:
Cloud Reliability Services (CRS) is a strategic component that will transform Oracle’s Global Business Unit (GBU) cloud operations. As cloud service operations evolve from a predominantly reactive model (i.e., responding to emergencies with high levels of human involvement) to a proactive model (i.e., preventing emergencies and outages with intelligent tools, services, and automation), the mission of CRS is to ensure that all GBUs can efficiently operate ultra-scalable and highly reliable SaaS/Foundation services across multiple operating models as they iterate to become Cloud Native.
Monitoring & Observability Team Description:
The Cloud Reliability Services (CRS) - Service Monitoring & Observability (M&O) team provides an integrated suite of tools that allows partners to monitor core Business KPIs as well as user experience and Service SLA reporting. The service also allows for the monitoring of Compute, Dependencies, and Network Infrastructure. It provides greater visibility into core components while overlaying Customer Experience, so that engineering resources can be better focused both when incidents occur and on preventing them. The M&O team is a fast-paced, highly motivated team that embodies “We” rather than “I”. CRS – M&O will be a geo-diverse team, which allows it to respond quickly to customer engagements and challenges and to adapt to incoming incidents while still delivering on committed features and enhancements.
Roles & Responsibilities:
The candidate will work with highly skilled, highly motivated engineers using agile methodologies based on Scrum or Kanban and incorporating enterprise agile practices from the Scaled Agile Framework (SAFe).
The team embraces a DevOps environment: the Developers are the Operators. Everything is treated as code (code, configuration, infrastructure, pipelines, everything) to achieve the highest-quality product as efficiently as possible.
You will work alongside a software development team within the greater Oracle Cloud Reliability Engineering team where you will develop new features as well as expand and support existing features.
One week you may be writing automated tests for an existing feature. The next week you may be developing a new feature (design, code, test, and deploy) for a customer in our environment. The next week you may be providing support to a customer on your new feature.
You will learn new technologies based on what we already deploy and use. You will also learn about and research new technologies that you bring to the team to better our offerings.
You will play a key role in building more intelligence into the CRS services that we deliver, so that SaaS services function more and more autonomously over time.
Per team roles and responsibilities:
Work with the Product Owner and team members to build new features and enhancements while supporting existing M&O services that are heavily used across all Oracle Global Business Units
Plan, design, code, document, and test new Monitoring & Observability services used by multiple Oracle Software as a Service products
Develop software using Agile methodologies and participate as a member of scrum development teams
Use Everything-As-Code methodologies to ensure traceability, configurability, immutability, repeatability, and governability
Participate, as a designated engineer on a rotating basis, in a follow-the-sun model for 24x7 support of CRS – M&O services
Manage and continuously improve existing CRS – M&O capabilities
Review and approve the work products of other team members
Support the operation of services using DevOps methodologies for the rapid introduction of new services and operational enhancements to production
Provide technical thought leadership and mentor junior colleagues
Attend training as required to meet Oracle and CRS compliance and regulatory standards. Perform daily tasks in accordance with compliance and regulatory standards
Other duties as assigned
Ability to explore and learn multiple cutting-edge technologies in the Cloud industry
Skills to solve complex technical problems and communicate effectively in a team environment
Good understanding of CI/CD best practices
Ability to advance automation of standard/recurring tasks
Experience with development/test in an open source environment including operation of SSH and shell functions
Good networking knowledge
Ability to assimilate and apply new technologies
Experience with Software Configuration Management (SCM) tools and software engineering best practices
Willingness to work with remote, global teams as well as individually
Ability to produce documentation for application engineers in support of developed work
Agile methodology knowledge
Self-motivation and fast learning skills
An ideal candidate will have expertise with as many of the following as possible:
Programming and scripting languages (Python, Bash, JavaScript; additional experience with Java, Ansible, and/or Go is a plus)
Containers and orchestration (Docker, Kubernetes, and docker-compose)
Experience in Prometheus and/or Grafana is a must
Jaeger tracing technology
Linux/Unix development (Oracle Linux preferred)
Oracle database experience
CI/CD (Jenkins and GitLab CI)
Cloud computing platform (Oracle Cloud Infrastructure Services)
Git version-control and collaboration (GitLab)
Issue tracking and collaboration (Jira and Confluence)
Experience with market-leading Monitoring solutions is expected
Product/Service ownership or Project Management experience is a plus
Experience with ITIL v3 or v4 is preferred, ideally with Foundation Level certification
5 years of experience in Agile methodology and Scrum framework is expected
Oracle GBUs provide services to many critical systems globally requiring 24x7 support. DevOps engineers will rotate with other team members in a designated, on-call status following in-country requirements. CRS’s primary support model is Follow the Sun, utilizing geographically diverse team members during normal working hours to provide support. CRS will strive to have subject matter experts distributed globally. With geographic diversity, countries and regions have a broader array of holidays requiring a flexible support schedule across multiple geographies. Additionally, coverage is required throughout the weekend.
To provide the required support to Oracle customers, CRS will use additional compensation to cover extended business hours and/or on-call pay based on in-country laws and Oracle policy. In general, a Primary and a Secondary engineer will be designated in advance to provide coverage for select services. If numerous services are supported in a specific geographic region, more than one set of Primary/Secondary engineers may be selected. Software engineers and database administrators will engage in activities to restore services that are down or degraded. This may be as simple as running an existing script to restart a service or executing a standard operating procedure, or it may require code changes with review steps, integration, testing, and software deployments to restore service to normal operation. Engagement with other Oracle development and support teams may also be needed.
Detailed Description and Job Requirements
Design, develop, troubleshoot, and debug software programs for databases, applications, tools, networks, etc.
As a member of the software engineering division, you will take an active role in the definition and evolution of standard practices and procedures. You will be responsible for defining and developing software for tasks associated with the developing, designing and debugging of software applications or operating systems.
Work is non-routine and very complex, involving the application of advanced technical/business skills in the area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to the functional area. 7 years of software engineering or related experience.
As part of Oracle's employment process candidates will be required to successfully complete a pre-employment screening process. This will involve identity and employment verification, professional references, education verification and professional qualifications and memberships (if applicable).
Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, age, or any other characteristic protected by law.
Job: Product Development
Other Locations: US-TX,Texas-Frisco, IN-IN,India-Bengaluru, IN-IN,India-Pune
Job Type: Regular Employee Hire