
Job Details

Sr Splunk Cloud Engineer


Cloud System Engineer


Baltimore, Maryland, United States


Leidos is hiring an energetic, motivated, innovative individual to be part of our team supporting the Centers for Medicare & Medicaid Services (CMS) in Baltimore, MD. The Cloud SRE works closely with the Program team to manage, maintain, and optimize the applications' data and infrastructure that support CMS and the public. You will deliver solutions that ultimately ensure that the functions of Medicare, Medicaid, and the Marketplace are carried out for US citizens, and you will contribute to efforts to reduce healthcare costs.

The role of a Sr Splunk Cloud Engineer-SRE will require you to develop highly innovative solutions through research and the integration of best practices. You will influence the development of solutions that impact strategic project and program goals and business results, while also leading the work of other technical staff. You will resolve highly complex problems through significant application of technical knowledge, conceptualization, reasoning, and interpretation. You will interact daily with technical resources from different vendors that are fulfilling technical requirements for the customer.

Your goal will be to work with all stakeholders to help Leidos ensure delivery of high-quality, robust, and scalable solutions with minimal business impact. Lastly and most importantly, you will represent our program when meeting with the Application Development Organizations (ADOs) and identify opportunities to provide support, as well as modernization and innovation opportunities for their applications.

The current work environment is remote, leveraging tools such as Slack, Microsoft Teams, and Zoom.

Primary Responsibilities

  • The successful candidate will be a member of a cross-functional team composed of well-rounded engineers who can learn new skills rapidly and work across multiple functional domains to carry out end-to-end delivery of infrastructure services.

  • Support the full system life-cycle of Splunk across geographically dispersed enterprise datacenters.

  • Customize queries, reports and dashboards.

  • Participate in architecture and on-going design meetings to ensure adequate logging while enabling business value and outcomes.

  • Monitor system stability and performance and ensure system availability, reliability, and usability.

  • Troubleshoot complex problems, including resolving operational issues, diagnosing software faults, and interacting with vendors.

  • Work closely with Leidos Engineering and Operations staff, as well as the customer's application owners, to solve technical problems at the network, system, and application levels.

  • Lead the team in all areas of telemetry and observability.

  • Responsible and accountable for managing and following up on incidents, changes, and application release problems through the management channels.

  • Participate in on-call rotation and respond to incident alerts.

  • Focus on proactive operations and on enabling self-healing systems.

  • Serve as the expert in creating KPIs and alerting thresholds for meaningful metrics related to the health and performance of the applications the team manages.

  • Must be a team player, but able to work independently on large, complex projects and assignments in a fast-paced environment.

  • Provide leadership in problem determination/analysis, isolating system problems utilizing diagnostic and system management tools.

  • Always provide professional and courteous service, with excellent verbal and written communication skills.

  • Model inclusive leadership to teammates by building diversity into activities and meetings.

Basic Qualifications:

  • BS degree in computer science or an equivalent, highly technical discipline. Experience may be substituted for a degree.

  • 5+ years of technical engineering experience related to the responsibilities of Cloud Engineering and Site Reliability Engineering.

  • Strong background designing, deploying, and maintaining Splunk in a large, distributed environment.

  • Experience with the Splunk Search Processing Language (SPL).

  • Experience creating Splunk dashboards.

  • Experience with IT Service Intelligence (ITSI).

  • General understanding of Splunk knowledge objects (e.g. fields, lookups, macros, etc.).

  • Thorough understanding of coding best practices, including the ability to code in a variety of languages in both structured and object-oriented (OOP) styles (e.g., Python, Golang, Ruby, C/C++).

  • Proficient in programming languages for automation (e.g., Python) and shell scripting (e.g., Bash).

  • Deep knowledge of version control (e.g., Git) and ability to create GitOps practices.

  • Extensive experience with configuring and maintaining monitoring and alerting tools such as Nagios, CloudWatch, Grafana, Prometheus, Splunk ITSI.

  • Proficient in incident management tools (e.g., Splunk On-Call, PagerDuty).

  • Experience with a variety of relational and non-relational databases/RDS (e.g., DynamoDB, MongoDB, Cosmos DB, PostgreSQL).

  • Strong and relevant experience in cloud technologies, cloud services, IaC, cloud storage, cloud networking and cloud security.

  • Strong knowledge and experience with Cloud IaaS, PaaS, and SaaS offerings.

  • Strong experience with automation and CI/CD tools (e.g., Argo, Jenkins, Travis, Ansible).

  • Knowledge of cloud-based security tools, best practices and policies including demonstrated experience protecting all layers of the application stack.

  • Proficient in DevOps tools, processes, and practices.

  • Well versed in the development and implementation of automation scripts and processes.

  • Knowledge of the Software Delivery Life Cycle (SDLC).

  • Excellent written and verbal communication skills.

  • Ability to manage conflict effectively.

  • Ability to adapt and be productive in a fast-paced dynamic environment.

  • Excellent communication and collaboration skills supporting multiple stakeholders and business operations.

  • Self-starter, self-managed, and a team player.

Preferred Qualifications

  • Cloud certification (e.g., AWS Solutions Architect Associate, Azure Administrator).

  • Certification as a Splunk Certified Architect or Splunk Certified Admin.

  • Experience with setting up self-healing components within an application’s infrastructure.

  • Agile-based knowledge and skills, including experience with Scrum ceremonies and work management tools (e.g., Jira, Confluence).

  • Security skills: knowledge of information assurance compliance and information security basics within CMS.

Required Clearance

  • Ability to obtain a Public Trust clearance.

All candidates supporting the CMS programs must have lived in the United States for at least three (3) of the last five (5) years to be considered.

Pay Range:

$97,500.00 - $176,250.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.