Site Reliability Engineer (R-00069708)
The National Solutions Group at Leidos has an opening for a Site Reliability Engineer to focus on the overall reliability of a complex and ever-changing system with a fast-growing user base. Your responsibilities will include maintaining complex computer systems by writing code to automate software releases, monitor systems, and detect and fix problems before users even know there is an issue. You will use these skills to improve site performance and overall reliability.
As a Site Reliability Engineer, you will:
- Work alongside the development team to ensure speedy and reliable software deployments, monitor systems, and improve the overall reliability of the platform. In addition, as you discover and document system bugs, you should have the motivation to fix them yourself.
- Develop features to automate, scale, test, and secure the cloud infrastructure and the pipelines
- Enhance performance monitoring of the various systems
- Identify performance bottlenecks and optimize the performance of cloud infrastructure
- Perform and support deployments to multiple AWS cloud systems
- The ideal site reliability engineer candidate is either a software engineer with a good administration background or a highly skilled system administrator with knowledge of coding and automation.
- Must be a highly adaptable engineer who is willing to step outside of their comfort zone to accomplish the mission
- Candidate must have a TS/SCI clearance and have, or be willing to obtain, a current polygraph
- Bachelor's degree in Computer Science, Computer Engineering, or a related field (or equivalent) with 4-8 years of experience, or a Master's degree with 2-4 years of experience
- Strong Infrastructure as Code programming skills
- Advanced automated script design, coding, debugging, and maintenance skills (using Bash, Python, etc.)
- Experience with CI/CD toolsets (e.g., Jenkins, GitLab, SonarQube)
- Experience with containerization (Docker) and container orchestration (Kubernetes)
- Good command of Linux/Unix and the command line
- Experience in application administration, configuration, and integration
- Familiarity with agile development methodologies
- Skill and discipline to work with a distributed team
- Ability to multi-task in a fast-paced environment with multiple deadlines is essential
- Passion for new technologies, software, and processes
- Significant experience with cloud-native technologies
- Familiarity with AWS
- Current certification compliant with DoD 8570 IAT Level 3 (CompTIA Security+)