Computer Scientist / Systems Engineer (R-00068309)
Are you looking for an opportunity to combine your technical skills with big picture thinking to make an impact in High Performance Computing and AI solutions within the Intelligence Community? Our clients are building high performance computing (HPC) and accelerated compute environments from the ground up, performing modeling and simulation on GPUs designed for a variety of workloads. You will help create and maintain a DevOps process for these efforts. This work ranges from basic data collection and preprocessing to supporting the frameworks used to build and train AI and Machine Learning models in a Research and Development environment. Your ability to translate mission needs into technical solutions makes you an integral part of delivering a customer-focused engineering solution.
As a Systems and DevOps Engineer on the RECON project, you have the chance to shape the geospatial intelligence mission by being a part of, or leading, a multi-functional accelerated-compute engineering team. Your customer will trust you not only to architect and engineer these environments, but also to evolve them with advanced technology solutions. On our team, you’ll be able to broaden your skillset into areas like DevOps, accelerated compute, GPU processing, and cluster management. Grow your skills by merging systems engineering, on-premises environments, Cloud and virtual architecture, and AI and ML frameworks to create a high-performance environment. Join our team and create the future of accelerated compute in the GEOINT mission.
Empower change with us.
The selected candidate will have:
- BS and 8+ years' directly relevant experience OR MS and 6+ years' directly relevant experience (additional experience may be considered in lieu of a degree)
- Active TS/SCI with Poly
- Strong experience working on Linux systems
- Experience with building and deploying containerized, GPU-enabled applications in Docker, Singularity, or Kubernetes
- Experience with orchestration and cluster management tools, such as Slurm, Mesos, or Moab
- Experience with AI and Machine Learning development toolsets, such as Jupyter, Keras, or TensorFlow
- Experience with MPI, OpenMP, OpenCL, or CUDA
- Experience with Lustre and InfiniBand maintenance and troubleshooting, including InfiniBand, fibre, and network plumbing, configuration, and maintenance
- Experience deploying systems in both on-premises and Cloud environments, such as AWS, Azure, or Google Cloud
- Experience with server hardware maintenance and troubleshooting
- Experience creating and maintaining system documentation
- Experience with RHEL and CentOS administration and with ACE administration of HPC clusters
- Experience with supporting environments for massively parallel computation
- Experience with certification and accreditation of containers
- Experience programming and implementing scientific and physics modeling and simulation (M&S) algorithms, and experience with Big Data and Data Science
- Experience optimizing applications to use AI and ML toolsets