Member of Technical Staff - Cloud Infrastructure / DevOps (Boston)
Location: Boston (Hybrid)
Compensation: $250K-$350K Base + Equity + Benefits
Stack: AWS, GCP, Kubernetes, Terraform, CI/CD, cloud infrastructure, infrastructure-as-code, observability, distributed systems, and production AI platforms.
TLDR
- Our client is a well-funded frontier AI startup operating at the intersection of artificial intelligence and fundamental science, building advanced systems designed to accelerate scientific discovery.
- Engineers have the opportunity to build foundational infrastructure that supports cutting-edge AI research, large-scale experimentation, production services, and next-generation scientific computing workloads.
- This is a highly impactful role with ownership of the company’s infrastructure platform end-to-end, spanning cloud architecture, release engineering, observability, reliability, and developer productivity.
- The infrastructure team serves as a force multiplier for the entire engineering organization, enabling researchers and engineers to move faster while maintaining high operational standards.
- For infrastructure engineers who enjoy building systems from the ground up, influencing technical direction, and operating in high-talent, high-autonomy environments, this role offers significant ownership and long-term upside.
Requirements
- 4+ years of experience building and operating cloud infrastructure in production environments with strong engineering standards.
- Deep expertise with infrastructure-as-code, Kubernetes, CI/CD systems, cloud platforms, and modern DevOps practices.
- Strong hands-on experience with AWS, GCP, or multi-cloud environments supporting production workloads at scale.
- Proven operational excellence, including incident response, observability, automation, reliability engineering, and reducing operational toil through tooling.
- Strong communication and collaboration skills with a track record of enabling engineering teams through self-service infrastructure and developer productivity improvements.
Bonus Skills
- Experience supporting machine learning infrastructure, model training workloads, GPU scheduling, or AI platforms.
- Experience with model serving, inference infrastructure, or production ML systems.
- Experience with observability platforms, monitoring systems, and large-scale operational tooling.
- Background in scientific computing, HPC, research computing, or other compute-intensive environments.
- Strong systems design, platform engineering, or technical leadership experience.
Responsibilities
- Design, build, and operate scalable cloud infrastructure that supports research, production services, and growing engineering workloads.
- Own CI/CD, release engineering, and deployment systems that enable fast, reliable, and secure software delivery.
- Establish and evolve infrastructure-as-code practices, automation frameworks, and operational standards across the organization.
- Build observability, monitoring, alerting, and reliability systems that improve platform stability and developer productivity.
- Partner closely with engineering and research teams to remove bottlenecks and enable rapid experimentation and execution.
- Help shape the long-term infrastructure strategy supporting next-generation AI and scientific computing platforms.
About
- Our client is building advanced AI systems designed to accelerate scientific discovery by combining frontier machine learning techniques with deep scientific expertise.
- Infrastructure engineers play a critical role in the company’s success, owning the platform that powers research workflows, production services, deployment pipelines, and future AI infrastructure initiatives.
- This is a highly autonomous role with significant influence over architecture, tooling, operational practices, and developer experience within a rapidly growing technical organization.
- The role offers exposure to cloud infrastructure, distributed systems, AI platforms, large-scale compute environments, and the operational challenges associated with frontier AI research and production systems.