Member of Technical Staff - Cloud Infrastructure / DevOps (Boston)

Boston, Massachusetts, United States

Location: Boston (Hybrid)

Compensation: $250K-$350K Base + Equity + Benefits

Stack: AWS, GCP, Kubernetes, Terraform, CI/CD, cloud infrastructure, infrastructure-as-code, observability, distributed systems, and production AI platforms.

TLDR

Our client is a well-funded frontier AI startup operating at the intersection of artificial intelligence and fundamental science, building advanced systems designed to accelerate scientific discovery.
Engineers have the opportunity to build foundational infrastructure that supports cutting-edge AI research, large-scale experimentation, production services, and next-generation scientific computing workloads.
This is a highly impactful role with end-to-end ownership of the company’s infrastructure platform, spanning cloud architecture, release engineering, observability, reliability, and developer productivity.
The infrastructure team serves as a force multiplier for the entire engineering organization, enabling researchers and engineers to move faster while maintaining high operational standards.
For infrastructure engineers who enjoy building systems from the ground up, influencing technical direction, and operating in high-talent, high-autonomy environments, this role offers significant ownership and long-term upside.

Requirements

4+ years of experience building and operating cloud infrastructure in production environments with strong engineering standards.
Deep expertise with infrastructure-as-code, Kubernetes, CI/CD systems, cloud platforms, and modern DevOps practices.
Strong hands-on experience with AWS, GCP, or multi-cloud environments supporting production workloads at scale.
Proven operational excellence, including incident response, observability, automation, reliability engineering, and reducing operational toil through tooling.
Strong communication and collaboration skills with a track record of enabling engineering teams through self-service infrastructure and developer productivity improvements.

Bonus Skills

Experience supporting machine learning infrastructure, model training workloads, GPU scheduling, or AI platforms.
Experience with model serving, inference infrastructure, or production ML systems.
Experience with observability platforms, monitoring systems, and large-scale operational tooling.
Background in scientific computing, HPC, research computing, or other compute-intensive environments.
Strong systems design, platform engineering, or technical leadership experience.

Responsibilities

Design, build, and operate scalable cloud infrastructure that supports research, production services, and growing engineering workloads.
Own CI/CD, release engineering, and deployment systems that enable fast, reliable, and secure software delivery.
Establish and evolve infrastructure-as-code practices, automation frameworks, and operational standards across the organization.
Build observability, monitoring, alerting, and reliability systems that improve platform stability and developer productivity.
Partner closely with engineering and research teams to remove bottlenecks and enable rapid experimentation and execution.
Help shape the long-term infrastructure strategy supporting next-generation AI and scientific computing platforms.

About

Our client is building advanced AI systems designed to accelerate scientific discovery by combining frontier machine learning techniques with deep scientific expertise.
Infrastructure engineers play a critical role in the company’s success, owning the platform that powers research workflows, production services, deployment pipelines, and future AI infrastructure initiatives.
This is a highly autonomous role with significant influence over architecture, tooling, operational practices, and developer experience within a rapidly growing technical organization.
The role offers exposure to cloud infrastructure, distributed systems, AI platforms, large-scale compute environments, and the operational challenges associated with frontier AI research and production systems.