Member of Technical Staff - Cloud Infrastructure / DevOps (Boston)
Boston, Massachusetts, United States

Location: Boston (Hybrid)

Compensation: $250K-$350K Base + Equity + Benefits

Stack: AWS, GCP, Kubernetes, Terraform, and CI/CD tooling; the work spans infrastructure-as-code, observability, distributed systems, and production AI platforms.


TL;DR

  • Our client is a well-funded frontier AI startup operating at the intersection of artificial intelligence and fundamental science, building advanced systems designed to accelerate scientific discovery.
  • Engineers have the opportunity to build foundational infrastructure that supports cutting-edge AI research, large-scale experimentation, production services, and next-generation scientific computing workloads.
  • This is a high-impact role with end-to-end ownership of the company’s infrastructure platform, spanning cloud architecture, release engineering, observability, reliability, and developer productivity.
  • The infrastructure team serves as a force multiplier for the entire engineering organization, enabling researchers and engineers to move faster while maintaining high operational standards.
  • For infrastructure engineers who enjoy building systems from the ground up, influencing technical direction, and operating in high-talent, high-autonomy environments, this role offers significant ownership and long-term upside.


Requirements

  • 4+ years of experience building and operating cloud infrastructure in production environments with strong engineering standards.
  • Deep expertise with infrastructure-as-code, Kubernetes, CI/CD systems, cloud platforms, and modern DevOps practices.
  • Strong hands-on experience with AWS, GCP, or multi-cloud environments supporting production workloads at scale.
  • Proven operational excellence, including incident response, observability, automation, reliability engineering, and reducing operational toil through tooling.
  • Strong communication and collaboration skills with a track record of enabling engineering teams through self-service infrastructure and developer productivity improvements.


Bonus Skills

  • Experience supporting machine learning infrastructure, model training workloads, GPU scheduling, or AI platforms.
  • Experience with model serving, inference infrastructure, or production ML systems.
  • Experience with observability platforms, monitoring systems, and large-scale operational tooling.
  • Background in scientific computing, HPC, research computing, or other compute-intensive environments.
  • Strong systems design, platform engineering, or technical leadership experience.


Responsibilities

  • Design, build, and operate scalable cloud infrastructure that supports research, production services, and growing engineering workloads.
  • Own CI/CD, release engineering, and deployment systems that enable fast, reliable, and secure software delivery.
  • Establish and evolve infrastructure-as-code practices, automation frameworks, and operational standards across the organization.
  • Build observability, monitoring, alerting, and reliability systems that improve platform stability and developer productivity.
  • Partner closely with engineering and research teams to remove bottlenecks and enable rapid experimentation and execution.
  • Help shape the long-term infrastructure strategy supporting next-generation AI and scientific computing platforms.


About

  • Our client is building advanced AI systems designed to accelerate scientific discovery by combining frontier machine learning techniques with deep scientific expertise.
  • Infrastructure engineers play a critical role in the company’s success, owning the platform that powers research workflows, production services, deployment pipelines, and future AI infrastructure initiatives.
  • This is a highly autonomous role with significant influence over architecture, tooling, operational practices, and developer experience within a rapidly growing technical organization.
  • The role offers exposure to cloud infrastructure, distributed systems, AI platforms, large-scale compute environments, and the operational challenges associated with frontier AI research and production systems.