Senior AI Infrastructure Engineer

New York City, New York, United States

About the Company

We're one of the leading AI creation platforms, with roughly 1.4M creators visiting each month to generate and share images, video, and audio using the best AI models on the market. A single subscription gives users unlimited access to top models from across the ecosystem (OpenAI, leading open-source labs, and others), so they never have to think about where those models come from. We're best known for making it effortless to create stories and characters.

Our long-term vision is to become the home for all AI entertainment — think of us as the YouTube of AI creation, sharing, and generation. We've grown ~30% month-over-month for the last seven months and are confident we've hit product-market fit.

We're profitable and capital-efficient: we raised a seed round at a $20M valuation and haven't needed to raise since. We're a small, lean, in-person team based at Hudson Yards in NYC, where engineers get significant autonomy — many of our most successful features started as an engineer's own idea, pitched and shipped by them.

About the Role

This is the final infrastructure role our CTO still owns directly, and it's the company's top hiring priority. You'll own the heart of our backend — the generation services for image, video, and audio — along with the integrations that deliver new models to users. AI moves fast: new models and provider APIs launch daily, and we need someone who can get the newest releases into production within 24–48 hours, because in AI a single day counts. High ownership, rapid shipping, real users.

Key Responsibilities

Own and run the core backend generation services (image, video, audio).
Continuously integrate new models, both open-source (Stable Diffusion, SDXL, and the wider diffusers ecosystem) and closed/third-party provider APIs (e.g. Seedance 2, Krea 2), shipping them to users within 24–48h of release.
Design clean, consistent APIs that sit on top of a constantly shifting set of models.
Deploy and operate GPU inference, managing cold starts, concurrency, autoscaling, latency, and cost.
Build production orchestration spanning inference, post-processing, and media pipelines, including retries, moderation, and error handling.
Run everything on Google Cloud.

Requirements

Excellent Python and strong API design (e.g. FastAPI) for production backends.
Deep command of backend fundamentals — a real understanding of how the internet and scalable backends actually work.
A live, shipped project that demonstrates your ability ("show, don't tell") — you've owned a feature from idea through to production serving real users (anywhere from hundreds to millions). Solo or team-based is fine, as long as you can speak to it in depth.
Cloud and containerized deployment experience — Google Cloud (Cloud Run, GCS) and Docker.

Bonus Skills

GPU deployment and inference at production scale (cold starts, concurrency, autoscaling, latency, cost), on Modal or an equivalent platform.
Open-source model hacking — hands-on experience with, or contributions to, vLLM, diffusers, or Hugging Face projects (a major plus given our open-source roots).
Familiarity with the generative-AI ecosystem: image/video/audio models and inference providers.
PyTorch, Redis.

What We're Looking For (Green Flags)

Entrepreneurial and high-agency; you take end-to-end ownership of products, from idea to ship.
"Show, don't tell" — a live, shipped project matters more to us than anything else on your resume.
Open-source contributions (vLLM, diffusers, Hugging Face).
Consumer product and/or startup experience.
A genuine love for AI and belief in where the technology is headed; you want to ship daily and have real impact.
A fast learner who moves laterally across new tools and models with ease.

Logistics

Location: NYC, in-person at our Hudson Yards office. The ideal is 5 days/week in the office, and in-person time every week is expected; flexibility is possible for the right person (e.g. family obligations) and handled case-by-case. Non-complex relocation is fine (e.g. Boston or New Jersey → NYC).

Compensation: ~$200K base, adjusted up or down with seniority and the individual, plus meaningful equity. Equity is weighted heavily: the seed round valued the company at $20M, we've been profitable since, and the founders want everyone to have skin in the game.

Openings: 1

Benefits/Other: No visa sponsorship — strong preference for US citizens or candidates already authorized to work in the US.

Interview Process

Intro phone call with the founder/CTO — the platform, our vision, your background, what excites you, and your skill set; a mutual-fit check.
Technical interview — remote, screen-share: a technical overview plus some live coding together.
Call with the co-founder — business fundamentals and company vision, with deeper questions.
In-person final at the Hudson Yards office — meet the team and check culture fit (30–60 min). Depending on how the earlier technical round went, this may include a second technical portion (up to ~120 min).
