Founding ML Researcher - Document AI / Vision-Language Models (San Francisco)
San Francisco, California, United States


Location: San Francisco, CA (In-Person)

Compensation: $200K-$300K Base + Equity + Benefits

Stack: Python, machine learning, vision-language models (VLMs), document AI, multimodal systems, model training, evaluation frameworks, inference optimization, and production ML infrastructure.


TLDR

  • Our client is building AI systems that transform unstructured documents into structured, actionable information, tackling one of the largest and most valuable data problems in enterprise software.
  • This is a true founding-team opportunity where one of the earliest ML hires will directly shape the company’s research direction, engineering culture, and long-term technical strategy.
  • Engineers own the entire machine learning lifecycle, from research and experimentation through deployment, evaluation, and production iteration, rather than focusing solely on model development.
  • The company is operating in one of the fastest-moving areas of AI, applying multimodal models and document understanding systems to real-world business problems with immediate customer impact.
  • Candidates will have significant ownership, direct access to the founders, and the opportunity to help define the technical foundation of the company from its earliest stages.


Requirements

  • Experience building and deploying production machine learning systems, with ownership across research, experimentation, evaluation, and deployment.
  • Strong background in machine learning, deep learning, computer vision, multimodal AI, document understanding, or closely related fields.
  • Experience training, fine-tuning, evaluating, or deploying state-of-the-art models that operate on unstructured data.
  • Strong Python skills and experience building scalable ML pipelines, experimentation frameworks, and production systems.
  • Ability to operate independently in highly autonomous startup environments while working closely with founders to shape technical direction.


Bonus Skills

  • Experience with vision-language models, document AI, OCR, information extraction, or multimodal reasoning systems.
  • PhD or equivalent research experience in machine learning, computer vision, multimodal AI, or a related discipline.
  • Publications, open-source contributions, or demonstrated technical leadership within advanced ML domains.
  • Experience with model serving, inference optimization, distributed training, or large-scale ML infrastructure.
  • Experience building AI systems that bridge research innovation and production deployment.


Responsibilities

  • Own the end-to-end machine learning lifecycle, from research and experimentation through deployment, evaluation, and continuous improvement.
  • Develop and improve models that extract, understand, and reason over complex unstructured information.
  • Design evaluation frameworks and experimentation systems that drive measurable improvements in model quality and performance.
  • Partner closely with engineering leadership to translate research breakthroughs into scalable production systems.
  • Influence the company’s long-term ML strategy, technical roadmap, and research direction.
  • Help establish engineering and research best practices that will scale with the organization as it grows.


About

  • Our client is building advanced AI systems focused on understanding and extracting value from complex unstructured information, helping organizations unlock insights that were previously difficult to access or automate.
  • Founding ML Researchers sit at the center of the company’s technical strategy, combining cutting-edge machine learning research with real-world product development and customer impact.
  • This is a highly autonomous role with significant ownership across research, engineering, and product direction, offering the opportunity to shape both the technology and the future team.
  • The role provides deep exposure to multimodal AI, vision-language models, document understanding, production ML systems, and the challenges of bringing frontier AI capabilities into real-world applications.