Member of Technical Staff - Post-Training
Microsoft is dedicated to advancing post-training methods for AI models and is seeking a highly skilled AI Data & Training Technical Staff member to join its team. This role involves creating world-class datasets, training models, developing scalable data pipelines, and driving impactful experiments in the field of AI.
Responsibilities
- Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
- Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
- Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
- Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
- Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
- Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
- Embody our Culture and Values
Skills
- Bachelor's Degree (complete or in progress) in a relevant field AND 3+ months of related research internship experience OR Master's Degree in a relevant field OR equivalent experience
- Software engineering skills with fluency in Python and modern data libraries
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
- Master's Degree in a relevant field AND 1+ year(s) of related research experience OR equivalent experience
- Coding expertise in Python and data libraries (Pandas, NumPy, etc.)
- Proficiency with distributed data frameworks (Spark, Ray, Apache Beam) and cloud ecosystems (Azure, data lakes)
- Hands-on experience with large-scale, unstructured or semi-structured datasets: images, video, audio, and code
- Proven experience training AI models at significant scale
- Demonstrated ability to collaborate within interdisciplinary teams and communicate complex, multimodal research concepts effectively
Benefits
- Certain roles may be eligible for benefits and other compensation.