[Remote] Ara Gamma – Data Engineer (LLM Data & Prompt Engineering) - English language
Note: This is a remote position open to candidates in the USA. Welocalize provides high-quality datasets for AI models. The company is seeking Data Engineers to support the development and refinement of datasets used to train Large Language Models (LLMs), with a focus on prompt generation, data validation, and quality assurance.
Responsibilities
- Query & Prompt Generation: Design complex LLM prompts that accurately represent real customer journeys and service interactions
- Data Shaping & Collaboration: Partner with Field Engineers to transform raw data into structured, high-quality tasks for model training
- Annotation & Evaluation: Annotate and review tasks to ensure strict quality standards and alignment with expected customer outcomes
- Quality Assurance: Validate and assess model responses to ensure accuracy, relevance, and confidence in outputs
Skills
- Language: Native or professional fluency (C1/C2) in English
- LLM & Prompting Knowledge: Understanding of LLM behavior and prompt engineering principles
- Analytical Skills: Strong attention to detail, critical thinking, and comfort working with ambiguous scenarios
- Technical Skills: SQL for data extraction
- Technical Skills: Python (Pandas, NumPy) for data manipulation
- Technical Skills: Experience with annotation tools (e.g., Labelbox, Prodigy, or similar platforms)
- Technical Skills: Advanced proficiency in Google Sheets/Drive
- Technical Skills: Familiarity with version control tools (GitHub)
- AI/ML Tools: Experience working with playground environments and prompt debugging
- Communication: Excellent technical writing skills and ability to clearly explain data requirements
- Prior experience in data labeling, technical support analysis, or AI model evaluation
- Background or exposure to AI-related projects