7 days
$38.28/hr - $64.46/hr (Estimated)
<p>Role Overview</p> <p>As an Applied Research intern at Labelbox, you will design, build, and productionize evaluation and post-training systems for frontier LLMs and multimodal models. You'll own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool-use, long-context, vision-language, etc.), create and curate post-training datasets (human + synthetic), and prototype RLHF/RLAIF/RLVR/RM/DPO-style training loops to measure and improve real-world task and agent performance.</p> <p>Your Impact</p> <ul> <li>Build and own evaluation and benchmark suites for reasoning, code, agents, long-context, and V/LLMs. </li><li>Create post-training datasets at scale: design preference/critique pipelines (human + synthetic) and target hard failures surfaced by evals. </li><li>Experiment with and prototype RLHF/RLAIF/RLVR/RM/DPO-style training loops to improve real-world task and agent performance. </li><li>Land research in product: ship improvements into Labelbox workflows, services, and customer-facing evaluation/quality features; quantify impact with customer and internal metrics. </li><li>Engage with customer research teams: run pilots, co-design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers. </li></ul> <p>What You Bring</p> <ul> <li>A strong foundation in AI and machine learning, backed by a Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field (in-progress degrees are acceptable for intern positions). </li><li>A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them. </li><li>Passion for and experience with LLM evaluation and benchmarking. </li><li>Expertise in constructing, measuring, and refining high-quality training data. </li><li>The ability to bridge research and application by interpreting new findings and translating them into functional prototypes. 
</li><li>A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community. </li><li>Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow. </li><li>Exceptional communication and collaboration skills. </li></ul> <p>Applied Research at Labelbox</p> <p>At Labelbox Applied Research, we're committed to pushing the boundaries of AI and data-centric machine learning, with a particular focus on advancing human-AI interaction techniques. We believe that high-quality human data and sophisticated human feedback integration methods are key to unlocking the next generation of AI capabilities. Our research team works at the intersection of machine learning, human-computer interaction, and AI ethics to develop innovative solutions that can be practically applied in real-world scenarios.</p>