Cohere
Sep 2025 - Jan 2026
ML Data Annotation (Contract)
Evaluation and QA work supporting production LLM training and benchmarking.
This contract focused on dataset quality and evaluation within production language model workflows. The work demanded consistency, careful judgment, and a clear understanding of how annotation quality affects downstream model performance.
Notes
Tools
NLP datasets, evaluation workflows, benchmarking processes, and QA review frameworks.
Built
The work centered on annotating and evaluating production-facing language model datasets. My job was to apply judgment consistently enough that the resulting data was genuinely useful for training, benchmarking, and review.
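As an illustration of the kind of consistency check this sort of pipeline leans on, here is a minimal sketch of Cohen's kappa, a standard inter-annotator agreement score. The function and example labels are hypothetical and assume a simple pass/fail rating scheme; this is not Cohere's actual tooling, just the shape of the metric.

```python
from collections import Counter

def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators over the same items, corrected for chance."""
    assert len(labels_a) == len(labels_b), "annotators must rate the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators gave the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a.keys() & freq_b.keys())
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical example: two annotators rating model outputs as "pass" / "fail".
ratings_a = ["pass", "pass", "fail", "pass", "fail"]
ratings_b = ["pass", "fail", "fail", "pass", "fail"]
print(f"kappa = {cohen_kappa(ratings_a, ratings_b):.2f}")  # kappa = 0.62
```

A kappa near 1 means annotators agree beyond what chance predicts; values drifting toward 0 are an early warning that guidelines are ambiguous before the noise reaches training data.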
Learned
The contract gave me a sharper sense of how much model quality depends on the discipline of the evaluation pipeline. Good data work is not flashy, but it directly determines how credible the final system is.