Cohere
Sep 2025 - Jan 2026
ML Data Annotation (Contract)
Evaluation and QA work supporting production LLM training and benchmarking.
This contract focused on dataset quality and evaluation within production language model workflows. The work demanded consistency, careful judgment, and a clear understanding of how annotation quality affects downstream model performance.
Notes
Tools
NLP datasets, evaluation workflows, benchmarking processes, and QA review frameworks.
Built
The work centered on annotating and evaluating production-facing language model datasets. My job was to apply judgment consistently enough that the resulting data was genuinely useful for training, benchmarking, and review.
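As an illustration of the kind of consistency check this sort of pipeline leans on, here is a minimal sketch of Cohen's kappa, a standard inter-annotator agreement score. The function and example labels are hypothetical and assume a simple pass/fail rating scheme; this is not Cohere's actual tooling, just the shape of the metric.

```python
from collections import Counter

def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators over the same items, corrected for chance."""
    assert len(labels_a) == len(labels_b), "annotators must rate the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators gave the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a.keys() & freq_b.keys())
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical example: two annotators rating model outputs as "pass" / "fail".
ratings_a = ["pass", "pass", "fail", "pass", "fail"]
ratings_b = ["pass", "fail", "fail", "pass", "fail"]
print(f"kappa = {cohen_kappa(ratings_a, ratings_b):.2f}")  # kappa = 0.62
```

A kappa near 1 means annotators agree beyond what chance predicts; values drifting toward 0 are an early warning that guidelines are ambiguous before the noise reaches training data.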
Learned
The contract gave me a sharper sense of how much model quality depends on the discipline of the evaluation pipeline. Good data work is not flashy, but it directly determines how credible the final system is.