YOLO custom object detection

Fine-tuning pretrained YOLO checkpoints on small custom datasets for objects not in COCO, with most of the effort going into dataset quality and overfitting rather than model architecture.

status: exploratory
started: 2026-02
updated: 2026-04
tags: AI/ML

What

Custom object detection using the YOLO family, trained on hand-curated datasets for objects that aren’t covered by COCO. The point of the project is less “use YOLO” and more “learn what actually determines mAP in practice on a small, specialized set.”

Why

Real-time perception is the part of computer vision that maps most cleanly onto backend intuitions — throughput, latency, failure modes under load. Training on a custom dataset surfaces the parts of ML that a pretrained-model tutorial never makes you confront: class imbalance, label noise, augmentation choices that help your val set but hurt generalization.

How

Transfer learning from a pretrained YOLO checkpoint rather than training from scratch. For a small dataset this isn’t optional: there aren’t enough images to learn general visual features from nothing.
Annotation: hand-labeled. A small number of wrong labels costs more than a large number of missing borderline ones, so the pass I spend the most time on is the label audit, not adding more data.
Augmentation: conservative. Heavy augmentation on a small set is a fast way to inflate val mAP while the model silently learns the augmentation, not the object.
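
A minimal sketch of what the setup above could look like, assuming the Ultralytics package (these notes don’t name a specific YOLO implementation). The checkpoint name, the `data_yaml` path, and every numeric value are illustrative placeholders, not the project’s actual settings.

```python
# Sketch only: assumes the Ultralytics package. All values are placeholders.

# Conservative augmentation: mild colour jitter and flips, with the heavy
# compositing augmentations (mosaic, mixup) switched off.
CONSERVATIVE_AUG = {
    "hsv_h": 0.01,      # small hue jitter
    "hsv_s": 0.3,       # moderate saturation jitter
    "hsv_v": 0.2,       # moderate brightness jitter
    "degrees": 0.0,     # no rotation
    "translate": 0.05,  # small shifts
    "scale": 0.2,       # small zoom range
    "fliplr": 0.5,      # horizontal flip half the time
    "mosaic": 0.0,      # disabled
    "mixup": 0.0,       # disabled
}


def fine_tune(data_yaml, checkpoint="yolov8n.pt", epochs=50):
    """Fine-tune a pretrained checkpoint on a custom dataset."""
    # Imported lazily so the config above can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(checkpoint)  # starts from pretrained weights, not from scratch
    return model.train(
        data=data_yaml,
        epochs=epochs,
        imgsz=640,
        patience=10,  # stop early if validation metrics stop improving
        **CONSERVATIVE_AUG,
    )
```

Horizontal flips only make sense if the objects have no left/right meaning; drop `fliplr` to 0.0 otherwise.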

What’s hard

Two things, both unglamorous:

Overfitting on small specialized sets. Val mAP climbs, test mAP on new scenes doesn’t follow. The fix is almost always “more diverse data,” not “more epochs” or “different architecture.” I keep relearning this.
Label quality. A label audit usually recovers more mAP points than any hyperparameter sweep. I’ve started treating dataset work as the primary work and training as the cheap part.
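
Part of a label audit can be automated. The sketch below assumes labels are stored in the common YOLO text format (one `class_id x_center y_center width height` line per box, box values normalised to the 0..1 range); the function name and the problem messages are made up for illustration.

```python
from collections import Counter

EPS = 1e-6  # tolerance for floating-point rounding at the image border


def audit_label_lines(lines, num_classes):
    """Return (line_number, problem) pairs for one YOLO-format label file."""
    problems = []
    seen = Counter()
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # blank lines are harmless
        if len(parts) != 5:
            problems.append((lineno, "expected 5 fields"))
            continue
        try:
            cls = int(parts[0])
            x, y, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append((lineno, "unparseable field"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((lineno, "class id out of range"))
        if w <= 0 or h <= 0:
            problems.append((lineno, "zero or negative box size"))
        elif (x - w / 2 < -EPS or x + w / 2 > 1 + EPS
              or y - h / 2 < -EPS or y + h / 2 > 1 + EPS):
            problems.append((lineno, "box extends outside image"))
        # Flag the second occurrence of an identical line as a duplicate.
        seen[tuple(parts)] += 1
        if seen[tuple(parts)] == 2:
            problems.append((lineno, "duplicate box"))
    return problems


sample = [
    "0 0.5 0.5 0.2 0.2",   # fine
    "3 0.5 0.5 0.2 0.2",   # class 3 does not exist when num_classes=2
    "0 0.95 0.5 0.2 0.2",  # right edge at 1.05
    "0 0.5 0.5 0.2 0.2",   # same box as line 1
]
for lineno, problem in audit_label_lines(sample, num_classes=2):
    print(lineno, problem)
```

This only catches mechanical errors. Wrong-class labels and loose boxes still need a visual pass over the images.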

Evaluation

Current focus is mAP@50 and mAP@50:95 on a held-out test set drawn from a different session than the training data. This matters: same-session splits leak, because images from one session share backgrounds, lighting, and often near-duplicate frames. Numbers are still moving, and I don’t want to post a point estimate that will be wrong by next week. I’ll publish a full writeup when the dataset is stable enough that the numbers are meaningful rather than just indicative.
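
One way to enforce the session boundary is to split by session rather than by image. The sketch below is a hypothetical helper, assuming each image id can be mapped to a session id; how sessions are actually recorded in this project isn’t described here.

```python
import random
from collections import defaultdict


def split_by_session(image_sessions, test_fraction=0.2, seed=0):
    """Split image ids into (train, test) so no session appears on both sides.

    image_sessions maps image id -> session id (an assumed structure).
    """
    by_session = defaultdict(list)
    for image_id, session in image_sessions.items():
        by_session[session].append(image_id)

    if len(by_session) < 2:
        raise ValueError("need at least two sessions for a held-out split")

    # Shuffle whole sessions, not images, with a fixed seed for repeatability.
    sessions = sorted(by_session)
    random.Random(seed).shuffle(sessions)

    # Hold out at least one session, but always leave one for training.
    n_test = max(1, round(len(sessions) * test_fraction))
    n_test = min(n_test, len(sessions) - 1)

    test = sorted(i for s in sessions[:n_test] for i in by_session[s])
    train = sorted(i for s in sessions[n_test:] for i in by_session[s])
    return train, test
```

The split fraction applies to sessions, so the image-level ratio will drift if sessions differ a lot in size.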

Limitations

Dataset size is the binding constraint — every other improvement is downstream of it.
Haven’t yet done a proper ablation (frozen backbone vs. full fine-tune, different input sizes, different augmentation stacks). That’s the next systematic pass.
Inference-side work (export to ONNX / TensorRT, batch throughput, on-device) is downstream of getting the model good enough to be worth deploying.
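
The ablation pass mentioned above could be enumerated as a plain grid before any training run is launched. This is a sketch with hypothetical axes; the specific input sizes and augmentation stack names are placeholders, not choices made in this project.

```python
from itertools import product

# Hypothetical axes; none of these values come from the project.
ABLATION_AXES = {
    "freeze_backbone": [True, False],
    "imgsz": [416, 640],
    "augmentation": ["none", "conservative", "heavy"],
}


def ablation_runs(axes):
    """Yield one config dict per combination of axis values."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


runs = list(ablation_runs(ABLATION_AXES))
print(len(runs))  # 2 * 2 * 3 = 12 configurations
```

A full grid grows multiplicatively, so with a small compute budget it may be more practical to vary one axis at a time around a fixed baseline.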