Product-Package OCR Pipeline — Sujith K. Surendran

Off-the-shelf OCR models are typically trained on clean document imagery. Product packaging is the opposite: high visual noise, overlapping type on complex backgrounds, irregular font choices, and photography that varies by supplier. Rather than trying to clean inputs until they fit a general model, the pipeline was designed around the specific characteristics of packaging imagery. It used pixel-intensity variation as a segmentation signal, and it tuned classification to the attribute vocabulary the business cared about.
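The following minimal sketch shows one way pixel-intensity variation can serve as a segmentation signal. It assumes grayscale input and assumes that printed text shows higher local intensity variance than a flat background. It computes a per-pixel local standard deviation with integral images, then groups rows with many high-variance pixels into candidate text-line bands. The functions `local_std` and `text_row_bands` and the threshold values are hypothetical illustrations, not the pipeline's actual code.

```python
import numpy as np


def local_std(gray: np.ndarray, k: int = 7) -> np.ndarray:
    """Per-pixel standard deviation of intensity over a k x k window (k odd)."""
    pad = k // 2
    img = np.pad(gray.astype(np.float64), pad, mode="reflect")
    # Integral images of x and x**2, with a leading zero row and column so
    # that s[i, j] is the sum of img[:i, :j].
    s1 = np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((img ** 2).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = gray.shape

    def box(s: np.ndarray) -> np.ndarray:
        # Sum over every k x k window using four integral-image lookups.
        return s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]

    n = k * k
    mean = box(s1) / n
    var = box(s2) / n - mean ** 2
    # Clip tiny negative values caused by floating-point cancellation.
    return np.sqrt(np.clip(var, 0.0, None))


def text_row_bands(gray: np.ndarray, k: int = 7, std_thresh: float = 20.0,
                   min_fraction: float = 0.05) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges that look like text lines.

    A pixel is "active" when its local intensity std exceeds std_thresh;
    a row joins a band when more than min_fraction of its pixels are active.
    """
    active = (local_std(gray, k) > std_thresh).mean(axis=1) > min_fraction
    bands, start = [], None
    for row, on in enumerate(active):
        if on and start is None:
            start = row
        elif not on and start is not None:
            bands.append((start, row))
            start = None
    if start is not None:
        bands.append((start, len(active)))
    return bands


# Synthetic example: flat gray background with one high-contrast stripe.
img = np.full((100, 120), 128, dtype=np.uint8)
img[40:50, ::2] = 0
img[40:50, 1::2] = 255
print(text_row_bands(img))
```

The window extends k // 2 pixels past the stripe, so the detected band is slightly wider than the stripe itself. On real packaging, textured backgrounds and product photography also produce high local variance. A threshold such as `std_thresh` would therefore need domain-specific tuning. This sketch shows only the signal and is not a complete segmenter.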

Modifying FastText, rather than adopting a heavier model, was a deliberate engineering decision. The throughput requirements favored a lean, fast algorithm that could be tuned precisely. A deeper, more capable, but more opaque neural network would have been harder to maintain and harder to explain to downstream consumers.
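The source does not describe how FastText was modified. The sketch below illustrates only the general shape of a FastText-style classifier at inference time, which explains why this family of models is considered lean. Words and their character n-grams (wrapped in `<` and `>` boundary markers) are hashed into a fixed number of buckets. Their embeddings are averaged, and a single linear layer plus softmax produces label probabilities. The model is untrained and uses random weights. `TinyFastTextStyleClassifier`, `fnv1a_32`, `char_ngrams`, and all parameter values are hypothetical. The hash is standard FNV-1a; fastText's own dictionary hash is an FNV-1a variant, but this version is not guaranteed to be bit-compatible with it. As a simplification, words and subwords also share one hashed table here.

```python
import numpy as np


def fnv1a_32(s: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of s."""
    h = 2166136261
    for b in s.encode("utf-8"):
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, minn: int = 3, maxn: int = 5) -> list[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


class TinyFastTextStyleClassifier:
    """Untrained sketch: hashed features -> mean embedding -> linear -> softmax."""

    def __init__(self, num_labels: int, dim: int = 16,
                 buckets: int = 2 ** 15, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.buckets = buckets
        # One shared hashed table for words and subwords (a simplification).
        self.emb = rng.normal(0.0, 0.1, size=(buckets, dim))
        self.out = rng.normal(0.0, 0.1, size=(dim, num_labels))

    def feature_ids(self, text: str) -> list[int]:
        ids = []
        for word in text.lower().split():
            ids.append(fnv1a_32(word) % self.buckets)
            ids.extend(fnv1a_32(g) % self.buckets for g in char_ngrams(word))
        return ids

    def predict_proba(self, text: str) -> np.ndarray:
        ids = self.feature_ids(text)
        num_labels = self.out.shape[1]
        if not ids:
            return np.full(num_labels, 1.0 / num_labels)
        hidden = self.emb[ids].mean(axis=0)   # average of feature embeddings
        logits = hidden @ self.out            # single linear layer
        logits -= logits.max()                # numerically stable softmax
        probs = np.exp(logits)
        return probs / probs.sum()


clf = TinyFastTextStyleClassifier(num_labels=3)
print(clf.predict_proba("Gluten Free Oat Bar"))
```

Each prediction costs one hash and one embedding-row lookup per feature, plus a single small matrix-vector product. That cost profile is what makes this design attractive under tight throughput budgets. Training (for example, SGD on the softmax loss) and the real fastText library's options are outside the scope of this sketch.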

This is the clearest example of from-scratch ML engineering in the portfolio: it involved algorithm modification, a custom pipeline, and domain-specific tuning rather than API integration.