Our eyesight is probably the most important of our senses, and we rely on it for almost all of our everyday activities. If that sense is damaged or fails entirely, those tasks become significantly more difficult. Grocery shopping, for example, is almost impossible without help. Special devices intended to assist with this task usually rely on barcodes to identify products by their packaging. Unfortunately, products have to be taken off the shelves to find the barcode. With recent advances in machine learning, it has become feasible to identify a product from a simple image of any of its sides. To aid visually impaired and elderly people, this work proposes and evaluates image-processing and machine-learning-based solutions that can be hosted on small handheld devices such as smartphones.
After detailing various preprocessing steps, including image filtering, thresholding, and edge detection, this work investigates a number of methods suited to object recognition. While some approaches, such as Haar cascades and pure optical character recognition (OCR) with libraries such as pytesseract, did not yield satisfactory results, a solution combining the Google Vision API's text recognition features with the state-of-the-art Inception-v3 convolutional neural network delivers significantly better test scores in realistic testing scenarios. The Levenshtein distance is used to associate the recognized text with a specific product. Unlike commercial solutions, the method presented in this work can run on a low-budget smartphone. This work shows that optical character recognition successfully improves the neural network's product recognition and demonstrates that multi-faceted approaches like the one described here hold considerable promise for future research and practical application.
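The abstract states only that the Levenshtein distance links recognized text to a product. The following is a minimal sketch of how such a lookup could work, assuming a hypothetical catalog that maps product identifiers to reference text printed on the packaging. The whitespace/case normalization, the length-normalized distance, and the names `levenshtein`, `match_product`, and `catalog` are illustrative assumptions, not the method used in this work.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed preprocessing: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_product(ocr_text: str, catalog: dict[str, str]) -> str:
    """Return the hypothetical catalog key whose text is closest to the OCR output.

    Distances are divided by the longer string's length so that long and
    short package texts are compared on a similar scale (an assumption).
    """
    query = normalize(ocr_text)

    def score(product_id: str) -> float:
        reference = normalize(catalog[product_id])
        return levenshtein(query, reference) / max(len(query), len(reference), 1)

    return min(catalog, key=score)
```

For example, with a catalog such as `{"oat_milk": "organic oat drink", "corn_flakes": "crunchy corn flakes"}`, an OCR result containing a recognition error like `"Crunchy c0rn  flakes"` still maps to `"corn_flakes"`, because a single substituted character costs only one edit. Tolerating such small OCR errors is the main reason to prefer an edit distance over exact string matching here.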