FLODNet - Object detection and recognition on low-specification devices: a case study in food classification
Abstract
The intrinsic human ability to quickly detect, differentiate, and classify objects allows us to make fast decisions about what we see. Applications can benefit from fast, lightweight object detection in images or videos. Although the technology sector has, over the last 5 years, introduced devices with impressive processing and storage capabilities, object detection and recognition methods generally require high processing power and/or large amounts of storage, which makes it difficult for resource-constrained devices to perform detection and recognition in real time without a connection to a server. The model presented in this paper requires only 95 megabytes of storage and takes 113 ms on average per image on a laptop CPU, making it suitable for devices that can be used anywhere.