Adaptive Face Tracking Based on Online Learning
Abstract
Object tracking can be used to localize objects in scenes, and also to detect changes in an object's appearance or shape over time. Most available object tracking methods perform satisfactorily in controlled environments but tend to fail when the object's appearance or shape changes, or when the illumination changes (e.g., when tracking non-rigid objects such as a human face). Moreover, in many available tracking methods, the tracking error tends to grow indefinitely once the target is missed. Consequently, tracking target objects in long, uninterrupted video sequences remains quite challenging for these methods. This work proposes a face tracking algorithm with two operating modes. Both operating modes are based on feature learning techniques that exploit the useful data accumulated during face tracking and implement an incremental learning framework. To accumulate the training data, the quality of each test sample is checked before it is used in the incremental and online training scheme. Furthermore, a novel error prediction scheme is proposed that estimates the tracking error during the execution of the tracking algorithm. In addition, an improvement to the Constrained Local Model (CLM), called the weighted CLM (W-CLM), is proposed; it utilizes the training data to assign weights to the landmarks based on their consistency, and these weights are then used to improve the CLM search optimization process. The experimental results show that both variants of the proposed tracking method outperform comparable state-of-the-art methods in terms of Root Mean Squared Error (RMSE) and Center Location Error (CLE). Finally, to demonstrate the practical value of the proposed techniques, an application to yawning detection is presented.
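The two evaluation metrics named in the abstract, RMSE and CLE, are standard in the face tracking literature. As an illustration only (the exact evaluation protocol used in this work is not given here), a minimal sketch of both metrics over 2D landmark sets could look like this; the array shapes and the use of the landmark centroid as the "center" are assumptions:

```python
import numpy as np

def rmse(pred, gt):
    """Root Mean Squared Error over all landmark coordinates.

    pred, gt: arrays of shape (n_landmarks, 2) with (x, y) positions.
    """
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def cle(pred_center, gt_center):
    """Center Location Error: Euclidean distance between the tracked
    and ground-truth object centers (here, landmark centroids)."""
    return float(np.linalg.norm(pred_center - gt_center))

# Toy example: three tracked landmarks vs. ground truth.
pred = np.array([[10.0, 12.0], [20.0, 22.0], [30.0, 31.0]])
gt   = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]])

print(rmse(pred, gt))
print(cle(pred.mean(axis=0), gt.mean(axis=0)))
```

In per-frame evaluation, these values are typically averaged over all frames of a sequence, and a frame is often counted as a tracking failure when the CLE exceeds a fixed pixel threshold.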