Semantic Hyperlapse: a Sparse Coding-based and Multi-Importance Approach for First-Person Videos

Michel M. Silva; Mario F. M. Campos; Erickson R. Nascimento

doi:10.5753/ctd.2020.11364

Affiliation (all authors): UFMG


Abstract

The availability of low-cost, high-quality wearable cameras, combined with the unlimited storage capacity of video-sharing websites, has sparked growing interest in First-Person Videos. Such videos usually consist of long, unedited streams captured by a device attached to the user's body, which makes them tedious and visually unpleasant to watch. This raises the need for quick access to the information they contain. We propose a Sparse Coding-based methodology to adaptively fast-forward First-Person Videos. Experimental evaluations show that the shorter videos produced by the proposed method are more stable and retain more semantic information than those produced by state-of-the-art methods. Visual results and a graphical explanation of the methodology are available at: https://youtu.be/rTEZurH64ME
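The abstract does not spell out how sparse coding drives frame selection. As a rough illustration only, the sketch below shows one generic way sparse coding can pick frames: each frame's feature vector is treated as a dictionary atom, a target descriptor is reconstructed under a weighted L1 penalty, and frames with nonzero coefficients are kept. The target (the mean descriptor of the segment), the solver (plain ISTA), and the names `ista_lasso`, `select_frames`, `lam`, and `weights` are all assumptions made for this example, not the authors' formulation.

```python
import math
from typing import List, Optional

Vector = List[float]


def ista_lasso(atoms: List[Vector], y: Vector, lam: float,
               weights: Optional[Vector] = None, iters: int = 500) -> Vector:
    """Approximately solve min_a 0.5*||D a - y||^2 + lam * sum_j w_j*|a_j| with ISTA.

    `atoms` holds the columns of D, one feature vector per frame.
    """
    n, m = len(atoms), len(y)
    w = weights if weights is not None else [1.0] * n
    # Step size 1/L, where L = ||D||_F^2 upper-bounds the largest eigenvalue of D^T D.
    lip = sum(v * v for col in atoms for v in col) or 1.0
    step = 1.0 / lip
    a = [0.0] * n
    for _ in range(iters):
        # Residual r = D a - y, computed once per iteration from the current a.
        r = [sum(atoms[j][k] * a[j] for j in range(n)) - y[k] for k in range(m)]
        new_a = []
        for j in range(n):
            grad = sum(atoms[j][k] * r[k] for k in range(m))  # (D^T r)_j
            z = a[j] - step * grad
            thr = step * lam * w[j]
            # Soft-thresholding: proximal operator of the weighted L1 penalty.
            new_a.append(math.copysign(max(abs(z) - thr, 0.0), z))
        a = new_a
    return a


def select_frames(features: List[Vector], lam: float = 0.1,
                  weights: Optional[Vector] = None, tol: float = 1e-6) -> List[int]:
    """Return indices of frames whose sparse-code coefficients are nonzero.

    Hypothetical target: the mean descriptor of all frames in the segment.
    """
    m = len(features[0])
    target = [sum(f[k] for f in features) / len(features) for k in range(m)]
    coeffs = ista_lasso(features, target, lam, weights)
    return [i for i, c in enumerate(coeffs) if abs(c) > tol]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    print(select_frames(frames, lam=0.01))
    print(select_frames(frames, lam=0.01, weights=[1.0, 100.0, 1.0]))
```

For instance, `select_frames([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], lam=0.01)` keeps frames 0 and 1 and drops the all-zero frame. Raising the weight of frame 1 to 100 drops it as well. A lower weight makes a frame cheaper to keep, which is one hypothetical way to favor semantically important frames.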

Keywords: First-Person Videos, video fast-forward, semantic information, sparse coding
