Semantic Hyperlapse: a Sparse Coding-based and Multi-Importance Approach for First-Person Videos
Abstract
The availability of low-cost, high-quality wearable cameras, combined with the unlimited storage capacity of video-sharing websites, has evoked a growing interest in First-Person Videos. Such videos usually consist of long-running, unedited streams captured by a device attached to the user's body, which makes them tedious and visually unpleasant to watch. This raises the need to provide quick access to the information they contain. We propose a Sparse Coding-based methodology to fast-forward First-Person Videos adaptively. Experimental evaluations show that the shortened video produced by the proposed method is more stable and retains more semantic information than those produced by state-of-the-art methods. Visual results and a graphical explanation of the methodology are available at: https://youtu.be/rTEZurH64ME