Fast-Forward Methods for Egocentric Videos: A Review
Abstract
The emergence of low-cost, high-quality personal wearable cameras, combined with the virtually unlimited storage capacity of video-sharing websites, has evoked a growing interest in First-Person Videos. A first-person video is usually a long, unedited stream captured by a device attached to the user's body, which makes it tedious and visually unpleasant to watch. Consequently, there is a growing need to provide quick access to the information it contains. Video Summarization techniques create a summary of the video as a sequence of keyframes or as video skims containing the most representative parts of the narrative. The main drawback of applying Video Summarization techniques is the loss of temporal continuity, which breaks the context of the narrative depicted in the video. Hyperlapse methods have proposed different adaptive frame-sampling strategies to accelerate First-Person Videos while addressing both the visual stability of the final output video and its temporal continuity. Nevertheless, Hyperlapse techniques neglect the semantic load of the video, treating the whole video as equally relevant. Since the scope is First-Person Videos, some parts are undoubtedly more important than others and should receive proper attention. Semantic Hyperlapse techniques address this lack of attention to the semantic load of the videos by emphasizing the relevant information in the output video. Such techniques aim to create shorter, visually pleasant videos while also highlighting the semantically relevant portions. In this tutorial, we will cover the context behind Fast-Forward methods, starting with Video Summarization, passing through Fast-Forward and Hyperlapse, and reaching Semantic Hyperlapse methods. While introducing the concepts, we will interact with the audience through hands-on practice with the acceleration methods. Finally, we will present some datasets proposed in the literature and discuss future directions of the area.
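To make the idea of adaptive frame sampling concrete before the hands-on part, the sketch below casts frame selection as a simple shortest-path / dynamic-programming problem over frame-transition costs, similar in spirit to the Hyperlapse formulations covered in the tutorial. It is a minimal illustration only: the function `adaptive_frame_selection`, its parameters, and the random pairwise instability matrix are assumptions made for demonstration, not any specific published method; real systems derive transition costs from optical flow, camera-motion estimation, or semantic scores.

```python
# Minimal sketch (assumed, illustrative): adaptive frame sampling as a
# dynamic program that keeps the achieved skip close to a target speed-up
# while penalizing visually unstable frame transitions.
import numpy as np


def adaptive_frame_selection(instability, target_speedup=8, max_skip=16,
                             lambda_speed=1.0, lambda_stability=1.0):
    """Return indices of selected frames.

    instability[i, j] -- cost of jumping from frame i to frame j
                         (e.g., an estimate of camera shake for that transition).
    """
    n = instability.shape[0]
    best_cost = np.full(n, np.inf)
    parent = np.full(n, -1, dtype=int)
    best_cost[0] = 0.0  # always keep the first frame

    for i in range(n):
        if not np.isfinite(best_cost[i]):
            continue
        for skip in range(1, max_skip + 1):
            j = i + skip
            if j >= n:
                break
            # Penalize deviation from the desired speed-up and unstable jumps.
            cost = (lambda_speed * (skip - target_speedup) ** 2
                    + lambda_stability * instability[i, j])
            if best_cost[i] + cost < best_cost[j]:
                best_cost[j] = best_cost[i] + cost
                parent[j] = i

    # End at the cheapest frame among the last max_skip frames, then backtrack.
    tail = np.arange(max(0, n - max_skip), n)
    end = int(tail[np.argmin(best_cost[tail])])
    selected = [end]
    while parent[selected[-1]] != -1:
        selected.append(parent[selected[-1]])
    return selected[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames = 200
    # Synthetic pairwise instability; in practice this would come from
    # optical flow or camera-motion estimates between candidate frames.
    pairwise_instability = rng.random((n_frames, n_frames))
    frames = adaptive_frame_selection(pairwise_instability)
    print(f"kept {len(frames)} of {n_frames} frames, e.g. {frames[:10]}")
```

Semantic Hyperlapse methods follow the same skeleton but make the transition cost (or the target speed-up) depend on per-frame relevance scores, so that semantically important segments are sampled more densely.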