DOI: 10.1145/3243082.3243109
research-article

Temporal Video Scene Segmentation By Fused Bags-of-Features

Published: 16 October 2018

ABSTRACT

Temporal segmentation of video into semantically coherent scenes is a fundamental step in enhancing video operations such as browsing, retrieval, and recommendation. In terms of efficacy, the automatic scene segmentation methods available in the literature still fall far short of the requirements of practical applications. Toward closing this gap, this paper presents a new multimodal, early-fusion-based scene segmentation method that extends the discriminative capability of the latent semantics captured by classical single-modal bags-of-features to a multimodal paradigm. The approach is designed to refine the latent semantics of single-modal data by identifying and representing audiovisual patterns while still preserving the single-modal visual and aural word patterns. Experiments performed on a publicly available dataset show that the proposed method achieves higher average values of the FCO metric (an F-score combining coverage and overflow) than previous state-of-the-art approaches.
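Bags-of-features (BoF) quantize local descriptors against a learned codebook of visual or aural "words" and summarize each shot as a word histogram. The following is a minimal sketch of a generic early-fusion BoF, not the authors' method (which, per the abstract, additionally learns audiovisual patterns). It builds one codebook per modality with k-means (Lloyd's algorithm), hard-assigns descriptors to L1-normalized per-shot histograms, and concatenates the visual and aural histograms with a weight. The function names, the weight `w_visual`, and the 2-D/1-D toy descriptors are hypothetical stand-ins for real features (for example, colour or texture descriptors and MFCC audio frames).

```python
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(d: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the codeword closest to descriptor d."""
    return min(range(len(codebook)), key=lambda c: sq_dist(d, codebook[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's algorithm: returns k centroids, used here as the codebook of 'words'."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bof_histogram(descriptors: List[Vector], codebook: List[Vector]) -> Vector:
    """Hard-assign each descriptor to its nearest word; return an L1-normalised histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def early_fusion(visual: Vector, aural: Vector, w_visual: float = 0.5) -> Vector:
    """Hypothetical fusion rule: weighted concatenation of single-modal histograms."""
    return [w_visual * h for h in visual] + [(1.0 - w_visual) * h for h in aural]


# Toy data: two shots with 2-D "visual" and 1-D "aural" descriptors (placeholders only).
shots_visual = [[[0.0, 0.1], [0.2, 0.0]], [[9.8, 10.0], [10.1, 9.9]]]
shots_aural = [[[1.0], [1.2]], [[5.0], [5.1]]]

visual_codebook = kmeans([d for shot in shots_visual for d in shot], k=2)
aural_codebook = kmeans([d for shot in shots_aural for d in shot], k=2)

fused = [
    early_fusion(bof_histogram(v, visual_codebook), bof_histogram(a, aural_codebook))
    for v, a in zip(shots_visual, shots_aural)
]
for i, vec in enumerate(fused):
    print(f"shot {i}: {vec}")
```

Concatenating normalized histograms keeps each modality's words intact while letting a single distance compare shots on both modalities at once; `w_visual` trades off their influence.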
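Once each shot has a (fused) feature vector, scenes are formed by grouping temporally adjacent, similar shots. The abstract does not state how the paper performs this step, so the following is only a hypothetical baseline. It scores consecutive shots with histogram intersection (the similarity from Swain and Ballard's colour indexing) and opens a new scene whenever the score drops below an assumed `threshold`. The function names and the threshold value are illustrative.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of bin-wise minima; equals 1.0 for identical L1-normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def group_shots(shot_vectors: List[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Greedy grouping: start a new scene when consecutive shots fall below `threshold`."""
    if not shot_vectors:
        return []
    scenes: List[List[int]] = [[0]]
    for i in range(1, len(shot_vectors)):
        if histogram_intersection(shot_vectors[i - 1], shot_vectors[i]) >= threshold:
            scenes[-1].append(i)  # similar enough: same scene
        else:
            scenes.append([i])  # dissimilar: scene boundary before shot i
    return scenes


# Illustrative fused vectors for four consecutive shots.
shots = [
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.4, 0.5],
]
print(group_shots(shots))  # [[0, 1], [2, 3]]
```

Comparing only adjacent shots is brittle, since a single dissimilar shot (such as a brief cutaway) splits a scene; many scene detectors instead compare each shot against a window of neighbouring shots.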
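The FCO metric reported in the abstract combines coverage C (how completely each ground-truth scene is captured by a single detected scene) and overflow O (how far detected scenes spill into neighbouring ground-truth scenes) as F_CO = 2C(1 - O) / (C + 1 - O). The following sketch assumes the shot-based definitions used in the scene-detection literature (e.g., Vendrig and Worring, 2002): per-scene values are weighted by scene length in shots, and a scene's overflow is normalized by the shot counts of its immediate neighbours. Check it against the original definitions before relying on it; treating overflow as 0 when there is a single ground-truth scene is our own assumption, and all names are illustrative.

```python
from typing import List, Sequence, Set

Scenes = Sequence[Sequence[int]]  # each scene is a list of shot indices


def _as_sets(scenes: Scenes) -> List[Set[int]]:
    return [set(s) for s in scenes]


def coverage(truth: Scenes, detected: Scenes) -> float:
    """C = sum_t C_t * |s_t| / sum_t |s_t|, with C_t = max_i |s_t & d_i| / |s_t|."""
    gt, det = _as_sets(truth), _as_sets(detected)
    total = sum(len(s) for s in gt)
    # C_t * |s_t| simplifies to the largest overlap with any single detected scene.
    return sum(max((len(s & d) for d in det), default=0) for s in gt) / total


def overflow(truth: Scenes, detected: Scenes) -> float:
    """O = sum_t O_t * |s_t| / sum_t |s_t|, normalising spill by the neighbours' sizes."""
    gt, det = _as_sets(truth), _as_sets(detected)
    total = sum(len(s) for s in gt)
    acc = 0.0
    for t, s in enumerate(gt):
        prev_len = len(gt[t - 1]) if t > 0 else 0
        next_len = len(gt[t + 1]) if t + 1 < len(gt) else 0
        neighbours = prev_len + next_len
        if neighbours == 0:
            continue  # assumption: a lone ground-truth scene has zero overflow
        # Shots of every overlapping detected scene that fall outside s_t.
        spill = sum(len(d - s) for d in det if d & s)
        acc += (spill / neighbours) * len(s)
    return acc / total


def fco(truth: Scenes, detected: Scenes) -> float:
    """Harmonic mean of coverage and (1 - overflow)."""
    c = coverage(truth, detected)
    o = overflow(truth, detected)
    denom = c + (1.0 - o)
    return 2.0 * c * (1.0 - o) / denom if denom else 0.0


truth = [[0, 1, 2], [3, 4], [5, 6, 7]]
detected = [[0, 1], [2, 3, 4], [5, 6, 7]]
print(coverage(truth, detected), round(overflow(truth, detected), 4), round(fco(truth, detected), 3))
# 0.875 0.4167 0.7
```

In the toy example, moving shot 2 into the second detected scene costs a little coverage (7 of 8 shots are grouped with their scene-mates) but a lot of overflow, because that detected scene spills two shots into a three-shot ground-truth scene.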


      Published in

        WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web (ACM Other conferences)
        October 2018
        437 pages
        ISBN: 9781450358675
        DOI: 10.1145/3243082

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        WebMedia '18 paper acceptance rate: 37 of 111 submissions, 33%. Overall acceptance rate: 270 of 873 submissions, 31%.