Temporal Video Scene Segmentation By Fused Bags-of-Features

Authors:
Rodrigo Mitsuo Kishi (University of São Paulo; Federal University of Mato Grosso do Sul)
Tiago Henrique Trojahn (University of São Paulo; Federal Institute of São Paulo)
Rudinei Goularte (University of São Paulo)
WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, October 2018, Pages 173–180. https://doi.org/10.1145/3243082.3243109

Published: 16 October 2018

ABSTRACT

Temporal segmentation of video into semantically coherent scenes is a fundamental step for enhancing video operations such as browsing, retrieval, and recommendation. The automatic scene segmentation methods available in the literature still fall short, in terms of efficacy, of reasonable requirements for practical application. To narrow this gap, this paper presents a new scene segmentation method based on multimodal early fusion, which extends the latent-semantics discriminative capability of the classical and powerful single-modal bag-of-features approach to a multimodal paradigm. The approach was designed to refine the latent semantics of single-modal data by identifying and representing audiovisual patterns while preserving the patterns of single-modal visual and aural words. In experiments on a publicly available dataset, the proposed method achieved higher average values for the FCO metric than previous state-of-the-art approaches.
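
To make the idea of early fusion over bags-of-features more concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify the fusion operator, so plain concatenation of the visual and aural histograms is assumed here. All function names, the k-means codebook construction, the cosine-similarity boundary rule, and the threshold value are hypothetical choices made only for illustration.

```python
import numpy as np


def assign_words(descriptors, codebook):
    """Index of the nearest codeword for each descriptor (Euclidean distance)."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)


def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain Lloyd-style k-means returning k codewords (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    codebook = descriptors[start].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, codebook)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old codeword if a cluster empties
                codebook[j] = members.mean(axis=0)
    return codebook


def bag_of_features(descriptors, codebook):
    """L1-normalised histogram of codeword occurrences for one shot."""
    counts = np.bincount(assign_words(descriptors, codebook), minlength=len(codebook))
    return counts / max(counts.sum(), 1)


def fuse_shot(visual_desc, aural_desc, visual_codebook, aural_codebook):
    """Early fusion ASSUMED here: concatenate the two single-modal histograms."""
    return np.concatenate([
        bag_of_features(visual_desc, visual_codebook),
        bag_of_features(aural_desc, aural_codebook),
    ])


def scene_boundaries(fused_shots, threshold=0.5):
    """Indices i where shot i starts a new scene, i.e. where the cosine
    similarity between shot i and shot i-1 falls below the threshold."""
    fused = np.asarray(fused_shots, dtype=float)
    norms = np.linalg.norm(fused, axis=1)
    sims = (fused[1:] * fused[:-1]).sum(axis=1) / np.maximum(norms[1:] * norms[:-1], 1e-12)
    return [int(i) + 1 for i in np.flatnonzero(sims < threshold)]
```

In this sketch each shot is described by one fused vector whose first part still holds the visual-word histogram and whose second part holds the aural-word histogram, so the single-modal patterns remain visible after fusion. A method that also models joint audiovisual patterns, as the abstract describes, would need an additional step that this example does not attempt.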

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Video segmentation
2. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published in
WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web
October 2018
437 pages
ISBN:9781450358675
DOI:10.1145/3243082
General Chairs: Manoel Carvalho Marques Neto (IFBA), Renato Lima Novais (IFBA), Carlos Ferraz (UFPE), Windson Viana (UFC)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Feature Fusion
Multimedia Systems
Video Scene Segmentation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
WebMedia '18 paper acceptance rate: 37 of 111 submissions, 33%
Overall acceptance rate: 270 of 873 submissions, 31%