Review on Common Techniques for Urban Environment Video Analytics

  • Henry O. Velesaca ESPOL
  • Patricia L. Suárez ESPOL
  • Angel D. Sappa ESPOL / UAB
  • Dario Carpio ESPOL
  • Rafael E. Rivadeneira ESPOL
  • Angel Sanchez URJC


This work compiles state-of-the-art computer vision approaches for video analytics in urban environments. The manuscript groups these approaches according to the typical modules of a video analysis pipeline: image preprocessing, object detection, classification, and tracking. This pipeline serves as a basic guide to the most representative approaches in the field. The manuscript is not intended to be an exhaustive review of the most advanced approaches, but rather a list of common techniques proposed to address recurring problems in this field.
Keywords: Video Analytics, Review, Urban Environments, Smart Cities


VELESACA, Henry O.; SUÁREZ, Patricia L.; SAPPA, Angel D.; CARPIO, Dario; RIVADENEIRA, Rafael E.; SANCHEZ, Angel. Review on Common Techniques for Urban Environment Video Analytics. In: WORKSHOP BRASILEIRO DE CIDADES INTELIGENTES (WBCI), 3., 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 107-118. DOI: