Comparative study of feature extraction approaches for maritime vessel identification in CBIR
Abstract
Maritime surveillance and monitoring systems are crucial for coastal security and resource management, and vessel recognition and identification are fundamental tasks in this context. However, visual inspection is an expensive and labor-intensive process. This study compares methods for automated vessel identification using digital image processing. The performance of classical and machine-learning-based feature extraction methods is evaluated and compared on a maritime vessel dataset to assess their ability to identify different vessels. The results show that BEiT-v2 achieves the highest identification performance, with a mean Average Precision (mAP) of 95.05%. VGG-19 offers the best balance between accuracy (second-highest mAP) and computational cost. These findings suggest that machine-learning methods are valuable for vessel identification, with the optimal choice depending on the specific needs of the application.
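The abstract ranks the compared feature extractors by mean Average Precision (mAP) over retrieval results. As a minimal illustrative sketch (not the authors' evaluation code), mAP for a CBIR system can be computed by averaging, over all queries, the mean of the precision values observed at each rank where a relevant image appears:

```python
def average_precision(retrieved, relevant):
    """AP for one query: mean of precision@k at each rank k where a relevant item appears,
    normalized by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precisions = []
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

def mean_average_precision(queries):
    """mAP: average AP over a list of (retrieved_ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```

For example, a query whose ranking is `["v1", "v2", "v3"]` with relevant set `{"v1", "v3"}` scores AP = (1/1 + 2/3) / 2 ≈ 0.833.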
Keywords:
vessel recognition, Content-Based Image Retrieval, computer vision
References
Alahi, A., Ortiz, R., and Vandergheynst, P. (2012). FREAK: Fast retina keypoint. In Conference on Computer Vision and Pattern Recognition, pages 510–517. IEEE.
Alzu’bi, A., Amira, A., and Ramzan, N. (2015). Semantic content-based image retrieval: A comprehensive study. Journal of Visual Communication and Image Representation, 32:20–54.
Awad, A. I. and Hassaballah, M. (2016). Image feature detectors and descriptors. Studies in Computational Intelligence. Springer International Publishing, Cham.
Bao, H., Dong, L., Piao, S., and Wei, F. (2021). BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254.
Bianco, S., Mazzini, D., Pau, D. P., and Schettini, R. (2015). Local detectors and compact descriptors for visual search: a quantitative comparison. Digital Signal Processing, 44:1–13.
Biberman, L. (1973). Perception of displayed information. Plenum Press.
Bo, L., Xiaoyang, X., Xingxing, W., and Wenting, T. (2021). Ship detection and classification from optical remote sensing images: A survey. Chinese Journal of Aeronautics, 34(3):145–163.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010). BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision, pages 778–792. Springer.
Chang, Y.-L., Anagaw, A., Chang, L., Wang, Y. C., Hsiao, C.-Y., and Lee, W.-H. (2019). Ship detection based on YOLOv2 for SAR imagery. Remote Sensing, 11(7):786.
Chen, X., Qi, L., Yang, Y., Postolache, O., Yu, Z., and Xu, X. (2019). Port ship detection in complex environments. In International Conference on Sensing and Instrumentation in IoT Era, pages 1–6. IEEE.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929.
Driggers, R. G., Cox, P. G., and Kelley, M. (1997). National imagery interpretation rating system and the probabilities of detection, recognition, and identification. Optical Engineering, 36(7):1952–1959.
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136.
Gundogdu, E., Solmaz, B., Yücesoy, V., and Koc, A. (2016). MARVEL: A large-scale image dataset for maritime vessels. In Asian Conference on Computer Vision, pages 165–180.
Guo, H., Yang, X., Wang, N., and Gao, X. (2021). A CenterNet++ model for ship detection in SAR images. Pattern Recognition, 112:107787.
Hameed, I. M., Abdulhussain, S. H., and Mahmmod, B. M. (2021). Content-based image retrieval: A review of recent trends. Cogent Engineering, 8(1):1927469.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, pages 770–778.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. In Conference on Computer Vision and Pattern Recognition, pages 4700–4708. IEEE.
Li, R., Liu, W., Yang, L., Sun, S., Hu, W., Zhang, F., and Li, W. (2018). DeepUNet: A deep fully convolutional network for pixel-level sea-land segmentation. Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(11):3954–3962.
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollár, P. (2015). Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312.
Liu, T., Pang, B., Ai, S., and Sun, X. (2020). Study on visual detection algorithm of sea surface targets based on improved YOLOv3. Sensors, 20(24):7263.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Markowska-Kaczmar, U. and Kwaśnicka, H. (2018). Deep learning—a new era in bridging the semantic gap. In Bridging the Semantic Gap in Image and Video Analysis, pages 123–159. Springer.
Oliva, A. and Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175.
Peng, Z., Dong, L., Bao, H., Ye, Q., and Wei, F. (2022). BEiT v2: Masked image modeling with vector-quantized visual tokenizers. arXiv preprint arXiv:2208.06366.
Piras, L. and Giacinto, G. (2017). Information fusion in content based image retrieval: A comprehensive overview. Information Fusion, 37:50–60.
Qiao, D., Liu, G., Dong, F., Jiang, S.-X., and Dai, L. (2020). Marine vessel re-identification: A large-scale dataset and global-and-local fusion-based discriminative feature learning. IEEE Access, 8:27744–27756.
Rosten, E. and Drummond, T. (2005). Fusing points and lines for high performance tracking. In International Conference on Computer Vision, volume 2, pages 1508–1515. IEEE.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pages 2564–2571. IEEE.
Shao, Z., Wu, W., Wang, Z., Du, W., and Li, C. (2018). SeaShips: A large-scale precisely annotated dataset for ship detection. IEEE Transactions on Multimedia, 20(10):2593–2604.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In Conference on Artificial Intelligence, volume 31. AAAI.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. CoRR, abs/1409.4842.
Xie, B., Hu, L., and Mu, W. (2017a). Background suppression based on improved top-hat and saliency map filtering for infrared ship detection. In International Conference on Computing Intelligence and Information System, pages 298–301. IEEE.
Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017b). Aggregated residual transformations for deep neural networks. In Conference on Computer Vision and Pattern Recognition, pages 1492–1500. IEEE.
Published
17/11/2024
How to Cite
SANTOS, Bryan L. G. dos; LORENA, Ana C.; CRUZ, Juliano E. C. Comparative study of feature extraction approaches for maritime vessel identification in CBIR. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 388-399. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245117.