From Text to Locations: Repurposing Language Models for Spatial Trajectory Similarity Assessment

Wilken C. Dantas Melo; Lívia Almada Cruz; Francesco Lettich; Ticiana L. Coelho da Silva; Regis Pires Magalhães

doi:10.5753/sbbd.2024.240212

Wilken C. Dantas Melo Universidade Federal do Ceará (UFC) http://orcid.org/0009-0000-6546-5413
Lívia Almada Cruz Universidade Federal do Ceará (UFC)
Francesco Lettich ISTI-CNR
Ticiana L. Coelho da Silva Universidade Federal do Ceará (UFC)
Regis Pires Magalhães Universidade Federal do Ceará (UFC)

Abstract

The proliferation of electronic devices with geopositioning capabilities has significantly increased the generation of trajectory data, opening up new opportunities in mobility analysis. Our work considers the problem of assessing spatial similarity between trajectories and focuses on deep learning-based approaches that discretize trajectories using a uniform grid to generate their embeddings. In this context, t2vec is the reference approach. Large Language Models (LLMs) show promise in capturing patterns in mobility data. In this paper, we investigate whether an LLM can be repurposed to generate high-quality trajectory embeddings for this task. Using two real-world trajectory datasets, we consider repurposing three language models: Word2Vec, Doc2Vec, and BERT. Our results show that BERT, when trained on dense trajectory datasets, can generate high-quality embeddings, highlighting the potential of LLMs.
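
As an illustration of the grid discretization step mentioned above, the following minimal sketch maps each GPS point to the identifier of the uniform grid cell that contains it, so that a trajectory becomes a "sentence" of cell tokens that a language model can consume. It is not the authors' implementation: the function names, the cell size expressed in degrees, the row-major token scheme, and the collapsing of consecutive repeated cells are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude); assumed input layout


def cell_token(point: Point, min_lon: float, min_lat: float,
               cell_size_deg: float, n_cols: int) -> str:
    """Return a token naming the uniform-grid cell that contains `point`.

    Hypothetical helper: cells are numbered row-major from the
    bottom-left corner (min_lon, min_lat) of the grid.
    """
    lon, lat = point
    col = int((lon - min_lon) // cell_size_deg)
    row = int((lat - min_lat) // cell_size_deg)
    return f"c{row * n_cols + col}"


def trajectory_to_sentence(trajectory: List[Point], min_lon: float,
                           min_lat: float, cell_size_deg: float,
                           n_cols: int) -> List[str]:
    """Convert a trajectory into a list of cell tokens.

    Assumption: consecutive points falling in the same cell are
    collapsed into a single token.
    """
    tokens: List[str] = []
    for point in trajectory:
        token = cell_token(point, min_lon, min_lat, cell_size_deg, n_cols)
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens


# Toy trajectory on a 10-column grid with 0.1-degree cells.
toy = [(0.05, 0.05), (0.07, 0.06), (0.15, 0.05), (0.15, 0.15)]
print(trajectory_to_sentence(toy, 0.0, 0.0, 0.1, 10))  # ['c0', 'c1', 'c11']
```

Token sequences of this kind could then be passed to a model such as Word2Vec, Doc2Vec, or BERT in place of natural-language sentences. The grid resolution used here is arbitrary and would need to be tuned to the dataset.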

Keywords: Spatial Trajectory Similarity, Trajectory Embeddings, Natural Language Processing, Language Models
