Evaluating the Performance of Twitter-based Exploit Detectors

  • Daniel Alves de Sousa (UFU)
  • Elaine Ribeiro de Faria (UFU)
  • Rodrigo Sanches Miani (UFU)


Patch prioritization is a crucial aspect of information systems security, and knowing which vulnerabilities have been exploited in the wild helps system administrators carry it out. Analyzing social media can improve this process and speed it up: data collected from online discussions can be fed to machine learning techniques that detect real-world exploits. In this paper, we use a technique that combines Twitter data with public database information to classify vulnerabilities as exploited or not exploited. We analyze the behavior of different classification algorithms, investigate the influence of different antivirus data as ground truth, and experiment with various time window sizes. Our findings suggest that using a Light Gradient Boosting Machine (LightGBM) can improve the results and that, in most cases, the statistics related to a tweet and to the users who tweeted it are more meaningful than the tweet's text. We also demonstrate the importance of using ground-truth data from security companies not mentioned in previous works.
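
As a rough illustration of the tweet and user statistics mentioned above, the following minimal sketch aggregates tweets per vulnerability inside a time window. It is not the authors' pipeline: the field names (`cve`, `user`, `posted_at`, `retweets`, `followers`), the function name, the sample data, and the choice to open the window at the disclosure date are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def tweet_stats_by_cve(tweets, disclosure_dates, window_days):
    """Aggregate per-CVE tweet and user statistics inside a time window.

    Hypothetical helper: the record fields and the window definition
    (starting at disclosure) are assumptions, not taken from the paper.
    """
    window = timedelta(days=window_days)
    buckets = defaultdict(list)
    for tweet in tweets:
        start = disclosure_dates.get(tweet["cve"])
        # Keep only tweets posted within the window that opens at disclosure.
        if start is not None and start <= tweet["posted_at"] < start + window:
            buckets[tweet["cve"]].append(tweet)

    features = {}
    for cve, group in buckets.items():
        features[cve] = {
            "n_tweets": len(group),
            "n_users": len({t["user"] for t in group}),
            "total_retweets": sum(t["retweets"] for t in group),
            "max_followers": max(t["followers"] for t in group),
        }
    return features


# Made-up sample data with a placeholder CVE identifier.
sample_tweets = [
    {"cve": "CVE-0000-0001", "user": "a", "posted_at": datetime(2020, 1, 2),
     "retweets": 3, "followers": 100},
    {"cve": "CVE-0000-0001", "user": "b", "posted_at": datetime(2020, 1, 5),
     "retweets": 1, "followers": 250},
    {"cve": "CVE-0000-0001", "user": "a", "posted_at": datetime(2020, 2, 1),
     "retweets": 9, "followers": 100},
]
sample_disclosures = {"CVE-0000-0001": datetime(2020, 1, 1)}

week_features = tweet_stats_by_cve(sample_tweets, sample_disclosures, 7)
wide_features = tweet_stats_by_cve(sample_tweets, sample_disclosures, 60)
```

Varying `window_days` changes which tweets contribute to each feature row, which is one simple way to experiment with time window sizes. The resulting rows could then be joined with public database fields and passed to any classifier.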


SOUSA, Daniel Alves de; FARIA, Elaine Ribeiro de; MIANI, Rodrigo Sanches. Evaluating the Performance of Twitter-based Exploit Detectors. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 20., 2020, Petrópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 464-477. DOI: https://doi.org/10.5753/sbseg.2020.19257.