Humor Detection using Support Vector Machine
This paper aims to classify texts as humorous or non-humorous, exploring the parameters and strategies that can be used with the Support Vector Machine (SVM) classifier in order to understand their impact on classification and to find the combinations that perform best in terms of accuracy and F1 score. After inspecting the plots and analyzing the data, we identified the combination best suited to classifying the texts in the test data provided by the HaHackathon: Detecting and Rating Humor and Offense CodaLab competition [cod 2021]. These results give a broad view of solutions to this type of problem and can inform further related work in this field of research.
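As a minimal illustration of the two evaluation criteria named above, the sketch below computes accuracy and F1 (for the humorous class) by hand and uses F1 to choose between two candidate configurations. The labels, the predictions, and the names `config_a` and `config_b` are hypothetical toy values introduced only for this example; they are not results from the paper. In practice the equivalent functions in scikit-learn's `sklearn.metrics` module would normally be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels (1 = humorous, 0 = non-humorous); illustrative only.
gold = [1, 1, 1, 0, 0, 1, 0, 1]

# Hypothetical predictions from two classifier configurations.
predictions = {
    "config_a": [1, 1, 0, 0, 0, 1, 1, 1],
    "config_b": [1, 1, 1, 1, 1, 1, 1, 1],
}

# Score every configuration on both metrics.
scores = {
    name: (accuracy(gold, pred), f1_score(gold, pred))
    for name, pred in predictions.items()
}

# Pick the configuration with the highest F1 score.
best = max(scores, key=lambda name: scores[name][1])

for name, (acc, f1) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
print("best by F1:", best)
```

Reporting both metrics matters when the classes are imbalanced: a configuration that labels every text as humorous can reach a respectable accuracy while its precision, and therefore its F1, suffers.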
(2021). NLTK library documentation. https://www.nltk.org/api/nltk.html. Accessed 25 May 2021.
(2021). NumPy library documentation. https://numpy.org/doc/stable/. Accessed 25 May 2021.
(2021). Pandas library documentation. https://pandas.pydata.org/docs/. Accessed 25 May 2021.
(2021). Scikit-learn library documentation. https://scikit-learn.org/stable/. Accessed 25 May 2021.
(2021). Support vector machine - introduction to machine learning algorithms. [link]. Accessed 25 May 2021.
Al-Khafaji H., H. A. (2017). Efficient algorithms for preprocessing and stemming of tweets in a sentiment analysis system.
Bali T., A. V. and N., S. (2018). What makes us laugh? Investigations into automatic humor classification.
Berry, M. (2003). Survey of text mining: Clustering, classification and retrieval.
Sun, A., L. E. and Liu, Y. (2009). On strategies for imbalanced text classification using SVM: a comparative study.
Xu, Z., Y. K. T. V. X. X. and Wang, J. (2003). Representative sampling for text classification using support vector machines.
Zhuang, D., Z. B. Y. Q. Y. J. C. Z. and Chen, Y. (2005). Efficient text classification by weighted proximal SVM.