Weakly Supervised Learning Algorithm to Eliminate Irrelevant Association Rules in Large Knowledge Bases

Authors

  • Bruno B. Cifarelli, Instituto Federal de São Paulo
  • Rafael G. L. Miani, Instituto Federal de São Paulo

DOI:

https://doi.org/10.5753/jidm.2020.2025

Keywords:

Association Rules, Irrelevant Rules, Large Knowledge Base, Weak Supervision

Abstract

The construction and population of large knowledge bases have been widely explored in recent years, and many techniques have been developed for this purpose. Association rule mining algorithms can also help populate these knowledge bases. However, such algorithms can generate a large number of rules, and analyzing them is a challenging and time-consuming task. This article presents a weakly supervised learning technique that prunes irrelevant association rules in order to facilitate the rule evaluation process. The proposed method uses irrelevant rules already discovered in past iterations and prunes new rules that share the same pattern. Experiments showed that the technique can reduce the number of rules by about 60%, decreasing the effort required to evaluate them.
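
The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define what a rule "pattern" is, so the sketch assumes that a pattern is the set of relation names on each side of a rule, ignoring the concrete values. The names Rule, pattern_of, and prune, as well as the example relations, are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

# An item is a (relation, value) pair, e.g. ("playsSport", "golf").
Item = Tuple[str, str]


class Rule(NamedTuple):
    """An association rule: antecedent items imply consequent items."""
    antecedent: FrozenSet[Item]
    consequent: FrozenSet[Item]


# ASSUMPTION: a pattern keeps only the relation names on each side of the rule.
Pattern = Tuple[FrozenSet[str], FrozenSet[str]]


def pattern_of(rule: Rule) -> Pattern:
    """Abstract a rule into its pattern by dropping the concrete values."""
    return (
        frozenset(relation for relation, _ in rule.antecedent),
        frozenset(relation for relation, _ in rule.consequent),
    )


def prune(candidates: List[Rule], irrelevant_patterns: Set[Pattern]) -> List[Rule]:
    """Keep only candidates whose pattern was not marked irrelevant before."""
    return [r for r in candidates if pattern_of(r) not in irrelevant_patterns]


# A rule judged irrelevant in a past iteration (hypothetical example data).
past_irrelevant = Rule(
    frozenset({("playsSport", "golf")}),
    frozenset({("playsInLeague", "pga")}),
)
irrelevant_patterns = {pattern_of(past_irrelevant)}

# Rules mined in the current iteration.
same_pattern = Rule(
    frozenset({("playsSport", "tennis")}),
    frozenset({("playsInLeague", "atp")}),
)
other_pattern = Rule(
    frozenset({("playsForTeam", "lakers")}),
    frozenset({("playsSport", "basketball")}),
)

remaining = prune([same_pattern, other_pattern], irrelevant_patterns)
```

Under these assumptions, only other_pattern is left for manual evaluation, because same_pattern shares its relation names with a rule already marked irrelevant. A real system would need the pattern definition and supervision signal given in the full article.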

Published

2021-02-14

How to Cite

Cifarelli, B. B., & Miani, R. G. L. (2021). Weakly Supervised Learning Algorithm to Eliminate Irrelevant Association Rules in Large Knowledge Bases. Journal of Information and Data Management, 11(2). https://doi.org/10.5753/jidm.2020.2025

Issue

Section

Regular Papers