Internationally published papers

Dang Nguyen, Loan T.T. Nguyen, Bay Vo, Witold Pedrycz; Efficient mining of class association rules with the itemset constraint; Knowledge-Based Systems, In Press.

Abstract

Mining class association rules (CARs) with the itemset constraint is concerned with the discovery of rules that contain a set of specific items in the rule antecedent and a class label in the rule consequent. This task is commonly encountered in mining medical data. For example, when identifying which section of the population is at high risk of HIV infection, epidemiologists often concentrate on rules that include demographic information such as gender, age, and marital status in the rule antecedent, and HIV-Positive in the rule consequent. There are two naive strategies to solve this problem, namely pre-processing and post-processing. The post-processing methods have to generate and consider a huge number of candidate CARs, while the performance of the pre-processing methods depends on the number of records filtered out. Therefore, such approaches are time-consuming. This study proposes an efficient method for mining CARs with the itemset constraint based on a lattice structure and the difference between two sets of object identifiers (diffset). First, a lattice structure is built to store all frequent itemsets in the dataset. To reduce memory usage, the diffset is used instead of the entire set of object identifiers. Second, the lattice is traversed to generate only rules that satisfy the itemset constraint. The experimental results show that the proposed algorithm outperforms existing methods in terms of both mining time and memory usage.
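
The two ideas named in the abstract, the diffset and the itemset constraint, can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the toy records, the helper names (tidset, support_via_diffset, best_rule), and the brute-force identifier lookup are all assumptions made for illustration. The sketch only shows that the support of an extended itemset can be recovered from its parent's support and the diffset, and that a rule is kept only when its antecedent contains the constraint items.

```python
from collections import Counter

# Hypothetical toy data: each record is (set of items, class label).
records = [
    ({"gender=F", "age=young", "status=single"}, "pos"),
    ({"gender=F", "age=young", "status=married"}, "neg"),
    ({"gender=M", "age=young", "status=single"}, "pos"),
    ({"gender=F", "age=old", "status=single"}, "pos"),
    ({"gender=M", "age=old", "status=married"}, "neg"),
]

def tidset(itemset):
    """Object identifiers (record indices) of records containing every item."""
    return {i for i, (items, _) in enumerate(records) if itemset <= items}

def support_via_diffset(parent, item):
    """Support of parent + {item}, computed from the parent's support.

    The diffset holds the identifiers that match `parent` but not the
    extended itemset, so support(child) = support(parent) - |diffset|.
    For simplicity the diffset is derived from full identifier sets here;
    a real miner would store only the diffsets.
    """
    diff = tidset(parent) - tidset(parent | {item})
    return len(tidset(parent)) - len(diff), diff

def best_rule(itemset, constraint):
    """Return (class, support, confidence) for the rule itemset -> class,
    or None when the antecedent lacks the constraint items or matches nothing."""
    if not constraint <= itemset:
        return None
    tids = tidset(itemset)
    if not tids:
        return None
    counts = Counter(records[i][1] for i in tids)
    label, sup = counts.most_common(1)[0]
    return label, sup, sup / len(tids)

support, diff = support_via_diffset({"gender=F"}, "status=single")
rule = best_rule({"gender=F", "status=single"}, constraint={"gender=F"})
skipped = best_rule({"age=young"}, constraint={"gender=F"})
```

In this toy dataset the diffset of the extension contains a single identifier, which is smaller than the two-element identifier set it replaces; the memory saving reported in the abstract depends on this kind of difference being small in practice.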

Keywords
  • Associative classification
  • Class association rule
  • Data mining
  • Useful rules