Classification based on Associations (CBA) - a performance analysis

EasyChair Preprint 501

9 pages • Date: September 12, 2018

Abstract

Classification Based on Associations (CBA) has for two decades been the algorithm of choice for researchers as well as practitioners, owing to the simplicity of the rules it produces, the accuracy of its models, and fast model building. Two versions of CBA differing in speed -- M1 and M2 -- were originally proposed by Liu et al. in 1998. While the more complex M2 version was originally reported to be on average 50% faster, in this article we present benchmarks performed with multiple CBA implementations on the UCI lymph dataset that contest the supremacy of M2: the results show that M1 was faster in most of the evaluated setups. M2 was faster only when the number of input rules was very small and the number of input instances was large. We hypothesize that the better performance of M1 can be attributed to recent advances in the optimization of vectorized operations and memory structures in scikit-learn and R, which M1 can exploit better because it is more amenable to vectorization.
This paper is accompanied by a Python implementation of CBA available at https://pypi.org/project/pyARC/.
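
To illustrate why an M1-style procedure lends itself to vectorization, the following is a minimal sketch of a simplified data-coverage loop written with NumPy boolean arrays. It is not the pyARC implementation and not the paper's benchmark code: the function name `m1_select_rules`, its parameters, and the toy data are hypothetical, and the rules are assumed to be already sorted by precedence (confidence, then support). Each rule is tested against all uncovered instances in a single array operation, which is the kind of step the abstract suggests benefits from vectorized libraries.

```python
import numpy as np


def m1_select_rules(antecedent_match, rule_class, y):
    """Simplified M1-style data coverage (illustrative sketch only).

    antecedent_match: bool array of shape (n_rules, n_instances); entry
        [r, i] is True when the antecedent of rule r matches instance i.
        Rules are assumed to be pre-sorted by precedence.
    rule_class: int array of shape (n_rules,), class predicted by each rule.
    y: int array of shape (n_instances,), non-negative class labels.
    Returns (indices of kept rules, default class).
    """
    uncovered = np.ones(y.shape[0], dtype=bool)
    selected = []
    rule_errors = 0
    candidates = []  # (total errors, number of rules kept, default class)

    for r in range(antecedent_match.shape[0]):
        # One vectorized pass over all instances for this rule.
        covered = antecedent_match[r] & uncovered
        correct = covered & (y == rule_class[r])
        if not correct.any():
            continue  # rule classifies no uncovered instance correctly

        selected.append(r)
        rule_errors += int(covered.sum() - correct.sum())
        uncovered &= ~covered

        remaining = y[uncovered]
        if remaining.size:
            default = int(np.bincount(remaining).argmax())
            default_errors = int((remaining != default).sum())
        else:
            # Assumption: fall back to the overall majority class.
            default = int(np.bincount(y).argmax())
            default_errors = 0
        candidates.append((rule_errors + default_errors, len(selected), default))

        if not uncovered.any():
            break

    if not candidates:
        return [], int(np.bincount(y).argmax())

    # Cut the rule list at the point with the fewest total errors;
    # ties are resolved in favour of the shorter list.
    _, keep, default = min(candidates)
    return selected[:keep], default


# Toy data (hypothetical): four sorted rules, six instances, two classes.
y = np.array([0, 0, 1, 1, 1, 0])
match = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0],
], dtype=bool)
classes = np.array([0, 1, 1, 0])

rules, default_class = m1_select_rules(match, classes, y)
print(rules, default_class)
```

The sketch loops over rules in Python but handles all instances per rule with array operations; M2, by contrast, is organised around per-instance bookkeeping, which is harder to express in this form.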

Keyphrases: CBA, Classification, Classification by Associations, association rule, benchmark

Links:	https://easychair.org/publications/preprint/5d6G
	https://doi.org/10.29007/gjl4

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:501,
  author    = {Jiří Filip and Tomáš Kliegr},
  title     = {Classification based on Associations (CBA) - a performance analysis},
  doi       = {10.29007/gjl4},
  howpublished = {EasyChair Preprint 501},
  year      = {EasyChair, 2018}}
