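An itemset reduction based multi-objective evolutionary algorithm for mining high-dimension frequent and high utility itemsets
Author:
Affiliation:

Anhui University

Chinese Library Classification (CLC) number:

TP273

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program, Major Program)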



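    Abstract:

    Frequent and high utility itemset mining is an important data mining task. Its goal is to discover a set of itemsets that are both frequent and high utility. Two measures judge each mined itemset: support determines whether it is frequent, and utility determines whether it is high utility.

    Among the methods proposed for this problem, evolutionary multi-objective approaches have performed well for two reasons. They return a set of high-quality solutions that can satisfy different users' needs. They also avoid a difficulty of traditional algorithms, which require support and utility thresholds that are hard to set in advance.

    However, most existing multi-objective algorithms use 0-1 (binary) encoding, so the dimensionality of the decision space is proportional to the number of items in the dataset. As a result, they suffer from the curse of dimensionality on high-dimensional datasets. To address this, the paper designs an itemset reduction strategy that shrinks the search space by repeatedly removing unimportant items during evolution.

    Building on this strategy, the paper proposes an itemset reduction based multi-objective optimization algorithm for mining high-dimensional frequent and high utility itemsets (IR-MOEA). It also proposes a learning-based population repair strategy that adjusts the evolutionary direction of individuals that may have been over-reduced or insufficiently reduced. In addition, an initialization strategy based on itemset fitness lets the algorithm generate sparse solutions early in evolution that benefit later generations.

    Experimental results on multiple real and synthetic datasets show that the algorithm outperforms existing multi-objective optimization algorithms, especially on high-dimensional datasets.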
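The two objectives, the 0-1 encoding, and the idea of shrinking the decision space can be made concrete with a small sketch. It assumes the standard quantitative-transaction definitions from high utility itemset mining:

- An item's utility in a transaction is its quantity multiplied by its unit profit.
- Support is counted as the number of transactions that contain the itemset.

The toy data and the helper names `decode`, `evaluate`, `dominates`, `reduce_items` and `project` are hypothetical. The reduction rule in particular (keep only the items chosen by at least one non-dominated individual) is an illustrative assumption, not the IR-MOEA procedure described in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data (not from the paper): each transaction maps an
# item index to its purchased quantity.
TRANSACTIONS: List[Dict[int, int]] = [
    {0: 2, 1: 1, 3: 4},
    {0: 1, 2: 3},
    {0: 3, 1: 2, 2: 1, 3: 1},
    {1: 5, 3: 2},
]
# Hypothetical external utility (unit profit) of items 0..3.
UNIT_PROFIT: List[int] = [5, 2, 1, 3]


def decode(bits: Sequence[int], items: Sequence[int]) -> List[int]:
    """0-1 encoding: position k is 1 if candidate item items[k] is selected."""
    return [item for bit, item in zip(bits, items) if bit]


def evaluate(itemset: Sequence[int]) -> Tuple[int, int]:
    """Return (support, utility), both to be maximized.

    support: number of transactions that contain every item of the itemset.
    utility: over those transactions, sum of quantity * unit profit of its items.
    """
    if not itemset:
        return 0, 0
    support, utility = 0, 0
    for t in TRANSACTIONS:
        if all(i in t for i in itemset):
            support += 1
            utility += sum(t[i] * UNIT_PROFIT[i] for i in itemset)
    return support, utility


def dominates(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Pareto dominance for maximization: a is no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def reduce_items(population: List[List[int]], items: List[int]) -> List[int]:
    """HYPOTHETICAL reduction rule (not the paper's): keep only candidate items
    selected by at least one non-dominated individual."""
    scores = [evaluate(decode(ind, items)) for ind in population]
    front = [ind for ind, s in zip(population, scores)
             if not any(dominates(o, s) for o in scores)]
    return [item for k, item in enumerate(items) if any(ind[k] for ind in front)]


def project(bits: Sequence[int], old_items: Sequence[int],
            new_items: Sequence[int]) -> List[int]:
    """Re-encode an individual over the reduced (shorter) candidate item list."""
    chosen = set(decode(bits, old_items))
    return [1 if item in chosen else 0 for item in new_items]


if __name__ == "__main__":
    items = [0, 1, 2, 3]  # one decision variable per item in the dataset
    population = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0], [1, 0, 0, 1]]
    for ind in population:
        print(decode(ind, items), evaluate(decode(ind, items)))
    reduced = reduce_items(population, items)
    print("reduced items:", reduced)
    print([project(ind, items, reduced) for ind in population])
```

On this toy data, the non-dominated individuals select {1, 3} (support 3, utility 37) and {0, 3} (support 2, utility 40). Item 2 is chosen by neither, so the decision vector shrinks from four positions to three. A rule this greedy can also discard items that would have been useful later. That is the kind of over-reduction the abstract's repair strategy is said to handle.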


History
  • Received: 2021-11-04
  • Last revised: 2022-11-12
  • Accepted: 2022-05-17
  • Published online: 2022-06-13
  • Published: