Improved Neighborhood Space Based Feature Selection Algorithm for High-dimensional Mixed Data
Affiliation:

1. Nanjing University of Posts and Telecommunications; 2. Nanjing University of Finance and Economics

CLC Number:

TP18

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program)

    Abstract:

    As an important data preprocessing technique in the field of data mining, feature selection can effectively mitigate the “curse of dimensionality” caused by high-dimensional data. Nonetheless, how to perform feature selection on high-dimensional mixed data remains one of the focuses and difficulties of current research. Because the neighborhood rough set model, which is built on neighborhood relations, can handle mixed data in which categorical and numerical attributes coexist, it has been successfully applied to feature selection for mixed data. However, existing neighborhood rough sets still measure the neighborhood relation of mixed data by simply fusing a partition of categorical data based on an equivalence relation with a partition of numerical data based on a similarity relation. When features of high-dimensional mixed data are selected using the neighborhood space partitioned by the model and a predefined evaluation function, adaptability is poor. To this end, an improved method for constructing the neighborhood space is proposed on the basis of the neighborhood rough set model, and a corresponding neighborhood space measure is designed as a discriminant index that adaptively adjusts the size of the neighborhood granules in the neighborhood space. To accurately characterize the discrimination ability of the neighborhood space of high-dimensional mixed data, an evaluation function that considers overlapping boundary data and the size of the neighborhood space is designed. On this basis, a heuristic feature selection algorithm for high-dimensional mixed data is proposed. The validity and superiority of the proposed algorithm are verified on UCI standard datasets.
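
    For orientation, the conventional neighborhood relation that the abstract describes as a "simple fusion" can be sketched as follows. This is a minimal illustration of the classical baseline only, not the improved neighborhood space or the evaluation function proposed in the paper, neither of which is specified in the abstract. The function names, the fixed radius `delta`, the per-feature (Chebyshev-style) distance test, the assumption that numerical features are scaled to [0, 1], and the toy data are all hypothetical choices made for this sketch.

```python
from typing import List, Sequence, Set


def neighborhood(data: Sequence[Sequence], i: int, features: Sequence[int],
                 is_categorical: Sequence[bool], delta: float) -> Set[int]:
    """Indices of samples in the neighborhood of sample i on `features`.

    Categorical features must match exactly (equivalence relation);
    numerical features must differ by at most delta (similarity relation).
    """
    center = data[i]
    members = set()
    for j, row in enumerate(data):
        for f in features:
            if is_categorical[f]:
                if row[f] != center[f]:
                    break
            elif abs(row[f] - center[f]) > delta:
                break
        else:  # no feature rejected sample j
            members.add(j)
    return members


def dependency(data: Sequence[Sequence], labels: Sequence, features: Sequence[int],
               is_categorical: Sequence[bool], delta: float) -> float:
    """Share of samples whose whole neighborhood carries their own label."""
    pure = 0
    for i in range(len(data)):
        nb = neighborhood(data, i, features, is_categorical, delta)
        if all(labels[j] == labels[i] for j in nb):
            pure += 1
    return pure / len(data)


def greedy_select(data: Sequence[Sequence], labels: Sequence,
                  is_categorical: Sequence[bool], delta: float,
                  eps: float = 1e-9) -> List[int]:
    """Forward greedy search: add the feature that raises dependency the most."""
    selected: List[int] = []
    remaining = list(range(len(is_categorical)))
    best = 0.0
    while remaining:
        scored = [(dependency(data, labels, selected + [f], is_categorical, delta), f)
                  for f in remaining]
        # Prefer the highest score; break ties by the lowest feature index.
        score, f = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= eps:
            break  # no candidate improves the dependency any further
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected


# Hypothetical toy data: feature 0 is categorical, features 1 and 2 are numerical.
toy_data = [
    ["a", 0.10, 0.5],
    ["a", 0.15, 0.9],
    ["b", 0.80, 0.5],
    ["b", 0.85, 0.1],
    ["a", 0.90, 0.4],
    ["b", 0.12, 0.6],
]
toy_labels = [0, 0, 1, 1, 1, 0]
toy_is_categorical = [True, False, False]

selected = greedy_select(toy_data, toy_labels, toy_is_categorical, delta=0.2)
print(selected)
```

    In this sketch a single radius `delta` is shared by every numerical feature and categorical features contribute only an exact-match test, which is the kind of fixed fusion whose adaptability the abstract questions for high-dimensional mixed data.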

History
  • Received: 2022-05-08
  • Last revised: 2023-03-24
  • Accepted: 2022-11-09
  • Published online: 2022-11-25