Incremental attribute reduction algorithm based on neighborhood granulation conditional entropy
Control and Decision, 2019, Vol. 34, Issue (10): 2061-2072

### Cite this article

ZHAO Xiao-long, YANG Yan. Incremental attribute reduction algorithm based on neighborhood granulation conditional entropy[J]. Control and Decision, 2019, 34(10): 2061-2072. DOI: 10.13195/j.kzyjc.2018.0138.


Incremental attribute reduction algorithm based on neighborhood granulation conditional entropy
ZHAO Xiao-long 1, YANG Yan 2
1. College of Computer and Art, Anhui Technical College of Industry and Economy, Hefei 230051, China;
2. School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China
Abstract: Incremental attribute reduction is an important data mining method for dynamic data. Most existing incremental attribute reduction algorithms are constructed for discrete data, while studies on numeric data remain few. Therefore, an incremental attribute reduction algorithm for numeric information systems with constantly increasing objects is presented. Firstly, a hierarchical neighborhood computing method is established for numeric information systems, and incremental computation of neighborhood granulation based on this method is proposed. Then, on the basis of the incremental computation of neighborhood granulation, an incremental updating method for the neighborhood granulation conditional entropy is given, and a corresponding incremental attribute reduction algorithm is proposed based on this updating mechanism. Finally, experimental analysis shows that the proposed algorithm is more effective and efficient for incremental attribute reduction on numeric data.
Keywords: incremental learning    granular computing    attribute reduction    numeric data    neighborhood granulation    conditional entropy
0 Introduction

1 Preliminaries

Given a numeric information system S = (U, C∪D), for ∀B ⊆ C and x1, x2 ∈ U, the distance between x1 and x2 over B is defined as

d_B(x1, x2) = (∑_{a∈B} |f(x1, a) − f(x2, a)|^p)^{1/p},  (1)

and the δ-neighborhood granule of an object x ∈ U is

n_B^U(x) = {y ∈ U | d_B(x, y) ≤ δ}.  (2)

p is usually taken as 1, 2, or ∞; p = 2 gives the familiar Euclidean distance.

The distance function d satisfies:

1) d(x1, x2) ≥ 0, with d(x1, x2) = 0 only when x1 = x2;

2) d(x1, x2) = d(x2, x1);

3) d(x1, x3) ≤ d(x1, x2) + d(x2, x3).
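The Minkowski distance and the three metric properties above can be sketched as follows; the function name is illustrative, not from the paper.

```python
import math

def minkowski_distance(x1, x2, p=2):
    """Minkowski distance between two numeric feature vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance,
    and p=math.inf the Chebyshev (maximum) distance.
    """
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x1, x2))
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)

x, y, z = [0.0, 0.0], [3.0, 4.0], [1.0, 1.0]
print(minkowski_distance(x, y, p=2))         # Euclidean: 5.0
print(minkowski_distance(x, y, p=1))         # Manhattan: 7.0
print(minkowski_distance(x, y, p=math.inf))  # Chebyshev: 4.0

# Triangle inequality, property 3): d(x, y) <= d(x, z) + d(z, y)
assert minkowski_distance(x, y) <= minkowski_distance(x, z) + minkowski_distance(z, y)
```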

 (3)

 (4)

The significance of an attribute a_i ∈ C − red_c with respect to red_c is defined as

s_{red_c}(a_i) = E_U(D|red_c) − E_U(D|red_c ∪ {a_i}).  (5)

Step 1: Initialize red_c = ∅ and E_U(D|∅) = 1.

Step 2: For every a_i ∈ C − red_c, compute the significance of a_i with respect to red_c, where s_{red_c}(a_i) = E_U(D|red_c) − E_U(D|red_c ∪ {a_i}).

Step 3: Among all attributes a_i in Step 2, select the one with the largest significance, denoted a*.

Step 4: If s_{red_c}(a*) > 0, let red_c = red_c ∪ {a*} and return to Step 2; if s_{red_c}(a*) = 0, go to Step 5.

Step 5: Return the reduct red_c.
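The greedy forward selection of Steps 1-5 can be sketched as below. The paper's measure E_U(D|B) is given by Eq. (4), which is not reproduced in this extract; the `entropy` function here is a stand-in, (1/|U|²)·∑_i |n_B(x_i) ∪ [x_i]_D|, chosen only because it shares the two properties the search relies on: it equals 1 when B = ∅ and never increases as B grows. All names and the Euclidean distance with δ = 0.2 are illustrative assumptions.

```python
import math

def neighborhood(U, i, B, delta):
    """delta-neighborhood of object i over attribute subset B (Euclidean distance)."""
    x = U[i]
    return {j for j, y in enumerate(U)
            if (math.sqrt(sum((x[a] - y[a]) ** 2 for a in B)) if B else 0.0) <= delta}

def entropy(U, D, B, delta):
    """Stand-in for E_U(D|B): (1/|U|^2) * sum_i |n_B(x_i) union [x_i]_D|.
    Equals 1 for B = empty set and is non-increasing as B grows."""
    n = len(U)
    total = 0
    for i in range(n):
        d_class = {j for j in range(n) if D[j] == D[i]}   # decision class of x_i
        total += len(neighborhood(U, i, B, delta) | d_class)
    return total / (n * n)

def attribute_reduction(U, D, C, delta=0.2):
    """Greedy forward selection: repeatedly add the attribute of maximal
    significance until no remaining attribute has positive significance."""
    red = []
    while True:
        e_red = entropy(U, D, red, delta)
        best, best_sig = None, 0.0
        for a in C:
            if a in red:
                continue
            sig = e_red - entropy(U, D, red + [a], delta)  # significance of a w.r.t. red
            if sig > best_sig:
                best, best_sig = a, sig
        if best is None:          # s(a*) = 0: stop and return the reduct
            return red
        red.append(best)

# Attribute 0 alone separates the two decision classes at delta = 0.2,
# so attribute 1 adds no further significance and the reduct is [0].
U = [[0.1, 0.9], [0.15, 0.8], [0.9, 0.1], [0.85, 0.2]]
D = [0, 0, 1, 1]
print(attribute_reduction(U, D, [0, 1]))  # → [0]
```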

2 Incremental attribute reduction based on neighborhood granulation conditional entropy

2.1 Hierarchical incremental computation of neighborhood granulation

The sorting method proposed by Liu et al.[20] improves the computational efficiency of neighborhood granulation. This section further improves that method to obtain an efficient neighborhood granulation approach, which is then applied to the incremental computation of neighborhood granulation.

 (6)

 (7)

2) For an object x1 ∈ l1, it follows from Eq. (1) that d_B(x1, x) > δ for every x ∈ l_t with t ≥ 3. Hence, if x ∈ l1, the neighborhood granule of x can be computed as n_B^U(x) = {y ∈ l1 ∪ l2 | d_B(x, y) ≤ δ}.

3) For an object x1 ∈ l_m, it follows from Eq. (1) that d_B(x1, x) > δ for every x ∈ l_t with t ≤ m − 2. Hence, if x ∈ l_m, the neighborhood granule of x can be computed as n_B^U(x) = {y ∈ l_{m−1} ∪ l_m | d_B(x, y) ≤ δ}.

Fig. 1 Computation of the neighborhood granule of object x_i

l_t = {x ∈ U | (t − 1)δ ≤ d_B(x0, x) < tδ}, t = 1, 2, …, m,  (8)

where x0 denotes the origin object.
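The layered search above (objects bucketed by their distance to an origin object, so that only adjacent layers need be scanned) can be sketched as follows. The choice of origin and the Euclidean distance are illustrative assumptions; the zero vector stands in for the paper's x0.

```python
import math

def euclid(x, y, B):
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in B))

def layered_neighborhoods(U, B, delta, origin=None):
    """Layered neighborhood search: bucket each object into layer
    l_t = {x : (t-1)*delta <= d_B(x0, x) < t*delta} by its distance to an
    origin object x0.  By the triangle inequality, neighbors of an object in
    layer t can only lie in layers t-1, t, t+1, so each object is compared
    against three layers instead of the whole universe U."""
    if origin is None:
        origin = [0.0] * len(U[0])   # assumed origin; the paper fixes its own x0
    layers, layer_of = {}, []
    for x in U:
        t = int(euclid(origin, x, B) // delta)       # 0-based layer index
        layers.setdefault(t, []).append(len(layer_of))
        layer_of.append(t)
    granules = []
    for i, x in enumerate(U):
        t = layer_of[i]
        candidates = [j for s in (t - 1, t, t + 1) for j in layers.get(s, [])]
        granules.append({j for j in candidates if euclid(x, U[j], B) <= delta})
    return granules

U = [[0.0], [0.5], [1.0], [2.0]]
print(layered_neighborhoods(U, [0], 0.6))  # → [{0, 1}, {0, 1, 2}, {1, 2}, {3}]
```

The result agrees with a brute-force scan over all pairs, but the inner loop touches at most three layers per object.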

 (9)

 (10)

2.2 Incremental updating of the neighborhood granulation conditional entropy

 (11)

 (12)

2.3 Incremental attribute reduction algorithm

Since (−2n + 1)/(n + 1) ≤ (2(k − n) + 1)/(n + 1) ≤ 1/(n + 1), the value of (2(k − n) + 1)/(n + 1) may be positive, negative, or zero, and the same holds for E_U' − E_U. It can therefore be concluded that, after one object is added to the information system, the change in its neighborhood granulation conditional entropy is not determined: the entropy may increase, decrease, or remain unchanged. This conclusion extends directly to the case where multiple objects are added to the information system.
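The bound on the per-object term can be checked numerically; here k and n are assumed to be the integer counts appearing in the derivation, with 0 ≤ k ≤ n:

```python
# Check the bound (-2n+1)/(n+1) <= (2(k-n)+1)/(n+1) <= 1/(n+1) for every
# integer k with 0 <= k <= n.  The sign of the middle term varies with k,
# which is why E_U' - E_U can move in either direction.
def term(k, n):
    return (2 * (k - n) + 1) / (n + 1)

for n in range(1, 51):
    values = [term(k, n) for k in range(n + 1)]
    assert min(values) == (-2 * n + 1) / (n + 1)   # attained at k = 0
    assert max(values) == 1 / (n + 1)              # attained at k = n
    assert any(v < 0 for v in values) and any(v > 0 for v in values)
print("bounds hold for n = 1..50")
```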

Step 1: Initialize E_U'(D|C) = 1 and E_U'(D|red_c) = 1.

Step 2: From E_U(D|C) and E_U(D|red_c), incrementally compute E_U'(D|C) and E_U'(D|red_c) according to Theorem 4.

Step 3: Compare E_U'(D|C) with E_U'(D|red_c): if E_U'(D|C) < E_U'(D|red_c), go to Step 4; if E_U'(D|C) = E_U'(D|red_c), go to Step 7.

Step 4: For every a ∈ C − red_c, compute the neighborhood granulation conditional entropy E_U'(D|red_c ∪ {a}).

Step 5: Among the entropies computed in Step 4, select the attribute with the smallest E_U'(D|red_c ∪ {a}), denoted a*.

Step 6: If E_U'(D|red_c) − E_U'(D|red_c ∪ {a*}) > 0, let red_c = red_c ∪ {a*} and return to Step 4; otherwise go to Step 7.

Step 7: Let red_c' = red_c and return the reduct red_c'.
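The control flow of Steps 1-7 can be sketched as below. The `entropy` argument stands in for E_U'(D|·); in the paper it is updated incrementally via Theorem 4 rather than recomputed, and the `pair_entropy` used in the demonstration is a toy pair-counting measure, not the paper's Eq. (4). All names are illustrative assumptions.

```python
def pair_entropy(U, D, B):
    """Toy stand-in for E_U'(D|B): the fraction of ordered object pairs that
    agree on every attribute in B yet carry different decisions.
    It is 0 when B fully separates the decision classes."""
    n = len(U)
    bad = sum(1 for i in range(n) for j in range(n)
              if D[i] != D[j] and all(U[i][a] == U[j][a] for a in B))
    return bad / (n * n)

def incremental_reduction(entropy, U, D, C, red):
    """Control flow of Steps 1-7: after new objects arrive, keep the old
    reduct if it still matches the entropy of the full attribute set C;
    otherwise extend it greedily."""
    red = list(red)
    e_C = entropy(U, D, C)
    e_red = entropy(U, D, red)
    while e_C < e_red:                    # Step 3: old reduct no longer sufficient
        best = min((a for a in C if a not in red),
                   key=lambda a: entropy(U, D, red + [a]))  # Steps 4-5
        e_best = entropy(U, D, red + [best])
        if e_red - e_best <= 0:           # Step 6: no attribute helps further
            break
        red.append(best)
        e_red = e_best
    return red                            # Step 7

# Old table: rows 0-2, for which attribute 0 alone was a reduct.
# Adding row [0, 2] with decision 1 invalidates it, so attribute 1 is added.
U = [[0, 0], [0, 1], [1, 0], [0, 2]]
D = [0, 0, 1, 1]
print(incremental_reduction(pair_entropy, U, D, [0, 1], [0]))  # → [0, 1]
```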

3 Experimental analysis

3.1 Comparison with the non-incremental attribute reduction algorithm based on neighborhood granulation conditional entropy

Fig. 2 Comparison of attribute reduction time consumption between ARNGCE and IARNGCE

3.2 Comparison with incremental attribute reduction algorithms of the same type

Fig. 3 Comparison of attribute reduction time consumption among the five incremental algorithms

4 Conclusion

References

[1] Liu G L, Hua Z, Zou J Y. Local attribute reductions for decision tables[J]. Information Sciences, 2018, 422: 204-217. DOI: 10.1016/j.ins.2017.09.007
[2] Pawlak Z. Rough sets[J]. Int J of Parallel Programming, 1982, 11(5): 341-356.
[3] Gacek A. Granular modelling of signals: A framework of granular computing[J]. Information Sciences, 2013, 221(2): 1-11.
[4] Yao S, Xu F, Zhao P, et al. Feature selection algorithm based on neighborhood valued tolerance relation rough set model[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(5): 416-428. (in Chinese)
[5] Pan R L, Li Y Q, Zhang H L, et al. Fuzzy-rough attribute reduction algorithm based on information entropy[J]. Control and Decision, 2017, 32(2): 340-348. (in Chinese)
[6] Wang F, Liang J Y, Qian Y H. Attribute reduction: A dimension incremental strategy[J]. Knowledge-Based Systems, 2013, 39(2): 95-108.
[7] Chan C C. A rough set approach to attribute generalization in data mining[J]. Information Sciences, 1998, 107(1/2/3/4): 169-176.
[8] Jing Y G, Li T R, Huang J F, et al. An incremental attribute reduction approach based on knowledge granularity under the attribute generalization[J]. Int J of Approximate Reasoning, 2016, 76(9): 80-95.
[9] Liang J Y, Wang F, Dang C Y, et al. A group incremental approach to feature selection applying rough set technique[J]. IEEE Trans on Knowledge & Data Engineering, 2014, 26(2): 294-308.
[10] Jing Y G, Li T R, Fujita H, et al. An incremental attribute reduction approach based on knowledge granularity with a multi-granulation view[J]. Information Sciences, 2017, 411(7): 23-38.
[11] Jing Y G, Li T R, Luo C, et al. An incremental approach for attribute reduction based on knowledge granularity[J]. Knowledge-Based Systems, 2016, 104(7): 24-38.
[12] Raza M S, Qamar U. An incremental dependency calculation technique for feature selection using rough sets[J]. Information Sciences, 2016, 343/344: 41-65.
[13] Chen D G, Yang Y Y, Dong Z. An incremental algorithm for attribute reduction with variable precision rough sets[J]. Applied Soft Computing, 2016, 45: 129-149. DOI: 10.1016/j.asoc.2016.04.003
[14] Feng S R, Zhang D Z. Effective increment algorithm for attribute reduction[J]. Control and Decision, 2011, 26(4): 495-500. (in Chinese)
[15] Qian J, Zhu Y Y. An incremental attribute reduction algorithm for group objects[J]. CAAI Trans on Intelligent Systems, 2016, 11(4): 496-502. (in Chinese)
[16] Liu D, Li T R, Zhang J B. A rough set-based incremental approach for learning knowledge in dynamic incomplete information systems[J]. Int J of Approximate Reasoning, 2014, 55(8): 1764-1786. DOI: 10.1016/j.ijar.2014.05.009
[17] Shu W H, Shen H. Incremental feature selection based on rough set in dynamic incomplete data[J]. Pattern Recognition, 2014, 47(12): 3890-3906. DOI: 10.1016/j.patcog.2014.06.002
[18] Hu Q H, Yu D R, Liu J F, et al. Neighborhood rough set based heterogeneous feature subset selection[J]. Information Sciences, 2008, 178(18): 3577-3594. DOI: 10.1016/j.ins.2008.05.024
[19] Zhao H, Qin K Y. Mixed feature selection in incomplete decision table[J]. Knowledge-Based Systems, 2014, 57(2): 181-190.
[20] Liu Y, Huang W L, Jiang Y L, et al. Quick attribute reduct algorithm for neighborhood rough set model[J]. Information Sciences, 2014, 271(7): 65-81.
[21] Liang J Y, Shi Z Z. The information entropy, rough entropy and knowledge granulation in rough set theory[J]. Int J of Uncertainty, Fuzziness and Knowledge-Based Systems, 2004, 12(1): 37-46.
[22] Wang C Z, He Q, Shao M W, et al. A unified information measure for general binary relations[J]. Knowledge-Based Systems, 2017, 135(11): 18-28.
[23] Zhang J B, Li T R, Ruan D, et al. Neighborhood rough sets for dynamic data mining[J]. Int J of Intelligent Systems, 2012, 27(4): 317-342. DOI: 10.1002/int.21523
[24] Hu Q H, Pedrycz W, Yu D R, et al. Selecting discrete and continuous features based on neighborhood decision error minimization[J]. IEEE Trans on Systems, Man, and Cybernetics: Part B, 2010, 40(1): 137-150. DOI: 10.1109/TSMCB.2009.2024166