Abstract: To solve the problems of “chain data” and “high-dimensional, multi-topic, large-scale text streams” in data
stream clustering, a modified Squeezer clustering algorithm is proposed, which combines the idea of projected clustering
and redefines the class centroid, radius, and judging distance. A preprocessing stage is introduced to improve
performance significantly, and a projected clustering stage attaches semantics to the clusters so that they are easier to
interpret. Experiments on an Internet corpus show that the clustering results are significantly improved at a small cost in
speed, and that the proposed algorithm outperforms the original Squeezer algorithm.
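
For orientation, the following is a minimal sketch of the single-pass, threshold-based assignment that Squeezer-style algorithms rely on: each incoming document joins its most similar existing cluster, or opens a new one if no cluster is similar enough. It is an illustration under stated assumptions, not the modified algorithm proposed here. The abstract does not specify the redefined centroid, radius, or judging distance, nor the projection stage, so cosine similarity over raw term counts stands in for them. The names `cosine`, `one_pass_cluster`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def one_pass_cluster(docs, threshold=0.3):
    """Read documents once; join the most similar cluster or open a new one.

    Assumption: the cluster "centroid" is the sum of member term counts.
    Cosine similarity ignores scale, so the sum behaves like the mean here.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [doc indices]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: absorb the document into the existing cluster.
            best["centroid"].update(vec)
            best["members"].append(i)
        else:
            # No sufficiently similar cluster: start a new one.
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


if __name__ == "__main__":
    stream = ["stock market falls", "stock market rises", "football match tonight"]
    for c in one_pass_cluster(stream):
        print(c["members"])
```

Because each document is compared only against cluster summaries and never against earlier documents, the pass is linear in the stream length for a bounded number of clusters, which is what makes this family of methods attractive for text streams.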