Abstract: An under-sampling support vector machine (SVM) algorithm for unbalanced datasets, based on spectral clustering, is
presented. Majority instances are clustered by spectral clustering in kernel space in order to resample representative samples
with cluster information. The number of samples selected from each cluster depends on the size of the cluster and on the
distance from the cluster to the minority instances. This not only reduces the number of majority instances but also
improves SVM classification performance on unbalanced datasets. In the experiments, the
proposed approach is compared with other data-preprocessing methods for unbalanced dataset classification. The experimental
results show that the proposed method not only improves the classification performance of the SVM algorithm on the minority
class, but also increases overall classification performance and efficiency.
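
The per-cluster selection step described above can be illustrated with a minimal sketch. The abstract does not give the exact allocation formula, so several things here are assumptions: the quota of each cluster is taken to be proportional to its size divided by its kernel-space distance to the minority instances (closer clusters keep more samples), an RBF kernel is used, and the majority clusters are assumed to have been produced already by a spectral clustering step that is not shown. The names rbf, kernel_distance, allocate_quotas and undersample are hypothetical and do not come from the paper.

```python
import math
import random


def rbf(x, y, gamma=1.0):
    """RBF kernel value between two points given as sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(A, B, gamma=1.0):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))


def kernel_distance(cluster, minority, gamma=1.0):
    """Distance between the feature-space means of two point sets.

    Uses ||mu_A - mu_B||^2 = mean K(A, A) - 2 mean K(A, B) + mean K(B, B).
    """
    d2 = (mean_kernel(cluster, cluster, gamma)
          - 2.0 * mean_kernel(cluster, minority, gamma)
          + mean_kernel(minority, minority, gamma))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def allocate_quotas(clusters, minority, total, gamma=1.0, eps=1e-12):
    """Split roughly `total` samples across clusters.

    Assumed rule (not from the paper): weight = size / distance.
    Each quota is clipped to [1, cluster size], so the sum may differ
    slightly from `total` after rounding.
    """
    weights = [len(c) / (kernel_distance(c, minority, gamma) + eps)
               for c in clusters]
    norm = sum(weights)
    return [min(len(c), max(1, round(total * w / norm)))
            for c, w in zip(clusters, weights)]


def undersample(clusters, minority, total, gamma=1.0, seed=0):
    """Draw the allocated number of majority samples from each cluster."""
    rng = random.Random(seed)
    quotas = allocate_quotas(clusters, minority, total, gamma)
    selected = []
    for cluster, quota in zip(clusters, quotas):
        selected.extend(rng.sample(cluster, quota))
    return selected


if __name__ == "__main__":
    near = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    far = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    minority = [(0.5, 0.5), (1.5, 0.5)]
    print(allocate_quotas([near, far], minority, total=4))
    print(undersample([near, far], minority, total=4))
```

Under the assumed rule, two clusters of equal size receive different quotas when one lies closer to the minority class, which concentrates the retained majority samples near the likely decision boundary. The reduced majority set and the full minority set would then be passed to an ordinary SVM trainer.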