Abstract: Traditional sampling methods offer limited accuracy and robustness: under-sampling tends to discard important sample information, while oversampling tends to introduce redundant information. To address this, the Pima-Indians dataset, one of the commonly used imbalanced datasets in the UCI repository, is taken as an example to study the relationship between the inter-class distance, the intra-class distance, and the degree of imbalance, and a new oversampling method based on sample characteristics is presented. First, the algorithm divides the original dataset into several distance belts. Then, an improved adaptive-neighborhood SMOTE (Synthetic Minority Over-sampling Technique) algorithm based on sample characteristics is proposed to synthesize new samples in each class from several samples, and the method is extended to five other imbalanced UCI datasets. Finally, experiments are conducted with a traditional SVM classifier. The results show that, on the six imbalanced datasets, the proposed algorithm improves the classification accuracy of the minority or majority class samples compared with existing sampling methods, and is more robust.
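For orientation, the interpolation step of standard SMOTE, which the proposed method builds on, can be sketched as follows. This is a minimal illustration only: it does not implement the distance belts or the adaptive neighborhood described above, and the function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical choices made for this example.

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Plain SMOTE only; not the distance-belt variant of the paper."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    mate = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[mate] - X[base])
```

Each synthetic point lies on the line segment between two existing minority samples, which is why plain SMOTE can add redundant samples in dense regions, the issue the abstract attributes to oversampling.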