To improve the performance of proximal support vector machine, a new sparse proximal support vector machine is proposed in this paper where $\ell_0$-norm regularization is used to improve the feature selection ability of the new model. However, problem with $\ell_0$-norm regularization usually is NP-hard. To overcome this difficulty, a continuous nonconvex function is used to approximate $\ell_0$-norm. With proper DC decomposition, we transform the problem into a DC programming problem which can be solved efficiently by DC algorithm. Meanwhile, we also discuss the convergence properties of our algorithm. The experimental results on both simulated and real datasets demonstrate the efficiency of the proposed algorithms.
[1] Burges C J. A tutorial on support vector machines for pattern recognition[J]. Data Mining and Knowledge Discovery, 1998, 2:1-43.
[2] 唐静静, 田英杰. 基于间隔迁移的多视角支持向量机[J]. 运筹学学报, 2018, 22(3):1-14.
[3] 邵元海, 杨凯丽, 刘明增, 等. 从支持向量机到非平行支持向量机[J]. 运筹学学报, 2018, 22(2):55-65.
[4] Mangasarian O L, Wild E W. Multisurface proximal support vector machine classification via generalized eigenvalues[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28:69-74.
[5] Shao Y H, Deng N Y, Chen W J, et al. Improved generalized eigenvalue proximal support vector machine[J]. IEEE Signal Processing Letters, 2013, 20:213-216.
[6] Guarracino M R, Cifarelli C, Seref O, et al. A classification method based on generalized eigenvalue problems[J]. Optimization Methods and Software, 2007, 22:73-81.
[7] Zhang L, Zhou W. On the sparseness of $\ell_1$-norm of support vector machine[J]. Neural Networks, 2010, 23:373-385.
[8] Neumann J, Schnorr C, Steidl G. Combined SVM-based feature selection and classification[J]. Machine Learning, 2005, 61:129-150
[9] Weston J, Elisseeff A, Scholkopf B. Use of the zero-norm with linear models and methods[J]. Journal of Machine Learning Research, 2003, 3:1439-1461.
[10] Shao Y H, Li C N, Liu M Z, Wang Z, Deng N Y. Sparse $\ell_q$-norm least squares support vector machine with feature selection[J]. Pattern Recognition, 2018, 78:167-181.
[11] Le Thi H A, H Minh Le, Phan D N. Feature selection in machine learning:an exact penalty approach using a Difference of Convex function algorithm[J]. Machine Learning, 2015, 101:163-186.
[12] Gotoh J, Takeda A, Tono K. DC formulations and algorithms for sparse optimization problems[J]. Mathematical Programming, Ser. B, 2018, 169:141-176.
[13] Pappu V, Panagopoulos O P, Xanthopoulos P, et al. Sparse Proximal Support Vector Machines for feature selection in high dimensional datasets[J]. Expert Systems With Applications, 2015, 42:9183-9191.
[14] Le Thi H A, Phan D N. DC programming and DCA for sparse optimal scoring problem[J]. Neurocomputing, 2016, 186:178-181.
[15] Phan D N, Le Thi H A. Convex analysis approach to D.C. programming:theory, algorithms and applications[J]. Acta Mathematical Vietnamic, 1997, 22:289-355.
[14] Bazaraa M S, Sherali H D, Shetty C M. Nonlinear Programming Theory and Algorithms (Third Edition)[M]. New Jersey:John Wiley and Sons, 2006.
[16] Fan R E, Chang K W, Hsieh C J, et al. LIBLINEAR:A library for large linear classification[J]. Journal of Machine Learning Research, 2008, 9:1871-1874.
[17] Golub T R, Slonim D K, Tamayo P, et al. Molecular classification of cancer:class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286:531-537.
[18] Alon U, Barkai N, Notterman D A, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays[J]. Proceedings of the National Academy of Sciences, 1999, 96:6745-6750.
[19] Schlimmer J. Mushroom Records Drawn from the Audubon Society Field Guide to North American Mushrooms[M]. New York:GH Lincoff (Pres), 1981.
[20] Mesejo P, Pizarro D, Abergel A, et al. Computer-Aided Classification of Gastrointestinal Lesions in Regular Colonoscopy[J]. IEEE Transactions on Medical Imaging, 2016, 35:2051-2063.
[21] Sigillito V G, Wing S P, Hutton L V, et al. Classification of radar returns from the ionosphere using neural networks[J]. Johns Hopkins APL Technical Digest, 1989, 10:262-266.
[22] Street W N, Mangasarian O L, Wolberg W H. An inductive learning approach to prognostic prediction[C]//International Conference on Machine Learning, 1995, 522-530.
[23] Street W N, Wolberg W H, Mangasarian O L. Nuclear feature extraction for breast tumor diagnosis[C]//Proceedings of SPIE-The International Society for Optical Engineering, 1993, 861-870.