Comparison of Lung Cancer Classification Using Support Vector Machine and K-Nearest Neighbor
Keywords: K-Fold Cross Validation, K-Nearest Neighbor, Lung Cancer, Percentage Split, Support Vector Machine
Lung cancer is a condition in which cells in the lungs grow uncontrollably as a result of exposure to carcinogens. It is the leading cause of cancer death in men and the second leading cause in women. One way to reduce lung cancer mortality is early detection, which can be framed as a classification task. Classification is the process of identifying objects and grouping those with the same characteristics into predetermined classes. Two algorithms widely used for classification are Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). SVM can find a separating hyperplane that maximizes the margin between two or more classes, but it is difficult to apply to large datasets, whereas KNN can handle large-scale data and is resilient to noise in the data. This study aims to build models using the SVM and KNN algorithms to classify lung cancer. The lung cancer dataset contains 309 records, which are divided using the percentage split method and k-fold cross-validation for each algorithm. The models are evaluated using accuracy, precision, and recall. The highest accuracy, precision, and recall were obtained by the SVM algorithm with the percentage split method: 95.16%, 88%, and 82.5%, respectively. This indicates that the SVM algorithm with the percentage split method performs better in classifying lung cancer than the other algorithm and method combinations tested.
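The evaluation protocol described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the data are synthetic two-feature points rather than the 309-record lung cancer dataset, only a plain KNN classifier is shown (the SVM is omitted for brevity), and the 80/20 split, the 5 folds, and k = 3 are assumptions, since the abstract does not state these settings. All function names are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def percentage_split(n, test_fraction, rng):
    """Shuffle indices 0..n-1 and hold out a fixed fraction as the test set."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def k_fold_indices(n, n_folds, rng):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def run(X, y, train_idx, test_idx, k=3):
    """Fit-free KNN evaluation on one train/test partition."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
    return evaluate([y[i] for i in test_idx], preds)

# Synthetic stand-in data (assumption): two Gaussian clusters, 50 points per class.
rng = random.Random(0)
X, y = [], []
for label, centre in ((0, 0.0), (1, 3.0)):
    for _ in range(50):
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)

# Percentage split: one 80/20 partition (assumed ratio).
train_idx, test_idx = percentage_split(len(X), 0.2, rng)
split_scores = run(X, y, train_idx, test_idx)

# K-fold cross-validation: average the metrics over 5 folds (assumed fold count).
fold_scores = [run(X, y, tr, te) for tr, te in k_fold_indices(len(X), 5, rng)]
mean_fold_scores = tuple(sum(s[m] for s in fold_scores) / len(fold_scores)
                         for m in range(3))

print("percentage split (acc, prec, rec):", split_scores)
print("5-fold mean      (acc, prec, rec):", mean_fold_scores)
```

The percentage split scores a model on a single held-out partition, while k-fold cross-validation averages over several partitions, so the two methods can report different values for the same algorithm, as the results above reflect.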