Software Skills Identification: A Multi-Class Classification on Source Code Using Machine Learning

Dimitris Bamidis; Ilias Kalouptsoglou; Apostolos Ampatzoglou; Alexandros Chatzigeorgiou

doi:10.31354/globalce.v6iSI6.278

Dimitris Bamidis

University of Macedonia, Thessaloniki, Greece

Ilias Kalouptsoglou

University of Macedonia, Thessaloniki, Greece

Apostolos Ampatzoglou

University of Macedonia, Thessaloniki, Greece

Alexandros Chatzigeorgiou

University of Macedonia, Thessaloniki, Greece

Keywords

Machine Learning, Supervised Learning, Multi-class Classification, Neural Network, Transfer Learning, Source Code Analysis

Abstract

In the ever-evolving tech industry, accurately assessing the software skills of developers is critical for effective workforce management. This study presents a machine learning approach to classify software development knowledge through source code analysis, focusing on Java-based technologies. A dataset of several source code files from multiple domains of software development was compiled from public repositories and labeled for classification. The high performance achieved in this study, by applying transfer learning, underlines the suitability of pre-trained CodeBERT models for the classification of software skills.

The methodology combined both non-pretrained neural networks and pretrained models to enhance classification accuracy. Results validate the feasibility of using machine learning to identify developers' programming proficiencies, providing a foundation for sophisticated assessment tools. Future work aims to refine classification by incorporating functional task identification and commit-based analysis for a more comprehensive evaluation of coding skills. This study showcases the transformative potential of machine learning in streamlining developer assessments and advancing software engineering methodologies.
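As an illustration of the multi-class classification step described above, the following minimal sketch shows how per-class scores for one source file can be turned into a predicted skill label and how accuracy can be computed over a batch. It is not the authors' implementation: the label names, the example scores, and the helper functions `softmax`, `predict`, and `accuracy` are all hypothetical. It assumes that some upstream model (for example, a pre-trained encoder with a linear output layer) has already produced one raw score per class.

```python
import math

# Hypothetical skill labels; the abstract does not list the actual classes.
LABELS = ["web_backend", "mobile", "database"]


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def accuracy(batch_logits, true_labels):
    """Fraction of files whose predicted label matches the true label."""
    hits = sum(predict(l)[0] == t for l, t in zip(batch_logits, true_labels))
    return hits / len(true_labels)
```

Each source file receives exactly one label here, which is what distinguishes multi-class from multi-label classification; a file exercising several technologies would need a different output layer.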

Published

Dec 31, 2024

DOI https://doi.org/10.31354/globalce.v6iSI6.278

How to Cite

Bamidis, D., Kalouptsoglou, I., Ampatzoglou, A., & Chatzigeorgiou, A. (2024). Software Skills Identification: A Multi-Class Classification on Source Code Using Machine Learning. Global Clinical Engineering Journal, 6(SI6), 74–77. https://doi.org/10.31354/globalce.v6iSI6.278

Issue

Vol. 6 No. SI6 (2024): Special Issue 6: Selected Papers - 10th Panhellenic Conference on Biomedical Technology

Section

Conference paper

This work is licensed under a Creative Commons Attribution 4.0 International License.
