Investigating Aptitude in Learning Programming Language Using Machine Learning and Natural Language Processing

Muhammad Faisal Iqbal; Adeel Zafar; Umer Khalil; Afia Ishaq

doi:10.59461/ijdiic.v3i4.145

Authors

Muhammad Faisal Iqbal Department of Data Science & Cyber Security, Riphah International University, Islamabad, Pakistan https://orcid.org/0000-0003-3070-7691
Adeel Zafar Department of Data Science & Cyber Security, Riphah International University, Islamabad, Pakistan https://orcid.org/0000-0002-4826-7746
Umer Khalil ITC Faculty of Geo-Information Science & Earth Observation, University of Twente, Enschede, The Netherlands https://orcid.org/0000-0002-1095-3169
Afia Ishaq Department of Data Science & Cyber Security, Riphah International University, Islamabad, Pakistan

DOI:

https://doi.org/10.59461/ijdiic.v3i4.145

Keywords:

Education of Computing, Programming Language, Aptitude Predictors, Machine Learning, Natural Language Processing

Abstract

This study investigates the relationship between prerequisite courses and skill acquisition in programming education. It presents a case study that examines cognitive, natural language, and mathematical aptitude indicators as predictors of programming performance. Using data from 1,238 undergraduate students at Riphah International University, the research applies Machine Learning models to predict programming course scores, achieving high R² scores and low Mean Squared Error. A zero-shot text classification model identifies the required aptitude skills: 62% cognitive, 24% natural language, and 14% mathematical. These skills are then mapped to the predicted programming course scores, offering a new approach to understanding programming language aptitude. The study aims to bridge the gap between prerequisite courses and subsequent skill development, contributing insights for curriculum design in computing education.
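
As a reading aid for the two evaluation metrics named in the abstract, the following minimal sketch shows how R² and Mean Squared Error are conventionally computed for a regression model. The function names and the four score pairs are hypothetical placeholders chosen for illustration; they are not the study's data, code, or results.

```python
# Illustrative sketch only: the scores below are made-up placeholders,
# not data or results from the study.

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical course scores and model predictions (assumed values).
actual_scores = [60.0, 70.0, 80.0, 90.0]
predicted_scores = [62.0, 68.0, 81.0, 89.0]

mse = mean_squared_error(actual_scores, predicted_scores)
r2 = r2_score(actual_scores, predicted_scores)
print(f"MSE = {mse:.2f}, R^2 = {r2:.2f}")
```

An R² close to 1 together with a small Mean Squared Error indicates that predictions track the observed scores closely, which is the sense in which the abstract describes "high" and "low" values.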
