Analysis of the Role of Generative AI in Penetration Testing: A Case Study of a VulnHub Machine Using GPT-4.1 and Kali Linux

Jason Ho; Dimas Fajar Ramadhan; Alfa Renaldo Aluska

doi:10.62951/bridge.v3i3.502

Authors

Jason Ho, Institut Teknologi Sepuluh Nopember
Dimas Fajar Ramadhan, Institut Teknologi Sepuluh Nopember
Alfa Renaldo Aluska, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.62951/bridge.v3i3.502

Keywords:

Cybersecurity, GenAI, GPT-4.1, Penetration Testing, VulnHub

Abstract

Cybercrime is a growing threat, predicted to cause losses of up to US$10.5 trillion by 2025, and penetration testing (pentesting) has become a crucial strategy for identifying security vulnerabilities. Manual pentesting, however, is often time-consuming. This study analyzes the role, effectiveness, and challenges of Generative AI (GenAI), specifically GPT-4.1, in accelerating and optimizing the penetration testing process. It takes a qualitative approach, using a case study on the "PumpkinFestival" VulnHub machine in which GPT-4.1 is integrated into the Kali Linux environment through the ShellGPT tool. The results show that GPT-4.1 can significantly accelerate every stage of the pentest, from reconnaissance to exploitation. GenAI proved effective at analyzing scan results, composing specific payloads, and writing decryption scripts quickly and accurately. The study also fills a research gap by evaluating a newer AI model than those examined in previous studies. These findings imply that integrating GenAI into cybersecurity has great potential to increase the efficiency and effectiveness of security teams facing increasingly complex threats.
