Modeling and Benchmarking GraphRAG for Indonesian Legal Question Answering

Dea Nabila; Arbi Haza Nasution; Yohei Murakami; Stefan Koos; Ahmet Emre Ergun

Authors

Dea Nabila
Islamic University of Riau
https://orcid.org/0009-0006-7124-3351

Arbi Haza Nasution
Islamic University of Riau
https://orcid.org/0000-0001-6283-3217

Yohei Murakami
Ritsumeikan University

Stefan Koos
University of the Bundeswehr Munich

Ahmet Emre Ergun
Izmir Kâtip Çelebi University
https://orcid.org/0000-0002-3025-5640

Keywords:

Llama4-Maverick, GPT-4o, Large Language Models, Graph Retrieval-Augmented Generation, Question Answering, Indonesian Civil Law

Abstract

This study integrates Graph Retrieval-Augmented Generation (GraphRAG) into legal question answering for the Indonesian Civil Code (Kitab Undang-Undang Hukum Perdata, KUH Perdata). Unlike traditional RAG systems, GraphRAG uses graph-structured knowledge representations, stored here in Neo4j, to capture the hierarchical and relational nature of legal texts, enabling more precise and contextually faithful responses. Built on a source corpus of 2,128 legal articles, the Indonesian Legal GraphRAG model supports structured retrieval across the books, chapters, sections, and articles of the Civil Code. Several large language models (LLMs) of varying scales (very large, large, and mid-sized) were benchmarked with the RAGAs metrics of faithfulness, answer relevancy, and context entity recall. The results show that Llama4-Maverick outperforms GPT-4o on specific metrics such as faithfulness and contextual grounding. These findings highlight the effectiveness of graph-based retrieval modeling for improving factual consistency and contextual relevance in legal QA, and the study provides a new resource and benchmark for the Indonesian legal domain.
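To make the book, chapter, section, and article hierarchy concrete, the following is a minimal sketch in plain Python and not the authors' Neo4j implementation. It assumes a simple containment tree and uses keyword overlap as a stand-in for whatever retrieval step the paper actually applies. The names `Node`, `build_toy_graph`, and `retrieve` are hypothetical, and all labels and article texts are placeholders rather than content from the Civil Code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One unit of the legal hierarchy (hypothetical schema)."""
    kind: str                      # e.g. "Book", "Chapter", "Section", "Article"
    label: str
    text: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, kind, label, text=""):
        # Attach a child node and return it so calls can be chained.
        child = Node(kind, label, text, parent=self)
        self.children.append(child)
        return child


def path_to_root(node):
    # Walk parent links upward to recover the structural context of a node.
    path = []
    while node is not None:
        path.append(f"{node.kind} {node.label}")
        node = node.parent
    return list(reversed(path))


def articles(node):
    # Depth-first traversal yielding only Article nodes.
    if node.kind == "Article":
        yield node
    for child in node.children:
        yield from articles(child)


def retrieve(root, query, k=2):
    # Toy scoring: count query words that also occur in the article text.
    terms = set(query.lower().split())
    scored = []
    for art in articles(root):
        overlap = len(terms & set(art.text.lower().split()))
        if overlap:
            scored.append((overlap, art))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Return each hit together with its position in the hierarchy.
    return [{"path": path_to_root(a), "text": a.text} for _, a in scored[:k]]


def build_toy_graph():
    # Placeholder labels and texts only; not taken from the real corpus.
    code = Node("Code", "Toy Civil Code")
    book = code.add("Book", "B1")
    section = book.add("Chapter", "C1").add("Section", "S1")
    section.add("Article", "A1", "placeholder text about consent in a valid agreement")
    section.add("Article", "A2", "placeholder text about capacity of the parties")
    other = book.add("Chapter", "C2").add("Section", "S1")
    other.add("Article", "A3", "placeholder text about compensation for damage")
    return code


if __name__ == "__main__":
    for hit in retrieve(build_toy_graph(), "valid agreement consent"):
        print(" > ".join(hit["path"]))
        print("   ", hit["text"])
```

Returning the full path alongside the article text is the point of the sketch: the generator receives not only the matching article but also where it sits in the code, which is the kind of structural context a flat chunk store does not carry.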

Published

2026-03-27

Issue

Vol. 1 No. 1 (2026): Volume 1, Issue 1

Section

Articles

How to Cite

[1]

D. Nabila, A. H. Nasution, Y. Murakami, S. Koos, and A. E. Ergun, “Modeling and Benchmarking GraphRAG for Indonesian Legal Question Answering”, Artif. Intell. Lang. Models, vol. 1, no. 1, pp. 1–12, Mar. 2026, Accessed: Jun. 12, 2026. [Online]. Available: https://acspub.id/index.php/ailm/article/view/1