Yasir Lutfan bin Yusuf, Suhaila binti Saee
Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology
Issue: 3/2025
Journal: Acta Informatica Pragensia
DOI: 10.18267/j.aip.263
Keywords: Historical documents; Old newspapers; Accessibility; Question answering; Artificial intelligence; Retrieval augmented generation; LangChain
Objective: This study created a system that uses chatbot technology to answer user questions about the Sarawak Gazette.
Methods: The system sends each user query to a context retrieval system, then generates an answer from the retrieved contexts using a large language model (LLM). To evaluate the system, a question answering dataset was also created using an LLM, and its quality was assessed by 10 annotators.
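The retrieve-then-generate flow described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the bag-of-words retriever, the function names (`retrieve`, `build_prompt`, `answer`), the prompt wording and the placeholder passages are all assumptions, and the LLM is replaced by a stub callable.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (assumed retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, contexts):
    """Combine the retrieved contexts and the question (hypothetical wording)."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only these contexts:\n{joined}\nQuestion: {query}"


def answer(query, passages, llm, k=2):
    """Retrieve contexts, then hand the prompt to any callable LLM wrapper."""
    return llm(build_prompt(query, retrieve(query, passages, k)))


# Made-up placeholder passages, not text from the Sarawak Gazette.
passages = [
    "Placeholder passage about rice prices at the bazaar.",
    "Placeholder passage about river levels after heavy rain.",
    "Placeholder passage about a new school opening.",
]


def echo_llm(prompt):
    """Stub standing in for a real LLM call; it simply returns the prompt."""
    return prompt


result = answer("What happened to river levels?", passages, echo_llm, k=1)
```

In a real deployment, `echo_llm` would be replaced by a call to an actual model and the retriever would typically use dense embeddings rather than token counts.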
Results: The system achieved 55% higher precision and 42% higher recall than the previous state of the art in historical document question answering, while sacrificing only 11% of cosine similarity. Overall, the annotators rated the dataset 2.9 out of 3.
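The abstract does not state how precision, recall and cosine similarity were computed. As a hedged illustration, the sketch below assumes token-overlap precision and recall between a generated and a reference answer, and cosine similarity between two vectors (for example, answer embeddings). The function names are hypothetical.

```python
import math
from collections import Counter


def token_precision_recall(predicted, reference):
    """Token-overlap precision and recall (assumed definition)."""
    pred = Counter(predicted.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# A longer answer that contains the whole reference: full recall, half precision.
precision, recall = token_precision_recall("a b c d", "a b")
```

Under these assumed definitions, a system can gain precision and recall on answer tokens while its answers drift slightly from the reference in vector space, which is the kind of trade-off the Results describe.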
Conclusion: The system can answer the general public’s questions about the Sarawak Gazette in a more direct and friendly manner than traditional information retrieval methods. The methods developed in this study are also applicable to other Malaysian historical texts written in English. All code used in this study has been released on GitHub.