2024
Cite Score
4
AI summary
This paper introduces a methodology to implement, optimize, and evaluate RAG for Brazilian Portuguese using OpenAI's gpt-4, gpt-4-1106-preview, gpt-3.5-turbo-1106, and Google's Gemini Pro, achieving a 35.4% improvement in MRR@10 via retriever optimization and a maximum relative score of 98.61%.
Abstract
Retrieval Augmented Generation (RAG) has become one of the most popular paradigms for enabling LLMs to access external data, and it also serves as a grounding mechanism to mitigate hallucinations. Implementing RAG raises several challenges, including effective integration of retrieval models, efficient representation learning, data diversity, computational efficiency optimization, evaluation, and quality of text generation. New techniques to improve RAG appear every day, which makes it infeasible to experiment with every combination for a given problem. In this context, this paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language, focusing on establishing a simple pipeline for inference and experiments. We explored a diverse set of methods to answer questions about the first Harry Potter book. To generate the answers we used OpenAI's gpt-4, gpt-4-1106-preview, and gpt-3.5-turbo-1106, and Google's Gemini Pro. Focusing on the quality of the retriever, our approach improved MRR@10 by 35.4% compared to the baseline. When optimizing the input size in the application, we observed that a further gain of 2.4% is possible. Finally, we present the complete architecture of the RAG with our recommendations. As a result, we moved from a baseline of 57.88% to a maximum relative score of 98.61%.
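The abstract reports retriever quality as MRR@10. The sketch below shows how that metric is conventionally computed: for each question, take the reciprocal of the rank of the first relevant chunk within the top 10 retrieved results (0 if none appears), then average over all questions. It is not the paper's evaluation code; the function name `mrr_at_k`, the chunk identifiers, and the toy data are assumptions made for illustration.

```python
from typing import Hashable, Sequence, Set


def mrr_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant item within the top k.

    rankings[i] is the retriever's ordered result list for query i;
    relevant[i] is the set of ids judged relevant for query i.
    (Both structures are hypothetical, chosen for this illustration.)
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        # Only the first k results count; ranks are 1-based.
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break  # only the first relevant hit contributes
    return total / len(rankings)


# Toy data: query 0 finds its relevant chunk at rank 2, query 1 never does.
example_rankings = [["c3", "c7", "c1"], ["c2", "c9"]]
example_relevant = [{"c7"}, {"c4"}]
print(mrr_at_k(example_rankings, example_relevant))  # 0.25
```

With this definition, a relative improvement such as the reported 35.4% would be read as `(new_mrr - baseline_mrr) / baseline_mrr`, assuming the usual convention for relative gains.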