Building Information Retrieval (IR) Systems for Indic Languages using InstructLab

Guest Lecture

MVJ College of Engineering, Bangalore

Authors: Rudra Murthy

Abstract

Information Retrieval (IR) systems play a crucial role in organizing and accessing vast amounts of information efficiently. In this talk, I will give a brief overview of IR systems, covering fundamental concepts, key retrieval algorithms, and the metrics and methods used to evaluate such systems. I will then introduce Hindi-BEIR, a benchmark developed at IBM to evaluate IR systems for Hindi, and highlight the challenges and insights gained from this effort. Finally, I will discuss the English-Hindi Legal IR system, built to showcase the capabilities of InstructLab as a synthetic data generation framework for retrieval in the legal domain. This talk will offer takeaways for researchers and practitioners working on multilingual and domain-specific IR challenges.