Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages. In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025
- MILU: A Multi-task Indic Language Understanding Benchmark. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
- Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
- UR2N: Unified Retriever and ReraNker. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, Apr 2025
- Granite Embedding Models. arXiv preprint arXiv:2502.20204, Apr 2025
- Stereotype Detection as a Catalyst for Enhanced Bias Detection: A Multi-Task Learning Approach. In The 63rd Annual Meeting of the Association for Computational Linguistics, Apr 2025
- “You are Beautiful, Body Image Stereotypes are Ugly!” BIStereo: A Benchmark to Measure Body Image Stereotypes in Language Models. In The 63rd Annual Meeting of the Association for Computational Linguistics, Apr 2025
2024
- Towards understanding and mitigating the hallucinations in NLP and Speech. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), Apr 2024
- PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities. In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024
- Airavata: Introducing Hindi Instruction-tuned LLM. arXiv preprint arXiv:2401.15006, Aug 2024
- Mistral-SPLADE: LLMs for better Learned Sparse Retrieval. arXiv preprint arXiv:2408.11119, Aug 2024
- QUESTION GENERATION OVER TABLES AND TEXT. US Patent App. 18/193,975, Oct 2024
- Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks. arXiv preprint arXiv:2410.12972, Oct 2024
- SYSTEMS AND METHODS TO BUILD ONEQG: A UNIFIED QUESTION GENERATION SYSTEM ACROSS MODALITIES. US Patent App. 18/317,703, Nov 2024
2023
- Semi-Structured Object Sequence Encoders. Findings of the Association for Computational Linguistics: EMNLP 2023, Nov 2023
- Comparing DAE-based and MASS-based UNMT: Robustness to Word-Order Divergence in English–>Indic Language Pairs. In Proceedings of the 20th International Conference on Natural Language Processing, Dec 2023
- StarCoder: may the source be with you! Transactions on Machine Learning Research, Dec 2023
- Prompting with Pseudo-Code Instructions. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
- Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying. In The 7th Workshop on Online Abuse and Harms (WOAH), Dec 2023
- Modelling Political Aggression on Social Media Platforms. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Dec 2023
- A Study of Multilingual versus Meta-Learning for Language Model Pre-Training for Adaptation to Unseen Low Resource Languages. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, Dec 2023
2022
- Simple measures of bridging lexical divergence help unsupervised neural machine translation for low-resource languages. Machine Translation, Dec 2022
- HiNER: A Large Hindi Named Entity Recognition Dataset. Proceedings of the Thirteenth Language Resources and Evaluation Conference, Dec 2022
- Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Dec 2022
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish. In Findings of the Association for Computational Linguistics: EMNLP 2022, Dec 2022
2021
- Scrambled Translation Problem: A Problem of Denoising UNMT. In Proceedings of Machine Translation Summit XVIII: Research Track, Dec 2021
- Cognitively Aided Zero-Shot Automatic Essay Grading. Proceedings of the 17th International Conference on Natural Language Processing (ICON), Dec 2021
- Crosslingual Embeddings are Essential in UNMT for Distant Languages: An English to IndoAryan Case Study. In Proceedings of Machine Translation Summit XVIII: Research Track, Dec 2021
- Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Dec 2021
- Language Model Pretraining and Transfer Learning for Very Low Resource Languages. In Proceedings of the Sixth Conference on Machine Translation, Dec 2021
2020
- A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages. In Young Researchers’ Symposium, Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Dec 2020
- Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Dec 2020
- Looking inside noun compounds: Unsupervised prepositional and free paraphrasing. In Findings of the Association for Computational Linguistics: EMNLP 2020, Dec 2020
2019
- Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages. In 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Dec 2019
2018
- Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Dec 2018
- Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Dec 2018
2017
- Identifying Raga Similarity in Hindustani Classical Music through Distributed Representation of Raga Names. In Proceedings of the 13th International Symposium on CMMR, Dec 2017
2016
- A deep learning solution to Named Entity Recognition. In International Conference on Intelligent Text Processing and Computational Linguistics, Dec 2016
2015
- Unsupervised most frequent sense detection using word embeddings. In Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies, Dec 2015
- Using Word Embeddings for Bilingual Unsupervised WSD. In Proceedings of the 12th International Conference on Natural Language Processing, Dec 2015