Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- UR2N: Unified Retriever and ReraNker. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, 2025
- Granite Embedding Models. arXiv preprint arXiv:2502.20204, 2025
2024
- Towards understanding and mitigating the hallucinations in NLP and Speech. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 2024
- PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities. arXiv preprint arXiv:2401.07078, 2024
- Airavata: Introducing Hindi Instruction-tuned LLM. arXiv preprint arXiv:2401.15006, 2024
- Do LLMs understand Pragmatics? An Extensive Benchmark for Evaluating Pragmatic Understanding of LLMs. 2024
- INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages. arXiv preprint arXiv:2407.13522, 2024
- Mistral-SPLADE: LLMs for better Learned Sparse Retrieval. arXiv preprint arXiv:2408.11119, 2024
- QUESTION GENERATION OVER TABLES AND TEXT. Oct 2024. US Patent App. 18/193,975
- Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks. arXiv preprint arXiv:2410.12972, Oct 2024
- MILU: A Multi-task Indic Language Understanding Benchmark. arXiv preprint arXiv:2411.02538, Oct 2024
- Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5. arXiv preprint arXiv:2409.05401, Oct 2024
- SYSTEMS AND METHODS TO BUILD ONEQG: A UNIFIED QUESTION GENERATION SYSTEM ACROSS MODALITIES. Nov 2024. US Patent App. 18/317,703
2023
- Semi-Structured Object Sequence Encoders. arXiv preprint arXiv:2301.01015, Nov 2023
- Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT. arXiv preprint arXiv:2303.01191, Nov 2023
- StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161, Nov 2023
- Prompting with Pseudo-Code Instructions. arXiv preprint arXiv:2305.11790, Nov 2023
- Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying. In The 7th Workshop on Online Abuse and Harms (WOAH), Nov 2023
- Modelling Political Aggression on Social Media Platforms. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Nov 2023
- A Study of Multilingual versus Meta-Learning for Language Model Pre-Training for Adaptation to Unseen Low Resource Languages. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, Nov 2023
2022
- Simple measures of bridging lexical divergence help unsupervised neural machine translation for low-resource languages. Machine Translation, Nov 2022
- HiNER: A Large Hindi Named Entity Recognition Dataset. arXiv preprint arXiv:2204.13743, Nov 2022
- Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages. arXiv preprint arXiv:2212.10168, Nov 2022
- On Utilizing Constituent Language Resources to Improve Downstream Tasks in Hinglish. In Findings of the Association for Computational Linguistics: EMNLP 2022, Nov 2022
2021
- Scrambled Translation Problem: A Problem of Denoising UNMT. In Proceedings of Machine Translation Summit XVIII: Research Track, Nov 2021
- Cognitively Aided Zero-Shot Automatic Essay Grading. arXiv preprint arXiv:2102.11258, Nov 2021
- Crosslingual Embeddings are Essential in UNMT for Distant Languages: An English to IndoAryan Case Study. In Proceedings of Machine Translation Summit XVIII: Research Track, Nov 2021
- Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021
- Language Model Pretraining and Transfer Learning for Very Low Resource Languages. In Proceedings of the Sixth Conference on Machine Translation, Nov 2021
2020
- A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages. In Young Researchers’ Symposium, Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Nov 2020
- Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Nov 2020
- Looking inside noun compounds: Unsupervised prepositional and free paraphrasing. In Findings of the Association for Computational Linguistics: EMNLP 2020, Nov 2020
2019
- Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages. In 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Nov 2019
2018
- Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Nov 2018
- Improving NER Tagging Performance in Low-Resource Languages via Multilingual Learning. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Nov 2018
2017
- Identifying Raga Similarity in Hindustani Classical Music through Distributed Representation of Raga Names. In Proceedings of the 13th International Symposium on CMMR, Nov 2017
2016
- A deep learning solution to Named Entity Recognition. In International Conference on Intelligent Text Processing and Computational Linguistics, Nov 2016
2015
- Unsupervised most frequent sense detection using word embeddings. In DENVER, Nov 2015
- Using Word Embeddings for Bilingual Unsupervised WSD. In Proceedings of the 12th International Conference on Natural Language Processing, Nov 2015