| Main Authors: | Pattnayak, Priyaranjan, Chowdhuri, Sanchari |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.16832 |
Similar Items
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
by: Pattnayak, Priyaranjan, et al.
Published: (2026)
Tokenization Matters: Improving Zero-Shot NER for Indic Languages
by: Pattnayak, Priyaranjan, et al.
Published: (2025)
LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
by: Pattnayak, Priyaranjan, et al.
Published: (2026)
SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation
by: Pattnayak, Priyaranjan
Published: (2026)
LLM for Barcodes: Generating Diverse Synthetic Data for Identity Documents
by: Patel, Hitesh Laxmichand, et al.
Published: (2024)
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
by: KJ, Sankalp, et al.
Published: (2025)
Clinical QA 2.0: Multi-Task Learning for Answer Extraction and Categorization
by: Pattnayak, Priyaranjan, et al.
Published: (2025)
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
by: Juvekar, Kush, et al.
Published: (2026)
IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
by: Dawar, Aviral, et al.
Published: (2026)
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
by: Liu, Fan, et al.
Published: (2024)
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems
by: Meghwani, Hansa, et al.
Published: (2025)
BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains
by: Devane, Vijay, et al.
Published: (2025)
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
by: Cantini, Riccardo, et al.
Published: (2024)
Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
by: Panchal, Mihir, et al.
Published: (2026)
TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages
by: Akinode, Victor, et al.
Published: (2026)
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)
IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
by: Aravapalli, Akhilesh, et al.
Published: (2024)
NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages
by: Tomar, Lakshya, et al.
Published: (2026)
BhashaKritika: Building Synthetic Pretraining Data at Scale for Indic Languages
by: Manoj, Guduru, et al.
Published: (2025)
Multilingual State Space Models for Structured Question Answering in Indic Languages
by: Vats, Arpita, et al.
Published: (2025)
IndicEval: A Bilingual Indian Educational Evaluation Framework for Large Language Models
by: Bharti, Saurabh, et al.
Published: (2026)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
by: Cantini, Riccardo, et al.
Published: (2025)
IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages
by: Nigam, Shubham Kumar, et al.
Published: (2026)
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
by: Zhou, Weikang, et al.
Published: (2024)
Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts
by: Agarwal, Amit, et al.
Published: (2024)
Multilingual Coreference Resolution in Low-resource South Asian Languages
by: Mishra, Ritwik, et al.
Published: (2024)
SPRING Lab IITM's submission to Low Resource Indic Language Translation Shared Task
by: Sayed, Hamees, et al.
Published: (2024)
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
by: Bi, Zhenyu, et al.
Published: (2025)
JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
by: Luo, Weidi, et al.
Published: (2024)
Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples
by: Ghosal, Soumya Suvra, et al.
Published: (2025)
Jailbreaking to Jailbreak
by: Kritz, Jeremy, et al.
Published: (2025)
THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models
by: Ma, Zhiqing, et al.
Published: (2026)
JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
by: Chen, Dongping, et al.
Published: (2024)
Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)
LLM Jailbreak Detection for (Almost) Free!
by: Chen, Guorui, et al.
Published: (2025)
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
by: Zhang, Wenbo, et al.
Published: (2026)
Decoding the Diversity: A Review of the Indic AI Research Landscape
by: KJ, Sankalp, et al.
Published: (2024)
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
by: Xu, Zhao, et al.
Published: (2024)
Playing Language Game with LLMs Leads to Jailbreaking
by: Peng, Yu, et al.
Published: (2024)