| Main Authors: | Kim, Eunsu, Park, Junyeong, An, Na Min, Kim, Junseong, Patel, Hitesh Laxmichand, Jin, Jiho, Kruk, Julia, Agarwal, Amit, Panda, Srikant, Ilasariya, Fenal Ashokbhai, Shim, Hyunjung, Oh, Alice |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.22787 |
Similar Items
AccessEval: Benchmarking Disability Bias in Large Language Models
by: Panda, Srikant, et al.
Published: (2025)
Who's Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs
by: Hari, Vishnu, et al.
Published: (2025)
Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation
by: Pattnayak, Priyaranjan, et al.
Published: (2025)
Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems
by: Meghwani, Hansa, et al.
Published: (2025)
Clinical QA 2.0: Multi-Task Learning for Answer Extraction and Categorization
by: Pattnayak, Priyaranjan, et al.
Published: (2025)
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation
by: Kim, Eunsu, et al.
Published: (2025)
Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy
by: Pattnayak, Priyaranjan, et al.
Published: (2024)
DAIQ: Auditing Demographic Attribute Inference from Question in LLMs
by: Panda, Srikant, et al.
Published: (2025)
Tokenization Matters: Improving Zero-Shot NER for Indic Languages
by: Pattnayak, Priyaranjan, et al.
Published: (2025)
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
by: Bayramli, Zahra, et al.
Published: (2025)
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
by: Son, Guijin, et al.
Published: (2025)
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)
LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
by: Pattnayak, Priyaranjan, et al.
Published: (2026)
Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts
by: Agarwal, Amit, et al.
Published: (2024)
When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts
by: Kim, Jun Seong, et al.
Published: (2025)
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)
FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
by: Dua, Karan, et al.
Published: (2025)
LLM for Barcodes: Generating Diverse Synthetic Data for Identity Documents
by: Patel, Hitesh Laxmichand, et al.
Published: (2024)
RECOR: Reasoning-focused Multi-turn Conversational Retrieval Benchmark
by: Ali, Mohammed, et al.
Published: (2026)
MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
by: Park, Seojeong, et al.
Published: (2024)
Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents
by: Oh, Juhyun, et al.
Published: (2025)
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
by: Agarwal, Amit, et al.
Published: (2025)
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
by: Kim, Eunsu, et al.
Published: (2025)
Aligning LLMs for Multilingual Consistency in Enterprise Applications
by: Agarwal, Amit, et al.
Published: (2025)
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
by: Song, Seyoung, et al.
Published: (2025)
Scribble-Guided Diffusion for Training-free Text-to-Image Generation
by: Lee, Seonho, et al.
Published: (2024)
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
by: Kim, Jiwook, et al.
Published: (2024)
TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation
by: Park, NaHyeon, et al.
Published: (2024)
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
by: Kim, Eunsu, et al.
Published: (2024)
Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
by: Shafayat, Sheikh, et al.
Published: (2024)
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
by: Oh, Juhyun, et al.
Published: (2024)
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
by: Kim, Dongseob, et al.
Published: (2025)
I0T: Embedding Standardization Method Towards Zero Modality Gap
by: An, Na Min, et al.
Published: (2024)
PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation
by: Choi, Jiho, et al.
Published: (2025)
JuICE: A Benchmark for Evaluating LLM-Judge in Identifying Cultural Errors
by: Jin, Jiho, et al.
Published: (2026)
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
by: Lee, Seonho, et al.
Published: (2025)
R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs
by: Jo, Sumin, et al.
Published: (2025)
Directional Textual Inversion for Personalized Text-to-Image Generation
by: Kim, Kunhee, et al.
Published: (2025)
Rethinking the Use of Vision Transformers for AI-Generated Image Detection
by: Park, NaHyeon, et al.
Published: (2025)