| Main Authors: | Eo, Jeyeon; Kim, Joo Young; Ju, Ran; Jung, Minyoung; Lee, Unggi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28089 |
Similar Items
A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription
by: Lee, Unggi, et al.
Published: (2026)
PersonalHomeBench: Evaluating Agents in Personalized Smart Homes
by: Bharadwaj, Manasa, et al.
Published: (2026)
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
by: Mukhopadhyay, Srija, et al.
Published: (2025)
MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents
by: Im, Youngmin, et al.
Published: (2025)
MultiTab: A Comprehensive Benchmark Suite for Multi-Dimensional Evaluation in Tabular Domains
by: Lee, Kyungeun, et al.
Published: (2025)
Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks
by: Park, Yeji, et al.
Published: (2025)
Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
by: Lee, Unggi, et al.
Published: (2025)
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
by: Lim, Taein, et al.
Published: (2026)
PsychiatryBench: A Multi-Task Benchmark for LLMs in Psychiatry
by: Fouda, Aya E., et al.
Published: (2025)
FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
by: Lee, Royson, et al.
Published: (2025)
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
by: Choi, Jae-Woo, et al.
Published: (2024)
EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
by: Ran, Dongchuan, et al.
Published: (2026)
TaskBench: Benchmarking Large Language Models for Task Automation
by: Shen, Yongliang, et al.
Published: (2023)
PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
by: Nie, Hongyi, et al.
Published: (2026)
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)
OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks
by: Peng, Zhihao, et al.
Published: (2025)
Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses
by: Eo, Sugyeong, et al.
Published: (2026)
BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
by: Yoon, Sangyeon, et al.
Published: (2026)
KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs
by: Kim, Haechan, et al.
Published: (2026)
LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health
by: Tian, Ye, et al.
Published: (2026)
RoboBuddy in the Classroom: Exploring LLM-Powered Social Robots for Storytelling in Learning and Integration Activities
by: Tozadore, Daniel, et al.
Published: (2025)
OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents
by: Hu, Yulin, et al.
Published: (2026)
DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
by: Wang, Erchi, et al.
Published: (2026)
HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
by: Lee, Youngwan, et al.
Published: (2025)
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
by: Jiang, Hongchao, et al.
Published: (2025)
ArgBench: Benchmarking LLMs on Computational Argumentation Tasks
by: Ajjour, Yamen, et al.
Published: (2026)
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
by: Mai, Tho, et al.
Published: (2026)
XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
by: Kabir, Mohsinul, et al.
Published: (2026)
Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains
by: Lee, Kyungeun, et al.
Published: (2024)
PILOT-Bench: A Benchmark for Legal Reasoning in the Patent Domain with IRAC-Aligned Classification Tasks
by: Jang, Yehoon, et al.
Published: (2026)
POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents
by: Zheng, Qiaoyuan, et al.
Published: (2026)
AutiHero: Engaging Parents in Creating Personalized, Multi-path Social Narratives for Autistic Children
by: Lee, Jungeun, et al.
Published: (2025)
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
by: Woo, Young Beom, et al.
Published: (2025)
Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs
by: Lee, Insu, et al.
Published: (2025)
LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations
by: Pham, Viet-Thanh, et al.
Published: (2026)
MetaCLBench: Meta Continual Learning Benchmark on Resource-Constrained Edge Devices
by: Li, Sijia, et al.
Published: (2025)
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
by: Li, Xiangyi, et al.
Published: (2026)
KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
by: Wu, Tingyu, et al.
Published: (2026)
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning
by: Jung, Min Jae, et al.
Published: (2024)