| Main authors: | Shen, Lei; Shen, Xiaoyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2504.18373 |
Similar items
SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding
by: Elleuch, Haroun, et al.
Published: (2026)
Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
by: Mok, Jisoo, et al.
Published: (2025)
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
by: Zhao, Zheng, et al.
Published: (2025)
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
by: Shen, Zhuokang, et al.
Published: (2026)
PersonalHomeBench: Evaluating Agents in Personalized Smart Homes
by: Bharadwaj, Manasa, et al.
Published: (2026)
Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues
by: Sun, Lei, et al.
Published: (2024)
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
by: Ding, Xuanwen, et al.
Published: (2025)
MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
by: Selvakumar, Ramaneswaran, et al.
Published: (2025)
PTCBENCH: Benchmarking Contextual Stability of Personality Traits in LLM Systems
by: Yu, Jiongchi, et al.
Published: (2026)
AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
by: Oh, Gyutaek, et al.
Published: (2025)
ES-MemEval: Benchmarking Conversational Agents on Personalized Long-Term Emotional Support
by: Chen, Tiantian, et al.
Published: (2026)
Aligning VLM Assistants with Personalized Situated Cognition
by: Li, Yongqi, et al.
Published: (2025)
MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction
by: Wang, Xiao, et al.
Published: (2025)
TravelAgent: An AI Assistant for Personalized Travel Planning
by: Chen, Aili, et al.
Published: (2024)
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
by: Zhang, Enze, et al.
Published: (2025)
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
by: Wang, Siyuan, et al.
Published: (2024)
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
by: Wang, Junyang, et al.
Published: (2024)
Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task
by: Wang, Zilong, et al.
Published: (2025)
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
by: Zhu, Yaochen, et al.
Published: (2023)
How Does Personalized Memory Shape LLM Behavior? Benchmarking Rational Preference Utilization in Personalized Assistants
by: Feng, Xueyang, et al.
Published: (2026)
Large Language Models Empowered Personalized Web Agents
by: Cai, Hongru, et al.
Published: (2024)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
by: Trirat, Patara, et al.
Published: (2024)
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration
by: Zhou, Yucheng, et al.
Published: (2025)
AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework
by: Fan, Meihao, et al.
Published: (2024)
AutoPatent: A Multi-Agent Framework for Automatic Patent Generation
by: Wang, Qiyao, et al.
Published: (2024)
SALAD: Smart AI Language Assistant Daily
by: Nihal, Ragib Amin, et al.
Published: (2024)
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
by: Li, Ziming, et al.
Published: (2024)
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
by: Lei, Yang, et al.
Published: (2023)
LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation
by: Duan, Feiyu, et al.
Published: (2026)
Talking to Data: Designing Smart Assistants for Humanities Databases
by: Sergeev, Alexander, et al.
Published: (2025)
Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework
by: Dehghani, Zeinab, et al.
Published: (2026)
Fusion-Eval: Integrating Assistant Evaluators with LLMs
by: Shu, Lei, et al.
Published: (2023)
EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences
by: Shen, Jocelyn, et al.
Published: (2024)
GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning
by: Cheng, Xiang, et al.
Published: (2026)
Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents
by: Shen, Yiting, et al.
Published: (2026)
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
by: Shen, Yujiong, et al.
Published: (2026)
SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation
by: Wan, Yuwei, et al.
Published: (2024)
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
by: Adelani, David Ifeoluwa, et al.
Published: (2023)
EduAgentQG: A Multi-Agent Workflow Framework for Personalized Question Generation
by: Jia, Rui, et al.
Published: (2025)
PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation
by: Lu, Zhixiang, et al.
Published: (2025)