| Main Authors: | Hu, Jinpeng, Dong, Tengteng, Gang, Luo, Ma, Hui, Zou, Peng, Sun, Xiao, Guo, Dan, Yang, Xun, Wang, Meng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.05721 |
Similar Items
Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning
by: Dai, Chongyuan, et al.
Published: (2025)
Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors
by: Li, Jia, et al.
Published: (2025)
AgentMental: An Interactive Multi-Agent Framework for Explainable and Adaptive Mental Health Assessment
by: Hu, Jinpeng, et al.
Published: (2025)
Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning
by: Wei, Lei, et al.
Published: (2026)
In-Context Examples Matter: Improving Emotion Recognition in Conversation with Instruction Tuning
by: Ma, Hui, et al.
Published: (2025)
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
by: Hu, Taojun, et al.
Published: (2024)
Understanding Layer Significance in LLM Alignment
by: Shi, Guangyuan, et al.
Published: (2024)
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
by: Dong, Honghua, et al.
Published: (2025)
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems
by: Zou, Anni, et al.
Published: (2024)
Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback
by: Zhu, Shijing, et al.
Published: (2025)
CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Causal Significance and Consistency
by: Wang, Kangsheng, et al.
Published: (2024)
Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
by: Mohamed, Amr, et al.
Published: (2025)
LLM Hallucination Detection: HSAD
by: Li, JinXin, et al.
Published: (2025)
ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction
by: Jin, Yiqiao, et al.
Published: (2025)
LLM-based NLG Evaluation: Current Status and Challenges
by: Gao, Mingqi, et al.
Published: (2024)
Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents
by: He, Chengbo, et al.
Published: (2024)
MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors
by: Luo, Xiaotian, et al.
Published: (2026)
Explaining Length Bias in LLM-Based Preference Evaluations
by: Hu, Zhengyu, et al.
Published: (2024)
LLM-Guided Strategy Synthesis for Scalable Equality Saturation
by: Yin, Chenyun, et al.
Published: (2026)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
by: Yoo, Haneul, et al.
Published: (2024)
Code Fingerprints: Disentangled Attribution of LLM-Generated Code
by: Guo, Jiaxun, et al.
Published: (2026)
HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
by: Peng, Weihan, et al.
Published: (2026)
Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)
WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
by: Zhang, Binbin, et al.
Published: (2025)
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
by: Cao, Yixin, et al.
Published: (2025)
Are LLM-based Evaluators Confusing NLG Quality Criteria?
by: Hu, Xinyu, et al.
Published: (2024)
Citation-Enhanced Generation for LLM-based Chatbots
by: Li, Weitao, et al.
Published: (2024)
Skill-Conditioned Gated Self-Distillation for LLM Reasoning
by: Huang, Jiazhen, et al.
Published: (2026)
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning
by: Meng, Silin, et al.
Published: (2024)
LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models
by: Yang, Hang, et al.
Published: (2024)
DuanzAI: Slang-Enhanced LLM with Prompt for Humor Understanding
by: Rohn, Yesian
Published: (2024)
BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation
by: Sun, Peng, et al.
Published: (2026)
Understanding LLM Embeddings for Regression
by: Tang, Eric, et al.
Published: (2024)
IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation
by: Lin, Fan, et al.
Published: (2024)
Exploring LLM Multi-Agents for ICD Coding
by: Li, Rumeng, et al.
Published: (2024)
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
by: Yang, Dongjie, et al.
Published: (2024)
MIRAI: Evaluating LLM Agents for Event Forecasting
by: Ye, Chenchen, et al.
Published: (2024)
HuggingGraph: Understanding the Supply Chain of LLM Ecosystem
by: Rahman, Mohammad Shahedur, et al.
Published: (2025)
HTAA: Enhancing LLM Planning via Hybrid Toolset Agentization & Adaptation
by: Huang, Chengrui, et al.
Published: (2026)