| Main Authors: | Lu, Yi-Fan, Mao, Xian-Ling, Lan, Tian, Zhang, Tong, Zhu, Yu-Shi, Huang, Heyan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.03303 |
Similar Items
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
by: Lu, Yi-Fan, et al.
Published: (2024)
EXCEEDS: Extracting Complex Events via Nugget-based Grid Modeling in Scientific Domain
by: Lu, Yi-Fan, et al.
Published: (2024)
Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines
by: Ma, Zi-Ao, et al.
Published: (2024)
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
by: Tu, Rong-Cheng, et al.
Published: (2024)
DeepSurvey-Bench: Evaluating Academic Value of Automatically Generated Scientific Survey
by: Zhang, Guo-Biao, et al.
Published: (2026)
Mix-Initiative Response Generation with Dynamic Prefix Tuning
by: Nie, Yuxiang, et al.
Published: (2024)
CriticEval: Evaluating Large Language Model as Critic
by: Lan, Tian, et al.
Published: (2024)
T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation
by: Ma, Zi-Ao, et al.
Published: (2025)
Training Language Models to Critique With Multi-agent Feedback
by: Lan, Tian, et al.
Published: (2024)
MMWOZ: Building Multimodal Agent for Task-oriented Dialogue
by: Yang, Pu-Hai, et al.
Published: (2025)
A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
by: Lan, Tian, et al.
Published: (2025)
Word Matters: What Influences Domain Adaptation in Summarization?
by: Li, Yinghao, et al.
Published: (2024)
Building Knowledge-Grounded Dialogue Systems with Graph-Based Semantic Modeling
by: Yang, Yizhe, et al.
Published: (2022)
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
by: Zhuo, Le, et al.
Published: (2024)
Training-free Truthfulness Detection via Value Vectors in LLMs
by: Liu, Runheng, et al.
Published: (2025)
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
by: Yang, Gao, et al.
Published: (2025)
Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation
by: Tian, Yanzhi, et al.
Published: (2026)
A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation
by: Che, Tian-Yi, et al.
Published: (2024)
REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
by: Sharif, Omar, et al.
Published: (2025)
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
by: Liu, Runheng, et al.
Published: (2026)
Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation
by: Yao, Jiashu, et al.
Published: (2026)
MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics
by: Wang, Jiapeng, et al.
Published: (2025)
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
by: Zhou, Xiaofeng, et al.
Published: (2025)
Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
by: Qiu, Huachuan, et al.
Published: (2023)
Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection
by: Dukić, David, et al.
Published: (2023)
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)
How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling
by: Liu, Yuhang, et al.
Published: (2026)
Open-Domain Text Evaluation via Contrastive Distribution Methods
by: Lu, Sidi, et al.
Published: (2023)
Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation
by: Qiu, Yuli, et al.
Published: (2024)
Deterministic Reversible Data Augmentation for Neural Machine Translation
by: Yao, Jiashu, et al.
Published: (2024)
MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation
by: Xiao, Xingchen, et al.
Published: (2026)
Assessing LLM Reliability on Temporally Recent Open-Domain Questions
by: Krishnappa, Pushwitha, et al.
Published: (2026)
CEO: Corpus-based Open-Domain Event Ontology Induction
by: Xu, Nan, et al.
Published: (2023)
CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
by: Zhu, Mengna, et al.
Published: (2024)
SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models
by: Xu, Tianhan, et al.
Published: (2024)
How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment
by: Huang, Heyan, et al.
Published: (2024)
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
by: Mendonça, John, et al.
Published: (2025)
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
by: Liu, Xinyu, et al.
Published: (2024)
Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation
by: Liu, Zhenhua, et al.
Published: (2024)
Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models
by: Qiu, Huachuan, et al.
Published: (2024)