:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Yi-Fan; Mao, Xian-Ling; Lan, Tian; Zhang, Tong; Zhu, Yu-Shi; Huang, Heyan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.03303

Similar Items

Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
by: Lu, Yi-Fan, et al.
Published: (2024)

EXCEEDS: Extracting Complex Events via Nugget-based Grid Modeling in Scientific Domain
by: Lu, Yi-Fan, et al.
Published: (2024)

Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines
by: Ma, Zi-Ao, et al.
Published: (2024)

Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
by: Tu, Rong-Cheng, et al.
Published: (2024)

DeepSurvey-Bench: Evaluating Academic Value of Automatically Generated Scientific Survey
by: Zhang, Guo-Biao, et al.
Published: (2026)

Mix-Initiative Response Generation with Dynamic Prefix Tuning
by: Nie, Yuxiang, et al.
Published: (2024)

CriticEval: Evaluating Large Language Model as Critic
by: Lan, Tian, et al.
Published: (2024)

T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation
by: Ma, Zi-Ao, et al.
Published: (2025)

Training Language Models to Critique With Multi-agent Feedback
by: Lan, Tian, et al.
Published: (2024)

MMWOZ: Building Multimodal Agent for Task-oriented Dialogue
by: Yang, Pu-Hai, et al.
Published: (2025)

A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations
by: Lan, Tian, et al.
Published: (2025)

Word Matters: What Influences Domain Adaptation in Summarization?
by: Li, Yinghao, et al.
Published: (2024)

Building Knowledge-Grounded Dialogue Systems with Graph-Based Semantic Modeling
by: Yang, Yizhe, et al.
Published: (2022)

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
by: Zhuo, Le, et al.
Published: (2024)

Training-free Truthfulness Detection via Value Vectors in LLMs
by: Liu, Runheng, et al.
Published: (2025)

LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
by: Yang, Gao, et al.
Published: (2025)

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation
by: Tian, Yanzhi, et al.
Published: (2026)

A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation
by: Che, Tian-Yi, et al.
Published: (2024)

REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
by: Sharif, Omar, et al.
Published: (2025)

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
by: Liu, Runheng, et al.
Published: (2026)

Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation
by: Yao, Jiashu, et al.
Published: (2026)

MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics
by: Wang, Jiapeng, et al.
Published: (2025)

Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
by: Zhou, Xiaofeng, et al.
Published: (2025)

Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
by: Qiu, Huachuan, et al.
Published: (2023)

Leveraging Open Information Extraction for More Robust Domain Transfer of Event Trigger Detection
by: Dukić, David, et al.
Published: (2023)

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)

How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling
by: Liu, Yuhang, et al.
Published: (2026)

Open-Domain Text Evaluation via Contrastive Distribution Methods
by: Lu, Sidi, et al.
Published: (2023)

Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation
by: Qiu, Yuli, et al.
Published: (2024)

Deterministic Reversible Data Augmentation for Neural Machine Translation
by: Yao, Jiashu, et al.
Published: (2024)

MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation
by: Xiao, Xingchen, et al.
Published: (2026)

Assessing LLM Reliability on Temporally Recent Open-Domain Questions
by: Krishnappa, Pushwitha, et al.
Published: (2026)

CEO: Corpus-based Open-Domain Event Ontology Induction
by: Xu, Nan, et al.
Published: (2023)

CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
by: Zhu, Mengna, et al.
Published: (2024)

SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models
by: Xu, Tianhan, et al.
Published: (2024)

How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment
by: Huang, Heyan, et al.
Published: (2024)

MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
by: Mendonça, John, et al.
Published: (2025)

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
by: Liu, Xinyu, et al.
Published: (2024)

Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation
by: Liu, Zhenhua, et al.
Published: (2024)

Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models
by: Qiu, Huachuan, et al.
Published: (2024)