Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhenliang, Wang, Wenqing, Hu, Yong, Yang, Yaming, Gao, Jiaheng, Shen, Chen, Wan, Xiaojun
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2605.04496

Similar Items

SagaScale: A Realistic, Scalable, and High-Quality Long-Context Benchmark Built from Full-Length Novels
by: Du, Guancheng, et al.
Published: (2025)

SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
by: Zhang, Zhenliang, et al.
Published: (2025)

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
by: Zhang, Zhenliang, et al.
Published: (2025)

Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
by: Chen, Xiang, et al.
Published: (2023)

Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
by: Zhang, Zhenliang, et al.
Published: (2025)

JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation
by: Xu, Fan, et al.
Published: (2025)

Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
by: Ruan, Jie, et al.
Published: (2024)

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)

The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation
by: Zhang, Jiaheng, et al.
Published: (2025)

Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection
by: Li, Jiatao, et al.
Published: (2025)

M$^{3}$T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark
by: Zhang, Huixuan, et al.
Published: (2025)

Are LLM-based Evaluators Confusing NLG Quality Criteria?
by: Hu, Xinyu, et al.
Published: (2024)

Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?
by: Hu, Yutong, et al.
Published: (2024)

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
by: Ruan, Jie, et al.
Published: (2024)

MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)

Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators
by: Chang, Jiayi, et al.
Published: (2025)

Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
by: Gao, Mingqi, et al.
Published: (2024)

SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
by: Yang, Ziyi, et al.
Published: (2025)

SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
by: Afane, Mohamed, et al.
Published: (2025)

How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
by: Zhang, Huixuan, et al.
Published: (2025)

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
by: Que, Haoran, et al.
Published: (2024)

Improving Long Text Understanding with Knowledge Distilled from Summarization Model
by: Liu, Yan, et al.
Published: (2024)

Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning
by: Wang, Xiaorong, et al.
Published: (2025)

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
by: Zhong, Meizhi, et al.
Published: (2024)

CFunModel: A "Funny" Language Model Capable of Chinese Humor Generation and Processing
by: Yu, Zhenghan, et al.
Published: (2025)

SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval
by: Li, Jiatao, et al.
Published: (2024)

Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review
by: Li, Jiatao, et al.
Published: (2025)

LLM-based NLG Evaluation: Current Status and Challenges
by: Gao, Mingqi, et al.
Published: (2024)

Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability
by: Hu, Xinyu, et al.
Published: (2024)

A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
by: Hu, Xinyu, et al.
Published: (2025)

When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
by: Liu, Tianyu, et al.
Published: (2026)

RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery
by: Gu, Hongchao, et al.
Published: (2025)

LongIns: A Challenging Long-context Instruction-based Exam for LLMs
by: Gavin, Shawn, et al.
Published: (2024)

From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)

Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
by: Zhang, Wenqing, et al.
Published: (2025)

UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization
by: Zhang, Huixuan, et al.
Published: (2025)

Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models
by: Zhang, Huixuan, et al.
Published: (2024)

Decoupling SQL Query Hardness Parsing for Text-to-SQL
by: Yi, Jiawen, et al.
Published: (2023)

Self-Evolution Fine-Tuning for Policy Optimization
by: Chen, Ruijun, et al.
Published: (2024)

DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding
by: Feng, Xiang, et al.
Published: (2026)