| Main Authors: | Zhang, Zhenliang, Wang, Wenqing, Hu, Yong, Yang, Yaming, Gao, Jiaheng, Shen, Chen, Wan, Xiaojun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.04496 |
Similar Items
SagaScale: A Realistic, Scalable, and High-Quality Long-Context Benchmark Built from Full-Length Novels
by: Du, Guancheng, et al.
Published: (2025)
SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
by: Zhang, Zhenliang, et al.
Published: (2025)
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
by: Zhang, Zhenliang, et al.
Published: (2025)
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
by: Chen, Xiang, et al.
Published: (2023)
Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
by: Zhang, Zhenliang, et al.
Published: (2025)
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation
by: Xu, Fan, et al.
Published: (2025)
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
by: Ruan, Jie, et al.
Published: (2024)
Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)
The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation
by: Zhang, Jiaheng, et al.
Published: (2025)
Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection
by: Li, Jiatao, et al.
Published: (2025)
M$^{3}$T2IBench: A Large-Scale Multi-Category, Multi-Instance, Multi-Relation Text-to-Image Benchmark
by: Zhang, Huixuan, et al.
Published: (2025)
Are LLM-based Evaluators Confusing NLG Quality Criteria?
by: Hu, Xinyu, et al.
Published: (2024)
Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?
by: Hu, Yutong, et al.
Published: (2024)
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
by: Ruan, Jie, et al.
Published: (2024)
MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)
Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators
by: Chang, Jiayi, et al.
Published: (2025)
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
by: Gao, Mingqi, et al.
Published: (2024)
SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
by: Yang, Ziyi, et al.
Published: (2025)
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
by: Afane, Mohamed, et al.
Published: (2025)
How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
by: Zhang, Huixuan, et al.
Published: (2025)
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
by: Que, Haoran, et al.
Published: (2024)
Improving Long Text Understanding with Knowledge Distilled from Summarization Model
by: Liu, Yan, et al.
Published: (2024)
Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning
by: Wang, Xiaorong, et al.
Published: (2025)
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
by: Zhong, Meizhi, et al.
Published: (2024)
CFunModel: A "Funny" Language Model Capable of Chinese Humor Generation and Processing
by: Yu, Zhenghan, et al.
Published: (2025)
SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval
by: Li, Jiatao, et al.
Published: (2024)
Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review
by: Li, Jiatao, et al.
Published: (2025)
LLM-based NLG Evaluation: Current Status and Challenges
by: Gao, Mingqi, et al.
Published: (2024)
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability
by: Hu, Xinyu, et al.
Published: (2024)
A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
by: Hu, Xinyu, et al.
Published: (2025)
When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
by: Liu, Tianyu, et al.
Published: (2026)
RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery
by: Gu, Hongchao, et al.
Published: (2025)
LongIns: A Challenging Long-context Instruction-based Exam for LLMs
by: Gavin, Shawn, et al.
Published: (2024)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)
Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
by: Zhang, Wenqing, et al.
Published: (2025)
UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization
by: Zhang, Huixuan, et al.
Published: (2025)
Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models
by: Zhang, Huixuan, et al.
Published: (2024)
Decoupling SQL Query Hardness Parsing for Text-to-SQL
by: Yi, Jiawen, et al.
Published: (2023)
Self-Evolution Fine-Tuning for Policy Optimization
by: Chen, Ruijun, et al.
Published: (2024)
DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding
by: Feng, Xiang, et al.
Published: (2026)