| Main Authors: | Liang, Renzhao, Chen, Jingru, Jia, Bo, Deng, Bo, Xie, Chenggang, Wang, Yidong, Jin, Ke, Wang, Xin, Zhang, Linfeng, Wang, Cunxiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.04895 |
Similar Items
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
by: Liang, Renzhao, et al.
Published: (2025)
MVSS: A Unified Framework for Multi-View Structured Survey Generation
by: Liu, Yinqi, et al.
Published: (2026)
RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
by: Feng, Andrew Zhuoer, et al.
Published: (2026)
A Survey on Evaluation of Large Language Models
by: Chang, Yupeng, et al.
Published: (2023)
The characteristic polynomials of $r$-uniform hypercycles with length $l$
by: Bo, Dong, et al.
Published: (2025)
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
by: Zhu, Kaijie, et al.
Published: (2023)
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error
by: Yang, Shu-Xun, et al.
Published: (2025)
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
by: Yang, Chuanpeng, et al.
Published: (2024)
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
by: Wang, Yidong, et al.
Published: (2025)
HAVE: Head-Adaptive Gating and ValuE Calibration for Hallucination Mitigation in Large Language Models
by: Tong, Xin, et al.
Published: (2025)
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
by: Tong, Xin, et al.
Published: (2024)
RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data
by: Mou, Xudong, et al.
Published: (2025)
A Survey on Evaluating Large Language Models in Code Generation Tasks
by: Chen, Liguo, et al.
Published: (2024)
LongSafety: Evaluating Long-Context Safety of Large Language Models
by: Lu, Yida, et al.
Published: (2025)
A Comprehensive Survey of Contamination Detection Methods in Large Language Models
by: Ravaut, Mathieu, et al.
Published: (2024)
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
by: Wen, Bosi, et al.
Published: (2026)
CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models
by: Zhu, Wenhong, et al.
Published: (2023)
Integrated approach of machine learning, Mendelian randomization and experimental validation for biomarker discovery in diabetic nephropathy
by: Yidong Zhu, et al.
Published: (2024)
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)
UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge
by: Zhang, Yang, et al.
Published: (2025)
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
by: Wang, Yidong, et al.
Published: (2023)
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
by: Wang, Shuai, et al.
Published: (2024)
Distributed Pseudo-Likelihood Method for Community Detection in Large-Scale Networks
by: Deng, Jiayi, et al.
Published: (2024)
On the Evaluation of Large Language Models in Unit Test Generation
by: Yang, Lin, et al.
Published: (2024)
MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors
by: Tong, Xin, et al.
Published: (2025)
Detecting Data Contamination in Large Language Models
by: Janicki, Juliusz, et al.
Published: (2026)
Evaluation of Large Language Models for Numeric Anomaly Detection in Power Systems
by: Liu, Yichen, et al.
Published: (2025)
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
by: Dong, Yihong, et al.
Published: (2024)
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
by: Dai, Zilin, et al.
Published: (2025)
A Bayesian Hybrid Parameter-Efficient Fine-Tuning Method for Large Language Models
by: Chai, Yidong, et al.
Published: (2025)
CrossHOI-Bench: A Unified Benchmark for HOI Evaluation across Vision-Language Models and HOI-Specific Methods
by: Lei, Qinqian, et al.
Published: (2025)
IndustryCode: A Benchmark for Industry Code Generation
by: Zeng, Puyu, et al.
Published: (2026)
PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
by: Wang, Hongbo, et al.
Published: (2024)
Test-time GNN Model Evaluation on Dynamic Graphs
by: Li, Bo, et al.
Published: (2025)
A Chinese Dataset for Evaluating the Safeguards in Large Language Models
by: Wang, Yuxia, et al.
Published: (2024)
Strong decays of the isovector-scalar $D^\ast\bar{D}^\ast$ hadronic molecule
by: Deng, Jin-Cheng, et al.
Published: (2024)
RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis
by: Feng, Andrew Zhuoer, et al.
Published: (2026)
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature
by: Li, Hanzheng, et al.
Published: (2025)
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
by: Tao, Yongding, et al.
Published: (2025)