:: Library Catalog

Bibliographic Details
Main Authors:	He, Chaoqun, Luo, Renjie, Hu, Shengding, Zhao, Yuanqian, Zhou, Jie, Wu, Hanghao, Zhang, Jiajie, Han, Xu, Liu, Zhiyuan, Sun, Maosong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2404.07584

Similar Items

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)

States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
by: Chen, Junhao, et al.
Published: (2024)

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition
by: Huang, Yufei, et al.
Published: (2024)

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
by: Cheng, Zhili, et al.
Published: (2025)

Predicting Emergent Abilities with Infinite Resolution Evaluation
by: Hu, Shengding, et al.
Published: (2023)

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
by: He, Chaoqun, et al.
Published: (2024)

Stuffed Mamba: Oversized States Lead to the Inability to Forget
by: Chen, Yingfa, et al.
Published: (2024)

LEGENT: Open Platform for Embodied Agents
by: Cheng, Zhili, et al.
Published: (2024)

DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models
by: Zhao, Ranchi, et al.
Published: (2024)

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
by: Luo, Kairong, et al.
Published: (2025)

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
by: Zhang, Xinrong, et al.
Published: (2024)

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
by: Hu, Jinyi, et al.
Published: (2024)

Densing Law of LLMs
by: Xiao, Chaojun, et al.
Published: (2024)

$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
by: Zhang, Xinrong, et al.
Published: (2024)

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
by: He, Chaoqun, et al.
Published: (2026)

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
by: Song, Chenyang, et al.
Published: (2024)

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
by: He, Zheqi, et al.
Published: (2025)

MMCircuitEval: A Comprehensive Multimodal Circuit-Focused Benchmark for Evaluating LLMs
by: Zhao, Chenchen, et al.
Published: (2025)

Matrix Fejér-Riesz type theorem for a union of an interval and a point
by: Sun, Shengding, et al.
Published: (2025)

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
by: Hu, Jinyi, et al.
Published: (2023)

STExplore: An Integrated Online Platform for Comprehensive Analysis and Visualization of Spatial Transcriptomics Data
by: Wang, Yongtian, et al.
Published: (2025)

On the strength of Burer's lifted convex relaxation to quadratic programming with ball constraints
by: Kılınç-Karzan, Fatma, et al.
Published: (2024)

LiCoEval: Evaluating LLMs on License Compliance in Code Generation
by: Xu, Weiwei, et al.
Published: (2024)

VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
by: Song, Tingyu, et al.
Published: (2025)

Representation Learning for Natural Language Processing
by: Liu, Zhiyuan, et al.
Published: (2020)

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
by: Hu, Shengding, et al.
Published: (2024)

Fusion-Eval: Integrating Assistant Evaluators with LLMs
by: Shu, Lei, et al.
Published: (2023)

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
by: Li, Shangzhan, et al.
Published: (2025)

GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
by: Feng, Tao, et al.
Published: (2025)

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
by: Xiong, Miao, et al.
Published: (2023)

MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
by: Zhang, Mengyuan, et al.
Published: (2024)

Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values
by: Yao, Jing, et al.
Published: (2025)

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
by: Huang, Yuxiang, et al.
Published: (2025)

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
by: Gao, Cheng, et al.
Published: (2025)

SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
by: Yueh-Han, Chen, et al.
Published: (2025)

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
by: Xiao, Chaojun, et al.
Published: (2024)

A Microgravity Simulation Experimental Platform For Small Space Robots In Orbit
by: Luo, Hang, et al.
Published: (2025)

CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices
by: Zhao, Weilin, et al.
Published: (2023)

HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
by: Cheng, Tsz Chung, et al.
Published: (2025)

MiniCPM4: Ultra-Efficient LLMs on End Devices
by: MiniCPM Team, et al.
Published: (2025)