:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Haiquan; Jiang, Jiazhi; Xu, Shiyou; Zeng, Ruhan; Wang, Tian
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2508.12096

Similar Items

Sample-Efficient Alignment for LLMs
by: Liu, Zichen, et al.
Published: (2024)

Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
by: Wang, Ganghua, et al.
Published: (2025)

PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training
by: Yi, Rongjie, et al.
Published: (2024)

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)

Mathematical Derivation Graphs: A Relation Extraction Task in STEM Manuscripts
by: Prasad, Vishesh, et al.
Published: (2024)

Expanding Foundational Language Capabilities in Open-Source LLMs through a Korean Case Study
by: Lim, Junghwan, et al.
Published: (2025)

PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
by: Li, Haoming, et al.
Published: (2025)

LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence
by: Zhang, Xingxuan, et al.
Published: (2025)

Quantifying the Capabilities of LLMs across Scale and Precision
by: Badshah, Sher, et al.
Published: (2024)

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
by: Dong, Yihong, et al.
Published: (2026)

How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
by: Feng, Guhao, et al.
Published: (2024)

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
by: DeepSeek-AI, et al.
Published: (2025)

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression
by: Wang, Cangqing, et al.
Published: (2024)

Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs
by: Han, Pengrui, et al.
Published: (2026)

LLM Augmented LLMs: Expanding Capabilities through Composition
by: Bansal, Rachit, et al.
Published: (2024)

Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
by: Yuan, Jiaqing, et al.
Published: (2024)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
by: Faiz, Ahmad, et al.
Published: (2023)

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
by: Sun, Lin, et al.
Published: (2025)

Measuring Vision-Language STEM Skills of Neural Models
by: Shen, Jianhao, et al.
Published: (2024)

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)

Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs
by: Meng, Xianzhe, et al.
Published: (2026)

AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
by: Lin, Xiaofeng, et al.
Published: (2026)

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
by: Zhuang, Yuchen, et al.
Published: (2025)

FlashSampling: Fast and Memory-Efficient Exact Sampling
by: Ruiz, Tomas, et al.
Published: (2026)

Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
by: Dai, Hankun, et al.
Published: (2025)

VerAs: Verify then Assess STEM Lab Reports
by: Atil, Berk, et al.
Published: (2024)

Earley-Driven Dynamic Pruning for Efficient Structured Decoding
by: Sun, Xintong, et al.
Published: (2025)

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
by: Lin, Zicheng, et al.
Published: (2024)

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
by: Xu, Peng, et al.
Published: (2024)

ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining
by: Kim, Seonwu, et al.
Published: (2025)

Efficient multi-prompt evaluation of LLMs
by: Polo, Felipe Maia, et al.
Published: (2024)

Early Signs of Steganographic Capabilities in Frontier LLMs
by: Zolkowski, Artur, et al.
Published: (2025)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities
by: Ball, Thomas, et al.
Published: (2024)

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
by: Neehal, Nafis, et al.
Published: (2024)

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
by: Wang, Xu, et al.
Published: (2025)