:: Library Catalog

Bibliographic Details
Main Authors:	Tajwar, Fahim; Jiang, Yiding; Thankaraj, Abitha; Rahman, Sumaita Sadia; Kolter, J Zico; Schneider, Jeff; Salakhutdinov, Ruslan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.17543

Similar Items

Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)

Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
by: Duan, Xintong, et al.
Published: (2025)

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
by: He, Yutong, et al.
Published: (2024)

Can Large Reasoning Models Self-Train?
by: Shafayat, Sheikh, et al.
Published: (2025)

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion
by: Dontas, Michail, et al.
Published: (2024)

State Combinatorial Generalization In Decision Making With Conditional Diffusion Models
by: Duan, Xintong, et al.
Published: (2025)

Tree Search for Language Model Agents
by: Koh, Jing Yu, et al.
Published: (2024)

A Simple and Effective Pruning Approach for Large Language Models
by: Sun, Mingjie, et al.
Published: (2023)

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)

InSTA: Towards Internet-Scale Training For Agents
by: Trabucco, Brandon, et al.
Published: (2025)

Base Models Look Human To AI Detectors
by: Xu, Yixuan Even, et al.
Published: (2026)

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
by: Qu, Yuxiao, et al.
Published: (2025)

Self-Regulation and Requesting Interventions
by: Min, So Yeon, et al.
Published: (2025)

Maximum Likelihood Reinforcement Learning
by: Tajwar, Fahim, et al.
Published: (2026)

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration
by: Qu, Yuxiao, et al.
Published: (2026)

Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
by: Chen, Wen-Tse, et al.
Published: (2026)

Mimetic Initialization of MLPs
by: Trockman, Asher, et al.
Published: (2026)

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
by: Andriushchenko, Maksym, et al.
Published: (2024)

Existing Large Language Model Unlearning Evaluations Are Inconclusive
by: Feng, Zhili, et al.
Published: (2025)

Antidistillation Fingerprinting
by: Xu, Yixuan Even, et al.
Published: (2026)

Reasoning as an Adaptive Defense for Safety
by: Kim, Taeyoun, et al.
Published: (2025)

Contrastive Difference Predictive Coding
by: Zheng, Chongyi, et al.
Published: (2023)

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
by: Qu, Yuxiao, et al.
Published: (2025)

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use
by: Tien, Jeremy, et al.
Published: (2026)

Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
by: Kim, Eungyeup, et al.
Published: (2023)

Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
by: Sokota, Samuel, et al.
Published: (2025)

FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers
by: Williams, Joshua Nathaniel, et al.
Published: (2024)

Weight Ensembling Improves Reasoning in Language Models
by: Dang, Xingyu, et al.
Published: (2025)

Neural Network Verification with Branch-and-Bound for General Nonlinearities
by: Shi, Zhouxing, et al.
Published: (2024)

CaRT: Teaching LLM Agents to Know When They Know Enough
by: Liu, Grace, et al.
Published: (2025)

Conservative Prediction via Data-Driven Confidence Minimization
by: Choi, Caroline, et al.
Published: (2023)

Predicting the Performance of Black-box LLMs through Follow-up Queries
by: Sam, Dylan, et al.
Published: (2025)

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
by: Bick, Aviv, et al.
Published: (2024)

Provably Bounding Neural Network Preimages
by: Kotha, Suhas, et al.
Published: (2023)

Compute-Optimal LLMs Provably Generalize Better With Scale
by: Finzi, Marc, et al.
Published: (2025)

Multi-Agent Computer Use
by: Koh, Jing Yu, et al.
Published: (2026)

HEMM: Holistic Evaluation of Multimodal Foundation Models
by: Liang, Paul Pu, et al.
Published: (2024)

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)

Contextures: Representations from Contexts
by: Zhai, Runtian, et al.
Published: (2025)

Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
by: Li, Kevin Y., et al.
Published: (2024)