| Main authors: | Tajwar, Fahim; Jiang, Yiding; Thankaraj, Abitha; Rahman, Sumaita Sadia; Kolter, J Zico; Schneider, Jeff; Salakhutdinov, Ruslan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2502.17543 |
Similar items
Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
by: Duan, Xintong, et al.
Published: (2025)
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
by: He, Yutong, et al.
Published: (2024)
Can Large Reasoning Models Self-Train?
by: Shafayat, Sheikh, et al.
Published: (2025)
Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion
by: Dontas, Michail, et al.
Published: (2024)
State Combinatorial Generalization In Decision Making With Conditional Diffusion Models
by: Duan, Xintong, et al.
Published: (2025)
Tree Search for Language Model Agents
by: Koh, Jing Yu, et al.
Published: (2024)
A Simple and Effective Pruning Approach for Large Language Models
by: Sun, Mingjie, et al.
Published: (2023)
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)
InSTA: Towards Internet-Scale Training For Agents
by: Trabucco, Brandon, et al.
Published: (2025)
Base Models Look Human To AI Detectors
by: Xu, Yixuan Even, et al.
Published: (2026)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
by: Qu, Yuxiao, et al.
Published: (2025)
Self-Regulation and Requesting Interventions
by: Min, So Yeon, et al.
Published: (2025)
Maximum Likelihood Reinforcement Learning
by: Tajwar, Fahim, et al.
Published: (2026)
POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration
by: Qu, Yuxiao, et al.
Published: (2026)
Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
by: Chen, Wen-Tse, et al.
Published: (2026)
Mimetic Initialization of MLPs
by: Trockman, Asher, et al.
Published: (2026)
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
by: Andriushchenko, Maksym, et al.
Published: (2024)
Existing Large Language Model Unlearning Evaluations Are Inconclusive
by: Feng, Zhili, et al.
Published: (2025)
Antidistillation Fingerprinting
by: Xu, Yixuan Even, et al.
Published: (2026)
Reasoning as an Adaptive Defense for Safety
by: Kim, Taeyoun, et al.
Published: (2025)
Contrastive Difference Predictive Coding
by: Zheng, Chongyi, et al.
Published: (2023)
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
by: Qu, Yuxiao, et al.
Published: (2025)
ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use
by: Tien, Jeremy, et al.
Published: (2026)
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
by: Kim, Eungyeup, et al.
Published: (2023)
Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
by: Sokota, Samuel, et al.
Published: (2025)
FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers
by: Williams, Joshua Nathaniel, et al.
Published: (2024)
Weight Ensembling Improves Reasoning in Language Models
by: Dang, Xingyu, et al.
Published: (2025)
Neural Network Verification with Branch-and-Bound for General Nonlinearities
by: Shi, Zhouxing, et al.
Published: (2024)
CaRT: Teaching LLM Agents to Know When They Know Enough
by: Liu, Grace, et al.
Published: (2025)
Conservative Prediction via Data-Driven Confidence Minimization
by: Choi, Caroline, et al.
Published: (2023)
Predicting the Performance of Black-box LLMs through Follow-up Queries
by: Sam, Dylan, et al.
Published: (2025)
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
by: Bick, Aviv, et al.
Published: (2024)
Provably Bounding Neural Network Preimages
by: Kotha, Suhas, et al.
Published: (2023)
Compute-Optimal LLMs Provably Generalize Better With Scale
by: Finzi, Marc, et al.
Published: (2025)
Multi-Agent Computer Use
by: Koh, Jing Yu, et al.
Published: (2026)
HEMM: Holistic Evaluation of Multimodal Foundation Models
by: Liang, Paul Pu, et al.
Published: (2024)
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)
Contextures: Representations from Contexts
by: Zhai, Runtian, et al.
Published: (2025)
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
by: Li, Kevin Y., et al.
Published: (2024)