| Main Authors: | Li, Yingru, Xu, Jiawei, Han, Lei, Luo, Zhi-Quan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.10228 |
Similar Items
Scalable Exploration via Ensemble++
by: Li, Yingru, et al.
Published: (2024)
Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
by: Li, Yingru, et al.
Published: (2024)
Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
by: Li, Yingru, et al.
Published: (2024)
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
by: Liang, Hao, et al.
Published: (2022)
Logit Dynamics in Softmax Policy Gradient Methods
by: Li, Yingru
Published: (2025)
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
by: Phan, Huy Nhat, et al.
Published: (2024)
Scalable In-Context Q-Learning
by: Liu, Jinmei, et al.
Published: (2025)
Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It
by: Zhang, Yaxiang, et al.
Published: (2026)
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
by: Xu, Jiawei, et al.
Published: (2024)
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
by: Liu, Wei, et al.
Published: (2026)
Trust Region Masking for Long-Horizon LLM Reinforcement Learning
by: Li, Yingru, et al.
Published: (2025)
Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)
by: Ma, Pingchuan, et al.
Published: (2024)
Bridging Theory and Practice in Link Representation with Graph Neural Networks
by: Lachi, Veronica, et al.
Published: (2025)
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
by: Gkountouras, John, et al.
Published: (2024)
Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail
by: Li, Yingru, et al.
Published: (2025)
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
by: Xiong, Wei, et al.
Published: (2023)
MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer
by: Wang, Xuefei, et al.
Published: (2026)
Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
by: Lin, Pengxiao, et al.
Published: (2025)
Divergence-Augmented Policy Optimization
by: Wang, Qing, et al.
Published: (2025)
Solving Diffusion Inverse Problems with Restart Posterior Sampling
by: Ahmed, Bilal, et al.
Published: (2025)
A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms
by: Li, Yingru, et al.
Published: (2025)
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
by: Nguyen-Tang, Thanh, et al.
Published: (2024)
Large Language Model Agent for Hyper-Parameter Optimization
by: Liu, Siyi, et al.
Published: (2024)
Towards Scalable and Deep Graph Neural Networks via Noise Masking
by: Liang, Yuxuan, et al.
Published: (2024)
FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
by: Lv, Lei, et al.
Published: (2026)
Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling
by: Xu, Jian, et al.
Published: (2024)
Diffusion Posterior Sampling is Computationally Intractable
by: Gupta, Shivam, et al.
Published: (2024)
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
by: Parulekar, Advait, et al.
Published: (2025)
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
by: Li, Guankai, et al.
Published: (2026)
TensorHyper-VQC: A Tensor-Train-Guided Hypernetwork for Robust and Scalable Variational Quantum Computing
by: Qi, Jun, et al.
Published: (2025)
Agentic Unlearning: When LLM Agent Meets Machine Unlearning
by: Wang, Bin, et al.
Published: (2026)
Relative Policy-Transition Optimization for Fast Policy Transfer
by: Xu, Jiawei, et al.
Published: (2022)
Coupled Data and Measurement Space Dynamics for Enhanced Diffusion Posterior Sampling
by: Hamidi, Shayan Mohajer, et al.
Published: (2025)
Bridging Geometric States via Geometric Diffusion Bridge
by: Luo, Shengjie, et al.
Published: (2024)
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
by: Jin, Bowen, et al.
Published: (2024)
SVRG and Beyond via Posterior Correction
by: Daheim, Nico, et al.
Published: (2025)
Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling
by: Agarwal, Alekh, et al.
Published: (2022)
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
by: Namkoong, Hongseok, et al.
Published: (2020)
TANGNN: a Concise, Scalable and Effective Graph Neural Networks with Top-m Attention Mechanism for Graph Representation Learning
by: E, Jiawei, et al.
Published: (2024)
Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching
by: Havens, Aaron, et al.
Published: (2025)