| Main authors: | Yu, Chao; Tan, Qixin; Gao, Jiaxuan; Yu, Shi; Lu, Hong; Yang, Xinting; Xu, Zelai; Wang, Yu; Wu, Yi; Vinitsky, Eugene |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2511.15738 |
Related records
ICPL: Few-shot In-context Preference Learning via LLMs
by: Yu, Chao, et al.
Published: (2024)
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
by: Qiu, Le, et al.
Published: (2025)
MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
by: Yang, Lu, et al.
Published: (2026)
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
by: Xu, Zelai, et al.
Published: (2023)
Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
by: Xu, Zelai, et al.
Published: (2025)
Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)
Human-compatible driving partners through data-regularized self-play reinforcement learning
by: Cornelisse, Daphne, et al.
Published: (2024)
Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
by: Sokota, Samuel, et al.
Published: (2025)
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
by: Xu, Zelai, et al.
Published: (2026)
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
by: Yang, Ningyuan, et al.
Published: (2025)
EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
by: Tan, Zheyue, et al.
Published: (2025)
VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
by: Xu, Zelai, et al.
Published: (2025)
How Far Are We from Optimal Reasoning Efficiency?
by: Gao, Jiaxuan, et al.
Published: (2025)
LAGOON: Language-Guided Motion Control
by: Xu, Shusheng, et al.
Published: (2023)
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
by: Gao, Jiaxuan, et al.
Published: (2025)
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
by: Wei, Chenxing, et al.
Published: (2025)
Video Game Level Design as a Multi-Agent Reinforcement Learning Problem
by: Earle, Sam, et al.
Published: (2025)
RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
by: Cao, Xiaoyang, et al.
Published: (2025)
Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
by: Zhang, Ruize, et al.
Published: (2025)
The Fate of Simple Organics on Titan's Surface: A Theoretical Perspective
by: Yu, Xinting, et al.
Published: (2024)
LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
by: Liu, Jijia, et al.
Published: (2023)
On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective
by: Zhou, Zhi, et al.
Published: (2026)
UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling
by: Huang, Kaiyu, et al.
Published: (2026)
Extended Poincare Symmetry Dictates Massive Scattering Amplitudes
by: Ni, Yu-Han, et al.
Published: (2024)
Large Language Models: A Historical and Sociocultural Perspective
by: Eugene Yu Ji
Published: (2024)
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
by: Zhang, Hanchong, et al.
Published: (2024)
D3P: Dynamic Denoising Diffusion Policy via Reinforcement Learning
by: Yu, Shu-Ang, et al.
Published: (2025)
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
by: Kazemkhani, Saman, et al.
Published: (2024)
Building reliable sim driving agents by scaling self-play
by: Cornelisse, Daphne, et al.
Published: (2025)
Atom of Thoughts for Markov LLM Test-Time Scaling
by: Teng, Fengwei, et al.
Published: (2025)
Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models
by: Jia, Hong, et al.
Published: (2026)
Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs
by: Zhang, Shiyu, et al.
Published: (2024)
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
by: Xu, Zelai, et al.
Published: (2025)
Online distributed algorithms for mixed equilibrium problems in dynamic environments
by: Xu, Hang, et al.
Published: (2024)
Scale Snapshot Topology Distance: Quantifying the Spatial Scale Effect From a Topological Perspective
by: Jiawei Zhu, et al.
Published: (2025)
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
by: Zhang, Zechuan, et al.
Published: (2025)
Neural Internal Model Control: Learning a Robust Control Policy via Predictive Error Feedback
by: Gao, Feng, et al.
Published: (2024)
JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems
by: Yang, Guangzhao, et al.
Published: (2026)
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
by: Zhang, Haotian, et al.
Published: (2025)