| Main Authors: | Han, Peixuan; Krishnan, Adit; Friedland, Gerald; You, Jiaxuan; Kong, Chris |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.05489 |
Similar Items
Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
by: Krishnan, Adit, et al.
Published: (2025)
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
by: Liu, Zijia, et al.
Published: (2025)
DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal
by: Han, Peixuan, et al.
Published: (2026)
Effects of Feature Correlations on Associative Memory Capacity
by: Bielmeier, Stefan, et al.
Published: (2025)
CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations
by: Moorjani, Samraj, et al.
Published: (2024)
Learning to Reason with Mixture of Tokens
by: Jain, Adit, et al.
Published: (2025)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
by: Tang, Zhiqiang, et al.
Published: (2024)
MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
by: Luo, Tianyang, et al.
Published: (2026)
Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models
by: Liang, Chumeng, et al.
Published: (2024)
Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization
by: Jain, Adit, et al.
Published: (2024)
R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
by: Cheng, Aijia, et al.
Published: (2026)
Towards Cost-Effective Reward Guided Text Generation
by: Rashid, Ahmad, et al.
Published: (2025)
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
by: Feng, Tao, et al.
Published: (2025)
PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling
by: Dai, Zhongjie, et al.
Published: (2025)
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
by: Zhang, Qingru, et al.
Published: (2025)
Aligning LLMs with Domain Invariant Reward Models
by: Wu, David, et al.
Published: (2025)
GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
by: Wang, Chenglong, et al.
Published: (2025)
AcademicEval: Live Long-Context LLM Benchmark
by: Zhang, Haozhen, et al.
Published: (2025)
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
by: Ma, Qiyao, et al.
Published: (2026)
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
by: Fan, Jiajun, et al.
Published: (2025)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
by: Zhou, Zhi, et al.
Published: (2025)
Interacting Large Language Model Agents. Interpretable Models and Social Learning
by: Jain, Adit, et al.
Published: (2024)
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
by: Li, Gang, et al.
Published: (2025)
GUIDE: Towards Scalable Advising for Research Ideas
by: Liu, Yaowenqi, et al.
Published: (2025)
Bridged Clustering: Semi-Supervised Sparse Bridging
by: Ye, Patrick Peixuan, et al.
Published: (2025)
Graph World Model
by: Feng, Tao, et al.
Published: (2025)
Towards Time Series Reasoning with LLMs
by: Chow, Winnie, et al.
Published: (2024)
Semi-Supervised Reward Modeling via Iterative Self-Training
by: He, Yifei, et al.
Published: (2024)
Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation
by: Ye, Jinyan, et al.
Published: (2026)
Towards Understanding Self-play for LLM Reasoning
by: Chae, Justin Yang, et al.
Published: (2025)
SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals
by: Han, Peixuan, et al.
Published: (2025)
Towards Effective Code-Integrated Reasoning
by: Bai, Fei, et al.
Published: (2025)
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
by: Zhang, Haozhen, et al.
Published: (2025)
Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs
by: Zhang, Haozhen, et al.
Published: (2024)
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
by: Ma, Haozhe, et al.
Published: (2024)
R1-Ranker: Teaching LLM Rankers to Reason
by: Feng, Tao, et al.
Published: (2025)
Understanding Interpretability by generalized distillation in Supervised Classification
by: Agarwal, Adit, et al.
Published: (2020)
Rewards as Labels: Revisiting RLVR from a Classification Perspective
by: Zhai, Zepeng, et al.
Published: (2026)