| Main Authors: | Ahmadi, Arash; Sharif, Sarah; Banad, Yaser |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.21201 |
Similar Items
Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning
by: Ahmadi, Arash, et al.
Published: (2026)
A Comparative Study of Sampling Methods with Cross-Validation in the FedHome Framework
by: Ahmadi, Arash, et al.
Published: (2024)
MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers
by: Ahmadi, Arash, et al.
Published: (2025)
A Cloud-Edge Framework for Energy-Efficient Event-Driven Control: An Integration of Online Supervised Learning, Spiking Neural Networks and Local Plasticity Rules
by: Ahmadvand, Reza, et al.
Published: (2024)
Ultra-Low-Power Spiking Neurons in 7 nm FinFET Technology: A Comparative Analysis of Leaky Integrate-and-Fire, Morris-Lecar, and Axon-Hillock Architectures
by: Larsh, Logan, et al.
Published: (2025)
Comparative Analysis of Transformer Models in Disaster Tweet Classification for Public Safety
by: Zisad, Sharif Noor, et al.
Published: (2025)
Novel Pigeon-inspired 3D Obstacle Detection and Avoidance Maneuver for Multi-UAV Systems
by: Ahmadvand, Reza, et al.
Published: (2025)
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
by: Simoni, Marco, et al.
Published: (2025)
GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization
by: Dechtiar, Moriya, et al.
Published: (2025)
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
by: Xu, Austin, et al.
Published: (2025)
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
by: Wang, Jialu, et al.
Published: (2026)
AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
by: Li, Wenbin, et al.
Published: (2026)
EBPO: Empirical Bayes Shrinkage for Stabilizing Group-Relative Policy Optimization
by: Han, Kevin, et al.
Published: (2026)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
GTPO: Stabilizing Group Relative Policy Optimization via Gradient and Entropy Control
by: Simoni, Marco, et al.
Published: (2025)
Group Sequence Policy Optimization
by: Zheng, Chujie, et al.
Published: (2025)
Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability
by: Chen, Yuxin, et al.
Published: (2025)
Automated Triaging and Transfer Learning of Incident Learning Safety Reports Using Large Language Representational Models
by: Beidler, Peter, et al.
Published: (2025)
Unsupervised Neural Network for Automated Classification of Surgical Urgency Levels in Medical Transcriptions
by: Tabatabaee, Sadaf, et al.
Published: (2026)
Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine
by: Xie, Jiacheng, et al.
Published: (2025)
Agentic Reinforced Policy Optimization
by: Dong, Guanting, et al.
Published: (2025)
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
by: Shen, Wei, et al.
Published: (2024)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)
Parametric Analysis of Spiking Neurons in 16 nm Fin Field‐Effect Transistor Technology
by: Larsh, Logan, et al.
Published: (2026)
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
by: Hu, Shijing, et al.
Published: (2025)
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
by: Hu, Bokai, et al.
Published: (2024)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
by: Walder, Christian, et al.
Published: (2025)
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
by: Wang, Hongjun, et al.
Published: (2026)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
Lightweight Safety Classification Using Pruned Language Models
by: Sawtell, Mason, et al.
Published: (2024)
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
by: Qi, Penghui, et al.
Published: (2025)
M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization
by: Bai, Bizhe, et al.
Published: (2025)
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
by: Alsadat, Shayan Meshkat, et al.
Published: (2024)
AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning
by: Cai, Shihao, et al.
Published: (2025)
Design and Performance Analysis of an Ultra-Low Power Integrate-and-Fire Neuron Circuit Using Nanoscale Side-contacted Field Effect Diode Technology
by: Motaman, Seyedmohamadjavad, et al.
Published: (2024)
BinaryPPO: Efficient Policy Optimization for Binary Classification
by: Pandey, Punya Syon, et al.
Published: (2026)
Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability
by: Seegmiller, Parker, et al.
Published: (2024)