| Main Authors: | Swapnil, Ismam Nur, Saha, Aranya, Khan, Tanvir Ahmed, Haque, Mohammad Ariful, Lim, Ser-Nam |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.06755 |
Similar Items
CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering
by: Saha, Aranya, et al.
Published: (2025)
GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings
by: Swapnil, Ismam Nur, et al.
Published: (2025)
Compression Strategies for Efficient Multimodal LLMs in Medical Contexts
by: Khan, Tanvir A., et al.
Published: (2025)
Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation
by: Mridul, Mohidul Haque, et al.
Published: (2024)
Delta Activations: A Representation for Finetuned Large Language Models
by: Xu, Zhiqiu, et al.
Published: (2025)
Shapley-Value-Based Graph Sparsification for GNN Inference
by: Akkas, Selahattin, et al.
Published: (2025)
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
by: Kumar, Navish, et al.
Published: (2025)
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
by: Razin, Noam, et al.
Published: (2024)
Offline Model-Based Optimization via Policy-Guided Gradient Search
by: Chemingui, Yassine, et al.
Published: (2024)
Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning
by: Parekh, Swapnil
Published: (2026)
Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
by: Parekh, Swapnil
Published: (2026)
Sequential Policy Gradient for Adaptive Hyperparameter Optimization
by: Li, Zheng, et al.
Published: (2025)
GOPO: Policy Optimization using Ranked Rewards
by: Choi, Kyuseong, et al.
Published: (2026)
Fast Explanations via Policy Gradient-Optimized Explainer
by: Pan, Deng, et al.
Published: (2024)
Model Merging by Uncertainty-Based Gradient Matching
by: Daheim, Nico, et al.
Published: (2023)
TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models
by: Islam, Tanvir
Published: (2025)
Extended Histogram-based Outlier Score (EHBOS)
by: Islam, Tanvir
Published: (2025)
VISP: Volatility Informed Stochastic Projection for Adaptive Regularization
by: Islam, Tanvir
Published: (2025)
Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation Models
by: Andres, Alain, et al.
Published: (2024)
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
by: Parekh, Swapnil
Published: (2026)
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
by: Jain, Ayush, et al.
Published: (2024)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
by: Bai, Qinbo, et al.
Published: (2021)
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
by: Ma, Xin, et al.
Published: (2024)
Post-Training with Policy Gradients: Optimality and the Base Model Barrier
by: Mousavi-Hosseini, Alireza, et al.
Published: (2026)
Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation
by: Lee, Dong Bok, et al.
Published: (2024)
Stabilizing Policy Gradient Methods via Reward Profiling
by: Ahmed, Shihab, et al.
Published: (2025)
Learning General Policies with Policy Gradient Methods
by: Ståhlberg, Simon, et al.
Published: (2025)
Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions
by: Jing, Gangshan, et al.
Published: (2022)
ArrhythmiaVision: Resource-Conscious Deep Learning Models with Visual Explanations for ECG Arrhythmia Classification
by: Baig, Zuraiz, et al.
Published: (2025)
The Extrapolation Power of Implicit Models
by: Decugis, Juliette, et al.
Published: (2024)
Identifying Representations for Intervention Extrapolation
by: Saengkyongam, Sorawit, et al.
Published: (2023)
Policy Gradient with Kernel Quadrature
by: Hayakawa, Satoshi, et al.
Published: (2023)
Policy Gradient with Tree Expansion
by: Dalal, Gal, et al.
Published: (2023)
EmissionNet: Air Quality Pollution Forecasting for Agriculture
by: Saligram, Prady, et al.
Published: (2025)
UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
by: Neth, Ashe, et al.
Published: (2025)
Accelerating AI Performance using Anderson Extrapolation on GPUs
by: Dajani, Saleem Abdul Fattah Ahmed Al, et al.
Published: (2024)
XGrad: Boosting Gradient-Based Optimizers With Weight Prediction
by: Guan, Lei, et al.
Published: (2023)
Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)
Double Horizon Model-Based Policy Optimization
by: Kubo, Akihiro, et al.
Published: (2025)