| Main Authors: | Jia, Ziqi, Li, Junjie, Qu, Xiaoyang, Wang, Jianzong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.10049 |
Similar Items
Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning
by: Jia, Ziqi, et al.
Published: (2025)
CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control
by: Shi, Jiaqi, et al.
Published: (2026)
DIVA: Harnessing the Representation Divergence in Unified Multimodal Models for Mutual Reinforcement
by: Lu, Renjie, et al.
Published: (2026)
Federated Domain Generalization with Domain-specific Soft Prompts Generation
by: Wu, Jianhan, et al.
Published: (2025)
RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
by: Li, Junjie, et al.
Published: (2025)
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
by: Tao, Wei, et al.
Published: (2025)
VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection
by: Zhang, Bin, et al.
Published: (2025)
BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation
by: Tao, Wei, et al.
Published: (2025)
WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
by: Tao, Wei, et al.
Published: (2026)
MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control
by: Lu, Renjie, et al.
Published: (2026)
From Inheritance to Saturation: Disentangling the Evolution of Visual Redundancy for Architecture-Aware MLLM Inference Acceleration
by: Shi, Jiaqi, et al.
Published: (2026)
PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition
by: He, Shenglin, et al.
Published: (2024)
VLA-InfoEntropy: A Training-Free Vision-Attention Information Entropy Approach for Vision-Language-Action Models Inference Acceleration and Success
by: Liu, Chuhang, et al.
Published: (2026)
ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations
by: Zhang, Xulong, et al.
Published: (2024)
RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations
by: Zhang, Bin, et al.
Published: (2025)
Triage: Hierarchical Visual Budgeting for Efficient Video Reasoning in Vision-Language Models
by: Wang, Anmin, et al.
Published: (2026)
GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
by: Yan, Haolong, et al.
Published: (2025)
Vista: Scene-Aware Optimization for Streaming Video Question Answering under Post-Hoc Queries
by: Lu, Haocheng, et al.
Published: (2026)
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
by: Gao, Hao, et al.
Published: (2025)
ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction
by: Ni, Chaojun, et al.
Published: (2025)
Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers
by: Tao, Wei, et al.
Published: (2024)
Open-Ended Instruction Realization with LLM-Enabled Multi-Planner Scheduling in Autonomous Vehicles
by: Liu, Jiawei, et al.
Published: (2026)
Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization
by: Wang, Jianzong, et al.
Published: (2026)
InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs
by: Li, Bin, et al.
Published: (2025)
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
by: Chen, Boyu, et al.
Published: (2025)
RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation
by: Sun, Wenzhuo, et al.
Published: (2025)
Learning Generalizable Human Motion Generator with Reinforcement Learning
by: Mao, Yunyao, et al.
Published: (2024)
Lighting-grounded Video Generation with Renderer-based Agent Reasoning
by: Cai, Ziqi, et al.
Published: (2026)
Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
by: Xu, Xiaogang, et al.
Published: (2025)
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
by: Wu, Jianzong, et al.
Published: (2024)
Agent-based Video Trimming
by: Yang, Lingfeng, et al.
Published: (2024)
ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
by: Wang, Yichen, et al.
Published: (2025)
Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph
by: Wang, Zhiwei, et al.
Published: (2024)
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
by: Liu, Yuqi, et al.
Published: (2025)
GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification
by: Xiang, Suncheng, et al.
Published: (2025)
Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback
by: Lu, Jianglin, et al.
Published: (2025)
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
by: Jiang, Ziqi, et al.
Published: (2024)
Learning Clustering-based Prototypes for Compositional Zero-shot Learning
by: Qu, Hongyu, et al.
Published: (2025)
Fitting Skeletal Models via Graph-based Learning
by: Gaggion, Nicolás, et al.
Published: (2024)