:: Library Catalog

Bibliographic Details
Main Authors:	Jia, Ziqi; Li, Junjie; Qu, Xiaoyang; Wang, Jianzong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.10049

Similar Items

Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning
by: Jia, Ziqi, et al.
Published: (2025)

CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control
by: Shi, Jiaqi, et al.
Published: (2026)

DIVA: Harnessing the Representation Divergence in Unified Multimodal Models for Mutual Reinforcement
by: Lu, Renjie, et al.
Published: (2026)

Federated Domain Generalization with Domain-specific Soft Prompts Generation
by: Wu, Jianhan, et al.
Published: (2025)

RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
by: Li, Junjie, et al.
Published: (2025)

MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
by: Tao, Wei, et al.
Published: (2025)

VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection
by: Zhang, Bin, et al.
Published: (2025)

BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation
by: Tao, Wei, et al.
Published: (2025)

WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
by: Tao, Wei, et al.
Published: (2026)

MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control
by: Lu, Renjie, et al.
Published: (2026)

From Inheritance to Saturation: Disentangling the Evolution of Visual Redundancy for Architecture-Aware MLLM Inference Acceleration
by: Shi, Jiaqi, et al.
Published: (2026)

PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition
by: He, Shenglin, et al.
Published: (2024)

VLA-InfoEntropy: A Training-Free Vision-Attention Information Entropy Approach for Vision-Language-Action Models Inference Acceleration and Success
by: Liu, Chuhang, et al.
Published: (2026)

ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations
by: Zhang, Xulong, et al.
Published: (2024)

RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations
by: Zhang, Bin, et al.
Published: (2025)

Triage: Hierarchical Visual Budgeting for Efficient Video Reasoning in Vision-Language Models
by: Wang, Anmin, et al.
Published: (2026)

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
by: Yan, Haolong, et al.
Published: (2025)

Vista: Scene-Aware Optimization for Streaming Video Question Answering under Post-Hoc Queries
by: Lu, Haocheng, et al.
Published: (2026)

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
by: Gao, Hao, et al.
Published: (2025)

ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction
by: Ni, Chaojun, et al.
Published: (2025)

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers
by: Tao, Wei, et al.
Published: (2024)

Open-Ended Instruction Realization with LLM-Enabled Multi-Planner Scheduling in Autonomous Vehicles
by: Liu, Jiawei, et al.
Published: (2026)

Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization
by: Wang, Jianzong, et al.
Published: (2026)

InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs
by: Li, Bin, et al.
Published: (2025)

MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)

VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
by: Chen, Boyu, et al.
Published: (2025)

RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation
by: Sun, Wenzhuo, et al.
Published: (2025)

Learning Generalizable Human Motion Generator with Reinforcement Learning
by: Mao, Yunyao, et al.
Published: (2024)

Lighting-grounded Video Generation with Renderer-based Agent Reasoning
by: Cai, Ziqi, et al.
Published: (2026)

Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
by: Xu, Xiaogang, et al.
Published: (2025)

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
by: Wu, Jianzong, et al.
Published: (2024)

Agent-based Video Trimming
by: Yang, Lingfeng, et al.
Published: (2024)

ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
by: Wang, Yichen, et al.
Published: (2025)

Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph
by: Wang, Zhiwei, et al.
Published: (2024)

VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
by: Liu, Yuqi, et al.
Published: (2025)

GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification
by: Xiang, Suncheng, et al.
Published: (2025)

Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback
by: Lu, Jianglin, et al.
Published: (2025)

CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
by: Jiang, Ziqi, et al.
Published: (2024)

Learning Clustering-based Prototypes for Compositional Zero-shot Learning
by: Qu, Hongyu, et al.
Published: (2025)

Fitting Skeletal Models via Graph-based Learning
by: Gaggion, Nicolás, et al.
Published: (2024)