:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yang; Chen, Binglin; Zheng, Yongsen; Cheng, Lechao; Li, Guanbin; Lin, Liang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.15734

Similar Items

Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
by: Jiang, Kaixuan, et al.
Published: (2025)

MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
by: Wang, Kuo, et al.
Published: (2024)

Dual-domain Adaptation Networks for Realistic Image Super-resolution
by: Fang, Chaowei, et al.
Published: (2025)

GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection
by: Li, Jiaming, et al.
Published: (2026)

Decoupled Training with Local Reinforcement Fine-Tuning in Federated Learning
by: Ma, Yuting, et al.
Published: (2026)

DDP-WM: Disentangled Dynamics Prediction for Efficient World Models
by: Yin, Shicheng, et al.
Published: (2026)

Cross-Modal Causal Intervention for Medical Report Generation
by: Chen, Weixing, et al.
Published: (2023)

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
by: Song, Xinshuai, et al.
Published: (2024)

evMLP: An Efficient Event-Driven MLP Architecture for Vision
by: Zheng, Zhentan
Published: (2025)

TadML: A fast temporal action detection with Mechanics-MLP
by: Deng, Bowen, et al.
Published: (2022)

Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition
by: Xu, Hao, et al.
Published: (2025)

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians
by: Wei, Zeming, et al.
Published: (2025)

Credible Teacher for Semi-Supervised Object Detection in Open Scene
by: Zhuang, Jingyu, et al.
Published: (2024)

MLP Can Be A Good Transformer Learner
by: Lin, Sihao, et al.
Published: (2024)

Self-Prophetic Decoding to Unlock Visual Search in LVLMs
by: He, Zhendong, et al.
Published: (2026)

Efficient Vision Language Model Fine-tuning for Text-based Person Anomaly Search
by: He, Jiayi, et al.
Published: (2025)

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
by: Luo, Jingzhou, et al.
Published: (2025)

One Model for All: Unified Try-On and Try-Off in Any Pose via LLM-Inspired Bidirectional Tweedie Diffusion
by: Liu, Jinxi, et al.
Published: (2025)

DenoiseGS: Gaussian Reconstruction Model for Burst Denoising
by: Cheng, Yongsen, et al.
Published: (2025)

FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding
by: Yuan, Shuai, et al.
Published: (2024)

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
by: Liu, Yang, et al.
Published: (2024)

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
by: Liu, Yu, et al.
Published: (2024)

Sim-DETR: Unlock DETR for Temporal Sentence Grounding
by: Tang, Jiajin, et al.
Published: (2025)

StgcDiff: Spatial-Temporal Graph Condition Diffusion for Sign Language Transition Generation
by: He, Jiashu, et al.
Published: (2025)

Are VLMs Lost Between Sky and Space? LinkS$^2$Bench for UAV-Satellite Dynamic Cross-View Spatial Intelligence
by: Liu, Dian, et al.
Published: (2026)

LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
by: Ning, Yuwei, et al.
Published: (2026)

Fine-grained Dynamic Network for Generic Event Boundary Detection
by: Zheng, Ziwei, et al.
Published: (2024)

High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model
by: Zhong, Weizhi, et al.
Published: (2024)

Focus Anywhere for Fine-grained Multi-page Document Understanding
by: Liu, Chenglong, et al.
Published: (2024)

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
by: Zhang, Yiming, et al.
Published: (2026)

SpiralMLP: A Lightweight Vision MLP Architecture
by: Mu, Haojie, et al.
Published: (2024)

SpatialLM: Training Large Language Models for Structured Indoor Modeling
by: Mao, Yongsen, et al.
Published: (2025)

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
by: Li, Jiaming, et al.
Published: (2024)

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
by: Zhuang, Jingyu, et al.
Published: (2024)

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
by: He, Zijian, et al.
Published: (2024)

Modality Alignment Meets Federated Broadcasting
by: Ma, Yuting, et al.
Published: (2024)

TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation
by: Wang, Qihang, et al.
Published: (2025)

Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and Kigali
by: Kyatha, Nthenya, et al.
Published: (2025)

FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
by: Xu, Jinglin, et al.
Published: (2024)