:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Chao; Ma, Fan; Zhu, Linchao; Deng, Yingying; Yang, Yi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.00627

Similar Items

3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization
by: Hao, Yuze, et al.
Published: (2025)

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
by: Suo, Yucheng, et al.
Published: (2025)

OpenMoCap: Rethinking Optical Motion Capture under Real-world Occlusion
by: Qian, Chen, et al.
Published: (2025)

Combating Label Noise With A General Surrogate Model For Sample Selection
by: Liang, Chao, et al.
Published: (2023)

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)

Computation-Efficient and Recognition-Friendly 3D Point Cloud Privacy Protection
by: Ma, Haotian, et al.
Published: (2025)

MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
by: Yuan, Huaying, et al.
Published: (2025)

XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
by: Suo, Yucheng, et al.
Published: (2024)

Latent-Info and Low-Dimensional Learning for Human Mesh Recovery and Parallel Optimization
by: Zhang, Xiang, et al.
Published: (2025)

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
by: Cao, Zhuo, et al.
Published: (2025)

CapGeo: A Caption-Assisted Approach to Geometric Reasoning
by: Li, Yuying, et al.
Published: (2025)

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
by: Luo, Junwei, et al.
Published: (2025)

Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict
by: Wu, Chaochen, et al.
Published: (2025)

MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
by: Park, Seojeong, et al.
Published: (2024)

CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors
by: Long, Jiahuan, et al.
Published: (2025)

Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)

Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection
by: Zheng, Haowen, et al.
Published: (2025)

Accelerating Video Generation Inference with Sequential-Parallel 3D Positional Encoding Using a Global Time Index
by: Yuan, Chao, et al.
Published: (2026)

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
by: Liang, Baoyu, et al.
Published: (2025)

Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
by: Liu, Zhanwen, et al.
Published: (2024)

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
by: Wang, Zhenzhi, et al.
Published: (2025)

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)

Hulk: A Universal Knowledge Translator for Human-Centric Tasks
by: Wang, Yizhou, et al.
Published: (2023)

DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts
by: Li, Binbin, et al.
Published: (2025)

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
by: Chen, Zigeng, et al.
Published: (2024)

ReGenNet: Towards Human Action-Reaction Synthesis
by: Xu, Liang, et al.
Published: (2024)

IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment
by: Sun, Shangkun, et al.
Published: (2025)

Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning
by: Sahu, Siddharth, et al.
Published: (2024)

FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
by: Wang, Yuxuan, et al.
Published: (2025)

Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer
by: Geng, Zichen, et al.
Published: (2024)

MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Image Segmentation
by: Zhu, Yuanbing, et al.
Published: (2024)

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)

Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)

PRIME: Protect Your Videos From Malicious Editing
by: Li, Guanlin, et al.
Published: (2024)

A Unified Perspective for Loss-Oriented Imbalanced Learning via Localization
by: Wang, Zitai, et al.
Published: (2023)

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
by: Tu, Yunbin, et al.
Published: (2024)