| Main Authors: | Liang, Chao, Ma, Fan, Zhu, Linchao, Deng, Yingying, Yang, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.00627 |
Similar Items
3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization
by: Hao, Yuze, et al.
Published: (2025)
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
by: Suo, Yucheng, et al.
Published: (2025)
OpenMoCap: Rethinking Optical Motion Capture under Real-world Occlusion
by: Qian, Chen, et al.
Published: (2025)
Combating Label Noise With A General Surrogate Model For Sample Selection
by: Liang, Chao, et al.
Published: (2023)
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)
Computation-Efficient and Recognition-Friendly 3D Point Cloud Privacy Protection
by: Ma, Haotian, et al.
Published: (2025)
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
by: Yuan, Huaying, et al.
Published: (2025)
XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
by: Suo, Yucheng, et al.
Published: (2024)
Latent-Info and Low-Dimensional Learning for Human Mesh Recovery and Parallel Optimization
by: Zhang, Xiang, et al.
Published: (2025)
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
by: Cao, Zhuo, et al.
Published: (2025)
CapGeo: A Caption-Assisted Approach to Geometric Reasoning
by: Li, Yuying, et al.
Published: (2025)
Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
by: Luo, Junwei, et al.
Published: (2025)
Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict
by: Wu, Chaochen, et al.
Published: (2025)
MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
by: Park, Seojeong, et al.
Published: (2024)
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors
by: Long, Jiahuan, et al.
Published: (2025)
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)
Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection
by: Zheng, Haowen, et al.
Published: (2025)
Accelerating Video Generation Inference with Sequential-Parallel 3D Positional Encoding Using a Global Time Index
by: Yuan, Chao, et al.
Published: (2026)
EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
by: Liang, Baoyu, et al.
Published: (2025)
Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
by: Liu, Zhanwen, et al.
Published: (2024)
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
by: Wang, Zhenzhi, et al.
Published: (2025)
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
by: Wang, Yizhou, et al.
Published: (2023)
DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts
by: Li, Binbin, et al.
Published: (2025)
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
by: Chen, Zigeng, et al.
Published: (2024)
ReGenNet: Towards Human Action-Reaction Synthesis
by: Xu, Liang, et al.
Published: (2024)
IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment
by: Sun, Shangkun, et al.
Published: (2025)
Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning
by: Sahu, Siddharth, et al.
Published: (2024)
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
by: Wang, Yuxuan, et al.
Published: (2025)
Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer
by: Geng, Zichen, et al.
Published: (2024)
MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Image Segmentation
by: Zhu, Yuanbing, et al.
Published: (2024)
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)
Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)
PRIME: Protect Your Videos From Malicious Editing
by: Li, Guanlin, et al.
Published: (2024)
A Unified Perspective for Loss-Oriented Imbalanced Learning via Localization
by: Wang, Zitai, et al.
Published: (2023)
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
by: Tu, Yunbin, et al.
Published: (2024)