:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jin; Wei, Ping; Li, Huan; Ren, Ziyang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.09263

Similar Items

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
by: Paul, Dhiman, et al.
Published: (2024)

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
by: Sun, Hao, et al.
Published: (2024)

Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
by: Um, Sung Jin, et al.
Published: (2025)

MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
by: Yuan, Huaying, et al.
Published: (2025)

GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
by: Sun, Yunzhuo, et al.
Published: (2024)

Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification
by: Chen, Zizhao, et al.
Published: (2026)

Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
by: Ma, Qianli, et al.
Published: (2024)

Joint-Task Regularization for Partially Labeled Multi-Task Learning
by: Nishi, Kento, et al.
Published: (2024)

Towards Unified Modeling in Federated Multi-Task Learning via Subspace Decoupling
by: Wei, Yipan, et al.
Published: (2025)

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
by: Cao, Zhuo, et al.
Published: (2025)

Two-Stream Interactive Joint Learning of Scene Parsing and Geometric Vision Tasks
by: Tang, Guanfeng, et al.
Published: (2026)

Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection
by: Zhao, Henghao, et al.
Published: (2023)

Exploring Task-Level Optimal Prompts for Visual In-Context Learning
by: Zhu, Yan, et al.
Published: (2025)

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)

GIRL-DETR: Gradient-Isolated Reinforcement Learning for Video Moment Retrieval
by: Zhang, Shihang, et al.
Published: (2026)

A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
by: He, Yonghao, et al.
Published: (2024)

Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object Detection
by: Yin, Yuantao, et al.
Published: (2024)

MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
by: Park, Seojeong, et al.
Published: (2024)

Deep Extrinsic Manifold Representation for Vision Tasks
by: Zhang, Tongtong, et al.
Published: (2024)

Transferability-Guided Cross-Domain Cross-Task Transfer Learning
by: Tan, Yang, et al.
Published: (2022)

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
by: Ming, Yifei, et al.
Published: (2024)

General and Task-Oriented Video Segmentation
by: Chen, Mu, et al.
Published: (2024)

Unleash the Potential of CLIP for Video Highlight Detection
by: Han, Donghoon, et al.
Published: (2024)

Scale Decoupled Distillation
by: Wei, Shicai, et al.
Published: (2024)

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
by: Ku, Max, et al.
Published: (2024)

NuWa: Deriving Lightweight Task-Specific Vision Transformers for Edge Devices
by: Wei, Ziteng, et al.
Published: (2025)

Denoising Task Routing for Diffusion Models
by: Park, Byeongjun, et al.
Published: (2023)

Task Me Anything
by: Zhang, Jieyu, et al.
Published: (2024)

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
by: Zhang, Zhaoyang, et al.
Published: (2023)

Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data
by: Oh, Youngmin, et al.
Published: (2026)

HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
by: Hoffmann, David T., et al.
Published: (2023)

SLVideo: A Sign Language Video Moment Retrieval Framework
by: Martins, Gonçalo Vinagre, et al.
Published: (2024)

SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
by: Yu, An, et al.
Published: (2025)

Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
by: Peng, Xinyu, et al.
Published: (2026)

Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection
by: Jin, Pengfei, et al.
Published: (2024)

MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)

Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)