| Main Authors: | Yang, Jin; Wei, Ping; Li, Huan; Ren, Ziyang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.09263 |
Similar Items
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
by: Paul, Dhiman, et al.
Published: (2024)
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
by: Sun, Hao, et al.
Published: (2024)
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
by: Um, Sung Jin, et al.
Published: (2025)
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
by: Yuan, Huaying, et al.
Published: (2025)
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
by: Sun, Yunzhuo, et al.
Published: (2024)
Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification
by: Chen, Zizhao, et al.
Published: (2026)
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
by: Ma, Qianli, et al.
Published: (2024)
Joint-Task Regularization for Partially Labeled Multi-Task Learning
by: Nishi, Kento, et al.
Published: (2024)
Towards Unified Modeling in Federated Multi-Task Learning via Subspace Decoupling
by: Wei, Yipan, et al.
Published: (2025)
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
by: Cao, Zhuo, et al.
Published: (2025)
Two-Stream Interactive Joint Learning of Scene Parsing and Geometric Vision Tasks
by: Tang, Guanfeng, et al.
Published: (2026)
Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection
by: Zhao, Henghao, et al.
Published: (2023)
Exploring Task-Level Optimal Prompts for Visual In-Context Learning
by: Zhu, Yan, et al.
Published: (2025)
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)
GIRL-DETR: Gradient-Isolated Reinforcement Learning for Video Moment Retrieval
by: Zhang, Shihang, et al.
Published: (2026)
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
by: He, Yonghao, et al.
Published: (2024)
Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object Detection
by: Yin, Yuantao, et al.
Published: (2024)
MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
by: Park, Seojeong, et al.
Published: (2024)
Deep Extrinsic Manifold Representation for Vision Tasks
by: Zhang, Tongtong, et al.
Published: (2024)
Transferability-Guided Cross-Domain Cross-Task Transfer Learning
by: Tan, Yang, et al.
Published: (2022)
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
by: Ming, Yifei, et al.
Published: (2024)
General and Task-Oriented Video Segmentation
by: Chen, Mu, et al.
Published: (2024)
Unleash the Potential of CLIP for Video Highlight Detection
by: Han, Donghoon, et al.
Published: (2024)
Scale Decoupled Distillation
by: Wei, Shicai, et al.
Published: (2024)
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
by: Ku, Max, et al.
Published: (2024)
NuWa: Deriving Lightweight Task-Specific Vision Transformers for Edge Devices
by: Wei, Ziteng, et al.
Published: (2025)
Denoising Task Routing for Diffusion Models
by: Park, Byeongjun, et al.
Published: (2023)
Task Me Anything
by: Zhang, Jieyu, et al.
Published: (2024)
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
by: Zhang, Zhaoyang, et al.
Published: (2023)
Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data
by: Oh, Youngmin, et al.
Published: (2026)
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
by: Hoffmann, David T., et al.
Published: (2023)
SLVideo: A Sign Language Video Moment Retrieval Framework
by: Martins, Gonçalo Vinagre, et al.
Published: (2024)
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
by: Yu, An, et al.
Published: (2025)
Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
by: Peng, Xinyu, et al.
Published: (2026)
Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection
by: Jin, Pengfei, et al.
Published: (2024)
MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)
Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)