| Main Authors: | Liu, Man, Bai, Huihui, Li, Feng, Zhang, Chunjie, Wei, Yunchao, Chua, Tat-Seng, Zhao, Yao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.03032 |
Similar Items
PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning
by: Liu, Man, et al.
Published: (2024)
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
by: Zhou, Zhenglin, et al.
Published: (2025)
Extending Visual Dynamics for Video-to-Music Generation
by: Liu, Xiaohao, et al.
Published: (2025)
Harnessing Group-Oriented Consistency Constraints for Semi-Supervised Semantic Segmentation in CdZnTe Semiconductors
by: Li, Peihao, et al.
Published: (2025)
Can I Trust Your Answer? Visually Grounded Video Question Answering
by: Xiao, Junbin, et al.
Published: (2023)
Region-Adaptive Transform with Segmentation Prior for Image Compression
by: Liu, Yuxi, et al.
Published: (2024)
Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art
by: Jin, Zhe, et al.
Published: (2025)
FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models
by: Liao, Xinting, et al.
Published: (2025)
Universal Scene Graph Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2026)
Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration
by: He, Jinghan, et al.
Published: (2026)
Disentangling Masked Autoencoders for Unsupervised Domain Generalization
by: Zhang, An, et al.
Published: (2024)
Understanding Long Videos via LLM-Powered Entity Relation Graphs
by: Chu, Meng, et al.
Published: (2025)
Principled Multimodal Representation Learning
by: Liu, Xiaohao, et al.
Published: (2025)
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
by: Jiang, Huajie, et al.
Published: (2025)
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
by: Qi, Ji, et al.
Published: (2025)
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
by: Chen, Yiyang, et al.
Published: (2022)
3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective
by: Zheng, Zhedong, et al.
Published: (2022)
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
by: Tan, Chuangchuang, et al.
Published: (2024)
Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)
PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution
by: Zhang, Yaning, et al.
Published: (2025)
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
by: Liu, Han, et al.
Published: (2025)
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
by: Fei, Hao, et al.
Published: (2024)
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
by: Li, Juncheng, et al.
Published: (2023)
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
by: Fei, Hao, et al.
Published: (2023)
UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval
by: Guo, Hongyu, et al.
Published: (2025)
Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
by: Liao, Rongxin, et al.
Published: (2025)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)
Learning Visual Proxy for Compositional Zero-Shot Learning
by: Zhang, Shiyu, et al.
Published: (2025)
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis
by: Lin, Dongheng, et al.
Published: (2025)
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
by: Tian, Junjiao, et al.
Published: (2023)
DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
by: Liang, Xiao, et al.
Published: (2025)
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
by: Wei, Shengxun, et al.
Published: (2024)
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
by: Cui, Chenhang, et al.
Published: (2024)
MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval
by: Zhu, Fengbin, et al.
Published: (2026)
PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures
by: Shao, Yuheng, et al.
Published: (2025)
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
by: Yu, Xiao, et al.
Published: (2025)
CoPS: Conditional Prompt Synthesis for Zero-Shot Anomaly Detection
by: Chen, Qiyu, et al.
Published: (2025)