| Main Authors: | Cheng, Jiacheng, Shin, Hijung Valentina, Vasconcelos, Nuno, Russell, Bryan, Heilbron, Fabian Caba |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.03190 |
Similar Items
EditDuet: A Multi-Agent System for Video Non-Linear Editing
by: Sandoval-Castaneda, Marcelo, et al.
Published: (2025)
Discovering Divergent Representations between Text-to-Image Models
by: Dunlap, Lisa, et al.
Published: (2025)
Improving Personalized Search with Regularized Low-Rank Parameter Updates
by: Ryan, Fiona, et al.
Published: (2025)
ResidualViT for Efficient Temporally Dense Video Encoding
by: Soldan, Mattia, et al.
Published: (2025)
Generative Timelines for Instructed Visual Assembly
by: Pardo, Alejandro, et al.
Published: (2024)
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
by: Dave, Ishan Rajendrakumar, et al.
Published: (2024)
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
by: Phung, Quynh, et al.
Published: (2025)
Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior
by: Lin, David Chuan-En, et al.
Published: (2022)
VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space
by: Lin, David Chuan-En, et al.
Published: (2022)
Scaling Up Video Summarization Pretraining with Large Language Models
by: Argaw, Dawit Mureja, et al.
Published: (2024)
Towards Automated Movie Trailer Generation
by: Argaw, Dawit Mureja, et al.
Published: (2024)
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
by: Kwon, Gihyun, et al.
Published: (2024)
Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
by: Sridhar, Deepak, et al.
Published: (2024)
SCHEME: Scalable Channel Mixer for Vision Transformers
by: Sridhar, Deepak, et al.
Published: (2023)
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
by: Sarkar, Aditya, et al.
Published: (2026)
Diffusion Models with Adaptive Negative Sampling Without External Resources
by: Desai, Alakh, et al.
Published: (2025)
Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
by: Sridhar, Deepak, et al.
Published: (2024)
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)
Improving image synthesis with diffusion-negative sampling
by: Desai, Alakh, et al.
Published: (2024)
EditAR: Unified Conditional Generation with Autoregressive Models
by: Mu, Jiteng, et al.
Published: (2025)
Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)
PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)
Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)
Fairness and Bias Mitigation in Computer Vision: A Survey
by: Dehdashtian, Sepehr, et al.
Published: (2024)
Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
by: Adachi, Kazuki, et al.
Published: (2025)
An Attribute-Based Measure of Video Complexity
by: Sarkar, Aditya, et al.
Published: (2026)
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
by: Gao, Zhengqing, et al.
Published: (2024)
Long-Tailed Anomaly Detection with Learnable Class Names
by: Ho, Chih-Hui, et al.
Published: (2024)
Diffusion-based Data Augmentation for Object Counting Problems
by: Wang, Zhen, et al.
Published: (2024)
PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing
by: Mahadev, Rohan, et al.
Published: (2026)
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
by: Wu, Tz-Ying, et al.
Published: (2023)
EgoPrivacy: What Your First-Person Camera Says About You?
by: Li, Yijiang, et al.
Published: (2025)
AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
by: Xing, Mingwei, et al.
Published: (2026)
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
by: Won, John, et al.
Published: (2025)
Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
by: Lou, Meng, et al.
Published: (2026)
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
by: Kumar, Anand, et al.
Published: (2024)
Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)
CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection
by: Khan, Sohail Ahmed, et al.
Published: (2024)
Anomaly Detection by Adapting a pre-trained Vision Language Model
by: Cai, Yuxuan, et al.
Published: (2024)
Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation
by: Zhang, Xiaoran, et al.
Published: (2025)