Library Catalog

Bibliographic Details
Main Authors:	Cheng, Jiacheng; Shin, Hijung Valentina; Vasconcelos, Nuno; Russell, Bryan; Heilbron, Fabian Caba
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.03190

Similar Items

EditDuet: A Multi-Agent System for Video Non-Linear Editing
by: Sandoval-Castaneda, Marcelo, et al.
Published: (2025)

Discovering Divergent Representations between Text-to-Image Models
by: Dunlap, Lisa, et al.
Published: (2025)

Improving Personalized Search with Regularized Low-Rank Parameter Updates
by: Ryan, Fiona, et al.
Published: (2025)

ResidualViT for Efficient Temporally Dense Video Encoding
by: Soldan, Mattia, et al.
Published: (2025)

Generative Timelines for Instructed Visual Assembly
by: Pardo, Alejandro, et al.
Published: (2024)

Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
by: Dave, Ishan Rajendrakumar, et al.
Published: (2024)

CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
by: Phung, Quynh, et al.
Published: (2025)

Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior
by: Lin, David Chuan-En, et al.
Published: (2022)

VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space
by: Lin, David Chuan-En, et al.
Published: (2022)

Scaling Up Video Summarization Pretraining with Large Language Models
by: Argaw, Dawit Mureja, et al.
Published: (2024)

Towards Automated Movie Trailer Generation
by: Argaw, Dawit Mureja, et al.
Published: (2024)

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
by: Kwon, Gihyun, et al.
Published: (2024)

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
by: Sridhar, Deepak, et al.
Published: (2024)

SCHEME: Scalable Channel Mixer for Vision Transformers
by: Sridhar, Deepak, et al.
Published: (2023)

Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
by: Sarkar, Aditya, et al.
Published: (2026)

Diffusion Models with Adaptive Negative Sampling Without External Resources
by: Desai, Alakh, et al.
Published: (2025)

Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models
by: Sridhar, Deepak, et al.
Published: (2024)

Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)

Improving image synthesis with diffusion-negative sampling
by: Desai, Alakh, et al.
Published: (2024)

EditAR: Unified Conditional Generation with Autoregressive Models
by: Mu, Jiteng, et al.
Published: (2025)

Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)

PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
by: Sadanandan, Binesh, et al.
Published: (2026)

Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)

Fairness and Bias Mitigation in Computer Vision: A Survey
by: Dehdashtian, Sepehr, et al.
Published: (2024)

Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
by: Adachi, Kazuki, et al.
Published: (2025)

An Attribute-Based Measure of Video Complexity
by: Sarkar, Aditya, et al.
Published: (2026)

Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
by: Gao, Zhengqing, et al.
Published: (2024)

Long-Tailed Anomaly Detection with Learnable Class Names
by: Ho, Chih-Hui, et al.
Published: (2024)

Diffusion-based Data Augmentation for Object Counting Problems
by: Wang, Zhen, et al.
Published: (2024)

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing
by: Mahadev, Rohan, et al.
Published: (2026)

ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
by: Wu, Tz-Ying, et al.
Published: (2023)

EgoPrivacy: What Your First-Person Camera Says About You?
by: Li, Yijiang, et al.
Published: (2025)

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
by: Xing, Mingwei, et al.
Published: (2026)

Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
by: Won, John, et al.
Published: (2025)

Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
by: Lou, Meng, et al.
Published: (2026)

IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
by: Kumar, Anand, et al.
Published: (2024)

Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)

CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection
by: Khan, Sohail Ahmed, et al.
Published: (2024)

Anomaly Detection by Adapting a pre-trained Vision Language Model
by: Cai, Yuxuan, et al.
Published: (2024)

Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation
by: Zhang, Xiaoran, et al.
Published: (2025)