| Main Authors: | Yilmaz, Nilay, Patel, Maitreya, Kusumba, Naga Sai Abhiram, He, Yixuan, Yang, Yezhou |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.19357 |
Similar Items
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)
Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)
VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
by: Patel, Maitreya, et al.
Published: (2026)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)
$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
by: Kim, Changhoon, et al.
Published: (2023)
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)
Self-Supervised Modality-Agnostic Pre-Training of Swin Transformers
by: Talasila, Abhiroop, et al.
Published: (2024)
MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes
by: Pande, Nilay, et al.
Published: (2025)
StrideNET: Swin Transformer for Terrain Recognition with Dynamic Roughness Extraction
by: Shelare, Maitreya, et al.
Published: (2024)
Axial-UNet: A Neural Weather Model for Precipitation Nowcasting
by: Mamtani, Sumit, et al.
Published: (2025)
VOCAL: Visual Odometry via ContrAstive Learning
by: Huang, Chi-Yao, et al.
Published: (2025)
LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition
by: Chappa, Naga Venkata Sai Raviteja, et al.
Published: (2024)
Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring
by: Suin, Maitreya, et al.
Published: (2024)
'Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning
by: Feinglass, Joshua, et al.
Published: (2024)
HueManity: Probing Fine-Grained Visual Perception in MLLMs
by: Grover, Rynaa, et al.
Published: (2025)
Hi-Light: A Path to high-fidelity, high-resolution video relighting with a Novel Evaluation Paradigm
by: Liu, Xiangrui, et al.
Published: (2026)
Zero-Shot Personalization of Objects via Textual Inversion
by: Roy, Aniket, et al.
Published: (2026)
RemoteVAR: Autoregressive Visual Modeling for Remote Sensing Change Detection
by: Korkmaz, Yilmaz, et al.
Published: (2026)
CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models
by: Suin, Maitreya, et al.
Published: (2024)
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)
Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks
by: Hong, Jinyung, et al.
Published: (2024)
Information Geometry of Evolution of Neural Network Parameters While Training
by: Thiruthummal, Abhiram Anand, et al.
Published: (2024)
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
by: Naharas, Nilay, et al.
Published: (2025)
Event-based Graph Representation with Spatial and Motion Vectors for Asynchronous Object Detection
by: Verma, Aayush Atul, et al.
Published: (2025)
Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)
Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks
by: Satish, Manthan Chelenahalli, et al.
Published: (2024)
SpaRRTa: A Synthetic Benchmark for Evaluating Spatial Intelligence in Visual Foundation Models
by: Kargin, Turhan Can, et al.
Published: (2026)
Learning Sparse Visual Representations via Spatial-Semantic Factorization
by: Zhao, Theodore Zhengde, et al.
Published: (2026)
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)
SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
by: Aliminati, Manideep Reddy, et al.
Published: (2024)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)
Deep Learning in Image Classification: Evaluating VGG19's Performance on Complex Visual Data
by: He, Weijie, et al.
Published: (2024)