Library Catalog

Bibliographic Details
Main Authors:	Mahajan, Abhinav; Tripathy, Abhikhya; Pala, Sudeeksha Reddy; Methi, Vaibhav; Joseph, K J; Srinivasan, Balaji Vasan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.14605

Similar Items

Design-o-meter: Towards Evaluating and Refining Graphic Designs
by: Goyal, Sahil, et al.
Published: (2024)

Step-by-step Layered Design Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models
by: Shukla, Tripti, et al.
Published: (2024)

Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
by: Agarwal, Aishwarya, et al.
Published: (2024)

AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
by: Agarwal, Aishwarya, et al.
Published: (2024)

Agentic Design Review System
by: Nag, Sayan, et al.
Published: (2025)

FloAt: Flow Warping of Self-Attention for Clothing Animation Generation
by: Mishra, Swasti Shreya, et al.
Published: (2024)

ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
by: Kothandaraman, Divya, et al.
Published: (2024)

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
by: Chowdhury, Sanjoy, et al.
Published: (2024)

PICS in Pics: Physics Informed Contour Selection for Rapid Image Segmentation
by: Dwivedi, Vikas, et al.
Published: (2023)

Towards Efficient Exemplar Based Image Editing with Multimodal VLMs
by: Jadhav, Avadhoot, et al.
Published: (2025)

Through the PRISM: Principle-Aware, Interpretable, and Multi-Scale Evaluation of Visual Designs
by: Gandhi, Mona, et al.
Published: (2026)

Action Recognition based Industrial Safety Violation Detection
by: Reddy, Surya N, et al.
Published: (2024)

POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
by: Joshi, Abhinav, et al.
Published: (2025)

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
by: Magrini, Gabriele, et al.
Published: (2026)

EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects
by: Magrini, Gabriele, et al.
Published: (2025)

Towards Intrinsic-Aware Monocular 3D Object Detection
by: Zhang, Zhihao, et al.
Published: (2026)

Unified Framework for Open-World Compositional Zero-shot Learning
by: Jayasekara, Hirunima, et al.
Published: (2024)

ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
by: Srivastava, Ashutosh, et al.
Published: (2024)

Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target Localization
by: Kuriyal, Abhishek, et al.
Published: (2024)

Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection
by: Chuchra, Akanksha, et al.
Published: (2026)

Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition
by: Bora, Maheswar, et al.
Published: (2025)

AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
by: Patnaik, Sohan, et al.
Published: (2025)

Compositional Image-Text Matching and Retrieval by Grounding Entities
by: Vongala, Madhukar Reddy, et al.
Published: (2025)

Drone Detection with Event Cameras
by: Magrini, Gabriele, et al.
Published: (2025)

Insights from the Algonauts 2025 Winners
by: Scotti, Paul S., et al.
Published: (2025)

UCATSC: Uncertainty-Aware Constrained Traffic Signal Control Under Vision-Based Partial Observability
by: Bodagala, Jayawant, et al.
Published: (2026)

Measuring Train Driver Performance as Key to Approval of Driverless Trains
by: Tagiew, Rustam, et al.
Published: (2025)

Generalizing Monocular 3D Object Detection
by: Kumar, Abhinav
Published: (2025)

Towards Understanding Best Practices for Quantization of Vision-Language Models
by: Das, Gautom, et al.
Published: (2026)

PRECISe : Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings
by: Ganatra, Vaibhav, et al.
Published: (2024)

How to Design and Train Your Implicit Neural Representation for Video Compression
by: Gwilliam, Matthew, et al.
Published: (2025)

Exploring Compositionality in Vision Transformers using Wavelet Representations
by: Purushottamdas, Akshad Shyam, et al.
Published: (2025)

PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement
by: Jamil, Sofia, et al.
Published: (2025)

Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
by: Jamil, Sofia, et al.
Published: (2025)

Interpreting Hand gestures using Object Detection and Digits Classification
by: K, Sangeetha, et al.
Published: (2024)

RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion
by: Paregi, Bhanu Pratap, et al.
Published: (2025)

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
by: Wei, Yibing, et al.
Published: (2024)

Spike-TBR: a Noise Resilient Neuromorphic Event Representation
by: Magrini, Gabriele, et al.
Published: (2025)

One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition
by: Darur, Balaji, et al.
Published: (2026)