| Main Authors: | Mahajan, Abhinav, Tripathy, Abhikhya, Pala, Sudeeksha Reddy, Methi, Vaibhav, Joseph, K J, Srinivasan, Balaji Vasan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.14605 |
Similar Items
Design-o-meter: Towards Evaluating and Refining Graphic Designs
by: Goyal, Sahil, et al.
Published: (2024)
Step-by-step Layered Design Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)
Test-time Conditional Text-to-Image Synthesis Using Diffusion Models
by: Shukla, Tripti, et al.
Published: (2024)
Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
by: Agarwal, Aishwarya, et al.
Published: (2024)
AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
by: Agarwal, Aishwarya, et al.
Published: (2024)
Agentic Design Review System
by: Nag, Sayan, et al.
Published: (2025)
FloAt: Flow Warping of Self-Attention for Clothing Animation Generation
by: Mishra, Swasti Shreya, et al.
Published: (2024)
ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
by: Kothandaraman, Divya, et al.
Published: (2024)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
by: Chowdhury, Sanjoy, et al.
Published: (2024)
PICS in Pics: Physics Informed Contour Selection for Rapid Image Segmentation
by: Dwivedi, Vikas, et al.
Published: (2023)
Towards Efficient Exemplar Based Image Editing with Multimodal VLMs
by: Jadhav, Avadhoot, et al.
Published: (2025)
Through the PRISM: Principle-Aware, Interpretable, and Multi-Scale Evaluation of Visual Designs
by: Gandhi, Mona, et al.
Published: (2026)
Action Recognition based Industrial Safety Violation Detection
by: Reddy, Surya N, et al.
Published: (2024)
POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
by: Joshi, Abhinav, et al.
Published: (2025)
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
by: Magrini, Gabriele, et al.
Published: (2026)
EV-Flying: an Event-based Dataset for In-The-Wild Recognition of Flying Objects
by: Magrini, Gabriele, et al.
Published: (2025)
Towards Intrinsic-Aware Monocular 3D Object Detection
by: Zhang, Zhihao, et al.
Published: (2026)
Unified Framework for Open-World Compositional Zero-shot Learning
by: Jayasekara, Hirunima, et al.
Published: (2024)
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
by: Srivastava, Ashutosh, et al.
Published: (2024)
Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target Localization
by: Kuriyal, Abhishek, et al.
Published: (2024)
Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection
by: Chuchra, Akanksha, et al.
Published: (2026)
Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition
by: Bora, Maheswar, et al.
Published: (2025)
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
by: Patnaik, Sohan, et al.
Published: (2025)
Compositional Image-Text Matching and Retrieval by Grounding Entities
by: Vongala, Madhukar Reddy, et al.
Published: (2025)
Drone Detection with Event Cameras
by: Magrini, Gabriele, et al.
Published: (2025)
Insights from the Algonauts 2025 Winners
by: Scotti, Paul S., et al.
Published: (2025)
UCATSC: Uncertainty-Aware Constrained Traffic Signal Control Under Vision-Based Partial Observability
by: Bodagala, Jayawant, et al.
Published: (2026)
Measuring Train Driver Performance as Key to Approval of Driverless Trains
by: Tagiew, Rustam, et al.
Published: (2025)
Generalizing Monocular 3D Object Detection
by: Kumar, Abhinav
Published: (2025)
Towards Understanding Best Practices for Quantization of Vision-Language Models
by: Das, Gautom, et al.
Published: (2026)
PRECISe : Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings
by: Ganatra, Vaibhav, et al.
Published: (2024)
How to Design and Train Your Implicit Neural Representation for Video Compression
by: Gwilliam, Matthew, et al.
Published: (2025)
Exploring Compositionality in Vision Transformers using Wavelet Representations
by: Purushottamdas, Akshad Shyam, et al.
Published: (2025)
PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement
by: Jamil, Sofia, et al.
Published: (2025)
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
by: Jamil, Sofia, et al.
Published: (2025)
Interpreting Hand gestures using Object Detection and Digits Classification
by: K, Sangeetha, et al.
Published: (2024)
RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion
by: Paregi, Bhanu Pratap, et al.
Published: (2025)
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
by: Wei, Yibing, et al.
Published: (2024)
Spike-TBR: a Noise Resilient Neuromorphic Event Representation
by: Magrini, Gabriele, et al.
Published: (2025)
One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition
by: Darur, Balaji, et al.
Published: (2026)