Library Catalog

Bibliographic Details
Main Authors:	Ahmad, Niaz; Lee, Youngmoon; Wang, Guanghui
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.19032

Similar Items

Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation
by: Ahmad, Niaz, et al.
Published: (2025)

AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network
by: Kang, Donghwa, et al.
Published: (2024)

NCDD: Nearest Centroid Distance Deficit for Out-Of-Distribution Detection in Gastrointestinal Vision
by: Pokhrel, Sandesh, et al.
Published: (2024)

Steerable Visual Representations
by: Ruthardt, Jona, et al.
Published: (2026)

The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
by: Paruchuri, Akshay, et al.
Published: (2026)

A Survey on Mamba Architecture for Vision Applications
by: Ibrahim, Fady, et al.
Published: (2025)

Beyond ZOH: Advanced Discretization Strategies for Vision Mamba
by: Ibrahim, Fady, et al.
Published: (2026)

Bridging the Visual-to-Physical Gap: Physically Aligned Representations for Fall Risk Analysis
by: Zhang, Xianqi
Published: (2026)

Mitigating Bias in Facial Recognition Systems: Centroid Fairness Loss Optimization
by: Conti, Jean-Rémy, et al.
Published: (2025)

Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
by: Gandhi, Sanket, et al.
Published: (2024)

Representation Learning with Adaptive Superpixel Coding
by: Khalil, Mahmoud, et al.
Published: (2025)

VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
by: Lee, Eunsoo, et al.
Published: (2026)

Aligning Machine and Human Visual Representations across Abstraction Levels
by: Muttenthaler, Lukas, et al.
Published: (2024)

Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
by: Wang, Song, et al.
Published: (2026)

Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
by: Mishra, Vaibhav, et al.
Published: (2025)

Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
by: Wang, Wentao, et al.
Published: (2025)

Cluster Contrast for Unsupervised Visual Representation Learning
by: Giakoumoglou, Nikolaos, et al.
Published: (2025)

Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
by: Zheng, Sixiao, et al.
Published: (2024)

Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation
by: Ahmad, Kaleem
Published: (2025)

PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation
by: Ibrahem, Hatem, et al.
Published: (2025)

GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
by: Lin, Han, et al.
Published: (2026)

Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models
by: Belal, Mohammad, et al.
Published: (2024)

SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation
by: Miller, Luke James, et al.
Published: (2026)

Human-Like Coarse Object Representations in Vision Models
by: Gizdov, Andrey, et al.
Published: (2026)

Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT
by: Asfour, Alaa, et al.
Published: (2026)

Robust 3D Point Clouds Classification based on Declarative Defenders
by: Li, Kaidong, et al.
Published: (2024)

Selective Visual Representations Improve Convergence and Generalization for Embodied AI
by: Eftekhar, Ainaz, et al.
Published: (2023)

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)

Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)

Uncertainty-Informed Volume Visualization using Implicit Neural Representation
by: Saklani, Shanu, et al.
Published: (2024)

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
by: Kim, Jeongho, et al.
Published: (2024)

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)

MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation
by: Wang, Haoming, et al.
Published: (2026)

Reanimating Images using Neural Representations of Dynamic Stimuli
by: Yeung, Jacob, et al.
Published: (2024)

LLMs in Political Science: Heralding a New Era of Visual Analysis
by: Wang, Yu
Published: (2024)

Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)

DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
by: Tang, Xiaoya, et al.
Published: (2024)

LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
by: Xie, Roy, et al.
Published: (2026)