| Main Authors: | Ahmad, Niaz, Lee, Youngmoon, Wang, Guanghui |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.19032 |
Similar Items
Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation
by: Ahmad, Niaz, et al.
Published: (2025)
AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network
by: Kang, Donghwa, et al.
Published: (2024)
NCDD: Nearest Centroid Distance Deficit for Out-Of-Distribution Detection in Gastrointestinal Vision
by: Pokhrel, Sandesh, et al.
Published: (2024)
Steerable Visual Representations
by: Ruthardt, Jona, et al.
Published: (2026)
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
by: Paruchuri, Akshay, et al.
Published: (2026)
A Survey on Mamba Architecture for Vision Applications
by: Ibrahim, Fady, et al.
Published: (2025)
Beyond ZOH: Advanced Discretization Strategies for Vision Mamba
by: Ibrahim, Fady, et al.
Published: (2026)
Bridging the Visual-to-Physical Gap: Physically Aligned Representations for Fall Risk Analysis
by: Zhang, Xianqi
Published: (2026)
Mitigating Bias in Facial Recognition Systems: Centroid Fairness Loss Optimization
by: Conti, Jean-Rémy, et al.
Published: (2025)
Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers
by: Gandhi, Sanket, et al.
Published: (2024)
Representation Learning with Adaptive Superpixel Coding
by: Khalil, Mahmoud, et al.
Published: (2025)
VisDoT : Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought
by: Lee, Eunsoo, et al.
Published: (2026)
Aligning Machine and Human Visual Representations across Abstraction Levels
by: Muttenthaler, Lukas, et al.
Published: (2024)
Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
by: Wang, Song, et al.
Published: (2026)
Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
by: Mishra, Vaibhav, et al.
Published: (2025)
Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
by: Wang, Wentao, et al.
Published: (2025)
Cluster Contrast for Unsupervised Visual Representation Learning
by: Giakoumoglou, Nikolaos, et al.
Published: (2025)
Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
by: Zheng, Sixiao, et al.
Published: (2024)
Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation
by: Ahmad, Kaleem
Published: (2025)
PanoTPS-Net: Panoramic Room Layout Estimation via Thin Plate Spline Transformation
by: Ibrahem, Hatem, et al.
Published: (2025)
GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
by: Lin, Han, et al.
Published: (2026)
Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models
by: Belal, Mohammad, et al.
Published: (2024)
SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation
by: Miller, Luke James, et al.
Published: (2026)
Human-Like Coarse Object Representations in Vision Models
by: Gizdov, Andrey, et al.
Published: (2026)
Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT
by: Asfour, Alaa, et al.
Published: (2026)
Robust 3D Point Clouds Classification based on Declarative Defenders
by: Li, Kaidong, et al.
Published: (2024)
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
by: Eftekhar, Ainaz, et al.
Published: (2023)
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)
Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)
Uncertainty-Informed Volume Visualization using Implicit Neural Representation
by: Saklani, Shanu, et al.
Published: (2024)
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
by: Kim, Jeongho, et al.
Published: (2024)
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)
MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation
by: Wang, Haoming, et al.
Published: (2026)
Reanimating Images using Neural Representations of Dynamic Stimuli
by: Yeung, Jacob, et al.
Published: (2024)
LLMs in Political Science: Heralding a New Era of Visual Analysis
by: Wang, Yu
Published: (2024)
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)
DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
by: Tang, Xiaoya, et al.
Published: (2024)
LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
by: Xie, Roy, et al.
Published: (2026)