| Main Authors: | Passi, Ananya, Robinson, Brian S., Bonner, Michael F. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.19155 |
Similar Items
An extremely coarse feedback signal is sufficient for learning human-aligned visual representations
by: Mehta, Yash, et al.
Published: (2026)
Universal dimensions of visual representation
by: Chen, Zirui, et al.
Published: (2024)
Rapidly deploying on-device eye tracking by distilling visual foundation models
by: Jiang, Cheng, et al.
Published: (2026)
SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition
by: Chen, Shunpeng, et al.
Published: (2025)
Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation
by: Weekley, Daron, et al.
Published: (2025)
DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking
by: Zha, Jirong, et al.
Published: (2025)
HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
by: Peirone, Simone Alberto, et al.
Published: (2025)
Towards long-term player tracking with graph hierarchies and domain-specific features
by: Koshkina, Maria, et al.
Published: (2025)
Mining Contextualized Visual Associations from Images for Creativity Understanding
by: Sahu, Ananya, et al.
Published: (2025)
Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs
by: Pandey, Ananya, et al.
Published: (2024)
Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)
FedPartWhole: Federated domain generalization via consistent part-whole hierarchies
by: Radwan, Ahmed, et al.
Published: (2024)
Characterizing Universal Object Representations Across Vision Models
by: Mahner, Florian P., et al.
Published: (2026)
On the rankability of visual embeddings
by: Sonthalia, Ankit, et al.
Published: (2025)
ARMARecon: An ARMA Convolutional Filter based Graph Neural Network for Neurodegenerative Dementias Classification
by: Abburi, VSS Tejaswi, et al.
Published: (2026)
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
by: Thomas, Xavier, et al.
Published: (2025)
Characterizing the visual representation of objects from the child's view
by: Yang, Jane, et al.
Published: (2026)
HOLA: Enhancing Audio-visual Deepfake Detection via Hierarchical Contextual Aggregations and Efficient Pre-training
by: Wu, Xuecheng, et al.
Published: (2025)
Neuromorphic visual attention for Sign-language recognition on SpiNNaker
by: Liskova, Sarka, et al.
Published: (2026)
Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement
by: Akbar, Sahil Ali, et al.
Published: (2024)
UltrON: Ultrasound Occupancy Networks
by: Wysocki, Magdalena, et al.
Published: (2025)
Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection
by: Aggarwal, Sajal, et al.
Published: (2024)
LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application
by: Huang, Zhe, et al.
Published: (2024)
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
by: Melnyk, Pavlo, et al.
Published: (2022)
Wandering around: A bioinspired approach to visual attention through object motion sensitivity
by: D'Angelo, Giulia, et al.
Published: (2025)
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
by: Singh, Vikram, et al.
Published: (2025)
Spectral Progressive Diffusion for Efficient Image and Video Generation
by: Xiao, Howard, et al.
Published: (2026)
Large-scale visual SLAM for in-the-wild videos
by: Sun, Shuo, et al.
Published: (2025)
Explaning with trees: interpreting CNNs using hierarchies
by: Rodrigues, Caroline Mazini, et al.
Published: (2024)
Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation
by: Chao, Brian, et al.
Published: (2026)
RemEdit: Efficient Diffusion Editing with Riemannian Geometry
by: Adhikarla, Eashan, et al.
Published: (2026)
A transition towards virtual representations of visual scenes
by: Pereira, Américo, et al.
Published: (2024)
AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)
Perception Encoder: The best visual embeddings are not at the output of the network
by: Bolya, Daniel, et al.
Published: (2025)
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
by: Chung, Jiwan, et al.
Published: (2024)
Automated mapping of virtual environments with visual predictive coding
by: Gornet, James, et al.
Published: (2023)
Block-Sparse Global Attention for Efficient Multi-View Geometry Transformers
by: Wang, Chung-Shien Brian, et al.
Published: (2025)
Incremental dimension reduction for efficient and accurate visual anomaly detection
by: Lee, Teng-Yok
Published: (2026)
SCHIGAND: A Synthetic Facial Generation Mode Pipeline
by: Kadali, Ananya, et al.
Published: (2026)
Affine transformation estimation improves visual self-supervised learning
by: Torpey, David, et al.
Published: (2024)