| Main Authors: | Chaudhari, Shravan; Akula, Trilokya; Kim, Yoon; Blake, Tom |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.12511 |
Similar Items
GazeLLM: Multimodal LLMs incorporating Human Visual Attention
by: Rekimoto, Jun
Published: (2025)
On the Interpretability of Part-Prototype Based Classifiers: A Human Centric Analysis
by: Davoodi, Omid, et al.
Published: (2023)
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
by: Wei, Yuxiang, et al.
Published: (2025)
Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception
by: Shen, Junxiao, et al.
Published: (2023)
LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
by: Mushkani, Rashid, et al.
Published: (2025)
TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
by: Luthra, Prerna
Published: (2025)
QPM: Discrete Optimization for Globally Interpretable Image Classification
by: Norrenbrock, Thomas, et al.
Published: (2025)
Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
by: Natalie, Rosiana, et al.
Published: (2025)
Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition
by: Choi, Jae Young, et al.
Published: (2026)
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
by: Huang, Jinbin, et al.
Published: (2024)
Generalization of CNNs on Relational Reasoning with Bar Charts
by: Cui, Zhenxing, et al.
Published: (2025)
Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation
by: Seo, Kyungjin, et al.
Published: (2024)
Agile Deliberation: Concept Deliberation for Subjective Visual Classification
by: Wang, Leijie, et al.
Published: (2025)
VILOD: A Visual Interactive Labeling Tool for Object Detection
by: Holm, Isac
Published: (2025)
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
by: Li, Aaron J., et al.
Published: (2023)
Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
by: Jha, Saurav, et al.
Published: (2025)
AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
by: Ye, Yilin, et al.
Published: (2025)
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)
Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers
by: Gomaa, Amr, et al.
Published: (2024)
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
by: Zhou, Honglu, et al.
Published: (2025)
Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches
by: Zhang, He, et al.
Published: (2025)
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
by: Park, Se Jin, et al.
Published: (2024)
Facial Analysis Systems and Down Syndrome
by: Rondina, Marco, et al.
Published: (2025)
Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge
by: LaBella, Dominic, et al.
Published: (2024)
Modeling Subjective Urban Perception with Human Gaze
by: Che, Lin, et al.
Published: (2026)
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
by: Ki, Taekyung, et al.
Published: (2026)
MP-GUI: Modality Perception with MLLMs for GUI Understanding
by: Wang, Ziwei, et al.
Published: (2025)
Object Recognition in Human Computer Interaction:- A Comparative Analysis
by: Ranade, Kaushik, et al.
Published: (2024)
A Foundational Generative Model for Breast Ultrasound Image Analysis
by: Yu, Haojun, et al.
Published: (2025)
Magma: A Foundation Model for Multimodal AI Agents
by: Yang, Jianwei, et al.
Published: (2025)
Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
by: Schoop, Eldon, et al.
Published: (2022)
Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
by: Rajabi, Mohammad Sadra, et al.
Published: (2026)
AIN: The Arabic INclusive Large Multimodal Model
by: Heakl, Ahmed, et al.
Published: (2025)
MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction
by: Parab, Mithun, et al.
Published: (2024)
Generative Augmented Reality: Paradigms, Technologies, and Future Applications
by: Liang, Chen, et al.
Published: (2025)
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
by: Shahzad, Sahibzada Adil, et al.
Published: (2024)
Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network
by: Xiong, Tianyu, et al.
Published: (2024)
BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
by: Rondelli, Massimo, et al.
Published: (2026)
Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis
by: Mannam, Varun, et al.
Published: (2025)
Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
by: Polavaram, Sridevi, et al.
Published: (2025)