Library Catalog

Bibliographic Details
Main Authors:	Chaudhari, Shravan; Akula, Trilokya; Kim, Yoon; Blake, Tom
Format:	Preprint
Published:	2025
Subjects:	Human-Computer Interaction; Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2504.12511

Similar Items

GazeLLM: Multimodal LLMs incorporating Human Visual Attention
by: Rekimoto, Jun
Published: (2025)

On the Interpretability of Part-Prototype Based Classifiers: A Human Centric Analysis
by: Davoodi, Omid, et al.
Published: (2023)

MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
by: Wei, Yuxiang, et al.
Published: (2025)

Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception
by: Shen, Junxiao, et al.
Published: (2023)

LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
by: Mushkani, Rashid, et al.
Published: (2025)

TraitSpaces: Towards Interpretable Visual Creativity for Human-AI Co-Creation
by: Luthra, Prerna
Published: (2025)

QPM: Discrete Optimization for Globally Interpretable Image Classification
by: Norrenbrock, Thomas, et al.
Published: (2025)

Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
by: Natalie, Rosiana, et al.
Published: (2025)

Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition
by: Choi, Jae Young, et al.
Published: (2026)

ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
by: Huang, Jinbin, et al.
Published: (2024)

Generalization of CNNs on Relational Reasoning with Bar Charts
by: Cui, Zhenxing, et al.
Published: (2025)

Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation
by: Seo, Kyungjin, et al.
Published: (2024)

Agile Deliberation: Concept Deliberation for Subjective Visual Classification
by: Wang, Leijie, et al.
Published: (2025)

VILOD: A Visual Interactive Labeling Tool for Object Detection
by: Holm, Isac
Published: (2025)

Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
by: Li, Aaron J., et al.
Published: (2023)

Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics
by: Jha, Saurav, et al.
Published: (2025)

AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
by: Ye, Yilin, et al.
Published: (2025)

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
by: Wu, Hang, et al.
Published: (2025)

Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers
by: Gomaa, Amr, et al.
Published: (2024)

Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
by: Zhou, Honglu, et al.
Published: (2025)

Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches
by: Zhang, He, et al.
Published: (2025)

AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
by: Park, Se Jin, et al.
Published: (2024)

Facial Analysis Systems and Down Syndrome
by: Rondina, Marco, et al.
Published: (2025)

Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge
by: LaBella, Dominic, et al.
Published: (2024)

Modeling Subjective Urban Perception with Human Gaze
by: Che, Lin, et al.
Published: (2026)

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
by: Ki, Taekyung, et al.
Published: (2026)

MP-GUI: Modality Perception with MLLMs for GUI Understanding
by: Wang, Ziwei, et al.
Published: (2025)

Object Recognition in Human Computer Interaction:- A Comparative Analysis
by: Ranade, Kaushik, et al.
Published: (2024)

A Foundational Generative Model for Breast Ultrasound Image Analysis
by: Yu, Haojun, et al.
Published: (2025)

Magma: A Foundation Model for Multimodal AI Agents
by: Yang, Jianwei, et al.
Published: (2025)

Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
by: Schoop, Eldon, et al.
Published: (2022)

Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
by: Rajabi, Mohammad Sadra, et al.
Published: (2026)

AIN: The Arabic INclusive Large Multimodal Model
by: Heakl, Ahmed, et al.
Published: (2025)

MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction
by: Parab, Mithun, et al.
Published: (2024)

Generative Augmented Reality: Paradigms, Technologies, and Future Applications
by: Liang, Chen, et al.
Published: (2025)

How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
by: Shahzad, Sahibzada Adil, et al.
Published: (2024)

Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network
by: Xiong, Tianyu, et al.
Published: (2024)

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
by: Rondelli, Massimo, et al.
Published: (2026)

Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis
by: Mannam, Varun, et al.
Published: (2025)

Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
by: Polavaram, Sridevi, et al.
Published: (2025)