Library Catalog

Bibliographic Details
Main Authors:	Howell, Anthony; Wu, Nancy; Bagchi, Sharmistha; Kim, Yushim; Sun, Chayn
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.15132

Similar Items

A Reproducible Workflow for Scraping, Structuring, and Segmenting Legacy Archaeological Artifact Images
by: Palomeque-Gonzalez, Juan
Published: (2025)

Decoding Tourist Perception in Historic Urban Quarters with Multimodal Social Media Data: An AI-Based Framework and Evidence from Shanghai
by: Tan, Kaizhen, et al.
Published: (2025)

TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis
by: Ji, Chunhou, et al.
Published: (2025)

BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
by: Tan, Bryan Chen Zhengyu, et al.
Published: (2025)

PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring
by: Liu, Yongying, et al.
Published: (2026)

BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode
by: Li, Zongrong, et al.
Published: (2024)

MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes
by: Liu, Liu, et al.
Published: (2025)

From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing
by: DeAndres-Tame, Ivan, et al.
Published: (2024)

Predicting Local Climate Zones using Urban Morphometrics and Satellite Imagery
by: Majer, Hugo, et al.
Published: (2026)

Deep Umbra: A Generative Approach for Sunlight Access Computation in Urban Spaces
by: Omar, Kazi Shahrukh, et al.
Published: (2024)

How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits
by: Tang, Jason, et al.
Published: (2026)

From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics
by: Cupini, Paolo, et al.
Published: (2026)

LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
by: Zhang, Jian, et al.
Published: (2025)

Ethical Considerations for the Military Use of Artificial Intelligence in Visual Reconnaissance
by: Anneken, Mathias, et al.
Published: (2025)

ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation
by: Frieske, Rita, et al.
Published: (2024)

AI-Generated Figures in Academic Publishing: Policies, Tools, and Practical Guidelines
by: Chen, Davie
Published: (2026)

Urban Socio-Semantic Segmentation with Vision-Language Reasoning
by: Wang, Yu, et al.
Published: (2026)

Training-Free Multimodal Deepfake Detection via Graph Reasoning
by: Liu, Yuxin, et al.
Published: (2025)

Seeing Candidates at Scale: Multimodal LLMs for Visual Political Communication on Instagram
by: Achmann-Denkler, Michael, et al.
Published: (2026)

Diagnosing Urban Street Vitality via a Visual-Semantic and Spatiotemporal Framework for Street-Level Economics
by: Zhuo, Xinxin, et al.
Published: (2026)

Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
by: Sifferman, Carter, et al.
Published: (2025)

Surgeons Awareness, Expectations, and Involvement with Artificial Intelligence: a Survey Pre and Post the GPT Era
by: Arboit, Lorenzo, et al.
Published: (2025)

Learning Multimodal Cues of Children's Uncertainty
by: Cheng, Qi, et al.
Published: (2024)

Vitamin N: Benefits of Different Forms of Public Greenery for Urban Health
by: Šćepanović, Sanja, et al.
Published: (2025)

Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
by: Tami, Mohammad Abu, et al.
Published: (2024)

From Review to Design: Ethical Multimodal Driver Monitoring Systems for Risk Mitigation, Incident Response, and Accountability in Automated Vehicles
by: Khana, Bilal, et al.
Published: (2026)

From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)

Two Stage Context Learning with Large Language Models for Multimodal Stance Detection on Climate Change
by: Pangtey, Lata, et al.
Published: (2025)

AI's Blind Spots: Geographic Knowledge and Diversity Deficit in Generated Urban Scenario
by: Beneduce, Ciro, et al.
Published: (2025)

Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
by: Jiang, Roy, et al.
Published: (2026)

EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
by: Sun, Weiyu, et al.
Published: (2026)

Restoring Ancient Ideograph: A Multimodal Multitask Neural Network Approach
by: Duan, Siyu, et al.
Published: (2024)

Do Street View Imagery and Public Participation GIS align: Comparative Analysis of Urban Attractiveness
by: Malekzadeh, Milad, et al.
Published: (2025)

MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
by: Jin, Yiqiao, et al.
Published: (2024)

A High Resolution Urban and Rural Settlement Map of Africa Using Deep Learning and Satellite Imagery
by: Kakooei, Mohammad, et al.
Published: (2024)

Monitoring of Urban Changes with multi-modal Sentinel 1 and 2 Data in Mariupol, Ukraine, in 2022/23
by: Zitzlsberger, Georg, et al.
Published: (2023)

Multimodal Political Bias Identification and Neutralization
by: Bernard, Cedric, et al.
Published: (2025)

Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study
by: Washington, Peter, et al.
Published: (2020)

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
by: Si, Chenglei, et al.
Published: (2024)

No One Knows the State of the Art in Geospatial Foundation Models
by: Corley, Isaac, et al.
Published: (2026)