| Main Authors: | Zhang, Jiaxin, Li, Yunqin, Fukuda, Tomohiro, Wang, Bowen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.19719 |
Similar Items
Multi-Modal Feature Fusion for Spatial Morphology Analysis of Traditional Villages via Hierarchical Graph Neural Networks
by: Zhang, Jiaxin, et al.
Published: (2025)
BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode
by: Li, Zongrong, et al.
Published: (2024)
Modeling Urban Food Insecurity with Google Street View Images
by: Li, David
Published: (2025)
Interpretable Multimodal Framework for Human-Centered Street Assessment: Integrating Visual-Language Models for Perceptual Urban Diagnostics
by: Lan, HaoTian
Published: (2025)
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
by: Wang, Ziyue, et al.
Published: (2024)
UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images
by: Tan, Kaizhen, et al.
Published: (2026)
From Street View to Visual Network: Mapping the Visibility of Urban Landmarks with Vision-Language Models
by: Fan, Zicheng, et al.
Published: (2025)
Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images
by: Yu, Jiaxi, et al.
Published: (2024)
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series
by: Huang, Tianyuan, et al.
Published: (2024)
Eyes on the Streets: Leveraging Street-Level Imaging to Model Urban Crime Dynamics
by: Qi, Zhixuan, et al.
Published: (2024)
An Integrated Causal Inference Framework for Traffic Safety Modeling with Semantic Street-View Visual Features
by: Sun, Lishan, et al.
Published: (2026)
Safety of Multimodal Large Language Models on Images and Texts
by: Liu, Xin, et al.
Published: (2024)
Seeing through Satellite Images at Street Views
by: Qian, Ming, et al.
Published: (2025)
GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models
by: Zhang, Jiaxin, et al.
Published: (2026)
Semantic4Safety: Causal Insights from Zero-shot Street View Imagery Segmentation for Urban Road Safety
by: Chen, Huan, et al.
Published: (2025)
DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery
by: Yang, Yifan, et al.
Published: (2026)
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
by: Yan, Yunzhi, et al.
Published: (2024)
Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
by: Wu, Sijing, et al.
Published: (2026)
MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)
StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting
by: Cui, Xiao, et al.
Published: (2024)
StreetView-Waste: A Multi-Task Dataset for Urban Waste Management
by: Paulo, Diogo J., et al.
Published: (2025)
Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery
by: Yao, Siyuan, et al.
Published: (2026)
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
by: Wang, Hanqing, et al.
Published: (2025)
Assessment of Multimodal Large Language Models in Alignment with Human Values
by: Shi, Zhelun, et al.
Published: (2024)
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
by: Wu, Tianhe, et al.
Published: (2024)
NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation
by: Liu, Youzhi, et al.
Published: (2024)
ZenSVI: An Open-Source Software for the Integrated Acquisition, Processing and Analysis of Street View Imagery Towards Scalable Urban Science
by: Ito, Koichi, et al.
Published: (2024)
Street-View Image Generation from a Bird's-Eye View Layout
by: Swerdlow, Alexander, et al.
Published: (2023)
FaceInsight: A Multimodal Large Language Model for Face Perception
by: Li, Jingzhi, et al.
Published: (2025)
Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery
by: Yang, Zhenyuan, et al.
Published: (2024)
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis
by: Li, Weijia, et al.
Published: (2024)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models
by: Zhu, Ruishu, et al.
Published: (2025)
SVIA: A Street View Image Anonymization Framework for Self-Driving Applications
by: Liu, Dongyu, et al.
Published: (2025)
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
Diagnosing Urban Street Vitality via a Visual-Semantic and Spatiotemporal Framework for Street-Level Economics
by: Zhuo, Xinxin, et al.
Published: (2026)
DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition
by: Li, Ji, et al.
Published: (2026)
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
by: Chen, Zheng, et al.
Published: (2024)
Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning
by: Li, Weijia, et al.
Published: (2024)
Efficient Depth-Guided Urban View Synthesis
by: Miao, Sheng, et al.
Published: (2024)