| Main Authors: | Su, Yiyang, Liu, Xiaoming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.05708 |
Similar Items
Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification?
by: Zhu, Jie, et al.
Published: (2026)
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
by: Su, Yiyang, et al.
Published: (2025)
KeyPoint Relative Position Encoding for Face Recognition
by: Kim, Minchul, et al.
Published: (2024)
SapiensID: Foundation for Human Recognition
by: Kim, Minchul, et al.
Published: (2025)
Open-Set Biometrics: Beyond Good Closed-Set Models
by: Su, Yiyang, et al.
Published: (2024)
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
by: Wang, Yikun, et al.
Published: (2025)
U-Mind: A Unified Framework for Real-Time Multimodal Interaction with Audiovisual Generation
by: Deng, Xiang, et al.
Published: (2026)
FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition
by: Zhu, Jie, et al.
Published: (2026)
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
by: Zhu, Jie, et al.
Published: (2025)
Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning
by: Zheng, Yushuo, et al.
Published: (2026)
LocalScore: Local Density-Aware Similarity Scoring for Biometrics
by: Su, Yiyang, et al.
Published: (2026)
Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
by: Qi, Mengshi, et al.
Published: (2025)
Statewide Visual Geolocalization in the Wild
by: Fervers, Florian, et al.
Published: (2024)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
by: Song, Zirui, et al.
Published: (2025)
Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning
by: Vyas, Apoorv, et al.
Published: (2025)
Audiovisual Masked Autoencoders
by: Georgescu, Mariana-Iuliana, et al.
Published: (2022)
GeoRC: A Benchmark for Geolocation Reasoning Chains
by: Talreja, Mohit, et al.
Published: (2026)
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
by: Chaubey, Ashutosh, et al.
Published: (2026)
Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
by: Hashmi, Ammarah, et al.
Published: (2024)
Zwitscherkasten -- DIY Audiovisual bird monitoring
by: Blum, Dominik, et al.
Published: (2026)
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
by: Guo, Xiao, et al.
Published: (2025)
PIGEON: Predicting Image Geolocations
by: Haas, Lukas, et al.
Published: (2023)
GaGA: Towards Interactive Global Geolocation Assistant
by: Dou, Zhiyang, et al.
Published: (2024)
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
by: Gou, Yunhao, et al.
Published: (2025)
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
by: Liu, Yuqi, et al.
Published: (2025)
GeoRouter: Dynamic Paradigm Routing for Worldwide Image Geolocalization
by: Jia, Pengyue, et al.
Published: (2026)
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
by: Wang, Zhiqiang, et al.
Published: (2024)
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization
by: Jia, Pengyue, et al.
Published: (2025)
Granular Privacy Control for Geolocation with Vision Language Models
by: Mendes, Ethan, et al.
Published: (2024)
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
by: Chen, Xinlong, et al.
Published: (2025)
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
by: Pillai, Manu S, et al.
Published: (2024)
Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video
by: Li, Chenxing, et al.
Published: (2026)
Referee: Reference-aware Audiovisual Deepfake Detection
by: Boo, Hyemin, et al.
Published: (2025)
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
by: Gadi, Hari Krishna, et al.
Published: (2026)
Self-supervised Audiovisual Representation Learning for Remote Sensing Data
by: Heidler, Konrad, et al.
Published: (2021)
X-Streamer: Unified Human World Modeling with Audiovisual Interaction
by: Xie, You, et al.
Published: (2025)
Image-Based Geolocation Using Large Vision-Language Models
by: Liu, Yi, et al.
Published: (2024)
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
by: Qian, Zhaofang, et al.
Published: (2025)
Combi-CAM: A Novel Multi-Layer Approach for Explainable Image Geolocalization
by: Faget, David, et al.
Published: (2026)
Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
by: Grainge, Oliver, et al.
Published: (2025)