Library Catalog

Bibliographic Details
Main Authors:	He, Jun; Lin, Yi; Huang, Zilong; Yin, Jiacong; Ye, Junyan; Zhou, Yuchuan; Li, Weijia; Zhang, Xiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.22228

Similar Items

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
by: Huang, Zilong, et al.
Published: (2025)

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
by: Yan, Zhiyuan, et al.
Published: (2025)

MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts
by: Huang, Zilong, et al.
Published: (2025)

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
by: Ye, Junyan, et al.
Published: (2025)

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
by: Ye, Junyan, et al.
Published: (2024)

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis
by: Li, Weijia, et al.
Published: (2024)

GenClaw: Code-Driven Agentic Image Generation
by: Ye, Junyan, et al.
Published: (2026)

The Less Meaningful the Understanding, the Faster the Feeling: Speech Comprehension Changes Perceptual Speech Tempo
by: Chen, Liangjie, et al.
Published: (2025)

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
by: He, Jun, et al.
Published: (2026)

SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors
by: Fan, Ruijie, et al.
Published: (2025)

Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
by: Ye, Junyan, et al.
Published: (2024)

Do MLLMs Exhibit Human-like Perceptual Behaviors? HVSBench: A Benchmark for MLLM Alignment with Human Perceptual Behavior
by: Lin, Jiaying, et al.
Published: (2024)

FakeVLM-R1: Internalizing Physical Laws via CoT for Synthetic Image Detection
by: Zhu, Leqi, et al.
Published: (2026)

Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling
by: Sun, Jiacong, et al.
Published: (2024)

RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension
by: Gao, Tianyi, et al.
Published: (2025)

Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
by: Shen, Lingdong, et al.
Published: (2024)

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
by: Ye, Junyan, et al.
Published: (2025)

RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments
by: Fu, Yuchuan, et al.
Published: (2025)

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
by: Ye, Junyan, et al.
Published: (2025)

3D Question Answering for City Scene Understanding
by: Sun, Penglei, et al.
Published: (2024)

A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)

TrueCity: Real and Simulated Urban Data for Cross-Domain 3D Scene Understanding
by: Nguyen, Duc, et al.
Published: (2025)

ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning
by: Luu, Quan Khanh, et al.
Published: (2025)

LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction
by: Jiang, Jiawei, et al.
Published: (2023)

Reference-based Controllable Scene Stylization with Gaussian Splatting
by: Mei, Yiqun, et al.
Published: (2024)

SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting
by: Ma, Yunshan, et al.
Published: (2023)

Where am I? Cross-View Geo-localization with Natural Language Descriptions
by: Ye, Junyan, et al.
Published: (2024)

SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia
by: Yue, Pengfei, et al.
Published: (2026)

Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting
by: Zhou, Hongbi, et al.
Published: (2025)

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
by: De, Anik, et al.
Published: (2025)

IntentGrasp: A Comprehensive Benchmark for Intent Understanding
by: Yin, Yuwei, et al.
Published: (2026)

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
by: Ye, Junyan, et al.
Published: (2024)

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
by: Ye, Junyan, et al.
Published: (2024)

OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild
by: Guo, Yuncheng, et al.
Published: (2025)

Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights
by: Hashmi, Ammarah, et al.
Published: (2024)

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding
by: Shi, Shuyao, et al.
Published: (2026)

OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
by: Zhao, Youjun, et al.
Published: (2024)

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
by: Li, Sijing, et al.
Published: (2025)