:: Library Catalog

Bibliographic Details
Main Authors:	Duan, Yinglin; Zou, Zhengxia; Gu, Tongwei; Jia, Wei; Zhao, Zhan; Xu, Luyi; Liu, Xinzhu; Lin, Yenan; Jiang, Hao; Chen, Kang; Qiu, Shuang
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2509.05263

Similar Items

WorldGPT: Empowering LLM as Multimodal World Model
by: Ge, Zhiqi, et al.
Published: (2024)

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling
by: Cao, Jinqi, et al.
Published: (2026)

TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis
by: Kang, Jiaming, et al.
Published: (2025)

BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting
by: Wu, Yongchang, et al.
Published: (2025)

Code2Worlds: Empowering Coding LLMs for 4D World Generation
by: Zhang, Yi, et al.
Published: (2026)

The Moderating Effect of Informal Institutions: Clans and Straw Burning in China
by: Liang Tang, et al.
Published: (2025)

MicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graph
by: Li, Manyu, et al.
Published: (2026)

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction
by: Liu, Chengzhi, et al.
Published: (2026)

Unbiased Dynamic Multimodal Fusion
by: Wei, Shicai, et al.
Published: (2026)

MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
by: Guo, Zile, et al.
Published: (2026)

Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation
by: Qi, Zipeng, et al.
Published: (2024)

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
by: Sun, Zhihao, et al.
Published: (2024)

Synergistic double‐doped elastic composites for durable, ultra‐flexible sign language translation sensors
by: Tongshun Wu, et al.
Published: (2025)

Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis
by: Liu, Chenyang, et al.
Published: (2024)

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
by: Zhou, Runjie, et al.
Published: (2026)

The Complex and Challenging World of the Host–Pathogen Interaction
by: Marcel I. Ramirez
Published: (2024)

Spatial-Temporal Human-Object Interaction Detection
by: Sun, Xu, et al.
Published: (2025)

SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
by: Wu, Jason, et al.
Published: (2026)

BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models
by: Zhan, Yu-Wei, et al.
Published: (2025)

OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects
by: Qiu, Wenmo, et al.
Published: (2024)

WSSM: Geographic-enhanced hierarchical state-space model for global station weather forecast
by: Yang, Songru, et al.
Published: (2025)

CloudMamba: An Uncertainty-Guided Dual-Scale Mamba Network for Cloud Detection in Remote Sensing Imagery
by: Yang, Jiajun, et al.
Published: (2026)

Empowering Multi-Robot Cooperation via Sequential World Models
by: Zhao, Zijie, et al.
Published: (2025)

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
by: Li, Teng, et al.
Published: (2025)

A Performance Investigation of Multimodal Multiobjective Optimization Algorithms in Solving Two Types of Real-World Problems
by: Chen, Zhiqiu, et al.
Published: (2024)

Kolmogorov Arnold Neural Interpolator for Downscaling and Correcting Meteorological Fields from In-Situ Observations
by: Liu, Zili, et al.
Published: (2025)

CDMamba: Incorporating Local Clues into Mamba for Remote Sensing Image Binary Change Detection
by: Zhang, Haotian, et al.
Published: (2024)

FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection
by: Zhang, Haotian, et al.
Published: (2025)

Matrix-Game: Interactive World Foundation Model
by: Zhang, Yifan, et al.
Published: (2025)

Phase Repair for Time-Domain Convolutional Neural Networks in Music Super-Resolution
by: Zhang, Yenan, et al.
Published: (2023)

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
by: Zhu, Muzhi, et al.
Published: (2025)

Olaf-World: Orienting Latent Actions for Video World Modeling
by: Jiang, Yuxin, et al.
Published: (2026)

GigaWorld-0: World Models as Data Engine to Empower Embodied AI
by: GigaWorld Team, et al.
Published: (2025)

Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement
by: Zhang, Beibei, et al.
Published: (2025)

Les Dissonances: Cross-Tool Harvesting and Polluting in Pool-of-Tools Empowered LLM Agents
by: Li, Zichuan, et al.
Published: (2025)

Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity
by: Zheng, Guangze, et al.
Published: (2025)

WorldMark: A Unified Benchmark Suite for Interactive Video World Models
by: Xu, Xiaojie, et al.
Published: (2026)

OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
by: Zhang, Zhenhao, et al.
Published: (2025)

Empowering Teachers to Build a Better World
Published: (2020)

MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
by: Guo, Junliang, et al.
Published: (2025)