:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Xiaoyi; Tang, Hao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.03173

Similar Items

Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models
by: Wang, Yi, et al.
Published: (2025)

WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation
by: Joshi, Amogh, et al.
Published: (2026)

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
by: Lu, Yifan, et al.
Published: (2024)

Thinking Ahead: Foresight Intelligence in MLLMs and World Models
by: Gong, Zhantao, et al.
Published: (2025)

XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence
by: Ghahfarokhi, Sepehr Salem, et al.
Published: (2026)

PanoWorld: Towards Spatial Supersensing in 360° Panorama World
by: Wang, Changpeng, et al.
Published: (2026)

Simulating the Visual World with Artificial Intelligence: A Roadmap
by: Yue, Jingtong, et al.
Published: (2025)

Chain of World: World Model Thinking in Latent Motion
by: Yang, Fuxiang, et al.
Published: (2026)

Pandora: Towards General World Model with Natural Language Actions and Video States
by: Xiang, Jiannan, et al.
Published: (2024)

LatXGen: Towards Radiation-Free and Accurate Quantitative Analysis of Sagittal Spinal Alignment Via Cross-Modal Radiographic View Synthesis
by: Zhao, Moxin, et al.
Published: (2025)

Deepfake Detection Via Facial Feature Extraction and Modeling
by: Carter, Benjamin, et al.
Published: (2025)

Interpreting Physics in Video World Models
by: Joseph, Sonia, et al.
Published: (2026)

Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
by: Li, Wenqiao, et al.
Published: (2025)

3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors
by: Liu, Xi, et al.
Published: (2024)

CityX: Controllable Procedural Content Generation for Unbounded 3D Cities
by: Zhang, Shougao, et al.
Published: (2024)

Cabbage: A Differential Growth Framework for Open Surfaces
by: Liu, Xiaoyi, et al.
Published: (2025)

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
by: Gao, Qiyue, et al.
Published: (2025)

Application of Multimodal Fusion Deep Learning Model in Disease Recognition
by: Liu, Xiaoyi, et al.
Published: (2024)

ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
by: Chang, Jiahao, et al.
Published: (2025)

Robot Learning from a Physical World Model
by: Mao, Jiageng, et al.
Published: (2025)

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
by: Dang, Chenxu, et al.
Published: (2025)

World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
by: Lin, Zuyao, et al.
Published: (2026)

Towards Ancient Plant Seed Classification: A Benchmark Dataset and Baseline Model
by: Xing, Rui, et al.
Published: (2025)

EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy
by: Yu, Yichun, et al.
Published: (2025)

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
by: Gu, Jing, et al.
Published: (2025)

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
by: Tang, Zuojin, et al.
Published: (2026)

DexWorldModel: Causal Latent World Modeling towards Automated Learning of Embodied Tasks
by: Deng, Yueci, et al.
Published: (2026)

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
by: Li, Hengzhi, et al.
Published: (2025)

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
by: Feng, Hao, et al.
Published: (2023)

CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs
by: Huang, Xiaoyi, et al.
Published: (2026)

Towards Efficient and Intelligent Laser Weeding: Method and Dataset for Weed Stem Detection
by: Liu, Dingning, et al.
Published: (2025)

Foundation Models -- A Panacea for Artificial Intelligence in Pathology?
by: Mulliqi, Nita, et al.
Published: (2025)

Learning Vision-Language-Action World Models for Autonomous Driving
by: Wang, Guoqing, et al.
Published: (2026)

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
by: Wu, Haoning, et al.
Published: (2025)

Bridging the Gap: Toward Cognitive Autonomy in Artificial Intelligence
by: Golilarz, Noorbakhsh Amiri, et al.
Published: (2025)

RenderWorld: World Model with Self-Supervised 3D Label
by: Yan, Ziyang, et al.
Published: (2024)

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2025)

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
by: Lin, Wang, et al.
Published: (2026)

Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment
by: Tang, Jinzhou, et al.
Published: (2025)

Mirage2Matter: A Physically Grounded Gaussian World Model from Video
by: Gao, Zhengqing, et al.
Published: (2026)