| Main Authors: | Wei, Xiaojing; Zhang, Ting; He, Wei; Wang, Jingdong; Huang, Hua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.08180 |
Similar Items
MagicGeo: Training-Free Text-Guided Geometric Diagram Generation
by: Wang, Junxiao, et al.
Published: (2025)
On Multi-Step Theorem Prediction via Non-Parametric Structural Priors
by: Zhao, Junbo, et al.
Published: (2026)
Loom: Diffusion-Transformer for Interleaved Generation
by: Ye, Mingcheng, et al.
Published: (2025)
TrajLoom: Dense Future Trajectory Generation from Video
by: Zhang, Zewei, et al.
Published: (2026)
Beyond the Textual: Generating Coherent Visual Options for MCQs
by: Wang, Wanqiang, et al.
Published: (2025)
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs
by: Zhang, Yiman, et al.
Published: (2025)
Low-Biased General Annotated Dataset Generation
by: Jiang, Dengyang, et al.
Published: (2024)
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)
LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving
by: Federico, Giulio, et al.
Published: (2025)
Diagram-Driven Course Questions Generation
by: Zhang, Xinyu, et al.
Published: (2024)
Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction
by: Jiao, Zhengbo, et al.
Published: (2026)
Historical Astronomical Diagrams Decomposition in Geometric Primitives
by: Kalleli, Syrine, et al.
Published: (2024)
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
by: Zheng, Linfang, et al.
Published: (2024)
Bringing Textual Prompt to AI-Generated Image Quality Assessment
by: Qu, Bowen, et al.
Published: (2024)
GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
by: Zhang, Chengrui, et al.
Published: (2025)
DISPLAY: Directable Human-Object Interaction Video Generation via Sparse Motion Guidance and Multi-Task Auxiliary
by: Guan, Jiazhi, et al.
Published: (2026)
GeoVideo: Introducing Geometric Regularization into Video Generation Model
by: Bai, Yunpeng, et al.
Published: (2025)
TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection
by: Chen, Jiankang, et al.
Published: (2024)
BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization
by: Wang, Wei, et al.
Published: (2026)
Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape
by: Chen, Tiejin, et al.
Published: (2024)
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
by: Huang, Runhui, et al.
Published: (2024)
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
by: Jiang, Dengyang, et al.
Published: (2025)
Textual Inversion and Self-supervised Refinement for Radiology Report Generation
by: Luo, Yuanjiang, et al.
Published: (2024)
CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks
by: Qi, Yu, et al.
Published: (2025)
Visual Textualization for Image Prompted Object Detection
by: Wu, Yongjian, et al.
Published: (2025)
Improving Image Restoration through Removing Degradations in Textual Representations
by: Lin, Jingbo, et al.
Published: (2023)
TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion
by: Zhu, Mingrui, et al.
Published: (2025)
RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning
by: Song, Jiahe, et al.
Published: (2025)
Distilling Textual Priors from LLM to Efficient Image Fusion
by: Zhang, Ran, et al.
Published: (2025)
Sora Generates Videos with Stunning Geometrical Consistency
by: Li, Xuanyi, et al.
Published: (2024)
UniPPTBench: A Unified Benchmark for Presentation Generation Across Diverse Input Settings
by: Zhao, Bo, et al.
Published: (2026)
RealCustom++: Representing Images as Real Textual Word for Real-Time Customization
by: Mao, Zhendong, et al.
Published: (2024)
LiDARDraft: Generating LiDAR Point Cloud from Versatile Inputs
by: Wei, Haiyun, et al.
Published: (2025)
MoReact: Generating Reactive Motion from Textual Descriptions
by: Xu, Xiyan, et al.
Published: (2025)
Landscape-Awareness for Geometric View Diffusion Model
by: Chen, Yan-Ting, et al.
Published: (2026)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
Pyramidal Patchification Flow for Visual Generation
by: Li, Hui, et al.
Published: (2025)
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
by: Yao, Ting, et al.
Published: (2024)
G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors
by: Yang, Haoxin, et al.
Published: (2024)
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
by: Lei, Mingxi, et al.
Published: (2024)