| Main Authors: | Huang, Zihan; Wu, Tao; Lin, Wang; Zhang, Shengyu; Chen, Jingyuan; Wu, Fei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.09039 |
Similar Items
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
by: Xu, Zhiyang, et al.
Published: (2025)
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
by: Lin, Wang, et al.
Published: (2026)
Geometry OR Tracker: Universal Geometric Operating Room Tracking
by: Shao, Yihua, et al.
Published: (2026)
GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation
by: Guo, Shasha, et al.
Published: (2025)
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
by: Lv, Zheqi, et al.
Published: (2025)
DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
by: Sun, Haoran, et al.
Published: (2025)
OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)
Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation
by: Che, Chang, et al.
Published: (2024)
Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
by: Li, Yuting, et al.
Published: (2025)
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
by: Mueller, Phillip, et al.
Published: (2024)
Auto-nnU-Net: Towards Automated Medical Image Segmentation
by: Becktepe, Jannis, et al.
Published: (2025)
AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use
by: Yang, Yaotian, et al.
Published: (2025)
Nested AutoRegressive Models
by: Wu, Hongyu, et al.
Published: (2025)
On Synthetic Texture Datasets: Challenges, Creation, and Curation
by: Hoak, Blaine, et al.
Published: (2024)
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors
by: Wu, Linshan, et al.
Published: (2024)
GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
by: Li, Xuhui, et al.
Published: (2025)
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
by: Chen, Weifeng, et al.
Published: (2024)
A Dataset Generation Scheme Based on Video2EEG-SPGN-Diffusion for SEED-VD
by: Guo, Yunfei, et al.
Published: (2025)
Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation
by: Chai, Enhui, et al.
Published: (2026)
Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving
by: Armanious, Karim, et al.
Published: (2024)
Understanding Semantic Perturbations on In-Processing Generative Image Watermarks
by: Nakra, Anirudh, et al.
Published: (2026)
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)
GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models
by: Ye, Zhankai, et al.
Published: (2026)
AutoLTS: Automating Cycling Stress Assessment via Contrastive Learning and Spatial Post-processing
by: Lin, Bo, et al.
Published: (2023)
High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
by: Huang, Nan, et al.
Published: (2023)
Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
by: Lin, Haobo, et al.
Published: (2026)
Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets
by: Chen, Changjian, et al.
Published: (2024)
Large Language Models Can Understanding Depth from Monocular Images
by: Xia, Zhongyi, et al.
Published: (2024)
SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation
by: Huang, Chen-Hsiu, et al.
Published: (2024)
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
by: Sahoo, Pranab, et al.
Published: (2024)
InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
by: Zheng, Yan, et al.
Published: (2024)
Rectified Point Flow: Generic Point Cloud Pose Estimation
by: Sun, Tao, et al.
Published: (2025)
VCD: A Dataset for Visual Commonsense Discovery in Images
by: Shen, Xiangqing, et al.
Published: (2024)
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
by: Guo, Ping, et al.
Published: (2024)
Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline
by: Lian, Jingchun, et al.
Published: (2024)
ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
by: Duan, Zhongjie, et al.
Published: (2024)
Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection
by: Xiong, Zihan, et al.
Published: (2025)
STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning
by: Qin, Jie, et al.
Published: (2025)