Library Catalog

Bibliographic Details
Main Authors:	Huang, Zihan; Wu, Tao; Lin, Wang; Zhang, Shengyu; Chen, Jingyuan; Wu, Fei
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.09039

Similar Items

Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
by: Xu, Zhiyang, et al.
Published: (2025)

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
by: Lin, Wang, et al.
Published: (2026)

Geometry OR Tracker: Universal Geometric Operating Room Tracking
by: Shao, Yihua, et al.
Published: (2026)

GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation
by: Guo, Shasha, et al.
Published: (2025)

Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
by: Lv, Zheqi, et al.
Published: (2025)

DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
by: Sun, Haoran, et al.
Published: (2025)

OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)

Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation
by: Che, Chang, et al.
Published: (2024)

Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
by: Li, Yuting, et al.
Published: (2025)

GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
by: Mueller, Phillip, et al.
Published: (2024)

Auto-nnU-Net: Towards Automated Medical Image Segmentation
by: Becktepe, Jannis, et al.
Published: (2025)

AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use
by: Yang, Yaotian, et al.
Published: (2025)

Nested AutoRegressive Models
by: Wu, Hongyu, et al.
Published: (2025)

On Synthetic Texture Datasets: Challenges, Creation, and Curation
by: Hoak, Blaine, et al.
Published: (2024)

Large-Scale 3D Medical Image Pre-training with Geometric Context Priors
by: Wu, Linshan, et al.
Published: (2024)

GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
by: Li, Xuhui, et al.
Published: (2025)

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
by: Chen, Weifeng, et al.
Published: (2024)

A Dataset Generation Scheme Based on Video2EEG-SPGN-Diffusion for SEED-VD
by: Guo, Yunfei, et al.
Published: (2025)

Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation
by: Chai, Enhui, et al.
Published: (2026)

Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving
by: Armanious, Karim, et al.
Published: (2024)

Understanding Semantic Perturbations on In-Processing Generative Image Watermarks
by: Nakra, Anirudh, et al.
Published: (2026)

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)

GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models
by: Ye, Zhankai, et al.
Published: (2026)

AutoLTS: Automating Cycling Stress Assessment via Contrastive Learning and Spatial Post-processing
by: Lin, Bo, et al.
Published: (2023)

High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
by: Huang, Nan, et al.
Published: (2023)

Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
by: Lin, Haobo, et al.
Published: (2026)

Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets
by: Chen, Changjian, et al.
Published: (2024)

Large Language Models Can Understanding Depth from Monocular Images
by: Xia, Zhongyi, et al.
Published: (2024)

SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation
by: Huang, Chen-Hsiu, et al.
Published: (2024)

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)

Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
by: Sahoo, Pranab, et al.
Published: (2024)

InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
by: Zheng, Yan, et al.
Published: (2024)

Rectified Point Flow: Generic Point Cloud Pose Estimation
by: Sun, Tao, et al.
Published: (2025)

VCD: A Dataset for Visual Commonsense Discovery in Images
by: Shen, Xiangqing, et al.
Published: (2024)

L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
by: Guo, Ping, et al.
Published: (2024)

Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline
by: Lian, Jingchun, et al.
Published: (2024)

ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
by: Duan, Zhongjie, et al.
Published: (2024)

Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection
by: Xiong, Zihan, et al.
Published: (2025)

STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning
by: Qin, Jie, et al.
Published: (2025)