Library Catalog

Bibliographic Details
Main Authors:	Li, You; Ma, Fan; Yang, Yi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.16749

Similar Items

Synth-Align: Improving Trustworthiness in Vision-Language Model with Synthetic Preference Data Alignment
by: Wijaya, Robert, et al.
Published: (2024)

SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data
by: Kularathne, Yudara, et al.
Published: (2024)

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
by: Zhou, Dewei, et al.
Published: (2024)

Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
by: Zhou, Dewei, et al.
Published: (2024)

SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation
by: Wu, Wangyu, et al.
Published: (2025)

Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios
by: Marcus, Richard, et al.
Published: (2025)

Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification
by: Basu, Abhipsa, et al.
Published: (2025)

Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation
by: Molino, Daniele, et al.
Published: (2025)

Harnessing Machine Learning for Discerning AI-Generated Synthetic Images
by: Wang, Yuyang, et al.
Published: (2024)

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
by: Ye, Junyan, et al.
Published: (2025)

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
by: Li, You, et al.
Published: (2024)

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

Few-Shot Synthetic Data Generation with Diffusion Models for Downstream Vision Tasks
by: Dushenev, Daniil, et al.
Published: (2026)

Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
by: Wang, Xinran, et al.
Published: (2025)

Cross-Lingual SynthDocs: A Large-Scale Synthetic Corpus for Any to Arabic OCR and Document Understanding
by: Al-Homoud, Haneen, et al.
Published: (2025)

Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
by: Sharifzadeh, Sahand, et al.
Published: (2024)

Any-to-3D Generation via Hybrid Diffusion Supervision
by: Fan, Yijun, et al.
Published: (2024)

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
by: Meng, Chutian, et al.
Published: (2024)

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
by: Wei, Zhixiang, et al.
Published: (2023)

Depth Any Video with Scalable Synthetic Data
by: Yang, Honghui, et al.
Published: (2024)

SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)

OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision
by: Berenguel-Baeta, Bruno, et al.
Published: (2024)

Role-SynthCLIP: A Role Play Driven Diverse Synthetic Data Approach
by: Huangfu, Yuanxiang, et al.
Published: (2025)

MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation
by: Kang, YoonJe, et al.
Published: (2025)

UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)

SynthRAR: Ring Artifacts Reduction in CT with Unrolled Network and Synthetic Data Training
by: Yang, Hongxu, et al.
Published: (2026)

Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks
by: Ashrafian, Pooria, et al.
Published: (2024)

Segment Any-Quality Images with Generative Latent Space Enhancement
by: Guo, Guangqian, et al.
Published: (2025)

Single Image, Any Face: Generalisable 3D Face Generation
by: Wang, Wenqing, et al.
Published: (2024)

Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
by: Li, Haoxin, et al.
Published: (2025)

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
by: Ma, Junyuan, et al.
Published: (2026)

AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks
by: Xie, Ming, et al.
Published: (2025)

AnyTrans: Translate AnyText in the Image with Large Scale Models
by: Qian, Zhipeng, et al.
Published: (2024)

SynthBrainGrow: Synthetic Diffusion Brain Aging for Longitudinal MRI Data Generation in Young People
by: Zapaishchykova, Anna, et al.
Published: (2024)

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models
by: Wang, Xuesong, et al.
Published: (2026)

Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
by: Liu, Shih-Wen, et al.
Published: (2025)

A Conditional Generative Framework for Synthetic Data Augmentation in Segmenting Thin and Elongated Structures in Biological Images
by: Liu, Yi, et al.
Published: (2025)

FoleyDirector: Fine-Grained Temporal Steering for Video-to-Audio Generation via Structured Scripts
by: Li, You, et al.
Published: (2026)

Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
by: Qian, Haotian, et al.
Published: (2024)