:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Shiyi; Liang, Dong; Zheng, Hairong; Zhou, Yihang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.2
Online Access:	https://arxiv.org/abs/2506.06035

Similar Items

HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion
by: Zhang, Shiyi, et al.
Published: (2025)

HAMMR: HierArchical MultiModal React agents for generic VQA
by: Castrejon, Lluis, et al.
Published: (2024)

Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation
by: Manghotay, Reyhaneh Ahani, et al.
Published: (2026)

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models
by: Carnemolla, Simone, et al.
Published: (2025)

CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)

Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models
by: Zwick, Pascal, et al.
Published: (2024)

FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
by: Caselles-Dupré, Hugo, et al.
Published: (2026)

DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
by: Du, Chenhe, et al.
Published: (2024)

Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network
by: Liu, Liangjin, et al.
Published: (2025)

CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)

LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)

Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis
by: Umeike, Robinson, et al.
Published: (2025)

Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)

Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection
by: da Graca, Mario de Jesus, et al.
Published: (2025)

Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection
by: Wang, Gaojian, et al.
Published: (2025)

Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation
by: Jin, Jing, et al.
Published: (2025)

PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation
by: Wang, Yuanlong, et al.
Published: (2026)

NeeCo: Image Synthesis of Novel Instrument States Based on Dynamic and Deformable 3D Gaussian Reconstruction
by: Zeng, Tianle, et al.
Published: (2025)

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation
by: Chong, Zheng, et al.
Published: (2025)

FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
by: Fang, Irving, et al.
Published: (2024)

From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
by: Deng, Juncan, et al.
Published: (2024)

Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)

On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)

Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model
by: Xie, Taofeng, et al.
Published: (2024)

Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)

FACMIC: Federated Adaptative CLIP Model for Medical Image Classification
by: Wu, Yihang, et al.
Published: (2024)

SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision
by: Lin, Weikai, et al.
Published: (2025)

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
by: Sun, Jiachen, et al.
Published: (2024)

Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
by: Elberg, Rafael, et al.
Published: (2024)

Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
by: Liang, Yijun, et al.
Published: (2024)

Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
by: Korolkov, Vasilii, et al.
Published: (2025)

RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)

BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution
by: He, Zihao, et al.
Published: (2025)

Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation
by: Zhou, Yihang, et al.
Published: (2024)

Representation Paradigms in AI-based 3D Radiological Image Reconstruction: A Systematic Review
by: Yang, Yuezhe, et al.
Published: (2025)

T-HITL Effectively Addresses Problematic Associations in Image Generation and Maintains Overall Visual Quality
by: Epstein, Susan, et al.
Published: (2024)

FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies
by: Liang, Shuqiao, et al.
Published: (2025)

Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity
by: Lu, Yizhuo, et al.
Published: (2026)