| Main authors: | Zhang, Shiyi; Liang, Dong; Zheng, Hairong; Zhou, Yihang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2506.06035 |
Similar items
HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion
by: Zhang, Shiyi, et al.
Published: (2025)
HAMMR: HierArchical MultiModal React agents for generic VQA
by: Castrejon, Lluis, et al.
Published: (2024)
Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation
by: Manghotay, Reyhaneh Ahani, et al.
Published: (2026)
DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models
by: Carnemolla, Simone, et al.
Published: (2025)
CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier
by: Ou, Ziyang
Published: (2025)
Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models
by: Zwick, Pascal, et al.
Published: (2024)
FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
by: Caselles-Dupré, Hugo, et al.
Published: (2026)
DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
by: Du, Chenhe, et al.
Published: (2024)
Skeleton-based sign language recognition using a dual-stream spatio-temporal dynamic graph convolutional network
by: Liu, Liangjin, et al.
Published: (2025)
CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
by: Safdar, Aon, et al.
Published: (2025)
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
by: Vasilcoiu, Ana, et al.
Published: (2025)
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
by: Liu, Yisu, et al.
Published: (2024)
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis
by: Umeike, Robinson, et al.
Published: (2025)
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)
Diffusion-Based Synthetic Brightfield Microscopy Images for Enhanced Single Cell Detection
by: da Graca, Mario de Jesus, et al.
Published: (2025)
Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection
by: Wang, Gaojian, et al.
Published: (2025)
Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation
by: Jin, Jing, et al.
Published: (2025)
PBSBench: A Multi-Level Vision-Language Framework and Benchmark for Hematopathology Whole Slide Image Interpretation
by: Wang, Yuanlong, et al.
Published: (2026)
NeeCo: Image Synthesis of Novel Instrument States Based on Dynamic and Deformable 3D Gaussian Reconstruction
by: Zeng, Tianle, et al.
Published: (2025)
CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation
by: Chong, Zheng, et al.
Published: (2025)
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction
by: Fang, Irving, et al.
Published: (2024)
From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
by: Deng, Juncan, et al.
Published: (2024)
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
by: Kocsis, Peter, et al.
Published: (2023)
On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)
Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model
by: Xie, Taofeng, et al.
Published: (2024)
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training
by: Shi, Mingjia, et al.
Published: (2024)
FACMIC: Federated Adaptative CLIP Model for Medical Image Classification
by: Wu, Yihang, et al.
Published: (2024)
SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision
by: Lin, Weikai, et al.
Published: (2025)
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
by: Sun, Jiachen, et al.
Published: (2024)
Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
by: Elberg, Rafael, et al.
Published: (2024)
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
by: Liang, Yijun, et al.
Published: (2024)
Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
by: Korolkov, Vasilii, et al.
Published: (2025)
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
by: Ge, Junyao, et al.
Published: (2024)
BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution
by: He, Zihao, et al.
Published: (2025)
Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation
by: Zhou, Yihang, et al.
Published: (2024)
Representation Paradigms in AI-based 3D Radiological Image Reconstruction: A Systematic Review
by: Yang, Yuezhe, et al.
Published: (2025)
T-HITL Effectively Addresses Problematic Associations in Image Generation and Maintains Overall Visual Quality
by: Epstein, Susan, et al.
Published: (2024)
FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies
by: Liang, Shuqiao, et al.
Published: (2025)
Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity
by: Lu, Yizhuo, et al.
Published: (2026)