:: Library Catalog

Bibliographic Details
Main Authors:	Li, Jialuo; Chai, Wenhao; Fu, Xingyu; Xu, Haiyang; Xie, Saining
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2504.13129
Similar Items

VideoNSA: Native Sparse Attention Scales Video Understanding
by: Song, Enxin, et al.
Published: (2025)

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
by: Li, Jialuo, et al.
Published: (2025)

Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)

Improved Baselines with Representation Autoencoders
by: Singh, Jaskirat, et al.
Published: (2026)

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
by: Wang, Hao, et al.
Published: (2025)

MoDE: CLIP Data Experts via Clustering
by: Ma, Jiawei, et al.
Published: (2024)

Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI
by: Newen, Carina, et al.
Published: (2025)

What matters for Representation Alignment: Global Information or Spatial Structure?
by: Singh, Jaskirat, et al.
Published: (2025)

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
by: Chu, Tianzhe, et al.
Published: (2025)

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
by: Xu, Sihan, et al.
Published: (2023)

Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
by: Yang, Haobo, et al.
Published: (2025)

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection
by: Zhang, Haichao, et al.
Published: (2023)

PackDiT: Joint Human Motion and Text Generation via Mutual Prompting
by: Jiang, Zhongyu, et al.
Published: (2025)

The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization
by: Li, Manyi, et al.
Published: (2026)

T2I-ConBench: Text-to-Image Benchmark for Continual Post-training
by: Huang, Zhehao, et al.
Published: (2025)

Label-free Neural Semantic Image Synthesis
by: Wang, Jiayi, et al.
Published: (2024)

InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs
by: Tang, Lv, et al.
Published: (2026)

ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
by: Zhou, Guanyu, et al.
Published: (2026)

Addressing Negative Transfer in Diffusion Models
by: Go, Hyojun, et al.
Published: (2023)

Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models
by: Li, Wenhao, et al.
Published: (2023)

Generalizable Geometric Image Caption Synthesis
by: Xin, Yue, et al.
Published: (2025)

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
by: Berrada, Tariq, et al.
Published: (2023)

Accessing Vision Foundation Models via ImageNet-1K
by: Zhang, Yitian, et al.
Published: (2024)

H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers
by: Li, Wenhao, et al.
Published: (2025)

ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
by: Zhang, Yi, et al.
Published: (2024)

Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration
by: Li, Zhili, et al.
Published: (2026)

When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery
by: Xie, Yiqun, et al.
Published: (2024)

Improving Diffusion-Based Image Synthesis with Context Prediction
by: Yang, Ling, et al.
Published: (2024)

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes
by: Hashmi, Ammarah, et al.
Published: (2024)

TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
by: Zhou, Wenhao, et al.
Published: (2025)

ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion
by: Fang, Zhenghan, et al.
Published: (2025)

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
by: Kao, Kuei-Chun, et al.
Published: (2026)

High-Resolution Image Synthesis via Next-Token Prediction
by: Chen, Dengsheng, et al.
Published: (2024)

Editing Massive Concepts in Text-to-Image Diffusion Models
by: Xiong, Tianwei, et al.
Published: (2024)

Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation
by: Li, Wenhao, et al.
Published: (2025)

Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
by: Izadi, Amirmohammad, et al.
Published: (2025)

Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
by: Jeanson, Gabriel, et al.
Published: (2026)

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
by: Lin, Xuewu, et al.
Published: (2024)

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
by: Ye, Hanrong, et al.
Published: (2023)