:: Library Catalog

Bibliographic Details
Main Authors:	Jing, Yixiao; Zhang, Chaoyu; Zhong, Zixuan; Huang, Peizhou
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.06672

Similar Items

IELDG: Suppressing Domain-Specific Noise with Inverse Evolution Layers for Domain Generalized Semantic Segmentation
by: Fan, Qizhe, et al.
Published: (2025)

FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
by: Bai, Chengyu, et al.
Published: (2025)

Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance
by: Wu, Song, et al.
Published: (2026)

GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis
by: Kim, Changjin, et al.
Published: (2025)

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
by: Li, Jialu, et al.
Published: (2025)

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
by: Song, Zijie, et al.
Published: (2025)

SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation
by: Tang, Zixuan, et al.
Published: (2026)

Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
by: Gotin, Georgii, et al.
Published: (2025)

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
by: Lin, Jingli, et al.
Published: (2025)

PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting
by: Song, Yixiao, et al.
Published: (2026)

How Much 3D Do Video Foundation Models Encode?
by: Huang, Zixuan, et al.
Published: (2025)

Top-Down Semantic Refinement for Image Captioning
by: Zhang, Jusheng, et al.
Published: (2025)

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
by: Huang, Shuwei, et al.
Published: (2026)

Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization
by: Wu, Cho-Ying, et al.
Published: (2024)

Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing
by: Su, Tongtong, et al.
Published: (2025)

VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer
by: Lin, Rui, et al.
Published: (2026)

ArchShapeNet: An Interpretable 3D-CNN Framework for Evaluating Architectural Shapes
by: Yin, Jun, et al.
Published: (2025)

Spatial-Aware Latent Initialization for Controllable Image Generation
by: Sun, Wenqiang, et al.
Published: (2024)

A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
by: Iurada, Leonardo, et al.
Published: (2025)

SAT3D: Image-driven Semantic Attribute Transfer in 3D
by: Zhai, Zhijun, et al.
Published: (2024)

Combating Semantic Contamination in Learning with Label Noise
by: Fan, Wenxiao, et al.
Published: (2024)

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
by: Fan, Xiang, et al.
Published: (2024)

Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
by: Zhang, Ruixu, et al.
Published: (2025)

InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization
by: Chung, Jaeyoung, et al.
Published: (2026)

Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge
by: Xu, Yinsong, et al.
Published: (2026)

Paired Image Generation with Diffusion-Guided Diffusion Models
by: Zhang, Haoxuan, et al.
Published: (2025)

MetaSSC: Enhancing 3D Semantic Scene Completion for Autonomous Driving through Meta-Learning and Long-sequence Modeling
by: Qu, Yansong, et al.
Published: (2024)

Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
by: Li, Jinxuan, et al.
Published: (2025)

A Survey on Backbones for Deep Video Action Recognition
by: Tang, Zixuan, et al.
Published: (2024)

StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling
by: Jing, Liqi, et al.
Published: (2026)

ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report
by: Yuan, Yixiao, et al.
Published: (2024)

Plan-X: Instruct Video Generation via Semantic Planning
by: Huang, Lun, et al.
Published: (2025)

One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization
by: Mousavi, Seyed Amir, et al.
Published: (2025)

NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment
by: Zhang, Wenjiang, et al.
Published: (2025)

Scalable Image Tokenization with Index Backpropagation Quantization
by: Shi, Fengyuan, et al.
Published: (2024)

NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
by: Zheng, PengFei, et al.
Published: (2024)

IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
by: Liang, Yujia, et al.
Published: (2025)

Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)

INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
by: Yang, Junqi, et al.
Published: (2026)

HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)