| Main Authors: | Jing, Yixiao, Zhang, Chaoyu, Zhong, Zixuan, Huang, Peizhou |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.06672 |
Similar Items
IELDG: Suppressing Domain-Specific Noise with Inverse Evolution Layers for Domain Generalized Semantic Segmentation
by: Fan, Qizhe, et al.
Published: (2025)
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
by: Bai, Chengyu, et al.
Published: (2025)
Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance
by: Wu, Song, et al.
Published: (2026)
GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis
by: Kim, Changjin, et al.
Published: (2025)
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
by: Li, Jialu, et al.
Published: (2025)
Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
by: Song, Zijie, et al.
Published: (2025)
SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation
by: Tang, Zixuan, et al.
Published: (2026)
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
by: Gotin, Georgii, et al.
Published: (2025)
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
by: Lin, Jingli, et al.
Published: (2025)
PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting
by: Song, Yixiao, et al.
Published: (2026)
How Much 3D Do Video Foundation Models Encode?
by: Huang, Zixuan, et al.
Published: (2025)
Top-Down Semantic Refinement for Image Captioning
by: Zhang, Jusheng, et al.
Published: (2025)
LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
by: Huang, Shuwei, et al.
Published: (2026)
Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization
by: Wu, Cho-Ying, et al.
Published: (2024)
Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing
by: Su, Tongtong, et al.
Published: (2025)
VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer
by: Lin, Rui, et al.
Published: (2026)
ArchShapeNet:An Interpretable 3D-CNN Framework for Evaluating Architectural Shapes
by: Yin, Jun, et al.
Published: (2025)
Spatial-Aware Latent Initialization for Controllable Image Generation
by: Sun, Wenqiang, et al.
Published: (2024)
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
by: Iurada, Leonardo, et al.
Published: (2025)
SAT3D: Image-driven Semantic Attribute Transfer in 3D
by: Zhai, Zhijun, et al.
Published: (2024)
Combating Semantic Contamination in Learning with Label Noise
by: Fan, Wenxiao, et al.
Published: (2024)
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
by: Fan, Xiang, et al.
Published: (2024)
Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
by: Zhang, Ruixu, et al.
Published: (2025)
InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization
by: Chung, Jaeyoung, et al.
Published: (2026)
Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge
by: Xu, Yinsong, et al.
Published: (2026)
Paired Image Generation with Diffusion-Guided Diffusion Models
by: Zhang, Haoxuan, et al.
Published: (2025)
MetaSSC: Enhancing 3D Semantic Scene Completion for Autonomous Driving through Meta-Learning and Long-sequence Modeling
by: Qu, Yansong, et al.
Published: (2024)
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
by: Li, Jinxuan, et al.
Published: (2025)
A Survey on Backbones for Deep Video Action Recognition
by: Tang, Zixuan, et al.
Published: (2024)
StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling
by: Jing, Liqi, et al.
Published: (2026)
ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report
by: Yuan, Yixiao, et al.
Published: (2024)
Plan-X: Instruct Video Generation via Semantic Planning
by: Huang, Lun, et al.
Published: (2025)
One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization
by: Mousavi, Seyed Amir, et al.
Published: (2025)
NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment
by: Zhang, Wenjiang, et al.
Published: (2025)
Scalable Image Tokenization with Index Backpropagation Quantization
by: Shi, Fengyuan, et al.
Published: (2024)
NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
by: Zheng, PengFei, et al.
Published: (2024)
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
by: Liang, Yujia, et al.
Published: (2025)
Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)
INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
by: Yang, Junqi, et al.
Published: (2026)
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
by: Xiao, Yicheng, et al.
Published: (2025)