:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yutian; Pei, Zhongyi; Mao, Yi; Wang, Chen; Liu, Lin; Wang, Jianmin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.01528

Similar Items

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
by: Wu, Bin, et al.
Published: (2026)

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
by: Luo, Yawen, et al.
Published: (2026)

FSM-Net: An Efficient Frequency-Spatial Network for Real-World Deblurring
by: Ly, Vinh-Thuan
Published: (2026)

Autonomous AI-enabled Industrial Sorting Pipeline for Advanced Textile Recycling
by: Spyridis, Yannis, et al.
Published: (2024)

Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation
by: Gao, Bin-Bin, et al.
Published: (2025)

Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration
by: Cai, Zhongyi, et al.
Published: (2025)

Stream-T1: Test-Time Scaling for Streaming Video Generation
by: Tu, Yijing, et al.
Published: (2026)

CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management
by: Wang, Chao, et al.
Published: (2026)

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
by: Huang, Yubo, et al.
Published: (2025)

From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion
by: Yu, Zhenyu, et al.
Published: (2025)

MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
by: Zhang, Yongshun, et al.
Published: (2025)

IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
by: Li, Yifan, et al.
Published: (2025)

A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows
by: Han, Yuxuan, et al.
Published: (2026)

AI-Driven Innovations in Volumetric Video Streaming: A Review
by: Entezami, Erfan, et al.
Published: (2024)

InfVSR: Toward Consistency-Driven Streaming Generative Video Super-Resolution
by: Zhang, Ziqing, et al.
Published: (2025)

Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection
by: Liu, Ruiqi, et al.
Published: (2025)

DeformStream: Deformation-based Adaptive Volumetric Video Streaming
by: Li, Boyan, et al.
Published: (2024)

Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning
by: Qian, Zhihao, et al.
Published: (2023)

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
by: Chen, Peiqi, et al.
Published: (2025)

Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
by: Zhen, Dingcheng, et al.
Published: (2025)

Finding Visual Saliency in Continuous Spike Stream
by: Zhu, Lin, et al.
Published: (2024)

A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging
by: Ma, Rui, et al.
Published: (2026)

Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
by: Lei, Biwen, et al.
Published: (2025)

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
by: Wei, Meng, et al.
Published: (2025)

MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry
by: Qiu, Yuheng, et al.
Published: (2024)

Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA
by: Yu, Ruizhi, et al.
Published: (2026)

PPBoost: Progressive Prompt Boosting for Text-Driven Medical Image Segmentation
by: Li, Xuchen, et al.
Published: (2025)

Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
by: Liu, Yutian, et al.
Published: (2024)

AutoIAD: Manager-Driven Multi-Agent Collaboration for Automated Industrial Anomaly Detection
by: Ji, Dongwei, et al.
Published: (2025)

Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction
by: Chen, Jiacong, et al.
Published: (2025)

Stream Query Denoising for Vectorized HD Map Construction
by: Wang, Shuo, et al.
Published: (2024)

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
by: Wang, Yiyu, et al.
Published: (2025)

MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
by: Xu, Shuo, et al.
Published: (2024)

Penalizing Boundary Activation for Object Completeness in Diffusion Models
by: Xu, Haoyang, et al.
Published: (2025)

IS-Diff: Improving Diffusion-Based Inpainting with Better Initial Seed
by: Lyu, Yongzhe, et al.
Published: (2025)

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
by: Liu, Yufan, et al.
Published: (2025)

Saliency Driven Imagery Preprocessing for Efficient Compression -- Industrial Paper
by: Downes, Justin, et al.
Published: (2026)

ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
by: Xu, Duo, et al.
Published: (2025)

Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers
by: Li, Hanxi, et al.
Published: (2023)

Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis
by: Yeh, Chun-Hsiao, et al.
Published: (2024)