Library Catalog

Bibliographic Details
Main Author:	He, Yuhang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.07522

Similar Items

Training-free Online Video Step Grounding
by: Zanella, Luca, et al.
Published: (2025)

Training-free Geometric Image Editing on Diffusion Models
by: Zhu, Hanshen, et al.
Published: (2025)

Technical Report for SoccerNet Challenge 2022 -- Replay Grounding Task
by: Chen, Shimin, et al.
Published: (2024)

Space Rotation with Basis Transformation for Training-free Test-Time Adaptation
by: Ding, Chenhao, et al.
Published: (2025)

NeuroClaw Technical Report
by: Wang, Cheng, et al.
Published: (2026)

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
by: Yang, Yuhang, et al.
Published: (2024)

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
by: Bu, Jiazi, et al.
Published: (2025)

StreamingClaw Technical Report
by: Chen, Jiawei, et al.
Published: (2026)

LandMarkSystem Technical Report
by: Ma, Zhenxiang, et al.
Published: (2025)

Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
by: Zhou, Feng, et al.
Published: (2025)

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
by: Bu, Jiazi, et al.
Published: (2024)

Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness
by: Mannes, Mahmoud
Published: (2026)

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
by: Zhou, Yujie, et al.
Published: (2025)

Kling-MotionControl Technical Report
by: Kling Team, et al.
Published: (2026)

LongCat-Image Technical Report
by: Meituan LongCat Team, et al.
Published: (2025)

Kling-Omni Technical Report
by: Kling Team, et al.
Published: (2025)

Kelix Technical Report
by: Ding, Boyang, et al.
Published: (2026)

Training-free Video Temporal Grounding using Large-scale Pre-trained Models
by: Zheng, Minghang, et al.
Published: (2024)

Kwai Keye-VL Technical Report
by: Kwai Keye Team, et al.
Published: (2025)

LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
by: Alharbi, Yazeed, et al.
Published: (2024)

Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)

PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding
by: Hou, Chenshu, et al.
Published: (2024)

Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
by: Gou, Dongqiang, et al.
Published: (2026)

ABot-OCR Technical Report
by: Jiang, Kaitao, et al.
Published: (2026)

Singpath-VL Technical Report
by: Qiu, Zhen, et al.
Published: (2026)

Uni-Parser Technical Report
by: Fang, Xi, et al.
Published: (2025)

Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)

Logics-Parsing Technical Report
by: Chen, Xiangyang, et al.
Published: (2025)

Qwen-Image Technical Report
by: Wu, Chenfei, et al.
Published: (2025)

T2SGrid: Temporal-to-Spatial Gridification for Video Temporal Grounding
by: Guo, Chaohong, et al.
Published: (2026)

HunyuanVideo 1.5 Technical Report
by: Wu, Bing, et al.
Published: (2025)

ReGround: Improving Textual and Spatial Grounding at No Cost
by: Lee, Phillip Y., et al.
Published: (2024)

ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
by: Bekci, Burak, et al.
Published: (2024)

TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking
by: Nguyen-Quang, Thuc, et al.
Published: (2024)

Spatially-Adaptive Hash Encodings For Neural Surface Reconstruction
by: Walker, Thomas, et al.
Published: (2024)

KlingAvatar 2.0 Technical Report
by: Kling Team, et al.
Published: (2025)

Partial-to-Partial Shape Matching with Geometric Consistency
by: Ehm, Viktoria, et al.
Published: (2024)

RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning
by: Huang, Shiqi, et al.
Published: (2026)

Geometrically-Constrained Agent for Spatial Reasoning
by: Chen, Zeren, et al.
Published: (2025)

Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
by: Huang, Lang, et al.
Published: (2025)