| Main Author: | He, Yuhang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.07522 |
Similar Items
Training-free Online Video Step Grounding
by: Zanella, Luca, et al.
Published: (2025)
Training-free Geometric Image Editing on Diffusion Models
by: Zhu, Hanshen, et al.
Published: (2025)
Technical Report for SoccerNet Challenge 2022 -- Replay Grounding Task
by: Chen, Shimin, et al.
Published: (2024)
Space Rotation with Basis Transformation for Training-free Test-Time Adaptation
by: Ding, Chenhao, et al.
Published: (2025)
NeuroClaw Technical Report
by: Wang, Cheng, et al.
Published: (2026)
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
by: Yang, Yuhang, et al.
Published: (2024)
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
by: Bu, Jiazi, et al.
Published: (2025)
StreamingClaw Technical Report
by: Chen, Jiawei, et al.
Published: (2026)
LandMarkSystem Technical Report
by: Ma, Zhenxiang, et al.
Published: (2025)
Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
by: Zhou, Feng, et al.
Published: (2025)
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
by: Bu, Jiazi, et al.
Published: (2024)
Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness
by: Mannes, Mahmoud
Published: (2026)
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
by: Zhou, Yujie, et al.
Published: (2025)
Kling-MotionControl Technical Report
by: Kling Team, et al.
Published: (2026)
LongCat-Image Technical Report
by: Meituan LongCat Team, et al.
Published: (2025)
Kling-Omni Technical Report
by: Kling Team, et al.
Published: (2025)
Kelix Technical Report
by: Ding, Boyang, et al.
Published: (2026)
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
by: Zheng, Minghang, et al.
Published: (2024)
Kwai Keye-VL Technical Report
by: Kwai Keye Team, et al.
Published: (2025)
LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
by: Alharbi, Yazeed, et al.
Published: (2024)
Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)
PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding
by: Hou, Chenshu, et al.
Published: (2024)
Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
by: Gou, Dongqiang, et al.
Published: (2026)
ABot-OCR Technical Report
by: Jiang, Kaitao, et al.
Published: (2026)
Singpath-VL Technical Report
by: Qiu, Zhen, et al.
Published: (2026)
Uni-Parser Technical Report
by: Fang, Xi, et al.
Published: (2025)
Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)
Logics-Parsing Technical Report
by: Chen, Xiangyang, et al.
Published: (2025)
Qwen-Image Technical Report
by: Wu, Chenfei, et al.
Published: (2025)
T2SGrid: Temporal-to-Spatial Gridification for Video Temporal Grounding
by: Guo, Chaohong, et al.
Published: (2026)
HunyuanVideo 1.5 Technical Report
by: Wu, Bing, et al.
Published: (2025)
ReGround: Improving Textual and Spatial Grounding at No Cost
by: Lee, Phillip Y., et al.
Published: (2024)
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
by: Bekci, Burak, et al.
Published: (2024)
TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking
by: Nguyen-Quang, Thuc, et al.
Published: (2024)
Spatially-Adaptive Hash Encodings For Neural Surface Reconstruction
by: Walker, Thomas, et al.
Published: (2024)
KlingAvatar 2.0 Technical Report
by: Kling Team, et al.
Published: (2025)
Partial-to-Partial Shape Matching with Geometric Consistency
by: Ehm, Viktoria, et al.
Published: (2024)
RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning
by: Huang, Shiqi, et al.
Published: (2026)
Geometrically-Constrained Agent for Spatial Reasoning
by: Chen, Zeren, et al.
Published: (2025)
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
by: Huang, Lang, et al.
Published: (2025)