| Main Authors: | Takenaka, Patrick; Maucher, Johannes; Huber, Marco F. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.09537 |
Similar Items
ViPro-2: Unsupervised State Estimation via Integrated Dynamics for Guiding Video Prediction
by: Takenaka, Patrick, et al.
Published: (2025)
Guiding Video Prediction with Explicit Procedural Knowledge
by: Takenaka, Patrick, et al.
Published: (2024)
Anonymization of Documents for Law Enforcement with Machine Learning
by: Eberhardinger, Manuel, et al.
Published: (2025)
Classification of Inkjet Printers based on Droplet Statistics
by: Takenaka, Patrick, et al.
Published: (2024)
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios
by: Loison, António, et al.
Published: (2026)
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)
Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning
by: Jauch, Christian, et al.
Published: (2023)
Enabling Versatile Controls for Video Diffusion Models
by: Zhang, Xu, et al.
Published: (2025)
LoViT: Long Video Transformer for Surgical Phase Recognition
by: Liu, Yang, et al.
Published: (2023)
ViRED: Prediction of Visual Relations in Engineering Drawings
by: Gu, Chao, et al.
Published: (2024)
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
by: Si, Shengyu, et al.
Published: (2026)
Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
by: Wei, Guoyizhe, et al.
Published: (2025)
ViPRA: Video Prediction for Robot Actions
by: Routray, Sandeep, et al.
Published: (2025)
An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
by: Hu, Zizhao, et al.
Published: (2024)
FlowExtract: Procedural Knowledge Extraction from Maintenance Flowcharts
by: de Avalle, Guillermo Gil, et al.
Published: (2026)
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
by: Rawte, Vipula, et al.
Published: (2024)
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
by: Marmon, Andrew, et al.
Published: (2024)
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
by: Wang, Kai, et al.
Published: (2024)
MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
by: Sun, Jian, et al.
Published: (2023)
Less is More: Label-Guided Summarization of Procedural and Instructional Videos
by: Rajpal, Shreya, et al.
Published: (2026)
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
by: Qin, Luozheng, et al.
Published: (2026)
Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence
by: Fu, Danzhen, et al.
Published: (2025)
LoViF 2026 The First Challenge on Weather Removal in Videos
by: Qian, Chenghao, et al.
Published: (2026)
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model
by: Fu, Yongjie, et al.
Published: (2024)
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
by: Kong, Xianghao, et al.
Published: (2025)
EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
by: Ran, Dongchuan, et al.
Published: (2026)
StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video
by: Li, Ao, et al.
Published: (2026)
SceneX: Procedural Controllable Large-scale Scene Generation
by: Zhou, Mengqi, et al.
Published: (2024)
ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation
by: Federico, Giulio, et al.
Published: (2025)
Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models
by: de Avalle, Guillermo Gil, et al.
Published: (2026)
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
by: Song, Yeon-Ji, et al.
Published: (2024)
Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS
by: Gkikas, Stefanos, et al.
Published: (2024)
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model
by: Zhan, Ruohao, et al.
Published: (2025)
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
by: Mou, Tingshu, et al.
Published: (2026)
ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users
by: Yin, Xiangyu, et al.
Published: (2025)
ProMISe: Promptable Medical Image Segmentation using SAM
by: Wang, Jinfeng, et al.
Published: (2024)