:: Library Catalog

Bibliographic Details
Main Authors:	Yasunaga, Ayaka; Saito, Hideo; Mori, Shohei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.21009

Similar Items

IntelliCap: Intelligent Guidance for Consistent View Sampling
by: Yasunaga, Ayaka, et al.
Published: (2025)

Dense Depth from Event Focal Stack
by: Horikawa, Kenta, et al.
Published: (2024)

High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights
by: Kato, Yuna, et al.
Published: (2025)

Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery
by: Kato, Yuna, et al.
Published: (2025)

Profile-Specific 3DMM Regression from a Single Lateral Face Image
by: Kanaya, Taiki, et al.
Published: (2026)

Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation
by: Enomoto, Shohei
Published: (2025)

Hand Held Multi-Object Tracking Dataset in American Football
by: Otsubo, Rintaro, et al.
Published: (2025)

RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
by: Fujii, Ryo, et al.
Published: (2024)

EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
by: Hatano, Masashi, et al.
Published: (2024)

RatBodyFormer: Rat Body Surface from Keypoints
by: Higami, Ayaka, et al.
Published: (2024)

The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation
by: Hatano, Masashi, et al.
Published: (2025)

Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams
by: Nakabayashi, Takuya, et al.
Published: (2025)

Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
by: Ishikawa, Reina, et al.
Published: (2025)

Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
by: Hatano, Masashi, et al.
Published: (2024)

E2GS: Event Enhanced Gaussian Splatting
by: Deguchi, Hiroyuki, et al.
Published: (2024)

Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

VIOLA: Towards Video In-Context Learning with Minimal Annotations
by: Fujii, Ryo, et al.
Published: (2026)

Déjà View: Looping Transformers for Multi-View 3D Reconstruction
by: Burzio, Alessandro, et al.
Published: (2026)

LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting
by: Bao, Zhenyu, et al.
Published: (2024)

SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
by: Shinoda, Risa, et al.
Published: (2024)

LoopViT: Scaling Visual ARC with Looped Transformers
by: Shu, Wen-Jie, et al.
Published: (2026)

CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
by: Fujii, Ryo, et al.
Published: (2024)

V-Loop: Visual Logical Loop Verification for Hallucination Detection in Medical Visual Question Answering
by: Jin, Mengyuan, et al.
Published: (2026)

Towards Predicting Any Human Trajectory In Context
by: Fujii, Ryo, et al.
Published: (2025)

Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering
by: Mori, Erika, et al.
Published: (2025)

View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
by: Zhang, Qi, et al.
Published: (2024)

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)

HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2026)

Interactive Garment Recommendation with User in the Loop
by: Becattini, Federico, et al.
Published: (2024)

Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video
by: Deguchi, Hiroyuki, et al.
Published: (2026)

Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
by: Yasunaga, Michihiro, et al.
Published: (2025)

ELT: Elastic Looped Transformers for Visual Generation
by: Goyal, Sahil, et al.
Published: (2026)

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription
by: Zhao, Hongxiang, et al.
Published: (2024)

Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach
by: Hatano, Masashi, et al.
Published: (2025)

Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization
by: Enomoto, Shohei
Published: (2025)

A Robust Error-Resistant View Selection Method for 3D Reconstruction
by: Zhang, Shaojie, et al.
Published: (2024)

Human-in-the-Loop Visual Re-ID for Population Size Estimation
by: Perez, Gustavo, et al.
Published: (2023)

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
by: Darjana, Nathan, et al.
Published: (2025)

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)