:: Library Catalog

Bibliographic Details
Main Authors:	Ishikawa, Reina; Fujii, Ryo; Saito, Hideo; Hachiuma, Ryo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.03385

Similar Items

Towards Predicting Any Human Trajectory In Context
by: Fujii, Ryo, et al.
Published: (2025)

RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
by: Fujii, Ryo, et al.
Published: (2024)

Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

VIOLA: Towards Video In-Context Learning with Minimal Annotations
by: Fujii, Ryo, et al.
Published: (2026)

CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
by: Fujii, Ryo, et al.
Published: (2024)

Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
by: Hatano, Masashi, et al.
Published: (2024)

EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
by: Hatano, Masashi, et al.
Published: (2024)

Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
by: Nagano, Koshiro, et al.
Published: (2026)

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
by: Darjana, Nathan, et al.
Published: (2025)

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
by: Hirota, Yusuke, et al.
Published: (2024)

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
by: Shen, Xiaoqian, et al.
Published: (2025)

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
by: Lee, Byung-Kwan, et al.
Published: (2025)

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
by: Liu, Lu, et al.
Published: (2024)

Interpretable Debiasing of Vision-Language Models for Social Fairness
by: An, Na Min, et al.
Published: (2026)

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
by: Hirota, Yusuke, et al.
Published: (2025)

Unified Reinforcement and Imitation Learning for Vision-Language Models
by: Lee, Byung-Kwan, et al.
Published: (2025)

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
by: Lee, Byung-Kwan, et al.
Published: (2024)

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
by: Yang, Chiao-An, et al.
Published: (2025)

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
by: Hirota, Yusuke, et al.
Published: (2024)

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
by: Heo, Miran, et al.
Published: (2025)

OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
by: Che, Yuchen, et al.
Published: (2024)

PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier
by: Wang, Shaomeng, et al.
Published: (2025)

Comprehensive Evaluation of Rule-Based, Machine Learning, and Deep Learning in Human Estimation Using Radio Wave Sensing: Accuracy, Spatial Generalization, and Output Granularity Trade-offs
by: Tanaka, Tomoya, et al.
Published: (2025)

Autoregressive Universal Video Segmentation Model
by: Heo, Miran, et al.
Published: (2025)

AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
by: Zhou, Yutong, et al.
Published: (2024)

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
by: Wang, Weitao, et al.
Published: (2024)

V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models
by: Chiu, Hsu-kuang, et al.
Published: (2025)

Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
by: Lu, Yunhong, et al.
Published: (2025)

Piggyback Camera: Easy-to-Deploy Visual Surveillance by Mobile Sensing on Commercial Robot Vacuums
by: Yonetani, Ryo
Published: (2025)

Profile-Specific 3DMM Regression from a Single Lateral Face Image
by: Kanaya, Taiki, et al.
Published: (2026)

Aligning Multimodal LLM with Human Preference: A Survey
by: Yu, Tao, et al.
Published: (2025)

Aligning Object Detector Bounding Boxes with Human Preference
by: Strafforello, Ombretta, et al.
Published: (2024)

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
by: Shen, Wenhao, et al.
Published: (2025)

Benchmarking XAI Explanations with Human-Aligned Evaluations
by: Kazmierczak, Rémi, et al.
Published: (2024)

AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation
by: Liang, Chao, et al.
Published: (2025)

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
by: Hirota, Yusuke, et al.
Published: (2025)

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
by: Zhao, Xiangyu, et al.
Published: (2025)

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
by: Liu, Kendong, et al.
Published: (2024)