:: Library Catalog

Bibliographic Details
Main Authors:	Yun, Heeseung; Na, Joonil; Kim, Jaeyeon; Murdock, Calvin; Kim, Gunhee
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.18470

Similar Items

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
by: Yun, Heeseung, et al.
Published: (2024)

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
by: Ahn, Jaewoo, et al.
Published: (2025)

FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
by: Ahn, Jaewoo, et al.
Published: (2025)

Improving Cone-Beam CT Image Quality with Knowledge Distillation-Enhanced Diffusion Model in Imbalanced Data Settings
by: Hwang, Joonil, et al.
Published: (2024)

Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting
by: Koo, Junseo, et al.
Published: (2025)

MAVIS: A Benchmark for Multimodal Source Attribution in Long-form Visual Question Answering
by: Song, Seokwon, et al.
Published: (2025)

Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
by: Lai, Bolin, et al.
Published: (2023)

Bi-directional Contextual Attention for 3D Dense Captioning
by: Kim, Minjung, et al.
Published: (2024)

HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)

EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting
by: Choi, Jaeyoung, et al.
Published: (2026)

ViSAGe: Video-to-Spatial Audio Generation
by: Kim, Jaeyeon, et al.
Published: (2025)

See It All: Contextualized Late Aggregation for 3D Dense Captioning
by: Kim, Minjung, et al.
Published: (2024)

Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents
by: Baik, Sangwon, et al.
Published: (2026)

Egocentric Gaze Estimation via Neck-Mounted Camera
by: Huang, Haoyu, et al.
Published: (2026)

ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation
by: Li, Jia, et al.
Published: (2026)

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
by: Kim, Chris Dongjoo, et al.
Published: (2025)

FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
by: Yun, Junhyeog, et al.
Published: (2025)

EgoCampus: Egocentric Pedestrian Eye Gaze Model and Dataset
by: John, Ronan, et al.
Published: (2025)

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
by: Lai, Bolin, et al.
Published: (2022)

Can Language Models Laugh at YouTube Short-form Videos?
by: Ko, Dayoon, et al.
Published: (2023)

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
by: Mazzamuto, Michele, et al.
Published: (2024)

Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
by: Pani, Anupam, et al.
Published: (2025)

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
by: Shin, Chaehun, et al.
Published: (2024)

Exploring High-Order Self-Similarity for Video Understanding
by: Kim, Manjin, et al.
Published: (2026)

GazeMotion: Gaze-guided Human Motion Forecasting
by: Hu, Zhiming, et al.
Published: (2024)

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
by: Chowdhury, Sanjoy, et al.
Published: (2025)

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting
by: Liu, Ruicong, et al.
Published: (2025)

ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images
by: Jeong, Jinseo, et al.
Published: (2024)

In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
by: Peng, Taiying, et al.
Published: (2025)

Personalized Federated Learning for Egocentric Video Gaze Estimation with Comprehensive Parameter Frezzing
by: Feng, Yuhu, et al.
Published: (2025)

Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
by: He, Yufei, et al.
Published: (2025)

HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization
by: Chang, Joohyun, et al.
Published: (2025)

ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)

GazeShift: Unsupervised Gaze Estimation and Dataset for VR
by: Shapira, Gil, et al.
Published: (2026)

Eyes on Target: Gaze-Aware Object Detection in Egocentric Video
by: Lall, Vishakha, et al.
Published: (2025)

Causal Representation-Based Domain Generalization on Gaze Estimation
by: Kim, Younghan, et al.
Published: (2024)

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization
by: Oh, Yeongtak, et al.
Published: (2026)

OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
by: Lee, Suyoung, et al.
Published: (2024)

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
by: Lee, Jewon, et al.
Published: (2025)

LPOI: Listwise Preference Optimization for Vision Language Models
by: Zadeh, Fatemeh Pesaran, et al.
Published: (2025)