| Main Authors: | Yun, Heeseung, Na, Joonil, Kim, Jaeyeon, Murdock, Calvin, Kim, Gunhee |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.18470 |
Similar Items
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
by: Yun, Heeseung, et al.
Published: (2024)
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
by: Ahn, Jaewoo, et al.
Published: (2025)
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
by: Ahn, Jaewoo, et al.
Published: (2025)
Improving Cone-Beam CT Image Quality with Knowledge Distillation-Enhanced Diffusion Model in Imbalanced Data Settings
by: Hwang, Joonil, et al.
Published: (2024)
Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting
by: Koo, Junseo, et al.
Published: (2025)
MAVIS: A Benchmark for Multimodal Source Attribution in Long-form Visual Question Answering
by: Song, Seokwon, et al.
Published: (2025)
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
by: Lai, Bolin, et al.
Published: (2023)
Bi-directional Contextual Attention for 3D Dense Captioning
by: Kim, Minjung, et al.
Published: (2024)
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)
EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting
by: Choi, Jaeyoung, et al.
Published: (2026)
ViSAGe: Video-to-Spatial Audio Generation
by: Kim, Jaeyeon, et al.
Published: (2025)
See It All: Contextualized Late Aggregation for 3D Dense Captioning
by: Kim, Minjung, et al.
Published: (2024)
Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents
by: Baik, Sangwon, et al.
Published: (2026)
Egocentric Gaze Estimation via Neck-Mounted Camera
by: Huang, Haoyu, et al.
Published: (2026)
ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation
by: Li, Jia, et al.
Published: (2026)
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
by: Kim, Chris Dongjoo, et al.
Published: (2025)
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
by: Yun, Junhyeog, et al.
Published: (2025)
EgoCampus: Egocentric Pedestrian Eye Gaze Model and Dataset
by: John, Ronan, et al.
Published: (2025)
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
by: Lai, Bolin, et al.
Published: (2022)
Can Language Models Laugh at YouTube Short-form Videos?
by: Ko, Dayoon, et al.
Published: (2023)
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
by: Mazzamuto, Michele, et al.
Published: (2024)
Gaze-VLM:Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
by: Pani, Anupam, et al.
Published: (2025)
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
by: Shin, Chaehun, et al.
Published: (2024)
Exploring High-Order Self-Similarity for Video Understanding
by: Kim, Manjin, et al.
Published: (2026)
GazeMotion: Gaze-guided Human Motion Forecasting
by: Hu, Zhiming, et al.
Published: (2024)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
by: Chowdhury, Sanjoy, et al.
Published: (2025)
SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting
by: Liu, Ruicong, et al.
Published: (2025)
ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images
by: Jeong, Jinseo, et al.
Published: (2024)
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
by: Peng, Taiying, et al.
Published: (2025)
Personalized Federated Learning for Egocentric Video Gaze Estimation with Comprehensive Parameter Frezzing
by: Feng, Yuhu, et al.
Published: (2025)
Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks
by: He, Yufei, et al.
Published: (2025)
HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization
by: Chang, Joohyun, et al.
Published: (2025)
ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)
GazeShift: Unsupervised Gaze Estimation and Dataset for VR
by: Shapira, Gil, et al.
Published: (2026)
Eyes on Target: Gaze-Aware Object Detection in Egocentric Video
by: Lall, Vishakha, et al.
Published: (2025)
Causal Representation-Based Domain Generalization on Gaze Estimation
by: Kim, Younghan, et al.
Published: (2024)
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization
by: Oh, Yeongtak, et al.
Published: (2026)
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
by: Lee, Suyoung, et al.
Published: (2024)
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
by: Lee, Jewon, et al.
Published: (2025)
LPOI: Listwise Preference Optimization for Vision Language Models
by: Zadeh, Fatemeh Pesaran, et al.
Published: (2025)