:: Library Catalog

Bibliographic Details
Main Authors:	Bi, Jing; Xu, Chenliang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.02997

Similar Items

EAGLE: Egocentric AGgregated Language-video Engine
by: Bi, Jing, et al.
Published: (2024)

OSCaR: Object State Captioning and State Change Representation
by: Nguyen, Nguyen, et al.
Published: (2024)

MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
by: Ye, Hanrong, et al.
Published: (2024)

3D-Aware Instance Segmentation and Tracking in Egocentric Videos
by: Bhalgat, Yash, et al.
Published: (2024)

On Memorization in Diffusion Models
by: Gu, Xiangming, et al.
Published: (2023)

EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
by: VanVoorst, Brian, et al.
Published: (2026)

Whole-Body Conditioned Egocentric Video Prediction
by: Bai, Yutong, et al.
Published: (2025)

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
by: Darjana, Nathan, et al.
Published: (2025)

CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation
by: Zhou, Chenliang, et al.
Published: (2022)

Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)

What Happens Next? Anticipating Future Motion by Generating Point Trajectories
by: Boduljak, Gabrijel, et al.
Published: (2025)

TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
by: Qu, Leigang, et al.
Published: (2025)

Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2026)

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
by: Chen, Baiyu, et al.
Published: (2025)

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2025)

Captured by Captions: On Memorization and its Mitigation in CLIP Models
by: Wang, Wenhao, et al.
Published: (2025)

Memorization In Stable Diffusion Is Unexpectedly Driven by CLIP Embeddings
by: Kim, Bumjun, et al.
Published: (2026)

X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning
by: Wang, Yunzhe, et al.
Published: (2025)

Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
by: Karnik, Sathwik, et al.
Published: (2026)

Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
by: Asthana, Rohan, et al.
Published: (2026)

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
by: Jain, Anubhav, et al.
Published: (2024)

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
by: Tang, Yolo Yunlong, et al.
Published: (2024)

Impact of Layer Norm on Memorization and Generalization in Transformers
by: Singhal, Rishi, et al.
Published: (2025)

Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
by: Du, Xiaodan, et al.
Published: (2023)

Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted
by: Chavhan, Ruchika, et al.
Published: (2024)

Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
by: Safaei, Bardia, et al.
Published: (2025)

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
by: Chen, Lu, et al.
Published: (2025)

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
by: Chu, Tianzhe, et al.
Published: (2025)

Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
by: Lou, Siyu, et al.
Published: (2024)

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
by: Chowdhury, Sanjoy, et al.
Published: (2025)

Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
by: Zhang, Mellon M., et al.
Published: (2025)

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
by: Zhang, Yiyuan, et al.
Published: (2024)

Efficient Pre-training for Localized Instruction Generation of Videos
by: Batra, Anil, et al.
Published: (2023)

We Should Separate Memorization from Copyright
by: Haviv, Adi, et al.
Published: (2026)

3D Hand Pose Estimation in Everyday Egocentric Images
by: Prakash, Aditya, et al.
Published: (2023)