| Main Authors: | Bi, Jing, Xu, Chenliang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.02997 |
Similar Items
EAGLE: Egocentric AGgregated Language-video Engine
by: Bi, Jing, et al.
Published: (2024)
OSCaR: Object State Captioning and State Change Representation
by: Nguyen, Nguyen, et al.
Published: (2024)
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
by: Ye, Hanrong, et al.
Published: (2024)
3D-Aware Instance Segmentation and Tracking in Egocentric Videos
by: Bhalgat, Yash, et al.
Published: (2024)
On Memorization in Diffusion Models
by: Gu, Xiangming, et al.
Published: (2023)
EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
by: VanVoorst, Brian, et al.
Published: (2026)
Whole-Body Conditioned Egocentric Video Prediction
by: Bai, Yutong, et al.
Published: (2025)
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)
EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
by: Darjana, Nathan, et al.
Published: (2025)
CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation
by: Zhou, Chenliang, et al.
Published: (2022)
Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)
EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
by: Boduljak, Gabrijel, et al.
Published: (2025)
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
by: Qu, Leigang, et al.
Published: (2025)
Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2026)
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)
COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
by: Chen, Baiyu, et al.
Published: (2025)
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2025)
Captured by Captions: On Memorization and its Mitigation in CLIP Models
by: Wang, Wenhao, et al.
Published: (2025)
Memorization In Stable Diffusion Is Unexpectedly Driven by CLIP Embeddings
by: Kim, Bumjun, et al.
Published: (2026)
X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning
by: Wang, Yunzhe, et al.
Published: (2025)
Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
by: Karnik, Sathwik, et al.
Published: (2026)
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
by: Asthana, Rohan, et al.
Published: (2026)
Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
by: Jain, Anubhav, et al.
Published: (2024)
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
by: Tang, Yolo Yunlong, et al.
Published: (2024)
Impact of Layer Norm on Memorization and Generalization in Transformers
by: Singhal, Rishi, et al.
Published: (2025)
Generative Models: What Do They Know? Do They Know Things? Let's Find Out!
by: Du, Xiaodan, et al.
Published: (2023)
Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted
by: Chavhan, Ruchika, et al.
Published: (2024)
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
by: Safaei, Bardia, et al.
Published: (2025)
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
by: Chen, Lu, et al.
Published: (2025)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
by: Chu, Tianzhe, et al.
Published: (2025)
Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
by: Lou, Siyu, et al.
Published: (2024)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
by: Chowdhury, Sanjoy, et al.
Published: (2025)
Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
by: Zhang, Mellon M., et al.
Published: (2025)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
by: Zhang, Yiyuan, et al.
Published: (2024)
Efficient Pre-training for Localized Instruction Generation of Videos
by: Batra, Anil, et al.
Published: (2023)
We Should Separate Memorization from Copyright
by: Haviv, Adi, et al.
Published: (2026)
3D Hand Pose Estimation in Everyday Egocentric Images
by: Prakash, Aditya, et al.
Published: (2023)