| Main authors: | Taluzzi, Agnese; Gesualdi, Davide; Santambrogio, Riccardo; Plizzari, Chiara; Palermo, Francesca; Mentasti, Simone; Matteucci, Matteo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2506.08553 |
Similar items
A Spatio-temporal Graph Network Allowing Incomplete Trajectory Input for Pedestrian Trajectory Prediction
by: Long, Juncen, et al.
Published: (2025)
Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation
by: Savazzi, Giacomo, et al.
Published: (2025)
EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge
by: Chen, Zhiwei, et al.
Published: (2026)
SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement
by: Guo, Yiran, et al.
Published: (2026)
Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
by: Yang, Sicheng, et al.
Published: (2026)
Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge
by: Xu, Yinsong, et al.
Published: (2026)
Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)
EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear
by: Aspesi, Andrea, et al.
Published: (2025)
High-frequency near-eye ground truth for event-based eye tracking
by: Simpsi, Andrea, et al.
Published: (2025)
More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation
by: Catalano, Nico, et al.
Published: (2024)
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
by: Perrett, Toby, et al.
Published: (2025)
Domain Generalization using Action Sequences for Egocentric Action Recognition
by: Nasirimajd, Amirshayan, et al.
Published: (2025)
SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions
by: Sbrolli, Cristian, et al.
Published: (2025)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
by: Chiatti, Agnese, et al.
Published: (2025)
Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing
by: Lomurno, Eugenio, et al.
Published: (2024)
Trust in Vision-Language Models: Insights from a Participatory User Workshop
by: Chiatti, Agnese, et al.
Published: (2025)
No Captions, No Problem: Captionless 3D-CLIP Alignment with Hard Negatives via CLIP Knowledge and LLMs
by: Sbrolli, Cristian, et al.
Published: (2024)
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
by: Jin, Xiaofeng, et al.
Published: (2025)
Act, Think or Abstain: Complexity-Aware Adaptive Inference for Vision-Language-Action Models
by: Izzo, Riccardo Andrea, et al.
Published: (2026)
Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding
by: Longo, Antonello, et al.
Published: (2025)
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
by: Plizzari, Chiara, et al.
Published: (2025)
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
by: Plizzari, Chiara, et al.
Published: (2024)
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)
Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation
by: Elghazaly, Gamal, et al.
Published: (2025)
From Pixels to Graphs: Deep Graph-Level Anomaly Detection on Dermoscopic Images
by: Xu, Dehn, et al.
Published: (2025)
Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges
by: Catalano, Nico, et al.
Published: (2023)
Scene Structure Guidance Network: Unfolding Graph Partitioning into Pixel-Wise Feature Learning
by: Shin, Jisu, et al.
Published: (2023)
Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks
by: Lomurno, Eugenio, et al.
Published: (2024)
UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition
by: Ghilotti, Filippo, et al.
Published: (2026)
SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)
Knowledge Condensation and Reasoning for Knowledge-based VQA
by: Hao, Dongze, et al.
Published: (2024)
SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs
by: Sun, Zhigang, et al.
Published: (2024)
Stable Diffusion Dataset Generation for Downstream Classification Tasks
by: Lomurno, Eugenio, et al.
Published: (2024)
The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry
by: Cudrano, Paolo, et al.
Published: (2024)
Multiview Scene Graph
by: Zhang, Juexiao, et al.
Published: (2024)
Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?
by: Sbrolli, Cristian, et al.
Published: (2024)
Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models
by: Sbrolli, Cristian, et al.
Published: (2026)
GHR-VQA: Graph-guided Hierarchical Relational Reasoning for Video Question Answering
by: Brilli, Dionysia Danai, et al.
Published: (2025)
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
by: Jiang, Bowen, et al.
Published: (2023)