:: Library Catalog

Bibliographic Details
Main Authors:	Taluzzi, Agnese; Gesualdi, Davide; Santambrogio, Riccardo; Plizzari, Chiara; Palermo, Francesca; Mentasti, Simone; Matteucci, Matteo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.08553

Similar Items

A Spatio-temporal Graph Network Allowing Incomplete Trajectory Input for Pedestrian Trajectory Prediction
by: Long, Juncen, et al.
Published: (2025)

Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation
by: Savazzi, Giacomo, et al.
Published: (2025)

EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge
by: Chen, Zhiwei, et al.
Published: (2026)

SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement
by: Guo, Yiran, et al.
Published: (2026)

Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
by: Yang, Sicheng, et al.
Published: (2026)

Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Challenge
by: Xu, Yinsong, et al.
Published: (2026)

Advancing Surgical VQA with Scene Graph Knowledge
by: Yuan, Kun, et al.
Published: (2023)

EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear
by: Aspesi, Andrea, et al.
Published: (2025)

High-frequency near-eye ground truth for event-based eye tracking
by: Simpsi, Andrea, et al.
Published: (2025)

More than the Sum of Its Parts: Ensembling Backbone Networks for Few-Shot Segmentation
by: Catalano, Nico, et al.
Published: (2024)

HD-EPIC: A Highly-Detailed Egocentric Video Dataset
by: Perrett, Toby, et al.
Published: (2025)

Domain Generalization using Action Sequences for Egocentric Action Recognition
by: Nasirimajd, Amirshayan, et al.
Published: (2025)

SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions
by: Sbrolli, Cristian, et al.
Published: (2025)

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)

Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
by: Chiatti, Agnese, et al.
Published: (2025)

Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing
by: Lomurno, Eugenio, et al.
Published: (2024)

Trust in Vision-Language Models: Insights from a Participatory User Workshop
by: Chiatti, Agnese, et al.
Published: (2025)

No Captions, No Problem: Captionless 3D-CLIP Alignment with Hard Negatives via CLIP Knowledge and LLMs
by: Sbrolli, Cristian, et al.
Published: (2024)

OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
by: Jin, Xiaofeng, et al.
Published: (2025)

Act, Think or Abstain: Complexity-Aware Adaptive Inference for Vision-Language-Action Models
by: Izzo, Riccardo Andrea, et al.
Published: (2026)

Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding
by: Longo, Antonello, et al.
Published: (2025)

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
by: Plizzari, Chiara, et al.
Published: (2025)

Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
by: Plizzari, Chiara, et al.
Published: (2024)

mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)

Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation
by: Elghazaly, Gamal, et al.
Published: (2025)

From Pixels to Graphs: Deep Graph-Level Anomaly Detection on Dermoscopic Images
by: Xu, Dehn, et al.
Published: (2025)

Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges
by: Catalano, Nico, et al.
Published: (2023)

Scene Structure Guidance Network: Unfolding Graph Partitioning into Pixel-Wise Feature Learning
by: Shin, Jisu, et al.
Published: (2023)

Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks
by: Lomurno, Eugenio, et al.
Published: (2024)

UniLiPs: Unified LiDAR Pseudo-Labeling with Geometry-Grounded Dynamic Scene Decomposition
by: Ghilotti, Filippo, et al.
Published: (2026)

SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)

Knowledge Condensation and Reasoning for Knowledge-based VQA
by: Hao, Dongze, et al.
Published: (2024)

SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs
by: Sun, Zhigang, et al.
Published: (2024)

Stable Diffusion Dataset Generation for Downstream Classification Tasks
by: Lomurno, Eugenio, et al.
Published: (2024)

The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry
by: Cudrano, Paolo, et al.
Published: (2024)

Multiview Scene Graph
by: Zhang, Juexiao, et al.
Published: (2024)

Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?
by: Sbrolli, Cristian, et al.
Published: (2024)

Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models
by: Sbrolli, Cristian, et al.
Published: (2026)

GHR-VQA: Graph-guided Hierarchical Relational Reasoning for Video Question Answering
by: Brilli, Dionysia Danai, et al.
Published: (2025)

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
by: Jiang, Bowen, et al.
Published: (2023)