| Main Authors: | Izzo, Elena, Parolari, Luca, Vezzaro, Davide, Ballan, Lamberto |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.12919 |
Similar Items
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension
by: Parolari, Luca, et al.
Published: (2024)
Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings
by: Parolari, Luca, et al.
Published: (2026)
Towards Polyp Counting In Full-Procedure Colonoscopy Videos
by: Parolari, Luca, et al.
Published: (2025)
Temporally-Aware Supervised Contrastive Learning for Polyp Counting in Colonoscopy
by: Parolari, Luca, et al.
Published: (2025)
Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos
by: Parolari, Luca, et al.
Published: (2026)
PersONAL: Towards a Comprehensive Benchmark for Personalized Embodied Agents
by: Ziliotto, Filippo, et al.
Published: (2025)
Multiview Progress Prediction of Robot Activities
by: Zoppellari, Elena, et al.
Published: (2026)
You Only Landmark Once: Lightweight U-Net Face Super Resolution with YOLO-World Landmark Heatmaps
by: Carraro, Riccardo, et al.
Published: (2026)
Assessing the Visual Enumeration Abilities of Specialized Counting Architectures and Vision-Language Models
by: Hou, Kuinan, et al.
Published: (2025)
Distilling Knowledge for Short-to-Long Term Trajectory Prediction
by: Das, Sourav, et al.
Published: (2023)
Following the Human Thread in Social Navigation
by: Scofano, Luca, et al.
Published: (2024)
MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation
by: Raychaudhuri, Sonia, et al.
Published: (2025)
T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation
by: Huang, Kaiyi, et al.
Published: (2023)
FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
by: Ruan, Jiacheng, et al.
Published: (2024)
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
by: Li, Bingnan, et al.
Published: (2025)
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
by: Li, Bohao, et al.
Published: (2024)
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
by: Sun, Kaiyue, et al.
Published: (2024)
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)
GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark
by: Cai, Xiao, et al.
Published: (2024)
CompBench: Benchmarking Complex Instruction-guided Image Editing
by: Jia, Bohan, et al.
Published: (2025)
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
by: Wang, Fei, et al.
Published: (2024)
PAI-Bench: A Comprehensive Benchmark For Physical AI
by: Zhou, Fengzhe, et al.
Published: (2025)
PhenoBench: A Comprehensive Benchmark for Cell Phenotyping
by: Winklmayr, Claudia, et al.
Published: (2025)
Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training
by: Liu, Changkun, et al.
Published: (2026)
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
by: Zhang, Yuhan, et al.
Published: (2025)
FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models
by: Shu, Dong, et al.
Published: (2025)
LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
by: Heo, Inbum, et al.
Published: (2025)
EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
by: Liu, Shaoyu, et al.
Published: (2025)
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
by: Zhuang, Cailin, et al.
Published: (2025)
Saliency-Bench: A Comprehensive Benchmark for Evaluating Visual Explanations
by: Zhang, Yifei, et al.
Published: (2023)
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
by: Wu, Baoyuan, et al.
Published: (2024)
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
by: Meng, Fanqing, et al.
Published: (2024)
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
by: Xu, Mingjie, et al.
Published: (2025)
LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation
by: Zhou, Yang, et al.
Published: (2025)
TextVidBench: A Benchmark for Long Video Scene Text Understanding
by: Zhong, Yangyang, et al.
Published: (2025)
Layout Agnostic Scene Text Image Synthesis with Diffusion Models
by: Zhangli, Qilong, et al.
Published: (2024)
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
by: Fang, Chuan, et al.
Published: (2025)
CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout
by: Bai, Haotian, et al.
Published: (2023)
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
by: Pan, Yulin, et al.
Published: (2025)
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
by: Li, Ming, et al.
Published: (2025)