:: Library Catalog

Bibliographic Details
Main Author:	Cesista, Franz Louis
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2406.11403

Similar Items

Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
by: Liang, Hao, et al.
Published: (2025)

Technique Report of CVPR 2024 PBDL Challenges
by: Fu, Ying, et al.
Published: (2024)

ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
by: Liang, Tianming, et al.
Published: (2025)

Technical Report for CVPR 2024 WeatherProof Dataset Challenge: Semantic Segmentation on Paired Real Data
by: Cao, Guojin, et al.
Published: (2024)

DIVE: Deep-search Iterative Video Exploration A Technical Report for the CVRR Challenge at CVPR 2025
by: Kamoto, Umihiro, et al.
Published: (2025)

The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge
by: Peng, Jinghan, et al.
Published: (2025)

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
by: Ma, Guoqing, et al.
Published: (2025)

Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge
by: Deng, Tianchen, et al.
Published: (2024)

Separating Drone Point Clouds From Complex Backgrounds by Cluster Filter -- Technical Report for CVPR 2024 UG2 Challenge
by: Liang, Hanfang, et al.
Published: (2024)

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
by: Chen, Wei, et al.
Published: (2024)

Falcon2-11B Technical Report
by: Malartic, Quentin, et al.
Published: (2024)

Qwen2.5-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

VARCO-VISION-2.0 Technical Report
by: Cha, Young-rok, et al.
Published: (2025)

Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
by: Jeong, Kiyoon, et al.
Published: (2024)

Arctic-Extract Technical Report
by: Chiliński, Mateusz, et al.
Published: (2025)

Privacy-Aware Camera 2.0 Technical Report
by: Song, Huan, et al.
Published: (2026)

Mano Technical Report
by: Fu, Tianyu, et al.
Published: (2025)

Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings
by: Yousefi, Mojtaba, et al.
Published: (2024)

2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model
by: Guo, Zilong, et al.
Published: (2025)

2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
by: Xu, Zhensong, et al.
Published: (2024)

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text
by: Rahman, Mizanur, et al.
Published: (2025)

Technical Report for the 5th CLVision Challenge at CVPR: Addressing the Class-Incremental with Repetition using Unlabeled Data -- 4th Place Solution
by: Moraiti, Panagiota, et al.
Published: (2025)

Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)

Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)

MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)

MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
by: Yang, Zhongyu, et al.
Published: (2024)

Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation
by: Jamil, Sofia, et al.
Published: (2025)

Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)

The Solution for the CVPR2024 NICE Image Captioning Challenge
by: Huang, Longfei, et al.
Published: (2024)

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
by: Cao, Bin, et al.
Published: (2024)

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
by: Xiang, Sike, et al.
Published: (2026)

Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)

MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
by: Sharma, Harshita, et al.
Published: (2024)

MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)

Qwen2.5-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)

Phoenix-VL 1.5 Medium Technical Report
by: Phoenix, Team, et al.
Published: (2026)

Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)

Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)

The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge
by: Pan, Hongpeng, et al.
Published: (2024)