| Main Author: | Cesista, Franz Louis |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.11403 |
Similar Items
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
by: Liang, Hao, et al.
Published: (2025)
Technique Report of CVPR 2024 PBDL Challenges
by: Fu, Ying, et al.
Published: (2024)
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
by: Liang, Tianming, et al.
Published: (2025)
Technical Report for CVPR 2024 WeatherProof Dataset Challenge: Semantic Segmentation on Paired Real Data
by: Cao, Guojin, et al.
Published: (2024)
DIVE: Deep-search Iterative Video Exploration A Technical Report for the CVRR Challenge at CVPR 2025
by: Kamoto, Umihiro, et al.
Published: (2025)
The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge
by: Peng, Jinghan, et al.
Published: (2025)
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
by: Ma, Guoqing, et al.
Published: (2025)
Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge
by: Deng, Tianchen, et al.
Published: (2024)
Separating Drone Point Clouds From Complex Backgrounds by Cluster Filter -- Technical Report for CVPR 2024 UG2 Challenge
by: Liang, Hanfang, et al.
Published: (2024)
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
by: Chen, Wei, et al.
Published: (2024)
Falcon2-11B Technical Report
by: Malartic, Quentin, et al.
Published: (2024)
Qwen2.5-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)
VARCO-VISION-2.0 Technical Report
by: Cha, Young-rok, et al.
Published: (2025)
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
by: Jeong, Kiyoon, et al.
Published: (2024)
Arctic-Extract Technical Report
by: Chiliński, Mateusz, et al.
Published: (2025)
Privacy-Aware Camera 2.0 Technical Report
by: Song, Huan, et al.
Published: (2026)
Mano Technical Report
by: Fu, Tianyu, et al.
Published: (2025)
Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings
by: Yousefi, Mojtaba, et al.
Published: (2024)
2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model
by: Guo, Zilong, et al.
Published: (2025)
2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
by: Xu, Zhensong, et al.
Published: (2024)
Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text
by: Rahman, Mizanur, et al.
Published: (2025)
Technical Report for the 5th CLVision Challenge at CVPR: Addressing the Class-Incremental with Repetition using Unlabeled Data -- 4th Place Solution
by: Moraiti, Panagiota, et al.
Published: (2025)
Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)
Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)
MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
by: Yang, Zhongyu, et al.
Published: (2024)
Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation
by: Jamil, Sofia, et al.
Published: (2025)
Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)
The Solution for the CVPR2024 NICE Image Captioning Challenge
by: Huang, Longfei, et al.
Published: (2024)
2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
by: Cao, Bin, et al.
Published: (2024)
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
by: Xiang, Sike, et al.
Published: (2026)
Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)
MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
by: Sharma, Harshita, et al.
Published: (2024)
MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)
Qwen2.5-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)
Phoenix-VL 1.5 Medium Technical Report
by: Phoenix, Team, et al.
Published: (2026)
Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)
Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge
by: Pan, Hongpeng, et al.
Published: (2024)