| Main Authors: | Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, Zhai, Xiaohua |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.07726 |
Similar Items
PaliGemma 2: A Family of Versatile VLMs for Transfer
by: Steiner, Andreas, et al.
Published: (2024)
Memory Consolidation Enables Long-Context Video Understanding
by: Balažević, Ivana, et al.
Published: (2024)
Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
by: Papalampidi, Pinelopi, et al.
Published: (2023)
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
by: Papalampidi, Pinelopi, et al.
Published: (2025)
Finding the Right Moment: Human-Assisted Trailer Creation via Task Composition
by: Papalampidi, Pinelopi, et al.
Published: (2021)
Improving fine-grained understanding in image-text pre-training
by: Bica, Ioana, et al.
Published: (2024)
From Sparse to Soft Mixtures of Experts
by: Puigcerver, Joan, et al.
Published: (2023)
JetFormer: An Autoregressive Generative Model of Raw Images and Text
by: Tschannen, Michael, et al.
Published: (2024)
Jet: A Modern Transformer-Based Normalizing Flow
by: Kolesnikov, Alexander, et al.
Published: (2024)
PaliGemma-CXR: A Multi-task Multimodal Model for TB Chest X-ray Interpretation
by: Musinguzi, Denis, et al.
Published: (2025)
LocCa: Visual Pretraining with Location-aware Captioners
by: Wan, Bo, et al.
Published: (2024)
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
by: Wang, Xiao, et al.
Published: (2025)
Hibridación entre la codorniz común (Coturnix coturnix) y la codorniz de granja: estado de un problema de conservación
by: M. Puigcerver
Published: (2013)
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
by: Pouget, Angéline, et al.
Published: (2024)
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
by: Wiles, Olivia, et al.
Published: (2024)
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
by: Tschannen, Michael, et al.
Published: (2025)
Escuela de Administración, Pontificia Universidad Católica de Chile, 1994 2000
by: Matko Koljatic
Published: (2001)
Experimental Study of Cold‐formed Steel Frames with Semi‐rigid Floor‐to‐wall Joints
by: Xi Guo, et al.
Published: (2025)
Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
by: Zhang, Xinliang Frederick, et al.
Published: (2025)
TAPVid-3D: A Benchmark for Tracking Any Point in 3D
by: Koppula, Skanda, et al.
Published: (2024)
Campo Experimental Forestal "San Juan Tetla", Pue
by: Susano Hernández, Roberto
Published: (1963)
Principales plantas potencialmente toxicas para la ganadería de la zona forestal de Zoquiapan, Edo. de México
by: Susano Hernández, Roberto
Published: (1970)
Especies arbóreas forestales susceptibles de aprovecharse como forraje
by: Susano Hernández, Roberto
Published: (1981)
A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
by: Alabdulmohsin, Ibrahim, et al.
Published: (2025)
Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems
by: Alabdulmohsin, Ibrahim, et al.
Published: (2025)
Failed schemes of relatedness in domestic work: Filipina domestic workers in Greece
by: Pinelopi Topali
Published: (2024)
Wavelet-Based Image Tokenizer for Vision Transformers
by: Zhu, Zhenhai, et al.
Published: (2024)
A Mixed Diet Makes DINO An Omnivorous Vision Encoder
by: Kabra, Rishabh, et al.
Published: (2026)
Modular differential equations of minimal orders of the elliptic genus of Calabi--Yau varieties
by: Adler, Dmitrii, et al.
Published: (2025)
Semilinear wave equations with time-dependent coefficients
by: Antonić, Nenad, et al.
Published: (2026)
Large pelagic species permit holders in the Caribbean Sea and Gulf of Mexico: statistics, characteristics, and demographic trends
by: Salz, R.J., et al.
Published: (2007)
Smoothness spaces for warped time-frequency representations -- Decomposition spaces and embedding relations
by: Holighaus, Nicki, et al.
Published: (2024)
Coorbit theory of warped time-frequency systems in $\mathbb{R}^d$
by: Holighaus, Nicki, et al.
Published: (2022)
A case of Conradi‐Hünermann‐Happle syndrome treated with topical simvastatin‐cholesterol ointment
by: Jean Zevallos, et al.
Published: (2024)
Application Of Large Language Models For The Extraction Of Information From Particle Accelerator Technical Documentation
by: Dai, Qing, et al.
Published: (2025)
Rare Angiosarcoma in the Postauricular Region—A Diagnostic Challenge and Treatment
by: Katarzyna Żyżynska, et al.
Published: (2026)
Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations
by: Kumar, Manoj, et al.
Published: (2024)
The Philosophy of Desire in the Buddhist Pali Canon
by: Webster, David
Published: (2025)