Library Catalog

Bibliographic Details
Main Authors:	Beyer, Lucas; Steiner, Andreas; Pinto, André Susano; Kolesnikov, Alexander; Wang, Xiao; Salz, Daniel; Neumann, Maxim; Alabdulmohsin, Ibrahim; Tschannen, Michael; Bugliarello, Emanuele; Unterthiner, Thomas; Keysers, Daniel; Koppula, Skanda; Liu, Fangyu; Grycner, Adam; Gritsenko, Alexey; Houlsby, Neil; Kumar, Manoj; Rong, Keran; Eisenschlos, Julian; Kabra, Rishabh; Bauer, Matthias; Bošnjak, Matko; Chen, Xi; Minderer, Matthias; Voigtlaender, Paul; Bica, Ioana; Balazevic, Ivana; Puigcerver, Joan; Papalampidi, Pinelopi; Henaff, Olivier; Xiong, Xi; Soricut, Radu; Harmsen, Jeremiah; Zhai, Xiaohua
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2407.07726

Similar Items

PaliGemma 2: A Family of Versatile VLMs for Transfer
by: Steiner, Andreas, et al.
Published: (2024)

Memory Consolidation Enables Long-Context Video Understanding
by: Balažević, Ivana, et al.
Published: (2024)

Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
by: Papalampidi, Pinelopi, et al.
Published: (2023)

Dynamic Classifier-Free Diffusion Guidance via Online Feedback
by: Papalampidi, Pinelopi, et al.
Published: (2025)

Finding the Right Moment: Human-Assisted Trailer Creation via Task Composition
by: Papalampidi, Pinelopi, et al.
Published: (2021)

Improving fine-grained understanding in image-text pre-training
by: Bica, Ioana, et al.
Published: (2024)

From Sparse to Soft Mixtures of Experts
by: Puigcerver, Joan, et al.
Published: (2023)

JetFormer: An Autoregressive Generative Model of Raw Images and Text
by: Tschannen, Michael, et al.
Published: (2024)

Jet: A Modern Transformer-Based Normalizing Flow
by: Kolesnikov, Alexander, et al.
Published: (2024)

PaliGemma-CXR: A Multi-task Multimodal Model for TB Chest X-ray Interpretation
by: Musinguzi, Denis, et al.
Published: (2025)

LocCa: Visual Pretraining with Location-aware Captioners
by: Wan, Bo, et al.
Published: (2024)

Scaling Pre-training to One Hundred Billion Data for Vision Language Models
by: Wang, Xiao, et al.
Published: (2025)

Hibridación entre la codorniz común (Coturnix coturnix) y la codorniz de granja: estado de un problema de conservación
by: M. Puigcerver
Published: (2013)

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
by: Pouget, Angéline, et al.
Published: (2024)

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
by: Wiles, Olivia, et al.
Published: (2024)

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
by: Tschannen, Michael, et al.
Published: (2025)

Escuela de Administración, Pontificia Universidad Católica de Chile, 1994-2000
by: Matko Koljatic
Published: (2001)

Experimental Study of Cold‐formed Steel Frames with Semi‐rigid Floor‐to‐wall Joints
by: Xi Guo, et al.
Published: (2025)

Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
by: Zhang, Xinliang Frederick, et al.
Published: (2025)

TAPVid-3D: A Benchmark for Tracking Any Point in 3D
by: Koppula, Skanda, et al.
Published: (2024)

Campo Experimental Forestal "San Juan Tetla", Pue
by: Susano Hernández, Roberto
Published: (1963)

Principales plantas potencialmente toxicas para la ganadería de la zona forestal de Zoquiapan, Edo. de México
by: Susano Hernández, Roberto
Published: (1970)

Especies arbóreas forestales susceptibles de aprovecharse como forraje
by: Susano Hernández, Roberto
Published: (1981)

A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
by: Alabdulmohsin, Ibrahim, et al.
Published: (2025)

Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems
by: Alabdulmohsin, Ibrahim, et al.
Published: (2025)

Failed schemes of relatedness in domestic work: Filipina domestic workers in Greece
by: Pinelopi Topali
Published: (2024)

Wavelet-Based Image Tokenizer for Vision Transformers
by: Zhu, Zhenhai, et al.
Published: (2024)

A Mixed Diet Makes DINO An Omnivorous Vision Encoder
by: Kabra, Rishabh, et al.
Published: (2026)

Modular differential equations of minimal orders of the elliptic genus of Calabi--Yau varieties
by: Adler, Dmitrii, et al.
Published: (2025)

Semilinear wave equations with time-dependent coefficients
by: Antonić, Nenad, et al.
Published: (2026)

Large pelagic species permit holders in the Caribbean Sea and Gulf of Mexico: statistics, characteristics, and demographic trends
by: Salz, R.J., et al.
Published: (2007)

Smoothness spaces for warped time-frequency representations -- Decomposition spaces and embedding relations
by: Holighaus, Nicki, et al.
Published: (2024)

Coorbit theory of warped time-frequency systems in $\mathbb{R}^d$
by: Holighaus, Nicki, et al.
Published: (2022)

A case of Conradi‐Hünermann‐Happle syndrome treated with topical simvastatin‐cholesterol ointment
by: Jean Zevallos, et al.
Published: (2024)

Application Of Large Language Models For The Extraction Of Information From Particle Accelerator Technical Documentation
by: Dai, Qing, et al.
Published: (2025)

Rare Angiosarcoma in the Postauricular Region—A Diagnostic Challenge and Treatment
by: Katarzyna Żyżynska, et al.
Published: (2026)

Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations
by: Kumar, Manoj, et al.
Published: (2024)

The Philosophy of Desire in the Buddhist Pali Canon
by: Webster, David
Published: (2025)