Search Results - DatologyAI
Showing 1-6 of 6 results
1
Arcee Trinity Large Technical Report by Singh, Varun, Krauss, Lucas, Jaghouar, Sami, Sirovatka, Matej, Goddard, Charles, Obied, Fares, Ong, Jack Min, Straube, Jannik, Fern, Harley, Aria, Stewart, Conner, Kealty, Colin, Panahi, Maziyar, Kirsten, Simon, Deshpande, Anushka, Vij, Anneketh, Bresnu, Arthur, Veldurthi, Pranav, Ravishankar, Raghav, Bishnoi, Hardik, Team, DatologyAI, Team, Arcee AI, Team, Prime Intellect, McQuade, Mark, Hagemann, Johannes, Atkins, Lucas
Published 2026
Preprint
2
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining by DatologyAI, Maini, Pratyush, Dorna, Vineeth, Doshi, Parth, Carranza, Aldo, Pan, Fan, Urbanek, Jack, Burstein, Paul, Fang, Alex, Deng, Alvin, Abbas, Amro, Larsen, Brett, Blakeney, Cody, Bannur, Charvi, Baek, Christina, Teh, Darren, Schwab, David, Mongstad, Haakon, Yin, Haoli, Wills, Josh, Mentzer, Kaleigh, Merrick, Luke, Monti, Ricardo, Adiga, Rishabh, Joshi, Siddharth, Das, Spandan, Wang, Zhengping, Gaza, Bogdan, Morcos, Ari, Leavitt, Matthew
Published 2025
Preprint
3
Luxical: High-Speed Lexical-Dense Text Embeddings by DatologyAI, Merrick, Luke, Fang, Alex, Carranza, Aldo, Deng, Alvin, Abbas, Amro, Larsen, Brett, Blakeney, Cody, Teh, Darren, Schwab, David, Pan, Fan, Mongstad, Haakon, Yin, Haoli, Urbanek, Jack, Lee, Jason, Telanoff, Jason, Wills, Josh, Mentzer, Kaleigh, Burstein, Paul, Doshi, Parth, Burnstein, Paul, Maini, Pratyush, Monti, Ricardo, Adiga, Rishabh, Loftin, Scott, Joshi, Siddharth, Das, Spandan, Jiang, Tony, Dorna, Vineeth, Wang, Zhengping, Gaza, Bogdan, Morcos, Ari, Leavitt, Matthew
Published 2025
Preprint
4
DatBench: Discriminative, Faithful, and Efficient VLM Evaluations by DatologyAI, Joshi, Siddharth, Yin, Haoli, Adiga, Rishabh, Monti, Ricardo, Carranza, Aldo, Fang, Alex, Deng, Alvin, Abbas, Amro, Larsen, Brett, Blakeney, Cody, Teh, Darren, Schwab, David, Pan, Fan, Mongstad, Haakon, Urbanek, Jack, Lee, Jason, Telanoff, Jason, Wills, Josh, Mentzer, Kaleigh, Merrick, Luke, Doshi, Parth, Burstein, Paul, Maini, Pratyush, Loftin, Scott, Das, Spandan, Jiang, Tony, Dorna, Vineeth, Wang, Zhengping, Gaza, Bogdan, Morcos, Ari, Leavitt, Matthew
Published 2026
Preprint
5
ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset by DatologyAI, Carranza, Aldo Gael, Mentzer, Kaleigh, Monti, Ricardo Pio, Fang, Alex, Deng, Alvin, Abbas, Amro, Suri, Anshuman, Larsen, Brett, Blakeney, Cody, Teh, Darren, Schwab, David, Kiner, Diego, Pan, Fan, Mongstad, Haakon, Yin, Haoli, Urbanek, Jack, Lee, Jason, Telanoff, Jason, Wills, Josh, Merrick, Luke, Böther, Maximilian, Doshi, Parth, Burstein, Paul, Maini, Pratyush, Adiga, Rishabh, Joshi, Siddharth, Das, Spandan, Jiang, Tony, Dorna, Vineeth, Wang, Zhengping, Gaza, Bogdan, Morcos, Ari, Leavitt, Matthew
Published 2026
Preprint
6
20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone by DatologyAI, Joshi, Siddharth, Yin, Haoli, Adiga, Rishabh, Mongstad, Haakon, Deng, Alvin, Carranza, Aldo, Fang, Alex, Abbas, Amro, Suri, Anshuman, Larsen, Brett, Zayas, Daniel, Teh, Darren, Schwab, David, Kiner, Diego, Pan, Fan, Urbanek, Jack, Lee, Jason, Telanoff, Jason, Wills, Josh, Mentzer, Kaleigh, Merrick, Luke, Böther, Maximilian, Doshi, Parth, Burstein, Paul, Maini, Pratyush, Robroek, Ties, Jiang, Tony, Jain, Vidhi, Dorna, Vineeth, Wang, Zhengping, Gaza, Bogdan, Morcos, Ari, Leavitt, Matthew
Published 2026
Preprint