| Main Authors: | Campolongo, Elizabeth G., et al. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.02112 |
| _version_ | 1866917276052094976 |
|---|---|
| author | Campolongo, Elizabeth G.; Chou, Yuan-Tang; Govorkova, Ekaterina; Bhimji, Wahid; Chao, Wei-Lun; Harris, Chris; Hsu, Shih-Chieh; Lapp, Hilmar; Neubauer, Mark S.; Namayanja, Josephine; Subramanian, Aneesh; Harris, Philip; Anand, Advaith; Carlyn, David E.; Ghosh, Subhankar; Lawrence, Christopher; Moreno, Eric; Raikman, Ryan; Wu, Jiaman; Zhang, Ziheng; Adhi, Bayu; Gharehtoragh, Mohammad Ahmadi; Monsalve, Saúl Alonso; Babicz, Marta; Baig, Furqan; Banerji, Namrata; Bardon, William; Barna, Tyler; Berger-Wolf, Tanya; Dieng, Adji Bousso; Brachman, Micah; Buat, Quentin; Hui, David C. Y.; Cao, Phuong; Cerino, Franco; Chang, Yi-Chun; Chaulagain, Shivaji; Chen, An-Kai; Chen, Deming; Chen, Eric; Chou, Chia-Jui; Ciou, Zih-Chen; Cochran-Branson, Miles; Choi, Artur Cordeiro Oudot; Coughlin, Michael; Cremonesi, Matteo; Dadarlat, Maria; Darch, Peter; Desai, Malina; Diaz, Daniel; Dillmann, Steven; Duarte, Javier; Duporge, Isla; Ekka, Urbas; Heravi, Saba Entezari; Fang, Hao; Flynn, Rian; Fox, Geoffrey; Freed, Emily; Gao, Hang; Gao, Jing; Gonski, Julia; Graham, Matthew; Hashemi, Abolfazl; Hauck, Scott; Hazelden, James; Peterson, Joshua Henry; Hoang, Duc; Hu, Wei; Huennefeld, Mirco; Hyde, David; Janeja, Vandana; Jaroenchai, Nattapon; Jia, Haoyi; Kang, Yunfan; Kholiavchenko, Maksim; Khoda, Elham E.; Kim, Sangin; Kumar, Aditya; Lai, Bo-Cheng; Le, Trung; Lee, Chi-Wei; Lee, JangHyeon; Lee, Shaocheng; van der Lee, Suzan; Lewis, Charles; Li, Haitong; Li, Haoyang; Liao, Henry; Liu, Mia; Liu, Xiaolin; Liu, Xiulong; Loncar, Vladimir; Lyu, Fangzheng; Makarov, Ilya; Mallampalli, Abhishikth; Mao, Chen-Yu; Michels, Alexander; Migala, Alexander; Mokhtar, Farouk; Morlighem, Mathieu; Namgung, Min; Novak, Andrzej; Novick, Andrew; Orsborn, Amy; Padmanabhan, Anand; Pan, Jia-Cheng; Pandya, Sneh; Pei, Zhiyuan; Peixoto, Ana; Percivall, George; Leung, Alex Po; Purushotham, Sanjay; Que, Zhiqiang; Quinnan, Melissa; Ranjan, Arghya; Rankin, Dylan; Reissel, Christina; Riedel, Benedikt; Rubenstein, Dan; Sasli, Argyro; Shlizerman, Eli; Singh, Arushi; Singh, Kim; Sokol, Eric R.; Sorensen, Arturo; Su, Yu; Taheri, Mitra; Thakkar, Vaibhav; Thomas, Ann Mariam; Toberer, Eric; Tsai, Chenghan; Vandewalle, Rebecca; Verma, Arjun; Venterea, Ricco C.; Wang, He; Wang, Jianwu; Wang, Sam; Wang, Shaowen; Watts, Gordon; Weitz, Jason; Wildridge, Andrew; Williams, Rebecca; Wolf, Scott; Xu, Yue; Yan, Jianqi; Yu, Jai; Zhang, Yulei; Zhao, Haoran; Zhao, Ying; Zhong, Yibo |
| contents | Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_02112 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Building Machine Learning Challenges for Anomaly Detection in Science |
| topic | Machine Learning; Instrumentation and Methods for Astrophysics |
| url | https://arxiv.org/abs/2503.02112 |