Staff View: Automating the Analysis of Public Saliency and Attitudes towards Biodiversity from Digital Media :: Library Catalog

Bibliographic Details
Main Authors:	Giebink, Noah; Gupta, Amrita; Verìssimo, Diogo; Chang, Charlotte H.; Chang, Tony; Brennan, Angela; Dickson, Brett; Bowmer, Alex; Baillie, Jonathan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Information Retrieval
Online Access:	https://arxiv.org/abs/2405.01610

_version_	1866909188495507456
author	Giebink, Noah; Gupta, Amrita; Verìssimo, Diogo; Chang, Charlotte H.; Chang, Tony; Brennan, Angela; Dickson, Brett; Bowmer, Alex; Baillie, Jonathan
author_facet	Giebink, Noah; Gupta, Amrita; Verìssimo, Diogo; Chang, Charlotte H.; Chang, Tony; Brennan, Angela; Dickson, Brett; Bowmer, Alex; Baillie, Jonathan
contents	Measuring public attitudes toward wildlife provides crucial insights into our relationship with nature and helps monitor progress toward Global Biodiversity Framework targets. Yet, conducting such assessments at a global scale is challenging. Manually curating search terms for querying news and social media is tedious, costly, and can lead to biased results. Raw news and social media data returned from queries are often cluttered with irrelevant content and syndicated articles. We aim to overcome these challenges by leveraging modern Natural Language Processing (NLP) tools. We introduce a folk taxonomy approach for improved search term generation and employ cosine similarity on Term Frequency-Inverse Document Frequency vectors to filter syndicated articles. We also introduce an extensible relevance filtering pipeline which uses unsupervised learning to reveal common topics, followed by an open-source zero-shot Large Language Model (LLM) to assign topics to news article titles, which are then used to assign relevance. Finally, we conduct sentiment, topic, and volume analyses on resulting data. We illustrate our methodology with a case study of news and X (formerly Twitter) data before and during the COVID-19 pandemic for various mammal taxa, including bats, pangolins, elephants, and gorillas. During the data collection period, up to 62% of articles including keywords pertaining to bats were deemed irrelevant to biodiversity, underscoring the importance of relevance filtering. At the pandemic's onset, we observed increased volume and a significant sentiment shift toward horseshoe bats, which were implicated in the pandemic, but not for other focal taxa. The proposed methods open the door to conservation practitioners applying modern and emerging NLP tools, including LLMs "out of the box," to analyze public perceptions of biodiversity during current events or campaigns.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_01610
institution	arXiv
publishDate	2024
record_format	arxiv
title	Automating the Analysis of Public Saliency and Attitudes towards Biodiversity from Digital Media
topic	Computation and Language; Information Retrieval
url	https://arxiv.org/abs/2405.01610
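
Note on the syndication filter described in the contents field: the abstract states that cosine similarity on Term Frequency-Inverse Document Frequency vectors is used to filter syndicated articles. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the tokenizer, the smoothed IDF formula, and the 0.9 threshold are all assumptions chosen for illustration; the record does not specify any of them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF (an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def drop_syndicated(docs, threshold=0.9):
    """Return indices of documents kept after removing near-duplicates.

    A document is dropped when its similarity to an already kept document
    reaches the threshold. The threshold value is hypothetical.
    """
    vectors = tfidf_vectors(docs)
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


articles = [
    "Bats linked to new virus outbreak, scientists say",
    "Bats linked to new virus outbreak, scientists say.",
    "Elephant herd migrates across national park",
]
kept_indices = drop_syndicated(articles)
```

The pairwise comparison is quadratic in the number of articles, which is acceptable for a sketch; a large corpus would likely need blocking or an approximate nearest-neighbour index.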
