MOOD publications

The MOOD project is at the forefront of European research on infectious disease surveillance and modelling from a data science perspective, investigating the impact of global warming on disease outbreaks and proposing innovations for building One Health systems across Europe and the world.

The table below lists all publications to which the MOOD project contributed. Use the filter to select the most relevant articles.


19.

Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne

GeospatRE: Extraction and Geocoding of spatial relation entities in textual documents Journal Article

In: Cartography and Geographic Information Science, 2023.

Tags: OpenDataSet, Text mining

18.

Valentin, Sarah; Boudoua, Bahdja; Sewalk, Kara; Arınık, Nejat; Roche, Mathieu; Lancelot, Renaud; Arsevska, Elena

Dissemination of information in event-based surveillance, a case study of Avian Influenza Journal Article

In: PLoS ONE, 2023.

Tags: HPAI (Avian Influenza), OpenDataSet, Text mining

@article{nokey,

title = {Dissemination of information in event-based surveillance, a case study of Avian Influenza},

author = {Sarah Valentin and Bahdja Boudoua and Kara Sewalk and Nejat Arınık and Mathieu Roche and Renaud Lancelot and Elena Arsevska },

url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0285341},

doi = {10.1371/journal.pone.0285341},

year  = {2023},

date = {2023-09-05},

urldate = {2023-09-05},

journal = {PLoS ONE},

abstract = {Event-Based Surveillance (EBS) tools, such as HealthMap and PADI-web, monitor online news reports and other unofficial sources, with the primary aim to provide timely information to users from health agencies on disease outbreaks occurring worldwide. In this work, we describe how outbreak-related information disseminates from a primary source, via a secondary source, to a definitive aggregator, an EBS tool, during the 2018/19 avian influenza season. We analysed 337 news items from the PADI-web and 115 news articles from HealthMap EBS tools reporting avian influenza outbreaks in birds worldwide between July 2018 and June 2019. We used the sources cited in the news to trace the path of each outbreak. We built a directed network with nodes representing the sources (characterised by type, specialisation, and geographical focus) and edges representing the flow of information. We calculated the degree as a centrality measure to determine the importance of the nodes in information dissemination. We analysed the role of the sources in early detection (detection of an event before its official notification to the World Organisation for Animal Health (WOAH)) and late detection. A total of 23% and 43% of the avian influenza outbreaks detected by the PADI-web and HealthMap, respectively, were shared on time before their notification. For both tools, national and local veterinary authorities were the primary sources of early detection. The early detection component mainly relied on the dissemination of nationally acknowledged events by online news and press agencies, bypassing international reporting to the WOAH. WOAH was the major secondary source for late detection, occupying a central position between national authorities and disseminator sources, such as online news. PADI-web and HealthMap were highly complementary in terms of detected sources, explaining why 90% of the events were detected by only one of the tools. We show that current EBS tools can provide timely outbreak-related information and priority news sources to improve digital disease surveillance.},

keywords = {HPAI (Avian Influenza), OpenDataSet, Text mining},

pubstate = {published},

tppubtype = {article}

}
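The network analysis summarised in the abstract above (sources as nodes, information flow as directed edges, degree as the centrality measure) can be illustrated with a minimal sketch. The source names and edges below are hypothetical toy data, not the study's dataset, and `degree_centrality` is an illustrative helper name rather than the authors' code.

```python
from collections import Counter


def degree_centrality(edges):
    """Return in-, out- and total degree for each node of a directed edge list.

    Each edge is a (cited_source, citing_source) pair, so information
    flows from the first element to the second.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)
    return {
        n: {"in": in_deg[n], "out": out_deg[n], "total": in_deg[n] + out_deg[n]}
        for n in nodes
    }


# Hypothetical toy network (assumption, for illustration only).
edges = [
    ("national_vet_authority", "WOAH"),
    ("national_vet_authority", "press_agency"),
    ("WOAH", "online_news"),
    ("press_agency", "online_news"),
    ("online_news", "EBS_tool"),
]

degrees = degree_centrality(edges)
for node, d in sorted(degrees.items(), key=lambda kv: -kv[1]["total"]):
    print(node, d)
```

In this toy example a high out-degree marks a node that many others cite (a primary source), while a high in-degree marks a node that relays information from several sources.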


17.

Decoupes, Rémy; Roche, Mathieu; Teisseire, Maguelonne

GeoNLPlify: A spatial data augmentation enhancing text classification for crisis monitoring Journal Article

In: Intelligent Data Analysis, pp. 1-25, 2023.

Tags: OpenDataSet, Text mining

@article{nokey,

title = {GeoNLPlify: A spatial data augmentation enhancing text classification for crisis monitoring},

author = {Rémy Decoupes and Mathieu Roche and Maguelonne Teisseire},

url = {https://content.iospress.com/articles/intelligent-data-analysis/ida230040},

doi = {10.3233/IDA-230040},

year  = {2023},

date = {2023-07-06},

urldate = {2023-07-06},

journal = {Intelligent Data Analysis},

pages = {1-25},

abstract = {Crises such as natural disasters and public health emergencies generate vast amounts of text data, making it challenging to classify the information into relevant categories. Acquiring expert-labeled data for such scenarios can be difficult, leading to limited training datasets for text classification by fine-tuning BERT-like models. Unfortunately, traditional data augmentation techniques only slightly improve F1-scores. How can data augmentation be used to obtain better results in this applied domain? In this paper, using neural network explicability methods, we aim to highlight that fine-tuned BERT-like models on crisis corpora give too much importance to spatial information to make their predictions. This overfitting of spatial information limits their ability to generalize especially when the event which occurs in a place has evolved and changed since the training dataset has been built. To reduce this bias, we propose GeoNLPlify, a novel data augmentation technique that leverages spatial information to generate new labeled data for text classification related to crises. Our approach aims to address overfitting without necessitating modifications to the underlying model architecture, distinguishing it from other prevalent methods employed to combat overfitting. Our results show that GeoNLPlify significantly improves F1-scores, demonstrating the potential of the spatial information for data augmentation for crisis-related text classification tasks. In order to evaluate the contribution of our method, GeoNLPlify is applied to three public datasets (PADI-web, CrisisNLP and SST2) and compared with classical natural language processing data augmentations.},

keywords = {OpenDataSet, Text mining},

pubstate = {published},

tppubtype = {article}

}

16.

Arınık, Nejat; Bortel, Wim Van; Boudoua, Bahdja; Busani, Luca; Decoupes, Rémy; Interdonato, Roberto; Kafando, Rodrique; van Kleef, Esther; Roche, Mathieu; Syed, Mehtab Alam; Teisseire, Maguelonne

An annotated dataset for event-based surveillance of antimicrobial resistance Journal Article

In: ScienceDirect, 2023.

Tags: AMR (Antimicrobial Resistance), OpenDataSet, Text mining

15.

Valentin, Sarah; Arsevska, Elena; Mercier, Alizé; Falala, Sylvain; Rabatel, Julien; Lancelot, Renaud; Roche, Mathieu

PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News Conference

Human Language Technology. Challenges for Computer Science and Linguistics, vol. 12598, Springer International Publishing, 2022, ISBN: 978-3-030-66526-5.

Tags: ASF (African Swine Fever), HPAI (Avian Influenza), Text mining

14.

Valentin, Sarah; Arsevska, Elena; et al.

Elaboration of a new framework for fine-grained epidemiological annotation Journal Article

In: 2022.

Tags: OpenDataSet, Text mining

13.

Schaeffer, Camille; Interdonato, Roberto; Lancelot, Renaud; Roche, Mathieu; Teisseire, Maguelonne

Labeled entities from social media data related to avian influenza disease Journal Article Forthcoming

In: Data in Brief, vol. 43, pp. 108317, Forthcoming, ISSN: 2352-3409.

Tags: HPAI (Avian Influenza), OpenDataSet, Text mining

12.

Roche, Mathieu; Arsevska, Elena; Valentin, Sarah; Falala, Sylvain; Rabatel, Julien; Lancelot, Renaud

How Textual Datasets Enhance the PADI-Web Tool? Journal Article

In: SciTePress, 2022.

Tags: Text mining

11.

Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne

A Data-Driven Score Model to Assess Online News Articles in Event-Based Surveillance System Conference

Information Management and Big Data, vol. 1577, Springer International Publishing, 2022.

Tags: Text mining

10.

Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu

Fusion of spatiotemporal and thematic features of textual data for animal disease surveillance Journal Article

In: Information Processing in Agriculture, 2022, ISSN: 2214-3173.

Tags: Text mining

@article{VALENTIN2022,

title = {Fusion of spatiotemporal and thematic features of textual data for animal disease surveillance},

author = {Sarah Valentin and Renaud Lancelot and Mathieu Roche},

url = {https://www.sciencedirect.com/science/article/pii/S2214317322000312},

doi = {10.1016/j.inpa.2022.03.004},

issn = {2214-3173},

year  = {2022},

date = {2022-03-28},

journal = {Information Processing in Agriculture},

abstract = {Several internet-based surveillance systems have been created to monitor the web for animal health surveillance. These systems collect a large amount of news dealing with outbreaks related to animal diseases. Automatically identifying news articles that describe the same outbreak event is a key step to quickly detect relevant epidemiological information while alleviating manual curation of news content. This paper addresses the task of retrieving news articles that are related in epidemiological terms. We tackle this issue using text mining and feature fusion methods. The main objective of this paper is to identify a textual representation in which two articles that share the same epidemiological content are close. We compared two types of representations (i.e., features) to represent the documents: (i) morphosyntactic features (i.e., selection and transformation of all terms from the news, based on classical textual processing steps) and (ii) lexicosemantic features (i.e., selection, transformation and fusion of epidemiological terms including diseases, hosts, locations and dates). We compared two types of term weighting (i.e., Boolean and TF-IDF) for both representations. To combine and transform lexicosemantic features, we compared two data fusion techniques (i.e., early fusion and late fusion) and the effect of features generalisation, while evaluating the relative importance of each type of feature. We conducted our analysis using a corpus composed of a subset of news articles in English related to animal disease outbreaks. Our results showed that the combination of relevant lexicosemantic (epidemiological) features using fusion methods improves classical morphosyntactic representation in the context of disease-related news retrieval. The lexicosemantic representation based on TF-IDF and feature generalisation (F-measure = 0.92, r-precision = 0.58) outperformed the morphosyntactic representation (F-measure = 0.89, r-precision = 0.45), while reducing the features space. Converting the features into lower granular features (i.e., generalisation) contributed to improving the results of the lexicosemantic representation. Our results showed no difference between the early and late fusion approaches. Temporal features performed poorly on their own. Conversely, spatial features were the most discriminative features, highlighting the need for robust methods for spatial entity extraction, disambiguation and representation in internet-based surveillance systems.},

keywords = {Text mining},

pubstate = {published},

tppubtype = {article}

}
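The abstract above contrasts Boolean and TF-IDF term weighting. As a minimal sketch of the difference, the snippet below builds both representations for three hypothetical tokenised news items. It assumes one common TF-IDF variant (raw term count multiplied by log(N / document frequency)); the paper may use a different formulation, and the function names are illustrative only.

```python
import math
from collections import Counter


def boolean_weights(docs):
    """Boolean weighting: 1 if the term occurs in the document, else 0."""
    vocab = sorted({t for d in docs for t in d})
    return vocab, [[1 if t in set(d) else 0 for t in vocab] for d in docs]


def tfidf_weights(docs):
    """TF-IDF weighting, assuming tf = raw count and idf = log(N / df)."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


# Hypothetical tokenised news items (assumption, for illustration only).
docs = [
    ["avian", "influenza", "france"],
    ["avian", "influenza", "poland"],
    ["swine", "fever", "poland"],
]

vocab, bool_rows = boolean_weights(docs)
_, tfidf_rows = tfidf_weights(docs)
print(vocab)
print(bool_rows[0])
print([round(w, 3) for w in tfidf_rows[0]])
```

Under Boolean weighting every term present in a document counts equally, whereas TF-IDF gives more weight to terms that appear in few documents (here, "france" outweighs "avian" in the first item).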


9.

Roche, Mathieu; Teisseire, Maguelonne

Integrating Textual Data into Heterogeneous Data Ingestion Processing Conference

2021 IEEE International Conference on Big Data (Big Data), IEEE, Orlando, FL, USA, 2022, ISBN: 978-1-6654-3902-2.

Tags: Text mining

8.

Syed, Mehtab; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne

Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT Proceedings Article

In: SciTePress, (Ed.): pp. 648-656, 2022, ISBN: 978-989-758-552-4.

Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining

7.

Syed, Mehtab Alam; Decoupes, Rémy; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne

Spatial opinion mining from COVID-19 twitter data Journal Article

In: International Journal of Infectious Diseases, vol. 116, iss. 549, pp. 527, 2021.

Tags: Covid-19 (Coronavirus), Text mining

6.

Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu

Identifying associations between epidemiological entities in news data for animal disease surveillance Journal Article

In: Artificial Intelligence in Agriculture, vol. 5, pp. 163-174, 2021, ISSN: 2589-7217.

Tags: Text mining

@article{VALENTIN2021163,

title = {Identifying associations between epidemiological entities in news data for animal disease surveillance},

author = {Sarah Valentin and Renaud Lancelot and Mathieu Roche},

url = {https://www.sciencedirect.com/science/article/pii/S2589721721000246},

doi = {10.1016/j.aiia.2021.07.003},

issn = {2589-7217},

year  = {2021},

date = {2021-01-01},

journal = {Artificial Intelligence in Agriculture},

volume = {5},

pages = {163-174},

abstract = {Event-based surveillance systems are at the crossroads of human and animal (and plant and ecosystem) health, epidemiology, statistics, and informatics. Thus, their deployment faces many challenges specific to each domain and their intersections, such as relations among automation, artificial intelligence, and expertise. In this context, our work pertains to the extraction of epidemiological events in textual data (i.e. news) by unsupervised methods. We define the event extraction task as detecting pairs of epidemiological entities (e.g. a disease name and location). The quality of the ranked lists of pairs was evaluated using specific ranking evaluation metrics. We used a publicly available annotated corpus of 438 documents (i.e. news articles) related to animal disease events. The statistical approach was able to detect event-related pairs of epidemiological features with a good trade-off between precision and recall. Our results showed that using a window of words outperformed document-based and sentence-based approaches, while reducing the probability of detecting false pairs. Our results indicated that Mutual Information was less adapted than the Dice coefficient for ranking pairs of features in the event extraction framework. We believe that Mutual Information would be more relevant for rare pair detection (i.e. weak signals), but requires higher manual curation to avoid false positive extraction pairs. Moreover, generalising the country-level spatial features enabled better discrimination (i.e. ranking) of relevant disease-location pairs for event extraction.},

keywords = {Text mining},

pubstate = {published},

tppubtype = {article}

}
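The abstract above compares the Dice coefficient and Mutual Information for ranking pairs of epidemiological entities. The sketch below shows why the two measures can rank the same pairs differently. It assumes the pointwise form of mutual information computed from co-occurrence counts; the counts are hypothetical and the functions are illustrative, not the authors' implementation.

```python
import math


def dice(n_xy, n_x, n_y):
    """Dice coefficient from co-occurrence count n_xy and marginal counts."""
    return 2 * n_xy / (n_x + n_y)


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (log base 2) from counts.

    Equivalent to log2(p(x, y) / (p(x) * p(y))) with probabilities
    estimated as counts divided by n_total.
    """
    return math.log2(n_xy * n_total / (n_x * n_y))


# Hypothetical counts over 1000 text windows (assumption, for illustration).
n_total = 1000
frequent_pair = dict(n_xy=40, n_x=50, n_y=60)  # disease and location seen often together
rare_pair = dict(n_xy=1, n_x=1, n_y=20)        # disease mentioned once, with this location

for name, c in [("frequent", frequent_pair), ("rare", rare_pair)]:
    print(name, round(dice(**c), 3), round(pmi(**c, n_total=n_total), 3))
```

With these toy counts the Dice coefficient ranks the frequent pair first, while pointwise mutual information ranks the rare pair first, which is consistent with the abstract's remark that Mutual Information is better suited to weak signals but needs more manual curation.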

5.

Li, Sabrina L; Messina, Jane P; Pybus, Oliver G; Kraemer, Moritz U G; Gardner, Lauren

A review of models applied to the geographic spread of Zika virus Journal Article

In: Transactions of The Royal Society of Tropical Medicine and Hygiene, vol. 115, no. 9, pp. 956-964, 2021, ISSN: 0035-9203.

Tags: Text mining, Zika

4.

Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu

Automated Processing of Multilingual Online News for the Monitoring of Animal Infectious Diseases Proceedings Article

In: Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020), pp. 33–36, European Language Resources Association, Marseille, France, 2020, ISBN: 979-10-95546-65-8.

Tags: Text mining

3.

Roche, Mathieu

COVID-19 and Media datasets: Period-and location-specific textual data mining Journal Article

In: Data in Brief, vol. 33, pp. 106356, 2020.

Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining

2.

Valentin, Sarah; Arsevska, Elena; Rabatel, Julien; Falala, Sylvain; Mercier, Alizé; Lancelot, Renaud; Roche, Mathieu

PADI-web 3.0: A new framework for extracting and disseminating fine-grained information from the news for animal disease surveillance Journal Article

In: One Health, vol. 13, pp. 100357, 0000, ISSN: 2352-7714.

Tags: OpenDataSet, Text mining

1.

Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne

Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT Conference

Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF, INSTICC SciTePress, 0000, ISBN: 978-989-758-552-4.

Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining

Top-cited MOOD publications

Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK

du Plessis et al., 2020

Evolution and epidemic spread of SARS-CoV-2 in Brazil

Candido et al., 2020

Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study

Gilbert et al., 2020

Spatiotemporal invasion dynamics of SARS-CoV-2 lineage B.1.1.7 emergence

Kraemer et al., 2020

Crowding and the shape of COVID-19 epidemics

Rader et al., 2020

Potential short-term outcome of an uncontrolled COVID-19 epidemic in Lombardy, Italy, February to March 2020

Guzzetta et al., 2020

Novel coronavirus (2019-nCoV) early-stage importation risk to Europe, January 2020

Pullano et al., 2020
