The MOOD project is at the forefront of European research on infectious disease surveillance and modelling from a data science perspective, investigating the impact of global warming on disease outbreaks and proposing innovations for building One Health systems across Europe and the world.
The table below lists all publications to which the MOOD project contributed. Use the filter to select the most relevant articles.
Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne
GeospartRE: Extraction and Geocoding of spatial relation entities in textual documents Journal Article
In: Cartography and Geographic Information Science, 2023.
Links | BibTeX | Tags: OpenDataSet, Text mining
@article{nokey,
title = {GeospartRE: Extraction and Geocoding of spatial relation entities in textual documents},
author = {Mehtab Alam Syed and Elena Arsevska and Mathieu Roche and Maguelonne Teisseire},
doi = {10.1080/15230406.2023.2264753},
year = {2023},
date = {2023-11-30},
urldate = {2023-11-30},
journal = {Cartography and Geographic Information Science},
keywords = {OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Valentin, Sarah; Boudoua, Bahdja; Sewalk, Kara; Arınık, Nejat; Roche, Mathieu; Lancelot, Renaud; Arsevska, Elena
Dissemination of information in event-based surveillance, a case study of Avian Influenza Journal Article
In: PLoS ONE, 2023.
Abstract | Links | BibTeX | Tags: HPAI (Avian Influenza), OpenDataSet, Text mining
@article{nokey,
title = {Dissemination of information in event-based surveillance, a case study of Avian Influenza},
author = {Sarah Valentin and Bahdja Boudoua and Kara Sewalk and Nejat Arınık and Mathieu Roche and Renaud Lancelot and Elena Arsevska},
url = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0285341},
doi = {10.1371/journal.pone.0285341},
year = {2023},
date = {2023-09-05},
urldate = {2023-09-05},
journal = {PLoS ONE},
abstract = {Event-Based Surveillance (EBS) tools, such as HealthMap and PADI-web, monitor online news reports and other unofficial sources, with the primary aim to provide timely information to users from health agencies on disease outbreaks occurring worldwide. In this work, we describe how outbreak-related information disseminates from a primary source, via a secondary source, to a definitive aggregator, an EBS tool, during the 2018/19 avian influenza season. We analysed 337 news items from the PADI-web and 115 news articles from HealthMap EBS tools reporting avian influenza outbreaks in birds worldwide between July 2018 and June 2019. We used the sources cited in the news to trace the path of each outbreak. We built a directed network with nodes representing the sources (characterised by type, specialisation, and geographical focus) and edges representing the flow of information. We calculated the degree as a centrality measure to determine the importance of the nodes in information dissemination. We analysed the role of the sources in early detection (detection of an event before its official notification to the World Organisation for Animal Health (WOAH)) and late detection. A total of 23% and 43% of the avian influenza outbreaks detected by the PADI-web and HealthMap, respectively, were shared on time before their notification. For both tools, national and local veterinary authorities were the primary sources of early detection. The early detection component mainly relied on the dissemination of nationally acknowledged events by online news and press agencies, bypassing international reporting to the WOAH. WOAH was the major secondary source for late detection, occupying a central position between national authorities and disseminator sources, such as online news. PADI-web and HealthMap were highly complementary in terms of detected sources, explaining why 90% of the events were detected by only one of the tools.
We show that current EBS tools can provide timely outbreak-related information and priority news sources to improve digital disease surveillance.},
keywords = {HPAI (Avian Influenza), OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Decoupes, Rémy; Roche, Mathieu; Teisseire, Maguelonne
GeoNLPlify: A spatial data augmentation enhancing text classification for crisis monitoring Journal Article
In: Intelligent Data Analysis, pp. 1-25, 2023.
Abstract | Links | BibTeX | Tags: OpenDataSet, Text mining
@article{nokey,
title = {GeoNLPlify: A spatial data augmentation enhancing text classification for crisis monitoring},
author = {Rémy Decoupes and Mathieu Roche and Maguelonne Teisseire},
url = {https://content.iospress.com/articles/intelligent-data-analysis/ida230040},
doi = {10.3233/IDA-230040},
year = {2023},
date = {2023-07-06},
urldate = {2023-07-06},
journal = {Intelligent Data Analysis},
pages = {1-25},
abstract = {Crises such as natural disasters and public health emergencies generate vast amounts of text data, making it challenging to classify the information into relevant categories. Acquiring expert-labeled data for such scenarios can be difficult, leading to limited training datasets for text classification by fine-tuning BERT-like models. Unfortunately, traditional data augmentation techniques only slightly improve F1-scores. How can data augmentation be used to obtain better results in this applied domain? In this paper, using neural network explicability methods, we aim to highlight that fine-tuned BERT-like models on crisis corpora give too much importance to spatial information to make their predictions. This overfitting of spatial information limits their ability to generalize especially when the event which occurs in a place has evolved and changed since the training dataset has been built. To reduce this bias, we propose GeoNLPlify, a novel data augmentation technique that leverages spatial information to generate new labeled data for text classification related to crises. Our approach aims to address overfitting without necessitating modifications to the underlying model architecture, distinguishing it from other prevalent methods employed to combat overfitting. Our results show that GeoNLPlify significantly improves F1-scores, demonstrating the potential of the spatial information for data augmentation for crisis-related text classification tasks. In order to evaluate the contribution of our method, GeoNLPlify is applied to three public datasets (PADI-web, CrisisNLP and SST2) and compared with classical natural language processing data augmentations.},
keywords = {OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Arınık, Nejat; Bortel, Wim Van; Boudoua, Bahdja; Busani, Luca; Decoupes, Rémy; Interdonato, Roberto; Kafando, Rodrique; van Kleef, Esther; Roche, Mathieu; Syed, Mehtab Alam; Teisseire, Maguelonne
An annotated dataset for event-based surveillance of antimicrobial resistance Journal Article
In: Data in Brief, 2023.
Abstract | Links | BibTeX | Tags: AMR (Antimicrobial Resistance), OpenDataSet, Text mining
@article{nokey,
title = {An annotated dataset for event-based surveillance of antimicrobial resistance},
author = {Nejat Arınık and Wim Van Bortel and Bahdja Boudoua and Luca Busani and Rémy Decoupes and Roberto Interdonato and Rodrique Kafando and Esther van Kleef and Mathieu Roche and Mehtab Alam Syed and Maguelonne Teisseire},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922010733?via%3Dihub},
doi = {10.1016/j.dib.2022.108870},
year = {2023},
date = {2023-02-08},
urldate = {2023-02-08},
journal = {Data in Brief},
abstract = {This paper presents an annotated dataset used in the MOOD Antimicrobial Resistance (AMR) hackathon, hosted in Montpellier, June 2022. The collected data concerns unstructured data from news items, scientific publications and national or international reports, collected from four event-based surveillance (EBS) Systems, i.e. ProMED, PADI-web, HealthMap and MedISys. Data was annotated by relevance for epidemic intelligence (EI) purposes with the help of AMR experts and an annotation guideline. Extracted data were intended to include relevant events on the emergence and spread of AMR such as reports on AMR trends, discovery of new drug-bug resistances, or new AMR genes in human, animal or environmental reservoirs. This dataset can be used to train or evaluate classification approaches to automatically identify written text on AMR events across the different reservoirs and sectors of One Health (i.e. human, animal, food, environmental sources, such as soil and waste water) in unstructured data (e.g. news, tweets) and classify these events by relevance for EI purposes.
},
keywords = {AMR (Antimicrobial Resistance), OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Valentin, Sarah; Arsevska, Elena; Mercier, Alizé; Falala, Sylvain; Rabatel, Julien; Lancelot, Renaud; Roche, Mathieu
PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News Conference
Human Language Technology. Challenges for Computer Science and Linguistics, vol. 12598, Springer International Publishing, 2022, ISBN: 978-3-030-66526-5.
Abstract | Links | BibTeX | Tags: ASF (African Swine Fever), HPAI (Avian Influenza), Text mining
@conference{10.1007/978-3-030-66527-2_7,
title = {PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News},
author = {Sarah Valentin and Elena Arsevska and Alizé Mercier and Sylvain Falala and Julien Rabatel and Renaud Lancelot and Mathieu Roche},
editor = {Vetulani, Zygmunt and Paroubek, Patrick and Kubis, Marek},
url = {https://link.springer.com/chapter/10.1007/978-3-030-66527-2_7},
doi = {10.1007/978-3-030-66527-2_7},
isbn = {978-3-030-66526-5},
year = {2022},
date = {2022-12-31},
urldate = {2022-12-31},
booktitle = {Human Language Technology. Challenges for Computer Science and Linguistics},
volume = {12598},
pages = {87-101},
publisher = {Springer International Publishing},
abstract = {The Platform for Automated Extraction of Animal Disease Information from the Web (PADI-web) is a multilingual text mining tool for automatic detection, classification, and extraction of disease outbreak information from online news articles. PADI-web currently monitors the Web for nine animal infectious diseases and eight syndromes in five animal hosts. The classification module is based on a supervised machine learning approach to filter the relevant news with an overall accuracy of 0.94. The classification of relevant news between 5 topic categories (confirmed, suspected or unknown outbreak, preparedness and impact) obtained an overall accuracy of 0.75. In the first six months of its implementation (January--June 2016), PADI-web detected 73% of the outbreaks of African swine fever; 20% of foot-and-mouth disease; 13% of bluetongue, and 62% of highly pathogenic avian influenza. The information extraction module of PADI-web obtained F-scores of 0.80 for locations, 0.85 for dates, 0.95 for diseases, 0.95 for hosts, and 0.85 for case numbers.},
keywords = {ASF (African Swine Fever), HPAI (Avian Influenza), Text mining},
pubstate = {published},
tppubtype = {conference}
}
Valentin, Sarah; Arsevska, Elena; et al.
Elaboration of a new framework for fine-grained epidemiological annotation Journal Article
In: Scientific Data, 2022.
Abstract | Links | BibTeX | Tags: OpenDataSet, Text mining
@article{nokey,
title = {Elaboration of a new framework for fine-grained epidemiological annotation},
author = {Sarah Valentin and Elena Arsevska and others},
url = {https://www.nature.com/articles/s41597-022-01743-2},
doi = {10.1038/s41597-022-01743-2},
year = {2022},
date = {2022-10-26},
urldate = {2022-10-26},
journal = {Scientific Data},
abstract = {Event-based surveillance (EBS) gathers information from a variety of data sources, including online news articles. Unlike the data from formal reporting, the EBS data are not structured, and their interpretation can overwhelm epidemic intelligence (EI) capacities in terms of available human resources. Therefore, diverse EBS systems that automatically process (all or part of) the acquired nonstructured data from online news articles have been developed. These EBS systems (e.g., GPHIN, HealthMap, MedISys, ProMED, PADI-web) can use annotated data to improve the surveillance systems. This paper describes a framework for the annotation of epidemiological information in animal disease-related news articles. We provide annotation guidelines that are generic and applicable to both animal and zoonotic infectious diseases, regardless of the pathogen involved or its mode of transmission (e.g., vector-borne, airborne, by contact). The framework relies on the successive annotation of all the sentences from a news article. The annotator evaluates the sentences in a specific epidemiological context, corresponding to the publication date of the news article.
},
keywords = {OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Schaeffer, Camille; Interdonato, Roberto; Lancelot, Renaud; Roche, Mathieu; Teisseire, Maguelonne
Labeled entities from social media data related to avian influenza disease Journal Article Forthcoming
In: Data in Brief, vol. 43, pp. 108317, Forthcoming, ISSN: 2352-3409.
Abstract | Links | BibTeX | Tags: HPAI (Avian Influenza), OpenDataSet, Text mining
@article{SCHAEFFER2022108317,
title = {Labeled entities from social media data related to avian influenza disease},
author = {Camille Schaeffer and Roberto Interdonato and Renaud Lancelot and Mathieu Roche and Maguelonne Teisseire},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922005194},
doi = {10.1016/j.dib.2022.108317},
issn = {2352-3409},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-01},
journal = {Data in Brief},
volume = {43},
pages = {108317},
abstract = {This dataset is composed by spatial (e.g. location) and thematic (e.g. diseases, symptoms, virus) entities concerning avian influenza in social media (textual) data in English. It was created from three corpora: the first one includes 10 transcriptions of YouTube videos and 70 tweets manually annotated. The second corpus is composed by the same textual data but automatically annotated with Named Entity Recognition (NER) tools. These two corpora have been built to evaluate NER tools and apply them to a bigger corpus. The third corpus is composed of 100 YouTube transcriptions automatically annotated with NER tools. The aim of the annotation task is to recognize spatial information such as the names of the cities and epidemiological information such as the names of the diseases. An annotation guideline is provided in order to ensure a unified annotation and to help the annotators. This dataset can be used to train or evaluate Natural Language Processing (NLP) approaches such as specialized entity recognition.},
keywords = {HPAI (Avian Influenza), OpenDataSet, Text mining},
pubstate = {forthcoming},
tppubtype = {article}
}
Roche, Mathieu; Arsevska, Elena; Valentin, Sarah; Falala, Sylvain; Rabatel, Julien; Lancelot, Renaud
How Textual Datasets Enhance the PADI-Web Tool? Journal Article
In: SciTePress, 2022.
Abstract | Links | BibTeX | Tags: Text mining
@article{nokey,
title = {How Textual Datasets Enhance the PADI-Web Tool?},
author = {Mathieu Roche and Elena Arsevska and Sarah Valentin and Sylvain Falala and Julien Rabatel and Renaud Lancelot},
url = {https://www.scitepress.org/Link.aspx?doi=10.5220/0011590400003318},
doi = {10.5220/0011590400003318},
year = {2022},
date = {2022-07-27},
urldate = {2022-07-27},
journal = {SciTePress},
abstract = {The ability to rapidly detect outbreaks of emerging infectious diseases is a health priority of global health agencies. In this context, event-based surveillance (EBS) systems gather outbreak-related information from heterogeneous data sources, including online news articles. EBS systems, thus, increasingly marshal text-mining methods to alleviate the amount of manual curation of the freely available text. This paper documents the use of datasets obtained through an EBS system, PADI-Web (Platform for Automated extraction of Disease Information from the web), dedicated to digital outbreak detection in animal health. This paper describes the datasets used for improving 3 important tasks related to PADI-Web, i.e., news classification, information extraction and dissemination.},
keywords = {Text mining},
pubstate = {published},
tppubtype = {article}
}
Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne
A Data-Driven Score Model to Assess Online News Articles in Event-Based Surveillance System Conference
Information Management and Big Data, vol. 1577, Springer International Publishing, 2022.
Abstract | BibTeX | Tags: Text mining
@conference{10.1007/978-3-031-04447-2_18,
title = {A Data-Driven Score Model to Assess Online News Articles in Event-Based Surveillance System},
author = {Mehtab Alam Syed and Elena Arsevska and Mathieu Roche and Maguelonne Teisseire},
editor = {Juan Antonio Lossio-Ventura and Eduardo Díaz and Carlos Gavidia-Calderon and Alan Demétrius Baria Valejo and Hugo Alatrista-Salas},
year = {2022},
date = {2022-04-20},
urldate = {2022-04-20},
booktitle = {Information Management and Big Data},
volume = {1577},
pages = {264-280},
publisher = {Springer International Publishing},
abstract = {Online news sources are popular resources for learning about current health situations and developing event-based surveillance (EBS) systems. However, having access to diverse information originating from multiple sources can misinform stakeholders, eventually leading to false health risks. The existing literature contains several techniques for performing data quality evaluation to minimize the effects of misleading information. However, these methods only rely on the extraction of spatiotemporal information for representing health events. To address this research gap, a score-based technique is proposed to quantify the data quality of online news articles through three assessment measures: 1) news article metadata, 2) content analysis, and 3) epidemiological entity extraction with NLP to weight the contextual information. The results are calculated using classification metrics with two evaluation approaches: 1) a strict approach and 2) a flexible approach. The obtained results show significant enhancement in the data quality by filtering irrelevant news, which can potentially reduce false alert generation in EBS systems.},
keywords = {Text mining},
pubstate = {published},
tppubtype = {conference}
}
Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu
Fusion of spatiotemporal and thematic features of textual data for animal disease surveillance Journal Article
In: Information Processing in Agriculture, 2022, ISSN: 2214-3173.
Abstract | Links | BibTeX | Tags: Text mining
@article{VALENTIN2022,
title = {Fusion of spatiotemporal and thematic features of textual data for animal disease surveillance},
author = {Sarah Valentin and Renaud Lancelot and Mathieu Roche},
url = {https://www.sciencedirect.com/science/article/pii/S2214317322000312},
doi = {10.1016/j.inpa.2022.03.004},
issn = {2214-3173},
year = {2022},
date = {2022-03-28},
journal = {Information Processing in Agriculture},
abstract = {Several internet-based surveillance systems have been created to monitor the web for animal health surveillance. These systems collect a large amount of news dealing with outbreaks related to animal diseases. Automatically identifying news articles that describe the same outbreak event is a key step to quickly detect relevant epidemiological information while alleviating manual curation of news content. This paper addresses the task of retrieving news articles that are related in epidemiological terms. We tackle this issue using text mining and feature fusion methods. The main objective of this paper is to identify a textual representation in which two articles that share the same epidemiological content are close. We compared two types of representations (i.e., features) to represent the documents: (i) morphosyntactic features (i.e., selection and transformation of all terms from the news, based on classical textual processing steps) and (ii) lexicosemantic features (i.e., selection, transformation and fusion of epidemiological terms including diseases, hosts, locations and dates). We compared two types of term weighing (i.e., Boolean and TF-IDF) for both representations. To combine and transform lexicosemantic features, we compared two data fusion techniques (i.e., early fusion and late fusion) and the effect of features generalisation, while evaluating the relative importance of each type of feature. We conducted our analysis using a corpus composed of a subset of news articles in English related to animal disease outbreaks. Our results showed that the combination of relevant lexicosemantic (epidemiological) features using fusion methods improves classical morphosyntactic representation in the context of disease-related news retrieval. 
The lexicosemantic representation based on TF-IDF and feature generalisation (F-measure = 0.92, r-precision = 0.58) outperformed the morphosyntactic representation (F-measure = 0.89, r-precision = 0.45), while reducing the features space. Converting the features into lower granular features (i.e., generalisation) contributed to improving the results of the lexicosemantic representation. Our results showed no difference between the early and late fusion approaches. Temporal features performed poorly on their own. Conversely, spatial features were the most discriminative features, highlighting the need for robust methods for spatial entity extraction, disambiguation and representation in internet-based surveillance systems.},
keywords = {Text mining},
pubstate = {published},
tppubtype = {article}
}
Roche, Mathieu; Teisseire, Maguelonne
Integrating Textual Data into Heterogeneous Data Ingestion Processing Conference
2021 IEEE International Conference on Big Data (Big Data), IEEE, Orlando, FL, USA, 2022, ISBN: 978-1-6654-3902-2.
Abstract | Links | BibTeX | Tags: Text mining
@conference{9671759,
title = {Integrating Textual Data into Heterogeneous Data Ingestion Processing},
author = {Mathieu Roche and Maguelonne Teisseire},
url = {https://ieeexplore.ieee.org/document/9671759},
doi = {10.1109/BigData52589.2021.9671759},
isbn = {978-1-6654-3902-2},
year = {2022},
date = {2022-01-13},
urldate = {2022-01-13},
booktitle = {2021 IEEE International Conference on Big Data (Big Data)},
pages = {6008-6010},
publisher = {IEEE},
address = {Orlando, FL, USA},
abstract = {In this abstract, two methods for integrating textual data and textual features into ingestion processing are summarized. The first method involves integrating all features, including textual features, into dedicated frameworks, such as by using machine learning techniques. In the second method, text and textual features, such as keywords, are used to explain results returned by heterogeneous data mining. In this context, it is necessary to link data (e.g., databases, images, etc.) and/or obtained results with textual data (e.g., documents and keywords).},
keywords = {Text mining},
pubstate = {published},
tppubtype = {conference}
}
Syed, Mehtab; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne
Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT Proceedings Article
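In: Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF, pp. 648-656, SciTePress, 2022, ISBN: 978-989-758-552-4.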
Abstract | Links | BibTeX | Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining
@inproceedings{healthinf22,
title = {Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT},
author = {Mehtab Syed and Elena Arsevska and Mathieu Roche and Maguelonne Teisseire},
publisher = {SciTePress},
url = {https://www.scitepress.org/Link.aspx?doi=10.5220/0010887800003123},
doi = {10.5220/0010887800003123},
isbn = {978-989-758-552-4},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF},
pages = {648-656},
abstract = {In the first quarter of 2020, the World Health Organization (WHO) declared COVID-19 a public health emergency around the globe. Different users from all over the world shared their opinions about COVID-19 on social media platforms such as Twitter and Facebook. At the beginning of the pandemic, it became relevant to assess public opinions regarding COVID-19 using data available on social media. We used a recently proposed hierarchy-based measure for tweet analysis (H-TFIDF) for feature extraction over sentiment classification of tweets. We assessed how H-TFIDF and concatenation of H-TFIDF with bidirectional encoder representations from transformers (BH-TFIDF) perform over state-of-the-art bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF) features for sentiment classification of COVID-19 tweets. A uniform experimental setup of the training-test (90% and 10%) split scheme was used to train the classifier. Moreover, evaluation was performed with the gold standard expert labeled dataset to measure precision for each binary classified class. },
keywords = {Covid-19 (Coronavirus), OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {inproceedings}
}
Syed, Mehtab Alam; Decoupes, Rémy; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne
Spatial opinion mining from COVID-19 twitter data Journal Article
In: International Journal of Infectious Diseases, vol. 116, iss. 549, pp. 527, 2021.
Abstract | Links | BibTeX | Tags: Covid-19 (Coronavirus), Text mining
@article{nokey,
title = {Spatial opinion mining from COVID-19 twitter data},
author = {Mehtab Alam Syed and Rémy Decoupes and Elena Arsevska and Mathieu Roche and Maguelonne Teisseire},
url = {https://www.ijidonline.com/article/S1201-9712(21)00957-7/pdf},
doi = {10.1016/j.ijid.2021.12.065},
year = {2021},
date = {2021-11-06},
urldate = {2021-11-06},
journal = {International Journal of Infectious Diseases},
volume = {116},
issue = {549},
pages = {527},
abstract = {In the first quarter of 2020, the World Health Organization (WHO) declared COVID-19 a public health emergency around the globe. Users from all over the world therefore shared their thoughts about COVID-19 on social media platforms such as Twitter and Facebook. It is thus important to analyze public opinions about COVID-19 from different regions over different periods of time. To address the spatial analysis issue, a previous work proposed H-TF-IDF (Hierarchy-based measure for tweet analysis) for term extraction from tweet data. In this work, we focus on sentiment analysis performed on terms selected by H-TFIDF for spatial tweet groups, in order to understand local situations during the ongoing COVID-19 epidemic over different time frames.},
keywords = {Covid-19 (Coronavirus), Text mining},
pubstate = {published},
tppubtype = {article}
}
Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu
Identifying associations between epidemiological entities in news data for animal disease surveillance Journal Article
In: Artificial Intelligence in Agriculture, vol. 5, pp. 163-174, 2021, ISSN: 2589-7217.
Abstract | Links | BibTeX | Tags: Text mining
@article{VALENTIN2021163,
title = {Identifying associations between epidemiological entities in news data for animal disease surveillance},
author = {Sarah Valentin and Renaud Lancelot and Mathieu Roche},
url = {https://www.sciencedirect.com/science/article/pii/S2589721721000246},
doi = {10.1016/j.aiia.2021.07.003},
issn = {2589-7217},
year = {2021},
date = {2021-01-01},
journal = {Artificial Intelligence in Agriculture},
volume = {5},
pages = {163-174},
abstract = {Event-based surveillance systems are at the crossroads of human and animal (and plant and ecosystem) health, epidemiology, statistics, and informatics. Thus, their deployment faces many challenges specific to each domain and their intersections, such as relations among automation, artificial intelligence, and expertise. In this context, our work pertains to the extraction of epidemiological events in textual data (i.e. news) by unsupervised methods. We define the event extraction task as detecting pairs of epidemiological entities (e.g. a disease name and location). The quality of the ranked lists of pairs was evaluated using specific ranking evaluation metrics. We used a publicly available annotated corpus of 438 documents (i.e. news articles) related to animal disease events. The statistical approach was able to detect event-related pairs of epidemiological features with a good trade-off between precision and recall. Our results showed that using a window of words outperformed document-based and sentence-based approaches, while reducing the probability of detecting false pairs. Our results indicated that Mutual Information was less adapted than the Dice coefficient for ranking pairs of features in the event extraction framework. We believe that Mutual Information would be more relevant for rare pair detection (i.e. weak signals), but requires higher manual curation to avoid false positive extraction pairs. Moreover, generalising the country-level spatial features enabled better discrimination (i.e. ranking) of relevant disease-location pairs for event extraction.},
keywords = {Text mining},
pubstate = {published},
tppubtype = {article}
}
Li, Sabrina L; Messina, Jane P; Pybus, Oliver G; Kraemer, Moritz U G; Gardner, Lauren
A review of models applied to the geographic spread of Zika virus Journal Article
In: Transactions of The Royal Society of Tropical Medicine and Hygiene, vol. 115, no. 9, pp. 956-964, 2021, ISSN: 0035-9203.
Abstract | Links | BibTeX | Tags: Text mining, Zika
@article{10.1093/trstmh/trab009,
title = {A review of models applied to the geographic spread of Zika virus},
author = {Sabrina L Li and Jane P Messina and Oliver G Pybus and Moritz U G Kraemer and Lauren Gardner},
url = {https://doi.org/10.1093/trstmh/trab009},
doi = {10.1093/trstmh/trab009},
issn = {0035-9203},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
journal = {Transactions of The Royal Society of Tropical Medicine and Hygiene},
volume = {115},
number = {9},
pages = {956-964},
abstract = {In recent years, Zika virus (ZIKV) has expanded its geographic range and in 2015–2016 caused a substantial epidemic linked to a surge in developmental and neurological complications in newborns. Mathematical models are powerful tools for assessing ZIKV spread and can reveal important information for preventing future outbreaks. We reviewed the literature and retrieved modelling studies that were developed to understand the spatial epidemiology of ZIKV spread and risk. We classified studies by type, scale, aim and applications and discussed their characteristics, strengths and limitations. We examined the main objectives of these models and evaluated the effectiveness of integrating epidemiological and phylogeographic data, along with socioenvironmental risk factors that are known to contribute to vector–human transmission. We also assessed the promising application of human mobility data as a real-time indicator of ZIKV spread. Lastly, we summarised model validation methods used in studies to ensure accuracy in models and modelled outcomes. Models are helpful for understanding ZIKV spread and their characteristics should be carefully considered when developing future modelling studies to improve arbovirus surveillance.},
keywords = {Text mining, Zika},
pubstate = {published},
tppubtype = {article}
}
Valentin, Sarah; Lancelot, Renaud; Roche, Mathieu
Automated Processing of Multilingual Online News for the Monitoring of Animal Infectious Diseases Proceedings Article
In: Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020), pp. 33–36, European Language Resources Association, Marseille, France, 2020, ISBN: 979-10-95546-65-8.
Tags: Text mining
@inproceedings{valentin-etal-2020-automated,
title = {Automated Processing of Multilingual Online News for the Monitoring of Animal Infectious Diseases},
author = {Sarah Valentin and Renaud Lancelot and Mathieu Roche},
url = {https://aclanthology.org/2020.multilingualbio-1.6},
isbn = {979-10-95546-65-8},
year = {2020},
date = {2020-05-01},
urldate = {2020-05-01},
booktitle = {Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)},
pages = {33--36},
publisher = {European Language Resources Association},
address = {Marseille, France},
abstract = {The Platform for Automated extraction of animal Disease Information from the web (PADI-web) is an automated system which monitors the web for monitoring and detecting emerging animal infectious diseases. The tool automatically collects news via customised multilingual queries, classifies them and extracts epidemiological information. We detail the processing of multilingual online sources by PADI-web and analyse the translated outputs in a case study.},
keywords = {Text mining},
pubstate = {published},
tppubtype = {inproceedings}
}
Roche, Mathieu
COVID-19 and Media datasets: Period-and location-specific textual data mining Journal Article
In: Data in Brief, vol. 33, pp. 106356, 2020.
Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining
@article{roche2020covid,
title = {COVID-19 and Media datasets: Period-and location-specific textual data mining},
author = {Mathieu Roche},
doi = {10.1016/j.dib.2020.106356},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
journal = {Data in Brief},
volume = {33},
pages = {106356},
publisher = {Elsevier},
abstract = {The vocabulary used in news on a disease such as COVID-19 changes according to the period [4]. This aspect is discussed on the basis of MEDISYS-sourced media datasets via two studies. The first focuses on terminology extraction and the second on period prediction according to the textual content using machine learning approaches.},
keywords = {Covid-19 (Coronavirus), OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Valentin, Sarah; Arsevska, Elena; Rabatel, Julien; Falala, Sylvain; Mercier, Alizé; Lancelot, Renaud; Roche, Mathieu
PADI-web 3.0: A new framework for extracting and disseminating fine-grained information from the news for animal disease surveillance Journal Article
In: One Health, vol. 13, pp. 100357, 2021, ISSN: 2352-7714.
Tags: OpenDataSet, Text mining
@article{VALENTIN2021100357,
title = {PADI-web 3.0: A new framework for extracting and disseminating fine-grained information from the news for animal disease surveillance},
author = {Sarah Valentin and Elena Arsevska and Julien Rabatel and Sylvain Falala and Alizé Mercier and Renaud Lancelot and Mathieu Roche},
url = {https://www.sciencedirect.com/science/article/pii/S2352771421001476},
doi = {10.1016/j.onehlt.2021.100357},
issn = {2352-7714},
year = {2021},
journal = {One Health},
volume = {13},
pages = {100357},
keywords = {OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {article}
}
Syed, Mehtab Alam; Arsevska, Elena; Roche, Mathieu; Teisseire, Maguelonne
Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT Conference
Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF, INSTICC SciTePress, 2022, ISBN: 978-989-758-552-4.
Tags: Covid-19 (Coronavirus), OpenDataSet, Text mining
@conference{healthinf22,
title = {Feature Selection for Sentiment Classification of COVID-19 Tweets: H-TFIDF Featuring BERT},
author = {Syed, Mehtab Alam and Arsevska, Elena and Roche, Mathieu and Teisseire, Maguelonne},
url = {https://www.scitepress.org/Link.aspx?doi=10.5220/0010887800003123},
doi = {10.5220/0010887800003123},
isbn = {978-989-758-552-4},
year = {2022},
booktitle = {Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF},
pages = {648-656},
publisher = {SciTePress},
organization = {INSTICC},
keywords = {Covid-19 (Coronavirus), OpenDataSet, Text mining},
pubstate = {published},
tppubtype = {conference}
}