Phylodynamic Datasets

Introduction

In the event of an outbreak, it is important to understand what the pathogen is, which hosts it infects, where it came from, and what its transmission, virulence, and drug-resistance properties are. Sequencing the outbreak pathogen makes it possible to deduce this information by comparison with previously known strains. In addition, phylogenetic trees and phylodynamic models allow the recent history of the strain to be inferred, together with parameter values that can be used in forward simulations of the outbreak.

Consequently, in addition to the sequences obtained from the outbreak itself (which early in an outbreak may be only a handful), the analysis needs a set of background sequences. These background sequences should contain high-quality genetic information (complete genomes where possible), have a known host and geographic origin, and have a recorded sampling date. They should also be aligned to each other and/or to a known reference strain, which enables the construction of phylogenetic trees and the identification of insertions, deletions, and other mutations (all of which can affect virulence, transmissibility, and drug resistance).
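
Once an outbreak sequence and a reference are aligned, differences can be read off column by column. The sketch below is a minimal illustration, assuming both nucleotide sequences are already aligned to the same length, with `-` marking gaps and `N` marking unknown bases; the function name `classify_differences` is hypothetical and not part of any particular tool.

```python
def classify_differences(reference, query):
    """Compare two aligned sequences column by column.

    Returns (column, kind, reference_base, query_base) tuples, where column is
    the 1-based alignment position and kind is "substitution", "insertion"
    (gap in the reference) or "deletion" (gap in the query).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for column, (ref_base, query_base) in enumerate(
            zip(reference.upper(), query.upper()), start=1):
        if ref_base == query_base:
            continue
        if ref_base == "-":
            kind = "insertion"   # query carries a base absent from the reference
        elif query_base == "-":
            kind = "deletion"    # query lacks a base present in the reference
        elif "N" in (ref_base, query_base):
            continue             # ambiguous/unknown base: not called
        else:
            kind = "substitution"
        differences.append((column, kind, ref_base, query_base))
    return differences


print(classify_differences("ACGT-ACGT", "ACTTGAC-T"))
# → [(3, 'substitution', 'G', 'T'), (5, 'insertion', '-', 'G'), (8, 'deletion', 'G', '-')]
```

Each gap column is reported separately here; a fuller implementation would merge runs of adjacent gap columns into single insertion or deletion events and map alignment columns back to reference coordinates.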

Data

Public databases hold previously generated pathogen sequences. The Data section of this site provides a curated set of background sequences, together with other information useful for phylodynamic analyses and for the further development of phylodynamic models.

The data consists of:

  • FASTA-format sequence alignments containing representative strains
  • Tables corresponding to the sequence alignments, listing sequence name, isolation date, host, location, and subtype or strain type
  • Phylogenetic trees showing the main lineages
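
As an illustration of how an alignment and its accompanying table fit together, the following minimal sketch reads a FASTA alignment, joins each sequence to its table row by name, and keeps only records with a recorded date, host, and location. It assumes a comma-separated table whose column headers are `name`, `date`, `host`, `location` and `subtype`, and that each FASTA header line matches the `name` column exactly; these column names and the functions `read_fasta` and `load_background` are hypothetical, so adjust them to the actual files.

```python
import csv
import io

# Hypothetical column names; adjust to match the real table headers.
REQUIRED_FIELDS = ("date", "host", "location")


def read_fasta(handle):
    """Yield (name, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumes the whole header line is the sequence name.
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def load_background(fasta_handle, table_handle, delimiter=","):
    """Join alignment sequences to table rows and drop incomplete records."""
    metadata = {row["name"]: row
                for row in csv.DictReader(table_handle, delimiter=delimiter)}
    records = []
    for name, sequence in read_fasta(fasta_handle):
        row = metadata.get(name)
        if row is None:
            continue  # sequence has no entry in the table
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            continue  # missing sampling date, host or location
        records.append({**row, "name": name, "sequence": sequence})
    # Sequences in an alignment should all have the same length.
    if len({len(record["sequence"]) for record in records}) > 1:
        raise ValueError("sequences differ in length; is the file aligned?")
    return records


# Made-up toy data for illustration only.
fasta_text = ">isolate_A\nACGT-\nACGT\n>isolate_B\nACGTTACGT\n>isolate_C\nACG--ACGT\n"
table_text = (
    "name,date,host,location,subtype\n"
    "isolate_A,2020-01-01,host_1,region_1,\n"
    "isolate_B,2020-02-01,host_2,region_2,\n"
    "isolate_C,,host_1,region_1,\n"
)
records = load_background(io.StringIO(fasta_text), io.StringIO(table_text))
print([record["name"] for record in records])  # → ['isolate_A', 'isolate_B']
```

Here `isolate_C` is dropped because its table row has no sampling date. For real files, `open(path, newline="")` can be passed in place of the `io.StringIO` objects, and `delimiter="\t"` handles tab-separated tables.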

Note that the amount of available information differs between pathogens, and some have many more publicly available sequences than others. Judging by the number of GenBank records, counts range from more than 5,000 sequences for foot-and-mouth disease virus (FMDV) and avian influenza virus (AIV) to fewer than 100 for Schmallenberg virus.

Pathogens of interest: