ICTV Updated taxonomy approach

I am sending this to the ICTV EC. I would be grateful for comments on it, both supportive and opposing.

Discussion on ICTV taxonomy approach

Roger Hull

Rogerhull@btinternet.com

A.    Background

1.      ICTV Statutes, Article 3, Objectives

“The objects of the Committee shall be for the public benefit and in particular to advance education in the taxonomy of viruses and in furtherance thereof”.

2.      Classification and taxonomy

 

a)      Why classify?

Biological classification is the arrangement of a multitude of organisms (here, viruses) into groups (“classes”) based on common characteristics so that the human mind can comprehend them more easily.

Taxonomy (Greek Taxis = arrangement) is the arrangement of “classes” in a hierarchical structure (e.g. species, genus, family, order etc.) each level based on less similar properties. The basic keystone level of biological taxonomy is species.

Thus, classification groups organisms and taxonomy describes relationships between them.

 

b)      Uses of classification and taxonomy (Hull, 2014)

·         It helps communication between virologists.

·         It enables properties of new viruses to be predicted.

·         It reveals possible evolutionary relationships.

·         It helps communication between virologists and non-virologists (e.g. regulators, advisers, other stakeholders, potential funders,  lay people, etc.).

 

3.      Current and upcoming problems in the ICTV taxonomy system.

 

a)      The basic Linnaean system of ascending taxa with species (currently having binomial names made up from the genus name and species name) forming the root works reasonably well at present for viruses.

 

b)      However, there are problems with the binomial naming of species. In the 172 virus taxonomy proposals for ICTV-wide ratification (February 2022), the genus part of the binomial was consistent but there was considerable variation in the species part of the name. For example, species were given names based on a 4-letter epithet contracted from the host species name followed by a number, or on an epithet generally derived from the common name of the representative virus (proposal 2021.003F); proposal 2021.009F suggested a mixture of Greek (alpha, beta, etc.) and Latin binomials; and proposal 2021.007P suggested an acronym derived from the virus common name, sometimes with numbering (proposal 2021.002P). This gives a very confusing appearance, especially for communication between virologists and non-virologists.

 

c)      The characters used in defining and demarcating virus species are varied and much discussed (see Calisher et al., 2019; Gorbalenya and Siddell, 2021).

 

d)      There are potentially massive problems with new viruses based on metagenomic sequences (termed here metagenomic viruses, but also termed uncultivated viruses; Dutilh et al., 2021) which, provided certain criteria are adhered to, are incorporated into the ICTV taxonomy (Simmonds et al., 2017; Dutilh et al., 2021).

·         The most recent edition of the ICTV Virus Metadata Resources (October 2021) lists about 10^4 virus species, but advances in sequencing technology (e.g. next generation sequencing) and analyses of these sequences are eliciting terms like “astounding number of previously unknown viruses” and “the vastness of the virosphere” (Dutilh et al., 2021). Examples of the extent of the virosphere include a compendium of 54,118 virus species in the human gut microbiome, 92% of which were not found in existing databases (Nayfach et al., 2021); Geoghegan and Holmes (2017) estimated that more than 99.99% of the virosphere of eukaryotic organisms remains undiscovered, and Hull and Rima (2020) predicted the overall virosphere to comprise of the order of 10^9 virus species.

·         Strict quality control has to be applied to the analyses of metagenomic (and meta-transcriptomic) data to ensure that they are of an unknown virus species (Kieft and Anantharaman, 2019; Cobbin et al., 2021).

·         Virus metagenomic sequences are usually derived from environmental sources rather than a host. The environmental source can be related to a host e.g. human faeces (Nayfach et al., 2021) but even then the actual virus host might be gut bacteria, cells from the human gut or even food eaten by the human. More often, the source is samples taken from the broader environment, e.g. marine or fresh water, a mixture of vegetation or a collection of insects; in this case, the host is not known. Even if the host is known or suspected, co-evolution of the virus and host often leads to an asymptomatic infection (e.g. Kaján et al., 2020; Hull, 2014, Chapter 8 VII).


B.     Needs for a new approach to classification/taxonomy.

1.      As noted in section A2b, there are three basic needs for a classification/taxonomy system: giving a structured arrangement of different viruses so that the human mind can comprehend the subject more easily; communication; and prediction, e.g. of the properties of new related viruses and of the possible evolution of viruses.

2.      The objectives of the ICTV are for “public benefit and enhancing education” (section A1), which indicates that its system for classification and taxonomy should facilitate communication between all the parties (the customers or stakeholders) involved in viruses. The customers/stakeholders cover a range of people including:

·         Virologists

Virologists have a wide range of uses for an up-to-date taxonomy system including planning research, comparing and predicting properties and how they function, preparing publications and talks and general communication with other virologists.

·         Teachers

Teaching, including what is a virus, the types and composition of viruses and new advances in virology, is not only to students but also to wider audiences e.g. updating research virologists and preparing decision makers.

·         Industry

Industry needs up-to-date information on the relationships between viruses and on newly emerging viruses which could be of potential interest. This includes viruses (e.g. bacteriophage) which are potentially damaging to industrial processes, and viruses of use in medicine (e.g. vaccine construction) and in biotechnology (e.g. virus gene vectors, nanotechnology) (see the Viruses special issue “The Impact and Applications of Phages in the Food Industry and Agriculture” (mdpi.com); Varanda et al., 2021).

·         Regulators

For example, government officials who draw up national regulations on trade, especially in animals and plants (see Roberts and Andrews, 2008; Chalam, 2020).

·         Quarantine officers

The people who implement animal and plant regulations have to know what they are looking for, recognise new occurrences of infection and identify potential risks of virus introduction.

·         Advisors

Advisors, such as extension officers involved in animal and plant advisory services, have to communicate with the public (e.g. farmers) (see the problem of binomial naming of plant viruses with long names in Hull and Rima, 2020); they also advise others, such as regulators and quarantine officers, who may not be virologists.

The last three groups have to be aware of the possibility of introducing an asymptomatic host carrying a virus that has co-evolved with that host but is pathogenic in a new host (the wrong virus in the wrong host at the wrong time). Examples are HIV (thought to have initially spread to humans from asymptomatic chimpanzees in Central Africa) and Cacao swollen shoot virus [thought to have spread from asymptomatic shade trees to cacao brought from Brazil (where the virus does not exist) to West Africa]. In both these cases the virus might have been detected early in metagenomic samples.

·         Potential funders

Funding bodies require up-to-date information in designing project areas and applicants need collated information for writing applications.

 

3.      The potentially large numbers of metagenomic viruses, coupled with the lack of information on host and symptomatology, are leading to immense problems in giving binomial names to species. Hull and Rima (2020) noted that more than 6 or 7 letters would be needed for the abbreviated epithet as the number of virus species increases; even that would be a problem, as the 26 letters of the English alphabet give only 67,108,863 combinations with no duplication.
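As a rough check on the arithmetic above: the figure of 67,108,863 equals 2^26 - 1, the number of non-empty sets that can be drawn from 26 distinct letters when order is ignored and no letter is repeated. That reading is an assumption made here, not something stated in Hull and Rima (2020). The sketch below reproduces the figure and, for comparison, counts ordered letter strings of a fixed length with repeats allowed; the function names are illustrative only.

```python
from math import comb

ALPHABET_SIZE = 26  # letters of the English alphabet


def unordered_sets_without_repeats(n_letters: int = ALPHABET_SIZE) -> int:
    """Count non-empty sets of distinct letters (order ignored).

    Summing C(n, k) for k = 1..n gives 2**n - 1.
    """
    return sum(comb(n_letters, k) for k in range(1, n_letters + 1))


def ordered_strings(length: int, n_letters: int = ALPHABET_SIZE) -> int:
    """Count strings of exactly `length` letters, order significant, repeats allowed."""
    return n_letters ** length


if __name__ == "__main__":
    print(unordered_sets_without_repeats())  # 67108863
    for length in (6, 7):
        print(length, ordered_strings(length))
```

Under the alternative counting, 6-letter strings give 308,915,776 possibilities and 7-letter strings give 8,031,810,176, so the capacity of a letter-based epithet depends strongly on which counting rule is adopted.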

 

C.    Suggestions for discussion on new approaches to virus classification and taxonomy

1.      Naming of virus species.

As noted above, there are problems with the current binomial system which will be exacerbated by the predicted large numbers of metagenomic viruses. Even if the expected numbers are overestimated by an order of magnitude or two, the present binomial system would not be able to handle them. Hull and Rima (2020) pointed out that there are two categories of virus species: characterised-species (with phenotypic properties such as symptomatology, host range, transmission, etc. known) and sequence-species, with no phenotypic properties known. For a sequence-species, even though the genus is known (from the sequence similarity to others in that genus), there is no information on which to base the species name. They suggested that this could be overcome by developing a numeral system for virus species. The formal use of a binumeral system would accommodate the current wish for the formal name of a virus to comprise the genus and the species, and would also give consistency throughout the virosphere.

The ICTV Virus Taxonomy 2020 release (talk.ictvonline.org/taxonomy) reported that, at that time, the virosphere comprised 6 realms, 10 kingdoms, 17 phyla, 2 subphyla, 39 classes, 59 orders, 8 suborders, 2224 genera, 70 subgenera and 9110 species (62 of these species were not in a genus); thus there is an average of about 4.1 species per genus. The Begomovirus genus is the largest, with 445 species.

A suggestion is that a binumeral system could use 5 digits (say 12345) for the genus name and 5 digits for the species name (say 0/14321) giving the formal virus name 12345 0/14321; the first digit of the species part of the name could be 0 for a sequence-species or 1 for a characterised species. Anyway, if this approach is acceptable, details can be sorted out later.
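To make the suggested layout concrete, here is a minimal sketch of how such a binumeral name could be built and parsed. It assumes one particular reading of the example above: "0/14321" is taken to mean a 5-digit species number whose first digit is 0 or 1 (the category flag) followed by a 4-digit serial, so the two concrete forms would be "12345 04321" and "12345 14321". The class and function names are hypothetical and not part of any ICTV specification.

```python
import re
from dataclasses import dataclass

# Assumed layout: "GGGGG FSSSS" = 5 genus digits, a space, a flag digit F
# (0 = sequence-species, 1 = characterised-species) and a 4-digit serial.
_BINUMERAL = re.compile(r"([0-9]{5}) ([01])([0-9]{4})")


@dataclass(frozen=True)
class BinumeralName:
    genus: int           # 0..99999
    characterised: bool  # True -> flag digit 1, False -> flag digit 0
    serial: int          # 0..9999 within the genus and category

    def __post_init__(self) -> None:
        if not 0 <= self.genus <= 99_999:
            raise ValueError("genus number must fit in 5 digits")
        if not 0 <= self.serial <= 9_999:
            raise ValueError("species serial must fit in 4 digits")

    def __str__(self) -> str:
        flag = 1 if self.characterised else 0
        return f"{self.genus:05d} {flag}{self.serial:04d}"


def parse_binumeral(text: str) -> BinumeralName:
    """Parse a name such as '12345 14321' under the assumed layout."""
    match = _BINUMERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a binumeral name: {text!r}")
    genus, flag, serial = match.groups()
    return BinumeralName(int(genus), flag == "1", int(serial))


if __name__ == "__main__":
    name = parse_binumeral("12345 14321")
    print(name.genus, name.characterised, name.serial)
    print(BinumeralName(genus=12345, characterised=False, serial=4321))
```

Under this reading there is room for 100,000 genera and, within each genus, 10,000 sequence-species plus 10,000 characterised-species. Whether that is sufficient is one of the details that would need to be sorted out.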

2.      Linking virus species with common name

Most of the communication between the various users of virus names, be they virologists or non-virologists, uses common names [or abbreviations of the name, e.g. the first recognised virus, tobacco mosaic virus (TMV)] for what are classified as virus species. The formal binomial system, which reduces the species component to an epithet or, if the above binumeral system is adopted, a series of numbers, would often mean nothing to a non-specialist. This problem is discussed in depth by Calisher et al. (2019) and Gorbalenya and Siddell (2021). However, the use of a binumeral system would open up the application of Information Technology (IT) techniques which could be linked as an integral part of the taxonomy structure and overcome this problem.

3.      Use of information technology

IT is already providing inputs into the virus taxonomy framework, especially for identifying and incorporating sequence-species into the structure (see Dutilh et al., 2021; Gorbalenya and Lauber, 2022).

The use of IT and Information and Communications Technology (ICT) opens up a wide range of potential outputs from the formal taxonomy framework that the ICTV could capitalise on. For example, a QR code containing information on each virus species (e.g. common name, host range, symptomatology, transmission and links to molecular data) could be generated and linked to that species in the taxonomy framework. By making the overall system searchable, information on the whole known virosphere would be open for data mining and comparative analyses, and could also be linked seamlessly to other sources of data.

Such a system would serve the increasingly wide range of customers/stakeholders described in Section B2.
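As an illustration only, the sketch below shows one way a species record could be keyed by a formal numeric name, looked up by common name or abbreviation, and serialised to a text payload of the kind a QR code could carry. The record fields, the function names and the formal name "12345 14321" (the placeholder used in C1) are all assumptions; generating the QR image itself would need an additional library and is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeciesRecord:
    formal_name: str            # placeholder binumeral name, e.g. "12345 14321"
    common_name: str
    abbreviation: str
    links: tuple[str, ...] = () # pointers to molecular data, left empty here


def qr_payload(record: SpeciesRecord) -> str:
    """Serialise a record to JSON text that a QR code could encode."""
    return json.dumps(asdict(record), sort_keys=True)


def search(records: list[SpeciesRecord], term: str) -> list[SpeciesRecord]:
    """Match on formal name, abbreviation, or part of the common name."""
    needle = term.strip().casefold()
    return [
        r for r in records
        if needle == r.formal_name
        or needle == r.abbreviation.casefold()
        or needle in r.common_name.casefold()
    ]


if __name__ == "__main__":
    catalogue = [SpeciesRecord("12345 14321", "tobacco mosaic virus", "TMV")]
    for hit in search(catalogue, "tmv"):
        print(hit.formal_name, "->", hit.common_name)
        print(qr_payload(hit))
```

The point of the sketch is that a non-specialist could search by the familiar common name or abbreviation and never need to handle the numeric formal name directly.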

4.      Suggestion for the future

If the above broad suggestions are attractive, I suggest that the ICTV convenes a blue-sky workshop similar to that held in 2016 to discuss viral taxonomy in the age of metagenomics (Simmonds et al., 2017), which led to the acceptance of viral sequences and species based on metagenomic data. Such a workshop would need input from IT and ICT experts as well as from virologists and representatives of other potential users.

The complexities of microbial taxonomy in the age of increasing information from different sources (e.g. unculturable bacteria, metagenomic sequences) are being discussed extensively (see Lloyd and Tahon, 2022; Murray et al., 2020; Sanford et al., 2021). The points they discuss are essentially those faced by virologists and, if we can develop a system fit for the 21st century, it may be a paradigm for how microbiologists can approach the situation.

 

References.

Calisher, C.H., Briese, T., Brister, J.R., Charrel, R.N., Durrwald, R., et al. (2019). Strengthening the interaction of the virology community with the International Committee on Taxonomy of Viruses by linking virus names and their abbreviations to virus species. Syst. Biol. 68: 828-839.

Chalam, V.V. (2020). Elimination of plant viruses by certification and quarantine. Applied Plant Virology: Advances, Detection and Antiviral Strategies. Chapter 52: 749-762.

Cobbin, J.C.A., Charon, J., Harvey, E., Holmes, E.C., Mahar, J.E. (2021). Current challenges to virus discovery by metatranscriptomics. Curr. Opin. Virol. 51:48-55.

Dutilh, B.E., Varsani, A., Tong, Y., Simmonds, P., Sabanadzovic, S., et al. (2021). Perspectives on taxonomic classification of uncultivated viruses. Curr. Opin. Virol. 51: 207-215.

Geoghegan, J.L., Holmes, E.C. (2017). Predicting virus emergence amid evolutionary noise. Open Biol. 7: 170189.

Gorbalenya, A.E., Lauber, C. (2022). Bioinformatics of virus taxonomy: foundations and tools for developing sequence-based hierarchical classification. Curr. Opin. Virol. 52: 48-56.

Gorbalenya, A.E., Siddell, S.G. (2021). Recognizing species as a new focus of virus research. PLoS Pathog. 17: e1009318.

Hull, R. (2014). Plant Virology 5th edn. Academic Press.

Hull, R., Rima, B. (2020). Virus taxonomy and classification: naming of virus species. Arch. Virol. 165: 2733-2736.

Kaján, G.L., Doszpoly, A., Tarján, J.L., Vidovszky, M.Z., Papp, T. (2020). Virus-host coevolution with a focus on animal and human DNA. J. Molec. Evol. 88: 41-56.

Kieft, K., Anantharaman, K. (2019). Virus genomics: what is being overlooked? Curr. Opin. Virol. 53: 101200.

Lloyd, K.G., Tahon, G. (2022). Science depends on nomenclature but nomenclature is not science. Nat. Rev. Microbiol. 20: 123-124.

Murray, A.E., Freudenstein, J., Gribaldo, S., Hatzenpichler, R., Hugenholtz, P., et al. (2020). Roadmap for naming uncultivated Archaea and bacteria. Nat. Microbiol. 5: 987-994.

Nayfach, S., Páez-Espino, D., Call, L., Low, S.J., Sberro, H., et al. (2021). Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat. Microbiol. 6: 960-970.

Roberts, J.A., Andrews, K. (2008). Nonhuman primate quarantine: its evolution and practice. ILAR J. 49: 145-156.

Sanford, R.A., Lloyd, K.G., Konstantinidis, K., Löffler, F.E. (2021). Microbial taxonomy run amok. Trends Microbiol. 29: 394-404.

Simmonds, P., Adams, M.J., Benkő, M., Breitbart, M., Brister, J.R., et al. (2017). Consensus statement: Virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15: 161-168.

Varanda, C., Félix, M. do R.F., Campos, M.D., Materatski, P. (2021). An overview of the application of viruses to biotechnology. Viruses 13: 2073.

 

March 2022