Our method, which is based on the virus-host protein-protein interaction (PPI) network, reveals the relationships among different viruses according to molecular function. The molecular function tree (Figure 1) displays 114 viruses whose proteins interact with human proteins. In this figure, the HIV-1 viruses, which belong to the retro-transcribing virus category, are generally grouped together. Furthermore, some dsDNA viruses, including CRPVK, BPV1, HPV18, HPV11, HPV6B, HHV11, ADE09, and HPV16, are also grouped together. These placements are consistent with established virus taxonomy, which illustrates that a method based on the virus-host PPI network can disclose similarities among viruses of the same type. However, the same figure also shows that some viruses with different nucleic acid types are clustered together, an outcome that conflicts with the Baltimore classification. This observation points to at least two considerations. First, because our method of cataloging viruses relies on the PPI data between each virus and its host, more available PPI data leads to better elucidation of the relationships. Since the PPI data between most viruses and humans are incomplete, and some viruses have more PPI data available than others, the unbalanced data could affect the tree structure. This interpretation is supported by the finding that the relationships between viruses with more PPI data are much more consistent with the results of other classification systems. Second, by exploring the virus-human PPI network, we found that viruses of different types can target the same human proteins. 
For example, the proteins PHOSP_RABVH and Q910M0_MOKV of RABVH and MOKV, both single-stranded RNA viruses, interact with the human protein annotated as Dynein light chain 1, cytoplasmic (DYL1_HUMAN), while the protein VA36_VACCC of VACCC, a double-stranded DNA virus, interacts with a similar human protein (KLC2_HUMAN, Kinesin light chain). All of the interactions in our network are taken from Dyer et al. [23] and the Pathogen Interaction Gateway, and all are experimentally supported. The functional similarity of the targeted human proteins might reduce the distance between viruses of different types. These results may therefore indicate a relationship between viruses of different types. Given the complexity of the virus classification system and the rapid evolution rate of viruses, there could be hidden relationships between otherwise dissimilar viruses. For example, different viruses may evolve to target the same cellular processes, a phenomenon known as convergent evolution. Traditional taxonomy systems, which take only chemical or physical characteristics into consideration, might not be able to reveal this kind of hidden relationship. Moreover, we would like to point out that our method does not reflect the phylogenetic relationships between different viruses. Instead, our classification could disclose similarities in the potential pathogenic mechanisms of distinct viruses, which could be helpful for the treatment of related diseases. Our method could be regarded as a complementary tool for virologists seeking relationships between viruses of different types, especially when the potential mechanisms that it may disclose are taken into consideration.
Figure 1. Virus relationship tree based on molecular function. 114 viruses are shown here. The name of each item is composed of two parts: the first part is the UniProt ID of a virus, and the second part is the basic information of the virus.
Examination of HIV classification using our method reveals additional insights. HIV has two subtypes, HIV-1 and HIV-2, which originated from chimpanzees and sooty mangabeys, respectively [20]. Both subtypes of HIV are transmitted by sexual contact, through bodily fluids, or from mother to child. Both can cause AIDS, which is characterized by immunodeficiency. The two subtypes cannot be distinguished without tests performed by a specialist. The viruses belonging to the primate lentivirus group, which represent HIV in our dataset, were selected and are shown together in Figure 2. As shown in this molecular function tree, almost all HIV-1 and HIV-2 viruses are divided into two categories (the GO terms of the human proteins corresponding to HIV-1 and to HIV-2 are listed in Additional file 1). The only two exceptions are HV1LA and HV2KR. The former is an HIV-1 virus belonging to group M but is grouped with the HIV-2 viruses, while HV2KR, which is defined as a human immunodeficiency virus type 2, is grouped with the majority of the HIV-1 viruses. Considering the subtle differences between the two subtypes of HIV, our result suggests a potential relationship and similarity not only between HV1LA and the HIV-2 viruses, but also between HV2KR and the HIV-1 viruses. Meanwhile, SIVM1, a simian immunodeficiency virus (SIV), is grouped closer to the HIV-2 viruses than to the HIV-1 viruses. Indeed, a previous study proposed that a strain of SIV jumped from the sooty mangabey to become HIV-2 [21]. In addition, phylogenetic methods have shown that the SIVs of the sooty mangabey and the macaque are closely related to HIV-2 [22]. This evidence, to a certain extent, strengthens the reliability of our method. Moreover, we computed the relationships between the different HIV viruses on the basis of 'Biological Process' and 'Cellular Component', and obtained tree structures very similar to the one based on 'Molecular Function' (Additional file 2). The generally successful separation of HIV-1 and HIV-2 viruses in the molecular function tree demonstrates the feasibility of using our method to evaluate the relationships between viruses.
Figure 2. HIV relationship tree based on molecular function. All the virus IDs shown in the figure are UniProt IDs. The length of each branch represents the distance between different viruses. HIV-1 and HIV-2 viruses are almost completely separated, except HV1LA, ...
In evolutionary research, different indicators, such as the similarity between conserved sequences, have been used to determine the distances between organisms. In our method, we define the distance between two viruses as the smallest special score derived from the SSBP of proteins between the two protein sets. Mathematically, the distance between two sets is the infimum of the distances between any pair of elements drawn from the two sets, so our definition of the distance between two viruses is consistent with the mathematical definition of the distance between two point sets. Moreover, some of the human proteins with which a viral protein interacts may function in relatively general processes. These general proteins contribute less to differentiating viruses. Ideally, we should use proteins that have more specific functions and participate in more specialized processes to reflect the relationships between different viruses. The infimum represents the most specific similarity between two protein sets and can therefore reflect the relationship between two viruses at the most specific level. Because GO terms are often defined at a general level, the smallest special score also tends to reduce the effect of the non-specificity of some GO terms in our sets.
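The set-to-set distance described above can be illustrated with a minimal sketch. For finite sets the infimum is simply the minimum over all cross-set pairs. The protein names (P1 to P4), the toy score table, and the functions `set_distance` and `toy_pair_score` are hypothetical placeholders introduced for illustration only; they are not the SSBP-derived scores or the implementation used in this study.

```python
from itertools import product


def set_distance(set_a, set_b, pair_score):
    """Return the smallest pairwise score between any element of set_a
    and any element of set_b (the infimum, for finite sets)."""
    if not set_a or not set_b:
        raise ValueError("both protein sets must be non-empty")
    return min(pair_score(a, b) for a, b in product(set_a, set_b))


# Hypothetical pairwise scores between human proteins (assumed values,
# standing in for scores that would be derived from GO annotations).
toy_scores = {
    frozenset(("P1", "P3")): 0.8,
    frozenset(("P1", "P4")): 0.3,
    frozenset(("P2", "P3")): 0.6,
    frozenset(("P2", "P4")): 0.9,
}


def toy_pair_score(a, b):
    """Look up the toy score for an unordered protein pair."""
    if a == b:
        return 0.0  # a protein shared by both sets gives distance zero
    return toy_scores[frozenset((a, b))]


# Hypothetical sets of human proteins targeted by two viruses.
virus_x_targets = {"P1", "P2"}
virus_y_targets = {"P3", "P4"}

distance_xy = set_distance(virus_x_targets, virus_y_targets, toy_pair_score)
print(distance_xy)  # 0.3, the score of the closest cross-set pair (P1, P4)
```

Under this definition, a single closely matched pair of target proteins is enough to bring two viruses close together, which is why broadly annotated proteins have limited influence on the result.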
In our approach, the ability to detect relationships between distinct viruses depends directly on the quality of the virus-host protein-protein interaction network. If the network is reliable and contains enough information to connect viruses with their hosts, the relationships disclosed by the PPI network should reveal more functional associations to virologists who are interested in the relationships between different viruses. In total, 9683 human proteins are confirmed to interact with the viral proteins of the 114 viruses. Of these, 8249 human proteins are verified to interact with the 48 HIV viruses, while the 66 non-HIV viruses correspond to only 1434 human proteins, a relatively low number by comparison. As discussed above, the classification of the HIV viruses gave much more reliable results than that of the remaining groups of viruses. This might be caused by the difference in the amount of data currently available in the two corresponding datasets. We expect that more verified virus-host PPI data for other viruses will lead to more reliable and valuable results for exploring potential relationships between distinct viruses. Our method points to a new direction for elucidating the relationships between viruses at the systems level and provides rich information for virologists studying the relationships among various viruses.
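The imbalance between the two datasets can be made concrete with a short calculation using only the counts reported above. The per-virus averages are a rough illustration that assumes each group's protein count can simply be divided by its number of viruses; they are not figures reported in this study, and the variable names are introduced here for illustration.

```python
# Counts reported in the text.
hiv_viruses, hiv_proteins = 48, 8249
other_viruses, other_proteins = 66, 1434

# The two groups add up to the reported totals.
total_viruses = hiv_viruses + other_viruses      # 114
total_proteins = hiv_proteins + other_proteins   # 9683

# Rough averages (an assumption: they ignore how proteins are actually
# distributed among the individual viruses in each group).
hiv_mean = hiv_proteins / hiv_viruses
other_mean = other_proteins / other_viruses

# Fraction of all interacting human proteins attributed to HIV.
hiv_share = hiv_proteins / total_proteins

print(f"HIV:     {hiv_mean:.1f} human proteins per virus on average")
print(f"Non-HIV: {other_mean:.1f} human proteins per virus on average")
print(f"HIV share of interacting human proteins: {hiv_share:.1%}")
```

On these counts, the HIV group has roughly eight times as many interacting human proteins per virus as the non-HIV group, which is consistent with the more reliable classification observed for HIV.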