Raw data from transcriptome sequencing can contain up to 30% contaminating reads. These reads are usually filtered out because they degrade the quality of de novo assembly and of subsequent expression analysis. Contaminating reads originate either from contamination introduced in the laboratory or from biologically relevant material, which can be studied using a metatranscriptomics approach (Fig. 1). The presence of contaminating RNA in an otherwise clean sample could indicate that the RNA comes from a living symbiotic organism or that it has recently been circulating inside the target organism. Studying RNA from symbionts can provide information about the immediate surroundings of the host organism and should potentially be taken into account in gene expression studies, since the interaction of a host plant with its symbiome can alter the plant's gene expression and mask or mimic changes in expression at the RNA level and in the overall phenotype.
We analyzed data from our own RNA-seq experiments on Secale cereale (five accessions) and publicly available datasets (60 accessions) from the NCBI SRA. For taxonomic classification we used Kraken2 with a full RefSeq database combined with the rye genome, rye-specific viruses and aphids not yet included in RefSeq. We compared the results obtained by searching quality-trimmed raw reads with those obtained from the de novo assembled transcriptome and found them comparable, which allows us to use quality-trimmed raw reads for the rest of the work, as this gives a direct count of reads for each species.
Every accession, even from plants grown in a controlled greenhouse, can contain a number of contaminating species.
After removing common laboratory contaminants and barcoding artifacts from the list, the aphids, symbiotic fungi, bacteria and viruses present can be compared between accessions (Figs. 2, 3).
For biological replicates grown under the same conditions, the distribution of reads from symbiotic organisms is similar, but it differs between parts of the plant (Fig. 4).
Across geographic locations and harvest years, the distribution of reads from symbiotic organisms is not uniform, which supports the idea of using sequencing data to screen for the distribution of plant pests and symbiotic organisms.
This approach seems feasible for a number of reasons: (1) the data are already available and will become even more accessible in the future; (2) it may be possible to devise a semi-quantitative analysis; (3) the data can be reanalyzed as new organisms are sequenced; (4) traditional screening by ITS or 16S rDNA sequencing would yield mostly ITS sequences of rye itself and 16S rDNA of its mitochondria and chloroplasts, while specialized test systems can detect only a set of known species and are therefore not an option for screening a whole symbiome.
Nevertheless, a number of additional metadata fields describing an SRA experiment are required for this approach to be useful; these are proposed in the conclusions section.
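The read-counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name), without the optional minimizer columns. The function names, the exclusion list and the toy input are hypothetical and are there only to show how species-level counts could be extracted, filtered and normalized to reads per million.

```python
# Hypothetical exclusion list: the host plus one example laboratory contaminant.
# A real list would be curated for the experiment at hand.
EXCLUDED = {"Secale cereale", "Homo sapiens"}


def parse_report(report_text, excluded=EXCLUDED):
    """Return (species_counts, total_reads) from Kraken2 report text.

    species_counts maps species name to clade-level read count for every
    species-rank line that is not in `excluded`. total_reads is the sum of
    the unclassified ("U") and root ("R") clade counts.
    """
    counts = {}
    total = 0
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        name = fields[5].strip()  # names are indented by taxonomic depth
        if rank in ("U", "R"):
            total += clade_reads
        elif rank == "S" and name not in excluded:
            counts[name] = clade_reads
    return counts, total


def reads_per_million(counts, total_reads):
    """Scale raw counts so that samples of different depth are comparable."""
    return {name: n * 1_000_000 / total_reads for name, n in counts.items()}


# Toy report with made-up counts; the taxid column is a placeholder (0).
toy_report = "\n".join([
    " 10.00\t100000\t100000\tU\t0\tunclassified",
    " 90.00\t900000\t0\tR\t0\troot",
    " 85.00\t850000\t850000\tS\t0\t      Secale cereale",
    "  3.00\t30000\t30000\tS\t0\t      Rhopalosiphum padi",
    "  2.00\t20000\t20000\tS\t0\t      Homo sapiens",
])

counts, total = parse_report(toy_report)
rpm = reads_per_million(counts, total)
```

Normalizing by the total number of reads is one simple choice for a semi-quantitative comparison between accessions; other normalizations (for example, by host reads only) may suit a given experiment better.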
Original language: English
State: Published, 21 Jul 2021
Event: 5th International Conference on Bioinformatics - From Algorithms to Applications (BIATA), Saint Petersburg, Russian Federation
Duration: 12 Jul 2021 to 15 Jul 2021

