Identifying components of mixed and contaminated soil samples by detecting specific signatures of control 16S rRNA libraries

A. A. Igolkina, G. A. Grekhov, E. V. Pershina, G. G. Samosorov, V. M. Leunova, A. N. Semenov, O. A. Baturina, M. R. Kabilov, E. E. Andronov

Research output

Abstract

Identifying which control components are present in a test soil sample that is mixed, contaminated, improperly stored or damaged is an important problem in soil forensics, soil monitoring and other types of soil analysis. The problem reduces to determining whether two soil samples, test and control, have the same origin or source. Here, we propose an algorithm that addresses this problem using 16S rRNA gene libraries of test and control soil samples and does not rely on OTU clustering. The algorithm first extracts the Library-SPECific sets of sequences (LSPECs) for the alternative control libraries and then quantifies the signals of these LSPECs in a test library. Heavy use of suffix arrays for sequence comparison accelerates the algorithm significantly. To evaluate the algorithm's performance, we collected a control set of 29 soil samples and created two test sets (real and simulated) containing mixed, contaminated and extremely small single-source soil samples (the last of these resemble forensic samples). We then carried out 16S rRNA amplicon sequencing of total soil DNA isolated from both test and control soil samples. The algorithm successfully identified the origin of all single-source soil samples and the compositions of mixed samples and of samples with low or even high contamination. The algorithm was also robust to an increase in control set size from 9 to 29. We believe the proposed algorithm is suitable for identification problems of varying complexity and is flexible enough to handle other molecular markers and microbiological samples from non-soil sources.
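
The two steps named in the abstract (extract library-specific sequences, then measure their signal in a test library) can be illustrated with a toy Python sketch. It is not the authors' implementation, and the abstract does not define the details, so everything below is an assumption made for illustration: a sequence counts as library-specific if it occurs in exactly one control library, matching is exact, and the signal is the fraction of a control's specific sequences found in the test library. The function names, the `$` separator and the sample reads are hypothetical, and the suffix array is built naively.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction that sorts whole suffixes; adequate for a toy example only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text: str, sa: list[int], pattern: str) -> bool:
    """Binary-search the suffix array for a suffix that starts with pattern."""
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text.startswith(pattern, sa[lo])


def extract_lspecs(controls: dict[str, list[str]]) -> dict[str, set[str]]:
    """Assumed definition: sequences seen in one control library and no other."""
    result = {}
    for name, reads in controls.items():
        others = set()
        for other_name, other_reads in controls.items():
            if other_name != name:
                others.update(other_reads)
        result[name] = set(reads) - others
    return result


def lspec_signals(test_reads: list[str], lspecs: dict[str, set[str]]) -> dict[str, float]:
    """Assumed signal: share of each control's specific sequences found in the test reads."""
    # "$" never occurs in a DNA read, so a match cannot span two reads.
    text = "$".join(test_reads)
    sa = build_suffix_array(text)
    signals = {}
    for name, seqs in lspecs.items():
        hits = sum(contains(text, sa, s) for s in seqs)
        signals[name] = hits / len(seqs) if seqs else 0.0
    return signals


# Hypothetical toy libraries; real 16S rRNA reads are far longer and noisier.
controls = {
    "forest": ["ACGTAC", "GGGTTT", "TTTAAA"],
    "field": ["ACGTAC", "CCCGGG", "GATTAC"],
}
test_reads = ["GGGTTT", "TTTAAA", "CCCGGG", "AAAAAA"]

lspecs = extract_lspecs(controls)
signals = lspec_signals(test_reads, lspecs)
print(signals)
```

In this toy case the shared read `ACGTAC` is excluded from both specific sets, and the test library carries all of the "forest" signature but only half of the "field" signature, which is the kind of contrast a mixed sample would produce.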

Original language: English
Pages (from-to): 446-453
Number of pages: 8
Journal: Ecological Indicators
Volume: 94
DOI: 10.1016/j.ecolind.2018.06.060
Publication status: Published - 1 Nov 2018

Fingerprint

polluted soils
soil sampling
ribosomal RNA
soil
testing
RNA libraries
control components
soil analysis
contaminated soil
library
Soil
sampling
probes (equipment)
soil test
genetic markers
monitoring
extracts
DNA
probe
test

Scopus subject areas

  • Decision Sciences (all)
  • Ecology, Evolution, Behavior and Systematics
  • Ecology

Cite this

Igolkina, A. A. ; Grekhov, G. A. ; Pershina, E. V. ; Samosorov, G. G. ; Leunova, V. M. ; Semenov, A. N. ; Baturina, O. A. ; Kabilov, M. R. ; Andronov, E. E. / Identifying components of mixed and contaminated soil samples by detecting specific signatures of control 16S rRNA libraries. In: Ecological Indicators. 2018 ; Vol. 94. pp. 446-453.
@article{eb22c36d398d43a19ec7fe0767054ed9,
title = "Identifying components of mixed and contaminated soil samples by detecting specific signatures of control 16S rRNA libraries",
abstract = "Identifying which control components are present in a test soil sample that is mixed, contaminated, improperly stored or damaged is an important problem in soil forensics, soil monitoring and other types of soil analysis. The problem reduces to determining whether two soil samples, test and control, have the same origin or source. Here, we propose an algorithm that addresses this problem using 16S rRNA gene libraries of test and control soil samples and does not rely on OTU clustering. The algorithm first extracts the Library-SPECific sets of sequences (LSPECs) for the alternative control libraries and then quantifies the signals of these LSPECs in a test library. Heavy use of suffix arrays for sequence comparison accelerates the algorithm significantly. To evaluate the algorithm's performance, we collected a control set of 29 soil samples and created two test sets (real and simulated) containing mixed, contaminated and extremely small single-source soil samples (the last of these resemble forensic samples). We then carried out 16S rRNA amplicon sequencing of total soil DNA isolated from both test and control soil samples. The algorithm successfully identified the origin of all single-source soil samples and the compositions of mixed samples and of samples with low or even high contamination. The algorithm was also robust to an increase in control set size from 9 to 29. We believe the proposed algorithm is suitable for identification problems of varying complexity and is flexible enough to handle other molecular markers and microbiological samples from non-soil sources.",
keywords = "16S rRNA, Contaminated soil, Mixed samples, Soil signature, Source identification, Suffix array",
author = "Igolkina, {A. A.} and Grekhov, {G. A.} and Pershina, {E. V.} and Samosorov, {G. G.} and Leunova, {V. M.} and Semenov, {A. N.} and Baturina, {O. A.} and Kabilov, {M. R.} and Andronov, {E. E.}",
year = "2018",
month = "11",
day = "1",
doi = "10.1016/j.ecolind.2018.06.060",
language = "English",
volume = "94",
pages = "446--453",
journal = "Ecological Indicators",
issn = "1470-160X",
publisher = "Elsevier",
}

Identifying components of mixed and contaminated soil samples by detecting specific signatures of control 16S rRNA libraries. / Igolkina, A. A.; Grekhov, G. A.; Pershina, E. V.; Samosorov, G. G.; Leunova, V. M.; Semenov, A. N.; Baturina, O. A.; Kabilov, M. R.; Andronov, E. E.

In: Ecological Indicators, Vol. 94, 01.11.2018, p. 446-453.

TY  - JOUR
T1  - Identifying components of mixed and contaminated soil samples by detecting specific signatures of control 16S rRNA libraries
AU  - Igolkina, A. A.
AU  - Grekhov, G. A.
AU  - Pershina, E. V.
AU  - Samosorov, G. G.
AU  - Leunova, V. M.
AU  - Semenov, A. N.
AU  - Baturina, O. A.
AU  - Kabilov, M. R.
AU  - Andronov, E. E.
PY  - 2018/11/1
Y1  - 2018/11/1
AB  - Identifying which control components are present in a test soil sample that is mixed, contaminated, improperly stored or damaged is an important problem in soil forensics, soil monitoring and other types of soil analysis. The problem reduces to determining whether two soil samples, test and control, have the same origin or source. Here, we propose an algorithm that addresses this problem using 16S rRNA gene libraries of test and control soil samples and does not rely on OTU clustering. The algorithm first extracts the Library-SPECific sets of sequences (LSPECs) for the alternative control libraries and then quantifies the signals of these LSPECs in a test library. Heavy use of suffix arrays for sequence comparison accelerates the algorithm significantly. To evaluate the algorithm's performance, we collected a control set of 29 soil samples and created two test sets (real and simulated) containing mixed, contaminated and extremely small single-source soil samples (the last of these resemble forensic samples). We then carried out 16S rRNA amplicon sequencing of total soil DNA isolated from both test and control soil samples. The algorithm successfully identified the origin of all single-source soil samples and the compositions of mixed samples and of samples with low or even high contamination. The algorithm was also robust to an increase in control set size from 9 to 29. We believe the proposed algorithm is suitable for identification problems of varying complexity and is flexible enough to handle other molecular markers and microbiological samples from non-soil sources.
KW  - 16S rRNA
KW  - Contaminated soil
KW  - Mixed samples
KW  - Soil signature
KW  - Source identification
KW  - Suffix array
UR  - http://www.scopus.com/inward/record.url?scp=85049732972&partnerID=8YFLogxK
U2  - 10.1016/j.ecolind.2018.06.060
DO  - 10.1016/j.ecolind.2018.06.060
M3  - Article
AN  - SCOPUS:85049732972
VL  - 94
SP  - 446
EP  - 453
JO  - Ecological Indicators
JF  - Ecological Indicators
SN  - 1470-160X
ER  -