Data Processing // Bioinformatics and Analytics Core // University of Missouri

Processing and Workflow Methods

The BAC has built and maintains several high-throughput, production-level data processing and analysis workflows that handle common bioinformatics processing needs for other service units on the MU campus. These workflows are built predominantly with Nextflow to provide robust, stable tools that follow best practices. Researchers who receive data from these workflows can find basic information below about the processing and analysis performed. If you have questions, please contact us for the latest information.

GTC Processing Workflow

Updated 2021-12-10

All sequencing data from the MU GTC are processed through this workflow unless otherwise requested. Raw reads are demultiplexed with bcl-convert (v3.8.2-12-g85770e0b) and then trimmed with fastp (v0.23.1) using default settings. For additional quality control, a sub-sampled portion of the reads is mapped to an rRNA reference to check for contamination introduced during sample preparation: seqtk sub-samples the reads, minimap2 maps them, and samtools stats (v1.14) reports the percentage of reads mapped to a RefSeq rRNA reference. Overall quality control statistics are generated with MultiQC (v1.12).
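The rRNA contamination check comes down to a single ratio. As a minimal sketch, assuming the percentage is computed as `reads mapped` divided by `raw total sequences` from the summary-number (SN) lines that `samtools stats` prints (the workflow's exact formula is not stated here), the value could be pulled out in Python as follows. `percent_reads_mapped` is a hypothetical helper for illustration, not part of the workflow.

```python
def percent_reads_mapped(stats_text: str) -> float:
    """Return the percentage of reads mapped, parsed from `samtools stats` output.

    Assumption: percentage = 100 * 'reads mapped' / 'raw total sequences',
    both taken from the tab-separated SN summary lines.
    """
    summary = {}
    for line in stats_text.splitlines():
        # Only SN lines carry the summary numbers; skip CHK, RL, COV, etc.
        if not line.startswith("SN\t"):
            continue
        # SN lines look like: "SN\treads mapped:\t1234\t# optional comment"
        fields = line.split("\t")
        key = fields[1].rstrip(":")
        summary[key] = fields[2]

    total = int(summary["raw total sequences"])
    mapped = int(summary["reads mapped"])
    # Guard against an empty sub-sample.
    if total == 0:
        return 0.0
    return 100.0 * mapped / total
```

For example, an SN block reporting 1000 raw total sequences and 37 mapped reads would give 3.7, meaning roughly 3.7% of the sub-sampled reads aligned to the rRNA reference.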

Metagenomics Workflow

Updated 2021-12-10

The metagenomics workflow is built on the QIIME2 (2021.8.0) metagenomics pipelines. Currently, 16S and ITS amplicon sequencing data can be analyzed with this workflow; analysis is fully automated for sequencing performed at the MU GTC and available on demand for data from other sources. A custom classifier is built for each analysis type using a naive Bayes model. The 16S classifier is based on the SILVA (v.138) 99% reference database and is targeted for PE250 reads. The ITS classifier is based on the UNITE (v.10.05.2021) 99% reference database.
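To show roughly how a naive Bayes taxonomic classifier assigns a read, the sketch below trains a multinomial naive Bayes model on overlapping k-mer counts from labeled reference sequences and scores a query against each taxon. It is a simplified, pure-Python illustration of the general technique, not QIIME2's implementation; the class name `KmerNaiveBayes`, the choice of k, the Laplace smoothing parameter, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a DNA sequence (uppercased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k: int = 4, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, seqs, labels):
        self.kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.totals = Counter()                  # taxon -> total k-mers seen
        self.vocab = set()                       # all k-mers seen in training
        for seq, label in zip(seqs, labels):
            counts = Counter(kmers(seq, self.k))
            self.kmer_counts[label].update(counts)
            self.totals[label] += sum(counts.values())
            self.vocab.update(counts)
        # Class priors from the label frequencies in the reference set.
        n = len(labels)
        self.log_priors = {c: math.log(m / n) for c, m in Counter(labels).items()}
        return self

    def predict(self, seq: str) -> str:
        query = Counter(kmers(seq, self.k))
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, log_prior in self.log_priors.items():
            denom = self.totals[label] + self.alpha * v
            score = log_prior
            for kmer, count in query.items():
                # Ignore k-mers never seen in training (fixed vocabulary).
                if kmer not in self.vocab:
                    continue
                p = (self.kmer_counts[label][kmer] + self.alpha) / denom
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space keeps the product of many small k-mer probabilities from underflowing, and smoothing prevents a single unseen k-mer from zeroing out a taxon's score.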