
Getting started#

Installation#

1. Nextflow and third-party software#

Nextflow runs on any POSIX-compatible system (Linux, macOS, or Windows via WSL). It requires Bash 3.2 or later and Java 11 or later (up to version 18).
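
The Java requirement above can be checked before downloading anything. The sketch below is one possible way to do it, not part of the pipeline: the function names `java_major` and `java_supported` are illustrative, and the parsing assumes the usual `java -version` output, where the version string appears in double quotes.

```shell
#!/usr/bin/env bash
# Extract the major Java version from strings such as
# "1.8.0_292" (Java 8), "11.0.19", or "17".
java_major() {
  local v="$1"
  v="${v#1.}"            # legacy numbering: 1.8.x means Java 8
  echo "${v%%[._+-]*}"   # keep everything before the first separator
}

# Succeed when the major version lies in the 11 to 18 range quoted above.
java_supported() {
  local major
  major="$(java_major "$1")"
  [ "$major" -ge 11 ] 2>/dev/null && [ "$major" -le 18 ]
}

# Report on the local installation, if any (java prints its version to stderr).
if command -v java >/dev/null 2>&1; then
  ver="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if java_supported "$ver"; then
    echo "Java $ver is within the supported range"
  else
    echo "Java $ver is outside the supported range (11 to 18)"
  fi
fi
```

The Bash version can be read in the same spirit from `${BASH_VERSINFO[0]}` and `${BASH_VERSINFO[1]}`.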

Download the Nextflow launcher into the current directory:

wget -qO- https://get.nextflow.io | bash

After the download finishes, two short steps remain:

  • Make the binary executable on your system by running chmod +x nextflow.

  • Optionally, move the nextflow file to a directory accessible by your $PATH variable (this is only required to avoid remembering and typing the full path to nextflow each time you need to run it).
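
The two steps above can be sketched as a small helper. This is a minimal example under stated assumptions: `install_launcher` is a hypothetical name, and the target directory (`$HOME/bin` in the commented usage line) is only one common choice.

```shell
#!/usr/bin/env bash
# Make a downloaded launcher executable and move it into a directory
# that is (or can be put) on PATH. Both arguments are illustrative.
install_launcher() {
  local launcher="$1" bin_dir="$2"
  chmod +x "$launcher" || return 1
  mkdir -p "$bin_dir" || return 1
  mv "$launcher" "$bin_dir/" || return 1
  # Print a reminder when the target directory is not on PATH yet
  case ":$PATH:" in
    *":$bin_dir:"*) ;;
    *) echo "Add to your shell profile: export PATH=\"$bin_dir:\$PATH\"" ;;
  esac
}

# Example, assuming ./nextflow was just downloaded:
# install_launcher ./nextflow "$HOME/bin"
```

Once the directory is on `$PATH`, `nextflow -version` should work from any location.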

2. Containerization#

In line with contemporary pipelines, the BTC scRNA pipeline runs its steps inside multiple Docker containers. Different computational environments rely on different container technologies, such as Docker (v20.10.22) and Singularity (v3.7.0). For instance, HPC systems typically depend on Singularity, so it should be explicitly defined in the profile configuration. For a better understanding, refer to the advanced section. Additionally, check the containers repository.

Warning

Please note that the Docker/Singularity images will be downloaded automatically by the pipeline.
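
Before choosing a profile, it can help to confirm which container engine the current machine offers. The snippet below is a rough sketch: `detect_engine` is a hypothetical helper, and it only checks whether `docker` or `singularity` is on `$PATH`, not whether the pipeline's profiles are configured for it.

```shell
#!/usr/bin/env bash
# Print the first container engine found on PATH, or "none".
detect_engine() {
  local engine
  for engine in docker singularity; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  echo "none"
  return 1
}

engine="$(detect_engine)" || true
echo "Container engine: $engine"
# Show the installed version so it can be compared with the ones listed above
if [ "$engine" != "none" ]; then
  "$engine" --version || true
fi
```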

3. Cloning scRNA-Seq Pipeline#

git clone --recurse-submodules https://github.com/WangLab-ComputationalBiology/btc-scrna-pipeline

4. Running single-cell pipeline#

The pipeline requires four parameters: project_name, sample_csv, meta_data, and cancer_type. In particular, the sample table (sample_csv) and the metadata file (meta_data) must follow the mandatory formats described below.

4.1. Preparing inputs#

The sample table must be a CSV file containing three columns: sample, fastq_1, and fastq_2. The values in the sample column are linked to all reports generated by the pipeline and are also required for merging the metadata with the Seurat object. Example sample sheet:

sample,fastq_1,fastq_2
SPECTRUM-OV-009_S1_CD45N_BOWEL,path/to/fastq/SPECTRUM-OV-009_S1_CD45N_BOWEL_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45N_BOWEL_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-009_S1_CD45N_LEFT_OVARY,path/to/fastq/SPECTRUM-OV-009_S1_CD45N_LEFT_OVARY_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45N_LEFT_OVARY_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-009_S1_CD45P_ASCITES,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_ASCITES_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_ASCITES_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-009_S1_CD45P_BOWEL,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_BOWEL_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_BOWEL_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-009_S1_CD45P_LEFT_UPPER_QUADRANT,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_LEFT_UPPER_QUADRANT_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_LEFT_UPPER_QUADRANT_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-009_S1_CD45P_RIGHT_UPPER_QUADRANT,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_RIGHT_UPPER_QUADRANT_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-009_S1_CD45P_RIGHT_UPPER_QUADRANT_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-022_S1_CD45N_RIGHT_ADNEXA,path/to/fastq/SPECTRUM-OV-022_S1_CD45N_RIGHT_ADNEXA_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-022_S1_CD45N_RIGHT_ADNEXA_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-022_S1_CD45P_BOWEL,path/to/fastq/SPECTRUM-OV-022_S1_CD45P_BOWEL_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-022_S1_CD45P_BOWEL_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA,path/to/fastq/SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-065_S1_CD45N_INFRACOLIC_OMENTUM,path/to/fastq/SPECTRUM-OV-065_S1_CD45N_INFRACOLIC_OMENTUM_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-065_S1_CD45N_INFRACOLIC_OMENTUM_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-065_S1_CD45P_ASCITES,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_ASCITES_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_ASCITES_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-065_S1_CD45P_INFRACOLIC_OMENTUM,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_INFRACOLIC_OMENTUM_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_INFRACOLIC_OMENTUM_S1_L001_R2_001.fastq.gz
SPECTRUM-OV-065_S1_CD45P_RIGHT_OVARY,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_RIGHT_OVARY_S1_L001_R1_001.fastq.gz,path/to/fastq/SPECTRUM-OV-065_S1_CD45P_RIGHT_OVARY_S1_L001_R2_001.fastq.gz
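
A sample table can be sanity-checked before launching a run. The sketch below is not part of the pipeline: `check_sample_table` is a hypothetical helper, and it assumes a comma-separated file whose header is exactly `sample,fastq_1,fastq_2`. Passing `yes` as the second argument also checks that each FASTQ path exists.

```shell
#!/usr/bin/env bash
# Check a sample table: exact header, three fields per row, and
# (optionally) that every FASTQ path exists on disk.
check_sample_table() {
  local csv="$1" check_files="${2:-no}"
  local header sample fq1 fq2 f status=0
  header="$(head -n 1 "$csv" | tr -d '\r')"
  if [ "$header" != "sample,fastq_1,fastq_2" ]; then
    echo "Unexpected header: $header" >&2
    return 1
  fi
  # Report non-empty rows whose field count is not three
  awk -F',' 'NR > 1 && NF > 0 && NF != 3 {
      printf "Line %d has %d fields\n", NR, NF; bad = 1
    } END { exit bad }' "$csv" >&2 || status=1
  if [ "$check_files" = "yes" ]; then
    while IFS=',' read -r sample fq1 fq2 || [ -n "$sample" ]; do
      [ -n "$sample" ] || continue
      for f in "$fq1" "$fq2"; do
        [ -f "$f" ] || { echo "Missing FASTQ for $sample: $f" >&2; status=1; }
      done
    done < <(tail -n +2 "$csv" | tr -d '\r')
  fi
  return "$status"
}

# Example:
# check_sample_table path/to/sample_table.csv yes
```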

The metadata file, also in CSV format, should include columns pertinent to the experimental design, such as batch and cell sorting status. It can also contain additional biological information about each sample. The batch variable is used to correct for technical effects. In this version of the pipeline, correction is based on a single variable. Example metadata:

patient_id,sample_id,Sort,source_name,batch
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_BOWEL,CD45-,Bowel,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_INFRACOLIC_OMENTUM,CD45-,Omentum,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_LEFT_OVARY,CD45-,Adnexa,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_LEFT_UPPER_QUADRANT,CD45-,UQ,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_PELVIC_PERITONEUM,CD45-,Peritoneum,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_RIGHT_OVARY,CD45-,Adnexa,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45N_RIGHT_UPPER_QUADRANT,CD45-,UQ,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_ASCITES,CD45+,Ascites,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_BOWEL,CD45+,Bowel,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_INFRACOLIC_OMENTUM,CD45+,Omentum,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_LEFT_OVARY,CD45+,Adnexa,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_LEFT_UPPER_QUADRANT,CD45+,UQ,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_PELVIC_PERITONEUM,CD45+,Peritoneum,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_RIGHT_OVARY,CD45+,Adnexa,SPECTRUM-OV-009
SPECTRUM-OV-009,SPECTRUM-OV-009_S1_CD45P_RIGHT_UPPER_QUADRANT,CD45+,UQ,SPECTRUM-OV-009
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45N_ASCITES,CD45-,Ascites,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45N_BOWEL,CD45-,Bowel,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45N_LEFT_ADNEXA,CD45-,Adnexa,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45N_RIGHT_ADNEXA,CD45-,Adnexa,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45P_ASCITES,CD45+,Ascites,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45P_BOWEL,CD45+,Bowel,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45P_LEFT_ADNEXA,CD45+,Adnexa,SPECTRUM-OV-022
SPECTRUM-OV-022,SPECTRUM-OV-022_S1_CD45P_RIGHT_ADNEXA,CD45+,Adnexa,SPECTRUM-OV-022
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45N_ASCITES,CD45-,Ascites,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45N_INFRACOLIC_OMENTUM,CD45-,Omentum,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45N_RIGHT_FALLOPIAN_TUBE,CD45-,Adnexa,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45N_RIGHT_OVARY,CD45-,Adnexa,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45P_ASCITES,CD45+,Ascites,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45P_INFRACOLIC_OMENTUM,CD45+,Omentum,SPECTRUM-OV-065
SPECTRUM-OV-065,SPECTRUM-OV-065_S1_CD45P_RIGHT_OVARY,CD45+,Adnexa,SPECTRUM-OV-065

Warning

Internally, the pipeline expects a column named batch in the metadata file. This column is used to perform the batch correction.
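
Both metadata requirements can be checked up front. The following is a sketch under stated assumptions, not the pipeline's own validation: `has_column` and `samples_missing_metadata` are hypothetical helpers, the files are assumed to be comma-separated, and the sample column of the sample table is assumed to correspond to the sample_id column (second column) of the metadata, as in the example above.

```shell
#!/usr/bin/env bash
# Succeed when the header of a comma-separated file contains the named column.
has_column() {
  local csv="$1" column="$2"
  head -n 1 "$csv" | tr -d '\r' | tr ',' '\n' | grep -qxF -- "$column"
}

# Print sample names from the sample table that have no row in the metadata.
# Assumption: column 1 of the sample table matches column 2 (sample_id)
# of the metadata file.
samples_missing_metadata() {
  local sample_csv="$1" meta_csv="$2"
  awk -F',' '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    FNR == 1 || NF == 0 { next }       # skip headers and empty lines
    NR == FNR { seen[$2] = 1; next }   # first file: metadata sample_id
    !($1 in seen) { print $1 }         # second file: sample table
  ' "$meta_csv" "$sample_csv"
}

# Example:
# has_column path/to/meta_data.csv batch || echo "batch column is missing"
# samples_missing_metadata path/to/sample_table.csv path/to/meta_data.csv
```

An empty result from `samples_missing_metadata` means every sample in the sample table has a metadata row.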

4.2. Minimal command-line#

To execute the pipeline, use the command-line structure outlined below. Please note the difference between options with one dash (-), which are Nextflow options, and options with two dashes (--), which are pipeline parameters. Two-dash parameters control pipeline-specific behavior, such as filtering thresholds in the single-cell analysis.

nextflow run main.nf --project_name <PROJECT> --sample_csv <path/to/sample_table.csv> --meta_data <path/to/meta_data.csv> --cancer_type <CANCER TYPE> -resume -profile <PROFILE>

The pipeline creates a folder named after the value of the --project_name parameter. This folder contains all the results. The -resume option leverages Nextflow caching: completed steps are reused rather than re-executed, which avoids unnecessary computation.
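
To keep the two kinds of options apart in a script, the command can be assembled in two groups. This is only a sketch: `build_run_cmd` is a hypothetical helper that prints the command instead of running it, and the argument values are placeholders.

```shell
#!/usr/bin/env bash
# Assemble the minimal command line and print it (dry run).
build_run_cmd() {
  local project="$1" samples="$2" meta="$3" cancer="$4" profile="$5"
  local -a cmd=(nextflow run main.nf)
  # Two dashes: parameters read by the pipeline itself
  cmd+=(--project_name "$project" --sample_csv "$samples")
  cmd+=(--meta_data "$meta" --cancer_type "$cancer")
  # One dash: options read by Nextflow
  cmd+=(-resume -profile "$profile")
  # %q quotes each word so the printed line can be pasted back into a shell
  printf '%q ' "${cmd[@]}"
  echo
}

# Example:
# build_run_cmd BTC-CANCER-X path/to/sample_table.csv path/to/meta_data.csv "CANCER TYPE X" "<PROFILE>"
```

To launch the run rather than print it, the array could be executed directly with `"${cmd[@]}"` inside the function.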

4.3. Staging images and genome indexes#

The pipeline needs to stage (download) several components before it can run, which can be a problem in HPC environments with strict network policies. As a workaround, consider running with the -stub option on a node that has a network connection. A -stub run stages all the necessary components without executing any analysis, so it serves as a bootstrap run for the pipeline. Please note that a stub run generates dummy outputs.

nextflow run main.nf --project_name <PROJECT> --sample_csv <path/to/sample_table.csv> --meta_data <path/to/meta_data.csv> --cancer_type <CANCER TYPE> -resume -profile <PROFILE> -stub

4.4. Shortening the command line#

Long command lines are error-prone. Nextflow's -params-file option simplifies them: it takes a JSON file holding all the parameters for a specific run. If you are trying out different settings, it is good practice to keep a separate file for each test, e.g., PARAMS_TEST_01.json or PARAMS_TEST_02.json.

{
  "project_name": "BTC-CANCER-X",
  "sample_csv": "path/to/sample_table.csv",
  "meta_data": "path/to/meta_data.csv",
  "cancer_type": "CANCER TYPE X",
  "thr_mean_reads_per_cells": 10000
}

Note that other parameters can be added to the -params-file. For the full list, please check the command-line documentation.

nextflow run main.nf -params-file <PARAMS.json> -resume -profile <PROFILE>
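
Keeping one params file per test is easier when the files are generated rather than edited by hand. The sketch below assumes the five keys shown in the JSON example; `write_params` and `validate_json` are hypothetical helpers, the values must not contain double quotes or backslashes, and the optional syntax check assumes `python3` is available locally.

```shell
#!/usr/bin/env bash
# Write one params file for a run. Arguments: output path, project name,
# sample table, metadata file, cancer type, reads-per-cell threshold.
write_params() {
  local out="$1" project="$2" samples="$3" meta="$4" cancer="$5" thr="$6"
  cat > "$out" <<EOF
{
  "project_name": "$project",
  "sample_csv": "$samples",
  "meta_data": "$meta",
  "cancer_type": "$cancer",
  "thr_mean_reads_per_cells": $thr
}
EOF
}

# Optional syntax check using Python's built-in JSON tool.
validate_json() {
  python3 -m json.tool "$1" >/dev/null
}

# Example: two tests that differ only in the threshold
# write_params PARAMS_TEST_01.json BTC-CANCER-X path/to/sample_table.csv path/to/meta_data.csv "CANCER TYPE X" 10000
# write_params PARAMS_TEST_02.json BTC-CANCER-X path/to/sample_table.csv path/to/meta_data.csv "CANCER TYPE X" 5000
```

Each generated file can then be passed to the run with -params-file, which keeps the settings of every test on record.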

5. Expected outputs#
