I recently started learning metagenomic binning and installed metaWRAP following the official documentation. I ran into a long series of pitfalls along the way, so here are the errors I hit and how I resolved them:
1. metaWRAP installation
Installation tutorial and download address: GitHub - bxlab/metaWRAP: MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
The author recommends installing with Conda/Mamba rather than through bioconda or Docker, so I found a Docker image that already contained conda and took the first step of a long journey:
(1) Installing the software with conda
conda create -y -n metawrap-env python=2.7
source activate metawrap-env
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -y -c ursky metawrap-mg
conda install -y blas=2.5=mkl
After installation the image was about 5 GB, and I pushed it to Docker Hub:
docker push raser216/metawrap:v1.0.0
I thought that was the end of it, but a series of errors followed.
(2) Installing the libtbb2 library
When the pipeline reached quant_bins, a dependency library turned out to be missing, so salmon failed while estimating gene abundance:
salmon: error while loading shared libraries: libtbb.so.2
Solution:
# Install the libtbb2 library
apt-get install libtbb2
(3) Installing libGL.so.1
The figures directory of the bin_refinement step contained no images, and the Python plotting script reported an error:
ImportError: Failed to import any qt binding
# matplotlib is already installed for python2.7, but it cannot be imported
import matplotlib
import matplotlib.pyplot as plt
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution: install the package that provides libGL.so.1.
apt-get -y update
apt-get install -y libgl1-mesa-glx
# After installation, python2.7 can import the module without errors
python2.7 -c "import matplotlib.pyplot as plt"
(4) prokka fails to run after installation
prokka could not be used at all; the installation was broken.
Possible cause: the Perl version installed with metaWRAP does not meet prokka's requirements (metaWRAP does not support Perl 5.26?).
prokka -h
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/metawrap-env/bin/../perl5 /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2//x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/ /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2 /opt/conda/envs/metawrap-env/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/5.26.2 .) at /opt/conda/envs/metawrap-env/bin/prokka line 32.
BEGIN failed--compilation aborted at /opt/conda/envs/metawrap-env/bin/prokka line 32.
Solution: reinstall prokka 1.13 with conda. The commands below put it into a separate environment named prokka-test:
conda create -n prokka-test prokka=1.13 minced=0.3.0 parallel=20180522 blast=2.12.0
source activate prokka-test
2. conda errors
(1) Unable to activate the conda environment
The conda environment could not be activated with source activate metawrap-env inside a shell script. The following error was reported:
/opt/conda/envs/metawrap-env/etc/conda/activate.d/activate-binutils_linux-64.sh: line 65: ADDR2LINE: unbound variable
Solution: activate the conda environment from the Dockerfile and add the environment's bin directory to PATH:
cat metawrap_v1.dockerfile
# The dockerfile is as follows
FROM raser216/metawrap:v1.0.0
RUN echo "source activate metawrap-env" > ~/.bashrc
ENV PATH /opt/conda/envs/metawrap-env/bin:$PATH
3. Database path and version
The databases used by the tools that metaWRAP calls (kraken, BLAST and so on) can live outside the environment, but their paths must be written into the config file:
# Path of the config file
which config-metawrap
/opt/conda/envs/metawrap-env/bin/config-metawrap
# Use sed -i to replace the placeholders with the real path of each database
kraken_database=/database/kraken_database/kraken_newdb2/axel_dowload
nt_database=/database/newdownload3
tax_database=/database/metawrap_database/ncbi_taxonomy
sed -i "s#~/KRAKEN_DB#$kraken_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_NT_DB#$nt_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_TAX_DB#$tax_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
The file must be writable, otherwise the bin_refinement step fails (note that the message itself names CheckM's DATA_CONFIG file):
# Error in the bin_refinement step
You do not seem to have permission to edit the checkm config file located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Solution: change the permissions of the config file and no more errors will be reported.
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap
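The sed substitutions in this section amount to replacing three placeholder strings with real paths. A minimal Python sketch of the same idea is shown here for readers who prefer to script it; the placeholders and paths are the ones from the sed commands, while the variable names on the left of each `=` in `sample_config` and the helper name `fill_database_paths` are assumptions for illustration only.

```python
def fill_database_paths(config_text, paths):
    """Replace each placeholder in the config text with its real path."""
    for placeholder, real_path in paths.items():
        config_text = config_text.replace(placeholder, real_path)
    return config_text


# Hypothetical config content; only the "~/..." placeholders come from the sed commands.
sample_config = (
    "KRAKEN_DB=~/KRAKEN_DB\n"
    "BLASTDB=~/NCBI_NT_DB\n"
    "TAXDUMP=~/NCBI_TAX_DB\n"
)

paths = {
    "~/KRAKEN_DB": "/database/kraken_database/kraken_newdb2/axel_dowload",
    "~/NCBI_NT_DB": "/database/newdownload3",
    "~/NCBI_TAX_DB": "/database/metawrap_database/ncbi_taxonomy",
}

filled = fill_database_paths(sample_config, paths)
print(filled)
```

The sed commands use # as the delimiter for the same reason this sketch avoids regular expressions: the paths contain many / characters that would otherwise need escaping.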
4. kraken errors
kraken assigns taxonomic labels directly to sequencing reads (fastq). There are currently two major versions: the first generation (kraken) has very high memory consumption (> 100 GB), while the second generation (kraken 2) is much improved (about 35 GB).
(1) Error caused by trailing comments
The kraken.sh script is located in /opt/conda/envs/metawrap-env/bin/metawrap-modules/. On lines 123-125 of the script, the comments are written directly after the line-continuation backslash, which causes an error (I did not record the error message):
123 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \ #combine paired end reads onto one line
124 shuf | head -n $depth | sed 's/\t\t/\n/g' | \ #shuffle reads, select top N reads, and then restore tabulation
125 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}' #separate reads into F and R files
Solution: move each comment onto its own line above the pipeline, so that nothing follows the trailing backslash:
123 # combine paired end reads onto one line, then
124 # shuffle reads, select top N reads, and then restore tabulation, then
125 # separate reads into F and R files
126 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \
127 shuf | head -n $depth | sed 's/\t\t/\n/g' | \
128 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}'
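For readers who find the awk/shuf/sed pipeline hard to follow, the three steps in the comments can be sketched in Python. This is only an illustration of the logic, assuming the pipeline receives the forward and reverse FASTQ files side by side (that upstream step is not shown in the excerpt); the function name `subsample_pairs` and the demo reads are hypothetical.

```python
import random


def subsample_pairs(forward_lines, reverse_lines, depth, seed=None):
    """Pick up to `depth` read pairs at random while keeping mates together."""
    if len(forward_lines) != len(reverse_lines) or len(forward_lines) % 4:
        raise ValueError("expected two FASTQ inputs with the same number of 4-line records")
    # Step 1: put each pair (4 forward lines + 4 reverse lines) into one unit.
    records = [
        (forward_lines[i:i + 4], reverse_lines[i:i + 4])
        for i in range(0, len(forward_lines), 4)
    ]
    # Step 2: shuffle the units and keep the first `depth` of them.
    rng = random.Random(seed)
    rng.shuffle(records)
    chosen = records[:depth]
    # Step 3: split the units back into forward and reverse outputs.
    out_f = [line for fwd, _ in chosen for line in fwd]
    out_r = [line for _, rev in chosen for line in rev]
    return out_f, out_r


# Hypothetical three read pairs.
fwd = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII", "@r3/1", "TTAA", "+", "IIII"]
rev = ["@r1/2", "TGCA", "+", "IIII", "@r2/2", "CCGG", "+", "IIII", "@r3/2", "AATT", "+", "IIII"]
out_f, out_r = subsample_pairs(fwd, rev, depth=2, seed=0)
print(out_f[::4], out_r[::4])
```

Joining the four lines of a pair into one unit before shuffling is what keeps the forward and reverse files in sync, which is the purpose of the double-tab trick in the shell version.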
(2) Script permission error
The kraken.sh script must be executable, otherwise running it fails with:
/opt/conda/envs/metawrap-env/bin/metawrap: line 69: /opt/conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: Permission denied
Solution: modify the script permission to 775 and no more errors will be reported.
chmod 775 kraken.sh
ls -l kraken.sh
-rwxrwxr-x 1 root root 8.9K Sep 22 20:12 kraken.sh
(3) Python annotation script error
In the Python script kraken2_translate.py, the dictionary names_map was looked up with an unknown key and raised a KeyError.
Something went wrong with running kraken-translate... Exiting.
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 98, in translate_kraken2_annotations
    taxonomy = get_full_name(taxid, names_map, ranks_map)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 30, in get_full_name
    name = names_map[taxid]
KeyError: '1054037'
Solution: look the value up with dict.get() instead of indexing, and handle the None case.
vi /opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py
# Modify the get_full_name function so that names_map does not raise an error when the key is missing:
for taxid in taxid_lineage:
    # name = names_map[taxid]
    name = names_map.get(taxid)
    if name is None:
        name = "unknown"
    names_lineage.append(name)
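The effect of the change can be tried in isolation. The following is a small self-contained sketch, not the actual metaWRAP code: `lineage_names` and the sample `names_map` are hypothetical, and only the missing taxid '1054037' is taken from the traceback.

```python
def lineage_names(taxid_lineage, names_map, default="unknown"):
    """Translate a list of taxids into names, tolerating taxids missing from the map."""
    names = []
    for taxid in taxid_lineage:
        # dict.get returns `default` instead of raising KeyError for unknown keys.
        names.append(names_map.get(taxid, default))
    return names


# Hypothetical names table that lacks taxid '1054037'.
names_map = {"1": "root", "2": "Bacteria"}
lineage = lineage_names(["1", "2", "1054037"], names_map)
print(lineage)
```

Passing the fallback as the second argument of dict.get() does the same job as the explicit None check in the patched script. A missing taxid usually suggests that the taxonomy files are older or newer than the kraken database, so the "unknown" entries are worth a second look.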
(4) Error when the taxonomy database is not found
The downloaded NCBI taxonomy database must be placed inside the kraken database directory, otherwise an error is reported:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 87, in translate_kraken2_annotations
    names_map, ranks_map = load_kraken_db_metadata(kraken2_db)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 50, in load_kraken_db_metadata
    with open(names_path) as input:
IOError: [Errno 2] No such file or directory: '/database/kraken_database/kraken_newdb2/axel_dowload/taxonomy/names.dmp'
Solution: copy the taxonomy database to the kraken database directory.
(5) Error when the kraken software and database versions do not match
I had been using kraken 2 before, and its (huge) database was already downloaded on the server. I did not want to download a separate kraken 1 database as well, so I tested whether the kraken 2 database works with the kraken 1 software. As expected, it does not:
kraken: database ("/database/kraken_database/kraken_newdb2/axel_dowload") does not contain necessary file database.kdb
So I decided to update the kraken version used by metaWRAP. The metaWRAP version installed by default does not support Kraken 2; it has to be updated to the latest version, 1.3.2.
Solution: update metaWRAP to version 1.3.2.
conda install -y -c ursky metawrap-mg=1.3.2
# After the update, the permissions and contents of the config file need to be set again
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap
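A mismatch like the one in (5) can be spotted before running anything by looking at the files in the database directory. The sketch below is an illustration under stated assumptions: database.kdb comes from the error message above, while database.idx and the three .k2d file names are what I understand the kraken 1 and Kraken 2 layouts to be; the function name `detect_kraken_db` is hypothetical.

```python
import os
import tempfile

# Assumed marker files for each database generation.
KRAKEN1_FILES = ("database.kdb", "database.idx")
KRAKEN2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def detect_kraken_db(db_dir):
    """Return 'kraken1', 'kraken2' or 'unknown' based on the files in db_dir."""
    def has_all(names):
        return all(os.path.isfile(os.path.join(db_dir, name)) for name in names)

    if has_all(KRAKEN2_FILES):
        return "kraken2"
    if has_all(KRAKEN1_FILES):
        return "kraken1"
    return "unknown"


# Demo on a throwaway directory that mimics a Kraken 2 database.
demo_dir = tempfile.mkdtemp()
for name in KRAKEN2_FILES:
    open(os.path.join(demo_dir, name), "w").close()
kind = detect_kraken_db(demo_dir)
print(kind)
```

Pointing the function at the real database path from the config file would show which generation of the software it can be used with.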
5. CheckM errors
(1) Line continuation error in a .py file
CheckM assesses the completeness of assembled genomes. It is used by bin_refinement, where it failed immediately:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 36, in <module>
    from checkm import main
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 140, in setRoot
    path = self.confirmPath(path=path)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 162, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line
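Solution: modify the argument of the raw_input() call in checkmData.py.
Path of the script: /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/
Cause as I understood it: on line 162 the raw_input() argument is continued onto the next line with a "\", which Python did not seem to accept here. Note that an EOFError from raw_input() generally means the prompt found no input on stdin, so setting the CheckM data root in advance (see (2) below) avoids the prompt altogether.
162 path = raw_input("Where should CheckM store it's data?\n" \
163 Please specify a location or type 'abort' to stop trying: \n")
Solution: remove the line continuation and put the whole prompt on one line.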
162 path = raw_input("Where should CheckM store it's data?\nPlease specify a location or type 'abort' to stop trying: \n")
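As a side note on the syntax involved, Python joins adjacent string literals inside parentheses, so a long prompt can be split across lines without any backslash. The sketch below shows this, and also how a prompt behaves when no input is available; `ask_path` and `no_stdin` are hypothetical helpers written for this illustration and are not part of CheckM.

```python
# Adjacent string literals inside parentheses are joined by the parser,
# so no trailing backslash is needed to continue the argument.
prompt = ("Where should CheckM store it's data?\n"
          "Please specify a location or type 'abort' to stop trying: \n")

single_line = "Where should CheckM store it's data?\nPlease specify a location or type 'abort' to stop trying: \n"
print(prompt == single_line)


def ask_path(prompt_text, reader):
    """Ask for a path; return None when no input is available (EOF)."""
    try:
        return reader(prompt_text)
    except EOFError:
        return None


def no_stdin(_prompt_text):
    """Stand-in for raw_input()/input() in a non-interactive run."""
    raise EOFError("EOF when reading a line")


print(ask_path(prompt, reader=no_stdin))
```

In a pipeline run without a terminal, the second case is the one that applies, which is why configuring the data directory ahead of time is the more robust fix.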
(2) Database not found
When CheckM runs for the first time it asks for the database location, so it is best to run checkm data setRoot right after installation and set the database path first:
checkm data setRoot
*******************************************************************************
 [CheckM - data] Check for database updates. [setRoot]
*******************************************************************************
Where should CheckM store it's data?
Please specify a location or type 'abort' to stop trying:
/checkm_database
Path [/checkm_database] exists and you have permission to write to this folder.
Otherwise, if checkM cannot find the database, the following information will be displayed:
It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
You do not seem to have permission to edit the checkm config file located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Please try again with updated privileges.
Unexpected error: <type 'exceptions.TypeError'>
(3) Error when the tmpdir path is too long
*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************
Identifying marker genes in 8 bins with 32 threads:
Process SyncManager-1:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 550, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 132, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 256, in __init__
    self._socket.bind(address)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: AF_UNIX path too long
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 708, in <module>
    checkmParser.parseOptions(args)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 1251, in parseOptions
    self.tree(options)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 133, in tree
    options.bCalledGenes)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 67, in find
    binIdToModels = mp.Manager().dict()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/__init__.py", line 99, in Manager
    m.start()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 528, in start
    self._address = reader.recv()
EOFError
Solution: change the --tmpdir that binning.sh and the other scripts pass to checkm, and point it at a temporary directory with a short absolute path.
# All three scripts under this path call CheckM, so the default --tmpdir must be changed in each
cd /opt/conda/envs/metawrap-env/bin/metawrap-modules
grep checkm *sh|awk -F ":" '{print $1}'|sort|uniq
bin_refinement.sh
binning.sh
reassemble_bins.sh
# Take binning.sh as an example
# Add a line before the checkm command to create a short tmp directory for CheckM's temporary files
mkdir -p /tmp/$(basename ${1}).tmp
# Change the --tmpdir of checkm
checkm lineage_wf -x fa ${1} ${1}.checkm -t $threads --tmpdir /tmp/$(basename ${1}).tmp --pplacer_threads $p_threads
if [[ ! -s ${1}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
# Delete the tmp directory after running
rm -r /tmp/$(basename ${1}).tmp
# The other two scripts need the same change on their checkm lines
# bin_refinement.sh
mkdir -p /tmp/$(basename ${bin_set}).tmp
if [ "$quick" == "true" ]; then
    comm "Note: running with --reduced_tree option"
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads
fi
if [[ ! -s ${bin_set}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${bin_set}.checkm/storage/bin_stats_ext.tsv $bin_set | (read -r; printf "%s\n" "$REPLY"; sort) > ${bin_set}.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r ${bin_set}.checkm; rm -r /tmp/$(basename ${bin_set}).tmp

mkdir -p /tmp/binsO.tmp
if [ "$quick" == "true" ]; then
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads
fi
if [[ ! -s binsO.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/binsO.tmp

# reassemble_bins.sh
mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv | (read -r; printf "%s\n" "$REPLY"; sort) > ${out}/reassembled_bins.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r /tmp/$(basename ${out}).tmp

mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/$(basename ${out}).tmp
This error in turn makes bin_refinement fail, because CheckM does not run correctly and produces no statistics:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 266, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 264, in rmtree
    os.remove(fullname)
OSError: [Errno 16] Device or resource busy: 'binsO.tmp/pymp-REeR36/.nfs9061e516f4bd263400000b82'
mv: cannot stat 'binning_results.eps': No such file or directory
mv: cannot stat 'binning_results.eps': No such file or directory
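The "AF_UNIX path too long" error in (3) comes from the operating system's limit on Unix socket paths, which multiprocessing creates inside the temporary directory. A rough pre-flight check can be sketched as follows; both constants are assumptions (a 108-byte limit including the terminator on Linux, and a generous allowance for the pymp-.../listener-... suffix that multiprocessing appends), and `tmpdir_is_short_enough` is a hypothetical helper.

```python
import os

# Assumption: Linux allows 108 bytes for an AF_UNIX path including the
# terminating NUL byte, which leaves 107 usable bytes.
UNIX_PATH_MAX = 107
# Assumption: multiprocessing appends something like
# "/pymp-XXXXXXXX/listener-XXXXXXXX" to the temporary directory.
SOCKET_SUFFIX_ALLOWANCE = 32


def tmpdir_is_short_enough(tmpdir):
    """Return True if a socket created under tmpdir should fit within the limit."""
    full = os.path.abspath(tmpdir)
    return len(full.encode("utf-8")) + SOCKET_SUFFIX_ALLOWANCE <= UNIX_PATH_MAX


print(tmpdir_is_short_enough("/tmp/binsO.tmp"))
print(tmpdir_is_short_enough("/" + "a" * 100))
```

A relative --tmpdir is resolved against the working directory, so a deep project path can exceed the limit even when the directory name itself is short. That is why the fix above uses absolute paths under /tmp.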
6. BLAST error
BLAST Database error: Error: Not a valid version 4 database.
Solution: update BLAST version.
# Download and unpack the new BLAST release
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
# Replace the BLAST binaries in the conda image
# (run the loop from the bin directory of the unpacked archive)
cd ncbi-blast-2.12.0+/bin
mkdir /opt/conda/envs/metawrap-env/bin/bak
for i in *; do
    mv /opt/conda/envs/metawrap-env/bin/$i /opt/conda/envs/metawrap-env/bin/bak
    cp $i /opt/conda/envs/metawrap-env/bin
done
7. prokka errors
(1) Installation fails with an error
prokka is used to annotate the assembled genomes. It is a Perl script; after the conda installation it would not start at all (see 1 (4) above).
(2) The BLAST version is not recognized
prokka requires blastp and makeblastdb at version 2.8 or above, but the version check is flawed and rejects my BLAST 2.12.0 (it apparently treats version 2.12 as lower than 2.8...).
I do not know Perl well enough to fix the comparison properly, so I simply changed MINVER to 2.1:
'blastp' => {
  GETVER  => "blastp -version",
  REGEXP  => qr/blastp:\s+($BIDEC)/,
  MINVER  => "2.1",
  NEEDED  => 1,
},
'makeblastdb' => {
  GETVER  => "makeblastdb -version",
  REGEXP  => qr/makeblastdb:\s+($BIDEC)/,
  MINVER  => "2.1",
  NEEDED  => 0,  # only if --proteins used
},
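One plausible explanation for the rejected version is that "2.12" and "2.8" are compared as decimal numbers rather than as version components. The sketch below illustrates the difference in Python; it is not prokka's code, and `parse_version` and `meets_minimum` are hypothetical helpers.

```python
import re


def parse_version(text):
    """Turn a version string such as '2.12.0+' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(found, minimum):
    """Compare versions component by component instead of as decimals."""
    return parse_version(found) >= parse_version(minimum)


# Compared as decimal numbers, 2.12 is smaller than 2.8.
print(float("2.12") >= float("2.8"))
# Compared component by component, 12 is larger than 8.
print(meets_minimum("2.12.0+", "2.8"))
```

Lowering MINVER to 2.1 works because 2.12 is larger than 2.1 even as a decimal, but it also lets genuinely old releases such as 2.2 through 2.7 pass the check.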