Notes on metaWRAP errors and fixes

I recently started learning metagenomic binning and installed metaWRAP following the official documentation. I ran into a long list of pitfalls along the way, so here are the errors and how I solved them:

 

1. metaWRAP installation

Installation tutorial and download address: GitHub - bxlab/metaWRAP: MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

The author recommends installing with Conda/Mamba rather than bioconda or Docker, so I found a Docker image that already contains conda and took the first step of a long journey:

(1) Installing the software with conda

conda create -y -n metawrap-env python=2.7
source activate metawrap-env
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
conda install -y -c ursky metawrap-mg
conda install -y blas=2.5=mkl

After installation the image is about 5 GB. I pushed it to Docker Hub:

docker push raser216/metawrap:v1.0.0

I thought that was it, but a series of errors followed.

 

(2) Installing the libtbb2 library

When the pipeline reached quant_bins, a required shared library turned out to be missing, so salmon, which computes the abundance statistics, failed:

salmon: error while loading shared libraries: libtbb.so.2

Solution:

# Install the libtbb2 library
apt-get install libtbb2

 

(3) Install libGL.so.1

After the bin_refinement step the figures directory contained no images, and the Python plotting script reported an error:

ImportError: Failed to import any qt binding
# matplotlib is already installed for python2.7, but importing it fails
import matplotlib
import matplotlib.pyplot as plt
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Solution: install the package that provides libGL.so.1.

apt-get -y update
apt-get install -y libgl1-mesa-glx

# After installation, python2.7 imports the module without errors
python2.7 -c "import matplotlib.pyplot as plt"
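
Both of the problems above come down to a shared library that the system cannot find. A rough way to check for this up front is sketched below in Python. It uses ctypes.util.find_library from the standard library; the function name missing_shared_libraries is my own, and the check only looks at the library name (not the exact .so version, and not conda's own library directories), so treat it as a first hint rather than a guarantee.

```python
from __future__ import print_function

import ctypes.util


def missing_shared_libraries(names):
    """Return the library names that find_library cannot locate on this system."""
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    # Names are given without the "lib" prefix: "tbb" -> libtbb.so.*, "GL" -> libGL.so.*
    for name in missing_shared_libraries(["tbb", "GL"]):
        print("missing shared library: lib{0}".format(name))
```

If the script prints nothing, both libraries were found by name; anything it lists still needs to be installed with apt-get as shown above.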

 

 

(4) prokka fails to run

prokka cannot be used; its installation is broken:

Possible cause: the perl version installed with metaWRAP does not meet prokka's requirements (perl 5.26 may not be supported?).

prokka -h
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/metawrap-env/bin/../perl5 /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2//x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/ /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/site_perl/5.26.2 /opt/conda/envs/metawrap-env/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/metawrap-env/lib/5.26.2 .) at /opt/conda/envs/metawrap-env/bin/prokka line 32.
BEGIN failed--compilation aborted at /opt/conda/envs/metawrap-env/bin/prokka line 32.

 

Solution: reinstall prokka 1.13 with conda. The commands below, as I recorded them, create a separate prokka-test environment with pinned dependency versions:

conda create -n prokka-test prokka=1.13 minced=0.3.0 parallel=20180522 blast=2.12.0
source activate prokka-test

 

 

2. conda errors

(1) Cannot activate the conda environment

Running source activate metawrap-env inside a shell script fails with:

/opt/conda/envs/metawrap-env/etc/conda/activate.d/activate-binutils_linux-64.sh: line 65: ADDR2LINE: unbound variable

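"Unbound variable" is what bash reports when a script running under set -u (nounset) references an unset variable, here inside conda's own activation script.

Solution: activate the environment through the Dockerfile instead, and add the environment's bin directory to PATH:

cat metawrap_v1.dockerfile
# Contents of the Dockerfile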
FROM raser216/metawrap:v1.0.0
RUN echo "source activate metawrap-env" > ~/.bashrc
ENV PATH /opt/conda/envs/metawrap-env/bin:$PATH

 

3. Database path and version

The databases of the alignment and classification tools that metaWRAP calls (kraken, BLAST and so on) can be stored outside the image, but their paths must be written into the config file:

# Location of the config file
which config-metawrap
/opt/conda/envs/metawrap-env/bin/config-metawrap

# Use sed -i to replace the placeholders with the real path of each database
kraken_database=/database/kraken_database/kraken_newdb2/axel_dowload
nt_database=/database/newdownload3
tax_database=/database/metawrap_database/ncbi_taxonomy
sed -i "s#~/KRAKEN_DB#$kraken_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_NT_DB#$nt_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
sed -i "s#~/NCBI_TAX_DB#$tax_database#g" /opt/conda/envs/metawrap-env/bin/config-metawrap
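
Because sed -i stays silent when a pattern does not match, it is easy to end up with a placeholder that was never replaced. The Python sketch below does the same substitution on the config text and reports any placeholder that is still present. The placeholder strings are the ones from the sed commands above; the function names and the sample config lines in the usage example are my own assumptions, not the real layout of config-metawrap.

```python
from __future__ import print_function


def apply_database_paths(config_text, replacements):
    """Replace each placeholder path in the config text with its real path."""
    for placeholder, real_path in replacements.items():
        config_text = config_text.replace(placeholder, real_path)
    return config_text


def remaining_placeholders(config_text, placeholders):
    """List the placeholders that are still present, i.e. were never replaced."""
    return [p for p in placeholders if p in config_text]


if __name__ == "__main__":
    # Hypothetical config content, only for illustration
    sample = "KRAKEN_DB=~/KRAKEN_DB\nBLASTDB=~/NCBI_NT_DB\nTAXDUMP=~/NCBI_TAX_DB\n"
    paths = {
        "~/KRAKEN_DB": "/database/kraken_database/kraken_newdb2/axel_dowload",
        "~/NCBI_NT_DB": "/database/newdownload3",
    }
    updated = apply_database_paths(sample, paths)
    print(remaining_placeholders(updated, ["~/KRAKEN_DB", "~/NCBI_NT_DB", "~/NCBI_TAX_DB"]))
```

In this example the taxonomy placeholder is deliberately left out of the mapping, so it is the one reported as remaining.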

The file must be writable; otherwise the bin_refinement step fails:

# Error in the bin_refinement step
You do not seem to have permission to edit the checkm config file
located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG

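Solution: make the config file writable; after that the error no longer appeared. Note that the message itself names CheckM's DATA_CONFIG file, so that file needs to be writable as well (see 5 (2)).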

chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap

 

4. kraken errors

kraken assigns taxonomic labels directly to sequencing reads (fastq). There are currently two major versions: the first generation (kraken) needs a lot of memory (> 100 GB), while the second generation (kraken2) is much leaner (about 35 GB).

(1) Error caused by trailing comments

The kraken.sh script is in /opt/conda/envs/metawrap-env/bin/metawrap-modules/. On lines 123-125 of the script the comments are written at the end of the code lines, after the line-continuation backslash on lines 123 and 124. A backslash only continues a line when it is the very last character on it, so the pipeline breaks (I did not record the exact error message):

   123 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \ #combine paired end reads onto one line 
   124 shuf | head -n $depth | sed 's/\t\t/\n/g' | \ #shuffle reads, select top N reads, and then restore tabulation 
   125 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}' #separate reads into F and R files

Solution: move the comments onto their own lines above the pipeline, so that each backslash is the last character on its line.

   123 # combine paired end reads onto one line, then 
   124 # shuffle reads, select top N reads, and then restore tabulation, then
   125 # separate reads into F and R files
   126 awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | \
   127 shuf | head -n $depth | sed 's/\t\t/\n/g' | \
   128 awk -F"\t" '{print $1 > "'"${out}/tmp_1.fastq"'"; print $2 > "'"${out}/tmp_2.fastq"'"}'

 

(2) Script permission error

The kraken.sh script must be executable; otherwise metaWRAP fails when it tries to call it:

/opt/conda/envs/metawrap-env/bin/metawrap: line 69: /opt/conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: Permission denied

Solution: set the script's permissions to 775; after that the error no longer appeared.

chmod 775 kraken.sh
ls -l kraken.sh
-rwxrwxr-x 1 root root 8.9K Sep 22 20:12 kraken.sh

 

(3) Python annotation script error

The Python script kraken2_translate.py raised a KeyError when the dictionary names_map was asked for a taxid it does not contain:

Something went wrong with running kraken-translate... Exiting.

Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 98, in translate_kraken2_annotations
    taxonomy = get_full_name(taxid, names_map, ranks_map)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 30, in get_full_name
    name = names_map[taxid]
KeyError: '1054037'

Solution: look the value up with dict.get() instead of indexing, and handle the None it returns for a missing key.

vi /opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py
# Modify the get_full_name function so that a taxid missing from names_map no longer raises an error:
    for taxid in taxid_lineage:
        #name = names_map[taxid]
        name = names_map.get(taxid)
        if name is None:
            name = "unknown"
        names_lineage.append(name)
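
The same idea as a small, self-contained sketch that can be tried outside metaWRAP. The function lineage_names is a simplified stand-in of my own, not the real get_full_name (which takes different arguments), and the tiny names_map is made up for illustration; only the taxid 1054037 comes from the traceback above.

```python
from __future__ import print_function


def lineage_names(taxid_lineage, names_map, fallback="unknown"):
    """Map every taxid in a lineage to its name, using `fallback` for unknown taxids."""
    # dict.get(key, default) returns the default instead of raising KeyError
    return [names_map.get(taxid, fallback) for taxid in taxid_lineage]


if __name__ == "__main__":
    names_map = {"2": "Bacteria"}  # made-up, minimal mapping
    print(lineage_names(["2", "1054037"], names_map))
```

Passing the fallback as the second argument of dict.get() folds the None check into a single call; the explicit check used in the edit above behaves the same way.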

 

 

(4) Taxonomy database not found

kraken2_translate.py looks for the NCBI taxonomy files in a taxonomy/ subdirectory of the kraken database directory. If the downloaded taxonomy database is not there, it fails:

Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 120, in <module>
    main()
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 114, in main
    translate_kraken2_annotations(annotation_file=kraken_file, kraken2_db=database_location, output=output_file)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 87, in translate_kraken2_annotations
    names_map, ranks_map = load_kraken_db_metadata(kraken2_db)
  File "/opt/conda/envs/metawrap-env/bin/metawrap-scripts/kraken2_translate.py", line 50, in load_kraken_db_metadata
    with open(names_path) as input:
IOError: [Errno 2] No such file or directory: '/database/kraken_database/kraken_newdb2/axel_dowload/taxonomy/names.dmp'

 

 

Solution: copy the taxonomy database into the kraken database directory, as the taxonomy/ subdirectory named in the error.
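
To catch this before a long run, a check along the following lines can help. It is only a sketch: names.dmp is the file named in the traceback, while nodes.dmp is my assumption about what else the script reads, and the function name is hypothetical.

```python
from __future__ import print_function

import os


def missing_taxonomy_files(kraken_db_dir, required=("names.dmp", "nodes.dmp")):
    """Return the required files that are absent from <kraken_db_dir>/taxonomy."""
    taxonomy_dir = os.path.join(kraken_db_dir, "taxonomy")
    return [
        name for name in required
        if not os.path.isfile(os.path.join(taxonomy_dir, name))
    ]


if __name__ == "__main__":
    db_dir = "/database/kraken_database/kraken_newdb2/axel_dowload"
    for name in missing_taxonomy_files(db_dir):
        print("missing: {0}".format(os.path.join(db_dir, "taxonomy", name)))
```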

 

(5) kraken version does not match the database

I had previously been using kraken2, and its (huge) database was already downloaded on the server. I did not want to download a kraken (generation 1) database as well, so I checked whether the kraken2 database works with the generation 1 software. As expected, it does not:

kraken: database ("/database/kraken_database/kraken_newdb2/axel_dowload") does not contain necessary file database.kdb

 

So I decided to update the kraken version used by metaWRAP. The metaWRAP version installed by default turned out not to support kraken2; it has to be updated to 1.3.2, the latest version at the time:

Solution: update metaWRAP to 1.3.2.

conda install -y -c ursky metawrap-mg=1.3.2
# After the update, reset the config file's permissions and re-apply the database paths
chmod 777 /opt/conda/envs/metawrap-env/bin/config-metawrap

 

 

5. CheckM errors

(1) Line-continuation error in checkmData.py

CheckM assesses the completeness of assembled genomes (bins). bin_refinement calls it, and it failed immediately:

Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 36, in <module>
    from checkm import main
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 140, in setRoot
    path = self.confirmPath(path=path)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 162, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line

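Solution: modify the raw_input() call in checkmData.py.

Path of the script: /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/

Cause, as I understood it: on line 162 the raw_input() call is split across two lines with a trailing "\", which Python did not accept here. Note also that the EOFError at the end of the traceback is what raw_input() raises when there is no input to read, which is the case when CheckM prompts for its data directory in a non-interactive run; setting the data root beforehand (see (2)) avoids the prompt altogether.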

162  path = raw_input("Where should CheckM store it's data?\n" \
163    Please specify a location or type 'abort' to stop trying: \n")

Fix: join the two lines into one.

162 path = raw_input("Where should CheckM store it's data?\nPlease specify a location or type 'abort' to stop trying: \n")

 

(2) Database not found

The first time CheckM runs it asks where its database is stored, so it is best to run checkm data setRoot right after installation and set the database path up front:

checkm data setRoot

*******************************************************************************
 [CheckM - data] Check for database updates. [setRoot]
*******************************************************************************

Where should CheckM store it's data?
Please specify a location or type 'abort' to stop trying: 
/checkm_database

Path [/checkm_database] exists and you have permission to write to this folder.

Otherwise, if CheckM cannot find its database, it prints the following:

It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
You do not seem to have permission to edit the checkm config file
located at /opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/DATA_CONFIG
Please try again with updated privileges.
Unexpected error: <type 'exceptions.TypeError'>

 

(3) tmpdir path too long

*******************************************************************************
 [CheckM - tree] Placing bins in reference genome tree.
*******************************************************************************

  Identifying marker genes in 8 bins with 32 threads:
Process SyncManager-1:
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 550, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 132, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/connection.py", line 256, in __init__
    self._socket.bind(address)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: AF_UNIX path too long
Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/bin/checkm", line 708, in <module>
    checkmParser.parseOptions(args)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 1251, in parseOptions
    self.tree(options)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 133, in tree
    options.bCalledGenes)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/markerGeneFinder.py", line 67, in find
    binIdToModels = mp.Manager().dict()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/__init__.py", line 99, in Manager
    m.start()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/managers.py", line 528, in start
    self._address = reader.recv()
EOFError

 

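Solution: change the --tmpdir passed to checkm in binning.sh and the other scripts so that it points to a temporary directory with a short absolute path. CheckM's multiprocessing code creates a Unix socket under the tmpdir, and Unix socket paths are limited to roughly 100 characters.

# Three scripts in this directory call checkm, so the default --tmpdir has to be changed in each of them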
cd /opt/conda/envs/metawrap-env/bin/metawrap-modules
grep checkm *sh|awk -F ":" '{print $1}'|sort|uniq

bin_refinement.sh
binning.sh
reassemble_bins.sh

# Taking binning.sh as an example:
# add a line before the checkm command to create a short tmp directory for CheckM's temporary files
mkdir -p /tmp/$(basename ${1}).tmp

# Point checkm's --tmpdir at it
checkm lineage_wf -x fa ${1} ${1}.checkm -t $threads --tmpdir /tmp/$(basename ${1}).tmp --pplacer_threads $p_threads
if [[ ! -s ${1}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi

# Delete the tmp directory after the run
rm -r /tmp/$(basename ${1}).tmp

# The other two scripts need the same change to their checkm lines

# bin_refinement.sh
if [ ! -d /tmp/$(basename ${bin_set}).tmp ]; then mkdir -p /tmp/$(basename ${bin_set}).tmp; fi
if [ "$quick" == "true" ]; then
    comm "Note: running with --reduced_tree option"
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa $bin_set ${bin_set}.checkm -t $threads --tmpdir /tmp/$(basename ${bin_set}).tmp --pplacer_threads $p_threads
fi
if [[ ! -s ${bin_set}.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${bin_set}.checkm/storage/bin_stats_ext.tsv $bin_set | (read -r; printf "%s\n" "$REPLY"; sort) > ${bin_set}.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r ${bin_set}.checkm; rm -r /tmp/$(basename ${bin_set}).tmp

mkdir -p /tmp/binsO.tmp
if [ "$quick" == "true" ]; then
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads --reduced_tree
else
    checkm lineage_wf -x fa binsO binsO.checkm -t $threads --tmpdir /tmp/binsO.tmp --pplacer_threads $p_threads
fi
if [[ ! -s binsO.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/binsO.tmp

# reassemble_bins.sh
mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
${SOFT}/summarize_checkm.py ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv | (read -r; printf "%s\n" "$REPLY"; sort) > ${out}/reassembled_bins.stats
if [[ $? -ne 0 ]]; then error "Cannot make checkm summary file. Exiting."; fi
rm -r /tmp/$(basename ${out}).tmp

mkdir -p /tmp/$(basename ${out}).tmp
checkm lineage_wf -x fa ${out}/reassembled_bins ${out}/reassembled_bins.checkm -t $threads --tmpdir /tmp/$(basename ${out}).tmp --pplacer_threads $p_threads
if [[ ! -s ${out}/reassembled_bins.checkm/storage/bin_stats_ext.tsv ]]; then error "Something went wrong with running CheckM. Exiting..."; fi
rm -r /tmp/$(basename ${out}).tmp
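
To judge whether a given tmpdir is short enough, the length of the resulting socket path can be estimated. The sketch below (written for Python 3) assumes the Linux limit of 108 bytes for a Unix socket path, including the terminating NUL, and assumes the socket ends up at <tmpdir>/pymp-XXXXXX/listener-XXXXXX. The pymp- part matches the directory name in the traceback further down; the listener- part and the function name are my assumptions.

```python
import os

SUN_PATH_MAX = 108  # size of sockaddr_un.sun_path on Linux, including the trailing NUL

# Assumed shape of the socket path created under the temp directory
ASSUMED_SUFFIX = os.path.join("pymp-XXXXXX", "listener-XXXXXX")


def socket_path_fits(tmpdir, suffix=ASSUMED_SUFFIX, limit=SUN_PATH_MAX):
    """Return True if <absolute tmpdir>/<suffix> is short enough for a Unix socket."""
    full_path = os.path.join(os.path.abspath(tmpdir), suffix)
    # The path plus its terminating NUL must fit into `limit` bytes
    return len(full_path.encode("utf-8")) < limit


if __name__ == "__main__":
    for candidate in ["/tmp/binsO.tmp", "/data/projects/" + "very_long_directory_name/" * 5]:
        print(candidate, socket_path_fits(candidate))
```

With these assumptions the suffix alone takes 27 characters, so the tmpdir itself should stay well under 80 characters, which a directory directly under /tmp easily does.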

 

The AF_UNIX path error also makes bin_refinement fail (CheckM did not run correctly, so the statistics it should produce are missing):

Traceback (most recent call last):
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/opt/conda/envs/metawrap-env/lib/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 266, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/opt/conda/envs/metawrap-env/lib/python2.7/shutil.py", line 264, in rmtree
    os.remove(fullname)
OSError: [Errno 16] Device or resource busy: 'binsO.tmp/pymp-REeR36/.nfs9061e516f4bd263400000b82'

mv: cannot stat 'binning_results.eps': No such file or directory
mv: cannot stat 'binning_results.eps': No such file or directory

 

 

6. BLAST error

In the blobology step, the installed BLAST was too old for the downloaded nt database (format version 5, the latest database format), and it failed with:
BLAST Database error: Error: Not a valid version 4 database.

 

Solution: update BLAST.

# Download and unpack the new BLAST release
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz

# Replace the BLAST binaries in the conda environment, keeping the old ones in bak/
cd ncbi-blast-2.12.0+/bin
mkdir -p /opt/conda/envs/metawrap-env/bin/bak
for i in *; do
    mv /opt/conda/envs/metawrap-env/bin/$i /opt/conda/envs/metawrap-env/bin/bak
    cp $i /opt/conda/envs/metawrap-env/bin
done

 

 

7. prokka errors

(1) Installation fails

prokka annotates assembled genomes. It is a Perl script; the copy installed by conda was broken and would not start at all (see 1 (4) for the error and the fix).

(2) BLAST version not recognized

prokka requires blastp and makeblastdb at version 2.8 or later, but its version check is flawed: my BLAST 2.12.0 was rejected, apparently because 2.12 counts as smaller than 2.8 when the version is compared as a decimal number.

I don't know Perl well enough to fix the comparison itself, so I worked around it by lowering MINVER to 2.1:

  'blastp' => {
    GETVER  => "blastp -version",
    REGEXP  => qr/blastp:\s+($BIDEC)/,
    MINVER  => "2.1",
    NEEDED  => 1,
  },
  'makeblastdb' => {
    GETVER  => "makeblastdb -version",
    REGEXP  => qr/makeblastdb:\s+($BIDEC)/,
    MINVER  => "2.1",
    NEEDED  => 0,  # only if --proteins used
  },
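
For reference, the pitfall and a more robust comparison look like this in Python. This is only an illustration of the idea, not a patch for prokka (which is written in Perl), and the function names are my own: the version string is split into integer components and compared piece by piece instead of as one decimal number.

```python
from __future__ import print_function

import re


def parse_version(text):
    """Turn a version string such as '2.12.0+' into a tuple of ints, e.g. (2, 12, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def version_at_least(found, minimum):
    """Compare versions component by component rather than as decimal numbers."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    print(float("2.12") >= float("2.8"))      # decimal comparison: 2.12 < 2.8
    print(version_at_least("2.12.0+", "2.8"))  # component-wise: 12 > 8
```

Lowering MINVER to 2.1 works for the same reason the original check fails: as a decimal number, 2.12 is larger than 2.1 but smaller than 2.8.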

 

Posted by angryjohnny on Sun, 28 Nov 2021 04:43:33 -0800