Caution
These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.
Attention
The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip, and their associated dependencies, can be found at pb_assembly.
Running Unzip on HGAP4 output
Overview
HGAP4 is a FALCON-based assembly pipeline, available through the SMRT Link interface. The pipeline covers de novo assembly and polishing of the resulting contigs, but it does not include the FALCON-unzip process. FALCON-unzip is currently available only as a standalone tool, run from the command line.
Although HGAP4 runs FALCON under the hood, the folder structure it generates differs from that of FALCON. FALCON-unzip, however, requires the assembly folders to follow the FALCON layout.
This tutorial describes the steps required to adapt the HGAP4 output to the form required by FALCON-unzip.
In brief, most of the work required to convert the HGAP4 output to a FALCON-compatible directory structure is implemented in a script called hgap4_adapt. This script lives in the FALCON repository.
The complete process is composed of the following steps:
- Installing FALCON and FALCON-unzip.
- Running hgap4_adapt.
- Creating the fc_unzip.cfg configuration file for FALCON-unzip.
- Creating the input.fofn and input_bam.fofn.
- Running FALCON-unzip.
IMPORTANT: FALCON-unzip can only be run on HGAP4 jobs that had the Save Output for Unzip option turned on. It is not possible to run FALCON-unzip otherwise, because critical files will be missing from your job's output.
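A quick pre-flight check on the job directory can catch an unusable job early. The sketch below is an illustration only: check_hgap4_job is a hypothetical helper name, and it tests just the two job files that later steps of this tutorial read directly. It is not a complete list of what the Save Output for Unzip option produces.

```shell
# check_hgap4_job JOB_DIR
# Hypothetical helper: succeeds only if the two job files read later in this
# tutorial exist and are non-empty. This is NOT an exhaustive check of the
# "Save Output for Unzip" output.
check_hgap4_job() {
    local job_dir="${1%/}"   # drop a trailing slash, if any
    local rel
    local missing=0
    for rel in \
        tasks/pbcoretools.tasks.gather_fasta-1/file.fasta \
        tasks/pbcoretools.tasks.filterdataset-0/filtered.subreadset.xml
    do
        if [ ! -s "${job_dir}/${rel}" ]; then
            echo "missing or empty: ${job_dir}/${rel}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, `check_hgap4_job /path/to/your/hgap4/job/123/123456/` returns a non-zero status and names the missing files on stderr if either one is absent.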
1. Installing FALCON and FALCON-unzip
The latest versions of FALCON and FALCON-unzip are available as precompiled Linux binaries. The easiest way to install them is through a wrapper script, described here:
Follow this approach to set up the environment before moving on to step 2.
Alternatively, one can install the binaries manually by following the instructions here: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries
2. Running hgap4_adapt
Once FALCON has been installed successfully, activate the installation environment to make the hgap4_adapt script available. This also activates FALCON and FALCON-unzip. To verify the installation, run the following:
source /path/to/your/install/dir/fc_env/bin/activate
python -m falcon_kit.mains.hgap4_adapt --help
If everything was successful, this should print verbose usage information to the screen. Once this is set up and working, adapting an existing HGAP4 run is as simple as the following example (note the dummy path, and replace it with a real one):
source /path/to/your/install/dir/fc_env/bin/activate
job_dir=/path/to/your/hgap4/job/123/123456/
mkdir -p example1
cd example1
python -m falcon_kit.mains.hgap4_adapt --job-output-dir=${job_dir}
The result should be visible in the example1 directory, which should now be populated with folders resembling a typical FALCON assembly run.
3. Creating the fc_unzip.cfg configuration file for FALCON-unzip
For help on .cfg files, please take a look at these Wiki pages:
4. Creating the input.fofn and input_bam.fofn
The input.fofn file ("file of file names") contains the paths to files containing plain FASTA sequences of your raw reads, one file per line. All raw reads in FASTA format should be available in your job directory:
job_dir=/path/to/your/hgap4/job/123/123456/
mkdir -p example1
cd example1
echo "${job_dir}/tasks/pbcoretools.tasks.gather_fasta-1/file.fasta" > input.fofn
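Because a FOFN is just a list of paths, a mistyped job directory goes unnoticed until a later step fails. The following is a minimal sketch of a sanity check, assuming one path per line; check_fofn is a hypothetical helper name, not part of FALCON.

```shell
# check_fofn FILE
# Hypothetical helper: succeeds only if FILE exists, is non-empty, and every
# non-blank line in it names an existing, non-empty file.
check_fofn() {
    local fofn="$1"
    local path
    local status=0
    if [ ! -s "${fofn}" ]; then
        echo "missing or empty FOFN: ${fofn}" >&2
        return 1
    fi
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r path || [ -n "${path}" ]; do
        [ -z "${path}" ] && continue   # skip blank lines
        if [ ! -s "${path}" ]; then
            echo "listed file not found or empty: ${path}" >&2
            status=1
        fi
    done < "${fofn}"
    return "${status}"
}
```

Running `check_fofn input.fofn` inside example1 reports any listed path that does not resolve. The same function can be applied to input_bam.fofn once it has been created.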
The input_bam.fofn file is required for the polishing step. It lists all BAM files from the input dataset that was provided to the initial HGAP4 run:
source /path/to/your/install/dir/fc_env/bin/activate
job_dir=/path/to/your/hgap4/job/123/123456/
mkdir -p example1
cd example1
dataset summarize "${job_dir}/tasks/pbcoretools.tasks.filterdataset-0/filtered.subreadset.xml" | grep -E '\.bam$' > input_bam.fofn
5. Running FALCON-unzip
Before running FALCON-unzip, the adapted folder structure should be similar to the following:
$ cd example1
$ ls | xargs -n 1
0-rawreads
1-preads_ovl
2-asm-falcon
fc_unzip.cfg
input_bam.fofn
input.fofn
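The listing above can also be checked mechanically before starting a long run. This is a sketch only: check_unzip_layout is a hypothetical helper name, and it checks only for the six entries shown in the listing, not their contents.

```shell
# check_unzip_layout [RUN_DIR]
# Hypothetical helper: succeeds only if RUN_DIR (default: current directory)
# contains the three FALCON-style directories and the three input files
# shown in the listing above.
check_unzip_layout() {
    local run_dir="${1:-.}"
    local entry
    local missing=0
    for entry in 0-rawreads 1-preads_ovl 2-asm-falcon; do
        if [ ! -d "${run_dir}/${entry}" ]; then
            echo "missing directory: ${run_dir}/${entry}" >&2
            missing=1
        fi
    done
    for entry in fc_unzip.cfg input.fofn input_bam.fofn; do
        if [ ! -f "${run_dir}/${entry}" ]; then
            echo "missing file: ${run_dir}/${entry}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Called as `check_unzip_layout example1`, it prints each missing entry to stderr and returns a non-zero status if anything is absent.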
Finally, to run FALCON-unzip, do the following:
source /path/to/your/install/dir/fc_env/bin/activate
cd example1
fc_unzip.py fc_unzip.cfg
fc_quiver.py fc_unzip.cfg
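When the two commands are run from a script, it can be useful to stop before polishing if the unzip step fails. The sketch below assumes that both programs exit with a non-zero status on failure, which this tutorial does not state; run_unzip_steps is a hypothetical helper name.

```shell
# run_unzip_steps CFG STEP [STEP...]
# Hypothetical helper: runs each STEP as "STEP CFG", in order, and stops at
# the first one that returns a non-zero exit status.
# Assumption: the steps signal failure through their exit status.
run_unzip_steps() {
    local cfg="$1"
    shift
    local step
    for step in "$@"; do
        echo "running: ${step} ${cfg}" >&2
        if ! "${step}" "${cfg}"; then
            echo "${step} failed; skipping remaining steps" >&2
            return 1
        fi
    done
}
```

With the environment activated and example1 as the working directory, this would be invoked as `run_unzip_steps fc_unzip.cfg fc_unzip.py fc_quiver.py`.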