These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.


The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip, and their associated dependencies, can be found at pb_assembly.


FALCON Commands

Used to dump dazzler preads.db into FASTA format for subsequent String Graph assembly
This script drives the entire assembly process
fc_consensus has many options, which you can set through the falcon_sense_option parameter. In most cases, --min_cov and --max_n_read are the most important. --min_cov controls when a seed read gets trimmed or broken due to low coverage. --max_n_read puts a cap on the number of reads used for error correction. For a highly repetitive genome, lower the value of --max_n_read so that the consensus code does not waste time aligning repeats. The longest proper overlaps are used for correction to reduce the probability of collapsed repeats.
Remove duplicated associated contigs, mostly induced by tandem repeat alignment uncertainty.
Generate contigs based on assembly graph
Generate an assembly graph given a list of overlapping preads.
Filter overlaps based on given criteria
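
The effect of the two fc_consensus options described above, --min_cov and --max_n_read, can be pictured with a small Python sketch. This is not FALCON's consensus code: the function names, the per-base coverage list, and the overlap lengths are hypothetical and only illustrate the idea of capping the number of supporting reads (preferring the longest overlaps) and breaking a seed read wherever coverage falls below the minimum.

```python
def select_overlaps(overlap_lengths, max_n_read):
    """Keep at most max_n_read overlaps, preferring the longest ones.

    Hypothetical stand-in for the cap that --max_n_read places on the
    number of reads used for error correction.
    """
    return sorted(overlap_lengths, reverse=True)[:max_n_read]


def split_by_coverage(coverage, min_cov):
    """Return half-open (start, end) intervals where coverage >= min_cov.

    Hypothetical stand-in for how --min_cov trims or breaks a seed read:
    positions with too little coverage are dropped, and the read is split
    into the well-covered pieces that remain.
    """
    segments = []
    start = None
    for i, cov in enumerate(coverage):
        if cov >= min_cov:
            if start is None:
                start = i          # a well-covered stretch begins here
        elif start is not None:
            segments.append((start, i))  # the stretch ended at position i
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments


# Illustrative values only.
kept = select_overlaps([100, 900, 500, 300], max_n_read=2)
pieces = split_by_coverage([5, 5, 1, 1, 5, 5, 5], min_cov=4)
print(kept)    # [900, 500]
print(pieces)  # [(0, 2), (4, 7)]
```

In this toy input, the low-coverage stretch in the middle breaks the seed read into two pieces, and only the two longest overlaps are kept for correction.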

FALCON_unzip commands
Generate a read-to-contig map

Dazzler commands

These commands are part of Gene Myers’s Dazzler suite of tools; see the Dazzler Blog.

FALCON relies on a slightly modified version of Gene Myers’s code, which is available separately and is also bundled with the FALCON-integrate GitHub repository.


Compare subject sequences to target sequences. daligner is controlled by pa_HPCdaligner_option and ovlp_HPCdaligner_option.

To limit memory use, pass the -M option. For human assemblies, we have tested -M 32, which allows each daligner process 32 GB of RAM. Other possibilities are under investigation.

For more details on daligner options, see the Dazzler Blog
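
As a rough illustration of the -M setting discussed above, the following Python sketch reads a configuration fragment and reports the memory cap found in each daligner option string. The [General] section name, the fragment contents, and the helper names are assumptions made for this example; only the two parameter names and the -M 32 value come from the text above. The helper accepts both the spaced form (-M 32) and the attached form (-M32), since the exact spelling expected by a given daligner build should be checked against its own usage message.

```python
import configparser
import shlex

# Hypothetical configuration fragment; the section name is an assumption.
CFG = """
[General]
pa_HPCdaligner_option = -M 32
ovlp_HPCdaligner_option = -M 32
"""


def memory_limit_gb(option_string):
    """Return the integer given to -M in an option string, or None."""
    tokens = shlex.split(option_string)
    for i, tok in enumerate(tokens):
        if tok == "-M" and i + 1 < len(tokens):
            return int(tokens[i + 1])      # spaced form: -M 32
        if tok.startswith("-M") and len(tok) > 2:
            return int(tok[2:])            # attached form: -M32
    return None


def total_memory_gb(limit_gb, concurrent_jobs):
    """Memory needed if several capped daligner jobs share one node."""
    return limit_gb * concurrent_jobs


parser = configparser.ConfigParser()
parser.optionxform = str                   # keep option names case-sensitive
parser.read_string(CFG)

limits = {name: memory_limit_gb(value)
          for name, value in parser["General"].items()}
print(limits)
print(total_memory_gb(32, concurrent_jobs=4))  # 128
```

The second helper simply multiplies the per-process cap by the number of daligner jobs expected to run at once on a node, which is the figure to compare against the node's available RAM.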

The set of .fasta files for the given DB are recreated from the DB exactly as they were input.
Like DBshow, DBdump allows one to display a subset of the reads in the DB and select which information to show about them including any mask tracks.
Runs the symmetric DUST algorithm over the reads in the untrimmed DB
The total number of jobs that are run is determined by how one “splits” the sequence database. Read Gene Myers’s Dazzler Blog carefully to understand how the tuning options pa_DBsplit_option and pa_HPCdaligner_option work. Generally, for large genomes, you should use -s400 (400 Mb of sequence per block) in pa_DBsplit_option. This produces fewer jobs, but each job runs longer. However, if your job scheduler limits how long a job can run, a smaller value for the -s option may be preferable.
Show overview statistics for all the reads in the trimmed database <path>.db
Convert a FASTA file to a Dazzler DB.
Generates overlap script to run all necessary daligner, LAsort and LAmerge commands
Output data from a Dazzler DB in FASTA format for FALCON. You can supply the argument -H with an integer value to filter out reads below a given length threshold.
Check integrity of alignment files.
Merge the .las files <parts> into a single sorted file.
Sort alignment files
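
The trade-off described above for the -s value in pa_DBsplit_option (fewer but longer jobs with larger blocks) can be estimated with simple arithmetic. The Python sketch below is only an approximation under stated assumptions: it treats 1 Mb as 10^6 bases, assumes every block is compared once against itself and against every other block, and uses a hypothetical 3 Gb data set. The real number of scheduled jobs also depends on how the generated overlap script groups these comparisons.

```python
import math


def estimate_blocks(total_bases, block_size_mb):
    """Approximate number of DB blocks for a given -s value (in Mb)."""
    return math.ceil(total_bases / (block_size_mb * 1_000_000))


def estimate_block_comparisons(n_blocks):
    """Block-vs-block comparisons, assuming each unordered pair
    (including a block against itself) is compared exactly once."""
    return n_blocks * (n_blocks + 1) // 2


total_bases = 3_000_000_000  # hypothetical amount of sequence in the DB

for s in (400, 200):
    blocks = estimate_blocks(total_bases, s)
    print(s, blocks, estimate_block_comparisons(blocks))
# 400 8 36
# 200 15 120
```

Under these assumptions, halving the block size from 400 Mb to 200 Mb roughly doubles the number of blocks and more than triples the number of block comparisons, while each comparison handles less sequence and so finishes sooner.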