Caution

These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.

Attention

The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip and their associated dependencies, can be found at pb_assembly.

Commands

FALCON Commands

DB2Falcon
Used to dump dazzler preads.db into FASTA format for subsequent String Graph assembly
fc_run.py
This script drives the entire assembly process
fc_consensus.py
fc_consensus.py has many options, which you can set through the falcon_sense_option parameter. In most cases, --min_cov and --max_n_read are the most important ones. --min_cov controls when a seed read gets trimmed or broken due to low coverage. --max_n_read puts a cap on the number of reads used for error correction. For highly repetitive genomes, use a smaller value for --max_n_read so that the consensus code does not waste time aligning repeats. The longest proper overlaps are used for correction to reduce the probability of collapsed repeats.
fc_dedup_a_tigs.py
Remove duplicated associated contigs, mostly induced by tandem repeat alignment uncertainty
fc_graph_to_contig.py
Generate contigs based on assembly graph
fc_ovlp_to_graph.py
Generate an assembly graph given a list of overlapping preads.
fc_ovlp_filter.py
Filter overlaps based on given criteria
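
The fc_consensus.py options described above are passed as a single string through falcon_sense_option. The following is a minimal sketch of composing that string in Python. The helper name build_falcon_sense_option and the numeric values are assumptions made for illustration; they are not part of FALCON and are not recommended settings.

```python
def build_falcon_sense_option(min_cov, max_n_read):
    """Compose a falcon_sense_option value from the two options discussed above.

    Hypothetical helper for illustration only; it is not part of FALCON.
    """
    if min_cov < 0 or max_n_read < 1:
        raise ValueError("min_cov must be >= 0 and max_n_read must be >= 1")
    return "--min_cov {} --max_n_read {}".format(min_cov, max_n_read)


# Illustrative values only (assumptions, not tested recommendations).
baseline = build_falcon_sense_option(min_cov=4, max_n_read=200)
# A smaller --max_n_read, as suggested for highly repetitive genomes.
repetitive = build_falcon_sense_option(min_cov=4, max_n_read=100)

print("falcon_sense_option = " + baseline)
print("falcon_sense_option = " + repetitive)
```

The printed lines have the shape of a configuration entry, so the two settings can be compared side by side before editing the real configuration file.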

FALCON_unzip commands

fc_get_read_hctg_map.py
Generate a read-to-contig map

fc_dedup_h_tigs.py

fc_graphs_to_h_tigs.py

fc_ovlp_filter_with_phase.py

fc_phased_ovlp_to_graph.py

fc_phasing.py

fc_phasing_readmap.py

fc_quiver.py

fc_rr_hctg_track.py

fc_select_reads_from_bam.py

fc_track_reads_htigs.py

fc_unzip.py

Dazzler commands

These commands are part of Gene Myers’s Dazzler suite of tools; see the Dazzler Blog.

FALCON relies on a slightly modified version of Gene Myers’s code, which is available separately and is also bundled with the FALCON-integrate GitHub repository.

daligner:

Compare subject sequences to target sequences. daligner is controlled by pa_HPCdaligner_option and ovlp_HPCdaligner_option.

To limit memory, use the -M option. For human assembly, we’ve tested -M 32, which allots 32 GB of RAM to each daligner process. Other possibilities are under investigation.

For more details on daligner options, see the Dazzler Blog
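
When choosing a value for -M, it can help to check how many daligner processes a compute node could hold at once. The sketch below does that arithmetic. It assumes each process uses roughly the amount of RAM given to -M, which is a simplification; the helper name max_concurrent_daligner and the node sizes are hypothetical.

```python
def max_concurrent_daligner(node_ram_gb, per_job_gb):
    """Return how many daligner processes fit in a node's RAM budget.

    Hypothetical helper; assumes each process uses about per_job_gb of RAM.
    """
    if node_ram_gb <= 0 or per_job_gb <= 0:
        raise ValueError("RAM sizes must be positive")
    return node_ram_gb // per_job_gb


per_job_gb = 32  # the -M 32 setting mentioned above
for node_ram_gb in (64, 128, 256):  # illustrative node sizes (assumptions)
    n = max_concurrent_daligner(node_ram_gb, per_job_gb)
    print("{} GB node: up to {} daligner processes".format(node_ram_gb, n))
```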

DB2fasta:
The set of .fasta files for the given DB are recreated from the DB exactly as they were input.
DBdump:
Like DBshow, DBdump allows one to display a subset of the reads in the DB and select which information to show about them including any mask tracks.
DBdust:
Runs the symmetric DUST algorithm over the reads in the untrimmed DB
DBsplit:
The total number of jobs that are run is determined by how one “splits” the sequence database. Read Gene Myers’s Dazzler Blog (http://dazzlerblog.wordpress.com) carefully to understand how the tuning options pa_DBsplit_option and pa_HPCdaligner_option work. Generally, for large genomes, you should use -s400 (400 Mb of sequence per block) in pa_DBsplit_option. This results in fewer jobs, but each job runs longer. However, if your job scheduler limits how long a job can run, a smaller value for the -s option may be preferable.
DBstats:
Show overview statistics for all the reads in the trimmed database <path>.db
fasta2DB:
Convert a fasta to a dazzler DB.
HPC.daligner:
Generates an overlap script to run all necessary daligner, LAsort and LAmerge commands
LA4Falcon:
Output data from a Dazzler DB into fasta format for FALCON. You can supply the argument -H with an integer value to filter reads below a given threshold.
LAcheck:
Check integrity of alignment files.
LAmerge:
Merge the .las files <parts> into a single sorted file
LAsort:
Sort alignment files
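
The trade-off described under DBsplit can be made concrete with a rough estimate of how many blocks a given -s value produces. The sketch below is an approximation under stated assumptions: it treats 1 Mb as 1,000,000 bases, it counts block-against-block comparisons as n(n+1)/2, and the genome size and coverage are illustrative. The actual number of scheduled jobs depends on how HPC.daligner groups these comparisons, which is not modelled here. The function names are hypothetical.

```python
def estimate_blocks(total_bases, block_size_mb):
    """Estimate the number of DB blocks for a given -s value (in Mb).

    Assumes 1 Mb = 1,000,000 bases; uses integer ceiling division.
    """
    if total_bases <= 0 or block_size_mb <= 0:
        raise ValueError("inputs must be positive")
    block_bases = block_size_mb * 1_000_000
    return -(-total_bases // block_bases)


def block_pairs(n_blocks):
    """Count block-against-block comparisons, including each block with itself."""
    return n_blocks * (n_blocks + 1) // 2


# Illustrative input: a 3.2 Gb genome at 30x coverage (assumed values).
total_bases = 3_200_000_000 * 30

for s in (400, 200):
    n = estimate_blocks(total_bases, s)
    print("-s{}: {} blocks, {} block comparisons".format(s, n, block_pairs(n)))
```

Halving the -s value doubles the number of blocks and roughly quadruples the number of comparisons, while each comparison handles less sequence. This is the fewer-but-longer versus more-but-shorter job trade-off described above.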