Caution

These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.

Attention

The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip, and their associated dependencies, can be found at pb_assembly.

Commands

FALCON Commands

DB2Falcon: Used to dump dazzler preads.db into FASTA format for subsequent String Graph assembly

fc_run.py: This script drives the entire assembly process

fc_consensus.py: fc_consensus.py has many options, which you can set through the falcon_sense_option parameter. In most cases, --min_cov and --max_n_read are the most important. --min_cov controls when a seed read gets trimmed or broken due to low coverage. --max_n_read puts a cap on the number of reads used for error correction. For a highly repetitive genome, you will need to lower the value of --max_n_read so that the consensus code does not waste time aligning repeats. The longest proper overlaps are used for correction to reduce the probability of collapsed repeats.
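As a rough illustration of how these two options might be passed through falcon_sense_option, the sketch below uses Python's standard configparser module to render a configuration entry. The [General] section name and the numeric values are assumptions chosen for illustration, not recommended settings.

```python
import configparser
import io


def build_falcon_sense_option(min_cov, max_n_read):
    """Compose the option string handed through to fc_consensus.py."""
    return f"--min_cov {min_cov} --max_n_read {max_n_read}"


def render_config(min_cov, max_n_read):
    """Return config text containing a falcon_sense_option entry.

    The section name "General" is an assumption about the config layout.
    """
    config = configparser.ConfigParser()
    config["General"] = {
        "falcon_sense_option": build_falcon_sense_option(min_cov, max_n_read)
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical values: a lowered --max_n_read for a highly repetitive genome.
print(render_config(min_cov=4, max_n_read=100))
```

Keeping the option string in one helper makes it easy to try a smaller --max_n_read without touching the rest of the configuration.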

fc_dedup_a_tigs.py: Remove duplicated associated contigs, which are mostly induced by tandem repeat alignment uncertainty

fc_graph_to_contig.py: Generate contigs based on assembly graph

fc_ovlp_to_graph.py: Generate an assembly graph given a list of overlapping preads.

fc_ovlp_filter.py: Filter overlaps based on given criteria

FALCON_unzip commands

fc_get_read_hctg_map.py: Generate a read-to-contig map

fc_dedup_h_tigs.py

fc_graphs_to_h_tigs.py

fc_ovlp_filter_with_phase.py

fc_phased_ovlp_to_graph.py

fc_phasing.py

fc_phasing_readmap.py

fc_quiver.py

fc_rr_hctg_track.py

fc_select_reads_from_bam.py

fc_track_reads_htigs.py

fc_unzip.py

Dazzler commands

These commands are part of Gene Myers's Dazzler suite of tools; see the Dazzler Blog.

FALCON relies on a slightly modified version of Gene Myers's code that can be found here, but is also bundled with the FALCON-integrate GitHub repository.

daligner:

Compare subject sequences to target sequences. daligner is controlled by pa_HPCdaligner_option and ovlp_HPCdaligner_option.

To limit memory use, pass the -M option. For human assembly, we have tested -M 32, which allows 32 GB of RAM for each daligner process. Other possibilities are under investigation.
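The -M value also bounds how many daligner processes can share a node. The following is a minimal sketch of that arithmetic, assuming each process stays within its -M limit; the function name, the reserve_gb parameter, and the node size are hypothetical.

```python
def max_concurrent_daligner_jobs(node_ram_gb, per_job_gb, reserve_gb=0):
    """Estimate how many daligner processes fit on one node.

    Assumes each process stays within its -M limit (in GB) and that
    reserve_gb is kept free for the operating system and other tasks.
    """
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    usable_gb = node_ram_gb - reserve_gb
    return max(usable_gb // per_job_gb, 0)


# Hypothetical node: 256 GB of RAM, 16 GB held back, -M 32 per process.
print(max_concurrent_daligner_jobs(256, 32, reserve_gb=16))  # prints 7
```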

For more details on daligner options, see the Dazzler Blog.

DB2fasta: The set of .fasta files for the given DB are recreated from the DB exactly as they were input.

DBdump: Like DBshow, DBdump allows one to display a subset of the reads in the DB and select which information to show about them, including any mask tracks.

DBdust: Runs the symmetric DUST algorithm over the reads in the untrimmed DB

DBsplit: The total number of jobs that are run is determined by how one “splits” the sequence database. Read Gene Myers's Dazzler Blog (http://dazzlerblog.wordpress.com) carefully to understand how the tuning options pa_DBsplit_option and pa_HPCdaligner_option work. Generally, for large genomes, you should use -s400 (400 Mb of sequence per block) in pa_DBsplit_option. This produces fewer jobs, but each job runs longer. However, if your job scheduler limits how long a job can run, a smaller value for the -s option may be preferable.
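The trade-off between block size and job count can be sketched with some back-of-the-envelope arithmetic. The example below assumes that every block is compared once against itself and once against every other block; the actual grouping of comparisons into jobs depends on the daligner options, and the total base count is hypothetical.

```python
def estimate_blocks(total_bases, block_size_mb):
    """Approximate number of DB blocks for a given -s value (in Mb)."""
    block_bases = block_size_mb * 1_000_000
    return -(-total_bases // block_bases)  # integer ceiling division


def estimate_block_comparisons(n_blocks):
    """Count block-vs-block comparisons, assuming each unordered pair of
    blocks (including a block with itself) is compared exactly once."""
    return n_blocks * (n_blocks + 1) // 2


# Hypothetical input: 90 Gb of raw sequence (about 30x of a 3 Gb genome).
total_bases = 90_000_000_000
for s in (400, 200, 50):
    n_blocks = estimate_blocks(total_bases, s)
    print(f"-s{s}: {n_blocks} blocks, "
          f"{estimate_block_comparisons(n_blocks)} block comparisons")
```

Under these assumptions, halving -s roughly quadruples the number of block comparisons, while each comparison handles about a quarter of the work.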

DBstats: Show overview statistics for all the reads in the trimmed database <path>.db

fasta2DB: Convert a FASTA file to a Dazzler DB.

HPC.daligner: Generates an overlap script to run all necessary daligner, LAsort and LAmerge commands

LA4Falcon: Output data from a Dazzler DB into FASTA format for FALCON. You can supply the argument -H with an integer value to filter reads below a given threshold.

LAcheck: Check integrity of alignment files.

LAmerge: Merge the .las files <parts> into a single sorted file

LAsort: Sort alignment files