Caution
These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.
Attention
The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip, and their associated dependencies, can be found at pb_assembly.
Commands
FALCON Commands
- DB2Falcon
- Used to dump dazzler preads.db into FASTA format for subsequent String Graph assembly
- fc_run.py
- This script drives the entire assembly process
- fc_consensus.py
- fc_consensus has many options. You can use the parameter falcon_sense_option to control it. In most cases, --min_cov and --max_n_read are the most important options. --min_cov controls when a seed read gets trimmed or broken due to low coverage. --max_n_read puts a cap on the number of reads used for error correction. In a highly repetitive genome, you will need to lower the value of --max_n_read so that the consensus code does not waste time aligning repeats. The longest proper overlaps are used for correction to reduce the probability of collapsed repeats.
- fc_dedup_a_tigs.py
- Remove duplicated associated contigs, mostly induced by tandem repeat alignment uncertainty
- fc_graph_to_contig.py
- Generate contigs based on assembly graph
- fc_ovlp_to_graph.py
- Generate an assembly graph given a list of overlapping preads.
- fc_ovlp_filter.py
- Filter overlaps based on given criteria
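The fc_consensus.py entry above says its options are passed through the falcon_sense_option parameter. As a minimal sketch of how such a value might be composed, the Python snippet below joins the two flags named in that entry into one option string. The helper name and the numeric values are hypothetical, chosen only for illustration; they are not recommended settings.

```python
def build_falcon_sense_option(min_cov, max_n_read):
    """Compose a value for the falcon_sense_option parameter.

    Only the two flags named in the text above are used. The helper
    itself is hypothetical and not part of FALCON.
    """
    if min_cov < 1 or max_n_read < 1:
        raise ValueError("min_cov and max_n_read must be positive integers")
    return f"--min_cov {min_cov} --max_n_read {max_n_read}"


# Hypothetical values for illustration only.
baseline = build_falcon_sense_option(min_cov=4, max_n_read=200)

# For a highly repetitive genome, the text suggests a smaller --max_n_read.
repetitive = build_falcon_sense_option(min_cov=4, max_n_read=100)

print("falcon_sense_option =", baseline)
print("falcon_sense_option =", repetitive)
```

The resulting string would be placed as the value of falcon_sense_option in the configuration file read by fc_run.py.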
FALCON_unzip commands
- fc_get_read_hctg_map.py
- Generate a read-to-contig map
- fc_dedup_h_tigs.py
- fc_graphs_to_h_tigs.py
- fc_ovlp_filter_with_phase.py
- fc_phased_ovlp_to_graph.py
- fc_phasing.py
- fc_phasing_readmap.py
- fc_rr_hctg_track.py
- fc_select_reads_from_bam.py
- fc_track_reads_htigs.py
- fc_unzip.py
Dazzler commands
These commands are part of Gene Myers's Dazzler Suite of tools (see the Dazzler Blog).
FALCON relies on a slightly modified version of Gene Myers's code that can be found here, but is also bundled with the FALCON-integrate github repository.
- daligner:
- Compare subject sequences to target sequences. daligner is controlled by pa_HPCdaligner_option and ovlp_HPCdaligner_option. To limit memory, one can use the -M option. For human assembly, we’ve tested with -M 32, which allows 32G of RAM for each daligner. Other possibilities are under investigation. For more details on daligner options, see the Dazzler Blog.
- DB2fasta:
- The set of .fasta files for the given DB are recreated from the DB exactly as they were input.
- DBdump:
- Like DBshow, DBdump allows one to display a subset of the reads in the DB and select which information to show about them including any mask tracks.
- DBdust:
- Runs the symmetric DUST algorithm over the reads in the untrimmed DB
- DBsplit:
- The total number of jobs that are run is determined by how one “splits” the sequence database. You should read Gene Myers’s Dazzler Blog <http://dazzlerblog.wordpress.com> carefully to understand how the tuning options pa_DBsplit_option and pa_HPCdaligner_option work. Generally, for large genomes, you should use -s400 (400Mb sequence per block) in pa_DBsplit_option. This will make a smaller number of jobs but each job will run longer. However, if you have a job scheduler which limits how long a job can run, it might be desirable to have a smaller number for the -s option.
- DBstats:
- Show overview statistics for all the reads in the trimmed database <path>.db
- fasta2DB:
- Convert a fasta to a dazzler DB.
- HPC.daligner:
- Generates overlap script to run all necessary daligner, LAsort and LAmerge commands
- LA4Falcon:
- Output data from a Dazzler DB into fasta format for FALCON. You can supply the argument -H with an integer value to filter out reads below a given length threshold.
- LAcheck:
- Check integrity of alignment files.
- LAmerge:
- Merge the .las files <parts> into a single sorted file
- LAsort:
- Sort alignment files
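The DBsplit entry above explains that the -s block size determines how many jobs are run. As a rough illustration of that trade-off, the Python sketch below estimates the number of blocks for a given amount of sequence. It assumes that 1 Mb means 1,000,000 bases and that every block is compared against itself and every other block; both are simplifying assumptions. The actual job count also depends on read trimming and on how HPC.daligner groups comparisons. The genome size and coverage are hypothetical.

```python
def estimate_blocks(total_bases, block_size_mb=400):
    """Estimate how many blocks a -s<block_size_mb> split would produce.

    Assumes 1 Mb = 1,000,000 bases (a simplifying assumption).
    """
    block_bases = block_size_mb * 1_000_000
    # Integer ceiling division, so a partly filled last block is counted.
    return (total_bases + block_bases - 1) // block_bases


def estimate_block_pairs(n_blocks):
    """Upper-bound count of block-vs-block comparisons.

    Assumes each block is compared against itself and every other block once.
    """
    return n_blocks * (n_blocks + 1) // 2


# Hypothetical input: a 3 Gb genome sequenced at 30x coverage.
total_bases = 3_000_000_000 * 30

for block_size_mb in (400, 200):
    n_blocks = estimate_blocks(total_bases, block_size_mb)
    n_pairs = estimate_block_pairs(n_blocks)
    print(f"-s{block_size_mb}: {n_blocks} blocks, about {n_pairs} block comparisons")
```

Under these assumptions, halving the block size doubles the number of blocks and roughly quadruples the number of comparisons, while each comparison handles less sequence. This matches the trade-off described above between fewer, longer jobs and more, shorter ones.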