These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.
The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip and their associated dependencies, can be found at pb_assembly.
- This script drives the entire assembly process.
fc_consensus has many options. You can use the parameter falcon_sense_option to control it. In most cases, --min_cov and --max_n_read are the most important options. --min_cov controls when a seed read gets trimmed or broken due to low coverage. --max_n_read puts a cap on the number of reads used for error correction. In a highly repetitive genome, you will need to make the value for --max_n_read smaller to make sure the consensus code does not waste time aligning repeats. The longest proper overlaps are used for correction to reduce the probability of collapsed repeats.
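To make the roles of --max_n_read and --min_cov more concrete, here is a toy Python model. It is a sketch under stated assumptions, not FALCON's falcon_sense code: each overlap is reduced to a half-open (start, end) interval on the seed read, the longest overlaps are kept up to a cap (standing in for --max_n_read), and the seed is cut into the stretches whose depth reaches a threshold (standing in for --min_cov). The function names select_overlaps and covered_segments are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the seed read


def select_overlaps(overlaps: List[Interval], max_n_read: int) -> List[Interval]:
    """Keep at most max_n_read overlaps, preferring the longest ones."""
    ranked = sorted(overlaps, key=lambda iv: iv[1] - iv[0], reverse=True)
    return ranked[:max_n_read]


def covered_segments(seed_len: int, overlaps: List[Interval],
                     min_cov: int) -> List[Interval]:
    """Split the seed into maximal segments whose depth is >= min_cov."""
    # Difference array: +1 where an overlap starts, -1 where it ends.
    delta = [0] * (seed_len + 1)
    for start, end in overlaps:
        delta[start] += 1
        delta[end] -= 1

    segments: List[Interval] = []
    depth = 0
    seg_start = None
    for pos in range(seed_len):
        depth += delta[pos]
        if depth >= min_cov and seg_start is None:
            seg_start = pos                      # coverage rises above threshold
        elif depth < min_cov and seg_start is not None:
            segments.append((seg_start, pos))    # coverage drops: break the read
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, seed_len))
    return segments


if __name__ == "__main__":
    overlaps = [(0, 40), (0, 30), (60, 100), (70, 100), (20, 80)]
    kept = select_overlaps(overlaps, max_n_read=4)
    print(covered_segments(100, kept, min_cov=2))
```

With these made-up intervals, the cap of 4 drops the shortest overlap and a threshold of 2 splits the 100 bp seed into (0, 40) and (60, 80), illustrating how a seed read can be broken at a low-coverage stretch.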
- Remove duplicated associated contigs, mostly induced by tandem-repeat alignment uncertainty.
- Generate contigs based on the assembly graph.
- Generate an assembly graph given a list of overlapping preads.
- Filter overlaps based on given criteria.
- Generate a read-to-contig map.
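The overlap-filtering and graph-building steps can be illustrated with a simplified Python sketch. It is an assumption-laden stand-in, not FALCON's implementation: the filter criteria here (a minimum overlap length, and a per-read overlap count that must fall within [min_cov, max_cov], since reads with very many overlaps often lie in repeats) are hypothetical, and the "graph" is a plain directed overlap graph without the string-graph construction and transitive reduction a real assembler performs.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Overlap(NamedTuple):
    a: str       # id of the first pread
    b: str       # id of the second pread; edge direction a -> b is assumed
    length: int  # overlap length in bp


def filter_overlaps(overlaps: List[Overlap], min_len: int,
                    min_cov: int, max_cov: int) -> List[Overlap]:
    """Drop short overlaps, then drop every overlap touching a read whose
    overlap count lies outside [min_cov, max_cov] (hypothetical criteria)."""
    long_enough = [o for o in overlaps if o.length >= min_len]
    counts: Dict[str, int] = defaultdict(int)
    for o in long_enough:
        counts[o.a] += 1
        counts[o.b] += 1
    keep = {read for read, c in counts.items() if min_cov <= c <= max_cov}
    return [o for o in long_enough if o.a in keep and o.b in keep]


def build_graph(overlaps: List[Overlap]) -> Dict[str, Set[str]]:
    """Directed adjacency sets: read a -> read b for each kept overlap."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for o in overlaps:
        graph[o.a].add(o.b)
    return dict(graph)


if __name__ == "__main__":
    ovs = [
        Overlap("r1", "r2", 5000),
        Overlap("r2", "r3", 6000),
        Overlap("r3", "r4", 200),    # too short, dropped
        Overlap("rep", "r1", 5000),  # "rep" overlaps everything, dropped
        Overlap("rep", "r2", 5000),
        Overlap("rep", "r3", 5000),
        Overlap("rep", "r4", 5000),
    ]
    kept = filter_overlaps(ovs, min_len=1000, min_cov=1, max_cov=3)
    print(build_graph(kept))
```

In this toy input, the read "rep" has four long overlaps, exceeding max_cov, so all of its edges are removed, leaving the chain r1 -> r2 -> r3.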
These commands are part of Gene Myers’s Dazzler suite of tools (see the Dazzler Blog).
To limit memory, one can use the -M option. For human assembly, we’ve tested with -M 32, which limits each daligner job to 32 GB of RAM. Other possibilities are under investigation.
For more details on daligner options, see the Dazzler Blog.
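As a rough planning aid, the -M value can be turned into a per-node concurrency limit. The sketch below assumes each daligner job stays within its -M budget and that some RAM is held back for the operating system and other processes; the function name, node sizes and reserve are hypothetical examples, not tested FALCON settings.

```python
def daligner_jobs_per_node(node_ram_gb: int, m_gb: int, reserve_gb: int = 0) -> int:
    """Number of daligner jobs that fit on one node at once if each is
    started with -M m_gb and reserve_gb of RAM is kept free (assumption:
    a job never exceeds its -M limit)."""
    if m_gb <= 0:
        raise ValueError("m_gb must be positive")
    usable = max(node_ram_gb - reserve_gb, 0)
    return usable // m_gb


if __name__ == "__main__":
    # Hypothetical 256 GB node, with and without a 16 GB reserve.
    print(daligner_jobs_per_node(256, 32))
    print(daligner_jobs_per_node(256, 32, reserve_gb=16))
```

Integer division is used deliberately: a partial job cannot run, so any leftover RAM smaller than one -M budget is simply unused.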
- The set of .fasta files for the given DB is recreated from the DB exactly as they were input.
- Like DBshow, DBdump allows one to display a subset of the reads in the DB and select which information to show about them, including any mask tracks.
- Runs the symmetric DUST algorithm over the reads in the untrimmed DB.
- The total number of jobs that are run is determined by how one “splits” the sequence database. You should read Gene Myers’s Dazzler Blog <http://dazzlerblog.wordpress.com> carefully to understand how the tuning options pa_DBsplit_option and pa_HPCdaligner_option work. Generally, for large genomes, you should use -s400 (400 Mb of sequence per block) in pa_DBsplit_option. This produces fewer jobs, but each job runs longer. However, if you have a job scheduler which limits how long a job can run, it might be desirable to have a smaller number for the
- Show overview statistics for all the reads in the trimmed database <path>.db.
- Convert a FASTA file to a Dazzler DB.
- Generate an overlap script that runs all the necessary daligner, LAsort and LAmerge commands.
- Output data from a Dazzler DB in FASTA format for FALCON. You can supply the argument -H with an integer value to filter out reads below a given threshold.
- Check the integrity of alignment files.
- Merge the .las files <parts> into a single sorted file.
- Sort alignment files.
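The trade-off behind the -s value in pa_DBsplit_option can be estimated with a little arithmetic. The Python sketch below is a rough model under stated assumptions: the DB is cut into blocks of about -s megabases, and every block is compared with itself and with every other block once, giving n(n+1)/2 block comparisons. How HPC.daligner actually groups those comparisons into jobs depends on pa_HPCdaligner_option, so the numbers are indicative only. The function name and the 90,000 Mb input size are hypothetical.

```python
import math
from typing import Tuple


def estimate_blocks(total_mb: float, block_mb: float) -> Tuple[int, int]:
    """Return (number of DB blocks, number of block-vs-block comparisons).

    Assumes blocks of about block_mb megabases (the DBsplit -s value) and
    an all-vs-all comparison including self-comparisons: n * (n + 1) / 2.
    """
    if block_mb <= 0:
        raise ValueError("block_mb must be positive")
    n_blocks = math.ceil(total_mb / block_mb)
    n_pairs = n_blocks * (n_blocks + 1) // 2
    return n_blocks, n_pairs


if __name__ == "__main__":
    # Hypothetical 90,000 Mb read set (illustrative, not a measured dataset).
    for s in (200, 400):
        blocks, pairs = estimate_blocks(90_000, s)
        print(f"-s{s}: {blocks} blocks, {pairs} block comparisons")
```

Under this model, going from -s400 to -s200 doubles the block count (225 to 450) but roughly quadruples the number of comparisons (25,425 to 101,475), each on smaller blocks and therefore shorter; this is the "fewer, longer jobs versus more, shorter jobs" trade-off discussed in the DBsplit entry.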