Caution
These documents refer to an obsolete way of installing and running FALCON. They will remain up for historical context and for individuals still using the older version of FALCON/FALCON_unzip.
Attention
The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip, and their associated dependencies, can be found here: pb_assembly

Parameters¶
Configuration¶
Here are some example fc_run.cfg
and fc_unzip.cfg
files. We make no guarantee that they will work with your
dataset and cluster configuration; we merely provide them as starting points that have proven themselves on internal
datasets. Much of your success will depend on the quality of the input data before you even engage the FALCON
pipeline. Also, these particular configs were designed for our SGE compute cluster, so some tuning will likely
be necessary on your part. Consult your HPC administrator for help tuning to your cluster.
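To illustrate the overall shape of such a config before diving into individual parameters, here is a minimal, hypothetical fc_run.cfg sketch. All values are placeholders only; they are not tuned for any particular dataset or cluster:

```ini
[General]
# Grid engine type ("sge", "slurm", ...) or "local".
job_type = local

# File-of-filenames listing the input fasta files, one per line.
input_fofn = input.fofn
input_type = raw

# Estimated haploid genome size and desired seed coverage;
# with length_cutoff = -1 the cutoff is auto-calculated from these.
genome_size = 40000000
seed_coverage = 30
length_cutoff = -1
length_cutoff_pr = 8000
```

The remaining parameters below control alignment, consensus, and grid submission behavior.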
FALCON Parameter sets¶
fc_run_fungal.cfg
- Has worked well on a 40Mb fungal genome
fc_run_human.cfg
- Has worked well on at least one human dataset
fc_run_bird.cfg
- Has worked well on at least one avian dataset
fc_run_yeast.cfg
- Has worked well on at least one yeast dataset
fc_run_dipteran.cfg
- Has worked well on at least one dipteran (insect) dataset
fc_run_mammal.cfg
- Has worked well on at least one mammalian dataset
fc_run_mammalSequel.cfg
- Has worked well on at least one mammalian Sequel dataset
fc_run_plant.cfg
- Has worked well on at least one plant (Ranunculales) dataset
fc_run_arabidopsis.cfg
- Configuration for arabidopsis assembly in Chin et al.
2016
fc_run_ecoli.cfg
- Configuration for test E. coli dataset
fc_run_ecoli_local.cfg
- Configuration for test E. coli dataset run locally
FALCON_unzip Parameter sets¶
fc_unzip.cfg
- General all purpose unzip config
Available Parameters¶
fc_run.cfg¶
- input_fofn <str>
- Filename of the file-of-filenames (fofn). Each line is a fasta filename. Any relative paths are relative to the location of the input_fofn.
- input_type <str>
- “raw” or “preads”
- genome_size <int>
- estimated number of base-pairs in haplotype
- seed_coverage <int>
- requested coverage for the auto-calculated cutoff
- length_cutoff <int>
- Raw reads shorter than this cutoff won’t be considered in the assembly process. If ‘-1’, then auto-calculate the cutoff based on genome_size and seed_coverage.
- length_cutoff_pr <int>
- minimum length of seed-reads used after pre-assembly, for the “overlap” stage
- target <str>
- “assembly” or “preads”. If “preads”, then the pre-assembly stage is skipped and the input is assumed to be preads.
- default_concurrent_jobs <int>
- Maximum concurrency. This applies even to “local” (non-distributed) jobs.
- pa_concurrent_jobs <str>
- Concurrency settings for pre-assembly
- cns_concurrent_jobs <str>
- Concurrency settings for consensus calling. Use cns_concurrent_jobs to control the maximum number of concurrent consensus jobs submitted to the job management system. The out.XXXXX.fasta files produced are used as input for the next step in the pipeline.
- ovlp_concurrent_jobs <str>
- Concurrency settings for Overlap detection
- job_type <str>
- Grid submission system, or “local”. Supported types (case-insensitive): “sge”, “lsf”, “pbs”, “torque”, “slurm”, “local”.
- job_queue <str>
- Grid job-queue name. Can be overridden with section-specific sge_option_*.
- sge_option_da <str>
- Grid concurrency settings for initial daligner steps in
0-rawreads/
- sge_option_la <str>
- Grid concurrency settings for initial las-merging in
0-rawreads/
- sge_option_cns <str>
- Grid concurrency settings for error correction consensus calling
- sge_option_pda <str>
- Grid concurrency settings for daligner on preads in
1-preads_ovl/
- sge_option_pla <str>
- Grid concurrency settings for las-merging on preads in
1-preads_ovl/
- sge_option_fc <str>
- Grid concurrency settings for stage 2 in
2-asm-falcon/
- pa_DBdust_option <str>
- Passed to DBdust. Used only if dust = true.
- pa_DBsplit_option <str>
- Passed to DBsplit during the pre-assembly stage.
- pa_HPCdaligner_option <str>
- Passed to HPC.daligner during the pre-assembly stage. We will add -H based on length_cutoff.
The -dal option also controls the number of jobs being spawned: it determines how many blocks are compared to each other in a single job. A larger number will spawn fewer, larger jobs, while a smaller number will give you many small jobs. The right choice will depend on the compute resources available to you.
In this workflow, the trace points generated by daligner are not used. (To be efficient, one should use the trace points, but one has to know how to pull them out correctly first.) The -s1000 argument makes the trace points sparse to save some disk space (not much, though). We can also ignore all reads below a certain threshold by specifying a length cutoff with -l1000.
The biggest difference between this parameter and ovlp_HPCdaligner_option is the error-rate switch -e: here it must be relaxed, because the alignment is performed on uncorrected reads, whereas the overlap stage aligns error-corrected preads and can use a stricter value.
- pa_dazcon_option <str>
- Passed to dazcon. Used only if dazcon = true.
- falcon_sense_option <str>
- Passed to fc_consensus. Ignored if dazcon = true.
- falcon_sense_skip_contained <str>
- Causes -s to be passed to LA4Falcon. Rarely needed.
- ovlp_DBsplit_option <str>
- Passed to DBsplit during the overlap stage.
- ovlp_HPCdaligner_option <str>
- Passed to HPC.daligner during the overlap stage.
- overlap_filtering_setting <str>
- Passed to fc_ovlp_filter during the assembly stage.
- fc_ovlp_to_graph_option <str>
- Passed to fc_ovlp_to_graph.
- skip_check <bool>
- If “true”, then skip LAcheck during LAmerge/LAsort. (Actually, LAcheck is run, but failures are ignored.) When daligner bugs are finally fixed, this will be unnecessary.
- dust <bool>
- If true, then run DBdust before pre-assembly.
- dazcon <bool>
- If true, then use dazcon (from the pbdagcon repo).
- stop_all_jobs_on_failure <bool>
- DEPRECATED. This was used for the old pypeFLOW refresh-loop, used by run0.py. (This is not an option to let jobs currently in SGE (etc.) keep running; that is still TODO.)
- use_tmpdir <bool>
- Whether to run each job in TMPDIR and copy results back to NFS. If “true”, use TMPDIR. (Actually, tempfile.tempdir; see the standard Python docs: https://docs.python.org/2/library/tempfile.html) If the value looks like a path, then it is used instead of TMPDIR.
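To make the pre-assembly vs. overlap split described above concrete, here is a hedged sketch of the alignment-related options. The specific values (block sizes, error rates, concurrency limits) are illustrative assumptions in the spirit of the example configs, not recommendations for your data:

```ini
[General]
# Pre-assembly: aligning uncorrected raw reads, so a relaxed error rate.
pa_DBsplit_option = -x500 -s200
pa_HPCdaligner_option = -v -B4 -t16 -e0.70 -l1000 -s1000

# Overlap stage: aligning error-corrected preads, so a stricter error rate.
ovlp_DBsplit_option = -x500 -s200
ovlp_HPCdaligner_option = -v -B4 -t32 -h60 -e0.96 -l500 -s1000

# Concurrency limits for pre-assembly, consensus, and overlap jobs.
pa_concurrent_jobs = 32
cns_concurrent_jobs = 32
ovlp_concurrent_jobs = 32

overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 2
```

Note the -e difference between the two HPC.daligner lines; that is the key distinction discussed above.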
fc_unzip.cfg¶
- job_type <str>
- Same as above: grid submission system, or “local”. Supported types (case-insensitive): “sge”, “lsf”, “pbs”, “torque”, “slurm”, “local”.
- input_fofn <str>
- This will be the same input file you used in your fc_run.cfg
- input_bam_fofn <str>
- List of movie bam files. Only necessary if performing consensus calling step at the end.
- smrt_bin <str>
- Path to the bin directory containing samtools, blasr, and various GenomicConsensus utilities.
- jobqueue <str>
- Queue to submit SGE jobs to.
- sge_phasing <str>
- Phasing grid settings. Example:
-pe smp 12 -q %(jobqueue)s
- sge_quiver <str>
- Consensus calling grid settings. Example:
-pe smp 24 -q %(jobqueue)s
- sge_track_reads <str>
- Read tracking grid settings. Example:
-pe smp 12 -q %(jobqueue)s
- sge_blasr_aln <str>
- blasr alignment grid settings. Example:
-pe smp 24 -q %(jobqueue)s
- sge_hasm <str>
- Final haplotyped assembly grid settings. Example:
-pe smp 48 -q %(jobqueue)s
- unzip_concurrent_jobs <int>
- Number of concurrent unzip jobs to run at a time
- quiver_concurrent_jobs <int>
- Number of concurrent consensus calling jobs to run
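Putting the parameters above together, a hypothetical fc_unzip.cfg might look like the following sketch. The paths, queue name, and slot counts are placeholders for your own environment:

```ini
[General]
job_type = sge
input_fofn = input.fofn
input_bam_fofn = input_bam.fofn

# Placeholder path to a bin/ with samtools, blasr, and GenomicConsensus tools.
smrt_bin = /path/to/smrtcmds/bin

jobqueue = your_queue
sge_phasing = -pe smp 12 -q %(jobqueue)s
sge_quiver = -pe smp 24 -q %(jobqueue)s
sge_track_reads = -pe smp 12 -q %(jobqueue)s
sge_blasr_aln = -pe smp 24 -q %(jobqueue)s
sge_hasm = -pe smp 48 -q %(jobqueue)s

unzip_concurrent_jobs = 12
quiver_concurrent_jobs = 12
```

The %(jobqueue)s interpolation lets each sge_* setting reuse the single jobqueue value defined above it.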