fc_graph_to_contig.py

The final step in generating draft contigs is to find a single path through each contig graph and to generate the corresponding sequence. When a contig graph is not a simple path, we find the end-to-end path that has the most overlapped bases. This path is called the primary contig. For each compound path within the graph, if an alternative path different from the primary one is possible, we construct the associated contig. When the associated contigs are induced by sequencing errors, the identity between the alternative contig and the primary contig will be high (> 99% identity most of the time). When there are true structural variations, there are typically bigger differences between the associated contigs and the primary contigs.
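
The path selection described above can be illustrated with a small sketch. It is not the script's actual code: the graph encoding (a dict mapping each node to a list of `(next_node, overlap_bases)` pairs), the function name `best_path`, and the toy node names are all assumptions made for illustration. The sketch enumerates every simple source-to-sink path and keeps the one with the largest total overlap, which is only practical for small graphs.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: graph[u] = [(v, overlap_bases), ...]
Graph = Dict[str, List[Tuple[str, int]]]


def best_path(graph: Graph, source: str, sink: str) -> Tuple[int, List[str]]:
    """Return (total_overlap, path) for the simple source-to-sink path
    with the largest sum of overlapped bases, or (-1, []) if none exists."""
    best: Tuple[int, List[str]] = (-1, [])
    # Each stack entry: (current node, overlap accumulated so far, path so far)
    stack = [(source, 0, [source])]
    while stack:
        node, score, path = stack.pop()
        if node == sink:
            if score > best[0]:
                best = (score, path)
            continue
        for nxt, overlap in graph.get(node, []):
            if nxt not in path:  # keep the path simple (no repeated nodes)
                stack.append((nxt, score + overlap, path + [nxt]))
    return best


# Toy contig graph with one bubble: A -> B -> D and A -> C -> D
graph = {
    "A": [("B", 500), ("C", 800)],
    "B": [("D", 700)],
    "C": [("D", 300)],
    "D": [],
}
score, path = best_path(graph, "A", "D")
print(score, path)
```

In this toy graph the path through B would play the role of the primary contig, and the alternative path through C would give rise to an associated contig.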

Essentially, the script fc_graph_to_contig generates contigs given the sequence data and the final assembly graph. Currently, it generates primary contigs as well as all associated contigs without any filtering. Some post-processing to remove duplicate associated contigs induced by errors will generally be necessary.
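
One possible shape for such post-processing is sketched below. It is an assumption, not part of the script: the function names, the 0.99 threshold (taken from the "> 99% identity" remark above), and the use of Python's `difflib.SequenceMatcher` ratio as a crude stand-in for alignment identity are all illustrative choices. A real pipeline would more likely compute identity with a proper aligner, since `SequenceMatcher` is too slow for contig-sized sequences.

```python
from difflib import SequenceMatcher
from typing import Dict


def approx_identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment identity."""
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent elements, which would otherwise discard all four bases.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def drop_near_duplicates(
    primary: str, associated: Dict[str, str], threshold: float = 0.99
) -> Dict[str, str]:
    """Keep only associated contigs whose similarity to the primary
    contig is below the (assumed) duplicate threshold."""
    return {
        name: seq
        for name, seq in associated.items()
        if approx_identity(primary, seq) < threshold
    }


# Toy data: one alternative differs by a single base (error-like),
# the other carries a 40-base insertion (structural-variant-like).
primary = "ACGT" * 50
associated = {
    "alt_error": primary[:100] + "C" + primary[101:],
    "alt_sv": primary[:100] + "T" * 40 + primary[100:],
}
kept = drop_near_duplicates(primary, associated)
print(sorted(kept))
```

Here the single-base variant is treated as a duplicate of the primary contig and dropped, while the variant with the insertion is kept.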