HG-CoLoR: Hybrid Graph for the error Correction of Long Reads

SLIDE 1

HG-CoLoR: Hybrid Graph for the error Correction of Long Reads

Pierre Morisse, Thierry Lecroq and Arnaud Lefebvre

pierre.morisse2@univ-rouen.fr

Laboratoire d’Informatique, de Traitement de l’Information et des Systèmes

July 5, 2017

SLIDE 2

Introduction Main idea Hybrid graph Workflow Experimental results Conclusion

Plan

1. Introduction
2. Main idea
3. Hybrid graph
4. Workflow
5. Experimental results
6. Conclusion

P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 2/30

SLIDE 4

Next Generation Sequencing

  • In 2005, Next Generation Sequencing (NGS) technologies started to develop
  • They produce millions of short sequences (100-300 bases), called reads, used to solve mapping and assembly problems
  • Due to their high number, efficient algorithms are required to process these reads
  • These reads also contain sequencing errors (∼1%)
  • NGS data analysis became an important research field

SLIDE 9

Third Generation Sequencing

  • More recently, Third Generation Sequencing technologies started to develop
  • Two main technologies: Pacific Biosciences and Oxford Nanopore
  • They allow the sequencing of longer reads (several thousand bases)
  • These reads are very useful for solving assembly problems on large and complex genomes
  • Much higher error rate: around 15% for Pacific Biosciences and up to 30% for Oxford Nanopore

SLIDE 14

Problem

  • Due to their high error rate, error correction of long reads is mandatory
  • Various methods already exist for the correction of short reads, but they are not applicable to long reads
  • This forces the development of new error correction methods
  • Two main categories: self-correction and hybrid correction

SLIDE 19

Inspiration

NaS [Madoui et al., 2015]

  • Does not locally correct erroneous regions
  • Uses long reads as templates to generate corrected long reads from assemblies of short reads
  • Requires mapping the short reads both on the long reads and against each other

SLIDE 23

NaS overview

NaS corrects a long read as follows:

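[Figure, built over several animation steps: short reads aligned on the long read give seeds; short reads similar to the seeds are then retrieved; all of them are assembled into a contig, which becomes the corrected long read]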

SLIDE 27

Main idea

  • Generate corrected long reads from assemblies of short reads
  • Get rid of the time-consuming step of aligning the short reads against each other
  • Focus on a seed-and-extend approach
  • Rely on a hybrid structure between a de Bruijn graph and an overlap graph, built from the short reads
SLIDE 31

Hybrid graph

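[Figure: an overlap graph on the reads ATTGCGT, GCGTAAC and ATAACGG, with edges labelled by overlap lengths 4 and 1, next to the de Bruijn graph of their 5-mers ATTGC, TTGCG, TGCGT, GCGTA, CGTAA, GTAAC, TAACG, AACGG, ATAAC]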

Idea: mix the advantages of a de Bruijn graph and of an overlap graph, allowing overlaps of variable lengths to be computed between the k-mers from the reads of a given set.
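To make the contrast between the two structures concrete, here is a minimal Python sketch (our own illustration, not HG-CoLoR code) that builds both graphs on the three reads of the figure. The overlap graph links whole reads by their longest suffix-prefix overlap; the de Bruijn graph links k-mers that overlap by exactly k − 1 bases. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def longest_overlap(u: str, v: str, min_len: int = 1) -> int:
    """Length of the longest proper suffix of u that is also a prefix of v (0 if none)."""
    for length in range(min(len(u), len(v)) - 1, min_len - 1, -1):
        if u[-length:] == v[:length]:
            return length
    return 0


def overlap_graph(reads: List[str], min_len: int = 1) -> Dict[Tuple[str, str], int]:
    """Overlap graph: one edge u -> v per pair of distinct reads, labelled by their longest overlap."""
    edges = {}
    for u in reads:
        for v in reads:
            if u != v:
                length = longest_overlap(u, v, min_len)
                if length > 0:
                    edges[(u, v)] = length
    return edges


def de_bruijn_graph(reads: List[str], k: int) -> Set[Tuple[str, str]]:
    """de Bruijn graph: edge u -> v between k-mers overlapping by exactly k - 1 bases."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    return {(u, v) for u in kmers for v in kmers if u[1:] == v[:-1]}


reads = ["ATTGCGT", "GCGTAAC", "ATAACGG"]
print(overlap_graph(reads))                             # {('ATTGCGT', 'GCGTAAC'): 4, ('ATAACGG', 'GCGTAAC'): 1}
print(("GTAAC", "TAACG") in de_bruijn_graph(reads, 5))  # True
```

On these reads, the overlap graph has no edge from GCGTAAC to ATAACGG even though both contain TAAC, while the de Bruijn graph links GTAAC to TAACG. The hybrid graph keeps the k-mer granularity of the latter while also allowing overlaps shorter than k − 1.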

SLIDE 35

Hybrid graph

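Example: on the set of reads S = {AAGCTTAG, CTTACGTA, GTATACTG}

[Figure: the hybrid graph whose nodes are the 6-mers AAGCTT, AGCTTA, GCTTAG, CTTACG, TTACGT, TACGTA, GTATAC, TATACT, ATACTG, with six edges of overlap length 5, four of length 4 and four of length 3]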
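The example can be reproduced with a small explicit construction. This Python sketch assumes that each pair of distinct k-mers is linked by its longest overlap between k − 1 and a minimum length; the labels in the example (5, 4 and 3) are consistent with k = 6 and a minimum overlap of 3. HG-CoLoR itself does not build the graph explicitly, and the function and parameter names here are ours.

```python
from collections import Counter
from typing import List, Tuple


def kmers_of(reads: List[str], k: int) -> List[str]:
    """Distinct k-mers of the reads, sorted for reproducibility."""
    return sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def hybrid_graph(reads: List[str], k: int, min_overlap: int) -> List[Tuple[str, str, int]]:
    """Explicit hybrid graph (assumed definition): nodes are k-mers; u -> v is labelled
    with the longest o, k - 1 >= o >= min_overlap, such that u's suffix equals v's prefix."""
    nodes = kmers_of(reads, k)
    edges = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            for o in range(k - 1, min_overlap - 1, -1):
                if u[-o:] == v[:o]:
                    edges.append((u, v, o))
                    break  # keep only the longest overlap for this pair
    return edges


S = ["AAGCTTAG", "CTTACGTA", "GTATACTG"]
edges = hybrid_graph(S, k=6, min_overlap=3)
print(sorted(Counter(o for _, _, o in edges).items()))  # [(3, 4), (4, 4), (5, 6)]
```

With k = 6 and a minimum overlap of 3, this yields the 14 edges of the example: six of length 5 (consecutive k-mers of a read, as in a de Bruijn graph), four of length 4 and four of length 3.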

SLIDE 38

Hybrid graph traversal

  • The graph is not explicitly built
  • Its traversal is simulated with PgSA [Kowalski et al., 2015]
  • PgSA can index a set of reads and answer queries about strings of variable lengths
  • One of these queries returns the positions of all the occurrences of a given string in the different reads
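The idea of simulating the traversal with occurrence queries can be sketched as follows. The NaiveReadIndex class below is a hypothetical stand-in that scans the reads linearly; it reproduces neither PgSA's data structure nor its API. The successors of a k-mer are found by querying the occurrences of its suffixes, from length k − 1 down to an assumed minimum overlap, and reading the k characters that start at each occurrence.

```python
from typing import List, Tuple


class NaiveReadIndex:
    """Hypothetical stand-in for a read index such as PgSA: it simply scans the reads."""

    def __init__(self, reads: List[str]):
        self.reads = reads

    def occurrences(self, pattern: str) -> List[Tuple[int, int]]:
        """All (read id, position) pairs where pattern occurs, overlapping occurrences included."""
        hits = []
        for rid, read in enumerate(self.reads):
            pos = read.find(pattern)
            while pos != -1:
                hits.append((rid, pos))
                pos = read.find(pattern, pos + 1)
        return hits


def successors(index: NaiveReadIndex, kmer: str, k: int, min_overlap: int) -> Tuple[int, List[str]]:
    """Successors of kmer in the implicit hybrid graph, for the longest overlap that has any."""
    for o in range(k - 1, min_overlap - 1, -1):
        found = set()
        for rid, pos in index.occurrences(kmer[-o:]):
            read = index.reads[rid]
            if pos + k <= len(read):  # a full k-mer must start at this occurrence
                candidate = read[pos:pos + k]
                if candidate != kmer:
                    found.add(candidate)
        if found:
            return o, sorted(found)
    return 0, []  # no successor: dead end


index = NaiveReadIndex(["AAGCTTAG", "CTTACGTA", "GTATACTG"])
print(index.occurrences("CTTA"))          # [(0, 3), (1, 0)]
print(successors(index, "TACGTA", 6, 3))  # (3, ['GTATAC'])
```

Only the k-mers actually needed by the traversal are ever materialised, which is the point of not building the graph; a real index answers each occurrence query much faster than this linear scan.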

SLIDE 43

Workflow

5 steps:

1. Correct the short reads
2. Align the short reads on the long read, to find seeds
3. Merge the overlapping seeds
4. Link the seeds, by traversing the hybrid graph
5. Extend the obtained corrected long read on the left (resp. right) of the leftmost (resp. rightmost) seed
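Step 3 can be illustrated with a classic interval merge. This Python sketch assumes each seed is given as a (start, sequence) pair in long-read coordinates; seeds that overlap and agree on their shared bases are merged, and seeds that disagree are kept apart, since the slides do not say how such conflicts are handled.

```python
from typing import List, Tuple

Seed = Tuple[int, str]  # assumed representation: (start position on the long read, seed sequence)


def merge_seeds(seeds: List[Seed]) -> List[Seed]:
    """Merge seeds that overlap on the long read and agree on their shared bases."""
    merged: List[Seed] = []
    for start, seq in sorted(seeds):
        if merged:
            prev_start, prev_seq = merged[-1]
            overlap = prev_start + len(prev_seq) - start  # > 0 when the seeds overlap
            shared = min(overlap, len(seq))
            offset = start - prev_start
            if overlap > 0 and prev_seq[offset:offset + shared] == seq[:shared]:
                # keep only the part of seq that goes past the previous seed (empty if contained)
                merged[-1] = (prev_start, prev_seq + seq[overlap:])
                continue
        merged.append((start, seq))  # no overlap, or conflicting bases: keep apart
    return merged


print(merge_seeds([(20, "TTT"), (0, "ACGTAC"), (4, "ACGGT")]))
# [(0, 'ACGTACGGT'), (20, 'TTT')]
```

Sorting by start position first means each seed only has to be compared with the last merged one.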
SLIDE 48

Step 4: Seeds linking

  • Seeds are used as anchor points on the hybrid graph
  • The graph is traversed to link the seeds together and assemble the k-mers

SLIDE 50

Step 4: Seeds linking

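[Figure, built over several animation steps: on the long read, seed1 serves as the source (src) and seed2 as the destination (dst); the hybrid graph is traversed from src to dst through k-mers overlapping by k − 1, k − 2 or k − 3 bases; the linked seeds then serve as the new source to reach seed3, until the corrected long read is obtained]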
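The linking step can be approximated by a depth-first search. In this Python sketch, a simplification under our own assumptions rather than HG-CoLoR's actual procedure, successors are tried from overlap k − 1 down to min_overlap, dead ends trigger backtracking, and max_depth is a hypothetical bound on the path length. The linear scan in successors_at stands in for the indexed queries.

```python
from typing import List, Optional, Set


def successors_at(reads: List[str], kmer: str, k: int, o: int) -> List[str]:
    """k-mers of the reads whose prefix of length o equals the suffix of length o of kmer."""
    suffix = kmer[-o:]
    found = set()
    for read in reads:
        pos = read.find(suffix)
        while pos != -1:
            if pos + k <= len(read) and read[pos:pos + k] != kmer:
                found.add(read[pos:pos + k])
            pos = read.find(suffix, pos + 1)
    return sorted(found)


def link_seeds(reads: List[str], src: str, dst: str, k: int,
               min_overlap: int, max_depth: int = 50) -> Optional[str]:
    """Depth-first search from src to dst, preferring long overlaps and
    backtracking on dead ends; returns the assembled sequence or None."""

    def dfs(kmer: str, seq: str, depth: int, on_path: Set[str]) -> Optional[str]:
        if kmer == dst:
            return seq
        if depth == max_depth:
            return None
        for o in range(k - 1, min_overlap - 1, -1):
            for nxt in successors_at(reads, kmer, k, o):
                if nxt in on_path:
                    continue  # avoid cycles
                on_path.add(nxt)
                found = dfs(nxt, seq + nxt[o:], depth + 1, on_path)
                on_path.discard(nxt)
                if found is not None:
                    return found
        return None

    return dfs(src, src, 0, {src})


S = ["AAGCTTAG", "CTTACGTA", "GTATACTG"]
print(link_seeds(S, "AAGCTT", "ATACTG", k=6, min_overlap=3))  # AAGCTTACGTATACTG
```

On the reads S of the hybrid graph example, the search first follows AGCTTA to the dead end GCTTAG, backtracks, and reaches ATACTG through overlaps of 4, 5, 5, 3, 5 and 5. With min_overlap=4 it returns None, since GTATAC can only be reached through an overlap of length 3.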

SLIDE 72

Step 5: Tips extension

  • Seeds do not always map right at the beginning or up to the end of the long read
  • Once all the seeds have been linked, HG-CoLoR keeps traversing the graph
  • The traversal stops when the borders of the long read or a branching path are reached
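A greedy version of the tip extension could look like this. The sketch follows the best-overlap successor only when it is unique, treats several candidates at that overlap length as a branching path, and uses the hypothetical max_extension parameter for the distance left to the border of the long read. Left extension reuses right extension on reversed strings; plain reversal (not reverse complement) is enough here because only substring matches are involved.

```python
from typing import List, Tuple


def successors_at(reads: List[str], kmer: str, k: int, o: int) -> List[str]:
    """k-mers of the reads whose prefix of length o equals the suffix of length o of kmer."""
    suffix, found = kmer[-o:], set()
    for read in reads:
        pos = read.find(suffix)
        while pos != -1:
            if pos + k <= len(read) and read[pos:pos + k] != kmer:
                found.add(read[pos:pos + k])
            pos = read.find(suffix, pos + 1)
    return sorted(found)


def best_successors(reads: List[str], kmer: str, k: int, min_overlap: int) -> Tuple[int, List[str]]:
    """Successors for the longest overlap (k - 1 down to min_overlap) that has any."""
    for o in range(k - 1, min_overlap - 1, -1):
        nxt = successors_at(reads, kmer, k, o)
        if nxt:
            return o, nxt
    return 0, []


def extend_right(reads: List[str], start: str, k: int, min_overlap: int, max_extension: int) -> str:
    """Greedy right extension; stops at the border (max_extension bases added),
    at a dead end, or at a branching path (several best successors)."""
    seq, kmer, seen = start, start, {start}
    while len(seq) - len(start) < max_extension:
        o, nxt = best_successors(reads, kmer, k, min_overlap)
        if len(nxt) != 1 or nxt[0] in seen:
            break  # dead end, branching path, or cycle
        kmer = nxt[0]
        seen.add(kmer)
        seq += kmer[o:]
    return seq[:len(start) + max_extension]  # do not go past the border


def extend_left(reads: List[str], start: str, k: int, min_overlap: int, max_extension: int) -> str:
    """Left extension = right extension on reversed strings (predecessors become successors)."""
    rev_reads = [r[::-1] for r in reads]
    return extend_right(rev_reads, start[::-1], k, min_overlap, max_extension)[::-1]


S = ["AAGCTTAG", "CTTACGTA", "GTATACTG"]
print(extend_right(S, "GTATAC", 6, 3, 10))  # GTATACTG
print(extend_left(S, "TTACGT", 6, 3, 10))   # AAGCTTACGT
```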

SLIDE 75

Remark

Some seeds might be impossible to link together

⇒ The corrected long read is then fragmented into multiple parts
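The fragmentation can be sketched as a loop over the seeds in long-read order. Here, link is a hypothetical callback standing in for the graph traversal of Step 4: it returns the joined sequence, or None when no path is found, in which case a new fragment starts. The toy_link function is only a placeholder for testing.

```python
from typing import Callable, List, Optional


def assemble_fragments(seeds: List[str],
                       link: Callable[[str, str], Optional[str]]) -> List[str]:
    """Link seeds in long-read order; start a new fragment whenever linking fails."""
    fragments: List[str] = []
    for seed in seeds:
        joined = link(fragments[-1], seed) if fragments else None
        if joined is None:
            fragments.append(seed)  # unlinkable: the corrected long read is split here
        else:
            fragments[-1] = joined
    return fragments


def toy_link(left: str, right: str) -> Optional[str]:
    """Hypothetical toy linker: joins two sequences sharing one boundary base."""
    return left + right[1:] if left[-1] == right[0] else None


print(assemble_fragments(["ACG", "GTT", "CCA"], toy_link))  # ['ACGTT', 'CCA']
```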

SLIDE 78

Datasets

HG-CoLoR was compared to NaS and to two other state-of-the-art hybrid long read correction methods: CoLoRMap [Haghshenas et al., 2016] and Jabba [Miclotte et al., 2016]. The different tools were compared on the following datasets:

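| Dataset | Reference genome | Strain | Genome size | Oxford Nanopore: # reads | Oxford Nanopore: average length (bp) | Oxford Nanopore: coverage | Illumina: # reads | Illumina: read length (bp) | Illumina: coverage |
|---|---|---|---|---|---|---|---|---|---|
| E. coli | E. coli | K-12 substr. MG1655 | 4.6 Mbp | 22,270 | 5,999 | 28x | 775,500 | 300 | 50x |
| Yeast | S. cerevisiae | W303 | 12.4 Mbp | 205,923 | 5,698 | 31x | 2,500,000 | 250 | 50x |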

SLIDE 79

Alignment-based comparison

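| Dataset | Method | # Reads | Average length (bp) | Average identity | Genome coverage | Runtime |
|---|---|---|---|---|---|---|
| E. coli | Original | 22,270 | 5,999 | 79.46% | 100% | N/A |
| E. coli | CoLoRMap | 22,270 | 6,219 | 89.02% | 100% | 8h26min |
| E. coli | Jabba | 22,065 | 5,794 | 99.81% | 99.41% | 12min56 |
| E. coli | NaS | 21,818 | 7,926 | 99.86% | 100% | 3 days |
| E. coli | HG-CoLoR | 22,549 | 5,897 | 99.59% | 100% | 3h |
| Yeast | Original | 205,923 | 5,698 | 55.49% | 99.90% | N/A |
| Yeast | CoLoRMap | 205,923 | 5,737 | 39.93% | 99.40% | 37h36min |
| Yeast | Jabba | 36,958 | 6,613 | 99.55% | 93.21% | 44min05 |
| Yeast | NaS | 71,793 | 5,938 | 99.59% | 98.70% | > 16 days |
| Yeast | HG-CoLoR | 71,518 | 6,604 | 99.17% | 98.39% | 22h |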

SLIDE 84

Assembly-based comparison

Dataset  Method    Coverage  # Expected contigs  # Obtained contigs  Genome coverage  Identity
E. coli  CoLoRMap  28x       1                   29                  97.74%           99.81%
E. coli  Jabba     28x       1                   41                  95.76%           99.92%
E. coli  NaS       37x       1                   1                   99.90%           99.99%
E. coli  HG-CoLoR  29x       1                   2                   99.95%           99.95%
Yeast    CoLoRMap  14x       30
Yeast    Jabba     21x       30                  134                 70.52%           99.83%
Yeast    NaS       35x       30                  123                 97.44%           99.77%
Yeast    HG-CoLoR  39x       30                  108                 92.19%           99.61%

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 24/30

slide-90
SLIDE 90

Conclusion

  • We introduced a new graph structure, the hybrid graph, and demonstrated its usefulness
  • We developed a new hybrid error correction method for long reads
  • We showed that this method offers the best trade-off between runtime, accuracy and genome coverage compared with state-of-the-art methods
  • HG-CoLoR is available from:

https://github.com/pierre-morisse/HG-CoLoR

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 26/30

slide-94
SLIDE 94

Future work

  • Run HG-CoLoR on larger genomes
  • Filter out weak k-mers after the short-read correction step
  • Build a proper assembly tool from the hybrid graph structure
  • Adapt HG-CoLoR to self-correction
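The slide does not say how weak k-mers would be identified. One common reading of "weak" is "low-abundance"; the sketch below assumes that interpretation, counting k-mers in the corrected short reads and keeping only those seen at least `min_count` times. The function name `solid_kmers` and the threshold are hypothetical, and this is not HG-CoLoR's code.

```python
from collections import Counter
from typing import Iterable, Set


def solid_kmers(reads: Iterable[str], k: int, min_count: int) -> Set[str]:
    """Keep the k-mers that occur at least `min_count` times across the reads.

    Hypothetical filter: k-mers are counted as they appear in the reads (no
    reverse-complement canonicalization); anything below the threshold is
    treated as "weak" and dropped.
    """
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
    # CGTT and GTTC occur only once, so they are filtered out.
    print(sorted(solid_kmers(reads, k=4, min_count=2)))  # ['ACGT', 'CGTA', 'GTAC']
```

A real pipeline would also have to pick the threshold from the k-mer count distribution and handle both strands; both are left out here.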

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 27/30

slide-98
SLIDE 98

References I

Haghshenas, E., Hach, F., Sahinalp, S. C., and Chauve, C. (2016). CoLoRMap: Correcting Long Reads by Mapping short reads. Bioinformatics, 32(17):i545–i551.

Kowalski, T., Grabowski, S., and Deorowicz, S. (2015). Indexing arbitrary-length k-mers in sequencing reads. PLoS ONE, 10(7):1–14.

Madoui, M.-A., Engelen, S., Cruaud, C., Belser, C., Bertrand, L., Alberti, A., Lemainque, A., Wincker, P., and Aury, J.-M. (2015). Genome assembly using Nanopore-guided long and error-free DNA reads. BMC Genomics, 16:327.

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 28/30

slide-99
SLIDE 99

References II

Miclotte, G., Heydari, M., Demeester, P., Rombauts, S., Van de Peer, Y., Audenaert, P., and Fostier, J. (2016). Jabba: hybrid error correction for long sequencing reads. Algorithms Mol Biol, 11:10.

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 29/30

slide-100
SLIDE 100

Questions?

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 30/30

slide-101
SLIDE 101

Fragmented corrected long reads

Figure (animation): a long read with its linked seeds (..., seed_{n−1}, seed_n). Two consecutive seeds act as source (src) and destination (dst) of a path searched in the hybrid graph. In the first sequence, the result is a single corrected long read; in the second, the corrected long read is output as two fragments, corrected long read part 1 and corrected long read part 2.
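The figure can be read as a loop over pairs of consecutive seeds. The sketch below assumes the seeds are given in read order as plain strings and hides the graph search behind a hypothetical `link(src, dst)` callback; whenever no path links two seeds, the current fragment is closed and a new one starts at dst. It is a simplified illustration, not HG-CoLoR's implementation; in particular, it ignores everything outside the first and last seeds.

```python
from typing import Callable, List, Optional


def correct_read(seeds: List[str],
                 link: Callable[[str, str], Optional[str]]) -> List[str]:
    """Chain consecutive seeds into corrected fragments.

    `link(src, dst)` is a hypothetical stand-in for the hybrid graph traversal:
    it returns the sequence spelled by a path from src to dst (both included),
    or None when no path is found.
    """
    if not seeds:
        return []
    fragments: List[str] = []
    current = seeds[0]
    for src, dst in zip(seeds, seeds[1:]):
        path = link(src, dst)
        if path is None:
            # No path between src and dst: close the current fragment, restart at dst.
            fragments.append(current)
            current = dst
        else:
            # `current` already ends with src and the path starts with it: append the rest.
            current += path[len(src):]
    fragments.append(current)
    return fragments


if __name__ == "__main__":
    paths = {("AAA", "CCC"): "AAATCCC"}  # toy paths; ("CCC", "GGG") cannot be linked
    print(correct_read(["AAA", "CCC", "GGG"], lambda s, d: paths.get((s, d))))
    # prints ['AAATCCC', 'GGG']: part 1 and part 2
```

If every pair of seeds can be linked, the function returns a single string, matching the unfragmented case of the figure.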

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 30/30

slide-113
SLIDE 113

Hybrid graph traversal

Example k-mers set (k = 6), stored in a PgSA index:
1: AAGCTT  2: AGCTTA  3: ATACTG  4: CTTACG  5: GCTTAG  6: GTATAC  7: TACGTA  8: TATACT  9: TTACGT

Successors of AAGCTT, from the longest overlap to the shortest; PgSA returns the occurrence positions of each suffix:
  • overlap 5, suffix AGCTT: {(1,1) ; (2,0)}, successor AGCTTA
  • overlap 4, suffix GCTT: {(1,2) ; (2,1) ; (5,0)}, successor GCTTAG
  • overlap 3, suffix CTT: {(1,3) ; (2,2) ; (4,0) ; (5,1)}, successor CTTACG

Resulting edges: AAGCTT → AGCTTA (overlap 5), AAGCTT → GCTTAG (overlap 4), AAGCTT → CTTACG (overlap 3).
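The occurrence sets in this example can be reproduced with a naive search. The sketch below is a stand-in for PgSA (Kowalski et al., 2015), not its API: `occurrences` scans the k-mers directly and reports (k-mer number, 0-based offset) pairs, which is how the slide's pairs read; `successors` keeps only offset-0 hits, i.e. k-mers that start with the queried suffix, trying overlaps from k − 1 down to a minimum of 3 as in the example. Both function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy k-mer set from the slide (k = 6), numbered 1..9 as in the example.
KMERS: Dict[int, str] = {
    1: "AAGCTT", 2: "AGCTTA", 3: "ATACTG", 4: "CTTACG", 5: "GCTTAG",
    6: "GTATAC", 7: "TACGTA", 8: "TATACT", 9: "TTACGT",
}


def occurrences(pattern: str, kmers: Dict[int, str] = KMERS) -> List[Tuple[int, int]]:
    """Naive stand-in for an index query: every (k-mer number, offset) where pattern occurs."""
    hits = []
    for kid, seq in sorted(kmers.items()):
        start = seq.find(pattern)
        while start != -1:
            hits.append((kid, start))
            start = seq.find(pattern, start + 1)
    return hits


def successors(kid: int, min_overlap: int = 3,
               kmers: Dict[int, str] = KMERS) -> List[Tuple[int, int]]:
    """(successor number, overlap) pairs, from overlap k - 1 down to min_overlap."""
    seq = kmers[kid]
    k = len(seq)
    result = []
    for overlap in range(k - 1, min_overlap - 1, -1):
        suffix = seq[k - overlap:]
        for other, pos in occurrences(suffix, kmers):
            # Offset 0: `other` starts with this suffix, so it overlaps seq by `overlap` bases.
            if pos == 0:
                result.append((other, overlap))
    return result


if __name__ == "__main__":
    print(occurrences("AGCTT"))  # [(1, 1), (2, 0)]
    print(successors(1))         # [(2, 5), (5, 4), (4, 3)]
```

Each query here scans the whole k-mer set, so its cost grows linearly with the set; an index such as PgSA is meant to answer these occurrence queries without that full scan.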

  • P. Morisse, T. Lecroq, A. Lefebvre

HG-CoLoR 30/30