FASTQ, BAM, and VCF file formats easily explained - A must watch if you have had a DNA test

  • Published: Nov 22, 2024

Comments • 55

  • @zainabumarabdullahi9446
    @zainabumarabdullahi9446 8 months ago

    No one can ever explain this better, love from Australia!

  • @KatharineME
    @KatharineME  2 years ago +7

    The CRAM file format is simply a newer and more compressed version of the BAM file format, for anyone who was wondering that :)

    • @programmer5350
      @programmer5350 2 years ago +1

      Could you also do a video for the SLAM, JAM, and THANK YOU MA'AM file formats?

    • @cristianm7097
      @cristianm7097 2 years ago +1

      @programmer5350 Are you already familiar with the WHAM-BAM file formats?

  • @aleksandraperz5037
    @aleksandraperz5037 9 months ago +1

    Katharine, thank you so much for this video

  • @miguelarellano5260
    @miguelarellano5260 1 year ago

    Thank you so much, Katharine! You saved a biotech eng. student from Mexico! 🇲🇽

  • @programmer5350
    @programmer5350 2 years ago +7

    Awesome high quality bioinformatics video! We need more of these :)

  • @sapandeepsandhu4410
    @sapandeepsandhu4410 5 months ago

    SAM File Structure:
    Header Section: Optional, starts with '@', contains metadata about the sequence and the alignments.
    Alignment Section: Contains alignment information with each line representing a read.
    Columns in SAM:
    QNAME: Query template name.
    FLAG: Bitwise flag.
    RNAME: Reference sequence name.
    POS: 1-based leftmost mapping position.
    MAPQ: Mapping quality.
    CIGAR: CIGAR string.
    RNEXT: Reference name of the mate/next read.
    PNEXT: Position of the mate/next read.
    TLEN: Observed template length.
    SEQ: Segment sequence.
    QUAL: ASCII of Phred-scaled base quality+33.
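
The column list above can be turned into a small parser. This is a minimal sketch, not taken from the video: the names `SamRecord`, `parse_sam_line`, `is_unmapped`, `is_reverse_strand` and the example read are made up for illustration, and it only handles the 11 mandatory tab-separated columns, ignoring optional tags.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in file order (hypothetical helper type)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str     # Phred+33 encoded quality string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec: SamRecord) -> bool:
    # Bit 0x4 of FLAG marks a read that did not align.
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec: SamRecord) -> bool:
    # Bit 0x10 of FLAG marks a read aligned to the reverse strand.
    return bool(rec.flag & 0x10)


# Made-up example read: FLAG 99 = 1 + 2 + 32 + 64
# (paired, properly paired, mate on reverse strand, first in pair).
EXAMPLE = "read1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(EXAMPLE)
print(record.rname, record.pos, record.cigar)
```

For real files a dedicated library is a better choice than hand parsing, since BAM is a compressed binary encoding of the same records.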

  • @secondeye3927
    @secondeye3927 2 months ago

    Thank you very much. This is so helpful and very easy to understand.

  • @deepap1307
    @deepap1307 6 months ago

    Thank you for the clear explanations of basics.

  • @sapandeepsandhu4410
    @sapandeepsandhu4410 5 months ago

    .FASTQ (Raw Sequence Data)
    FASTQ is a text-based format for storing both nucleotide sequences and their corresponding quality scores. It is widely used in high-throughput sequencing.
    File Structure:
    Header Line: Starts with '@' followed by a sequence identifier.
    Sequence Line: Contains the nucleotide sequence.
    Plus Line: Starts with a '+' and may be followed by the same sequence identifier.
    Quality Line: Contains one quality score for each nucleotide in the sequence, encoded as ASCII characters.
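
The four-line structure above can be read with a few lines of code. This is a rough sketch under two assumptions that are not stated in the comment: every record is exactly four lines (no wrapped sequences), and qualities use the Phred+33 encoding. The function name `read_fastq` and the example record are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (identifier, sequence, phred_scores) for each 4-line record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        # next(it, "") returns an empty string if the file is truncated,
        # which the checks below then reject.
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumed Phred+33: quality score = ASCII code of the character - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Made-up example record.
example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for name, seq, scores in read_fastq(example):
    print(name, seq, scores)  # read1 ACGT [40, 40, 20, 0]
```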

  • @stephenjohnson9733
    @stephenjohnson9733 2 months ago

    good explanation thanks

  • @md.mohiuddinmasum3632
    @md.mohiuddinmasum3632 2 years ago

    Simple and amazing explanation.
    This video deserves more views.

  • @NM-tx7zm
    @NM-tx7zm 6 months ago

    This was excellently done and easy to follow! Thank you!

  • @doctorkash0792
    @doctorkash0792 6 months ago

    Amazing explanation, really cleared up many things just by watching, thanks a ton and keep up the good work:)

  • @TimoHromadka
    @TimoHromadka 1 year ago

    Great video, helped me disambiguate many concepts!

  • @europhile2658
    @europhile2658 3 months ago

    excellent description!

  • @sanakhawer693
    @sanakhawer693 2 years ago

    Super helpful, thank you so much. Please do a video on how to use different software tools.

    • @KatharineME
      @KatharineME  1 year ago

      Hi! I have been working on some content for certain software tools. What software did you have in mind?

  • @mst63th
    @mst63th 2 years ago

    Thanks, you make it easy to understand. Keep going.

  • @kaoulkae
    @kaoulkae 2 years ago +1

    This was very helpful and very well explained. You are talented 🙂

  • @oksana03fel
    @oksana03fel 2 years ago

    Very clear video. Thank you.
    Katharine, could you please explain how to convert .fastq files to .vcf? Thank you.

  • @kikiarev
    @kikiarev 2 years ago

    Thank you for the explanation! It's really confusing at first glance!

  • @wakeup9199
    @wakeup9199 5 months ago

    Well done, but I still have a question. If I have uploaded a VCF file to YFull and then upload the BAM as well, what are the advantages?

  • @iamadityavaishy
    @iamadityavaishy 2 years ago

    Thank you so much. You explained all this so easily 🤗🤩

  • @MrFilu13
    @MrFilu13 2 months ago

    Good... 👍 Nicely explained

  • @humarafique3093
    @humarafique3093 9 months ago

    Superb👏

  • @KristinaBecanovic
    @KristinaBecanovic 9 months ago

    Thanks, brilliant, very helpful!

  • @shawnmcmurtrey8090
    @shawnmcmurtrey8090 2 years ago

    Very good explanations!! Looking forward to watching more of your videos!

  • @franciscoromogaray3076
    @franciscoromogaray3076 10 months ago

    Really clear, thanks!

  • @carlloeber
    @carlloeber 1 year ago

    You are amazing..

  • @sinaisbitt
    @sinaisbitt 1 year ago +1

    Some great explanations in your videos. I'm really curious as to what we can do with the data once we get it. Right at the end of this video, you mentioned a video that would explain some of this. Do you still plan to make this?

    • @KatharineME
      @KatharineME  1 year ago

      Hi Simon! Yes I think what to do with the data is a question on everyone's mind who has had a DNA test. I do plan to make that video still. Stay tuned! If you want help in the short term, Guardiome does private custom DNA Analysis: www.guardiome.com/custom-dna-analysis.

  • @arioche
    @arioche 2 months ago

    great

  • @gerardmingarro6788
    @gerardmingarro6788 1 year ago

    excellent video!

  • @mohamedesmailelsalahaty6050
    @mohamedesmailelsalahaty6050 2 years ago

    Great, keep going!

  • @LappingMaster
    @LappingMaster 10 months ago

    GREAT VIDEO!!!!!!!!!

  • @eduardofernandezdelpeloso8663
    @eduardofernandezdelpeloso8663 1 year ago

    Nice video!
    Do you know of any software tool I can use to compare the results of full genome sequencing from two different companies? I have bought tests from Dante and Nebula, and once I get the results I would like to be able to compare them and do some statistical analysis of the differences.

  • @lolisimon2933
    @lolisimon2933 1 year ago

    Awesome video.
    But I'm not too sure about your explanation of genome coverage.
    Your explanation of it sounded more like read depth.

    • @KatharineME
      @KatharineME  1 year ago

      Right, I see your point. There are basically two concepts at hand, both important for variant calling: one is the percentage of the genome that was sequenced, and the other is the number of reads with a base call at a given nucleotide. For 30X depth sequencing, we want about 30 reads covering each nucleotide.
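
The two concepts in this reply can be computed side by side. This is a minimal sketch, assuming you already have a per-position read count (for example, exported from an alignment tool); the function name `coverage_stats`, the term "breadth" for the percentage of positions sequenced, and the toy numbers are illustrative and not from the video.

```python
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (breadth, mean_depth) for a list of per-position read counts.

    breadth    = fraction of positions covered by at least one read
    mean_depth = average number of reads per position (the "30X" figure)
    """
    if not depths:
        raise ValueError("need at least one position")
    covered = sum(1 for d in depths if d > 0)
    breadth = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy 4-position "genome": one position has no reads at all.
breadth, mean_depth = coverage_stats([0, 30, 30, 60])
print(f"breadth = {breadth:.0%}, mean depth = {mean_depth:.0f}X")
# breadth = 75%, mean depth = 30X
```

The toy example shows why the two numbers are worth reporting separately: the mean depth is 30X even though a quarter of the positions were never sequenced.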

  • @spicesmiles
    @spicesmiles 1 year ago

    Amazing! Succinct! Thank you!!!!

  • @جزائريهوآفتخر-ص9ث

    Thank you for the information. I have a question: exams always ask what the difference is between a FASTQ and a BAM file. What is the best short answer to this question?

    • @hatchet646
      @hatchet646 1 year ago +1

      bam is aligned to the reference genome, fastq is not.

    • @KatharineME
      @KatharineME  1 year ago

      I would agree with that. The fastq file contains unordered reads. The bam file contains the same reads plus the location each maps to in the reference genome.

  • @e3.s.nro2tan75
    @e3.s.nro2tan75 1 year ago

    Is 30X or 100X coverage better? Which is more accurate? Is 100X overkill, or is it necessary to reduce the margin of error?

    • @KatharineME
      @KatharineME  1 year ago

      With Illumina sequencing, ~89% of base calls are above Q30 (99.9% accurate). 30X means having ~30 base calls for each nucleotide.
      So 30X is usually all you need. 100X may be used when high variation is expected, like in a tumor.
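
The link between "Q30" and "99.9% accurate" follows from the standard Phred definition, Q = -10 * log10(P_error). A small sketch of that conversion, with the function names `phred_to_error` and `error_to_phred` chosen here for illustration:

```python
import math


def phred_to_error(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.2%}")
# Q30 corresponds to a 1 in 1000 error probability, i.e. 99.9% accuracy.
```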

  • @mubinpshtiwan2006
    @mubinpshtiwan2006 20 days ago

    It’s so helpful thanks, but the music is not necessary

  • @saud319
    @saud319 2 years ago

    So the company I tested with gave me these files, but none of them is transferable to the famous ancestry databases. Is there a way to convert them?

    • @chibi171
      @chibi171 2 years ago

      DNAgenics can convert whole genome files, if that's what you mean, into a raw data file similar to those from 23andMe, AncestryDNA, etc. That will allow you to upload your new results to third-party sites.

  • @dpchand
    @dpchand 2 years ago

    Awesome explanation... Can you please explain what a VCF file will look like if the segments from the mother and the father both have a nucleotide different from the reference?

  • @RayY-r4j
    @RayY-r4j 7 months ago +1

    FASTQ data need trimming.

  • @markcuello5
    @markcuello5 1 year ago

    HELP

    • @KatharineME
      @KatharineME  1 year ago

      Any question in particular I can help with?