No one can ever explain this better, love from Australia!
The CRAM file format is simply a newer and more compressed version of the BAM file format, for anyone who was wondering that :)
Could you also do a video for the SLAM, JAM, and THANK YOU MA'AM file formats?
@@programmer5350 Are you already familiar with the WHAM-BAM file formats ?
Katharine, thank you so much for this video
Thank you so much Katharine! you saved a biotech eng. student from Mexico! 🇲🇽
Awesome high quality bioinformatics video! We need more of these :)
SAM File Structure:
Header Section: Optional, starts with '@', contains metadata about the sequence and the alignments.
Alignment Section: Contains alignment information with each line representing a read.
Columns in SAM:
QNAME: Query template name.
FLAG: Bitwise flag.
RNAME: Reference sequence name.
POS: 1-based leftmost mapping position.
MAPQ: Mapping quality.
CIGAR: CIGAR string.
RNEXT: Reference name of the mate/next read.
PNEXT: Position of the mate/next read.
TLEN: Observed template length.
SEQ: Segment sequence.
QUAL: ASCII of Phred-scaled base quality+33.
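To make the column list above concrete, here is a minimal Python sketch that splits one alignment line into those 11 mandatory fields. The function names (parse_sam_line, is_header) and the example read are made up for illustration and are not from the video; for real files a dedicated library or samtools is the safer choice.

```python
# Names of the 11 mandatory SAM columns, in file order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
# Columns that hold integers rather than text.
INT_COLUMNS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def is_header(line):
    """Header lines are optional and start with '@'."""
    return line.startswith("@")


def parse_sam_line(line):
    """Split one alignment line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("an alignment line needs at least 11 tab-separated fields")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for name in INT_COLUMNS:
        record[name] = int(record[name])
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = fields[11:]
    return record


# Hypothetical read, invented for this example.
example = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])

# FLAG is bitwise: bit 0x10 is set when the read maps to the reverse strand.
print("reverse strand:", bool(rec["FLAG"] & 0x10))
```

FLAG 99 is 1 + 2 + 32 + 64 (paired, properly paired, mate on reverse strand, first in pair), so the reverse-strand bit for this read itself is not set.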
thank you very much. this is so helpful and very clear to understand easily
Thank you for the clear explanations of basics.
FASTQ (Raw Sequence Data)
FASTQ is a text-based format for storing both nucleotide sequences and their corresponding quality scores. It is widely used in high-throughput sequencing.
File Structure:
Header Line: Starts with '@' followed by a sequence identifier.
Sequence Line: Contains the nucleotide sequence.
Plus Line: Starts with a '+' and may be followed by the same sequence identifier.
Quality Line: Contains quality scores for each nucleotide in the sequence, encoded as ASCII characters.
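The four-line structure above can be read with a few lines of Python. This is only a sketch: read_fastq is a made-up name, it assumes every record is exactly four lines (no wrapped sequences), and it assumes the quality line uses the Phred+33 ASCII offset, which is the same offset the SAM QUAL column uses.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality_scores) for each 4-line record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Tiny in-memory example; the read is invented for illustration.
fastq_text = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(fastq_text)))
for name, seq, scores in records:
    print(name, seq, scores)
```

Here 'I' decodes to 40, '5' to 20 and '!' to 0, so the last base of this made-up read has the lowest possible quality.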
good explanation thanks
Simple and amazing explanation.
This video deserves more views.
This was excellently done and easy to follow! Thank you!
Amazing explanation, really cleared up many things just by watching, thanks a ton and keep up the good work:)
Great video, helped me disambiguate many concepts!
Glad it helped!
excellent description!
super helpful thank you so much.... please do a video on how to use different software tools
Hi! I have been working on some content for certain software tools. Which software did you have in mind?
Thanks, you make it easy to understand. Keep going.
This was very helpful and very well explained. You are talented 🙂
Very clear video. Thank you.
Katharine, could you please explain how to convert .fastq files to .vcf? Thank you
Thank you for the explanation! It's really confusing at first glance!
Well done, but I still have a doubt! If I have uploaded a VCF file to YFull and then upload the BAM, what are the advantages?
Thank you so much. You explained all this so easily 🤗🤩
Good... 👍 Nicely explained
Superb👏
thanks brilliant- very helpful!
Very good explanations!! Looking forward to watching more of your videos!
Really clear, thanks!
You are amazing..
Some great explanations in your videos. I'm really curious as to what we can do with the data once we get it. Right at the end of this video, you mentioned a video that would explain some of this. Do you still plan to make this?
Hi Simon! Yes I think what to do with the data is a question on everyone's mind who has had a DNA test. I do plan to make that video still. Stay tuned! If you want help in the short term, Guardiome does private custom DNA Analysis: www.guardiome.com/custom-dna-analysis.
great
excellent video!
Great go on
GREAT VIDEO!!!!!!!!!
Nice video! Do you know any software tool I can use to compare the results of full genome sequencing from two different companies? I have bought tests from Dante and Nebula, and once I get the results I would like to be able to compare them and do some statistical analysis of the differences.
Awesome video, but I'm not too sure about your explanation of genome coverage. It sounded more like read depth.
Right, I see your point. There are two concepts at hand, both important for variant calling: one is the percentage of the genome that was sequenced (coverage), and the other is the number of reads with a base call at a given nucleotide (read depth). For 30X depth sequencing, we want about 30 reads covering each nucleotide.
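A toy calculation may help separate the two numbers. The sketch below uses a made-up 10-base "genome" and three made-up reads given as 0-based, half-open (start, end) intervals; the function names are hypothetical. Real pipelines compute this from a BAM file with tools such as samtools rather than by hand.

```python
def depth_per_position(reads, genome_length):
    """Count how many reads overlap each position of the genome."""
    depth = [0] * genome_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def summarize(depth):
    """Return (fraction of positions with at least one read, mean depth)."""
    covered = sum(1 for d in depth if d > 0)
    fraction_covered = covered / len(depth)
    mean_depth = sum(depth) / len(depth)
    return fraction_covered, mean_depth


# Three invented reads on a 10-base toy genome.
reads = [(0, 5), (2, 7), (2, 7)]
depth = depth_per_position(reads, 10)
fraction_covered, mean_depth = summarize(depth)

print(depth)
print("fraction of genome covered:", fraction_covered)
print("mean read depth:", mean_depth)
```

In this toy case 70% of the positions are covered by at least one read, while the mean depth is only 1.5X, and positions 7 to 9 have no reads at all. A high average depth does not guarantee that every nucleotide was sequenced.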
Amazing! Succinct! Thank you!!!!
Thank you for the information. I have a question: in exams they always ask what the difference is between a FASTQ and a BAM file. What is the best short answer to this question?
bam is aligned to the reference genome, fastq is not.
I would agree with that. The fastq file contains unordered reads. The bam file contains the same reads plus the location each maps to in the reference genome.
Is 30X or 100X coverage better? Which is more accurate? Is 100X overkill, or is it necessary to reduce the error margin?
With Illumina sequencing, ~89% of base calls are above Q30 (99.9% accurate). 30X means having ~30 base calls for each nucleotide.
So 30X is usually all you need. 100X may be used when high variation is expected, like in a tumor.
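For anyone wondering where "Q30 = 99.9% accurate" comes from: a Phred score Q corresponds to an error probability of 10^(-Q/10). A small Python sketch of that standard conversion (the function names are made up for this example):

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Inverse conversion: Phred score for a given error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.2f}%")
```

So Q30 means about 1 wrong call in 1,000, and each step of 10 in Q reduces the error rate tenfold.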
It’s so helpful thanks, but the music is not necessary
So the company I tested with gave me these files, but none of them can be transferred to the well-known ancestry databases. Is there a way to convert them?
DNAgenics can convert whole genome files, if that's what you mean, into a raw data file similar to those from 23andMe and AncestryDNA, which will allow you to upload your new results to third-party sites.
Awesome explanation... Can you please explain what the VCF file will look like if the segments from the mother and the father both have a nucleotide different from the reference?
FASTQ data need trimming.
HELP
Any question in particular I can help with?