Thank you for such a nice video. I have been improving my programming following the videos. I also tried using a different method to read the fastq file to extract the sequences and quality score.
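def readfastq(filename):
    # Read the whole file; each FASTQ record spans 4 lines
    with open(filename, 'r') as f:
        lines = f.readlines()
    # Line 2 of each record is the sequence, line 4 is the quality string
    seq = [lines[i].strip('\n') for i in range(1, len(lines), 4)]
    qual = [lines[i].strip('\n') for i in range(3, len(lines), 4)]
    return seq, qual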
I'm liking and following this course, but could the genome links be made available? thx
Richard Walker, they are written in the Jupyter notebooks, which I believe are public: github.com/BenLangmead/ads1-notebooks
How powerful should my machine be to perform these tasks?
10:10 I still don't understand why 2 is so dominant, not even 1s or 3s, just a huge amount of 2s. His explanation is not doing it for me. Does anyone know? (Maybe there is a huge probability gap between 1-2 and 2-3, so they all get classified as "2", but if anyone knows better, it would be appreciated.)
Hey brandon, a quality score of 3 means there is a roughly 50% chance that the base call is incorrect.
The difference between Q = 3 and Q = 2 is that going from 3 to 2 the chance of an incorrect call rises above 50% (about 63% at Q = 2), and because these values are discrete, you can't end up with values in between.
That's my guess, hope it's clear lol
Also, a quality score of 1 would mean about a 79% chance of error; it is Q = 0 that corresponds to 100%.
Lmk if you notice I'm wrong, this interests me
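To make the numbers above concrete, here is a small sketch of the standard Phred relation p = 10^(-Q/10). The function names are mine, and the Phred+33 ASCII offset is an assumption about how the FASTQ file in the video is encoded.

```python
def phred_to_error_prob(q):
    """Error probability for a Phred quality score Q: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def phred33_to_q(ch):
    """Decode one quality character, assuming Phred+33 encoding."""
    return ord(ch) - 33

# Print the error probability for a few quality scores
for q in (0, 1, 2, 3, 10, 20, 30):
    print(q, round(phred_to_error_prob(q), 3))
```

Under that assumption, the '#' character in a quality string decodes to Q = 2.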
Because they are plotting the quality frequencies per base and not per read. The chance of misincorporating non-terminating nucleotides increases over time, so bases with poor quality scores show up more often (maybe at the end of each otherwise good read).
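A rough sketch of what "per base and not per read" means in code: every quality character in every read adds one count. The function name and the toy quality strings are made up for illustration, and Phred+33 encoding is assumed.

```python
from collections import Counter

def quality_histogram(quals, offset=33):
    """Count how often each quality score occurs, one count per base."""
    counts = Counter()
    for q in quals:
        # Every character contributes, so long low-quality tails add up
        counts.update(ord(ch) - offset for ch in q)
    return counts

# Toy quality strings (not real data): good bases first, '#' tails at the end
quals = ["IIII##", "II####"]
hist = quality_histogram(quals)
for score in sorted(hist):
    print(score, hist[score])
```

With real reads you would pass in the quality list returned by a FASTQ reader such as the readfastq function posted earlier in this thread.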
Can someone provide me the URL?
Where exactly is the URL?
Thank you.
Which software are you using? There is no wget in Python, and I haven't seen a programming environment like that.
Just download the dataset from the URL directly, or use conda to install wget.
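If you would rather not install wget at all, a minimal sketch using only the Python standard library could look like this. The download helper is a name I made up, and the URL in the comment is a placeholder, not the real dataset link (take that from the notebooks).

```python
import urllib.request

def download(url, dest):
    """Fetch url and save it to the local path dest (standard library only)."""
    urllib.request.urlretrieve(url, dest)
    return dest

# Example call with a placeholder URL; replace it with the link from the notebook:
# download("https://example.org/reads.fastq", "reads.fastq")
```

This does roughly the same job as wget for a single file, so it should work in any Python environment without extra packages.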