Thank you for the always great lectures! I'm learning so much 😄
Wonderful!
ditto!
What a timely video, I was just looking at this paper trying to understand the underlying methods.
Fantastic!
Brilliant, thanks. Looking forward to your implementation.
Thanks Pieter!
I wasn’t familiar with your channel until recently, Pat. Stellar stuff all around! 🎉
Thanks Mike!
I loved it, thank you
Wonderful - thanks for watching!
Great description. Another great video!!! Do you think this can be done to the GTDB database too?
Most likely (what’s GTDB?)
@Riffomonas GTDB stands for Genome Taxonomy Database. Also, GreenGenes 2 is another good database. Thanks
Regarding the NCBI BLAST output, I often find the "Distance tree of results" option (above the hit table) very insightful for seeing where your query fits in the tree of your BLAST results.
Ah, good point! I've never really used the tree much
Can you think of a reason a naive Bayesian classifier would misclassify fungal ITS sequences? For example, if the amplicon length is highly variable (150-350 bp)? I had the DADA2 naive Bayesian classifier label some fungal ITS sequences as sponges, corals, and plants, but independent BLAST searches as well as the IDTAXA (DECIPHER) algorithm confirmed those labels were erroneous. I couldn’t identify exactly what caused the error, but it disappeared when I stopped using the naive Bayesian classifier.
Hmmm. I wonder what you’re using as the database for ITS
Hello, very informative video. I want to ask: is this the same concept that the Kraken2 tool uses?
Thanks! They’re both used for classification, but Kraken is used with metagenomics data and this is mainly used with 16S rRNA gene sequences. I’m not sure how Kraken works.
Very happy to see you back... Just a note: I think for eukaryotes, 18S V4 can in many cases go to species-level identification, so adapting to a different number of levels would be critical in the package, but I am sure it is in the plan. It would also be interesting if you could discuss the different k-mer based methods like Kraken, DECIPHER, and DADA2. What are the differences?
Interesting idea! I like the idea of a head-to-head. If I recall correctly, I didn’t really like the DECIPHER benchmarking since they didn’t do leave-one-out testing.
Great summary, Pat. Also love the emphasis on the 'game plan' -- always important to know where you're going before you put fingers to keys 🙂. The figures from Wang et al. are classics (and are also hand drawn?). If so, it makes you appreciate how lucky we are now with tools like ggplot!
HA - 2007 wasn't the stone ages! I know I had software to generate plots although I'm not sure what generated that 3d bar plot 😱