I never comment on YouTube videos, but thank you so much for this. It was simple and straight to the point. I am new to coding and needed to get all of the featureCounts outputs for 20 samples into one output file, and this was the only thing I could find that not only made sense but also worked. Thank you thank you thank you!!!
Wooo glad it helped you!
Dude, you are great! You are the first one who really helped out.
Thanks xD. Glad I could help! Please feel free to ask if you have any other questions or want to learn anything else!
@@sanbomics Thank you. I just started studying bioinformatics on my own. Where do you recommend I start?
Learning through exposure, by trying to analyze your own data, is a great way to start. Try shifting things you do in Excel into Python/R. Get some exposure to Linux: on a Linux machine, make a conda environment and play around with Python in a Jupyter notebook. If you don't have access to Linux, launch a free-tier AWS EC2 instance and play around with it. If you have any specific questions, please let me know! I am always curious to know what people want to learn so I can make worthwhile videos.
@@sanbomics Thanks))). Keep making videos, especially short ones with step-by-step explanations of what you do on the command line. Topics could be samtools, bamtools, or anything else; it would all be interesting.
Loving your videos so far, thanks for this! I'm trying to get cluster heatmaps going and know you have videos on this, but as a newbie I thought I may as well start from the start. How did you go from individual outputs to a single Excel spreadsheet with counts for all the samples? Was this summary count table with all samples autogenerated, or did you have to run a concatenation script to paste them all together?
Hi. Good question. I pointed featureCounts to all the BAM files in one command and it put them together automatically. The pattern *.bam matches all files that end with .bam.
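A minimal Python sketch of the same idea, assuming featureCounts is on your PATH; the helper name and the folder/annotation names are hypothetical. Python's glob expands *.bam the same way the shell does, and passing every matching BAM to a single featureCounts call is what yields one table with one count column per sample.

```python
import glob
import os


def build_featurecounts_cmd(bam_dir, annotation, out_file, threads=4):
    """Build one featureCounts command covering every .bam file in bam_dir.

    glob.glob expands "*.bam" just as the shell would: every file name
    ending in .bam. Sorting keeps the sample columns in a stable order.
    """
    bams = sorted(glob.glob(os.path.join(bam_dir, "*.bam")))
    if not bams:
        raise FileNotFoundError(f"no .bam files found in {bam_dir!r}")
    # featureCounts options first, then all input BAMs at the end
    return ["featureCounts", "-a", annotation, "-o", out_file,
            "-T", str(threads), *bams]
```

For example, subprocess.run(build_featurecounts_cmd("bams", "annotation.gtf", "counts.txt"), check=True) would run it; the sorting only makes the column order predictable.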
Hey man, very informative video. I am really struggling with the DESeq2 analysis from here on. Could you please upload a step-by-step tutorial like this one on how to perform that analysis in R? I am new to this field and have been having a difficult time with this stuff, but your videos are easier to follow and better explained than anything else I have come across so far.
Hi Justin. I was planning on doing that soon. I just haven't gotten around to it yet.
@@sanbomics Hey there, okay, awesome, no problem. I am busy trying to do the differential expression analysis on my data now, so it would be really helpful. It is the only thing I seem to get stuck at. I'm not familiar with R, so DESeq2 in R is quite challenging.
I'm guessing you may have figured it out by now, but I finally got around to it. I should be starting a single-cell series soon if you are interested.
Thank you very much. Great work. Maybe a dumb question, but how would you process BAM files that contain single-cell data from several cells? Essentially, what I need is a table similar to yours with genes in the rows and cells in the columns (instead of whole BAM files).
You can convert it back to FASTQ and then run it through one of the various single-cell counters. For example, if it is 10x data you can use cellranger bamtofastq and then cellranger count.
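To make the "genes in rows, cells in columns" idea concrete, here is a toy Python sketch of the core step a single-cell counter performs. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple, which Cell Ranger stores in the CB, UB and GX BAM tags; extracting those tags and the UMI error correction that real tools do are left out, so this illustrates the idea and is not a substitute for cellranger count.

```python
from collections import defaultdict


def count_matrix(records):
    """Build a gene x cell count matrix from (cell_barcode, umi, gene) tuples.

    Reads sharing the same cell, UMI and gene are treated as copies of one
    molecule, so each distinct (cell, umi, gene) is counted once.
    Returns (genes, cells, matrix) with matrix[i][j] = count of genes[i] in cells[j].
    """
    molecules = defaultdict(set)  # (gene, cell) -> set of distinct UMIs
    for cell, umi, gene in records:
        if cell and umi and gene:  # skip reads missing any of the three tags
            molecules[(gene, cell)].add(umi)

    genes = sorted({g for g, _ in molecules})
    cells = sorted({c for _, c in molecules})
    # .get avoids inserting empty entries into the defaultdict
    matrix = [[len(molecules.get((g, c), ())) for c in cells] for g in genes]
    return genes, cells, matrix
```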
Thank you for your video. I have a quick question. I would like to use a GFF file with featureCounts; I already used GFF3 files when building the index with STAR. How can I use GFF files with featureCounts? I probably need to set some options.
Thanks for the great follow-along videos. I just have one question: how do I get the CSV file that you load into R in your next video? I'm unsure how to go from .out to .csv. Thanks in advance!
I think by default it is tab-delimited. I typically open it in Excel, remove the columns I don't need, and then save it as a CSV. Alternatively, you can open it in R and specify that the delimiter is \t.
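The same conversion can be scripted instead of done by hand. A minimal Python sketch using only the standard library, assuming the usual featureCounts layout: a first line starting with # that records the command, then a header of Geneid, Chr, Start, End, Strand, Length followed by one column per BAM. The function name and file names are placeholders.

```python
import csv

# Per-gene annotation columns featureCounts writes before the sample columns.
ANNOTATION_COLS = {"Chr", "Start", "End", "Strand", "Length"}


def featurecounts_to_csv(in_path, out_path):
    """Convert a tab-delimited featureCounts table to a CSV of Geneid + counts."""
    with open(in_path, newline="") as fin:
        # Skip the leading '#' line that records the featureCounts command.
        data_lines = (line for line in fin if not line.startswith("#"))
        rows = [row for row in csv.reader(data_lines, delimiter="\t") if row]
    header = rows[0]
    keep = [i for i, name in enumerate(header) if name not in ANNOTATION_COLS]
    with open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in rows:
            writer.writerow([row[i] for i in keep])
```

For example, featurecounts_to_csv("count.out", "counts.csv") leaves Geneid plus one count column per sample, ready to load into R or Python.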
Hi, thanks a lot! I am planning to generate RNA editing data using the REDI tool. I used STAR for alignment. Do I need to generate a count table, or can I directly use the BAM file generated after alignment? I followed your RNA-seq tutorial part 2 and generated a BAM file. Please let me know.
Hi, I have never used the REDI tool, so I do not know. Sorry!
Hi, it's really useful! How did you open/export the data in Excel?
Thanks! You should just be able to open it directly. Try saving it with a .csv extension so Excel will try to open it by default. You don't necessarily need to open it in Excel, though; you can drop/rename columns in R too.
@@sanbomics Thanks once again. I have paired-end reads, and featureCounts would treat each read file as an individual sample, so would I end up with twice as many columns? How would you handle two BAM files per sample?
Hi. When you ran STAR you should have input both read files in one alignment command. There should be only one bam file per sample. Hope this helps!
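A sketch of what "both read files in one alignment command" looks like, written as a Python helper that builds the STAR argument list (the helper name, index directory, file names and thread count are placeholders). The flags are standard STAR options: --readFilesIn takes mate 1 and then mate 2, and with --outSAMtype BAM SortedByCoordinate STAR writes a single <prefix>Aligned.sortedByCoord.out.bam per run, which is where file names like the ones in this thread come from. Adding --readFilesCommand zcat is only needed for gzipped FASTQs.

```python
def star_paired_end_cmd(r1, r2, genome_dir, out_prefix, threads=8):
    """Return a STAR command that aligns one paired-end sample into ONE BAM.

    Both mates go to --readFilesIn in a single run, so STAR writes a single
    <out_prefix>Aligned.sortedByCoord.out.bam for the sample.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,  # mate 1, then mate 2
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    if r1.endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]  # decompress gzipped FASTQs on the fly
    return cmd
```

With one BAM per sample, featureCounts again gives one column per sample; its -p option tells it the BAMs are paired-end (check your version's help for how read pairs are counted).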
Have you merged all the samples into one table? As far as I know, featureCounts outputs one file for each sample.
featureCounts will merge the outputs into one file when you run all the samples in one command with the *.
Thanks
No problem!
I am having trouble with the .out format. My computer won't let me open it in Excel, Numbers, or any other spreadsheet program. Is there a way to convert it to CSV, in R or otherwise? Thanks! I really love your tutorials and have learned a lot!
If you save it as .tsv instead, your computer might be less confused. Beware that Excel might change some of your gene symbols to dates.
@@sanbomics Thanks! I didn't realize you could specify the output file format in featureCounts. Specifying the output file as .txt instead of .out did the trick. Thank you!
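On the warning above about Excel turning gene symbols into dates, a rough Python check can list the risky symbols before a table is opened in a spreadsheet. The pattern is an assumption covering the classic offenders (a month name or abbreviation followed directly by digits, such as SEPT2, MARCH1 or DEC1); it is a heuristic, not an exhaustive list.

```python
import re

# Heuristic: a month name/abbreviation followed directly by digits, which
# Excel may read as a date (e.g. SEPT2). Case-insensitive.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)\d+$",
    re.IGNORECASE,
)


def date_prone_symbols(symbols):
    """Return the gene symbols that Excel might silently convert to dates."""
    return [s for s in symbols if DATE_LIKE.match(s)]
```

If this returns anything, importing the file in Excel with that column set to Text, or skipping Excel and editing in R/Python, avoids the conversion.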
Why not use '-t gene'? Or should it be '-t exon'?
When I run featureCounts I get this error:
ERROR: invalid parameter: 'SRR.fastqAligned.sortedByCoord.out.bam'
This is my code:
featureCounts -a Mus_musculus.GRCm39.109.gtf -o count.out -T 12 SRR.fastqAligned.sortedByCoord.out.bam
Any idea what the issue is?
Very weird. It looks OK to me... Were you able to figure it out? Sorry for the late reply.
Not sure if this helps, but I had the same problem when I used *.bam to include all 28 files, which are named in this format: SRR9943471_trimmed.fqAligned.sortedByCoord.out.bam. However, when I made the wildcard pattern more specific, i.e. changed *.bam to *_trimmed.fqAligned.sortedByCoord.out.bam, it somehow started working.
@sanbomics Have you merged the output files?
featureCounts will merge the outputs into one file when you run all the samples in one command with the *.
Firstly, thank you so much; your videos are very useful. I have used featureCounts to generate the count table, but the percentage of Unassigned_NoFeatures is too high (around 50%). I checked that the annotation file used for the alignment and for generating the count table is the same, and I also checked the strandedness of the assay, but I still have the same problem. When I changed GTF.featureType from exon to gene, the percentage of Unassigned_NoFeatures decreased to about 15%. These results suggest to me that a large share of my reads fall in introns or intergenic regions, but when I checked in IGV I don't observe that. I don't know if you can help me with this or tell me whether these results are normal for human data. Thank you so much!!
Hmm. Sounds weird. It's hard for me to diagnose from here. See what happens if you use a pseudoaligner instead like salmon
You are definitely using the right annotation? xD
@@sanbomics Thank you so much for your feedback. I am trying it with Salmon now.
@@sanbomics Yes, I checked that several times 😅
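A toy Python model of why switching the feature type from exon to gene (-t on the command line, GTF.featureType in Rsubread) can shrink Unassigned_NoFeatures, as in the comment above: a gene feature spans the introns between its exons, so a read in an intron overlaps the gene but no exon. The coordinates are made up, and real featureCounts also applies strand, multi-overlap and minimum-overlap rules that this sketch ignores.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


def assigned(read, exons, feature_type):
    """Would a read be assigned to this gene when counting by exon or by gene?

    read is (start, end); exons is a list of (start, end) for one gene.
    The 'gene' feature is modelled as the span from the first exon start to
    the last exon end, i.e. it includes the introns.
    """
    start, end = read
    if feature_type == "exon":
        return any(overlaps(start, end, s, e) for s, e in exons)
    if feature_type == "gene":
        gene_start = min(s for s, _ in exons)
        gene_end = max(e for _, e in exons)
        return overlaps(start, end, gene_start, gene_end)
    raise ValueError("feature_type must be 'exon' or 'gene'")
```

An intronic read such as (300, 350) against exons [(100, 200), (500, 600)] is unassigned with exon but assigned with gene, which is the kind of shift described above.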
Hi, I used HTSeq instead and ran into a few issues. 1) The output is one file per BAM (I was expecting a single table where each BAM file is a column, aka a sample). 2) The gene IDs are labeled BRV001 and so on, which is not really informative; I was hoping to have gene names instead (for example recA or rRNA). Any tips? Thanks a lot
Hi, it is hard for me to troubleshoot from this end, especially since I have used HTSeq maybe once in my life. Sorry!
@@sanbomics I simply ended up combining the resulting featureCounts outputs. As for the gene names, they simply weren't included in the annotation, so I had to re-annotate myself.
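For tools that write one count file per BAM, such as htseq-count, the per-sample files can be joined into a single genes-by-samples table. A minimal Python sketch, assuming each file has two tab-separated columns (gene ID, count) and that htseq-count's summary rows start with a double underscore (__no_feature, __ambiguous, ...); the function name and sample names are hypothetical.

```python
import csv


def merge_count_files(paths_by_sample):
    """Merge per-sample two-column count files into one table.

    paths_by_sample maps sample name -> path. Returns (header, rows) where
    header is ['gene_id', sample1, sample2, ...] and each row holds one gene's
    counts across samples (0 if a gene is missing from a file).
    """
    samples = list(paths_by_sample)
    counts = {}  # gene -> {sample: count}
    for sample, path in paths_by_sample.items():
        with open(path, newline="") as fh:
            for row in csv.reader(fh, delimiter="\t"):
                if not row or row[0].startswith("__"):
                    continue  # skip blank lines and htseq-count summary rows
                gene, value = row[0], int(row[1])
                counts.setdefault(gene, {})[sample] = value
    header = ["gene_id"] + samples
    rows = [[gene] + [counts[gene].get(s, 0) for s in samples]
            for gene in sorted(counts)]
    return header, rows
```

The result can be written out with csv.writer (delimiter="\t" or ","), giving one column per sample as featureCounts does when run on all BAMs at once.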
I have checked the BAM file, but the result is still "Successfully assigned alignments: 0 (0%)".
Hmm. What happens when you do: samtools flagstat your_file.bam
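Alongside samtools flagstat, one frequent cause of 0% assigned (an assumption here; the thread never confirms it) is that the chromosome names in the BAM header (for example chr1) don't match those in the GTF (for example 1). A sketch that compares the two, given the text printed by samtools view -H your_file.bam and the lines of the GTF; the function names are hypothetical.

```python
def bam_contigs(header_text):
    """Contig names from a SAM header (the SN: field of @SQ lines)."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_contigs(gtf_lines):
    """Contig names from column 1 of a GTF, skipping '#' comment lines."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def contig_overlap(header_text, gtf_lines):
    """Return (shared, only_in_bam, only_in_gtf).

    An empty 'shared' set means no read can overlap any annotated feature.
    """
    bam, gtf = bam_contigs(header_text), gtf_contigs(gtf_lines)
    return bam & gtf, bam - gtf, gtf - bam
```

If the names differ only by a prefix, featureCounts' -A option accepts a chromosome-alias file, or the GTF can be renamed to match the BAM.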
Did you open your count.out file in Excel?
I did, but you don't have to. You can do it all in R or Python. Excel is just a nice way to look at the whole dataframe; you can see more of it than the dataframe output shows in RStudio or a Jupyter notebook. It's also very easy to make specific edits like deleting columns.
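If the cleanup is done in Python rather than Excel, the long BAM-path column names featureCounts produces can be shortened with plain string handling. This sketch assumes the STAR-style suffix seen throughout this thread (Aligned.sortedByCoord.out.bam) and a leftover .fastq/.fq from the STAR prefix; the helper names are hypothetical, so adjust the suffixes to match your own files.

```python
import os

STAR_SUFFIX = "Aligned.sortedByCoord.out.bam"  # assumed STAR output naming


def short_sample_name(column, suffix=STAR_SUFFIX):
    """Shorten a featureCounts sample column to a readable sample name."""
    name = os.path.basename(column)      # drop leading directories
    if name.endswith(suffix):
        name = name[: -len(suffix)]      # drop STAR's output suffix
    for ext in (".fastq", ".fq"):        # drop leftover read-file extension
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return name


def rename_columns(header):
    """Shorten only the BAM columns; keep Geneid/annotation names as-is."""
    return [short_sample_name(c) if c.endswith(".bam") else c for c in header]
```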
I ran the featureCounts command, but it gave this error: ERROR: invalid parameter: '/root/data/bams/SRR7666346_1.fastqAligned.sortedByCoord.out.bam'. What should I do?
Sounds like a typo in the command. Just double-check the command and then whether that file exists.
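One way to do that double-check programmatically: a Python sketch (hypothetical helper, heuristic only) that flags input paths that don't exist and characters that look like a hyphen or space but aren't ASCII, which can sneak in when a command is copied from a web page or document. Whether either of these caused the "invalid parameter" errors reported in this thread is an assumption, not something the thread confirms.

```python
import glob
import shlex

# Characters that look like '-' or ' ' but are not ASCII; they often appear
# when a command is copied from a web page, PDF or word processor.
LOOKALIKES = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
    "\u00a0": "non-breaking space",
}


def check_command(command):
    """List likely problems in a featureCounts command line (heuristic)."""
    problems = [f"{LOOKALIKES[ch]} at position {pos}"
                for pos, ch in enumerate(command) if ch in LOOKALIKES]
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        # Inputs: BAM/SAM files, plus the annotation passed after -a.
        is_input = token.endswith((".bam", ".sam")) or (i > 0 and tokens[i - 1] == "-a")
        # glob handles both literal paths and wildcards such as *.bam
        if is_input and not glob.glob(token):
            problems.append(f"input file not found: {token}")
    return problems
```

Running it on the exact command string you pasted into the terminal returns an empty list when neither issue is present.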