RNAseq tutorial - part 3 - generating count table

  • Published: Sep 18, 2024

Comments • 51

  • @Kelly-gg8eq
    @Kelly-gg8eq 3 months ago

    I never comment on youtube videos but thank you so much for this. It was so simple and straight to the point. I am new to coding and needed to get all of the outputs from 20 featurecount reads into one output file, and this was the only thing I could find that not only made sense, but also worked. Thank you thank you thank you!!!

    • @sanbomics
      @sanbomics 3 months ago

      Wooo glad it helped you!

  • @vardansaroyan7634
    @vardansaroyan7634 2 years ago

    Dude, you are great! You are the first one who really helped out.

    • @sanbomics
      @sanbomics 2 years ago +1

      Thanks xD. Glad I could help! Please feel free to ask if you have any other questions or want to learn anything else!

    • @vardansaroyan7634
      @vardansaroyan7634 2 years ago

      @@sanbomics Thank you. I just started studying bioinformatics on my own. Where do you recommend I start?

    • @sanbomics
      @sanbomics 2 years ago +1

      Just learning through exposure, trying to analyze your own data, is a great way to start. Try shifting things you do in Excel into Python/R. Get some exposure to Linux. On a Linux machine, make a conda environment and play around in Jupyter notebook and Python. If you don't have access to Linux, make a free tier AWS EC2 and play around with it. If you have any specific questions, please let me know! I am always curious to know what people want to learn so I can make worthwhile videos.

    • @vardansaroyan7634
      @vardansaroyan7634 2 years ago

      @@sanbomics Thanks))). Keep making videos, especially short videos with step-by-step explanations of what you do on the command line. Topics could vary: samtools, bamtools, etc. Anything else would be interesting too.

  • @adampassman
    @adampassman 2 years ago +2

    Loving your videos so far - thanks for this! I'm trying to get cluster heatmaps going and know you have videos on this, but as a newbie, thought I may as well start from the start. How did you go from individual outputs to a single Excel spreadsheet with counts for all the samples - or was this summary count table with all samples autogenerated? Alternatively, did you have to run a concatenate script to paste them all together?

    • @sanbomics
      @sanbomics 2 years ago +1

      Hi. Good question. I pointed to all the BAM files in one command and it put them together automatically. Using *.bam means all files that end with .bam
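
A minimal sketch of what the wildcard does, assuming a Python wrapper rather than the shell. The function name `build_featurecounts_command` is hypothetical; the only featureCounts flags used are the ones quoted elsewhere in this thread (`-a`, `-o`, `-T`). It expands the pattern to every matching BAM file and puts them all in one command, which is why a single combined table comes out.

```python
import glob


def build_featurecounts_command(annotation, output, bam_pattern, threads=1):
    """Expand a wildcard and build a single featureCounts call (hypothetical helper)."""
    # Roughly what the shell does with *.bam: list every matching file, sorted.
    bam_files = sorted(glob.glob(bam_pattern))
    if not bam_files:
        raise FileNotFoundError(f"no files match {bam_pattern!r}")
    # All BAM files go into ONE command, so the counts land in one table.
    return ["featureCounts", "-a", annotation, "-o", output,
            "-T", str(threads), *bam_files]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where featureCounts is installed.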

  • @justinasmus1190
    @justinasmus1190 3 years ago +1

    Hey man, very informative video. I am really struggling with the DESeq2 analysis from here on. Could you please upload a step-by-step tutorial like this one on how to perform this analysis in R? I am new to this field and have been having a difficult time with this stuff, but your vids are really easy to follow and clearer than everything else I have come across thus far.

    • @sanbomics
      @sanbomics 3 years ago

      Hi Justin. I was planning on doing that soon. I just haven't gotten around to it yet.

    • @justinasmus1190
      @justinasmus1190 3 years ago +1

      @@sanbomics Hey there, okay awesome no problem. I am busy trying to do the differential expression on my data now so would be really helpful. It is the only thing I seem to get stuck at. Not familiar with R so DESeq2 in R is quite challenging.

    • @sanbomics
      @sanbomics 2 years ago +1

      I'm guessing you may have figured it out by now.. but I finally got around to it. I should be starting a single-cell series soon if you are interested.

  • @benjaminwehnert1893
    @benjaminwehnert1893 3 months ago

    thank you very much. Great work. Maybe a dumb question, but how would you process bam files that contain SCdata from several cells? Essentially what I need is a table similar to yours with genes in the rows and cells in the columns (instead of whole bam files).

    • @sanbomics
      @sanbomics 3 months ago

      You can convert it back to fastq then run it through various single cell counters. e.g., if it is 10x data you can use cellranger bamtofastq then cellranger count

  • @freezingtolerance7493
    @freezingtolerance7493 1 year ago

    Thank you for your video. I have a quick question. I would like to use a GFF file for featureCounts; I already used gff3 files when indexing with STAR. How can I use GFF files with featureCounts? I probably need to set some options...

  • @amilairiskic5967
    @amilairiskic5967 1 year ago

    Thanks for the great follow-along videos. I just have one question: how do I get the csv file that you load into R in your next video? I'm unsure how to go from .out to .csv.
    Thanks in advance!

    • @sanbomics
      @sanbomics 1 year ago

      I think by default it is tab delimited. I typically open it up in Excel, remove some of the columns I don't need, then save it as a csv. Alternatively, you can open it in R and specify that the delimiter is \t
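
The same conversion can be sketched without Excel. This is only an illustration in Python using the standard library; the function name `featurecounts_to_csv` is hypothetical. It assumes the usual featureCounts layout: a leading comment line starting with `#`, then a tab-delimited header with `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one count column per BAM file.

```python
import csv

# Annotation columns that are typically dropped before loading counts elsewhere.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}


def featurecounts_to_csv(in_path, out_path):
    """Convert a tab-delimited featureCounts table to CSV (hypothetical helper)."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        # Skip the leading "# Program:featureCounts ..." comment line.
        data_lines = (line for line in src if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        writer = csv.writer(dst)
        header = next(rows)
        # Keep Geneid and the per-sample count columns only.
        keep = [i for i, name in enumerate(header) if name not in ANNOTATION_COLUMNS]
        writer.writerow([header[i] for i in keep])
        for row in rows:
            writer.writerow([row[i] for i in keep])
```

Because the file never passes through Excel, gene symbols are written out exactly as they appear in the input.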

  • @researchonfungi3232
    @researchonfungi3232 1 year ago

    Hi, thanks a lot! I am planning to generate RNA editing data using the REDI tool. I used STAR for alignment. Do I need to generate a count table, or can I directly use the BAM file generated after alignment? I watched your RNAseq tutorial part 2 and generated the BAM file. Please let me know.

    • @sanbomics
      @sanbomics 1 year ago

      Hi, I have never used REDI tool so I do not know. sorry!

  • @prabhatkhanalphd6915
    @prabhatkhanalphd6915 2 years ago +1

    Hi, it's really useful! How did you open/export the data in Excel?

    • @sanbomics
      @sanbomics 2 years ago

      Thanks! You should just be able to open it directly. Try saving it with a .csv extension so Excel will try to open it by default. You don't necessarily need to open it in Excel. You can drop/rename columns in R too

    • @prabhatkhanalphd6915
      @prabhatkhanalphd6915 2 years ago

      @@sanbomics Thanks once again. I have paired-end reads, and it would treat each read file as an individual sample, which means I would have double the columns, right? How would you handle two BAM files per sample?

    • @sanbomics
      @sanbomics 2 years ago +1

      Hi. When you ran STAR you should have input both read files in one alignment command. There should be only one bam file per sample. Hope this helps!

  • @nahedahmed3095
    @nahedahmed3095 5 months ago

    Have you merged the whole sample in one table? Because I know feature count output is one file for each sample

    • @sanbomics
      @sanbomics 3 months ago

      featureCounts will merge the outputs into one file when you run them all in one command with the *

  • @sanjaisrao484
    @sanjaisrao484 1 year ago

    Thanks

  • @ChristyClutter
    @ChristyClutter 1 year ago

    I am having trouble with the .out format. My computer won't let me open it in excel, numbers, or any other spreadsheet format. Is there a way to convert it to csv, in R or otherwise? Thanks! I really love your tutorials and have learned a lot!

    • @sanbomics
      @sanbomics 1 year ago

      if you save it as .tsv instead your computer might be less confused. Beware that excel might change some of your gene symbols to a date.

    • @ChristyClutter
      @ChristyClutter 1 year ago +1

      @@sanbomics Thanks! I didn't realize you could specify the output file format in featureCounts. Specifying the output file as .txt instead of .out did the trick. Thank you!

  • @juanma415
    @juanma415 7 months ago

    Why not use '-t gene'? Or should it be '-t exon'?

  • @emmabowie2683
    @emmabowie2683 1 year ago +1

    When I run featureCounts I get this error:
    ERROR: invalid parameter: 'SRR.fastqAligned.sortedByCoord.out.bam'
    This is my code:
    featureCounts -a Mus_musculus.GRCm39.109.gtf -o count.out -T 12 SRR.fastqAligned.sortedByCoord.out.bam
    Any idea what the issue is?

    • @sanbomics
      @sanbomics 1 year ago

      Very weird. It looks ok to me... Were you able to figure it out? sorry for late reply

    • @oliviacheng9934
      @oliviacheng9934 10 months ago

      Not sure if this helps but I had the same problem when I used *.bam to include all 28 files named in this format SRR9943471_trimmed.fqAligned.sortedByCoord.out.bam. However, when I included more for the wildcard file name, i.e. from *.bam to *_trimmed.fqAligned.sortedByCoord.out.bam, somehow it started working.

  • @nahedahmed3095
    @nahedahmed3095 5 months ago

    @ sanbomics
    Have you merged the output files ?

    • @sanbomics
      @sanbomics 3 months ago

      featureCounts will merge the outputs into one file when you run them all in one command with the *

  • @celiagonzalezgil57
    @celiagonzalezgil57 3 months ago

    Firstly, thank you so much for your videos; they are very useful. I used featureCounts to generate the count table, but the percentage of Unassigned_NoFeatures is too high (around 50%). I checked that the annotation file used for the alignment and for the count table is the same, and I also checked the strandedness of the assay, but the problem persists. When I changed GTF.featureType from exon to gene, the share of Unassigned_NoFeatures dropped to about 15%. These results suggest that I have a lot of intronic or intergenic reads, but I don't see that when I check in IGV. Can you help me with this, or tell me whether these results are normal for human data? Thank you so much!!

    • @sanbomics
      @sanbomics 3 months ago

      Hmm. Sounds weird. It's hard for me to diagnose from here. See what happens if you use a pseudoaligner instead like salmon

    • @sanbomics
      @sanbomics 3 months ago

      You are definitely using the right annotation? xD

    • @celiagonzalezgil57
      @celiagonzalezgil57 3 months ago

      @@sanbomics Thank you so much for your feedback. I am trying now with Salmon

    • @celiagonzalezgil57
      @celiagonzalezgil57 3 months ago

      @@sanbomics yes, I checked that several times 😅
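
For anyone comparing Unassigned_NoFeatures across runs, here is a small Python sketch that turns a featureCounts summary file into fractions per sample. The function name `summary_fractions` is hypothetical. It assumes the usual `.summary` layout written next to the count table: a tab-delimited file whose first column is `Status` and whose remaining columns hold read counts for each BAM file.

```python
import csv


def summary_fractions(summary_path):
    """Return {sample: {status: fraction of reads}} (hypothetical helper)."""
    with open(summary_path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    samples = rows[0][1:]  # the first header cell is "Status"
    fractions = {}
    for col, sample in enumerate(samples, start=1):
        counts = {row[0]: int(row[col]) for row in rows[1:]}
        total = sum(counts.values())
        # Guard against an empty sample so we never divide by zero.
        fractions[sample] = {
            status: (count / total if total else 0.0)
            for status, count in counts.items()
        }
    return fractions
```

Running it on the summaries from an exon-level and a gene-level run makes the 50% versus 15% comparison described above easy to reproduce per sample.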

  • @sMr_Borgov
    @sMr_Borgov 11 months ago

    Hi, I used htseq instead and ran into a few issues. 1/ The output is one file per BAM (I was expecting a single table where each BAM file is a column, i.e. a sample). 2/ The gene IDs are labeled BRV001 and so on, which is not really informative; I was hoping to have gene names instead (for example recA or rRNA). Any tips? Thanks a lot

    • @sanbomics
      @sanbomics 11 months ago +1

      Hi, it is hard for me to troubleshoot on this end of things, especially since I have used HTseq like once in my life. Sorry!

    • @sMr_Borgov
      @sMr_Borgov 11 months ago

      @@sanbomics I simply ended up combining the resulting count outputs. As for the gene names, they simply weren't included in the annotation, so I had to re-annotate myself
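
Combining per-sample outputs like this can be sketched in a few lines of Python. This is not the commenter's script; the function name `merge_count_files` is hypothetical, and it assumes htseq-count style files: two tab-separated columns (gene ID, count) with summary rows whose names start with `__`, such as `__no_feature`.

```python
import csv


def merge_count_files(paths_by_sample):
    """Merge per-sample (gene, count) files into one table (hypothetical helper).

    paths_by_sample maps a sample name to its count file path.
    Returns (header, rows) with one row per gene and one column per sample.
    """
    table = {}
    for sample, path in paths_by_sample.items():
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) != 2:
                    continue  # skip blank or malformed lines
                gene, count = row
                if gene.startswith("__"):
                    continue  # skip htseq-count summary rows
                table.setdefault(gene, {})[sample] = int(count)
    samples = list(paths_by_sample)
    header = ["gene_id", *samples]
    # Genes missing from a sample's file are filled with 0.
    rows = [[gene, *(counts.get(s, 0) for s in samples)]
            for gene, counts in sorted(table.items())]
    return header, rows
```

The result has genes in the rows and samples in the columns, the same shape as the table featureCounts writes when it is given all BAM files at once.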

  • @khanmohdsarim
    @khanmohdsarim 2 years ago

    I have checked the BAM file, but the result is still "Successfully assigned alignment: 0 (0%)".

    • @sanbomics
      @sanbomics 2 years ago

      Hmm. What happens when you do: samtools flagstat your_file.bam

  • @AdrianGarcia-ph3vr
    @AdrianGarcia-ph3vr 1 year ago

    Did you open your count.out file in excel?

    • @sanbomics
      @sanbomics 1 year ago

      I did, but you don't have to. You can do it all in R or python. Excel is just a nice way to look at the whole dataframe. You can see more than the dataframe output in R studio or jupyter notebook. Also very easy to do specific edits like delete columns etc.

  • @marilyngomes6684
    @marilyngomes6684 9 months ago

    I ran the featureCounts code but it gave this error - ERROR: invalid parameter: '/root/data/bams/SRR7666346_1.fastqAligned.sortedByCoord.out.bam' What should I do?

    • @sanbomics
      @sanbomics 9 months ago

      Sounds like a typo in the command. Just double check the command and then if that file exists
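
The "check that the file exists" step can be automated before launching a long run. A minimal Python sketch follows; the function name `unreadable_inputs` is hypothetical, and it only reports paths that are missing or unreadable. It does not claim to explain every cause of the error above.

```python
import os


def unreadable_inputs(paths):
    """Return the input paths that are missing or not readable (hypothetical helper)."""
    return [
        path for path in paths
        if not (os.path.isfile(path) and os.access(path, os.R_OK))
    ]
```

An empty list means every BAM path points at a readable file, so a typo in the path can be ruled out before looking elsewhere in the command.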