Bioinformatics in Python: DNA Toolkit. Part 7: A search for a real protein from NCBI database

  • Published: 27 Oct 2024

Comments • 23

  • @AnonyDz 4 years ago +11

    Wow, I went from 0 to cross-matching the human insulin sequence in like 2 days, and this is thanks to your wonderful videos... Hope you will keep up the good work!! Thank you, sir.

  • @ArmyofDarkniss 4 years ago +3

    Hey, great videos! I watched all your DNA Toolkit videos and they are really helpful. Keep up the good work.

    • @rebelScience 4 years ago +1

      We will come back to it this Saturday.

  • @peciuma5485 4 years ago +1

    Excellent video. Thanks for sharing!

  • @stengah966 4 years ago +1

    Really great video... Thanks for your work.

  • @maisongrefe3714 4 years ago +1

    These videos are awesome. Keep making them pls!

  • @fabiopereira1155 4 years ago

    Excellent videos. I've also joined the chat. Keep it up!
    Just a quick suggestion: I've struggled thus far in the series to understand certain aspects because of the ambiguity between proteins and amino acids. Meaning, the function outlined in the video doesn't tell you which protein it generates per se, only what each translated codon makes: an amino acid.
    I understand that the line between amino acids and proteins is quite thin, so I'm just sharing my thoughts about how it hindered my understanding of what was going on.
    Once again, thank you for these videos.

    • @DimensionPicturesAOT 4 years ago

      The amino acid sequence is what folds into a protein as it reaches its lowest functional energy state. Taking this amino acid sequence, formatted as a FASTA file, you can query the protein databases for matching genes in organisms.
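
To make the "formatted as a FASTA file" step concrete, here is a minimal sketch of turning an amino-acid string into one FASTA record. The function name `to_fasta`, the header text, the example string, and the line width are all assumptions chosen for illustration; they are not taken from the video.

```python
def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Illustrative amino-acid string; the short width just makes the wrapping visible.
record = to_fasta("example_protein", "MALWMRLLPLLALLALWGPDPAAA", width=10)
print(record)
```

The resulting text can be saved to a file or pasted into a protein search form that accepts FASTA input.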

  • @qaziacademy3048 1 year ago

    Very nice video. I want to extract anticancer peptide features using NLP to predict anticancer activity. Could you please guide me on what to do first, and so on?

  • @gregkan3964 3 years ago +1

    Hello! I have a problem with my last function. Despite copy-pasting it, the routine "proteins_from_rf(rf)" ALWAYS returns an empty list, so res comes back empty too.
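
One thing that may be worth checking when a function like this always returns an empty list is whether the stop symbol it looks for matches the one the codon table actually emits. The sketch below is one common way to write such a function, not necessarily the code from the video; the `stop` parameter and the sample input are assumptions added for illustration.

```python
def proteins_from_rf(aa_seq, stop="_"):
    """Collect every candidate protein (from an 'M' up to the next stop symbol) in one reading frame."""
    current = []   # proteins still being extended
    proteins = []  # finished proteins
    for aa in aa_seq:
        if aa == stop:
            # A stop symbol closes every protein that is currently open
            proteins.extend(current)
            current = []
        else:
            if aa == "M":
                # Each start amino acid opens a new candidate
                current.append("")
            current = [p + aa for p in current]
    return proteins


frame = list("MAML_")
print(proteins_from_rf(frame))            # stop symbol matches the data
print(proteins_from_rf(frame, stop="*"))  # mismatched stop symbol: nothing is ever closed
```

In this sketch, a reading frame with no stop symbol, or with a different stop symbol than the function expects, yields an empty list, which would leave the final result empty as well.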

  • @MarcoDEGENNARO-xn6uu 8 months ago

    Hello, thank you for sharing these intriguing videos. I have a question regarding copying the whole FASTA sequence into a single line. Could you please explain how that process works?

    • @MarcoDEGENNARO-xn6uu 8 months ago

      Or how can I use an external file to load my sequences?
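
Both questions can be handled in one step: read the FASTA file, drop the header line, and join the remaining lines into one string. The sketch below assumes a single-record file; `read_fasta_sequence` is a hypothetical helper name, and the demo file contents are made up.

```python
import os
import tempfile


def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one uppercase string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and the '>' header line
            if line and not line.startswith(">"):
                parts.append(line)
    return "".join(parts).upper()


# Demo with a throwaway file; the header and bases are made up.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">demo_record made-up example\nATGGCC\nattaa\n")
    demo_path = tmp.name

sequence = read_fasta_sequence(demo_path)
os.remove(demo_path)
print(sequence)
```

With a real file, only the call `read_fasta_sequence("path/to/file.fasta")` is needed, and the returned string can be passed to the toolkit functions in place of a pasted sequence.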

  • @Danix132 4 years ago

    Awesome Channel!

  • @Etreum0 2 years ago

    Awesome!!! I understand all these concepts a lot better now. I really appreciate this video series, thanks, thanks, thanks a lot. Now I am wondering: how can I do this with a multi-FASTA file?
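
A multi-FASTA file is a series of records, each starting with its own ">" header line, so one possible approach is to split the text on those headers and process each sequence in turn. This is a minimal sketch under that assumption; `parse_multi_fasta` is a hypothetical helper, and the two records are made up.

```python
def parse_multi_fasta(text):
    """Map each header (without the '>') to its sequence joined into one string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header starts a new record
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


demo_text = ">seq_one\nATGGCC\nTTAA\n>seq_two\nATGAAATAG\n"
records = parse_multi_fasta(demo_text)
for name, seq in records.items():
    print(name, len(seq))
```

Each value in the returned dictionary is a plain string, so the same per-sequence steps used for a single sequence can be run inside the loop.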

  • @EuphoriaSkater3 4 years ago

    For some reason my code is printing out tons of the same proteins. Can I prevent this?
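
If the repeats are exact duplicates in the result list, one possible fix is to remove them before printing while keeping the first occurrence of each protein. The sketch below assumes the proteins are plain strings in a list; `unique_proteins` and the sample list are made up for illustration.

```python
def unique_proteins(proteins, longest_first=True):
    """Drop exact duplicates, keeping first occurrences; optionally sort longest first."""
    unique = list(dict.fromkeys(proteins))  # dict keys keep insertion order
    if longest_first:
        unique.sort(key=len, reverse=True)
    return unique


found = ["MAL", "MALWMR", "MAL", "MR", "MALWMR"]
print(unique_proteins(found))
```

This only removes identical strings. Shorter proteins that start at a later "M" inside a longer one are different strings and stay in the list.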

  • @Sun48100 4 years ago

    It's not working.
    It does not generate all the proteins from the 6 open reading frames.
    Also, VS Code didn't detect any problems in the code from the video, but it does detect one in the code you posted on GitLab. It says "seq is undefined" (but it is already defined).

    • @rebelScience 4 years ago

      Hey! Your description is confusing. I don't really understand what the problem is. I would suggest joining our chat and sharing your code and screenshots of your errors/problems.

    • @Sun48100 4 years ago

      @rebelScience OK, where can I send the screenshot?

    • @rebelScience 4 years ago

      Please check any video description. There are links to the Telegram and Matrix chats we have. You can join one or the other. They are linked.

  • @josephwalsh4616 1 year ago +1

    For the line rfs = gen_reading_frames(seq[startRead: endRead]),
    I get "startRead, endRead not defined".
    I tried startReadPos, endReadPos but that didn't work either.
    Are these built-in functions from a module I am missing, or am I just being an idiot?

    • @josephwalsh4616 1 year ago

      Also, thank you for all your wonderful content.
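
Names like startRead and endRead are not Python built-ins; they are ordinary variables, so they have to be defined before the slice is evaluated, for example as parameters of the enclosing function. The sketch below shows that pattern. `frames_for_region` is a hypothetical wrapper, and `gen_reading_frames` here is only a simplified stand-in (three forward frames, no reverse complement), not the toolkit's implementation.

```python
def gen_reading_frames(seq):
    """Simplified stand-in: split the three forward frames into codons."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


def frames_for_region(seq, startReadPos=0, endReadPos=0):
    """Generate frames for seq[startReadPos:endReadPos], or for the whole sequence by default."""
    if endReadPos > startReadPos:
        # The slice bounds exist here because they are parameters of this function
        return gen_reading_frames(seq[startReadPos:endReadPos])
    return gen_reading_frames(seq)


print(frames_for_region("ATGGCCTAA"))        # whole sequence
print(frames_for_region("ATGGCCTAA", 0, 6))  # only the first six bases
```

Whatever names are chosen, the names used in the slice must match the names defined in the function signature (or assigned earlier in the same scope) exactly.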

  • @ranataki8675 3 years ago +1

    Hey, this somehow works on BT006808; however, when I tried it on another sequence (JF909299.1), it didn't output the correct protein.

    • @rebelScience 3 years ago +2

      Hey! I have looked into it now and you are right. Our DNA Toolkit is only a base set of functions (for example, it can't handle sequences containing anything other than A, T, C, and G; MT324680.1 has a single K in it and requires extra logic to find/produce a protein), but it should still find any "standard" proteins correctly. I am working on the next version/update of our DNA Toolkit to support sequences like MT324680.1 and others.
      I have looked at:
      1) www.ncbi.nlm.nih.gov/nuccore/JF909299.1
      2) www.ncbi.nlm.nih.gov/nuccore/BT006808.1
      and at their FASTA views:
      1) www.ncbi.nlm.nih.gov/nuccore/JF909299.1?report=fasta
      2) www.ncbi.nlm.nih.gov/nuccore/BT006808.1?report=fasta
      It looks like both AA sequences produce similar/same protein sequences. Image: t.ly/H5cb
      But the expected JF909299.1 protein sequence is not there. If we look at its AA sequence, it does look like it should produce what we are seeing with our DNA Toolkit code. I am now wondering if the NCBI protein sequence is somehow different or specific. I am not sure at this stage, but I will investigate this and add it to our next DNA Toolkit update video. Any findings by you or anyone else are super welcome. Also, Rana, if you are not in our chat yet, join in, as it is easier to discuss things like that there.
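
Before running a downloaded sequence through code that only understands A, T, C, and G, it can help to check for other characters first. K, for instance, is the IUPAC ambiguity code for "G or T". The sketch below is one simple way to do that check; `find_non_standard_bases` and the sample string are hypothetical and not part of the DNA Toolkit.

```python
def find_non_standard_bases(seq, alphabet="ATCG"):
    """Return (position, character) pairs for every character outside the given alphabet."""
    return [(i, base) for i, base in enumerate(seq.upper()) if base not in alphabet]


demo_seq = "ATGKCA"  # made-up example with one ambiguity code
problems = find_non_standard_bases(demo_seq)
if problems:
    print("Non-standard bases found:", problems)
else:
    print("Sequence uses only A, T, C and G")
```

An empty result means the sequence uses only the four standard bases, so a wrong protein would have to come from something else.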