Python For Bioinformatics and Your First Python for Bioinformatics Program

  • Published: Aug 6, 2024
  • For more in-depth Python for Bioinformatics training visit: www.howtobioinformatics.com/py...
    Hi and welcome to Python for Bioinformatics. My name is Blake Allen, and I am going to show you how to make your first Python for Bioinformatics program in under 20 minutes.
    We're going to go over calculating GC content and making your first Python program. If you're a little more advanced and already know how to use Python but would like to learn more, go ahead and click the link below, where I'll show you advanced techniques for Python in bioinformatics.
    The first thing we're going to need is some data. If you don't have any data, you can't do any bioinformatics, but the great thing is that there is a ton of free data online, ready to go.
    So go ahead and open up your web browser and let's get started. I use Chrome.
    Go ahead and type in the letters NCBI. In the search bar, type in BRCA1.
    Click on the little tab that says Nucleotide. Up at the top we've got a few things; go ahead and click on the Homo sapiens BRCA1 FASTA tab.
    Click on Send in the top right-hand corner, choose Send to File, and download as FASTA. Then copy that sequence.fasta to a new folder we'll be working in. Rename it to BRCA1_BAP1.TXT, then you can open it and look at it.
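
The description stops before the code itself, so here is a minimal Python 3 sketch of a GC content calculation on the downloaded file. It is not the exact program from the video (the comments below suggest that one was written for Python 2 with a character-by-character loop). The function names `count_bases`, `gc_content`, and `gc_content_of_file` are made up for illustration.

```python
from collections import Counter


def count_bases(lines):
    """Count a, c, g and t in FASTA-formatted lines, skipping '>' header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue  # FASTA header line, not sequence data
        # Lower-case the line so 'G' and 'g' are counted together.
        counts.update(ch for ch in line.strip().lower() if ch in "acgt")
    return counts


def gc_content(counts):
    """Return (G + C) / (A + C + G + T), or 0.0 for an empty sequence."""
    total = sum(counts[base] for base in "acgt")
    return (counts["g"] + counts["c"]) / total if total else 0.0


def gc_content_of_file(path):
    """Convenience wrapper: open a FASTA file and return its GC content."""
    with open(path) as handle:
        return gc_content(count_bases(handle))
```

With the file from the steps above in the current folder, `print(gc_content_of_file("BRCA1_BAP1.TXT"))` would print the GC fraction as a number between 0 and 1.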

Comments • 87

  • @georgegrevera7000
    @georgegrevera7000 6 years ago

    I very much enjoyed this video. I like the fact that, by the end, I'm working with real data and doing something useful. Thanks!

  • @tomhitch763
    @tomhitch763 11 years ago +6

    This tutorial is brilliant, please create more!

  • @chaokang3594
    @chaokang3594 9 years ago +3

    Really helpful!
    I love Python!

  • @LauraBrock
    @LauraBrock 11 years ago +3

    This was really informative and interesting!

  • @MyChannel-jf7mr
    @MyChannel-jf7mr 10 years ago

    Very informative. Thank you for providing this example.

  • @ShadArfMohammed
    @ShadArfMohammed 7 years ago +2

    Thanks a lot, it was really helpful. You haven't put any other videos on this subject since 2013, though.

  • @ricardomoran3
    @ricardomoran3 11 years ago

    FANTASTIC! Thank you!

  • @NA0S90
    @NA0S90 9 years ago

    Very straightforward tutorial, thanks.

  • @SeemaP83
    @SeemaP83 11 years ago +1

    It was helpful. Thank you. Keep adding.

  • @alexanderdavis3117
    @alexanderdavis3117 11 years ago

    Very cool! I need to learn Python ASAP!

  • @laceycarlyle7754
    @laceycarlyle7754 11 years ago +2

    Very informative!

  • @mardiclements1571
    @mardiclements1571 11 years ago

    Very Helpful!

  • @dhivyas9908
    @dhivyas9908 5 years ago

    Thank you, it works very well.

  • @cherryblossoms95
    @cherryblossoms95 11 years ago +1

    THIS IS AMAZING.

  • @MyMasaka
    @MyMasaka 2 years ago

    The best video I have seen on bioinformatics.

  • @jpshiva1
    @jpshiva1 11 years ago

    Noel Tanner,
    Thanks for the reference sequence. I was having a hard time finding the correct nucleotide.

  • @nityaaryasomayajula2204
    @nityaaryasomayajula2204 5 years ago

    Hello, Thanks for this video! I was wondering if we could use the difflib program to do comparative genomics for two different files and create a report of differences?
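
As a rough illustration of the `difflib` idea raised above, here is a minimal sketch that produces a line-by-line difference report between two sets of sequence lines using the standard library's `difflib.unified_diff`. The helper name `diff_report` and the default file labels are hypothetical. Note the assumption baked in: this compares text lines, not aligned sequences, so a single inserted base shifts every following line; it is a sketch of the `difflib` call, not a comparative genomics method.

```python
import difflib


def diff_report(lines_a, lines_b, name_a="a.fasta", name_b="b.fasta"):
    """Return a unified diff (list of strings) between two lists of lines.

    name_a and name_b are only labels shown in the report header.
    """
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


# Two tiny made-up sequences that differ in the last base of line 2.
report = diff_report(["ACGT", "GGCC"], ["ACGT", "GGCA"])
print("\n".join(report))
```

Lines starting with `-` appear only in the first input and lines starting with `+` only in the second; identical inputs give an empty report.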

  • @meanderband
    @meanderband 11 years ago

    Very Nice!

  • @rusbiology3460
    @rusbiology3460 4 years ago

    Thank you so much for this walkthrough!

  • @grimreapper2358
    @grimreapper2358 5 years ago

    This is outstanding. I am hoping you can show more examples in Jupyter Notebook.

  • @ujenetics
    @ujenetics 9 years ago +1

    Thanks a lot for a nice tutorial! But have you tried TextWrangler instead of TextEdit?

  • @aalimmujawar582
    @aalimmujawar582 2 years ago

    Thanks, it is very good information.

  • @MrGomajo
    @MrGomajo 8 years ago +12

    Why not write it in the Python IDLE?

  • @kjeyaprakash2638
    @kjeyaprakash2638 8 years ago

    Which Python book would be better for reference? This is nice!

  • @davidr.martinezph.d.4746
    @davidr.martinezph.d.4746 9 years ago

    Hi,
    I wrote the same program in PyCharm.
    I tried opening this in the Bash shell and I get told "not a directory". I switched directories to ensure I was in the right folder. Does anyone have suggestions?

  • @irenez.b.1730
    @irenez.b.1730 6 years ago

    Any more advanced Python scripts to use for the analysis of sequencing data?

  • @omotosoolatunde9139
    @omotosoolatunde9139 3 years ago

    Thank You!

  • @zapy422
    @zapy422 8 years ago

    Nice cool intro to bioinfo

  • @mni79
    @mni79 4 years ago

    good work

  • @MrLompa76
    @MrLompa76 10 years ago

    So I have to create a folder first then create another folder to put the file inside of it?

  • @kavansoni4671
    @kavansoni4671 6 years ago

    Pls provide the exact link for dataset download in description

  • @shankfan
    @shankfan 10 years ago +1

    This is for Python 2.7.x, right? It doesn't work with my 3.3.x.

  • @dragonsteria3042
    @dragonsteria3042 9 years ago

    Awesome, my first Python program to compute GC content... I have a question: what is the GC content for? What does it tell me exactly? I did not understand that very well.
    BTW, I used this sequence: Rattus norvegicus BRCA1 mRNA, complete cds
    gc content: 0.460014

  • @VercingetoR3x
    @VercingetoR3x 6 years ago

    What version of python did you use?

  • @SpamHead8
    @SpamHead8 11 years ago

    Very clear and informative - thanks! Do you mind if I post/share?

  • @gitarrestunden2445
    @gitarrestunden2445 10 years ago

    Hi! Thanks for the video!! However, can you please explain why you set g, a, t and c to 0 at the beginning?
    Thanks!

    • @stevanbr1
      @stevanbr1 10 years ago +1

      Because you have to initialize variables to zero before you add a number to them (g += 1 => g = g + 1). If you don't initialize a variable to zero, it has some trash value and you won't get a valid result. The first time it enters the 'if' with 'g', g is going to be zero, so g = 0 + 1 = 1. If you don't initialize, it will be g = #$#@$ + 1 = ?. Hope that helps :)
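
A hedged aside on the reply above: the "trash value" behaviour is what happens in languages such as C. In Python, incrementing a name that was never assigned raises an error instead, which is another reason the counters are set to 0 first. The sketch below demonstrates this; the function name `increment_without_init` is made up for illustration.

```python
def increment_without_init():
    """Try to run g += 1 on a name that was never assigned."""
    try:
        g += 1  # g has no value yet, so Python refuses to add to it
    except NameError as exc:
        # Inside a function the specific error is UnboundLocalError,
        # which is a subclass of NameError.
        return type(exc).__name__
    return "no error"


def increment_with_init():
    """The pattern from the video: start at zero, then count up."""
    g = 0
    g += 1
    return g


print(increment_without_init())
print(increment_with_init())
```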

  • @dr.md.ismailhossain2681
    @dr.md.ismailhossain2681 5 years ago

    very nice

  • @jmadzo
    @jmadzo 11 years ago +1

    More Pythonic would be to get rid of the nested loop and just use the built-in string method count():
    for line in gene: g += line.count('g'); a += line.count('a'); c += line.count('c'); t += line.count('t');
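
The one-liner above can be spelled out as a small function. This is a sketch under assumptions: the name `count_bases_with_count` is hypothetical, the input is any iterable of FASTA lines (an open file works), header lines start with `>`, and lines are lower-cased first so that upper-case sequence data is counted as well.

```python
def count_bases_with_count(lines):
    """Count a, c, g and t per line with str.count(), skipping FASTA headers."""
    counts = {"a": 0, "c": 0, "g": 0, "t": 0}
    for line in lines:
        if line.startswith(">"):
            continue  # header line, not sequence data
        line = line.lower()
        for base in counts:
            counts[base] += line.count(base)
    return counts


# Made-up three-line input: one header and two sequence lines.
print(count_bases_with_count([">example\n", "ACGT\n", "ggcc\n"]))
```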

  • @titanoboa100
    @titanoboa100 10 years ago +1

    My problem so far is saving the folder as a plain txt file. My macbook will not give me the option when I select the drop down list.

  • @favoriteundsubscribe
    @favoriteundsubscribe 11 years ago

    awesome

  • @M.K-SAVE
    @M.K-SAVE 4 years ago +1

    Just a small question. Is this what bioinformaticians mostly do? Sequence genes, then use a programming language for analysis?

    • @MrChristian331
      @MrChristian331 4 years ago

      In a nutshell... YES. But in addition to analysis, they can use programming for drug discovery and therapeutics. They can use programming for predictive analytics to see if something will switch a gene on or off before administering it or experimenting with it, to save time and money.

  • @queenofunderland
    @queenofunderland 8 years ago

    Anyone know the answer? What if u take the FASTA format without the header, can u get rid of that gene.readline()?
    And when the counters are named with A, C, T, G strings, can u get rid of that line.lower()?
    TQ 4 any suggestions.

    • @nenadsvrzikapa6893
      @nenadsvrzikapa6893 8 years ago +1

      +willie ekaputra Yeah, that just skips the line, so if the line is not there you don't need to skip it, but if you remove it then it's no longer a FASTA file. Either way, this is not how an advanced bioinformatician would solve this task. I think Blake is showing that you can make the string lowercase. It usually is uppercase, so you don't need to convert it and you don't need that line.

    • @queenofunderland
      @queenofunderland 7 years ago

      I have another question: can u then make this code a function with def ...():, so that u can open ANY FASTA file saved on yer PC and count its GC content?
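
One way to do what is asked above, as a minimal sketch rather than the video's code: wrap the counting in a function that takes a file path, so it can be called on any FASTA file. The name `gc_content_of_fasta` is hypothetical, and the sketch assumes a plain-text FASTA file whose header lines start with `>`.

```python
def gc_content_of_fasta(path):
    """Return the GC fraction (0.0 to 1.0) of all sequence lines in a FASTA file."""
    gc = 0
    total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip header lines wherever they occur
            seq = line.strip().lower()
            gc += seq.count("g") + seq.count("c")
            # Only a, c, g and t count toward the total, so characters
            # such as 'n' do not affect the result.
            total += sum(seq.count(base) for base in "acgt")
    return gc / total if total else 0.0
```

Calling `gc_content_of_fasta("BRCA1_BAP1.TXT")` would then work for the file from the video, and the same call with a different path works for any other FASTA file.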

  • @bogdanbogdanovich140
    @bogdanbogdanovich140 4 years ago +3

    invalid syntax on the second quote of print "number of g's " + str(g)

    • @MrChacha1994
      @MrChacha1994 4 years ago +4

      Idk if it's because he's using a Mac, but if you are using Windows like I am, make sure to use parentheses when you use the "print" function.
      Ex: (EXACTLY LIKE THIS)
      print("number of g's " + str(g))

    • @Paul-su7sb
      @Paul-su7sb 3 years ago

      Same here, thank you so much for the advice I am going to try it

    • @kareenamulchandani3356
      @kareenamulchandani3356 1 year ago

      I think the syntax changed in Python 3.

  • @cgroza
    @cgroza 8 years ago +7

    Why not use count() or regular expressions?
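
For the regular-expression option mentioned above, a minimal sketch using the standard `re` module might look like this. The function name `gc_fraction_regex` is hypothetical, and the input is assumed to be sequence text with the FASTA header already removed.

```python
import re


def gc_fraction_regex(sequence):
    """Return the GC fraction of a sequence string using regular expressions."""
    # Match bases case-insensitively; newlines and other characters are ignored.
    bases = re.findall(r"[acgt]", sequence, flags=re.IGNORECASE)
    gc = re.findall(r"[gc]", sequence, flags=re.IGNORECASE)
    return len(gc) / len(bases) if bases else 0.0


# Made-up sequence text spanning two lines.
print(gc_fraction_regex("GGCC\naatt\n"))
```

For plain single-character counting, `str.count()` is usually the simpler choice; a regex becomes more useful for patterns such as motifs.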

  • @NoelTanner
    @NoelTanner 11 years ago +2

    I had a little trouble finding the correct nucleotide. To save time, here is the ref. # for the example in the video:
    NCBI Reference Sequence: NG_031859.1

  • @76BlueLions
    @76BlueLions 11 years ago

    Your web page is down, can you let me download this. Your channel blocks it from being able to download.

  • @Neohowphinktams
    @Neohowphinktams 11 years ago

    Good video, just wish it was more streamlined

  • @science_mbg
    @science_mbg 8 years ago

    Thanks, but I had a problem while running. I used Windows bash and I got this error:
    print "number of g's " + str(g)
    ^
    SyntaxError: invalid syntax
    Even though I did the same thing that you did. Please help me.

    • @nagaswaroopkenguntenagaraj8677
      @nagaswaroopkenguntenagaraj8677 8 years ago +2

      +Suleyman Bozkurt
      That may be because you are using Python 3+, where the syntax for print is print("number of g's " + str(g)) [notice the parentheses], whereas in Python 2 the syntax is as shown in the video [ print "number of g's " + str(g) ].
      Hope it helped! :)
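
The difference described in this thread can be shown side by side. This is a small illustrative sketch; the value of `g` is a made-up count, not a result from the video.

```python
g = 42  # hypothetical count, for illustration only

# Python 2 statement form, as typed in the video.
# On Python 3 this line is a SyntaxError, so it is left as a comment:
#     print "number of g's " + str(g)

# Python 3 function form: the whole message goes inside the parentheses.
message = "number of g's " + str(g)
print(message)
```

The parenthesised form with a single argument also runs on Python 2.7, so it is the safer one to type when following along.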

    • @d34thcom3sripping
      @d34thcom3sripping 6 years ago

      thnx boss. resolved my issues.

  • @unays
    @unays 4 years ago

    oh man, wow thanx

  • @bhrishxxn1639
    @bhrishxxn1639 8 years ago

    thanks so much i'll definitely be coming back

  • @bhanuchandrakarisetty9718
    @bhanuchandrakarisetty9718 11 years ago

    Sir, I am using the Windows 7 operating system and Python, and instead of Coda I am using Sublime Text 2. I have followed everything up to the TERMINAL option, which is not there in Windows. Can you tell me the equivalent, so that I can finish the last step? Waiting for your reply, sir. Thank you.

    • @wavesofgrey-vb9gw
      @wavesofgrey-vb9gw 5 years ago

      Windows command line, or now PowerShell. You will have to add Python to the PATH to run Python from the command line.

  • @previeweverything6124
    @previeweverything6124 4 years ago

    My syntax is always error in
    If char == "g" :
    Usually in (if) and in (g)
    Help me why

    • @dxamphetamin
      @dxamphetamin 4 years ago

      'g', you need to check for a char not a string
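
A hedged note on this exchange: Python has no separate character type, so `'g'` and `"g"` are the same one-character string and either quote style works. If the line was typed exactly as quoted in the question, with a capital `If`, that capital letter is the more likely cause, because Python keywords are lower-case. The sketch below checks both points; the helper name `is_valid_python` is made up for illustration.

```python
def is_valid_python(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Single and double quotes produce the same string.
print('g' == "g")

# Capitalised keyword versus lower-case keyword.
print(is_valid_python('If char == "g":\n    pass\n'))
print(is_valid_python('if char == "g":\n    pass\n'))
```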

  • @MadMechwarrior
    @MadMechwarrior 11 years ago

    I live python. Great tutorial!

  • @irenez.b.1730
    @irenez.b.1730 6 years ago

    👏👏👏

  • @biemsklebob
    @biemsklebob 5 years ago

    9:00 variable*

  • @mannyfan165
    @mannyfan165 8 years ago +2

    dude why does this not work at all using windows

    • @LegeFles
      @LegeFles 7 years ago

      did you install python?

    • @mannyfan165
      @mannyfan165 7 years ago

      yes

    • @LegeFles
      @LegeFles 7 years ago +3

      Matt, saying it doesn't work "at all" isn't really a helpful comment.

  • @MWorks08
    @MWorks08 6 years ago +3

    1.75x Speed would be really appreciated for this video :D

  • @Actanonverba01
    @Actanonverba01 7 years ago

    for beginners only

  • @mauroresaca
    @mauroresaca 4 years ago

    Why does this man never start with the code?

  • @IsaacPiera
    @IsaacPiera 7 years ago +2

    Super inefficient code. Use the count() function, which is WAY faster!

    • @georgegrevera7000
      @georgegrevera7000 6 years ago

      I timed both ways on a file of 117k bases. His way used 0.02 sec. Using count() used 0.005 sec. Both are fast enough for me.

    • @johnfedorov8089
      @johnfedorov8089 5 years ago

      @georgegrevera7000 The problem is scale. Had the gene sequences been longer, this would be exponentially inefficient. I'm coming from a computer science background though, where efficiency is hammered into our heads due to scalability.
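
The timing comparison in this thread can be reproduced with the standard `timeit` module. This is a sketch under assumptions: the sequence is random rather than real data, its length of 117,000 bases mirrors the file size mentioned above, and the function names `count_loop` and `count_builtin` are made up. One point worth stating carefully: both approaches read each character a fixed number of times, so both scale linearly with sequence length; `count()` is faster by a constant factor because its loop runs in compiled code.

```python
import random
import timeit


def count_loop(sequence):
    """Character-by-character counting, in the style used in the video."""
    counts = {"a": 0, "c": 0, "g": 0, "t": 0}
    for char in sequence:
        if char in counts:
            counts[char] += 1
    return counts


def count_builtin(sequence):
    """Counting with str.count(), one pass over the string per base."""
    return {base: sequence.count(base) for base in "acgt"}


random.seed(0)  # fixed seed so the made-up sequence is reproducible
sequence = "".join(random.choice("acgt") for _ in range(117_000))

# Both approaches must agree before their speed is compared.
assert count_loop(sequence) == count_builtin(sequence)

loop_time = timeit.timeit(lambda: count_loop(sequence), number=5)
builtin_time = timeit.timeit(lambda: count_builtin(sequence), number=5)
print(f"loop: {loop_time:.4f} s, count(): {builtin_time:.4f} s (5 runs each)")
```

Actual timings depend on the machine and Python version, so no expected numbers are given here.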

  • @jaredakers7683
    @jaredakers7683 7 years ago

    Someone should re-do these videos in Windows.

  • @pankajsaraswat3110
    @pankajsaraswat3110 8 years ago

    bevkuff

  • @ggyanwali
    @ggyanwali 8 years ago +2

    poor video making quality