Bioinformatics Project from Scratch - Drug Discovery Part 1 (Data Collection and Pre-Processing)

  • Published: 11 Sep 2024
  • Do you want to collect your very own novel and original dataset in biology that you can use in your Data Science Project? In this video, I will show you how to download and pre-process biological activity data from the ChEMBL database that you can use to perform Computational Drug Discovery. The dataset comprises compounds (molecules) that have been biologically tested for their activity towards a target organism/protein of interest. This video is Part 1 of a multi-part video series on a Bioinformatics Project.
    🌟 Buy me a coffee: www.buymeacoff...
    ⭕Code: ✅ github.com/dat...
    ⭕ Playlist:
    Check out our other videos in the following playlists.
    ✅ Data Science 101: bit.ly/datapro...
    ✅ Data Science YouTuber Podcast: bit.ly/datasci...
    ✅ Data Science Virtual Internship: bit.ly/datapro...
    ✅ Bioinformatics: bit.ly/dataprof...
    ✅ Data Science Toolbox: bit.ly/datapro...
    ✅ Streamlit (Web App in Python): bit.ly/datapro...
    ✅ Shiny (Web App in R): bit.ly/datapro...
    ✅ Google Colab Tips and Tricks: bit.ly/datapro...
    ✅ Pandas Tips and Tricks: bit.ly/datapro...
    ✅ Python Data Science Project: bit.ly/datapro...
    ✅ R Data Science Project: bit.ly/datapro...
    ⭕ Subscribe:
    If you're new here, it would mean the world to me if you would consider subscribing to this channel.
    ✅ Subscribe: www.youtube.co...
    ⭕ Recommended Tools:
    Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
    ✅ Check out Kite: www.kite.com/g...
    ⭕ Recommended Books:
    ✅ Hands-On Machine Learning with Scikit-Learn : amzn.to/3hTKuTt
    ✅ Data Science from Scratch : amzn.to/3fO0JiZ
    ✅ Python Data Science Handbook : amzn.to/37Tvf8n
    ✅ R for Data Science : amzn.to/2YCPcgW
    ✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
    ✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
    ⭕ Stock photos, graphics and videos used on this channel:
    ✅ 1.envato.marke...
    ⭕ Follow us:
    ✅ Medium: bit.ly/chanin-m...
    ✅ FaceBook: / dataprofessor
    ✅ Website: dataprofessor.org/ (Under construction)
    ✅ Twitter: / thedataprof
    ✅ Instagram: / data.professor
    ✅ LinkedIn: / chanin-nantasenamat
    ✅ GitHub 1: github.com/dat...
    ✅ GitHub 2: github.com/cha...
    ⭕ Disclaimer:
    Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.
    #dataprofessor #bioinformatics #drugdiscovery #drugdesign #chembl #cheminformatics #bioinformaticsproject #bioinformaticproject #drug #drugs #molecule #molecules #machinelearning #lecture #dataprofessor #bigdata #QSAR #QSPR #machinelearning #datascienceproject #randomforest #decisiontree #svm #neuralnet #neuralnetwork #supportvectormachine #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel

Comments • 271

  • @DataProfessor
    @DataProfessor  3 years ago +7

    👉Watch this video next (How to learn data science in 2021) ruclips.net/video/oR670Txwh88/видео.html
    Support this Channel 👇👇👇
    🌟 Buy me a coffee www.buymeacoffee.com/dataprofessor
    🌟 Download Kite for FREE www.kite.com/get-kite/?
    👉 Subscribe to this YouTube channel ruclips.net/user/dataprofessor
    👉 Join the Newsletter of Data Professor newsletter.dataprofessor.org
    👉 Blogs on Medium medium.dataprofessor.org/

    • @aadimnepal6497
      @aadimnepal6497 2 years ago

      Hi Professor. Where can I find the documentation for the ChEMBL library?

    • @kapilkumar-jn4il
      @kapilkumar-jn4il 2 years ago

      Dear sir, Indian cards do not support automatic payments; please set up a prepaid plan. I am having trouble paying.

    • @suthishmababu2107
      @suthishmababu2107 4 months ago

      Do you have documentation for this project?

  • @DataProfessor
    @DataProfessor  4 years ago +19

    Thanks to Shweta for the discussion in this comment section. Back in the day, 7 years ago, we manually compiled the bioactivity data of more than 2000 compounds from hundreds of research articles. The whole process took 6 months, then we spent a few more months manually curating the data and double-checking again and again for consistency. Fast forward to today, we can do the same thing in less than 10 minutes, as shown in this video. I am thankful for the generosity of data providers in making these APIs available, as well as for libraries such as pandas (imagine handling hundreds of Excel files and manually curating those) and scikit-learn (imagine optimizing learning parameters manually on 50 computers via the GUI of data mining software such as Weka). Coding is indeed a real superpower. If you are wondering whether to learn coding or not, my recommendation is yes! It will be one of the best decisions for your career and hobby 😃

    • @shwetaredkar734
      @shwetaredkar734 4 years ago +4

      Very true. I thank you on behalf of all your subscribers for all the efforts and getting such super amazing tutorials to all of us.

    • @DataProfessor
      @DataProfessor  4 years ago +2

      @@shwetaredkar734 I'm flattered. Thanks for the kind words. Glad the tutorials were helpful.

    • @kapilkumar-jn4il
      @kapilkumar-jn4il 2 years ago

      @@DataProfessor need help sir

  • @soukisama05
    @soukisama05 4 years ago +35

    Dear Data Professor,
    I can not even express how grateful I am for your content and dedication to your subscribers!
    I come from a biological background and I am new to the bioinformatic world.
    You give me motivation and great advice to continue studying in this incredible field.
    Great work,
    Greetings from Brazil!

    • @DataProfessor
      @DataProfessor  4 years ago +3

      Wow, thank you! Glad the contents are helpful to your journey into bioinformatics 😃

  • @KenJee_ds
    @KenJee_ds 4 years ago +30

    Can't wait for part 2! I know your subscribers have been asking for this series!

    • @DataProfessor
      @DataProfessor  4 years ago +4

      Thanks Ken! Inspired by subscribers and by Ken Jee’s Data Science from Scratch Series 😃

    • @KenJee_ds
      @KenJee_ds 4 years ago +4

      @@DataProfessor haha I am flattered! Keep up the great work!

    • @queenofunderland
      @queenofunderland 3 years ago

      Me too.
      Yaay, thx for the video, Prof. Learned sth from my bioinformatics study, but definitely you give me more experience!

  • @khaifea8829
    @khaifea8829 4 years ago +32

    As a Bioinformatics MSc student I found this so interesting

    • @DataProfessor
      @DataProfessor  4 years ago +5

      Thanks Muhammad for watching and the kind comment! I am currently editing the video of Part 0 (Bioinformatics 101) and Part 2 will be filmed soon.😃

    • @user-pm6hz9pb1e
      @user-pm6hz9pb1e 3 years ago

      Hi everyone,
      please, can anyone tell me how to find past master's theses in bioinformatics on drug discovery?

    • @tiamat1628
      @tiamat1628 7 months ago

      Brilliant stuff, thank you very much for sharing your knowledge and wisdom

  • @ElijahErureh
    @ElijahErureh 1 year ago +2

    I want to show my sincere appreciation for how you made data science so simple for me and interesting

    • @DataProfessor
      @DataProfessor  1 year ago

      Thanks for the kind words, this means a lot!

  • @Yoursleepassistant
    @Yoursleepassistant 3 years ago +5

    What an awesome explanation
    I finally managed to find a channel to walk me through step by step.
    I sincerely thank you

  • @shwetaredkar734
    @shwetaredkar734 4 years ago +1

    This video is a treasure that I have found. Probably the first video ever on Chembl data collection. I wish this video was out when my paper was under review last year. Luckily, I could solve the reviewer query on Chembl.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Shweta, Thanks for the kind words! I’m flattered, glad it was helpful! 😃

    • @shwetaredkar734
      @shwetaredkar734 4 years ago +1

      @@DataProfessor Seriously, I faced a great many issues addressing the reviewer's comment. Maybe I was not familiar with ChEMBL. But surely this video will help many people. Thanks for making this.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      @@shwetaredkar734 Thanks for sharing. And thanks again for the kind words 😃

  • @danieltoo2008
    @danieltoo2008 1 year ago +1

    Wow!!! Now I can learn DS and Bioinformatics from a Thai professional. Thanks Prof for helping me get out of my sandbox prison mindset.😅😅😅

  • @manabendraborah8654
    @manabendraborah8654 1 year ago +3

    sir....thank you so much for simply sharing your knowledge.

  • @saraalm9567
    @saraalm9567 2 years ago +7

    This is incredible. Thank you so much for sharing your knowledge and experience with us!

  • @gauravbhattacharjee5737
    @gauravbhattacharjee5737 4 years ago +6

    Professor, this is an indispensable resource! You are the best!

    • @DataProfessor
      @DataProfessor  4 years ago +2

      Happy to help! Thanks for the kind words!

  • @FarisIzzaturRahman
    @FarisIzzaturRahman 3 years ago +2

    I watched this in May 2021 and am sooo excited about this amazing project video. What great content, thanks Prof!

  • @CostanzoPadovano
    @CostanzoPadovano 3 years ago +6

    Perfect lesson, thank you Professor!
    Greetings from Italy!

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks for watching and glad it is helpful 😊

  • @avoniadevile3035
    @avoniadevile3035 3 years ago +1

    I am new to your profile. I was interested to see about drug discovery, and I have recently become interested in bioinformatics. Sadly, I think I chose the wrong course, as I am a biomedical science student in my second year and now I can see that I should have gone the bioinformatics way! It was a very straightforward lesson, professor! Thank you so much

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, thanks for the comment. It's okay, I was also a biomed student back in the day. You'll get solid domain knowledge of the biomedical sciences. You can also do bioinformatics on the side.

  • @mr.harambae
    @mr.harambae 3 years ago +4

    Yess! Finally! The niche teach professor!

  • @raedhanoon2791
    @raedhanoon2791 4 years ago +2

    This was a useful, informational video! It was straightforward and very interesting to learn about. You've inspired me to pursue a bioinformatics program and I plan on starting my own project using the same tools demonstrated. Thanks, Data Professor!

  • @alvinmodales6809
    @alvinmodales6809 3 years ago +3

    You are so amazing, Data Professor, thank you for always sharing your expertise and knowledge

  • @delilahjones6496
    @delilahjones6496 3 months ago +1

    I would really appreciate it if you explained your lines of code and how they work. Otherwise, I am not learning why I'm typing what I'm typing, or why it's necessary, etc.
    I just watched another series by a different guy who thoroughly explained every line of code he wrote in R and I walked away with a much better understanding of why each line of code is being typed or why certain arguments were used.

  • @onkarkumbhar1610
    @onkarkumbhar1610 3 years ago +2

    This is the only resource this condensed, thanks!

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Thank you for watching and support!

  • @anjuzoldyck9266
    @anjuzoldyck9266 7 months ago +1

    thank you so much professor you are a gem 💎

  • @ashwins5180
    @ashwins5180 4 years ago +2

    Thanks a lot sir... please upload part 2, I'm eagerly waiting for your next video on bioinfo projects in ML😍😍😍

    • @DataProfessor
      @DataProfessor  4 years ago +2

      Thanks Ashwin, it was fun making this video, I will also release a Part 0 video (introductory video on the biology background) as well 😃

  • @anjelidubey1095
    @anjelidubey1095 9 months ago

    What a great video! This helps so much in getting started building my portfolio

  • @JellaMaruti
    @JellaMaruti 5 months ago +1

    Hello Dr Chanin,
    What if the units for IC50 are different? I am trying to create a project, but the data imported from ChEMBL contain different units (uM and nM). How can I solve this problem? I went through one of your comments but could not find the code in the notebook that you mentioned. Thank you for the help!

    • @DataProfessor
      @DataProfessor  5 months ago

      Hi, you’ll need to convert the units to the same scale by converting either uM (micromolar, 10^-6 M) to nM (nanomolar, 10^-9 M) or vice versa; 1 uM equals 1000 nM.
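
      A minimal sketch of that conversion with pandas, offered as an illustration rather than the code from the notebook. It assumes the bioactivity table has `standard_value` and `standard_units` columns; the `TO_NM` lookup and the `to_nanomolar` helper are hypothetical names introduced here. Rows whose units cannot be converted by a simple factor (for example mass-based units) are dropped.

```python
import pandas as pd

# Hypothetical lookup of multiplication factors that turn molar units into nM.
TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def to_nanomolar(df):
    """Return a copy of df with standard_value expressed in nM."""
    out = df.copy()
    # Units missing from the lookup map to NaN, so those rows drop out below.
    factors = out["standard_units"].map(TO_NM)
    values = pd.to_numeric(out["standard_value"], errors="coerce")
    out["standard_value"] = values * factors
    return out.dropna(subset=["standard_value"]).assign(standard_units="nM")


example = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C"],
    "standard_value": ["750", "0.5", "12"],
    "standard_units": ["nM", "uM", "ug/mL"],  # the last row is not a molar unit
})
converted = to_nanomolar(example)
print(converted)
```

      Converting everything to one unit before any filtering or labelling keeps later thresholds meaningful.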

  • @aliqaitoon9965
    @aliqaitoon9965 4 years ago +2

    Great lecture. Thanks professor. Please we need second part soon 🙏😍

    • @DataProfessor
      @DataProfessor  4 years ago

      Ali, glad you found the video helpful. I'm also thrilled to make the next part, please stay tuned 😃

    • @aliqaitoon9965
      @aliqaitoon9965 4 years ago +1

      @@DataProfessor i will be

  • @flowstateofmind2380
    @flowstateofmind2380 1 year ago

    I came here from the Data rockie channel. I would really like to encourage you, professor, to write a book on data science for bioinformatics. I will be waiting to buy it 👍

  • @user-bn8vv8mk8f
    @user-bn8vv8mk8f 3 years ago +1

    Hello! Can you help me? I'm getting this error when importing the libraries:
    ImportError: cannot import name 'hashlib' from 'requests_cache.backends.base' (/usr/local/lib/python3.7/dist-packages/requests_cache/backends/base.py)
    Do you know why this is happening?

    • @user-bn8vv8mk8f
      @user-bn8vv8mk8f 3 years ago +1

      this error comes when i execute:
      from chembl_webresource_client.new_client import new_client
      thanks for the guidance

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi @VxS, I replied to your other post

  • @varshak4824
    @varshak4824 4 years ago +3

    Excellent and awesome. I will share these tutorials with my friends. Sir, I have a request. Could you please make a tutorial on how to collect data from BioLip, KEGG, PDBSUM etc. and how to use various packages and tools such as Rcpi, Protr, PyPDI, Biopython, etc. to generate descriptors? Just waiting for the second part. Many thanks!!!

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks Varsha for the suggestions. Currently in the making is how to use BioPython to do a protein blast. More on the molecular descriptors is also in the pipeline in forthcoming videos of the Bioinformatics Project series. 😃

    • @shwetaredkar734
      @shwetaredkar734 4 years ago +1

      @@DataProfessor that's great news. 👍

  • @kampco9982
    @kampco9982 1 month ago

    I am trying to follow along, doing this with the dopamine D2 receptor protein, but after pre-processing the data I noticed that there are some duplicate entries that have the same molecule_chembl_id but different standard_value. Is this normal?
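
    A compound can be reported in more than one assay, so several rows per `molecule_chembl_id` are to be expected. One possible way to handle this, sketched below under the assumption that all values are already in the same unit, is to collapse each compound to the median of its measurements. The `collapse_duplicates` helper is a hypothetical name, and taking the median is an illustrative choice, not necessarily what the video does.

```python
import pandas as pd


def collapse_duplicates(df):
    """Return one row per compound with the median of its standard_value entries."""
    values = pd.to_numeric(df["standard_value"], errors="coerce")
    return (
        df.assign(standard_value=values)
          .groupby("molecule_chembl_id", as_index=False)["standard_value"]
          .median()
    )


example = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_A", "CHEMBL_A", "CHEMBL_B"],
    "standard_value": [10.0, 30.0, 20.0, 5.0],
})
collapsed = collapse_duplicates(example)
print(collapsed)
```

    Keeping only the first record per compound with `drop_duplicates` is a simpler alternative, at the cost of discarding the other measurements.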

  • @aashishkatyal
    @aashishkatyal 3 years ago +1

    Awesome Tutorial Must Watch for Beginners.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Glad you think so! Thank you for watching!

  • @sakichan2640
    @sakichan2640 2 years ago +1

    thank you professor ! Do more bioinformatics video pls

  • @kamalikabhattacharjee336
    @kamalikabhattacharjee336 2 years ago +5

    This is so good for self learners like me, thanks for creating this kind of bioinformatics content.
    Apart from acetylcholinesterase what are other molecules for which I can create similar project ? Tried for human BRCA1 but not a great one to create the exact similar project. Please suggest a few options.

    • @khushboodutta30
      @khushboodutta30 2 years ago

      Even I'm interested to learn to build a project for HER 2 /BRCA1 / Bcr-Abl genes

    • @DataProfessor
      @DataProfessor  1 year ago

      Members in the Cytochrome P450 and Kinase protein family are also worth exploring

  • @user-rq7em3vx1r
    @user-rq7em3vx1r 2 months ago

    Hi, if any of my fellow bioinformaticians are watching this video, let me know how it helps. I don't mean anything negative. The video seems great. But here you have already built a project and are running it in front of us. How do I build one of my own? Could you please teach us that? Or is this how people generally take inspiration for making projects? I am sorry, I am very new to this.

  • @salikmalik7631
    @salikmalik7631 4 years ago +1

    Great video data professor, waiting for part 2.. :)

  • @TejasriSivapriya
    @TejasriSivapriya 21 days ago

    Hi sir, I am a biotechnology student and want to step into bioinformatics, and I have no prior knowledge of coding. Do you think this playlist is a good start, and what else do you think I should do after this to understand more?

  • @samarafroz9852
    @samarafroz9852 3 years ago +1

    You're doing such a great job sir please upload more tutorial about AI and drug discovery please upload drug design with GAN and AAE

  • @benjamintwumasi2480
    @benjamintwumasi2480 1 year ago +1

    At the importing libraries section of the Colab, I still get a message like "No module named 'chembl_webresource_client". Please look into that.

    • @DataProfessor
      @DataProfessor  1 year ago

      Have you installed this library before running the code cell?

    • @benjamintwumasi2480
      @benjamintwumasi2480 1 year ago

      @@DataProfessor Yeah, that day I realised the mistake is from my end.

  • @UwU-uq9pq
    @UwU-uq9pq 3 years ago

    Hi Professor, I have faced a problem when following your code on Google Colab. At the first step of copying files to Google Drive, I encounter a “ModuleNotFoundError”. I found that this is caused by a module 'termios', which is only available on Unix systems. I see that you are also using Windows, so how did you solve this problem? The Google results for this problem only tell me to change to a Unix OS.

  • @poojavani223
    @poojavani223 2 years ago

    This video is really awesome, thank you so much.

  • @yinyang6058
    @yinyang6058 3 years ago

    Thank you! This is soooooo helpful to me!!!

  • @naturevideosandaudiostv3262
    @naturevideosandaudiostv3262 4 years ago +2

    Sir every video of u is a new lesson for us but plz if u dont min plz zoom the vid. So that we can see the code clearly plz

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks for the suggestion, recent videos are zoomed in, sorry for the small font size of earlier videos.

  • @belkhiranesrine185
    @belkhiranesrine185 2 years ago

    Please sir, how can I solve this error, I have Windows 8.1 and I use python 3:
    ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

  • @halizahasniaputri8869
    @halizahasniaputri8869 3 years ago

    I'm doing my final project for my graduation later, I thought of various ways and I didn't find any way out until I finally found your video, I'm very grateful that you made a video like this, it really helped me. May I ask a few things? how do you combine two chemical molecules so that you get a new molecular structure and then test the bioinformatics? I am very confused about that, please help me
    Can I talk about this with you please?
    Thank you very much..

  • @alizanaz5963
    @alizanaz5963 1 year ago

    Hi! I have been following your tutorials for the past few days. I have a question. Is it necessary to test models on the same dataset they were trained with, or can a different test dataset be used for evaluation?

  • @marcofestu
    @marcofestu 4 years ago +1

    Eager to see part 2 professor 😁

  • @ramkumarrs1170
    @ramkumarrs1170 4 years ago

    Wow! super step by step process explanation!

  • @sadiafarzanadiya2800
    @sadiafarzanadiya2800 3 years ago +2

    As a high school student it has gone beyond my head now I need some sleep

  • @Hiraeth_hiraeth
    @Hiraeth_hiraeth 5 months ago

    are there any prerequisites to be able to do this project? like are we supposed to know some ML beforehand?

  • @nguyenthanhtrung3768
    @nguyenthanhtrung3768 4 years ago +1

    Special thanks for your useful and practical video. It would be even more applicable if you could show how to do this work in R.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Great suggestion! I'll definitely consider this for future videos. In the meantime, please refer to the GitHub and link to the research paper of one recent publication from our research group:
      github.com/chaninlab/anti-sickling
      pubs.rsc.org/en/content/articlelanding/2018/ra/c7ra12079f

    • @nguyenthanhtrung3768
      @nguyenthanhtrung3768 3 years ago

      @@DataProfessor Thank you very much for your valuable suggestions. It would be very helpful if you could advise how to download the necessary data from several other databases to apply to other drug discovery projects.

  • @josefranklinct
    @josefranklinct 4 years ago +1

    that's fantastic, thanks so much professor

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks for watching! Glad you liked it!

  • @priyankachauhan9513
    @priyankachauhan9513 3 years ago +1

    Hello sir i am not able to run the githubtools to calculating the network proximity of bladder and colorectal cancer so please 🙏 make a video on this

  • @pradumnchavan1745
    @pradumnchavan1745 4 years ago +1

    thanks you sir for this amazing tutorial i am msc bioinformatics student can you plz make complete video on another bioinfo project using the approach text mining and data curation.

  • @sebastiancastro4126
    @sebastiancastro4126 4 years ago

    Hello, I would like to make an inquiry. When we obtain the dataframe with the data for the biological activity of the selected target, some numeric columns are "object", which prevents statistics from being performed. How can I change the dtype of these columns? Thank you very much!
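
    One common way to handle this is `pandas.to_numeric`, sketched below. The column name `standard_value` is an assumption about which column is affected; `errors="coerce"` turns entries that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C"],
    "standard_value": ["750.0", "12", None],  # numbers stored as text (dtype object)
})

# Convert the text column to floats; unparseable entries become NaN.
df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")

summary = df["standard_value"].describe()
print(df.dtypes)
print(summary)
```

    After the conversion, methods such as `describe()`, `mean()` and `median()` work on the column as expected.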

  • @balamurganp7224
    @balamurganp7224 1 year ago +1

    As a college student, can I earn anything using bioinformatics by any means? 🤔

  • @pravalikas.p8520
    @pravalikas.p8520 2 years ago

    Hai sir thank you so much for this explanation

  • @nikitia_r
    @nikitia_r 2 years ago +1

    Hi Prof Chanin,
    When I run this code in JupyterNotebook, I get stuck where we save the bioactivity data.
    I run the following:
    df.to_csv('C:/Users//bioactivity_data.csv')
    df = pd.read_csv("bioactivity_data.csv") # to read the csv
    df.head(5) # to view the data
    While I am able to access the csv on my desktop, the code doesn't finish running in the notebook (it still shows In [*]: at df.to_csv...).
    I would just like to find out if there are any suggestions as to how I should be doing this?

    • @DataProfessor
      @DataProfessor  2 years ago

      Hi, You could try removing the path so that you have df.to_csv('bioactivity_data.csv') and see if this works
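
      A small sketch of the same idea: build the file path once and use that one path for both writing and reading, so the two calls cannot point at different locations. The temporary folder is only a stand-in for a real project folder, and the variable names are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"molecule_chembl_id": ["CHEMBL_A"], "standard_value": [750.0]})

# A temporary folder stands in for wherever the project data is kept.
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / "bioactivity_data.csv"

df.to_csv(csv_path, index=False)   # write the file ...
df_loaded = pd.read_csv(csv_path)  # ... and read it back from the same path
print(df_loaded.head(5))
```

      In a local Jupyter notebook, `Path("data")` followed by `data_dir.mkdir(exist_ok=True)` would create a `data` folder next to the notebook instead.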

  • @rhard007
    @rhard007 4 years ago +4

    Hi Professor can you recommend some books for self-study on the topic of bioinformatics? thank you.

    • @DataProfessor
      @DataProfessor  4 years ago +4

      Hi, there are quite a few books. The O'Reilly book "Bioinformatics Programming Using Python: Practical Programming for Biological Data" seems like a good starting point for Python coding with bioinformatics examples.

  • @elshroomness
    @elshroomness 3 years ago

    I am having an issue trying to install the chembl resource client.
    This the error that i am running into:
    ""ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall. ""
    Can anyone help?
    EDIT: I uninstalled Anaconda and reinstalled it and its working fine.

  • @arkadeepbanerjee2347
    @arkadeepbanerjee2347 4 years ago +1

    Looking forward to part 2.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Arkadeep, thanks for the comment! Part 2 is in the making.

  • @devarakondahimaja8423
    @devarakondahimaja8423 3 years ago

    I want to write an if statement: when the ID number in row(1) of a CSV file == an ID number in my text file, print the sequence, that is, print the information for that particular ID.
    How do I write this?
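
    One way this could look, using only the Python standard library. The file layout is an assumption: a CSV with `id` and `sequence` columns and a text file with one ID per line. `io.StringIO` objects stand in for the real files so the sketch runs on its own; with real files, `open(...)` would take their place.

```python
import csv
import io

# Stand-ins for the two files (hypothetical contents).
csv_file = io.StringIO("id,sequence\nP001,MKTAYIAK\nP002,GSHMLEDP\nP003,MAAQLLKK\n")
ids_file = io.StringIO("P003\nP001\n")

# Collect the wanted IDs in a set for fast membership checks.
wanted = {line.strip() for line in ids_file if line.strip()}

matches = []
for row in csv.DictReader(csv_file):
    if row["id"] in wanted:  # the "if" test: does this CSV row's ID appear in the text file?
        matches.append((row["id"], row["sequence"]))
        print(row["id"], row["sequence"])
```

    The rows are printed in the order they appear in the CSV file, not in the order of the text file.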

  • @cooldayka
    @cooldayka 4 years ago +2

    Dear Professor,
    Is it possible to somehow contact you, I have a small question about one thing not really about the topic but about the course that is available I am now a bit confused should I pay for it or better to not do it, since it most probably will not be much beneficial

    • @DataProfessor
      @DataProfessor  4 years ago

      Hi, you can contact me at dataprofessorofficial@gmail.com

  • @mohtashimnizamani
    @mohtashimnizamani 2 years ago

    I'll be studying computer science at university and I'm also interested in bioinformatics, but I don't have any biology background whatsoever. How should I approach bioinformatics, and can I self-study biology concepts and start doing bioinformatics projects?

  • @Anummmmmm798
    @Anummmmmm798 2 years ago +2

    Hi sir I am PhD scholar in Bioinformatics can u plz suggest some research topics

    • @DataProfessor
      @DataProfessor  2 years ago

      Hi, I made a video on how to select a research topic ruclips.net/video/zRWFR1SSBFc/видео.html

  • @sebastianjorgecastro2452
    @sebastianjorgecastro2452 4 years ago +1

    Hello, when I use df = pd.DataFrame.from_dict(res), I get this error: TypeError: Object of type DataFrame is not JSON serializable. What can I do? Thank you!

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Can you recheck whether you have run all cells in sequential order?

    • @sebastianjorgecastro2452
      @sebastianjorgecastro2452 4 years ago

      @@DataProfessor My mistake! I didn't select the target correctly.

  • @gayathrim3022
    @gayathrim3022 3 years ago +1

    Could you please post a video on extracting various important information say chemical, ADR etc related to the drug using NLP techniques.

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks for the suggestion, will definitely consider this for future videos.

  • @TravisKPHall
    @TravisKPHall 1 year ago

    Dear Professor, this was an amazing tutorial and lecture in one, thank you so much. I am currently trying to create my own independent model and I have a question: is the classification of active, inactive or intermediate based on the IC50 value that you used universally accepted for all molecules when making a QSAR model? What constitutes us using these parameters active 10000?

  • @johntichenor1601
    @johntichenor1601 3 years ago

    Hey there I'm getting an attribute error when setting target = new_client.target do you know the reason for this?

  • @user-bt2kf5ls2h
    @user-bt2kf5ls2h 1 year ago

    Hello!
    I am working on a drug target interaction prediction using ML project. I wanted to know if I can use the data for my project in the same way given by you?

  • @garrettmccue2644
    @garrettmccue2644 3 years ago

    creating the df from the bioactivity data dictionary is taking a long time (30+min) with no luck. Any suggestions on speeding this up? i am using jupyter fyi

  • @naganandhireddygunreddy5397
    @naganandhireddygunreddy5397 1 year ago

    How do I create a data folder in Jupyter Notebook and copy the bioactivity_data csv file? Please help me with this part in Jupyter Notebook; I got stuck here. The video only explains how to create the folder and copy the file with Colab notebooks on Google Drive.

  • @hyrunnisa997
    @hyrunnisa997 2 years ago +1

    I cannot search for you on colab...nothing comes up.

    • @DataProfessor
      @DataProfessor  2 years ago

      Notebooks are available on GitHub github.com/dataprofessor so make sure to click on the GitHub tab if opening from Colab.

  • @narendra_nn
    @narendra_nn 3 years ago

    Hello,
    In the recent version of the ChEMBL database (v29) there is a SARS-CoV-2 target with ChEMBL ID CHEMBL4523582. In this dataset the Standard type feature contains Inhibition (~98% of the data) and IC50 (~2% of the data) as classes. I would like to know the difference between them, or can I treat both as the same standard type?

  • @nidhibharani1886
    @nidhibharani1886 3 years ago

    Great video with concise explanation 👍 Is there a similar way to download sequencing data? Please make a video on that topic.

  • @luiseduardogoncalves2228
    @luiseduardogoncalves2228 3 years ago

    Thank you professor for this tutorial! I am very new to bioinformatics and I would like to assure one thing. For example, when I type on ChEMBL a protein like Phospholipase A2 and then select targets, all of the results displayed are Phospholipase A2 targets right? Secondly, if i choose single protein, this means that the displayed protein is targeted by phospholipase A2. Is this correct?

  • @positive51
    @positive51 3 years ago +1

    Thanks for what you do!

  • @saipavanchary0521
    @saipavanchary0521 3 years ago +1

    Sir Subscribed just now!!

  • @koussaisalem4894
    @koussaisalem4894 1 year ago

    Why did you use IC50 to work on? Why don't you use all the features?

  • @eyupbilgi3191
    @eyupbilgi3191 3 years ago +2

    thank you for this amazing introduction. is it possible to use same approach to collect data from article repositories like elsevier, sciencedirect, scopus etc. with certain keywords and endpoints.

    • @DataProfessor
      @DataProfessor  3 years ago +2

      Similar concept applies but instead of the chembl API you can look into using the API of these article repositories.

  • @fahaddewtoniumfd5428
    @fahaddewtoniumfd5428 4 years ago +1

    Hi. Is there any prerequisite for fully understanding this video. I have an undergraduate biology background, with very little experience in programming. Will that be enough?

    • @DataProfessor
      @DataProfessor  4 years ago

      Yes absolutely, I go at the basic level that you can follow step by step in a guided way. I would recommend to also take an introductory python course via Kaggle mini-course to augment your understanding, please see this video also ruclips.net/video/ZtTt822bDNE/видео.html

  • @angsumandas1
    @angsumandas1 3 years ago

    Sir, I tried hard to apply these techniques to autoimmunity and allergenic drug targets, but I did not succeed. Please show me the path!

  • @alexaiden.forever
    @alexaiden.forever 3 years ago +1

    It's an amazing series..

    • @DataProfessor
      @DataProfessor  3 years ago

      Thanks for the kind words!

    • @alexaiden.forever
      @alexaiden.forever 3 years ago

      @@DataProfessor Can we use any other database instead of chembl???

  • @abolajishiwoku32
    @abolajishiwoku32 4 years ago

    As an aspiring data scientist, this project has been really helpful in building my portfolio with original data. Do you think building a portfolio of projects based on past academic papers is useful in today's biotech industry ?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Yes, definitely. The existing research community is based on doing cutting edge research that is based on past academic papers. The goal is to find knowledge gaps presented in the current research literature and to make incremental improvements that would lead to publishing the work itself as a research paper. Hope this helps.

  • @krishnaprasad3726
    @krishnaprasad3726 1 year ago

    Sir, where is the previous part of this video? Please provide me the link 🙏

  • @Activistxpeace
    @Activistxpeace 1 year ago

    Dear Data Professor and channel fans. I am a biologist and new to bioinformatics. I am trying to use your videos and code to run experiments to learn. When I open Google Colab and open the Data Professor notebook in code/python, I can't see anything.
    Can you please explain whether I need to install Python on my computer to use Google Colab? What other basic programs do I need to be able to see the notebook and code?
    Is the code still available?
    If anyone can help me solve these questions so I can replicate this experiment, I would appreciate it very much.
    Thank you.

  • @idanmorad4769
    @idanmorad4769 4 years ago +1

    Try to use more of pandas' built-in methods, such as map or apply, instead of looping over the DataFrame.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks Idan for the valuable suggestion!
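
The suggestion above can be illustrated with a minimal sketch. It assumes a DataFrame with a `standard_value` column holding IC50 values in nM; the sample rows, the `label_bioactivity` helper, and its thresholds are hypothetical and only serve to contrast an explicit loop with `Series.apply`.

```python
import pandas as pd

# Hypothetical sample data; IDs and values are made up for illustration.
df = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C"],
    "standard_value": [500.0, 5000.0, 20000.0],  # assumed to be IC50 in nM
})


def label_bioactivity(ic50_nm: float) -> str:
    """Map an IC50 value to a class label (illustrative thresholds only)."""
    if ic50_nm <= 1000:
        return "active"
    if ic50_nm >= 10000:
        return "inactive"
    return "intermediate"


# Explicit loop: works, but is verbose and easy to get out of sync with df.
labels_loop = []
for value in df["standard_value"]:
    labels_loop.append(label_bioactivity(value))

# Same result with apply: one expression, aligned to the DataFrame index.
df["bioactivity_class"] = df["standard_value"].apply(label_bioactivity)
```

Both versions produce the same labels; `apply` keeps the result attached to the original rows, which matters once rows are filtered or reordered.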

  • @samsononi8039
    @samsononi8039 2 years ago

    Is this video recommended for someone with knowledge of only biology?

  • @utkar1
    @utkar1 4 years ago +1

    Hey dataprof!
    Really excited for this amazing project.
    I followed the same steps as you showed while creating the dataset. However, my findings were a little different.
    When looking for missing values, I found that
    - the dataset has 6 completely empty variables
    - 3 features with between 50% and 99% missing values
    - 3 features with less than 50% missing values
    So for the empty features, I can straight away drop the features themselves. How about the features with missing values ranging from 50% to 99%?
    Should I drop these features as well? Simply applying dropna() leaves me with an empty dataset.
    Let me know your thoughts.
    Thanks a lot :)

    • @DataProfessor
      @DataProfessor  4 years ago

      Hi Utkarsh, great question! The data obtained in Part 1 is not yet ready for machine learning model building. It is still raw data that needs to undergo molecular descriptor calculation, which is covered in the subsequent Parts. As for missing values, once you have computed the molecular descriptors, yes, you can remove those with significantly sparse data (normally we try >80% as the threshold, but you are welcome to tune it). Using dropna() gives you an empty dataset because of the sparseness of the data, which leads us to use the low variance threshold instead (the inverse of the % missing values you mentioned; these are actually not missing values but rather have 0 as the value). Hope this helps. 😃
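
The filtering described in this reply could look roughly like the sketch below. It is not the code from the video: the helper names `drop_sparse_columns` and `drop_low_variance_columns`, the sample columns, and the zero-variance cutoff are assumptions, and only pandas and NumPy are used.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.8) -> pd.DataFrame:
    """Keep columns whose fraction of missing values is at most max_missing."""
    missing_fraction = df.isna().mean()
    return df.loc[:, missing_fraction <= max_missing]


def drop_low_variance_columns(df: pd.DataFrame, min_variance: float = 0.0) -> pd.DataFrame:
    """Drop numeric columns whose population variance is at most min_variance."""
    variances = df.select_dtypes(include="number").var(ddof=0)
    low_variance = variances[variances <= min_variance].index
    return df.drop(columns=low_variance)


# Hypothetical 10-row table standing in for a descriptor matrix.
raw = pd.DataFrame({
    "empty": [np.nan] * 10,                        # 100% missing
    "mostly_missing": [1.0] + [np.nan] * 9,        # 90% missing
    "half": [1.0, 2.0, 3.0, 4.0, 5.0] + [np.nan] * 5,
    "constant_zero": [0.0] * 10,                   # no missing values, no signal
    "informative": [float(i) for i in range(10)],
})

rows_after_dropna = len(raw.dropna())  # every row has at least one NaN
filtered = drop_low_variance_columns(drop_sparse_columns(raw))
```

Filtering by column keeps the rows intact, whereas `dropna()` with its default settings removes any row containing a missing value, which is why it can empty a sparse table.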

  • @shivanshisrivastava4363
    @shivanshisrivastava4363 4 months ago

    I get an error 400;invalid request authorisation error

  • @azkajunaidlife
    @azkajunaidlife 3 years ago +1

    Lots of appreciation for this series and your work. You are an inspiration for me.
    Please guide me: I'm stuck on the 4th code cell, where we need to select the target after running the target search.
    I opened it in Colab and it's showing a red exclamation mark.

    • @azkajunaidlife
      @azkajunaidlife 3 years ago +1

      I'm a beginner and new to bioinformatics

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, what error message is it giving?

  • @mnvny
    @mnvny 2 years ago +1

    Sir, can you please briefly explain why we are choosing IC50?

    • @DataProfessor
      @DataProfessor  2 years ago +1

      Hi, IC50 is the bioactivity value that we use as the Y variable in model building.

  • @eduardourbano4596
    @eduardourbano4596 4 years ago +1

    Very interesting. Great job!

  • @andyderek3021
    @andyderek3021 2 years ago

    Interesting tutorial! Please, I have a question: can someone explain to me why we opt for "SINGLE PROTEIN" and not "Organism"?

  • @sagarikasahoo1044
    @sagarikasahoo1044 3 years ago +1

    Can an undergraduate student do this? I found this really interesting. Thank you for these wonderful videos :)

    • @DataProfessor
      @DataProfessor  3 years ago +2

      Yes, absolutely. Also make sure to watch the following for a high-level overview to get acquainted with the topic.
      The 2-part Bioinformatics 101 videos (links below)
      - Part 1 ruclips.net/video/p5iZxIT16KQ/видео.html
      - Part 2 ruclips.net/video/ua08NV58Gew/видео.html
      Machine Learning for Drug Discovery (Explained in 2 minutes)
      ruclips.net/video/xDMzOUUnNzw/видео.html

    • @sagarikasahoo1044
      @sagarikasahoo1044 3 years ago +1

      @@DataProfessor Thank you so much for clarifying my query. :) It helped a lot.

  • @arvinths9025
    @arvinths9025 3 years ago

    Sir, is this only for target types in the ChEMBL database, or also for the ChEMBL compound types, tissue types, and assay types shown in the search bar of the ChEMBL database?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi. We are searching for a "Target" of interest (a specific protein such as Aromatase) and the associated compounds that have been reported to bind to the target protein will be downloaded as our dataset.
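
The workflow in this reply could be sketched as follows. This is an assumption-laden outline rather than the notebook from the video: it relies on the `chembl_webresource_client` package (its `new_client.target.search` and `new_client.activity.filter` calls), needs network access, and the helper names and the choice to take the first "SINGLE PROTEIN" hit are illustrative.

```python
import pandas as pd


def select_single_protein(targets: pd.DataFrame) -> str:
    """Return the ChEMBL ID of the first target typed as SINGLE PROTEIN."""
    matches = targets[targets["target_type"] == "SINGLE PROTEIN"]
    if matches.empty:
        raise ValueError("no SINGLE PROTEIN target in the search results")
    return matches["target_chembl_id"].iloc[0]


def fetch_ic50_records(target_name: str) -> pd.DataFrame:
    """Search ChEMBL for a target and download its IC50 bioactivity records."""
    # Imported lazily: requires `pip install chembl_webresource_client`
    # and a working internet connection.
    from chembl_webresource_client.new_client import new_client

    targets = pd.DataFrame(list(new_client.target.search(target_name)))
    target_id = select_single_protein(targets)
    activities = new_client.activity.filter(
        target_chembl_id=target_id
    ).filter(standard_type="IC50")
    return pd.DataFrame(list(activities))


# Offline check of the selection step with made-up search results.
example_targets = pd.DataFrame({
    "target_chembl_id": ["CHEMBL_X1", "CHEMBL_X2", "CHEMBL_X3"],
    "target_type": ["ORGANISM", "SINGLE PROTEIN", "SINGLE PROTEIN"],
})
chosen = select_single_protein(example_targets)
```

Calling `fetch_ic50_records("aromatase")` would then return the raw bioactivity table; in practice the search results should be inspected by hand before picking a target, since the first hit is not always the intended one.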

  • @sherin47
    @sherin47 3 years ago +1

    Hi, I completed a Master's in Bioinformatics 12 years ago, with no prior practical experience. What would you suggest for enhancing my career and obtaining a job?

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, having no prior experience may be a disadvantage if going to industry. I would suggest finding a local university and offering to do an internship or work for free to break the gap period, so that you acquire some experience, perhaps by helping to complete a project. Afterwards you can start to look at other opportunities.

    • @sherin47
      @sherin47 3 years ago

      @@DataProfessor I need some guidance in this area. Can you provide me your email ID? I appreciate your support.

  • @muhammadjamalahmed8664
    @muhammadjamalahmed8664 4 years ago +1

    I love your work...

  • @chyokomizo
    @chyokomizo 3 years ago +1

    Great video! Thanks a lot!

  • @souldiezcamp2380
    @souldiezcamp2380 3 years ago +1

    Please share the link to the paper in which you used this approach.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi, here it is peerj.com/articles/2322/

  • @hermainrais2280
    @hermainrais2280 4 years ago

    Hello everyone, can anyone please tell me where I can find purely computational articles in bioinformatics?

  • @abdulrhmana.elshiekh383
    @abdulrhmana.elshiekh383 3 years ago +1

    You are awesome, man!