Call for Participation in the Open Bioinformatics Research Project
- Published: 11 Oct 2024
- In this video, I will share a novel bioinformatics dataset. I have compiled a collection of bioactivity datasets from the ChEMBL (version 29) database. Specifically, there are 136 CSV files, one for each of 136 variants of the beta-lactamase target protein. I also provide a high-level overview of the dataset as well as my thoughts on some of the analyses that you can perform and contribute to this Open Bioinformatics Research Project.
You can think of this as a sort of Hacktoberfest! Let’s work on this and learn together!
Contribute to this Open Bioinformatics Research Project
👉 GitHub github.com/dat...
👉 Kaggle www.kaggle.com...
Prerequisite knowledge
👉 An Introduction to Computational Drug Discovery • An Introduction to Com...
👉 How to use PaDelPy to calculate molecular descriptors/fingerprints from SMILES notation • How to build machine l...
👉 Playlist of Bioinformatics videos • Bioinformatics Project...
Support my work:
👪 Join as Channel Member:
/ @dataprofessor
✉️ Newsletter newsletter.data...
📖 Join Medium to Read my Blogs / membership
☕ Buy me a coffee www.buymeacoff...
Recommended Resources
📚 Books kit.co/datapro...
😎 Taro (Tech Career Mentorship) www.jointaro.c...
📜 Google Data Analytics Professional Certificate imp.i384100.ne...
🤔 Interview Query www.interviewq...
🖥️ Stock photos, graphics and videos used on this channel 1.envato.marke...
Subscribe:
🌟 Coding Professor / @codingprofessor
🌟 Data Professor www.youtube.co...
Disclaimer:
Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's contents.
#bioinformatics #machinelearning #research #dataprofessor
To share your progress or comments on various social media platforms about this Open Bioinformatics Research Project initiative, please use the tag #dataprofessor
👉Twitter twitter.com/thedataprof
👉LinkedIn www.linkedin.com/company/dataprofessor/
🌟 Join as a Channel Member to support us:
ruclips.net/channel/UCV8e2g4IWQqK71bbzGDEI4Qjoin
🌟 Download Kite for FREE www.kite.com/get-kite/?
I just finished my notebook on Kaggle; I hope it will be interesting to look at.
@OlehMezhenskyi Awesome, thanks for your submission! :)
Is there any deadline for submissions?
Hello Data Professor.
Very nice video!! I'm a biologist finishing my PhD thesis in structural biochemistry. I'm studying a specific enzyme, isocitrate lyase, which plays a central role in mobilizing lipid reserves in seeds during germination (metabolic pathway: glyoxylate cycle), as a molecular target for herbicides with new mechanisms of action. We are studying it in silico, in vivo and in vitro. I found your video very interesting, and I believe that interdisciplinary studies applying machine learning will be very promising in many fields of science in the near future. I will keep following the development of your project.
Cordially, Paulo Menezes (Brazil).
Sounds like an interesting project. Yes, machine learning could definitely be used to provide insights from such data. Good luck with your research endeavor. :)
What a great channel! It really drives me to keep studying programming and statistics!
I am extremely delighted to participate and contribute my best. I am an experienced life science researcher and a beginner in the field of data science.
Glad to have you on board, welcome to the initiative.
I’m very excited about this! Can't wait to contribute. I have my bachelor's in biochemistry, and I’m currently doing a master's in bioinformatics.
Awesome, welcome to the initiative.
Professor, I'm ready to contribute to the project.
Awesome! I want to play with this dataset. I think the idea of analyzing it together is great 👍
Great, welcome to the challenge!
I’m gonna take a bite out of this once I get out of work. Exciting stuff!
Awesome!
I would like to contribute to this project, I love the way you share your knowledge🙌
Awesome, thank you! Look forward to it! 😊
I would like to participate as well!! This looks like a great idea for a project!!!
I am really interested. Your video series on the ChEMBL dataset was the gateway for me to actually do hands-on bioinformatics work, which then led me to learn more programming. So I would definitely love to participate and contribute to the project.
Wonderful! Great to have you on board as well!
I was one of those interested. Keep it up, bro, you are among the best people in the scientific field.
Thanks for the kind words!
I'm quite interested! Bioinformatics is totally new to me, though; I'll look into it, it seems interesting.
Thanks Ibraheem, you’re most welcome to the challenge! Aside from feature interpretation, which requires domain expertise in biology, everything else is a series of data processing/wrangling and model-building steps.
I'd love to contribute to this. I'm currently a bachelor's student in computer science and have always wanted to specialize in bioinformatics!!
I'll go through the prerequisites, hopefully!
There are bachelor's and master's degrees in bioinformatics... Why just specialize?
Can’t wait to participate!
Hi, super interesting. I am doing research on beta-lactams, lactamases and transpeptidases and would love to contribute. Is there a Discord or a channel to organise ourselves, so we don't duplicate work? :)
Wow. Very much interested.
Welcome on board Gustavo 😊
Sounds fun. I recently submitted my PhD thesis in bioinformatics. I will have a look.
Awesome, it's an open experiment, hopefully we can learn together as a community and publish a paper in the process.
Let’s do this. I am currently pursuing a biomedical engineering degree and I am very interested in machine learning and AI. I will try my best to contribute. Is it fine if I use Kaggle and upload public notebooks there?
Yes, please upload public notebooks to the betalactamase Kaggle page (links in the repo)
This is awesome, thanks for sharing this project. I would like to contribute as well!
This is quite interesting. I recently completed my master's dissertation on building a QSAR machine learning classification model, and I would really like to contribute to this project.
Great, looking forward to your contribution, the easiest way is to contribute a notebook to Kaggle (links in video description).
sounds amazing i would love to join!
Question: When removing molecules whose pChEMBL values have a high standard deviation, should that be done per target protein? That is, if a molecule had std > 2 for target protein A but not for target protein B, should only the rows for target protein A be removed, because the measurements are still valid for target protein B?
Yes, you are correct, it is based on each unique target. Afterwards you can build models separately for each target.
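A minimal sketch of the per-target filter described in this exchange, using only the Python standard library. It assumes each row is a hypothetical `(molecule_id, target_id, pchembl_value)` tuple (the real column names in the project CSV files may differ) and uses the population standard deviation, so a molecule measured once against a target has a spread of 0 and is always kept. Note that pandas' `.std()` defaults to the sample standard deviation, which can give different results near the threshold of 2.

```python
import statistics
from collections import defaultdict

def filter_inconsistent(rows, max_std=2.0):
    """Drop only the (molecule, target) groups whose pChEMBL values have a
    population standard deviation above max_std. The same molecule keeps
    its rows for every other target."""
    groups = defaultdict(list)
    for mol_id, target_id, pchembl in rows:
        groups[(mol_id, target_id)].append(pchembl)
    # pstdev of a single value is 0.0, so single measurements always survive
    inconsistent = {
        key for key, values in groups.items()
        if statistics.pstdev(values) > max_std
    }
    return [row for row in rows if (row[0], row[1]) not in inconsistent]

# Placeholder rows (hypothetical IDs, not project data)
rows = [
    ("MOL_1", "TARGET_A", 4.0),
    ("MOL_1", "TARGET_A", 9.0),  # spread of 2.5 for target A: removed
    ("MOL_1", "TARGET_B", 6.0),
    ("MOL_1", "TARGET_B", 6.4),  # consistent for target B: kept
]
kept = filter_inconsistent(rows)
print(kept)
```

After filtering, the remaining rows can be split by target so that a separate model is built for each one, as suggested in the reply.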
I would love to participate!🔥
You’re most welcome my friend!
Would you consider adding the "Hacktoberfest" topic to the repository? Maybe it could get more people interested in contributing!
Great suggestion! I’ve added the Hacktoberfest mention to the video description and keywords. Thanks Daniel!
Thanks a lot for sharing the dataset. I would like to participate in the project. You are truly an inspiration for us.
Thanks Rameez, welcome to the challenge!
This is a fantastic initiative.
Thanks Shravan :)
This looks awesome, I'm in :)
Hey awesome Pratham, glad to have you!
Wow, I know you from twitter. So this feels like a crossover episode.
This looks very interesting. I would like to participate in this project.
Welcome to the project!
I work as a bioequivalence clinical research associate and I am new to data science and data analytics. I started building a data frame from my own data, containing all the relevant pharmacokinetic parameters (Cmax, AUCs, T-half) and many chemical properties for each molecule, which I scraped from DrugBank along with the formula for each product. I would love to participate in this project. Although it is pretty advanced for me, I'll do my best to participate.
Thanks for your videos; they give me hope in the ML field. You are the best.
Awesome, what you're doing is equally advanced, look forward to your contributions.
I would gladly try to participate in the research.
I wanna contribute too. It sounds like a very fun project.
I really love this channel and the way it covers its topics. What would it take for someone to become involved academically or professionally in the field of bioinformatics?
Teaching, conducting research and publishing papers are some ways to contribute to the field of bioinformatics. This open project aims just that, to contribute to scholarly research in bioinformatics.
@DataProfessor Thank you for your answer, Data Professor! I will follow your advice and content. I hope I am not being too annoying by asking one more question.
I already have a degree in Computer Science and a fairly good foundation in Data Science, but my lack of knowledge in chemistry and biology seems like a very limiting factor. Would it be beneficial to pursue further education in a related field if I want to follow a career path in bioinformatics?
Thank you for your time!
Yes! I'm interested in this project 🙌
Sounds interesting!
I would like to join
I'm interested to join 🙋🏻♀️
I would like to participate in this project; I am very much interested in soft computing!
Great, welcome on board!
I would like to contribute to this project!! I’m a 4th year biology undergrad so I can def help with the paper writing aspect. I’ve also taken courses on bioinformatics
Awesome, glad to have you on board. More info on the biology aspect (probably on the model interpretation part) will be announced in the future once the model has been built.
I am very early in my MSc in precision medicine and am only beginning to gain experience in R/Python/Linux for data analysis. Will this process be well documented so that I could follow along and learn from it once my basic knowledge is stronger?
Thanks for the comment. It's a community effort, so we can probably all learn together in the open.
@DataProfessor Perfect, thank you.
Wow! I am also interested!!
Welcome to the initiative!
Hello professor, I tried to generate binary fingerprint data for 'canonical_smiles', but the descriptor calculation produced only 1407 observations. I think the actual data has 71973 observations. Where did I make a mistake? Thanks in advance.
I would like to contribute my best to this project and also learn more from it. I hope that with your help I can gain knowledge and also land a good job 👍
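A quick first check, before suspecting the descriptor step itself, is whether the input holds as many usable structures as expected. The sketch below uses only the Python standard library and assumes the dataset is a CSV with a `canonical_smiles` column (the name used in the question); the sample rows and the function name `smiles_summary` are hypothetical. If the distinct-SMILES count is far below the row count, repeated structures could account for part of the gap; that is one possibility to rule out, not a diagnosis.

```python
import csv
import io

def smiles_summary(csv_file, smiles_col="canonical_smiles"):
    """Count total rows, rows with an empty SMILES, and distinct SMILES strings."""
    total = 0
    empty = 0
    distinct = set()
    for row in csv.DictReader(csv_file):
        total += 1
        smiles = (row.get(smiles_col) or "").strip()
        if smiles:
            distinct.add(smiles)
        else:
            empty += 1
    return {"rows": total, "empty_smiles": empty, "distinct_smiles": len(distinct)}

# Placeholder CSV text; with real data, pass open("your_file.csv", newline="") instead
sample = io.StringIO(
    "molecule_chembl_id,canonical_smiles\n"
    "MOL_1,CCO\n"
    "MOL_2,CCO\n"
    "MOL_3,\n"
    "MOL_4,c1ccccc1\n"
)
summary = smiles_summary(sample)
print(summary)
```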
Awesome, welcome to the challenge!
@DataProfessor Thank you, sir. Can you explain what work is allotted to me?
I'll try to finish.
I will be interested
Maybe a good idea to make a Discord server where people can easily collaborate? Also helps to know what others are working on so we don't reinvent the wheel multiple times within the same community (aka basic EDA + molecular simulation results, etc)?
Great suggestion. Will have to figure out how to make the Discord server.
Seems very interesting. I would like to contribute.
Very interesting. I'm in!!
Welcome to the challenge!
I am a graduate student in biotechnology. I am really interested in being a part of it.
I am interested to kick start this...
I'm a 4th-year BTech biotech student. I know the basics of bioinformatics and I would like to contribute to this project.
It sounds fantastic!!
Hello professor,
The padeldescriptor function generates binary fingerprint data for the .smi file given as mol_dir. Is that true? Thank you.
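For reference, a `.smi` input for PaDEL is commonly a plain text file with one molecule per line: the SMILES string, a tab, and an identifier, with no header row. The sketch below writes such a file using only the Python standard library; the function name `write_smi` and the placeholder records are hypothetical. The resulting path is what would be passed as `mol_dir` to padelpy's `padeldescriptor`, which runs PaDEL-Descriptor on it. That call needs Java and padelpy installed and is not shown here.

```python
import csv
import os
import tempfile

def write_smi(records, path):
    """Write (smiles, molecule_id) pairs to a tab-separated .smi file with
    no header; rows with an empty SMILES are skipped. Returns rows written."""
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for smiles, mol_id in records:
            if smiles:
                writer.writerow([smiles, mol_id])
                written += 1
    return written

# Placeholder records, not project data
records = [("CCO", "MOL_1"), ("", "MOL_2"), ("c1ccccc1", "MOL_3")]
with tempfile.TemporaryDirectory() as tmp:
    smi_path = os.path.join(tmp, "molecule.smi")
    n = write_smi(records, smi_path)
    with open(smi_path) as fh:
        lines = fh.read().splitlines()
print(n, lines)
```

Comparing the returned count with the number of rows in the source data is a cheap way to catch molecules that never reached the descriptor step.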
❤❤
Great idea Prof!
Thanks Albert!
I would like to participate in this research work!!
Awesome, welcome to the initiative!
Super awesome
Thanks Rick!
Awesome initiative!
Thanks William 😆
Hello, I'm new to your channel. Were all of your bioinformatics projects made for everyone or just for bioinformatics people? Can those of us from outside the domain follow this?
Hi, the bioinformatics contents on this channel are created for the general audience. I'll be linking prerequisite contents in the video description.
Hi Prof, I'm an undergraduate student trying to learn through this project. I've been working on this, but after filtering, my model fails to learn. The train and test datasets show strange behavior, so the model is not learning. I tried different architectures, but it did not work. Any suggestions? I am pretty sure that the preprocessing is correct.
Hi Alejandro, did you (1) use the CSV file directly to build the model, or (2) compute descriptors from the SMILES and use the generated features either to predict pIC50 values (converting IC50 to pIC50 via the negative log) in a regression model, or to bin the IC50 values into active/inactive classes in a classification model? Please follow option 2. Hope this helps 😊
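The conversion mentioned in this reply can be sketched as follows. pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so an IC50 reported in nM is first multiplied by 1e-9. The active/inactive cut-offs used here (1,000 nM and 10,000 nM, with values in between labeled intermediate) are an assumption borrowed from a common convention, not a rule set by this project; the function names are hypothetical.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(molar)

def bioactivity_class(ic50_nm, active_max_nm=1000.0, inactive_min_nm=10000.0):
    """Bin an IC50 (nM) into active / intermediate / inactive.
    The default cut-offs are an assumed convention, not project rules."""
    if ic50_nm <= active_max_nm:
        return "active"
    if ic50_nm >= inactive_min_nm:
        return "inactive"
    return "intermediate"

for ic50 in (50.0, 1000.0, 5000.0, 20000.0):
    print(ic50, round(pic50_from_nm(ic50), 2), bioactivity_class(ic50))
```

A regression model would then use the pIC50 values as its target, while a classification model would use the class labels; one option is to drop the intermediate group so the two classes are well separated.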
Great, I'm very interested and would like to be part of this project.
Sounds good! Welcome!
I would love to be able to contribute. I'm currently doing my master's in bioinformatics, and I would love to contribute to this project.
You're very welcome, look forward to your submission! 😊
I know where to spend my weekend now :D Let's do some coding! Thank you for the idea, I will participate.
Glad to have you Adnane! 😊
Hello I would like to join as well !! I am a biological science student with some experience in machine learning. Let me know if I can contribute .
Yes, welcome to the initiative. To contribute you can perform analysis on the dataset and upload your notebook to Kaggle (links in the description).
Lovely! I would like to join if that’s not too late!
Yes, you can join, it’s just started
Greetings, Data Professor, is this call still open?
Hi, yes the Open project is on-going. I'll create more follow-up videos about this soon.
The number of PubChem molecular descriptors (338 molecules) obtained with PaDEL differs from the number of molecules in the molecule.smi file (64424 molecules). How can we solve this problem?
Hi, I suspect there may be some error in the SMILES notation that PaDEL cannot properly read. I recommend checking the log file from the calculation; error details are normally written there.
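Alongside the log file, it can help to list exactly which molecules went missing. The sketch below compares the identifiers in the `.smi` input (SMILES, tab, ID on each line) with those in the descriptor output, assuming the output CSV stores identifiers in a `Name` column, as PaDEL-Descriptor typically does; verify that against your own output file. The function name `missing_ids` and the inputs are placeholders, with `MOL_2` given an unclosed ring as an example of a malformed SMILES.

```python
import csv
import io

def missing_ids(smi_lines, descriptor_csv, name_col="Name"):
    """Return IDs present in the .smi input but absent from the
    descriptor output, in input order."""
    expected = []
    for line in smi_lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[1]:
            expected.append(parts[1])
    produced = {row[name_col] for row in csv.DictReader(descriptor_csv)}
    return [mol_id for mol_id in expected if mol_id not in produced]

# Placeholder inputs, not project data
smi_lines = ["CCO\tMOL_1\n", "C1CC\tMOL_2\n", "c1ccccc1\tMOL_3\n"]
output = io.StringIO("Name,PubchemFP0\nMOL_1,1\nMOL_3,0\n")
print(missing_ids(smi_lines, output))
```

With real files, open file handles for the `.smi` input and the output CSV (the latter opened with `newline=""`) can be passed in place of the placeholders; the missing IDs can then be looked up in the log file.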
I want to participate. Is it too late????
Dear Sir, I would love to participate.
I would like to contribute to your project.
I am also interested, please let me know how to join
Awesome, joining is simple, you can submit a Jupyter notebook with your analysis to the Kaggle dataset (links in the video description).
Hello! I'm a chemist. Am I still in time to join the project?
Yes, the project is currently ongoing, participants are contributing on Kaggle and GitHub pages of the project.
Is the project over or can i still contribute to it?
Hi, it's still ongoing.
Sir, what is the deadline for submission of the Jupyter notebook?
There’s no hard deadline, but I look forward to them at your earliest convenience. It’s a community effort, let’s learn together.
@DataProfessor Perfect, thank you, Sir 😀
Even I'm interested.
Sir, I am interested, but I have very basic bioinformatics and Python knowledge... Can I take part in the project?
Yes, sure, you can use the dataset as practice data by applying some of the model-building approaches that I’ve shown in other videos in the Bioinformatics playlist on the channel.
Sir, please make a video on workplaces and companies hiring bioinformatics students, and comment if you know of any remote bioinformatics vacancies.
Thanks for the suggestion, will look into this.
I am interested
Let's see how my bioinformatics skills can be applied here :D
Awesome, looking forward to it!
Are R notebooks accepted in this project?
Yes sure, R is also great
Interested
Want to participate too. I'm a biology student in my 4th year and now looking for my thesis projects. 😇
I am from a computer science background and I am into data science. Can I participate?
Yes, you are welcome to join. To participate you can perform data analysis in a Jupyter notebook and do a PR on the betalactamase repo (links in the video description). Or you can share a Jupyter notebook to the betalactamase page on Kaggle (links in the video description).
Hi, I'm a bioinformatician. I'm interested in this.
I'm interested too.
Do we have a slack channel?
I am also interested in computational drug discovery and as such want to help out
Great!
Does someone want to open a discord for everyone who's interested?
Thanks for the suggestion. Due to popular request, I'll be creating and setting up a Discord server soon.
I would like to participate, sir 🙏
Hello! Sir, I am interested in joining this project.
That's great, welcome to the initiative!
I would like to contribute.
Awesome, looking forward to your contribution, e.g. a Jupyter notebook on Kaggle or a pull request on GitHub to the betalactamase repo. A tweet about your contribution tagging me (@thedataprof on Twitter) would also be great.
Sir, I'm interested in participating.
Great, welcome on board!
I am also interested
Welcome to the challenge!
Hi, I want to join this, please.
Wanna join
Heya, let me know if you would like to get it published open access (OA). We have some leftover funds and would love to contribute financially. I can also carry out MD simulations and calculations using the Schrödinger Suite.
Awesome, thanks!
I am interested.
Great, welcome to the initiative!
May I start research with you with zero knowledge of bioinformatics? I have a degree in ECE. Kindly share your email address.
I am interested
I would like to contribute.
Yes definitely, welcome to the challenge!