Language Models for Protein Data - Evolutionary Scale Modeling: Amelie Schreiber | Munich NLP 024
- Published: 8 Nov 2024
- Protein language models (pLMs) such as ESM-2 represent a significant advance in computational biology. ESM-2 is an encoder-only transformer with Rotary Position Embeddings (RoPE), trained on protein sequences with a vocabulary covering standard and non-standard amino acids, deletion markers, and complex-formation indicators. ESMFold builds on ESM-2 to predict protein structures without relying on multiple sequence alignments (MSAs), making it substantially faster than AlphaFold2. This talk surveys a range of ESM-2 applications, from assessing mutation effects and evolutionary trajectories to predicting protein-protein interactions and running in silico directed evolution with EvoProtGrad. We also discuss fine-tuning ESM-2 for sequence and token classification tasks such as predicting gene ontology terms, binding sites, and post-translational modification sites. Finally, we briefly cover geometric compression as measured by the intrinsic dimension of ESM-2 embeddings: how it tracks information-theoretic compression, how it relates to Low Rank Adaptation (LoRA), its potential for detecting AI-generated proteins, and its applications to curriculum learning strategies.
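As a minimal sketch of the mutation-effect idea mentioned above (not code from the talk), the snippet below scores a single substitution with ESM-2's masked language modeling head via Hugging Face transformers; the checkpoint, example sequence, and mutation are illustrative assumptions.

```python
# Sketch: masked-marginal mutation-effect scoring with ESM-2 (illustrative, not from the talk).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"  # small checkpoint; larger ESM-2 variants use the same API
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical wild-type sequence
position = 10                                  # 0-based index into the sequence
wild_type, mutant = sequence[position], "W"

# Tokenize and mask the position of interest (offset +1 for the <cls> token prepended by the tokenizer).
inputs = tokenizer(sequence, return_tensors="pt")
masked_ids = inputs["input_ids"].clone()
masked_ids[0, position + 1] = tokenizer.mask_token_id

with torch.no_grad():
    logits = model(input_ids=masked_ids, attention_mask=inputs["attention_mask"]).logits

log_probs = torch.log_softmax(logits[0, position + 1], dim=-1)
wt_id = tokenizer.convert_tokens_to_ids(wild_type)
mut_id = tokenizer.convert_tokens_to_ids(mutant)

# Positive score: the model prefers the mutant at this position; negative: likely deleterious.
score = (log_probs[mut_id] - log_probs[wt_id]).item()
print(f"{wild_type}{position + 1}{mutant} masked-marginal score: {score:.3f}")
```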
About the speaker:
Amelie Schreiber holds a Master’s in Mathematics and has a keen interest in the practical applications of protein language models (pLMs). Her work involves fine-tuning these models with techniques such as Low Rank Adaptation (LoRA) and its quantized variant (QLoRA) to better predict protein functions, binding sites, and post-translational modifications. She brings a mathematical perspective to the structure within pLMs, using tools such as persistent homology to estimate the intrinsic dimension of model embeddings in order to (1) inform curriculum learning strategies for both large language models and protein language models, (2) choose optimal ranks for LoRA and QLoRA, and (3) better understand the relationship between information-theoretic compression, geometric compression, and the generalization capabilities of models. Amelie is also exploring ways to adapt large language models to annotate proteins through instruction fine-tuning with QLoRA, which could help bridge gaps in computational biology. As an independent researcher, she works at the intersection of mathematics, biology, and natural language processing to contribute to our understanding of proteins.
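For context on the LoRA fine-tuning mentioned in the bio, here is a hedged sketch of one possible setup (not the speaker's exact code): attaching LoRA adapters to an ESM-2 checkpoint for per-residue (token) classification, e.g. binding-site prediction. The checkpoint, rank, and label count are assumptions for illustration.

```python
# Sketch: LoRA adapters on ESM-2 for token classification (assumed setup, not the speaker's code).
from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import LoraConfig, get_peft_model, TaskType

base = "facebook/esm2_t12_35M_UR50D"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)  # binding vs non-binding

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                       # LoRA rank; intrinsic-dimension estimates can guide this choice
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],  # attention projections in the ESM-2 encoder
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights (and the classifier head) train
```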
Twitter: / amelie_iska
LinkedIn: / amelie-schreiber-the-s...
Hugging Face (incl. blog posts): huggingface.co...
GitHub: github.com/Ame...
About Munich NLP:
Munich🥨NLP is a community founded in May 2022 by LMU and TUM students focusing on NLP topics. Within its first year, the community grew to over 1,000 members, including not only current students but also PhD students, professors, and industry practitioners. We host weekly workshops and/or paper-reading events, both to learn from guests and to gather inspiration for our own (research) projects, and to sustain an active student NLP community in the Munich area. The goal is to promote NLP-related exchange between students, researchers, and practitioners inside and outside the university and to showcase paths and possibilities during and after university.
Homepage: munich-nlp.git...
LinkedIn: / munich-nlp
Twitter / X: / munichnlp
#deeplearning #nlp #ai #opensource #protein #esm2 #biomedical #proteinmodels #opensourcecommunity #chatgpt #gpt4 #data #datasets #africanlp #explainableai #xai #aisafety #iccv2023 #benchmark #realitycheck #instruction #instructiontuning #tuning #peft #llama #llama2 #lora #machinelearning #scale #size #artificialintelligence #computerscience #computervision #transformers #research #papers #representations #linguistics #learning #teaching #bert #lmu #munich #gpt3 #languagemodel #naturallanguageprocessing