- 15 videos
- 10,466 views
Jordan Nanos
Canada
Joined Jun 26, 2017
Machine Learning Architecture
TWRTW Ep #4 - Coding Assistants, SMCI shorts, NVIDIA DOJ Subpoena, OpenAI Lawsuits, Intel Spinoff
With backgrounds in the design and implementation of compute infrastructure from edge to cloud, sensor to tensor, Jordan Nanos and Hunter Almgren give their take on what’s new in enterprise technology - specifically, what they read this week.
Show notes:
x.com/peterberezinbca/status/1829343765653225639
x.com/mattshumer_/status/1831767014341538166
www.bloomberg.com/news/articles/2024-08-30/intel-is-said-to-explore-options-to-cope-with-historic-slump
futurism.com/the-byte/openai-copyrighted-material-parliament
x.com/RnaudBertrand/status/1831536755729952909
www.bloomberg.com/news/articles/2024-08-30/intel-is-said-to-explore-options-to-cope-with-historic-slump
147 views
Videos
Demo and Code Review for Text-To-SQL with Open-WebUI
613 views · a day ago
github.com/JordanNanos/example-pipelines
TWRTW Ep #3 - GenAI's Impact on Work/Privacy, Immersion Cooling, OpenStack is Back, Perplexity Ads
64 views · 21 days ago
With backgrounds in the design and implementation of compute infrastructure from edge to cloud, sensor to tensor, Jordan Nanos and Hunter Almgren give their take on what’s new in enterprise technology - specifically, what they read this week.
Show Notes:
Twitter taught Tay to be racist: www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
Digital twins for people doing interviews: w...
TWRTW Ep #2 - Nuclear Power, GPU Buildouts, Semi-Stateful Workloads, LLM security, GPT-5 Speculation
69 views · a month ago
With backgrounds in the design and implementation of compute infrastructure from edge to cloud, sensor to tensor, Jordan Nanos and Hunter Almgren give their take on what’s new in enterprise technology - specifically, what they read this week.
Show Notes:
Intel compared to Boeing: www.theregister.com/2024/08/09/opinion_column_intel/?td=rt-3a
Nuclear Power Roadblocks: www.cnbc.com/2024/08/10/why-...
Using Llama-3.1-405B as a Coding Assistant with Continue.Dev, Ollama, and NVIDIA GH200 Superchip
819 views · a month ago
Running Llama-3.1-405B in just 2U of rack space, under 3 kW, with an industry-standard server, the HPE ProLiant DL384 Gen12: www.hpe.com/ca/en/compute/pro...
Featuring 2x NVIDIA Grace Hopper GH200 superchips inside
Continue.Dev VSCode plugin for coding assistance: continue.dev/
Ollama for model serving: ollama.com/
Open-WebUI for the chat frontend: openwebui.com/
Running Llama-3.1-405B with Ollama and Open-WebUI: Introduction to the DL384 Gen12 Server
220 views · a month ago
Running Llama-3.1-405B in just 2U of rack space, under 3 kW, with an industry-standard server, the HPE ProLiant DL384 Gen12: www.hpe.com/ca/en/compute/proliant-dl384-gen12.html
Featuring 2x NVIDIA Grace Hopper GH200 superchips inside
Ollama for model serving, Open-WebUI for the chat frontend
TWRTW Ep. #1 - Intel issues, NVIDIA delays, JPY carry trades, Google antitrust
142 views · a month ago
With backgrounds in the design and implementation of compute infrastructure from edge to cloud, sensor to tensor, Jordan Nanos and Hunter Almgren give their take on what’s new in enterprise technology - specifically, what they read this week.
RSS: rss.com/podcasts/twrtw
Spotify: open.spotify.com/show/1FzBQv10cadDsQGOCudEXD
Apple: podcasts.apple.com/ca/podcast/things-we-read-this-week-jordan-nan...
Building Customized Text-To-SQL Pipelines in Open WebUI
2.1K views · a month ago
Accessing an LLM served by vLLM or ollama through open-webui
Using a text-to-SQL prompt within an open-webui pipeline
Connecting to a postgres database with an open-webui pipeline
Modifying SQL queries using the information in-context
All disconnected, running on a single server with 1x GPU
www.youtube.com/@jordannanos
x.com/JordanNanos
www.linkedin.com/in/jordannanos/
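To make the pipeline shape concrete, here is a minimal sketch of what an Open WebUI pipeline class for text-to-SQL might look like. This is an illustration, not Jordan's actual code (see github.com/JordanNanos/example-pipelines for that): the class name, the stubbed `_call_llm`, and the use of an in-memory SQLite database in place of Postgres are all assumptions made so the sketch runs anywhere without a GPU or a model server.

```python
import sqlite3


class TextToSQLPipeline:
    """Sketch of an Open WebUI-style pipeline for text-to-SQL.

    Assumptions: the real pipeline talks to Postgres and to an LLM served
    by vLLM or ollama; here the database is in-memory SQLite and the LLM
    call is a stub, so the overall flow can be demonstrated standalone.
    """

    def __init__(self, conn):
        self.name = "Database RAG Pipeline (sketch)"
        self.conn = conn

    def _schema(self):
        # Collect CREATE TABLE statements so the model sees the schema in-context.
        rows = self.conn.execute(
            "SELECT sql FROM sqlite_master WHERE type='table'"
        ).fetchall()
        return "\n".join(r[0] for r in rows)

    def _call_llm(self, prompt):
        # Stub: a real pipeline would send `prompt` to the vLLM/ollama
        # endpoint and parse the generated SQL out of the response.
        return "SELECT name, price FROM products ORDER BY price DESC"

    def pipe(self, user_message, model_id=None, messages=None, body=None):
        # Build a text-to-SQL prompt with the schema in-context,
        # "generate" SQL, execute it, and return both to the chat UI.
        prompt = (
            "Given the schema below, write one SQL query answering the question.\n"
            f"Schema:\n{self._schema()}\n"
            f"Question: {user_message}\nSQL:"
        )
        sql = self._call_llm(prompt)
        rows = self.conn.execute(sql).fetchall()
        return f"Query: {sql}\nResult: {rows}"
```

In the real setup, this file would be uploaded to the "pipelines" container and the model would then appear as a selectable option in the Open WebUI dropdown.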
Simple Overview of Text to SQL Using Open-WebUI Pipelines
3.1K views · a month ago
runs on a single GPU server
Overview of an Example LLM Inference Setup
2.8K views · a month ago
HPE Cray XD670 with 8x H100 80GB SXM GPUs
ollama, vLLM, open-webui docker containers
llama3.1, mistral, gemma models
Great Video! Can you tell me please how to create/generate API Key for llama_index?
@@Mohsin.Siddique llama-index is a Python package installed via pip; you don't need an API key. No API keys are required for this pipeline.
I'm running both Open WebUI and the pipelines in different docker containers, but there's an error whenever I try to connect the two. Your example that repeats the text back to the user works fine, but whenever I use libraries like LangChain or LlamaIndex it doesn't work and throws an HTTP connection error. Could you provide any help with this?
@@harsh90dem0 are the dependent packages installed in your pipelines container? You'll need to docker exec or kubectl exec into the container called "pipelines", then run:
pip install llama-cloud==0.0.13 llama-index==0.10.65 llama-index-agent-openai==0.2.9 \
  llama-index-cli==0.1.13 llama-index-core==0.10.66 llama-index-embeddings-openai==0.1.11 \
  llama-index-indices-managed-llama-cloud==0.2.7 llama-index-legacy==0.9.48.post2 \
  llama-index-llms-ollama==0.2.2 llama-index-llms-openai==0.1.29 \
  llama-index-llms-openai-like==0.1.3 llama-index-multi-modal-llms-openai==0.1.9 \
  llama-index-program-openai==0.1.7 llama-index-question-gen-openai==0.1.3 \
  llama-index-readers-file==0.1.33 llama-index-readers-llama-parse==0.1.6 \
  llama-parse==0.4.9 nltk==3.8.1
mind sharing your code please?
@@RickySupriyadi hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
@@jordannanos wow cool, thank you.
What an awesome introduction to pipelines - thank you so much!
Hi, how do I execute the python lib installation on the pipeline server?
@@gilkovary2753 you’ll need to docker exec or kubectl exec into the container called "pipelines", then run:
pip install llama-cloud==0.0.13 llama-index==0.10.65 llama-index-agent-openai==0.2.9 \
  llama-index-cli==0.1.13 llama-index-core==0.10.66 llama-index-embeddings-openai==0.1.11 \
  llama-index-indices-managed-llama-cloud==0.2.7 llama-index-legacy==0.9.48.post2 \
  llama-index-llms-ollama==0.2.2 llama-index-llms-openai==0.1.29 \
  llama-index-llms-openai-like==0.1.3 llama-index-multi-modal-llms-openai==0.1.9 \
  llama-index-program-openai==0.1.7 llama-index-question-gen-openai==0.1.3 \
  llama-index-readers-file==0.1.33 llama-index-readers-llama-parse==0.1.6 \
  llama-parse==0.4.9 nltk==3.8.1
Thanks for sharing this awesome project! I tried running the 01_text_to_sql_pipeline_vLLM_llama.py file from your GitHub repo, but I'm having trouble uploading it on Open WebUI even though I've installed all the requirements. Do you have any idea what might be causing this issue? Thanks again!
Did you configure the pipeline correctly?
@@dj_hexa_official what do u mean with that ?
@@netixc9733 what error are you seeing? docker logs -f or kubectl logs -f your pipelines container and it may report an error
Lovely demo of the synergy between language models and databases.
First of all, great job Jordan. It would be really helpful if you could share the code on GitHub.
hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
great video. hoping to see more soon. congrats.
hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
Jordan, super good job. I'm trying to integrate Open WebUI into my CRM system, so that my employees can query the database for any of our product prices through chat. Could this RAG pipeline work that way, for example? Thank you for your answer.
hi, I think if you've got a db you should be able to query it. especially if you already know how using python. I posted another video. code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
@@jordannanos Thanks a lot Jordan . Super cool
Hi. Could you link us to the source code of the pipeline?
code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
Jordan thanks, I have a single-GPU runpod setup. Would you recommend just adding a docker postgresql to the existing pod? And is the python code using langchain stored in the pod pipeline settings? This sort of reminds me of AWS serverless Lambda, but simpler.
@@RedCloudServices if you’d like to save money I would run Postgres in docker on the same VM you’ve already got. That will also simplify networking. Over time you might want to start/stop those services independently in the event of an upgrade to docker or your VM. Or you might want to scale independently. In that case you might want a separate VM for your DB and a separate one for your UI. Or you might consider running kubernetes. Yes the python code is all contained within the pipelines container and uses llama-index not langchain (though you could use langchain too). Just a choice I made.
@@RedCloudServices in other words, you’ll need to pip install the packages that the pipeline depends on, inside the pipelines container. Watch the other video I linked for more detail on how to do this.
@@jordannanos yep! just watched it. I just learned openwebui does not allow vision-only models or multi-modal LLMs like Gemini. Was hoping to set up a pipeline using a vision model 🤷‍♂️ also it’s not clear how to edit or set up whatever vector db it’s using
Hi Jordan, thanks. I am missing the steps where you created the custom "Database Rag Pipeline with Display". From the Pipelines page you completed the database details and set the Text-to-sql Model to Llama3, but where do you configure the connection between the pipeline valves and the "Database Rag Pipeline with Display" to be an option to be selected?
@@martinsmuts2557 it’s a single .py file that is uploaded to the pipelines container. I’ll cover that in more detail in a future video
@@jordannanos Do create this video soon!
@@KunaalNaik @martinsmuts2557 just posted a video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html repo is here: github.com/JordanNanos/example-pipelines
hi
30k GC? 8 of them?
Thx for sharing and it's really interesting to learn more about the pipeline projects related to open webui.
Sweet rig. Is that your daily driver? 😀😀
The cost of such a setup is circa $500,000… amma get me 2 :)
Thank you Jordan! Great work, interesting to see how these new servers can really deliver performance. ARM / x86.. just works. Yours, Greg
Thank you Jordan.
Thank you for this. Can you share more info on the RAG pipeline along with code examples.
working on getting it to run on both vLLM + ollama endpoints with llama3.1 + mistral. prompt uses llamaindex for text-to-sql.
similar to this guide: docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo/
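The flow in that guide boils down to two LLM calls: schema plus question in, SQL out; then question plus result rows in, natural-language answer out. Below is a hedged stdlib-only sketch of that flow, with the llama-index machinery replaced by a plain callable and SQLite standing in for Postgres. The function name and prompt wording are illustrative assumptions, not the library's API.

```python
import sqlite3


def text_to_sql_answer(conn, question, llm):
    """Two-step text-to-SQL flow, mirroring the llama-index SQLIndexDemo:
    1) schema + question -> SQL, 2) question + result rows -> answer.

    `llm` is any callable(prompt) -> str; in the video it would be
    llama3.1 or mistral served by vLLM/ollama. `conn` is a DB-API
    connection (SQLite here, Postgres in the real setup).
    """
    # Put the schema in-context so the model can write valid SQL.
    schema = "\n".join(
        r[0]
        for r in conn.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    )
    sql = llm(f"Schema:\n{schema}\nQuestion: {question}\nSQL query only:")

    # Execute the generated SQL against the database.
    rows = conn.execute(sql).fetchall()

    # Second call: synthesize a natural-language answer from the rows.
    answer = llm(
        f"Question: {question}\nSQL result rows: {rows}\n"
        "Answer the question in one sentence:"
    )
    return sql, rows, answer
```

Splitting generation and synthesis like this is also what makes the pipeline easy to debug: the generated SQL can be logged and displayed to the user alongside the final answer.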
Great job can't wait to see more
@@jvannoyx4 hi, code is here: github.com/JordanNanos/example-pipelines video reviewing the code: ruclips.net/video/iLVyEgxGbg4/видео.html
Amazing video! I have a 4xA4000 GPU 128GB setup and I can only run the 405B Q2_K model, and it’s really slow. Amazing how the GH200 chips offer great token/sec performance!
- what are you using it for? - .... stuff
AI, apparently. (LLM = Large Language Model)
I feel jealous of that 8xH100 server. Currently using a 4x3090 at home. I actually use a pretty similar setup, with vLLM for the full-precision models and exllama or llama.cpp for quantized models, plus OpenwebUI as a frontend.
Why would you need more than that? Be glad for what you already have or you won't find happiness :)
Bitch i have a p40 and im over the moon. Being poor in ml is hard.
Word
Why not podman tho
Good discussion! Keep it up.
Great work, Jordan! Gonna start scraping the parts together...
I've been automating deployments with SkyPilot. It uses the cheapest spot instances and heals itself.
nice video. saw the link from twitter. my question is, is there a way to speed up the results after you ask it a question?
Yes, working to improve the LLM response and SQL query time