LLM for data analytics: 3 text-to-SQL architecture patterns

  • Published: Sep 8, 2024

Comments • 9

  • @xerwanderer
    @xerwanderer 1 month ago +2

    I've done all 3 of the architectures you've mentioned, but I'm still not getting ideal results. The main issues I've encountered:
    1. Lack of text2sql pairs. I've collected all of the SQL queries that succeeded in our database, but it's incredibly hard to infer back the original query in human language.
    2. It's almost impossible to help the LLM understand the relation between business info (usually expressed in human language) and the actual data structure.
    3. The information density is quite low when exporting the database schema and table structure; we used lots of nested JSON stored in single columns, plus enums with no detailed description.
    This was done months ago. Today I might have some new ideas on issues 1 & 3, but 2 remains seemingly impossible.
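    For issue 1 (bootstrapping question/SQL pairs from logged queries), one low-tech starting point is a rule-based "reverser" that turns the simplest logged SELECTs into rough English questions, leaving everything else for manual or LLM labeling. A minimal sketch; the function name and query pattern are illustrative, not from the video:

    ```python
    import re
    from typing import Optional

    def sql_to_question(sql: str) -> Optional[str]:
        """Turn a simple "SELECT cols FROM table [WHERE col = value]" query
        into a rough English question. Only this narrow pattern is handled;
        anything else returns None and needs manual (or LLM) labeling."""
        pattern = re.compile(
            r"SELECT\s+(?P<cols>[\w\s,*]+?)\s+FROM\s+(?P<table>\w+)"
            r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'?(?P<val>[\w ]+?)'?)?\s*;?\s*$",
            re.IGNORECASE,
        )
        m = pattern.match(sql.strip())
        if not m:
            return None
        cols = m.group("cols").strip()
        cols_text = (
            "all columns"
            if cols == "*"
            else ", ".join(c.strip() for c in cols.split(","))
        )
        question = f"What is {cols_text} from {m.group('table')}"
        if m.group("col"):
            question += f" where {m.group('col')} is {m.group('val')}"
        return question + "?"

    print(sql_to_question("SELECT name, email FROM users WHERE country = 'DE'"))
    ```

    Even if the generated questions are clumsy, they give an LLM something concrete to paraphrase into natural phrasing, which is much easier than inventing questions from raw SQL.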

    • @DenysonData
      @DenysonData  1 month ago +1

      RE: "I've collected all of the SQL queries that succeeded in our database, but it's incredibly hard to infer back the original query in human language" Good approach! However, I'd guess that with txt2sql, more than ever, you need to start with the end-user questions, and from my experience there is usually a VERY limited set.
      RE: "It's almost impossible to help the LLM understand the relation between business info (usually expressed in human language) and the actual data structure." 100%. That's also my main argument against the hype around "GenBI" and claims that AI will replace data analysts.
      RE: "We used lots of nested JSON stored in single columns, plus enums with no detailed description." As with efficient data analytics in general, pre-processing according to the END business needs is your best friend here. Point the smartest person at a complex schema with dozens of caveats and they would throw their hands up sooner rather than later.

  • @Jocob-Beller
    @Jocob-Beller 6 days ago

    Really good illustration, Denys! Just one question: will this architecture still function well when you have too many tables with bad naming? I've only seen a few products like AskYourDatabase handle this situation well. How should such a solution fit into this architecture?

    • @DenysonData
      @DenysonData  5 days ago

      I guess the easiest/cleanest/cheapest approach is getting the names right, or creating a layer of views on top. In my last video I provide an extra explanation for each table, which could also help. But if you are looking for a hands-off solution that should work "out-of-the-box" on top of lots of tables, I guess having tables named nicely goes a long way. Let me know if I misunderstood the question.
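      The "layer of views on top" idea can be sketched in a few lines: keep the badly named legacy tables untouched, create views with human-readable names, and expose only the views to the model. The legacy names below are invented for illustration:

      ```python
      import sqlite3

      conn = sqlite3.connect(":memory:")
      # A badly named legacy table an LLM would struggle to interpret.
      conn.execute("CREATE TABLE t_cst_m (c1 INTEGER, c2 TEXT, c3 TEXT)")
      conn.execute("INSERT INTO t_cst_m VALUES (1, 'Ada', 'DE')")

      # A renaming view layer; only this surface goes into the prompt.
      conn.execute(
          "CREATE VIEW customers AS "
          "SELECT c1 AS customer_id, c2 AS full_name, c3 AS country FROM t_cst_m"
      )

      # Generated SQL can now target readable names directly.
      row = conn.execute(
          "SELECT full_name FROM customers WHERE country = 'DE'"
      ).fetchone()
      print(row)
      ```

      Views are cheap to add and need no data migration, which makes them a practical middle ground between renaming everything and relying on per-table descriptions in the prompt.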

  • @WesFang
    @WesFang 20 days ago

    Thanks, Denys, for putting this together. Can you elaborate on what goes into the "prompt template"?

    • @DenysonData
      @DenysonData  19 days ago

      Sure. Here is a link to the file with the prompt template I cover in my last video: github.com/denysthegitmenace/aws-bedrock/blob/main/query_structured_data_lambda/prompt_templates.py
      SQL_TEMPLATE_STR is a good example.
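      The linked file is not reproduced here, but a text-to-SQL template in the same spirit typically combines the schema, the user question, and output constraints. The wording below is an assumption, not the actual contents of SQL_TEMPLATE_STR:

      ```python
      # Illustrative only: the name mirrors SQL_TEMPLATE_STR from the linked
      # repo, but this wording is a guess at the general shape, not the file.
      SQL_TEMPLATE_STR = (
          "You are a SQLite expert. Given the database schema below, write one "
          "syntactically correct SQL query that answers the user's question. "
          "Return only the SQL, with no explanation.\n\n"
          "Schema:\n{schema}\n\n"
          "Question: {query_str}\n"
          "SQL:"
      )

      # Fill the placeholders at request time with the live schema and question.
      prompt = SQL_TEMPLATE_STR.format(
          schema="CREATE TABLE orders (id INTEGER, amount REAL, country TEXT);",
          query_str="What is the total order amount for Germany?",
      )
      print(prompt)
      ```

      The key design points are the same regardless of wording: the schema is injected per request, and the instructions pin down the dialect and the output format so the response can be executed directly.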

  • @elenaromanova2841
    @elenaromanova2841 25 days ago

    Hello, Denys. Thanks for the video. I am wondering if it's possible to add implementation details on the tech stack and tools for the RAG type of architecture: which framework was used to load the DB schema (if LangChain, which loader), how it was vectorized, which vector DB is good for these kinds of cases, and which foundation models you'd recommend from your experience, both for embeddings and for generation. Maybe some code examples for the loader, retrieval, and connectors, if possible. I have a use case in mind to implement and am puzzling over how to load structured data into a vector DB as well as retrieve it for generation. Thank you in advance. ❤

    • @DenysonData
      @DenysonData  25 days ago

      Yep, planning to publish this exact walk-through this weekend.
      No LangChain, though; it was done with LlamaIndex. Also, I am not using any external vector storage for this tutorial, it's all in-memory. But I know that my colleagues (and we are working primarily on AWS) started using Aurora PostgreSQL with pgvector instead of OpenSearch Serverless for cost-efficiency reasons. Hope that helps, and stay tuned :)
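      The core of the in-memory RAG pattern described above (embed each table's DDL, retrieve the best match for a question, inject it into the prompt) can be shown without LlamaIndex or any vector DB. The sketch below substitutes a toy bag-of-words "embedding" for a real model, and the table names and schemas are invented:

      ```python
      import math
      import re
      from collections import Counter

      # Hypothetical per-table schema snippets; in the real pattern each one
      # is embedded and the top match is injected into the SQL prompt.
      TABLE_DOCS = {
          "orders": "CREATE TABLE orders (id, customer_id, amount, order_date)",
          "customers": "CREATE TABLE customers (id, full_name, country, signup_date)",
          "products": "CREATE TABLE products (id, title, unit_price, category)",
      }

      def embed(text: str) -> Counter:
          # Toy stand-in for a real embedding model: bag-of-words counts.
          return Counter(re.findall(r"[a-z_]+", text.lower()))

      def cosine(a: Counter, b: Counter) -> float:
          dot = sum(a[t] * b[t] for t in a)
          norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
              sum(v * v for v in b.values())
          )
          return dot / norm if norm else 0.0

      def retrieve_schema(question: str) -> str:
          # Return the DDL of the table most similar to the question.
          q = embed(question)
          best = max(TABLE_DOCS, key=lambda t: cosine(q, embed(TABLE_DOCS[t])))
          return TABLE_DOCS[best]

      print(retrieve_schema("Which country do most customers sign up from?"))
      ```

      Swapping `embed` for a real embedding model and `TABLE_DOCS` for a vector store (in-memory, or pgvector on Aurora as mentioned above) turns this into the full retrieval step, while the control flow stays the same.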

    • @DenysonData
      @DenysonData  22 days ago

      Just uploaded the video. Curious to learn what you think!