I can't wait for a video where scraped HTML is converted for LangChain's Parent Document Retriever (large AND small chunks) with Unstructured, with enriched metadata taken from the HTML structure / tags.
Love this idea! I have something in mind for parsing HTML content on the fly that I would like to try. I will share a video when I set it up. It might not be exactly what you are looking for, but hopefully along those lines 🙂
@@DevelopersDigest Yes, please share if you have anything that can do this!
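For anyone curious what "large AND small chunks with metadata from the HTML tags" could look like, here is a minimal sketch using only the Python standard library. It is not LangChain's Parent Document Retriever and does not use Unstructured; the names `SectionParser`, `parent_and_child_chunks`, and `child_size` are made up for illustration. The assumption is that each heading (`h1` to `h3`) starts a new parent chunk, and that every small child chunk carries the heading as metadata plus a `parent_id` pointing back to its parent.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect the text under each heading; the heading becomes metadata."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = []        # each item: {"title", "tag", "text"}
        self._in_heading = None   # heading tag name while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = tag
            self.sections.append({"title": "", "tag": tag, "text": ""})

    def handle_endtag(self, tag):
        if tag == self._in_heading:
            self._in_heading = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if not self.sections:
            # Text that appears before the first heading gets an untitled section.
            self.sections.append({"title": "", "tag": None, "text": ""})
        section = self.sections[-1]
        key = "title" if self._in_heading else "text"
        section[key] = (section[key] + " " + text).strip()


def parent_and_child_chunks(html, child_size=80):
    """Return (parents, children); every child links back to its parent."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()

    parents, children = [], []
    for parent_id, section in enumerate(parser.sections):
        metadata = {
            "parent_id": parent_id,
            "title": section["title"],
            "heading_tag": section["tag"],
        }
        parents.append({"text": section["text"], "metadata": metadata})

        # Greedily pack whole words into children of at most child_size characters.
        current = ""
        for word in section["text"].split():
            candidate = (current + " " + word).strip()
            if current and len(candidate) > child_size:
                children.append({"text": current, "metadata": metadata})
                current = word
            else:
                current = candidate
        if current:
            children.append({"text": current, "metadata": metadata})
    return parents, children


if __name__ == "__main__":
    page = "<h1>Intro</h1><p>Hello world.</p><h2>Details</h2><p>More text here.</p>"
    big, small = parent_and_child_chunks(page, child_size=10)
    for chunk in small:
        print(chunk["metadata"]["title"], "->", chunk["text"])
```

The idea is that the small chunks would be embedded for search, and the `parent_id` would be used to fetch the larger section for the LLM's context.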
How does it extract tabular data from PDFs? Does it know the relationships between rows and columns?
There is an issue with extracting text from a PDF document using the "partition_pdf" function with "by_title" as the chunking strategy. The expectation is to get text chunks based on titles, but the extracted text contains a lot of noise.
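To make the expectation concrete, here is a rough sketch of what title-based chunking means, written in plain Python. It is not Unstructured's implementation; `chunk_by_title`, `skip_categories`, and the category names in the sample data are assumptions for illustration. It assumes the parser has already labeled each piece of text with a category, starts a new chunk at every "Title", and drops categories that are usually noise, such as page headers and footers.

```python
def chunk_by_title(elements, skip_categories=("Header", "Footer")):
    """Group (category, text) pairs into chunks that each start at a title."""
    chunks = []
    current = None
    for category, text in elements:
        text = text.strip()
        if not text or category in skip_categories:
            continue  # empty text and likely noise (page headers, footers)
        if category == "Title":
            current = {"title": text, "body": []}
            chunks.append(current)
            continue
        if current is None:
            # Text that appears before the first title goes in an untitled chunk.
            current = {"title": None, "body": []}
            chunks.append(current)
        current["body"].append(text)
    return chunks


if __name__ == "__main__":
    sample = [
        ("Header", "ACME Annual Report"),
        ("Title", "Introduction"),
        ("NarrativeText", "This report covers the last fiscal year."),
        ("Footer", "Page 1"),
        ("Title", "Results"),
        ("NarrativeText", "Revenue grew compared to the previous year."),
    ]
    for chunk in chunk_by_title(sample):
        print(chunk["title"], "->", " ".join(chunk["body"]))
```

If the chunks still come out noisy, the usual suspect is the labeling step rather than the grouping: when a PDF parser misclassifies a running header as a title or as body text, any title-based chunker will inherit that mistake.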
How do you get access to the API key? Is it paid?
Sounds incredibly useful: multiformat, intelligent parsing....
Absolutely, I am going to create at least one or two more technical videos using Unstructured! The team is solving an important piece of the puzzle 🧩 in data pipelines for LLM applications! 🙂
Thanks for this. Is there one for Excel, or better yet, for taking tables out of documents and loading them into database tables?
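The database half of that question can be sketched with the Python standard library alone. This is a minimal example, assuming a table has already been extracted from a document as a list of rows with the header row first; how the rows get extracted is out of scope here, and `load_table`, the table name, and the sample data are hypothetical. Every column is stored as TEXT to keep the sketch simple.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and odd characters are safe."""
    return '"' + name.replace('"', '""') + '"'


def load_table(conn, table_name, rows):
    """Create a table from the header row and insert the remaining rows."""
    header, *data = rows
    table = quote_identifier(table_name)
    columns = ", ".join(quote_identifier(col) + " TEXT" for col in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)
    conn.commit()


if __name__ == "__main__":
    extracted_rows = [
        ["item", "quantity", "unit price"],
        ["pen", "2", "1.50"],
        ["notebook", "1", "4.00"],
    ]
    conn = sqlite3.connect(":memory:")
    load_table(conn, "invoice_lines", extracted_rows)
    for row in conn.execute('SELECT * FROM "invoice_lines"'):
        print(row)
```

Real extracted tables tend to have merged cells, repeated headers, or ragged rows, so a cleanup pass before the insert would likely be needed.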
The steampunk artwork that drew me into this video is great. Did you have an AI process design it?
Yes, I used the DALL-E 3 model with the following prompt: “A landscape photo depicting an unstructured, huge pile of books on the left side. In the middle, there's a complex Rube Goldberg machine, intricately designed with gears, pulleys, and various mechanical parts. The machine functions to organize books, and on the right side, it outputs into an organized, pristine Victorian library, filled with neatly arranged shelves of books, an ornate fireplace, and elegant furniture. The entire scene combines chaos and order in a whimsical, fantastical manner.” Cheers! 🥂
@@DevelopersDigest You should do a video on how you created the image... you might attract a different audience, as I'm sure people with an artistic bent would be fascinated. I think it's a really alluring piece of art, and I just want to see the details. Congrats.
Thank you for this suggestion! I haven’t considered diving into AI image generation tools much yet, but with your feedback I will certainly add that to my list! Thank you! 🙏
@@DevelopersDigest I've been submitting the exact prompt you provided to various "free" services, and the results are nothing compared to what you generated. I'm posting my conclusion mostly for the benefit of people interested in artwork, to document that the DALL-E 3 creation beats several others. I do not mean to hijack your parser video, but, as I said, the image is what drew me into your video. Thank you!
I just signed up for the $20 ChatGPT account so I could access DALL-E 3 and make the intriguing type of images you do. Thank you, thank you.
How do you access the GUI? I got an API key, but it's unclear where to go next.
This is the repo!
github.com/Unstructured-IO/unstructured-api-gui
Thanks! @@DevelopersDigest
The GUI can't be used. It always shows "invalid key" although my key is valid.
Did you figure it out?