- Videos: 38
- Views: 83,055
Quix
United Kingdom
Joined 22 Jul 2020
Quix Streams is an open source Python library for processing data in Apache Kafka. Designed around the principle of DataFrames (a tabular representation of streaming data), it helps teams build real-time data pipelines for ML and analytics by simplifying the transition from batch-based processing with libraries like Pandas to stream-based processing.
It offers a best-in-class Python developer experience: pure Python, no JVM, no wrappers around other languages, and no cross-language debugging, so developers can use the full Python ecosystem. It's stateful, fault-tolerant, and promotes best practices for quickly deploying and scaling out to production. A minimal example application is sketched after the links below.
To learn more you can:
1. Follow/star the repo: github.com/quixio/quix-streams
2. Join the Slack community: quix.io/slack-invite
3. Visit: quix.io/
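To make the DataFrame idea concrete, below is a minimal sketch of a Quix Streams application; the broker address, topic names, and the temperature conversion are placeholders for illustration.

```python
from quixstreams import Application

# Connect to a local Kafka broker; addresses and topic names are placeholders
app = Application(broker_address="localhost:9092", consumer_group="demo")

input_topic = app.topic("sensor-readings", value_deserializer="json")
output_topic = app.topic("sensor-readings-fahrenheit", value_serializer="json")

# Describe the pipeline as a StreamingDataFrame, processed message by message
sdf = app.dataframe(input_topic)
sdf = sdf.apply(lambda row: {**row, "temp_f": row["temp_c"] * 9 / 5 + 32})
sdf = sdf.to_topic(output_topic)

app.run(sdf)  # newer releases also accept app.run() with no arguments
```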
Quix Community Clubhouse | Ricardo | Juniz Energy
We really enjoyed our Community Clubhouse this month!
This clip is from our special guest, Ricardo from Juniz Energy, who showcased the data pipeline he built to ingest and process energy plant data. It was great to see Quix making a difference in the wild.
If you attended, you'd have also seen:
- Mike, our CEO, talking about the events he's been attending, insights from customers, and what's happening in the data industry.
- Daniil, who's heading up the SDK team developing Quix Streams, talking about the latest releases and what's coming next for the library.
- Patrick, our Head of Software, talking about the latest release of the Cloud Platform and some sneak peeks at new features coming in the ...
Views: 52
Videos
Quix Streams 3.0.0: Branching & Multiple Topics
220 views · 1 month ago
Tim takes you through all the details about the Quix Streams 3.0.0 release.
🌱 New features 🌱
- StreamingDataFrame branching
- Consuming multiple topics per Application ("multiple StreamingDataFrames")
- Automatic StreamingDataFrame tracking (no arguments needed for Application.run())
💎 Enhancements 💎
- Extensive documentation improvements and additions
🐞 Bug fixes 🐞
- Fix issue with handling of Quix ...
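A rough sketch of what branching and multiple input topics per Application look like; the topic names and filter logic are made up for illustration, so see the 3.0.0 release notes for the exact semantics.

```python
from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="branching-demo")

# Multiple StreamingDataFrames per Application: one per input topic
orders = app.dataframe(app.topic("orders", value_deserializer="json"))
payments = app.dataframe(app.topic("payments", value_deserializer="json"))

# Branching: two independent branches derived from the same parent dataframe
large = orders.filter(lambda order: order["total"] >= 100)
small = orders.filter(lambda order: order["total"] < 100)

large.to_topic(app.topic("large-orders", value_serializer="json"))
small.to_topic(app.topic("small-orders", value_serializer="json"))
payments.to_topic(app.topic("payments-archive", value_serializer="json"))

# 3.0.0: dataframes are tracked automatically, so no arguments are needed
app.run()
```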
Quix Streams 2.11: Source Connectors
80 views · 1 month ago
Tim takes you through the latest advancements in Quix Streams with the release of version 2.11, a game-changer for data integration enthusiasts and professionals alike. This update introduces powerful Source Connectors, designed to seamlessly ingest diverse data sources into the Kafka ecosystem, enhancing your data processing capabilities. Whether you're dealing with CSV files or complex data s...
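As a rough idea of how a source connector plugs in, here is a sketch of reading a CSV file into a topic; the import path, constructor arguments, and file name for CSVSource are assumptions, so check the 2.11 docs before using.

```python
from quixstreams import Application
from quixstreams.sources.core.csv import CSVSource  # assumed import path - verify in the docs

app = Application(broker_address="localhost:9092")

# The source reads rows from the file and feeds them into the pipeline (file name is a placeholder)
csv_source = CSVSource(path="orders.csv", name="orders-csv")

sdf = app.dataframe(source=csv_source)
sdf = sdf.to_topic(app.topic("orders", value_serializer="json"))

app.run()
```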
Quix Streams 2.10: Adding Schema Registry Support
148 views · 1 month ago
In this video, Tim, one of our resident contributors and maintainers of Quix Streams, introduces the new features in Quix Streams version 2.10, focusing on the addition of schema registry support. A schema registry is a centralized repository that stores and manages schemas for data serialization formats like Avro, JSON Schema, and Protocol Buffers. It plays a crucial role in data streaming and...
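The Quix Streams configuration itself isn't shown here, but as a standalone illustration of what a schema registry client does, this sketch uses the confluent-kafka package (a separate library, not the Quix Streams 2.10 API) to list the subjects registered in a registry; the URL is a placeholder.

```python
from confluent_kafka.schema_registry import SchemaRegistryClient

# Point the client at the registry's REST endpoint (placeholder URL)
registry = SchemaRegistryClient({"url": "http://localhost:8081"})

# Each subject holds the versioned schemas for a topic's keys or values
for subject in registry.get_subjects():
    latest = registry.get_latest_version(subject)
    print(subject, latest.version, latest.schema.schema_type)
```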
How microservices architecture works
232 views · 2 months ago
Streaming DataFrames: Build apps on real time data streams
734 views · 2 months ago
What Is Explode? Top 3 Kafka Python Use Cases
311 views · 3 months ago
Real-time DataFrames | New features in Quix Streams 2.9.0
172 views · 3 months ago
Real-Time Python Aggregations with WarpStream
202 views · 3 months ago
Real-time DataFrames | New features in Quix Streams 2.8.0
230 views · 3 months ago
High Performance Kafka Producers in Python
3.8K views · 4 months ago
From Kafka to a Spreadsheet - A Step by Step Python Tutorial
3.9K views · 5 months ago
Kafka Stream Processing with Python - A Walkthrough
9K views · 6 months ago
Kafka Consumers in Python - A Walkthrough
7K views · 6 months ago
A Simple Kafka and Python Walkthrough
24K views · 6 months ago
Really cool but specifying requirements like this hurts a bit - have you tried with uv?
Hi, I haven't but thanks for the tip.
Very nice, thank you!
What even is this about?
Sinking data from Apache Kafka to an MQTT broker. Do you do anything with either of these?
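For anyone curious what that looks like in code, here is a rough, generic sketch of the idea (not the connector from the video) using confluent-kafka and paho-mqtt; broker addresses and topic names are placeholders.

```python
import paho.mqtt.client as mqtt
from confluent_kafka import Consumer

# Kafka consumer reading the source topic
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "mqtt-sink-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-readings"])

# MQTT client publishing to the destination broker
# (with paho-mqtt 2.x, pass mqtt.CallbackAPIVersion.VERSION2 as the first argument)
client = mqtt.Client()
client.connect("localhost", 1883)
client.loop_start()

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Republish the Kafka message value on an MQTT topic
        client.publish("sensors/readings", payload=msg.value())
finally:
    consumer.close()
    client.loop_stop()
```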
Thanks very much for the effort.
Thanks for sharing this informative video. I have tried to follow your code, but I am facing the below error while running the Python script: 1729674832.710|FAIL|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 5ms in state CONNECT). Please advise me how to fix this issue. Thanks
Hello Tim, thanks for the video. The example shows a quite straightforward .csv file connection. What about clarifying the key when creating the topic? Another thing: have you considered connecting to a SQL database server? Imagine in general we run a query (either simple or complicated) to build up the content of the topic. Many thanks
Hi! We have a Postgres CDC connector that you can use to source data based on changes to the DB and then insert those changes into the Kafka topic: github.com/quixio/quix-samples/tree/main/python/sources/postgres_cdc You can also roll your own connectors. If you can connect to your DB with Python, you can use Quix Streams to pump the data into the topic(s). If you prefer another language, you can use any language you want, but you'll have to know how to build the Docker container for it and use a supported Kafka client. If you want to go into more detail about your needs, drop into our Slack -> quix.io/slack-invite
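As a rough sketch of the "roll your own" route mentioned above (the table, columns, and connection string are hypothetical), reading rows with psycopg2 and producing them with the Quix Streams producer could look something like this.

```python
import psycopg2
from quixstreams import Application

app = Application(broker_address="localhost:9092")
topic = app.topic("plant-readings", value_serializer="json")

# Hypothetical database and query - replace with your own
conn = psycopg2.connect("dbname=energy user=postgres password=postgres host=localhost")
cur = conn.cursor()
cur.execute("SELECT id, reading, recorded_at FROM readings")

with app.get_producer() as producer:
    for row_id, reading, recorded_at in cur:
        msg = topic.serialize(
            key=str(row_id),
            value={"reading": reading, "recorded_at": str(recorded_at)},
        )
        producer.produce(topic=topic.name, key=msg.key, value=msg.value)
```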
Very good tutorial, many thanks. Subscriber +1
Thanks for the sub!
Hi Tim. Great stuff. I'm a Java practitioner and long-time lover of Kafka Streams. I'm super excited to see that this framework is slowly closing the gap for Python devs (such a huge community). Could we expect something like a .process operator in a future release? (Multi-topic, multi-store, flexible stateful implementation!)
Hey Adam. Looks like the team responded in Slack. If you have any more questions or ideas please feel free to pick their brains there :-)
Thank you guys. Specifically for highlighting breaking changes.
Bro broke the world record of typing 💀💀💀💀
Data engineering and software engineering worlds will collide with Protobuf support. Thank you for the tour, Tim ✌🏽
Hello, thanks for the video. What if we would like to consume the data in the topic from a certain existing offset? I was looking at the .seek() method but haven't found an effective solution yet. Many thanks
Hi, thanks for the question. To do it currently you'd have to manually seek and commit that offset for each partition yourself; Quix Streams doesn't support it directly at the moment.
@@QuixStreams many thanks !
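For reference, manually starting from a specific offset with the underlying confluent-kafka consumer looks roughly like this (topic, partition, and offset are placeholders).

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "offset-demo",
    "enable.auto.commit": False,
})

# Assign partition 0 of the topic, starting at offset 1234 (placeholder values)
consumer.assign([TopicPartition("my-topic", 0, 1234)])

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        raise RuntimeError(msg.error())
    print(msg.partition(), msg.offset(), msg.value())
```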
Amazing style of teaching
Interested in an episode on "From Kafka to Kibana"
Hi Tomas, thanks for sharing this informative content. It's really helped me understand the Kafka pipeline and concepts. I have managed to successfully run this pipeline, but unfortunately I got stuck at the final step while writing data into the InfluxDB data explorer, ending up with the output message from the InfluxDB microservice "waiting for incoming messages." I have configured the InfluxDB hostname and secrets as explained in the repo. Could you please advise how I can fix this issue to get my data into InfluxDB?
Hi @vijayanandsundaresan3776. Happy to help. Could you please come over to the community channel here join.slack.com/t/stream-processing/shared_invite/zt-2lxwg3a0l-6NIZuvkFVKrm6UTH_97iSA. We'll be on hand to help you solve the problem in the #quix-help channel.
Hello, thanks for the video. It's really helpful for getting started with Kafka. I was working on the server and successfully created the topic as learned from the previous video, where I managed to write non-duplicate messages. Following this video, when subscribing to the topic I got no messages (None), but I do see a message in that topic. My topic is 'weather_data_demo-Yifei'; when subscribing, there is a line printed: [DEBUG] [quixstreams] : Assigning topic partition "weather_data_demo-Yifei[0]", but the message is always None. Any hints on this? Many thanks
I see: because the producer hasn't sent new messages, 'auto_offset_reset=latest' will only read messages sent after the consumer starts. 'earliest' works, but after consuming the messages once, running the script again returns no messages.
@@yifeitong same issue here. found any solution?
@@sridharstreakssri try auto_offset_reset earliest ?
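To summarise the thread above with a sketch (the broker address and group name are placeholders): auto_offset_reset only applies when the consumer group has no committed offsets, so re-running the script re-reads nothing unless the group is changed or its offsets are reset.

```python
from quixstreams import Application

# "earliest" starts a brand-new consumer group from the beginning of the topic.
# Once the group has committed offsets, later runs resume from those offsets;
# changing consumer_group (or resetting its offsets) re-reads from the start again.
app = Application(
    broker_address="localhost:9092",
    consumer_group="weather-demo-v2",
    auto_offset_reset="earliest",
)
```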
Thanks for the video. I was following the steps and got this error: KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}. Can anyone help with this? Many thanks
There's a problem with the connection to the Kafka broker. This can be due to incorrect broker address, network issues, or broker unavailability. Check out the docs here: quix.io/docs/manage/troubleshooting.html#data-is-not-being-received-into-a-topic Or ask the community for help here join.slack.com/t/stream-processing/shared_invite/zt-2lxwg3a0l-6NIZuvkFVKrm6UTH_97iSA
The best videos are usually the most underrated. This video is exactly what I needed. I'm learning microservices for real-time ML applications, and this is so helpful. Thanks a ton for this.
Thanks for watching. Please share our content on X and Twitter to spread the word!
I was dreading learning Kafka, but Quix Streams has really made it easy. Also, this video is so useful and easy to follow for understanding the basic requirements of building a Kafka-based stream. Thanks a ton
insta subscribed
how come this video has only 1K likes. gotta appreciate the man's effort
Thanks @Apex543. Please share on socials to help boost the numbers!
What ide/text_editor are you using?
It's Neovim with some custom configuration. We recommend getting started with the Neovim Kickstart project and tailoring it to your preferences.
I am thinking of our use case where we generate a search result of 25M records when querying an app over a REST API. It returns {"events": [{},{},..{}]} but the events key can be different each time and we can't know it in advance. Also, we cannot do something like response.json(), otherwise it loads all the records into memory before processing, which leads to excessive RAM utilization and delays in processing. Currently we parse using ijson and stream each individual {} to Kafka using the for loop in my comment above. I am on the hunt for a better solution as we need to run multiple such streams in parallel to finish data ingestion in time.
Thank you for sharing. That is a great use case for explode! If you partition and produce using different keys (provide a key in sdf.to_topic()), you can run multiple consumers downstream to process in parallel. This depends on whether you care about ordering (perhaps you do for search results, unless the result's rank is encoded). You may also care about exactly-once processing guarantees, because if there's a failure, this avoids events being reprocessed. In general, when processing large result sets I recommend leveraging state.
@@DataSurfer yes you are absolutely correct about processing once! Do you guys have a community slack we could have a chat on?
@@vikramtatke5930 Yes, you can come speak to the whole team on Slack. The link is here (and also in the video description): quix.io/slack-invite
@@DataSurfer that's great thanks a lot!! :) if it's alright for me to suggest, you could make a video on this too :)
@@vikramtatke5930 You're more than welcome to make suggestions :) However, it would be more fun if you could share some more details and we build out the solution together. Let's chat in Slack!
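A hedged sketch of the keyed fan-out idea discussed in this thread ("query_id" and the topic names are hypothetical):

```python
from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="search-explode")

raw = app.topic("raw-search-responses", value_deserializer="json")
events = app.topic("search-events", value_serializer="json")

sdf = app.dataframe(raw)
# Explode: each response's events array becomes individual messages
sdf = sdf.apply(lambda response: response["events"], expand=True)
# Key the output so downstream consumers can scale out across partitions
sdf = sdf.to_topic(events, key=lambda event: str(event["query_id"]))

app.run(sdf)
```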
Hi thanks for this awesome video. Just curious - How is the explode while writing to Kafka topic different than saying for item in orders["items"]: producer.produce(item) ?
Thank you! The main difference is that the Producer object is a low-level object with no guarantees. Using StreamingDataFrame provides automatic management of resilient and durable application state with checkpoints and changelogs. This means you can provide exactly-once processing guarantees for the items in the array if required. For further reading, check the docs for "Stateful Applications"
@@DataSurfer For sure! Thanks a lot!
Awesome videos!! how could I store my data to a local csv tho!??
We’ve just released a new Sinks API. The release has a CSV Sink connector included. Pls check out the docs
@@michaelrosam9271 That sounds great! Thanks!
If you haven't already seen it check out the video about the 2.9 release. ruclips.net/video/VoDQtO8mirc/видео.html
Waiting for some more series on kafka, great explanation
Have you got anything specific you want us to explain or demonstrate?
Thank you, your videos are both technical and funny
Thank you!
The performance analysis and optimisation breakdown in this video is probably the perfect balance between the most common and powerful tunings and brevity. Great work! Maybe you could comment on client (or even topic) optimisation in the upcoming videos? 🤔
Thanks! Great idea, I'll add it to the list.
I did not get that expand=True. I mean, how will there be an array of messages? Won't the subscriber get a single message at a time? Or do they get a batch of messages at once?
Hi, expand is used when you want to return more than one 'row' from your processing function. The function will receive one row of data, and then you might decide that it should represent two or more. Maybe you received a binary value that you have decoded and it contains more data, or you received a JSON payload that actually represents more than one row, so you want to unpack it into individual rows. Another example: if you have a sentence and you want to process each word individually, you can use expand=True to split the sentence into words and treat each word as a separate event/row. Then, in the next lines of code or in another service, you can process the new, individual rows as needed. If that's still not clear, happy to jump on a call or do another video focusing on this. Let me know.
@@QuixStreams Yeah yeah I got it. So the messages might be nested and we need to kinda unpack it and we can do that via the expand=True right?
That's correct yes.
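The sentence-into-words example from the reply above, as a short sketch (broker address and topic names are placeholders):

```python
from quixstreams import Application

app = Application(broker_address="localhost:9092", consumer_group="expand-demo")
sdf = app.dataframe(app.topic("sentences", value_deserializer="json"))

# One incoming row like {"text": "the quick brown fox"} becomes one row per word,
# and each word then flows through the rest of the pipeline as its own event
sdf = sdf.apply(lambda row: [{"word": w} for w in row["text"].split()], expand=True)
sdf = sdf.to_topic(app.topic("words", value_serializer="json"))

app.run(sdf)
```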
Always tried to learn Kafka but was worried about the libraries. But quixstreams is something else, its super easy to get started with
How come it is running continuously without a loop?
Quix Streams! app.run() listens for the termination signals and keeps the listeners open for business.
When you set the compression at 21:50, do you need to make sure that all the consumers have the same compression type specified as the producers, or is it embedded in the message and handled automatically?
It should just work automatically. The consumer can detect the compression type when it reads each batch and decompress it on the fly. The only thing you need to ensure is support. So if your producers compress with, say, `lz4`, you need to make sure all your consumer libraries support reading `lz4`. With a very common algorithm like `gzip`, that's pretty much guaranteed. For the more exotic ones, I'd double check before you ship.
@@QuixStreams Thanks for the response. That is a good design choice from Kafka, especially if all your components are using the same quixstreams library as the compression support should be the same.
Absolutely. 👍
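For reference, a sketch of setting producer-side compression in a Quix Streams Application; this assumes the producer_extra_config pass-through to the underlying Kafka client, and the values are placeholders.

```python
from quixstreams import Application

# Compression is a producer-side setting; consumers detect and decompress automatically,
# as long as their client library supports the chosen codec.
app = Application(
    broker_address="localhost:9092",
    producer_extra_config={"compression.type": "lz4"},  # or "gzip", "snappy", "zstd"
)
```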
I feel like if you need super-high-performance IO routing, synchronous Python might be the wrong choice. Apache Camel has an integration for the GitHub event source, so it might work better.
Yup, that's absolutely true. However, if you were parallelising, you might decide that it's better to have 3 Python producers sharing a workload instead of introducing an extra component like Camel. And with these tips, you might be looking at 3 producers instead of 10. 🙂 (That's the shape of it at least. I'm not actually sure how you'd load-balance this specific data-source, but I thought it was an interesting one to explore the concepts with. 😉)
Great video, thanks! Can you tell me briefly how I can perform transformations in the consumer using Pandas, in real-time?
Glad you enjoyed it! There's a video here that goes into performing real-time stream transformations with a Pandas-like API: ruclips.net/video/5sqegy_EPa0/видео.html It's not exactly the same as Pandas, but if the underlying data's stored in Kafka it'll probably solve your problem in the same way. 🙂
I go through how to migrate Pandas code to Quix Streams code in a video. Hopefully it provides some inspiration: ruclips.net/video/FC-DrNbe5fA/видео.html
Excellent video. Could you please provide some insights on how you configured your vim settings? If possible please make a video about Vim configuration for Python developers.
I like your videos. Please upload more!
Yes sir, will do! We actually have many more in the pipeline. Subscribe and stay tuned!
My Neovim still doesn't give me type hints like that. Please make a video if you can 😅
Can you do one tutorial on the transformation bit, especially the groupby transformation? I am having trouble implementing the groupby as shown in the documentation.
Will do. Watch this space.
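While the tutorial is in the works, here is a hedged sketch of group_by with a stateful count ("customer_id" and the topic names are hypothetical; see the GroupBy and Stateful Applications docs for the exact behaviour):

```python
from quixstreams import Application, State

app = Application(broker_address="localhost:9092", consumer_group="groupby-demo")
sdf = app.dataframe(app.topic("orders", value_deserializer="json"))

# Re-key each message by customer_id (routed through an internal repartition topic)
sdf = sdf.group_by("customer_id")

def count_orders(order: dict, state: State) -> dict:
    total = state.get("count", 0) + 1
    state.set("count", total)
    return {"customer_id": order["customer_id"], "order_count": total}

# Stateful step: the count is kept per message key, i.e. per customer
sdf = sdf.apply(count_orders, stateful=True)
sdf = sdf.to_topic(app.topic("order-counts", value_serializer="json"))

app.run(sdf)
```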
One quick question on Quix Streams: 1) How is it different from Spark? 2) What's the deployment strategy? Kubernetes pods, or VMs?
Quix Streams is a library, not a server-side engine like Spark. It uses container orchestration via Kubernetes to achieve the same horizontal scale as Spark, with the benefit that a Quix Streams application is easy to develop and debug. You can deploy it however you like, but the best way is in a Docker container on K8s.
What's the meaning of `...transform...`?
Any transformation you can do on top of the data
Yes, that's correct. It could be determining a new value based on existing values in the data set, or some average based on previous values that you have in state. Really, any processing you can achieve using Python.
Hi, great explanation. My requirement is to capture CDC events from multiple tables using Kafka and Debezium (this part is done). I want to create multiple streams out of those topics containing the CDC data and perform complex joins on them. Will Quix Streams be a viable option for me? It's the only library I've found that's able to ingest the data and send it to output topics. Help me!
We noticed you've joined the Quix Community on Slack where Tomas has helped you with a solution for stream joins. For future reference to anyone who would like support, please join us on Slack: quix.io/slack-invite
I learnt a lot grandpa 😊
So clean and easy explanation. Really like your style.
Thank you 😊
Brilliant! Bravo! 👏👏👏
Thanks 👍
I was looking for streaming data for forecasting model. ❤
❤ learning python
Hi, I have a question: will it listen forever? I have a case where I want this code to always be listening and never stop. If I run the code and leave it, will it work as I want, or do I need to do something in addition? @Quix
Hi, this code will run forever unless there is an issue, like the machine it's running on being turned off. It's also not production code, so adding some error handling etc. would certainly be a good idea. If you want to run this in production with resiliency and SLAs, check out Quix Cloud -> quix.io/signup
Vim and regex for search and replace are wonderful. I hope to study more use cases from you. Love ❤ your video!