Thanks for the video! It would be great to also see how you would write it in a real application.
I would like to point out that there are database extensions for GIS data, such as PostGIS for PostgreSQL, so in fact you could query a database. Other databases also have extensions or native features for this.
Yes, for our vector-based data this is a good solution. However, for raster data we don't have any direct equivalent. We sort of glossed over this in the interest of time, so really good thoughts here!
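To make the PostGIS point concrete, here is a minimal sketch of the kind of spatial query it enables. The table name `parcels` and geometry column `geom` are hypothetical placeholders; the generated SQL would be run against a PostGIS-enabled PostgreSQL through any database driver.

```python
# Build a PostGIS query selecting rows within a radius of a point.
# "parcels" and "geom" are assumed names for illustration only.

def within_distance_query(table: str, geom_col: str,
                          lon: float, lat: float, meters: float) -> str:
    """Compose an ST_DWithin query; casting to geography makes the
    distance argument count in meters rather than degrees."""
    point = f"ST_SetSRID(ST_MakePoint({lon}, {lat}), 4326)"
    return (
        f"SELECT * FROM {table} "
        f"WHERE ST_DWithin({geom_col}::geography, {point}::geography, {meters});"
    )

sql = within_distance_query("parcels", "geom", -122.42, 37.77, 500)
print(sql)
```

Raster data, as the reply notes, has no equally standard query path, which is why it was handled differently in the video.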
Very clear and well-explained video, thank you very much.
I am currently building an internal data platform, and I was going to use Prefect on a VM, but after seeing your video I believe the best way to go would be Prefect + Dask scheduler + Dask workers on Azure Kubernetes Service. Does that make sense to you? Then I could benefit from autoscaling of the workers.
Thanks again!
Yep, that sounds like a great solution! There are also fully managed solutions like Snowflake and Databricks, if that suits your use case. Thanks for watching!
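At its core, the Prefect + Dask split is a scheduler farming independent tasks out to a pool of workers. A minimal stdlib sketch of that pattern (Dask and Prefect add retries, observability, and the Kubernetes-backed worker autoscaling mentioned above; the function names here are illustrative):

```python
# Scheduler/worker pattern sketch using only the standard library.
from concurrent.futures import ThreadPoolExecutor

def transform(record: int) -> int:
    # Stand-in for a real data-processing task.
    return record * 2

def run_pipeline(records, max_workers: int = 4):
    # The executor plays the "scheduler" role, distributing tasks
    # across up to max_workers concurrent workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transform, records))

print(run_pipeline(range(5)))  # [0, 2, 4, 6, 8]
```

On AKS, the Dask scheduler and workers would replace the thread pool, and a cluster autoscaler would grow or shrink the worker pool with load.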
This made me wonder whether systems like Hadoop and MapReduce are still used/built.
Hadoop MapReduce could absolutely be used in place of Spark/Dask as our distributed data processing cluster. However, it would take a lot of manual work to build the kinds of aggregations we need from scratch. Good point!
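To illustrate why MapReduce-style aggregation is more manual, here is a hand-rolled word count written in the classic map, shuffle, reduce shape. In Spark or Dask this whole pipeline would collapse to a one-line groupby/reduction; the phase functions below are just a single-process sketch of the pattern, not Hadoop's actual API.

```python
# Hand-rolled map -> shuffle -> reduce, the shape of a MapReduce job.
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) pairs, one per occurrence.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Group values by key; a MapReduce framework does this step for you.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final aggregate.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Every new aggregation (averages, joins, windowing) means writing another set of phases like this, which is the manual work the reply refers to.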
I did something similar, but at a very large scale, at PayPal.
Cool cool!
Hi, what exactly is this subject? Is it data science?
This is system design: we're considering which services and infrastructure to use to solve a high-level problem. Thanks for watching!