Finally, someone explained Lake Database clearly, as compared to a SQL database.
Nicely explained
Great video. Thank you for this content.
Hi, please help me. I have created an external table in a Synapse lake database, and now I would like to load the records from that external table into a dedicated SQL pool table. Please advise on the procedure.
Do you know what the difference is between lake databases and the Delta Lake project? Both seem to have roughly the same functionality: I can use Spark to do ETL tasks, and then use Spark pools as well as serverless SQL pools to query the data.
Nice video. I wanted to explore this new functionality; in my opinion it has (like you said) no better feeling than the data workflow. Question: I don't know much about Delta Lake. Is that something you can also use as a business intelligence engineer/data engineer to create a star model that you can connect to with Power BI? Is that useful?
Yes, Delta Lake can be used for BI/data engineering jobs. You can run SQL queries using a Databricks SQL endpoint and build dashboards.
Yep, absolutely. As a standard practice we now land our star schemas in the lake as Delta tables and serve them out to Power BI, Tableau etc. from there, either through Synapse Serverless or Databricks SQL!
If this expands to allow variables and selecting folders instead of files, then this could be pretty neat.
Why not just have a view over source files and then use a copy to create the new table files?
You can absolutely do that, although then we're using the serverless engine + integration movement to do it, rather than the Spark engine. Both are valid, but each has its own pros and cons from a cost/performance perspective.
@AdvancingAnalytics I would pay to see a video explaining this
It looks like Synapse engineers don't use their own product. Synapse includes spark, but they want you to load data from a single file instead of a directory. Catastrophe :-(
When is GA for Unity Catalog and DLT?
That's a question for Databricks!
@AdvancingAnalytics I know, I just can't wait. I'm hoping for June.
I feel like: why not just write PySpark? You can remove tedium by programming repetitive tasks with functions, and do things like build a view from tables while grabbing the column comments, which I definitely wouldn't want to be doing manually over 200 columns and then having to maintain. Using Parquet tables is also just asking for problems when you're rewriting and someone is querying at the same time.
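For anyone wondering what "build a view from tables grabbing the comments" could look like, here is a minimal sketch in plain Python that only generates the Spark SQL text. The names `Column`, `quote_ident` and `build_view_sql` are hypothetical helpers made up for illustration, not part of Synapse or PySpark, and the view/table names in the usage note are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical holder for a column name and its (optional) comment."""
    name: str
    comment: str = ""


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_view_sql(view_name: str, table_name: str, columns: list[Column]) -> str:
    """Build a CREATE OR REPLACE VIEW statement that carries column comments over."""
    col_defs = []
    for col in columns:
        col_def = quote_ident(col.name)
        if col.comment:
            # Escape single quotes so the comment stays a valid SQL string literal.
            escaped = col.comment.replace("'", "\\'")
            col_def += f" COMMENT '{escaped}'"
        col_defs.append(col_def)

    select_list = ", ".join(quote_ident(c.name) for c in columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} (\n  "
        + ",\n  ".join(col_defs)
        + f"\n) AS\nSELECT {select_list}\nFROM {table_name}"
    )
```

In a notebook, the column list could presumably be filled from `spark.catalog.listColumns(...)`, whose entries expose `name` and `description`, and the resulting string passed to `spark.sql(...)`. For example, `build_view_sql("v_sales", "sales", [Column("id", "Primary key"), Column("amount")])` gives a statement you can inspect before running it, which is the point: 200 columns become one loop instead of 200 manual edits.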
Usually because the team/company don't have pyspark skills, maybe they don't have /any/ internal programming skills, so diving straight in and writing some pyspark is a pretty steep learning curve. Completely agree that parquet isn't fit for an enterprise lake these days, and I would 100% use pyspark for this instead, but it's good to have tools for other data roles, and if they get it to a point where it's automated & slick, there's a lot of good that can do!
Microsoft recognizes that not everyone has the same path into IT, especially those who become accidental data engineers or power users thinking about making the jump, and is perhaps facilitating the POC-to-automation cycle.
Want to be better? Auto-gen a mapping data flow from someone else's spark code so that the visual ETL champion can become more of a coder by osmosis.
At the end of the day, it's all doing the same task and same work. However, it can become a wasteland of different objects if not properly managed.
You are extra dramatic, sometimes it is so irritating and tough to concentrate. Please consider this point.
Other, more boring, channels are available :D