Source type, project discussion
Handling duplicates
Delta Lake features
Spark vs Databricks
Power BI connection to Synapse
Spark architecture
DAG
Client mode vs cluster mode
DataFrame vs Dataset
Normalisation
2nd highest salary in a department (a sketch follows below)
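For the last topic, here is a minimal PySpark sketch of one common approach. The table and column names (employees, dept, salary) are made up for illustration:

```python
# Sketch: 2nd highest salary per department using a window function.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("second-highest-salary").getOrCreate()

# Hypothetical sample data.
employees = spark.createDataFrame(
    [("Alice", "HR", 5000), ("Bob", "HR", 7000),
     ("Cara", "IT", 9000), ("Dan", "IT", 8000), ("Eve", "IT", 9000)],
    ["name", "dept", "salary"],
)

# dense_rank handles ties: equal salaries share a rank, so rank 2 is the
# second-highest distinct salary within each department.
w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
second_highest = (
    employees.withColumn("rnk", F.dense_rank().over(w))
             .filter(F.col("rnk") == 2)
             .drop("rnk")
)
second_highest.show()
```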
When someone says they are optimizing their code in Databricks... they're all faking 😂😂.
Spark itself optimizes your code using the Catalyst optimizer / Spark SQL engine, and since Spark 3.0, when Adaptive Query Execution (AQE) was introduced, it also optimizes joins at runtime. We can also alter the broadcast threshold, which is usually handled by the admin team during Databricks cluster creation.
The only things not affected by these two are things that live in user code, like UDFs and low-level RDD operations, which hardly anyone does in Databricks these days. The last one is manual caching.
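For reference, here is a small sketch of the two runtime knobs mentioned above (the values shown are the Spark defaults, not recommendations):

```python
# Sketch: inspecting/setting AQE and the broadcast-join threshold.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aqe-demo").getOrCreate()

# AQE (on by default since Spark 3.2) re-optimizes the plan at runtime,
# e.g. switching a sort-merge join to a broadcast join when the shuffle
# side turns out to be small.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Broadcast threshold: tables below this size are broadcast to every
# executor instead of being shuffled. 10 MB is the Spark default.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))

print(spark.conf.get("spark.sql.adaptive.enabled"))
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))
```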
@SrihariSrinivasDhanakshirur exactly, there are a lot of other optimisations
@SrihariSrinivasDhanakshirur but these are the code-level optimisations, like you said: bucketing, partitioning
Yes, you are 100% correct. But there are still some optimizations we can perform (see the sketch below), like:
- Using predicate and projection pushdown to cut down the data read.
- Caching or persisting data if it's reused frequently.
- Choosing built-in functions over UDFs, since they're more optimized.
- Picking efficient file formats like Parquet for better compression and speed.
Please correct me if needed.
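A hedged sketch of the optimizations listed above; the path and column names are invented for illustration:

```python
# Sketch: pushdown, caching, built-ins over UDFs, and Parquet together.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("manual-optimizations").getOrCreate()

# Parquet is columnar, so projection pushdown (select) reads only the
# needed columns and predicate pushdown (filter) can skip row groups.
orders = (
    spark.read.parquet("/tmp/orders")          # hypothetical path
         .select("order_id", "country", "amount")
         .filter(F.col("country") == "IN")
)

# Cache only because the same filtered set is reused downstream.
orders.cache()

# Built-in functions stay inside Catalyst/Tungsten; a Python UDF doing
# F.upper's job would force row-by-row serialization out to Python.
by_country = (
    orders.groupBy(F.upper("country").alias("country"))
          .agg(F.sum("amount").alias("total"))
)

by_country.show()
orders.unpersist()
```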
Correct, brother, and the moment you select Photon acceleration during cluster creation, Databricks takes care of it itself.
Seems like he came in after rote-memorizing everything, bro :) ... anyway, he answered all the questions very well.