Please provide notes for all components, like you have given for Reformat.
Sir, how are the driving dataset records sorted, then, if max-core is meant just for the non-driving dataset records? And in which phase does the join happen? That is not mentioned in the video.
With an unsorted join, sorted output is not guaranteed, but grouping is inevitable; grouping and sorting are two different things. To be upfront, even I am not sure of the exact internal working of the process, but as per their definition and guide:
If the component has to spill to disk when max-core fills up, both driving and non-driving data land on disk, and the additional I/O makes the process slower.
Do remember that the JOIN component also has AB_WORK_DIR to hold driving data, as it does for other non-grouped transform components.
So max-core is seemingly an isolated allocation of memory for the non-driving inputs in the unsorted case, i.e. the small inputs, so that it is less likely to be exhausted. The driving input, being the largest, uses the regular component work area (AB_WORK_DIR), like any other component does. When JOIN processes unsorted data, it scans the complete input for key group 1, goes to max-core to look up the matching records, does the join in AB_WORK_DIR, and outputs the result; then it moves on to the second key group. So effectively, what JOIN does specially for unsorted input is to keep scanning one key group at a time. In other words, it creates an intermediate temporary key-group file for every key group in the first scan itself, and later goes to max-core to look up the data for each key group and perform the join, one by one, before writing the output. This is my understanding.
Ready to discuss more if you think this is not right.
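To make the idea concrete, here is a rough Python sketch of a generic in-memory join on unsorted inputs. It is not Ab Initio's actual implementation, only the textbook build-and-probe pattern, and it assumes the non-driving input fits in memory (the role max-core plays in the explanation above). The function name `unsorted_inner_join`, the `cust_id` key, and the sample records are all made up for illustration.

```python
from collections import defaultdict


def unsorted_inner_join(driving, non_driving, key):
    """Inner-join two iterables of dicts on `key` without sorting either one."""
    # Build phase: hold the (small) non-driving input in memory, grouped by key.
    # This stands in for the memory budget discussed above; no spill is modeled.
    lookup = defaultdict(list)
    for rec in non_driving:
        lookup[rec[key]].append(rec)

    # Probe phase: stream the (large) driving input once, in arrival order.
    for rec in driving:
        for match in lookup.get(rec[key], []):
            # Merge the two records; driving fields win on a name clash.
            yield {**match, **rec}


# Hypothetical sample data: neither input is sorted on cust_id.
driving = [
    {"cust_id": 3, "amount": 50},
    {"cust_id": 1, "amount": 20},
    {"cust_id": 3, "amount": 75},
]
non_driving = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 3, "name": "Ravi"},
]

joined = list(unsorted_inner_join(driving, non_driving, "cust_id"))
for row in joined:
    print(row)
```

In this sketch the output follows the arrival order of the driving records (keys 3, 1, 3), so it is not sorted. The non-driving records are grouped by key in memory, which is the grouping-versus-sorting distinction mentioned above.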
@datapundit Great explanation.
Sir, could you please upload images and explain?
Just a suggestion: it would be better to explain with some notes or a diagram. BTW, your videos are very helpful.
Yes, this is one of my initial videos; these days they all come with examples. Thanks for visiting! Cheers.
It seems like you are just reading the Ab Initio help file. If you could explain with diagrams or images, it would be a little easier to understand.
Hmm, let me create another one. I thought I had used a whiteboard, but I was not sure until I watched it again just now.
@datapundit Thanks for considering!