Hi Soumil, what should I do if I want to keep the new schema (i.e. update my old table to have the new schema) rather than storing just the data with the new schema in a separate table?
Hi @renukasrivastava1167, in the Glue crawler properties you can select the option to merge the schema so that the updated schema persists for the data.
I think this option appears in the 4th step of the crawler configuration, in the advanced properties section. It asks what the crawler should do when it detects a schema change: Ignore, Merge, and one more option I forgot. Once you select it, the crawler should take care of any schema change.
This is the option: "When the crawler detects schema changes in the data store, how should AWS Glue handle table updates in the data catalog?"
Here, select the first option and also tick the option just beneath it: "Update all new and existing partitions with metadata from the table".
This will take care of the schema evolution, and your table will always have the latest schema. If a new column is added, it will be NULL for the old data files.
I hope it helps.
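For anyone who prefers to set this outside the console, here is a minimal sketch of what I believe are the equivalent crawler settings through boto3. The function names and the crawler name are made up for illustration, and I am assuming the first console option maps to `UPDATE_IN_DATABASE` and the checkbox maps to `InheritFromTable`, so please check against your own crawler before relying on it.

```python
import json


def build_crawler_schema_settings():
    """Build update_crawler arguments that let schema changes flow into the table.

    Assumption: these values mirror the two console options quoted above.
    """
    # Update the existing table definition instead of only logging the change.
    schema_change_policy = {"UpdateBehavior": "UPDATE_IN_DATABASE"}

    # Make new and existing partitions inherit their metadata from the table.
    configuration = json.dumps(
        {
            "Version": 1.0,
            "CrawlerOutput": {
                "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
            },
        }
    )
    return {
        "SchemaChangePolicy": schema_change_policy,
        "Configuration": configuration,
    }


def apply_to_crawler(crawler_name, glue_client=None):
    """Apply the settings to an existing crawler (crawler_name is hypothetical)."""
    if glue_client is None:
        import boto3  # imported lazily so the builder works without AWS access

        glue_client = boto3.client("glue")
    return glue_client.update_crawler(
        Name=crawler_name, **build_crawler_schema_settings()
    )
```

You would call something like `apply_to_crawler("my-crawler")` once and then rerun the crawler; the name here is only a placeholder.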
Please provide a video on how you calculate the MD5 hash of the table definition and store it in DynamoDB. If it is OK, do you then trigger the job?
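Not the approach from the video, just one way the idea in this question could be sketched: hash the column names and types of the table definition, keep the last hash in a DynamoDB table, and compare on each run. The key names `table_name` and `schema_md5` and both function names are my own assumptions; `store` is expected to behave like a boto3 DynamoDB `Table` resource (`get_item` / `put_item`).

```python
import hashlib
import json


def schema_md5(columns):
    """Return an MD5 hex digest of the column names and types, in order.

    `columns` is a list of dicts with "Name" and "Type" keys, like the
    Columns list inside a Glue table's StorageDescriptor.
    """
    canonical = json.dumps(
        [{"Name": c["Name"], "Type": c["Type"]} for c in columns],
        sort_keys=True,
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def schema_changed(store, table_name, columns):
    """Compare the current hash with the stored one, then save the current one.

    Returns True only when a previous hash exists and differs from the
    current hash. The first run for a table returns False.
    """
    current = schema_md5(columns)
    response = store.get_item(Key={"table_name": table_name})
    previous = response.get("Item", {}).get("schema_md5")
    store.put_item(Item={"table_name": table_name, "schema_md5": current})
    return previous is not None and previous != current
```

If `schema_changed(...)` returns False, the job could then be started, for example with the Glue client's `start_job_run`. Note that this sketch treats a reordering of columns as a change, since the column order is kept when hashing.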
In the real world, how do we run an AWS Glue job from a tester's perspective? How do we validate from source to target?
Thank you for making a video on our blog, @soumil. - Navnit Shukla
You guys did a great job.
Soumil, you are jumping some steps I want to see 😄😄 from