This is the best basic tutorial on this tool I've been able to find. You cover everything one would need to get started, in a digestible way. Thanks.
This is gradually becoming my favourite channel.
I love your videos, man.
Try publishing them more on subreddits like r/datascience and r/dataengineering
Thanks for the tip. I will post on those subreddits in the future.
Where can we find the videos on setting up the pytest framework to validate data? Kindly share the video link.
Please check the description. The link to the pytest video is available there.
Is there a way to create the rules/tests automatically? Is there a better way to visualise a summary of tests?
Great Expectations (GE) already ships with rules; you only need to apply them to your data. If you want rules applied automatically, create a custom suite and it will apply a few of them for you. These will be generic, though, so for more control you would want to apply rules based on your needs. A custom suite also offers a better visual experience via HTML.
Thanks for the video. If I have a composite primary key, will “expect_column_values_to_be_unique” work?
It's failing for me. Could you help me with this?
Thanks for stopping by. You can try “expect_compound_columns_to_be_unique” in that scenario.
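To illustrate why the single-column check fails here: with a composite key, each column may repeat on its own, and only the combination has to be unique. The sketch below shows that idea in plain Python, without Great Expectations. The helper name `find_duplicate_keys`, the column names, and the sample rows are all hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the composite-key values that appear more than once in `rows`."""
    # Build one tuple per row from the key columns, then count occurrences.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample data: both columns repeat values on their own,
# but only the (order_id, line_no) pair (1, 1) occurs twice.
rows = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 2, "line_no": 1},
    {"order_id": 1, "line_no": 1},
]

duplicates = find_duplicate_keys(rows, ["order_id", "line_no"])
print(duplicates)  # [(1, 1)]
```

A per-column uniqueness check would flag `order_id` and `line_no` separately even when every pair is distinct, which is why the compound expectation is the better fit for a composite key.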
Hi, can we test CSV file data against a database table with some expectation?
We are testing CSV file data in this session with Great Expectations. In the upcoming video we will test a database table using Great Expectations in an Airflow DAG.
Did they change the structure? Why didn't you talk about validators and checkpoints? I am stuck here:
import great_expectations as gx
context = gx.get_context()
validator = context.sources.pandas_default.read_csv(
"data.csv"
)
validator.validate(expectation_suite="config.json")
checkpoint = context.add_or_update_checkpoint(
name="my_quickstart_checkpoint",
validator=validator,
)
checkpoint_result = checkpoint.run()
context.view_validation_result(checkpoint_result)
ERROR: great_expectations.exceptions.exceptions.DataContextError: expectation_suite default not found
They might have; please check their docs. As for the error, you are not providing the correct path to config.json, so it is not able to locate the expectation_suite that you are passing in.
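Besides the path, another possible cause of "expectation_suite default not found" is that the suite attached to the validator was never saved to the context before the checkpoint ran. The sketch below reuses the calls from the question and adds a save step. It assumes the fluent API from roughly the 0.17/0.18 releases; the function name `run_quickstart` and the column name "id" are hypothetical, and other versions may differ.

```python
def run_quickstart(csv_path="data.csv"):
    """Validate a CSV with a suite built and saved on the validator itself.

    Assumes a Great Expectations 0.17/0.18-style fluent API.
    """
    # Imported inside the function so the sketch can be loaded
    # even where great_expectations is not installed.
    import great_expectations as gx

    context = gx.get_context()
    validator = context.sources.pandas_default.read_csv(csv_path)

    # Add at least one expectation; "id" is a hypothetical column name.
    validator.expect_column_values_to_not_be_null("id")

    # Persist the validator's suite so the checkpoint can look it up by name.
    validator.save_expectation_suite(discard_failed_expectations=False)

    checkpoint = context.add_or_update_checkpoint(
        name="my_quickstart_checkpoint",
        validator=validator,
    )
    checkpoint_result = checkpoint.run()
    context.view_validation_result(checkpoint_result)
    return checkpoint_result
```

If the expectations really do live in an existing JSON file, check the docs for your installed version on how to load a stored suite into the context rather than passing a file name to `validate`.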
What an awesome video. Thank you.
Can you have custom expectations?
Yes, you can build and manage custom expectations for your use case. Here are the docs on custom expectations: docs.greatexpectations.io/docs/oss/guides/expectations/custom_expectations_lp/
Could you please make a video on how to create a custom expectation using a query or anything else, and then how to apply it for data quality (DQ) checks?
Hey Sawgat, that would be a custom expectation. I will cover custom expectation suites with this library in the future.
Excellent video.
Awesome Video. Subscribed for more!
Good explanation.
But you need to update the video. "get_expectations_config()" is no longer available in the Great Expectations framework. I used your code to run my checks, and it failed at the "get_expectations_config()" step.
What version are you using? In any case, you can try "get_expectation_suite", which is the updated function:
expectation_suite = data_assistant_result.get_expectation_suite(
expectation_suite_name=expectation_suite_name
)
Please create end-to-end Python projects for data analysts.
I will try to create an end-to-end project.
This is another masterpiece video.
I am struggling with the scenarios below. If possible, could you please explain them in your upcoming videos?
1. How to read the latest file. Suppose my source folder contains many files and I want to read only the latest one.
2. I want to create a Python script to read, process, and load the data into a DB table when a file arrives in the source folder.
Thanks. I will create a video on your suggested topics. Stay tuned.
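Until that video exists, both scenarios can be sketched with the standard library alone. This is a minimal sketch under assumptions: "latest" is taken to mean the most recent modification time, new arrivals are detected by polling rather than by file-system events, and the helper names `latest_file` and `new_files` are hypothetical.

```python
from pathlib import Path


def latest_file(folder, pattern="*.csv"):
    """Return the most recently modified file matching `pattern`, or None."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    # max() keyed on st_mtime picks the newest file by modification time.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def new_files(folder, seen, pattern="*.csv"):
    """Return files whose names are not in `seen` (oldest first), then mark them seen."""
    fresh = sorted(
        (p for p in Path(folder).glob(pattern) if p.is_file() and p.name not in seen),
        key=lambda p: p.stat().st_mtime,
    )
    seen.update(p.name for p in fresh)
    return fresh
```

For the second scenario, a script could call `new_files(folder, seen)` in a loop with a `time.sleep` between passes and run its own read, process, and load steps on each returned path. A file that is still being written may be picked up too early, so a real pipeline would need some completion signal.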
Why does every tutorial use a local client? It's a very unusual case, because in the real world you would have to load all the data into memory to run this quality analysis.
You can take the code and concepts and replicate them for your use case on a server or in a cloud environment.