AWS Tutorials - Introduction to AWS Glue DataBrew

  • Published: Nov 23, 2020
  • Learn about AWS Services - aws-dojo.com/
    AWS Glue DataBrew is a new visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning. You can choose from over 250 pre-built transformations to automate data preparation tasks, all without the need to write any code.
  • Science

Comments • 16

  • @4niceguy 2 years ago

    Ye... I really appreciate all your wonderful classes.

  • @enidaguja6655 2 years ago +1

    Thank you for all the tutorials. They are great and helpful! I am wondering, could one project have many recipes? Best regards

    • @AWSTutorialsOnline 2 years ago +1

      It is one recipe per project. However, one recipe can be used in many projects.

  • @sancho709 3 years ago +1

    Very, very good

  • @sridharvuligonda319 3 years ago +2

    Can we use DataBrew as an ETL tool instead of Glue Studio?

    • @AWSTutorialsOnline 3 years ago

      You can. You can convert the recipe into a job. But keep in mind that you don't have any control over the code generated by DataBrew, and you cannot change it.
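
      As a rough illustration of the "recipe into job" step, here is a minimal sketch that uses the boto3 DataBrew client to create a recipe job from an existing project. The job name, project name, role ARN, bucket, and key prefix are all placeholders, not values from the video, and the helper function names are made up for this example.

```python
def build_recipe_job_request(job_name, project_name, role_arn, bucket, key_prefix):
    """Build the keyword arguments for a DataBrew recipe job (no AWS call here).

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        "Name": job_name,
        "ProjectName": project_name,  # the job runs the recipe attached to this project
        "RoleArn": role_arn,          # IAM role that DataBrew assumes to run the job
        "Outputs": [
            {
                "Location": {"Bucket": bucket, "Key": key_prefix},
                "Format": "CSV",
            }
        ],
    }


def create_recipe_job(request):
    """Send the request to DataBrew. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("databrew")
    response = client.create_recipe_job(**request)
    return response["Name"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_recipe_job_request(
        job_name="my-recipe-job",
        project_name="my-project",
        role_arn="arn:aws:iam::123456789012:role/MyDataBrewRole",
        bucket="my-output-bucket",
        key_prefix="databrew-output/",
    )
    print(request)
```

      Keeping the request builder separate from the AWS call makes it possible to inspect the parameters before anything is created in the account.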

  • @charliehunter6387 2 years ago +1

    Awesome tutorial! When I run my DataBrew job, I get multiple CSVs called 'XXXX_part00001', 'XXXX_part00002', etc. Is there a way I can make it output one CSV with all the parts?

    • @AWSTutorialsOnline 2 years ago

      Out of the box, the files are partitioned based on size, and you cannot control it. The idea is to avoid files that are too big or too small, as both degrade performance when you access the data using Athena. You can also configure column-based partitioning. Why do you want to avoid partitioning?
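
      If a single CSV is still needed downstream, one option is to combine the part files after the job finishes. The following is a minimal sketch in plain Python, assuming the part files have already been downloaded locally and that every part starts with the same header row. The function name `merge_csv_parts` is made up for this example and is not part of DataBrew.

```python
import csv


def merge_csv_parts(part_paths, merged_path):
    """Concatenate CSV part files into one file, writing the header only once.

    Assumes each non-empty part file begins with the same header row.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Sort so that ..._part00001 comes before ..._part00002, and so on.
        for part in sorted(part_paths):
            with open(part, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty part files
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header row")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

      For example, `merge_csv_parts(["XXXX_part00001", "XXXX_part00002"], "merged.csv")` would write both parts into `merged.csv` in order.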

  • @bobbrandt3043 2 years ago

    Is it possible to merge the columns WITHOUT deleting the original columns?

  • @krishnaprasadas8566 3 years ago +1

    What is the underlying processing engine for DataBrew? I mean, where are the jobs and data quality jobs running?

    • @AWSTutorialsOnline 3 years ago +1

      It uses managed servers for this purpose, but I am not sure about the processing engine configuration. It could be Apache Spark. Sometimes I get errors that mention a DAG, which gives me the idea that it might be using Apache Airflow, but I am not 100% sure.

    • @krishnaprasadas8566 3 years ago

      @AWSTutorialsOnline Okay, but I don't know whether they really support big data cost-effectively.

  • @katiushkaflores 3 years ago

    They even used the same vocabulary as Trifacta!