Running Hive Queries on Hadoop through Ambari Web UI | Hadoop Hive Tutorial | Lecture 11

  • Published: 16 Apr 2022
  • This lecture is all about running Hive queries on Hadoop through the Ambari Web UI. We cover the basics of Hive and how to run HiveQL queries with ease using Hive View 2.0 in the Ambari Web UI.
    We first create two tables, Movies and Ratings, and then use HiveQL to find the most popular movies in the dataset (spoiler alert: * wars). A rough HiveQL sketch of these steps follows the file links below.
    Get the required files:
    wget raw.githubusercontent.com/ash...
    wget raw.githubusercontent.com/ash...
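    For reference, the queries from the lecture look roughly like the sketch below. The table schemas, column names, and field delimiters are assumptions based on the MovieLens-style movies/ratings files, so adjust them to match your data:
      -- Hypothetical table definitions; pick delimiters that match the downloaded files.
      CREATE TABLE IF NOT EXISTS movies (
        movie_id INT,
        title    STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

      CREATE TABLE IF NOT EXISTS ratings (
        user_id  INT,
        movie_id INT,
        rating   INT,
        rated_at BIGINT
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

      -- "Most popular" here means the movies rated most often.
      SELECT m.title, COUNT(*) AS rating_count
      FROM ratings r
      JOIN movies m ON m.movie_id = r.movie_id
      GROUP BY m.title
      ORDER BY rating_count DESC
      LIMIT 10;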
    In the previous lecture we covered Hive, a relational data store for Hadoop: what Hive is, how it works, where it sits in the Hadoop stack, and the Hive architecture, which includes:
    Metastore
    Driver
    Compiler
    Optimizer
    Executor
    CLI, UI, and Thrift Server
    ----------------------------------------------------------------------------------------------------------------
    Installing mrjob on HDP 2.6.5 (be sure to "su root" first, as shown in the video.)
    yum-config-manager --save --setopt=HDP-SOLR-2.6-100.skip_if_unavailable=true
    yum install repo.ius.io/ius-release-el7.rpm dl.fedoraproject.org/pub/epel...
    yum install python-pip
    pip install pathlib
    pip install mrjob==0.7.4
    pip install PyYAML==5.4.1
    yum install nano
    ----------------------------------------------------------------------------------------------------------------
    Want to know more about Big Data? Then check out the full course dedicated to Big Data fundamentals: • Big Data Full Course
    ---------------------------------------------------------------------------------------------------------
    HDP Sandbox Installation links:
    Oracle VM Virtualbox: download.virtualbox.org/virtu...
    HDP Sandbox link: archive.cloudera.com/hwx-sand...
    HDP Sandbox installation guide: hortonworks.com/tutorial/sand...
    -------------------------------------------------------------------------------------------------------------
    Also check out similar informative videos in the field of cloud computing:
    What is Big Data: • What is Big Data? | Bi...
    How Cloud Computing changed the world: • How Cloud Computing ch...
    What is Cloud? • What is Cloud Computing?
    Top 10 facts about Cloud Computing that will blow your mind! • Top 10 facts about Clo...
    Audience
    This tutorial is made for professionals who want to learn the basics of Big Data analytics using the Hadoop ecosystem and become Hadoop developers. Software professionals, analytics professionals, and ETL developers are the key beneficiaries of this course.
    Prerequisites
    Before you start this course, I am assuming that you have some basic knowledge of Core Java, database concepts, and any flavor of the Linux operating system.
    ---------------------------------------------------------------------------------------------------------------------------
    Check out our full course topic wise playlist on some of the most popular technologies:
    SQL Full Course Playlist-
    • SQL Full Course
    PYTHON Full Course Playlist-
    • Python Full Course
    Data Warehouse Playlist-
    • Data Warehouse Full Co...
    Unix Shell Scripting Full Course Playlist-
    • Unix Shell Scripting F...
    --------------------------------------------------------------------------------------------------------------------------
    Don't forget to like and follow us on our social media accounts which are linked below.
    Facebook-
    / ampcode
    Instagram-
    / ampcode_tutorials
    Twitter-
    / ampcodetutorial
    Tumblr-
    ampcode.tumblr.com
    -------------------------------------------------------------------------------------------------------------------------
    Channel Description-
    AmpCode provides an e-learning platform with a mission of making education accessible to every student. AmpCode provides tutorials and full courses on some of the best technologies in the world today. By subscribing to this channel, you will never miss out on high-quality videos on trending topics in the areas of Big Data & Hadoop, DevOps, Machine Learning, Artificial Intelligence, Angular, Data Science, Apache Spark, Python, Selenium, Tableau, AWS, Digital Marketing and many more.
    #bigdata #datascience #technology #dataanalytics #datascientist #hadoop #hdfs #mrjob #hdp #hdfs #hive

Comments • 17

  • @shreyash184 · a year ago

    When you upload files from the Hive UI, where does it store them? In our VirtualBox HDFS, or somewhere else?

  • @nabinupreti3416 · a year ago

    Can I upload a file of 1 or 2 GB with a single cluster as shown in this method? Please provide a link to a video on configuring multiple clusters.

    • @ampcode · a year ago

      Hey buddy,
      The official Cloudera docs below may help you set up your own multi-node cluster. Hope this answers your doubts 😃
      docs.cloudera.com/HDPDocuments/HDP2/HDP-2.2.9-Win/bk_HDP_Install_Upgrade_Win/content/ch_deploying.html

  • @nivretech2072 · 2 years ago

    Can I create my own database in Hive Views?

    • @ampcode · 2 years ago

      Hello,
      You can create your own Hive database, which stores its data in the Hive directory in HDFS, and then create views on top of one or more tables in that database.
      Please let me know if any further information is required.
      Thanks!
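      A minimal sketch of what that looks like in HiveQL (the database and view names here are just placeholders):
        -- The database's data ends up under the Hive warehouse directory in HDFS
        -- (on HDP typically /apps/hive/warehouse/moviedb.db).
        CREATE DATABASE IF NOT EXISTS moviedb;
        USE moviedb;

        -- A view on top of an existing table (here the ratings table from the
        -- lecture, assumed to live in the default database).
        CREATE VIEW IF NOT EXISTS top_movies AS
        SELECT movie_id, COUNT(*) AS rating_count
        FROM default.ratings
        GROUP BY movie_id;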

    • @nivretech2072 · 2 years ago

      @@ampcode Thank you for replying. I am using the latest version of HDP and creating databases and tables in Data Analytics Studio, but the databases and tables I created are not showing up. So I found your video using this HDP version and might give it a try; I am downloading HDP 2.6.5 now. If I may ask, and if you have time, can you help me with my case study? I am from the Philippines, thanks!

  • @Saiprathap140 · a year ago

    My command prompt shows "wget is not recognised as an internal or external command".

    • @Saiprathap140 · a year ago

      How can I download the file through the command prompt?

    • @ampcode · a year ago

      Hello there!
      Have you installed the HDP Sandbox, which is a prerequisite for this lecture? If yes, then you'll be able to run wget through a PuTTY terminal. Please let me know if you face any issues 😊

    • @Saiprathap140 · a year ago

      @@ampcode Yeah, I installed the HDP Sandbox.
      You said to download the file from the command prompt, right?

    • @ampcode · a year ago

      @@Saiprathap140 Hello Sai,
      You need to install the PuTTY software to be able to connect to the Linux VM where we extracted HDP. Then you can run all the commands mentioned in the lecture. Please let me know if you still face any issue 😊

    • @Saiprathap140 · a year ago

      @@ampcode But you did that in the command prompt, bro

  • @nabinupreti3416 · a year ago

    I have a file with nested JSON. While uploading from the Ambari UI, it says it cannot accept nested JSON. What may be the solution for nested arrays? My JSON looks like this:
    {
      "reporting_entity_name" : "XXX",
      "reporting_entity_type" : "XXX",
      "reporting_structure" : [ {
        "reporting_plans" : [ {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        } ],
        "in_network_files" : [ {
          "description" : "XXX",
          "location" : "www.rates_1.json"
        } ]
      }, {
        "reporting_plans" : [ {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "XXX"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "XXX"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "XXX"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        }, {
          "plan_name" : "XXX",
          "plan_id_type" : "XXX",
          "plan_id" : "XXX",
          "plan_market_type" : "group"
        } ],
        "in_network_files" : [ {
          "description" : "in-network file",
          "location" : "network-rates_2.json"
        } ]
      } ]
    }

    • @ampcode · a year ago

      Hello there! You can convert the nested JSON file into a Spark DataFrame and then write it to a Hive table; that will work like a charm. I also found a similar question about handling nested JSON files directly in Hive, linked below. Let me know if you have any issues. :)
      stackoverflow.com/questions/45233084/create-hive-table-for-nested-json-data
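      For the Hive route, here is a minimal sketch assuming the JSON SerDe that ships with HDP's hive-hcatalog-core jar (the jar path, table name, and HDFS location below are hypothetical, and this SerDe expects each JSON record on a single line):
        -- Hypothetical jar path; on HDP the SerDe is packaged in hive-hcatalog-core.
        ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;

        -- Nested JSON objects map to STRUCT, nested arrays to ARRAY.
        CREATE EXTERNAL TABLE reporting (
          reporting_entity_name STRING,
          reporting_entity_type STRING,
          reporting_structure ARRAY<STRUCT<
            reporting_plans: ARRAY<STRUCT<
              plan_name: STRING,
              plan_id_type: STRING,
              plan_id: STRING,
              plan_market_type: STRING>>,
            in_network_files: ARRAY<STRUCT<
              description: STRING,
              location: STRING>>>>
        )
        ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        LOCATION '/user/maria_dev/reporting';  -- hypothetical HDFS directory

        -- Flatten the nested arrays with LATERAL VIEW explode().
        SELECT reporting_entity_name, plan.plan_name, plan.plan_market_type
        FROM reporting
        LATERAL VIEW explode(reporting_structure) rs AS elem
        LATERAL VIEW explode(elem.reporting_plans) rp AS plan;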

  • @bulavo · 3 months ago

    You could've chosen a smaller file; it takes 6 hours to upload ratings.data to Hive, lol

    • @bulavo · 2 months ago

      It turns out I had some problems with Hadoop and had to reinstall everything.