Running Hive Queries on Hadoop through Ambari Web UI | Hadoop Hive Tutorial | Lecture 11
- Published: 16 Apr 2022
- This lecture covers running Hive queries on Hadoop through the Ambari Web UI: the basics of Hive, and how to run HiveQL queries easily using Hive View 2.0 in Ambari.
We first created two tables, Movies and Ratings, then used HiveQL to analyze them and find the most popular movies in the dataset (spoiler alert: * wars).
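To make the "most popular movies" analysis concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a local stand-in for Hive. The table names `movies` and `ratings`, their columns, and the sample rows are hypothetical; the lecture's actual files (downloaded below) may be laid out differently. "Most popular" is assumed here to mean "most rated". A HiveQL query would have the same GROUP BY / JOIN / ORDER BY shape, but the lecture's exact query is not reproduced here.

```python
import sqlite3


def most_popular(conn, limit=3):
    """Return (title, rating_count) pairs, most-rated movie first.

    SQLite is only a local stand-in for Hive here; the SQL shape
    (JOIN, GROUP BY, COUNT, ORDER BY) is what carries over.
    """
    query = """
        SELECT m.title, COUNT(*) AS rating_count
        FROM ratings r
        JOIN movies m ON m.movie_id = r.movie_id
        GROUP BY m.movie_id, m.title
        ORDER BY rating_count DESC, m.title
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies  (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
""")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [(1, "Movie A"), (2, "Movie B"), (3, "Movie C")])
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                 [(10, 1, 5), (11, 1, 4), (12, 1, 5),
                  (10, 2, 3), (11, 2, 4), (12, 3, 5)])

print(most_popular(conn))  # [('Movie A', 3), ('Movie B', 2), ('Movie C', 1)]
```

Counting rows per movie ID before joining to the titles means the join only attaches names to the aggregated result.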
Get the required files:
wget raw.githubusercontent.com/ash...
wget raw.githubusercontent.com/ash...
In the previous lecture we looked at Hive, a relational data store for Hadoop: what Hive is, how it works, and where it sits in the Hadoop stack. We also discussed the Hive architecture, which includes:
Metastore
Driver
Compiler
Optimizer
Executor
CLI, UI, and Thrift Server
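The division of labour among these components can be illustrated with a deliberately tiny toy model. None of this is Hive's actual code or API. The `METASTORE` and `STORAGE` dictionaries, the `Query` class, the plan tuples, and the column-pruning rule are all invented for illustration. Only the flow is meant to match the description:

- A client (CLI, UI, or Thrift server) hands a query to the driver.
- The compiler checks the query against metadata from the metastore and produces a plan.
- The optimizer rewrites the plan.
- The executor runs the plan against the stored data.

```python
from collections import Counter
from dataclasses import dataclass

# Toy metastore: table name -> column names (schema only, no data).
METASTORE = {"ratings": ["user_id", "movie_id", "rating", "ts"]}

# Toy storage standing in for files on HDFS: table name -> rows.
STORAGE = {"ratings": [(1, 50, 5, 0), (2, 50, 4, 0), (3, 172, 5, 0)]}


@dataclass
class Query:
    """Hypothetical stand-in for: SELECT <key>, COUNT(*) FROM <table> GROUP BY <key>."""
    table: str
    group_by: str


def compile_query(q):
    # Semantic analysis: resolve the table and column via the metastore.
    columns = METASTORE.get(q.table)
    if columns is None:
        raise ValueError(f"Table not found: {q.table}")
    if q.group_by not in columns:
        raise ValueError(f"Invalid column: {q.group_by}")
    # Naive plan: read whole rows, keep one column, then count.
    return [("scan", q.table), ("project", columns.index(q.group_by)), ("count",)]


def optimize(plan):
    # Toy column pruning: fuse scan + project into one narrower read.
    out = []
    for step in plan:
        if out and out[-1][0] == "scan" and step[0] == "project":
            out[-1] = ("scan_column", out[-1][1], step[1])
        else:
            out.append(step)
    return out


def execute(plan):
    # Run each plan step in order against the toy storage.
    data = None
    for op, *args in plan:
        if op == "scan":
            data = list(STORAGE[args[0]])
        elif op == "project":
            data = [row[args[0]] for row in data]
        elif op == "scan_column":
            table, idx = args
            data = [row[idx] for row in STORAGE[table]]
        elif op == "count":
            data = dict(Counter(data))
    return data


def driver(q):
    # The driver coordinates the query's lifecycle: compile, optimize, execute.
    return execute(optimize(compile_query(q)))


print(driver(Query("ratings", "movie_id")))  # {50: 2, 172: 1}
```

The optimized and unoptimized plans return the same result; the rewrite only changes how much data is read. That is the general contract of a query optimizer.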
----------------------------------------------------------------------------------------------------------------
Installing mrjob on HDP 2.6.5 (be sure to run "su root" first, as shown in the video):
yum-config-manager --save --setopt=HDP-SOLR-2.6-100.skip_if_unavailable=true
yum install repo.ius.io/ius-release-el7.rpm dl.fedoraproject.org/pub/epel...
yum install python-pip
pip install pathlib
pip install mrjob==0.7.4
pip install PyYAML==5.4.1
yum install nano
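After running the commands above, a small check script can confirm that the pinned versions are installed. This is a sketch that assumes the packages were installed for a Python 3.8+ interpreter, since `importlib.metadata` does not exist on Python 2. On Python 2, `pip freeze` gives the same information. The `PINS` dictionary mirrors the `pip install` lines above. `pathlib` is left out because on Python 3 it ships with the standard library. The function name `check_pins` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors the pinned `pip install` lines above.
PINS = {"mrjob": "0.7.4", "PyYAML": "5.4.1"}


def check_pins(pins, get_version=version):
    """Return {package: problem} for every pin that is missing or mismatched.

    get_version is injectable so the check can be exercised without
    the real packages installed.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("all pins OK" if not issues else f"{len(issues)} problem(s)")
```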
----------------------------------------------------------------------------------------------------------------
Want to know more about Big Data? Then check out the full course dedicated to Big Data fundamentals: • Big Data Full Course
---------------------------------------------------------------------------------------------------------
HDP Sandbox Installation links:
Oracle VM Virtualbox: download.virtualbox.org/virtu...
HDP Sandbox link: archive.cloudera.com/hwx-sand...
HDP Sandbox installation guide: hortonworks.com/tutorial/sand...
-------------------------------------------------------------------------------------------------------------
Also check out similar informative videos in the field of cloud computing:
What is Big Data: • What is Big Data? | Bi...
How Cloud Computing changed the world: • How Cloud Computing ch...
What is Cloud? • What is Cloud Computing?
Top 10 facts about Cloud Computing that will blow your mind! • Top 10 facts about Clo...
Audience
This tutorial is made for professionals who want to learn the basics of Big Data analytics using the Hadoop ecosystem and become Hadoop developers. Software professionals, analytics professionals, and ETL developers are the key beneficiaries of this course.
Prerequisites
Before proceeding with this course, you should have some basic knowledge of Core Java, database concepts, and any flavor of the Linux operating system.
---------------------------------------------------------------------------------------------------------------------------
Check out our full-course, topic-wise playlists on some of the most popular technologies:
SQL Full Course Playlist-
• SQL Full Course
PYTHON Full Course Playlist-
• Python Full Course
Data Warehouse Playlist-
• Data Warehouse Full Co...
Unix Shell Scripting Full Course Playlist-
• Unix Shell Scripting F...
--------------------------------------------------------------------------------------------------------------------------
Don't forget to like and follow us on our social media accounts which are linked below.
Facebook-
/ ampcode
Instagram-
/ ampcode_tutorials
Twitter-
/ ampcodetutorial
Tumblr-
ampcode.tumblr.com
-------------------------------------------------------------------------------------------------------------------------
Channel Description-
AmpCode is an e-learning platform with a mission of making education accessible to every student. AmpCode provides tutorials and full courses on some of the best technologies in the world today. By subscribing to this channel, you will never miss out on high-quality videos on trending topics in the areas of Big Data & Hadoop, DevOps, Machine Learning, Artificial Intelligence, Angular, Data Science, Apache Spark, Python, Selenium, Tableau, AWS, Digital Marketing, and many more.
#bigdata #datascience #technology #dataanalytics #datascientist #hadoop #hdfs #mrjob #hdp #hive
When you upload files from the Hive UI, where are they stored: in HDFS on our VirtualBox VM, or somewhere else?
Can I upload a 1 or 2 GB file to a single-node cluster using the method shown here? Please share a link to a video on configuring a multi-node cluster.
Hey buddy,
The official Cloudera docs below may help you set up your own multi-node cluster. Hope this answers your question 😃
docs.cloudera.com/HDPDocuments/HDP2/HDP-2.2.9-Win/bk_HDP_Install_Upgrade_Win/content/ch_deploying.html
Can I create my own database in Hive View?
Hello,
You can create your own Hive database, which stores its data in the Hive directory on HDFS, and then create views on top of one or more tables in that database.
Please let me know if any further information is required.
Thanks!
@ampcode Thank you for replying. I am using the latest version of HDP and creating databases and tables in Data Analytics Studio, but the databases and tables I create are not showing up. Then I found your video, which uses this HDP version, so I might give it a try; I am downloading HDP 2.6.5 now. If you have time, could you help me with my case study? I am from the Philippines. Thanks!
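The reply's point that a view sits on top of existing tables can be seen in a small sketch. Python's `sqlite3` module stands in for Hive here, so the syntax is close but not identical. Hive has `CREATE DATABASE` and keeps table data under its warehouse directory in HDFS, whereas SQLite keeps everything in one file or in memory. The table and view names are hypothetical. A regular (non-materialized) Hive view stores only its defining query, and so does the SQLite view below. As a result, rows added to the base table later are visible through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    -- The view stores only this query; it holds no rows of its own.
    CREATE VIEW high_ratings AS
        SELECT movie_id, rating FROM ratings WHERE rating >= 4;
""")

conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", [(1, 50, 5), (2, 50, 2)])
print(conn.execute("SELECT * FROM high_ratings").fetchall())  # [(50, 5)]

# Rows added to the base table later are visible through the view immediately.
conn.execute("INSERT INTO ratings VALUES (3, 172, 4)")
print(conn.execute("SELECT * FROM high_ratings ORDER BY movie_id").fetchall())
# [(50, 5), (172, 4)]
```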
My command prompt says wget is not recognized as an internal or external command.
How can I download a file through the command prompt?
Hello there!
Have you installed the HDP Sandbox? It is a prerequisite for this lecture. If so, you'll be able to run wget from a PuTTY terminal. Please let me know if you face any issues 😊
@ampcode Yeah, I installed the HDP Sandbox.
You said to download the file from the command prompt, right?
@Saiprathap140 Hello Sai,
You need to install PuTTY to be able to connect to the Linux VM where we extracted HDP. Then you can run all the commands mentioned in the lecture. Please let me know if you still face any issues 😊
@ampcode But you did that in the command prompt, bro.
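As the replies point out, `wget` is a Linux command, so it is meant to be run inside the sandbox VM (for example over a PuTTY SSH session) rather than in the Windows command prompt. If you only need a copy of a file on the Windows host itself, Python's standard library can stand in for `wget`. This is an alternative, not what the lecture does. The function name `fetch` is hypothetical, and the real file URLs are not reproduced here because they are truncated above.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest_dir="."):
    """Download url into dest_dir, naming the file after the URL's last path segment."""
    name = url.rstrip("/").rsplit("/", 1)[-1] or "download"
    dest = Path(dest_dir) / name
    # Stream the response body to disk without loading it all into memory.
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


# Usage (replace the placeholder with the full URL from the lecture):
# fetch("https://raw.githubusercontent.com/<user>/<repo>/<branch>/<file>")
```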
I have a file with nested JSON. When I upload it from the Ambari UI, it says it cannot accept nested JSON. What could be the solution for nested arrays? My JSON looks like this:
{
"reporting_entity_name" : "XXX",
"reporting_entity_type" : "XXX",
"reporting_structure" : [ {
"reporting_plans" : [ {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
} ],
"in_network_files" : [ {
"description" : "XXX",
"location" : "www.rates_1.json"
} ]
},
{
"reporting_plans" : [ {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "XXX"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "XXX"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "XXX"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
}, {
"plan_name" : "XXX",
"plan_id_type" : "XXX",
"plan_id" : "XXX",
"plan_market_type" : "group"
} ],
"in_network_files" : [ {
"description" : "in-network file",
"location" : "network-rates_2.json"
} ]
} ]
}
Hello there! Could you convert the nested JSON file into a Spark DataFrame and then write it to a Hive table? That should work like a charm! I also found a similar question about handling nested JSON files in Hive, linked below. Let me know if you have any issues. :)
stackoverflow.com/questions/45233084/create-hive-table-for-nested-json-data
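As a lighter-weight alternative to the Spark route, the nested document can be flattened before upload so that every row is flat. This sketch uses only Python's standard library. It emits one row per (plan, in-network file) pair, the same shape you get by exploding both arrays (`pyspark.sql.functions.explode` in Spark, `LATERAL VIEW explode` in Hive). The column names come from the JSON above. The function names and the CSV output are assumptions. Whether the result then uploads cleanly depends on the uploader's settings (delimiter, header row).

```python
import csv
import io
import json

PLAN_KEYS = ["plan_name", "plan_id_type", "plan_id", "plan_market_type"]


def flatten(doc):
    """Yield one flat dict per (reporting plan, in-network file) pair.

    A structure with no in_network_files still yields one row per plan,
    with empty file fields.
    """
    for structure in doc.get("reporting_structure", []):
        files = structure.get("in_network_files") or [{}]
        for plan in structure.get("reporting_plans", []):
            for f in files:
                row = {
                    "reporting_entity_name": doc.get("reporting_entity_name"),
                    "reporting_entity_type": doc.get("reporting_entity_type"),
                }
                row.update({k: plan.get(k) for k in PLAN_KEYS})
                row["file_description"] = f.get("description")
                row["file_location"] = f.get("location")
                yield row


def to_csv(rows):
    """Serialize flat rows to CSV text with a header line."""
    rows = list(rows)
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def convert(json_text):
    """Parse nested JSON text and return flattened CSV text."""
    return to_csv(flatten(json.loads(json_text)))


# Usage (file names are hypothetical):
# with open("input.json") as src, open("flat.csv", "w", newline="") as dst:
#     dst.write(convert(src.read()))
```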
You could've chosen a smaller file; it takes 6 hours to upload ratings.data to Hive, lol
It turns out I had some problems with Hadoop and had to reinstall everything.