Course Description
Apache Hive is a data warehouse software project built on top of Apache Hadoop to provide data summarization, query, and analysis. Hive provides a SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop. This course gives you a quick introduction to Hive and covers most of its features in under an hour.
What am I going to get from this course?
- DDL – create tables, point to locations, specify file formats, create indexes, create partitioned tables, etc.
- How to point Hive at data files already loaded into HDFS
- Difference between internal (managed) and external tables
- Demo/hands-on
- Tables/partitioning/buckets
- Turning on ACID and DML – insert/update/delete/merge
- Complex data types – Struct, Map, Array
- Fine-tuning – advanced queries and the EXPLAIN plan
- How to query with HiveQL using SELECT and complex joins