In this post, Xiaojun walks through the basic operations of Hive.
Create a database: create database [if not exists] myhive;
Note: the storage location of Hive tables is specified by a property in hive-site.xml (typically hive.metastore.warehouse.dir).
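You can print the current value of this property from the Hive CLI or beeline (a quick sketch, assuming a default setup):
set hive.metastore.warehouse.dir;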
Create a database and specify its HDFS storage location:
create database myhive2 location '/myhive2';
drop database myhive2;
This command can only drop an empty database; if the database still contains tables, it raises an error.
Force database deletion
drop database myhive cascade;
All tables in the database are deleted along with it; use with great caution, this is a dangerous operation.
view the database
# View basic database information
desc database myhive2;
# View more detailed database information
desc database extended myhive2;
Switch databases: use myhive; (myhive is the database name)
modify the database
The core metadata of a database, including its name and location, cannot be changed, but we can use the alter database command to modify some of its other properties.
# Modify the creation date of the database
alter database myhive2 set dbproperties('createtime'='20880611');
Create a basic data table (internal table):
create table tableName(field_name field_type, field_name field_type) ROW FORMAT DELIMITED FIELDS TERMINATED BY char; here char specifies the field-to-field separator in the data, such as '\t', ',', '|', or another character.
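As a minimal sketch (the table name, columns, and separator below are only illustrative):
create table if not exists student(
    s_id string,
    s_name string
)
row format delimited fields terminated by '\t';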
Create an external data table:
create EXTERNAL table tableName(field_name field_type, field_name field_type); to create an external table you must specify where the data is stored, which is done with the LOCATION clause.
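A minimal sketch of an external table (the table name, columns, and path are only illustrative):
create external table if not exists techer(
    t_id string,
    t_name string
)
row format delimited fields terminated by '\t'
location '/hivedatas/techer';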
Load data from the local file system into a table: load data local inpath 'file path' into table table_name;
Load data and overwrite the existing data: load data local inpath 'file path' overwrite into table table_name;
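For example, assuming the illustrative student table from the sketch above and a local CSV file (the path is illustrative):
# append the file's rows to the table
load data local inpath '/export/servers/hivedatas/student.csv' into table student;
# replace whatever the table currently holds
load data local inpath '/export/servers/hivedatas/student.csv' overwrite into table student;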
Load data from the HDFS file system into a table: load data inpath '/hivedatas/techer.csv' into table techer;
The difference between internal and external tables: dropping an internal table deletes both the table's metadata and its data; dropping an external table deletes only the metadata, while the data itself is kept.
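If you are unsure which kind a table is, desc formatted prints its Table Type (MANAGED_TABLE for internal tables, EXTERNAL_TABLE for external tables):
desc formatted techer;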
Create a partitioned table:
A common partitioning rule in practice is to partition by day (one partition per day).
Syntax for creating a partitioned table: create table score(s_id string, c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';
Create a table with multiple partitions: create table score2(s_id string, c_id string, s_score int) partitioned by (year string, month string, day string) row format delimited fields terminated by '\t';
Load data into the partitioned table: load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');
Load data into the multi-partition table: load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018', month='06', day='01');
View the partitions: show partitions score;
Add a partition: alter table score add partition(month='201805');
Add multiple partitions at once: alter table score add partition(month='201804') partition(month='201803'); Note: after adding a partition, you can see a corresponding folder under the table's directory in the HDFS file system.
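For example, assuming the default warehouse directory and that the score table lives in a database named myhive (both assumptions), each partition appears as its own folder:
hdfs dfs -ls /user/hive/warehouse/myhive.db/score
# shows directories such as .../score/month=201803 and .../score/month=201804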
Delete a partition: alter table score drop partition(month='201806'); Important: the partition field must not be one of the table's existing columns!
Purpose: partitioning splits the data into separate directories so that queries do not have to scan irrelevant data, which speeds them up.
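For example, a query that filters on the partition column only reads the matching partition directory instead of the whole table (the month value is illustrative):
select * from score where month = '201806';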
Create a bucketed table:
Bucketing adds an extra level of organization on top of the existing table structure.
The data is divided into buckets according to a specified field; in plain terms, the rows are split into multiple files based on that field's value.
Enable Hive's bucketing feature: set hive.enforce.bucketing=true;
Set the number of buckets (i.e. the number of reduce tasks): set mapreduce.job.reduces=3;
Create the bucketed table: create table course (c_id string, c_name string, t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';
Note: data can be loaded into a bucketed table only through insert overwrite; hdfs dfs -put and load data will not work. So you first create an ordinary table, then use insert overwrite with a query to load the ordinary table's data into the bucketed table.
Create an ordinary table: create table course_common (c_id string, c_name string, t_id string) row format delimited fields terminated by '\t';
Load data into the ordinary table: load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;
Use insert overwrite to load the data into the bucketed table: insert overwrite table course select * from course_common cluster by(c_id);
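Once the data is loaded, you can sample a single bucket to confirm that the rows were bucketed by c_id (a sketch using Hive's tablesample clause):
select * from course tablesample(bucket 1 out of 3 on c_id);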
Important: the bucketing field must be an existing column of the table.
Bucketing logic: the bucket field is hashed and the hash value is taken modulo the number of buckets; the remainder determines which bucket the row falls into. For example, with 3 buckets, a row whose field hashes to 7 goes into bucket 7 % 3 = 1. That wraps up the basic operations of Hive; Xiaojun will bring you more content soon, stay tuned! ε≡٩(๑>₃<)۶