Hive classic short answer questions

Homo sapiens 2021-01-22 18:07:19

1. What is Hive?

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like querying (HQL).

2. Why Hive exists (the reason it was originally developed)

To reduce development and learning costs: developers can write SQL-like queries instead of hand-written MapReduce programs.

3. What are Hive's internal components, and what do they do?

Metadata: data that describes data.
Internal execution process: parser (parses the SQL statement), compiler (compiles the SQL statement into a MapReduce program), optimizer (optimizes the MapReduce program), executor (runs the MapReduce program and submits its results to HDFS).

4. Data formats supported by Hive

TextFile, SequenceFile, Parquet, ORC, RCFile

5. How do you get into the Hive shell?

Option 1: the Hive interactive shell (run bin/hive directly).
Option 2: the Hive JDBC service.
1. Start the hiveserver2 service in the foreground: bin/hive --service hiveserver2
2. Connect to hiveserver2 with beeline: run beeline, then at the prompt enter
beeline> !connect jdbc:hive2://node01:10000

6. Where on HDFS are Hive databases and tables stored?

/user/hive/warehouse

7. The difference between like and rlike

like: fuzzy matching with SQL wildcards (% and _).
rlike: matching against a regular expression.
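The difference can be sketched in Python. This is only a model of the two operators, not Hive code: `sql_like` and `sql_rlike` are hypothetical helper names, and Python's `re` module stands in for the Java regular expressions that Hive uses, so only simple patterns behave identically.

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Model of LIKE: % matches any run of characters, _ matches exactly one,
    and the pattern must cover the whole string."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def sql_rlike(value: str, pattern: str) -> bool:
    """Model of RLIKE: true if any substring of value matches the regex."""
    return re.search(pattern, value) is not None

print(sql_like("hive", "h%"))    # wildcard match on the whole string
print(sql_like("hive", "iv"))    # no wildcards, so only the exact string "iv" matches
print(sql_rlike("hive", "iv"))   # a regex may match anywhere in the string
```

The practical consequence is that a like pattern without wildcards behaves like equality, while an rlike pattern without anchors behaves like "contains".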

8. The difference between internal and external tables?

Dropping an internal (managed) table deletes both the table's metadata and its data. Dropping an external table deletes only the metadata; the data itself is not deleted.

9. What is the advantage of partitioned tables, and what is the requirement on the partition field?

Advantage: a query that specifies a partition reads only that partition, which improves query and analysis efficiency.
Requirement: the partition field must not be one of the columns defined in the table.

10. What is the advantage of bucketed tables, and what is the requirement on the bucket field?

Advantages:
1. Joins can be optimized and sped up (provided the join field is the bucket field).
2. Data sampling (extracting a sample of the data) is supported.
Requirement: the bucket field must be a column of the table.

11. How to import data into a table

1. Insert data directly into the table
2. Insert data through a query
3. Multi-insert mode
4. Create the table and load the data in a single query statement (create table ... as select)
5. Specify the data path with location when creating the table

12. How to export data from a table

1. Export query results to the local file system
2. Format query results and export them to the local file system
3. Export query results to HDFS (without the local keyword)
4. Export to the local file system with a Hadoop command
5. Export with a hive shell command
6. Use export to write to HDFS (full-table export)
7. Sqoop export

13. The difference between order by and sort by

order by: global ordering, performed by a single reducer.
sort by: sorts within each reducer; the overall result set is not globally sorted.

14. The difference between where and having?

where is a constraint clause: it restricts rows before the query result is produced, and aggregate functions cannot be used in it.
having is a filter clause: it filters after the result has been computed (after grouping and aggregation), and aggregate functions can be used in it.
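A small runnable sketch of the timing difference, using Python's built-in sqlite3 as a stand-in for Hive (an assumption made so the example runs without a cluster; the `orders` table and its rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 5), ("a", 50), ("b", 20), ("b", 30), ("c", 200)],
)

# WHERE drops individual rows before they are grouped and summed.
where_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount > 10 GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING drops whole groups after the sums have been computed.
having_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) > 50 ORDER BY customer"
).fetchall()

print(where_rows)
print(having_rows)
```

In the first query the row ("a", 5) never reaches the aggregation; in the second, customer "b" is aggregated first and then removed because its total does not exceed 50.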

15. When is distribute by used, and what is it usually combined with?

Use it when rows need to be partitioned (distributed across reducers) by a field. It is usually combined with sort by (partition first, then sort). Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause.
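The "partition first, then sort" behaviour can be modelled in a few lines of Python. This is a simulation under stated assumptions, not Hive internals: `distribute_and_sort` is a hypothetical helper, and an integer key modulo the reducer count stands in for Hive's hash partitioning.

```python
from collections import defaultdict

def distribute_and_sort(rows, dist_key, sort_key, num_reducers, descending=False):
    """Route each row to a reducer by dist_key, then sort inside each reducer."""
    reducers = defaultdict(list)
    for row in rows:
        # Stand-in for hash partitioning: integer key modulo reducer count.
        reducers[row[dist_key] % num_reducers].append(row)
    return {
        reducer_id: sorted(bucket, key=lambda r: r[sort_key], reverse=descending)
        for reducer_id, bucket in sorted(reducers.items())
    }

rows = [
    {"dept": 1, "salary": 300},
    {"dept": 2, "salary": 100},
    {"dept": 1, "salary": 200},
    {"dept": 2, "salary": 400},
    {"dept": 3, "salary": 250},
]
result = distribute_and_sort(rows, "dept", "salary", num_reducers=2)
for reducer_id, bucket in result.items():
    print(reducer_id, [r["salary"] for r in bucket])
```

Rows with the same dept always land in the same reducer, and each reducer's output is sorted, but concatenating the reducers' outputs does not give a globally sorted result.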

16. When is cluster by used?

Use cluster by when you need to partition by a field and also sort in ascending order by that same field.

17. The difference between distribute by + sort by (on the same field) and cluster by?

distribute by + sort by lets you choose ascending or descending order.
cluster by always sorts in ascending order; the sort direction cannot be specified.

18. What do hive -e, -f, and -hiveconf mean, and how do they differ?

-e executes the HQL statement given on the command line.
-f executes an HQL script file.
-hiveconf sets a Hive configuration parameter for the run.

19. How can Hive parameters be set, and what is the priority?

Priority from lowest to highest: configuration file < command-line arguments < parameter declarations (set statements in the session).

20. When writing a Hive UDF, what is the name of the method you implement?

evaluate

21. Which data storage formats are commonly used with Hive in production? Which compression formats?

The common storage formats are ORC and Parquet; the common compression format is Snappy.

22. Types of Hive custom functions

Custom functions fall into three categories:
UDF (User Defined Function): one row in, one row out.
UDAF (User Defined Aggregation Function): aggregate function, many rows in, one row out (for example count/max/min).
UDTF (User Defined Table-Generating Function): one row in, many rows out, such as lateral view explode().
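The three shapes can be pictured with plain Python functions. These are analogies only (real Hive custom functions are written against Hive's Java API), and the function names are made up for illustration.

```python
def udf_upper(value):
    """UDF shape: one value in, one value out."""
    return value.upper()

def udaf_max(values):
    """UDAF shape: many values in, one value out."""
    return max(values)

def udtf_explode(values):
    """UDTF shape: one (array) value in, many rows out."""
    for item in values:
        yield (item,)

print(udf_upper("hive"))
print(udaf_max([3, 9, 4]))
print(list(udtf_explode(["a", "b", "c"])))
```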

23. In fetch task conversion, what is the effect of setting more, and of setting none?

With hive.fetch.task.conversion set to more, simple queries are not converted into MR programs.
With it set to none, every query is converted into an MR program.

24. What are the benefits of local mode?

When the amount of data is small, it improves query efficiency.

25. How do you handle skew when one key has too much data?

Enable map-side aggregation (hive.map.aggr) and the skew load-balancing option (hive.groupby.skewindata). Hive then creates two MR programs: the first performs local (partial) aggregation of the data, and the second performs the final aggregation.
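The two-job idea can be simulated in Python. This is a toy model under assumptions, not what Hive executes: `two_stage_count` is a hypothetical helper, and random assignment stands in for the first job's distribution of rows.

```python
import random
from collections import Counter

def two_stage_count(keys, num_reducers, seed=0):
    rng = random.Random(seed)
    # Stage 1: spread rows randomly so a hot key is shared by all reducers,
    # and let each reducer compute partial counts.
    partials = [Counter() for _ in range(num_reducers)]
    for key in keys:
        partials[rng.randrange(num_reducers)][key] += 1
    # Stage 2: merge the partial counts by key into the final result.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return partials, final

keys = ["hot"] * 1000 + ["a", "b", "c"]
partials, final = two_stage_count(keys, num_reducers=4)
print([p["hot"] for p in partials])  # the hot key's rows are split across reducers
print(final["hot"])
```

No single stage-1 reducer has to process every row of the hot key, and stage 2 only receives one small partial count per reducer.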

26. How to rewrite count(distinct)

Original statement:
SELECT count(DISTINCT id) FROM bigtable;
Replacement statement:
SELECT count(id) FROM (SELECT id FROM bigtable GROUP BY id) a;
Deduplicate with group by first, then count.
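The two statements return the same number, which can be checked with Python's built-in sqlite3 as a stand-in for Hive (an assumption for runnability; the sample ids are made up). The performance benefit in Hive comes from how the work is spread over reducers, which this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER)")
conn.executemany(
    "INSERT INTO bigtable VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)

distinct_count = conn.execute(
    "SELECT count(DISTINCT id) FROM bigtable"
).fetchone()[0]

grouped_count = conn.execute(
    "SELECT count(id) FROM (SELECT id FROM bigtable GROUP BY id) a"
).fetchone()[0]

print(distinct_count, grouped_count)
```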

27. How to use partition pruning and column pruning

Column pruning: read only the columns you need.
Partition pruning: read only the partitions you need.
In short, take only what you need.

28. How to understand dynamic partitioning

The partitioning of the first (source) table determines the partitioning of the second (target) table: every partition of the first table is copied to the second table. When data is loaded into the second table there is no need to specify a partition; the partition values of the first table are used.

29. When data is skewed, how do you write a large amount of data into 10 files?

(Split one large task into several smaller tasks and run them.) Set the number of reducers to 10, then use one of:
1. distribute by (a field)
2. distribute by rand()

30. Factors that affect the number of map tasks

When the files are small, the number of map tasks is determined by the number of files.
When the files are large, it is determined by the number of blocks.
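A deliberately simplified estimate of that rule, as a sketch only: it assumes one map task per block and at least one per file, and it ignores split-size settings and small-file combining, which change the real numbers. `estimate_map_tasks` and the 128 MB block size are illustrative assumptions.

```python
import math

def estimate_map_tasks(file_sizes_mb, block_size_mb=128):
    """One map task per block, and at least one map task per file."""
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)

print(estimate_map_tasks([1, 2, 3]))  # three tiny files: driven by file count
print(estimate_map_tasks([300]))      # one large file: driven by block count
```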

31. How is the number of reducers calculated?

Formula: N = min(parameter 2, total input size / parameter 1)
Parameter 1: the maximum amount of data each reducer processes (hive.exec.reducers.bytes.per.reducer)
Parameter 2: the maximum number of reducers per job (hive.exec.reducers.max)
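The formula can be worked through in Python. This is a sketch: `estimate_reducers` is a hypothetical helper, rounding the division up and returning at least 1 are assumptions, and the 256 MB and 1009 figures are illustrative values rather than settings taken from this article.

```python
import math

def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    """N = min(parameter 2, total input size / parameter 1), rounded up."""
    wanted = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))

MB = 1024 ** 2
GB = 1024 ** 3

print(estimate_reducers(10 * GB, 256 * MB, 1009))    # 10 GB / 256 MB = 40
print(estimate_reducers(1024 * GB, 256 * MB, 1009))  # 4096 wanted, capped at 1009
```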

32. What are the benefits of parallel execution?

Parallel execution lets stages that do not depend on each other run at the same time, which improves query efficiency.

33. Which queries cannot be executed in strict mode?

1. Queries on a partitioned table that scan all partitions
2. Queries that use order by without a limit clause
3. Queries that produce a Cartesian product

34. What is the benefit of JVM reuse?

It lets multiple tasks share one JVM, which reduces task startup cost and improves task efficiency. (However, the JVM is not released until the job finishes, so it is held for a long time; when resources are scarce, this wastes resources.)

35. What is MR local mode?

The job runs locally on the node where the SQL statement is submitted; its tasks are not distributed to the cluster.

36. What is local computing (data locality)?

After data is stored in HDFS and the analysis program is written, the program is distributed preferentially to the nodes that hold the data it uses.

37. Optimizing queries that join first and filter afterwards

1. Put the filter condition in the on clause of join ... on:
SELECT a.id FROM ori a LEFT JOIN bigtable b ON (b.id <= 10 AND a.id = b.id);
2. Put the filter in the joined table itself, using a subquery:
SELECT a.id FROM bigtable a RIGHT JOIN (SELECT id FROM ori WHERE id <= 10 ) b ON a.id = b.id;
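Before applying either rewrite to an outer join, it is worth checking that the result set is unchanged. The sketch below uses Python's built-in sqlite3 as a stand-in for Hive, with made-up rows; the baseline "join then filter" query and the use of an inner JOIN in the subquery variant (instead of RIGHT JOIN, which older SQLite versions lack) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ori (id INTEGER)")
conn.execute("CREATE TABLE bigtable (id INTEGER)")
conn.executemany("INSERT INTO ori VALUES (?)", [(5,), (10,), (15,)])
conn.executemany("INSERT INTO bigtable VALUES (?)", [(5,), (10,), (15,), (20,)])

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Baseline: join first, filter afterwards in WHERE.
join_then_filter = ids(
    "SELECT a.id FROM ori a LEFT JOIN bigtable b ON a.id = b.id "
    "WHERE b.id <= 10 ORDER BY a.id"
)
# Variant 1: filter inside ON. A LEFT JOIN still keeps unmatched left rows.
filter_in_on = ids(
    "SELECT a.id FROM ori a LEFT JOIN bigtable b "
    "ON (b.id <= 10 AND a.id = b.id) ORDER BY a.id"
)
# Variant 2: filter in a subquery before joining.
filter_in_subquery = ids(
    "SELECT a.id FROM bigtable a JOIN (SELECT id FROM ori WHERE id <= 10) b "
    "ON a.id = b.id ORDER BY a.id"
)

print(join_then_filter, filter_in_on, filter_in_subquery)
```

With these rows, the subquery variant matches the baseline, while the filter-in-ON variant also returns id 15, because a left join keeps left-side rows that find no match.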

Copyright notice
This article was written by [Homo sapiens]. Please include a link to the original when reposting. Thank you.
https://javamana.com/2021/01/20210122180032291l.html
