# Detailed Explanation of HBase Basic Principles

itread01 2021-01-14 14:11:40

## HBase Introduction

HBase is a distributed, open-source, column-oriented database built on top of HDFS. The name HBase comes from "Hadoop database", i.e. the Hadoop Database. HBase's computing and storage capacity depend on the Hadoop cluster. It sits between NoSQL and an RDBMS: data can only be retrieved by primary key (row key) or by a range of primary keys, and only single-row transactions are supported (complex operations such as multi-table joins can be implemented with the help of Hive).

Characteristics of HBase tables:

1. Large: a table can have billions of rows and millions of columns.
2. Column-oriented: storage and access control are organized by column (family), and column (families) are retrieved independently.
3. Sparse: **empty (null) columns take up no storage space, so tables can be designed to be very sparse.**

## HBase Underlying Principles

### System architecture

(Figure: HBase system architecture)

Based on this figure, the components of HBase are explained below.

#### Client

1. Contains the interfaces for accessing HBase. **The Client maintains some caches to speed up access to HBase**, such as Region location information.

#### Zookeeper

HBase can use either its built-in Zookeeper or an external one. In production environments an external Zookeeper is generally used, for the sake of uniformity.

Roles of Zookeeper in HBase:

1. Ensures that at any time there is only one active master in the cluster.
2. Stores the addressing entry point for all Regions (the location of the `hbase:meta` table, which records where each Region is served).
3. Monitors the state of Region Servers in real time and notifies the Master as Region Servers come online or go offline.

#### HMaster

1. Assigns Regions to Region Servers.
2. **Balances load across Region Servers.**
3. Detects failed Region Servers and reassigns their Regions.
4. Performs garbage collection of files on HDFS.
5. Handles schema update requests.

#### HRegion Server

1. **Maintains the Regions assigned to it by HMaster** and handles I/O requests for those Regions.
2. Splits Regions that grow too large during operation.

As the figure shows, **the Client does not need HMaster to access HBase**: for addressing it contacts Zookeeper and the HRegion Server, and for reading and writing data it contacts the HRegion Server. **HMaster only maintains metadata about tables and Regions, so its load is very low.**

### HBase table data model

(Figure: HBase table structure)

#### Row Key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

1. By a single row key.
2. By a range of row keys.
3. By a full table scan.

A row key can be any string (**the maximum length is commonly quoted as 64KB**; strictly, HBase stores the row length as a 2-byte short, so the limit is 32,767 bytes. In practice row keys are usually 10-100 bytes long). Inside HBase, row keys are stored as byte arrays.

**Data in an HBase table is stored sorted by row key in dictionary (byte) order.** When designing keys, take full advantage of this sorted storage by placing rows that are often read together next to each other (locality).

Note: sorting integers in dictionary order gives 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ... **To preserve the natural order of integers, row keys must be left-padded with zeros.**

**A single read or write of a row is atomic, no matter how many columns are read or written at once.** This design decision makes it easy to reason about the program's behavior when concurrent updates are made to the same row.

#### Column Family

**Every column in an HBase table belongs to a column family.** A column family is part of the table schema (a column is not) and **must be defined before the table is used.**
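The logical data model described so far (rows kept in byte-wise dictionary order of the row key, the three access paths, column families fixed in the schema while columns are created on write, and nothing stored for empty columns) can be sketched as a tiny in-memory table. This is a hypothetical illustration in Python, not the HBase client API: `SketchTable`, `pad_key`, `put`, `get` and `scan` are names invented here, columns use the `family:qualifier` naming convention, and the half-open `[start, stop)` scan range is an assumption of the sketch.

```python
import bisect
from typing import Dict, Iterable, List, Tuple


def pad_key(n: int, width: int) -> bytes:
    """Left-pad a non-negative integer with zeros so byte order matches numeric order."""
    return str(n).zfill(width).encode("ascii")


class SketchTable:
    """Hypothetical model of HBase's logical layout (not the HBase client API).

    - rows are kept sorted by raw row-key bytes (dictionary order)
    - column families are fixed when the table is created
    - columns ("family:qualifier") are created on write
    - a row stores only the columns actually written, so empty columns cost nothing
    """

    def __init__(self, families: Iterable[str]) -> None:
        self.families = frozenset(families)
        self.row_keys: List[bytes] = []                # sorted byte-wise
        self.rows: Dict[bytes, Dict[str, bytes]] = {}  # row key -> {"family:qualifier": value}

    def put(self, row: bytes, column: str, value: bytes) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            # Column families must exist in the schema before they are used.
            raise KeyError(f"column family {family!r} is not in the schema")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)  # keep dictionary (byte) order
            self.rows[row] = {}
        self.rows[row][column] = value

    def get(self, row: bytes) -> Dict[str, bytes]:
        """Access path 1: a single row key."""
        return dict(self.rows.get(row, {}))

    def scan(self, start: bytes = b"", stop: bytes = b"") -> List[Tuple[bytes, Dict[str, bytes]]]:
        """Access paths 2 and 3: a row-key range [start, stop), or a full scan
        when no stop key is given."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop) if stop else len(self.row_keys)
        return [(k, dict(self.rows[k])) for k in self.row_keys[lo:hi]]
```

Inserting the unpadded keys for 1, 2, 10, 11, 20 and 100 yields the order `1, 10, 100, 11, 2, 20`, while `pad_key(n, 3)` yields `001, 002, 010, 011, 020, 100`, which is exactly why the zero-padding advice matters. Note that `pad_key` only works for non-negative integers that fit in the chosen width.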
All column names are prefixed with their column family. For example, `courses:history` and `courses:math` both belong to the `courses` family. **Access control and disk and memory usage accounting are all done at the column family level. The more column families there are, the more I/O and file lookups are involved in accessing a row, so don't create more column families than necessary.**

#### Column

A column is a specific column under a column family. It belongs to a ColumnFamily and is similar to a concrete column created in a MySQL table.

#### Timestamp

In HBase, the storage unit identified by a row and a column is called a cell. Each cell stores multiple versions of the same piece of data, and the versions are indexed by timestamp. A timestamp is a 64-bit integer.

**Timestamps can be assigned by HBase automatically when data is written**, in which case the timestamp is the current system time in milliseconds. Timestamps can also be assigned explicitly by the client; an application that wants to avoid data version conflicts must then generate its own unique timestamps.

**Within each cell, versions are sorted in reverse chronological order**, so the newest version comes first.

To avoid the management burden (storage and indexing) of too many versions, HBase offers two ways to garbage-collect old versions:

1. Keep only the last n versions.
2. Keep only recent versions (by setting a time-to-live, TTL, on the data).

Both can be configured per column family.

#### Cell

A cell is determined by {row key, co
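The version rules in the Timestamp section (each cell holds several timestamped versions, newest first, pruned either by keeping the last n versions or by a TTL) can be modeled for a single cell as follows. This is a hypothetical sketch: `VersionedCell` and its parameters are invented for illustration, timestamps and the TTL are both in milliseconds here (HBase's own column-family TTL setting is given in seconds), and surplus versions are dropped on every write, whereas HBase filters them at read time and removes them physically during compactions.

```python
from typing import List, Optional, Tuple


class VersionedCell:
    """Hypothetical model of one HBase cell: timestamped versions of a value,
    kept newest first and limited by max_versions and an optional TTL."""

    def __init__(self, max_versions: int = 1, ttl_ms: Optional[int] = None) -> None:
        self.max_versions = max_versions
        self.ttl_ms = ttl_ms
        self.versions: List[Tuple[int, bytes]] = []  # (timestamp_ms, value), newest first

    def put(self, value: bytes, ts_ms: int) -> None:
        # Writing the same timestamp again replaces that version.
        self.versions = [v for v in self.versions if v[0] != ts_ms]
        self.versions.append((ts_ms, value))
        # Reverse chronological order: the newest version comes first.
        self.versions.sort(key=lambda v: v[0], reverse=True)
        # "Keep only the last n versions" (trimmed eagerly in this sketch).
        del self.versions[self.max_versions:]

    def get(self, now_ms: int, n: int = 1) -> List[Tuple[int, bytes]]:
        """Return up to n live versions, newest first.

        Assumption of this sketch: a version is expired once now_ms - ts >= ttl_ms.
        """
        live = [v for v in self.versions
                if self.ttl_ms is None or now_ms - v[0] < self.ttl_ms]
        return live[:n]
```

For example, with `max_versions=2` and `ttl_ms=1000`, writing versions at timestamps 100, 300 and 200 keeps only the ones at 300 and 200; at `now_ms=1250` only the version at 300 is still within its TTL.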
