Hello everyone, I'm Xiaoyu.

I have been using the HBase database at work recently, so I put together some notes about HBase to share with you. HBase is actually a very large topic; below are the technical points that Xiaoyu thinks are most useful in day-to-day work, and I hope they help.

It can be said that the Internet is built on all kinds of databases. The mainstream ones today are: relational databases, represented by MySQL, and their distributed solutions; cache databases, represented by Redis; search databases, represented by ES (Elasticsearch); and distributed persistent KV databases. In the open-source world, especially in China, HBase is almost the default choice for a distributed persistent KV store. HBase covers a great many business scenarios, such as user profiling, real-time (and offline) recommendation, real-time risk control, social feed streams, product order history, social chat, monitoring systems, and user behavior logs.

Preface

Whatever technology we use, we generate a great deal of data, and a small database can hardly meet the storage and query needs for it; that is where HBase comes in. HBase is a column-oriented database management system built on top of the Hadoop file system. It uses a data model similar to Google's Bigtable, is part of the Hadoop ecosystem, stores its data on HDFS, and lets clients perform random access to that data through the HBase client. Its main characteristics are:

Complex transactions are not supported, only row-level transactions: reads and writes of a single row are atomic;

Because HDFS is used as the underlying storage, HBase, like HDFS, supports structured, semi-structured, and unstructured data;

Horizontal scaling by adding machines is supported;

Data sharding (Regions) is supported;

Automatic failover between RegionServers is supported;

There is an easy-to-use Java client API;

BlockCache and Bloom filters are supported;

Filters support predicate pushdown (a short sketch follows this list).
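
As a small illustration of the last point, here is a minimal sketch of pushing a predicate down to the RegionServers with a filter, using the classic Java client API that also appears later in this article (the table, column family, and column names are just examples; conn is an open Connection as shown further below):

// needs org.apache.hadoop.hbase.filter.SingleColumnValueFilter and org.apache.hadoop.hbase.filter.CompareFilter
Table table = conn.getTable(TableName.valueOf("user_info"));
Scan scan = new Scan();
// the filter is evaluated on the RegionServers, so only matching rows travel back to the client
scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("base_info"), Bytes.toBytes("age"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("18")));
for (Result r : table.getScanner(scan)) {
    System.out.println(Bytes.toString(r.getRow()));
}
table.close();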

HBase principle

Concept

HBase is a distributed, column-oriented (strictly speaking, column-family-oriented) open-source database. HDFS provides HBase with reliable underlying data storage, MapReduce provides it with high-performance computation, and Zookeeper provides it with stable service and a failover mechanism. So we say that HBase is a distributed database solution for fast storage and retrieval of massive data using large numbers of cheap machines.

Column-oriented storage

Let's first look at row-oriented storage in a relational database, as in the figure below:

You can see that only the first row (ID:1, Xiaoyu) has every column filled in; Xiaona and Xiaozhi have incomplete data. In a row-oriented structure the schema is fixed and every row looks the same: even if a value is not filled in, it still has to be stored as empty, it cannot simply be absent.

Now look at the effect of column-oriented storage in a non-relational database:

You can see that what used to be one row of Xiaoyu's data now corresponds to several rows: Xiaoyu's original seven columns have become seven rows. Previously those seven values lived in a single row and shared the primary key ID:1. In column-oriented storage they are seven rows, each with its own key, which is why Xiaoyu's primary key ID:1 appears seven times. The biggest advantage of this layout is that we do not have to store values we do not need, which saves a lot of space. And because query predicates are defined on columns, the whole database is effectively automatically indexed.

NoSQL compared with relational databases

The figure below compares the two:

RDBMS vs. HBase

HBase stores data by column family. A column family can hold many columns, but the families themselves must be specified when the table is created. To deepen the understanding of column families, the following compares a relational table with an HBase table:

The main differences

HBase architecture

HBase is composed of Client, Zookeeper, HMaster, HRegionServer, HDFS and other components.

Client

The client communicates with HMaster and HRegionServer through HBase's RPC mechanism: it talks to HMaster for management operations and to HRegionServer for data operations.

Zookeeper

HBase uses Zookeeper for master high availability, RegionServer monitoring, the entry point to metadata, and maintenance of cluster configuration. Concretely:

1. Zookeeper guarantees that only one master is running in the cluster; if the master fails, a new master is elected through a competition mechanism to keep providing service.

2. Zookeeper monitors the state of every RegionServer; when a RegionServer becomes abnormal, the master is notified of the RegionServer's up/down status through callbacks.

3. Zookeeper stores the unified entry address of the metadata.

When a client uses HBase, it needs the Zookeeper IP addresses and node path to establish a connection to Zookeeper. The connection is established as shown in the following code:

Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "XXXX.XXX.XXX");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("zookeeper.znode.parent", "XXXXX");
Connection connection = ConnectionFactory.createConnection(configuration);

HMaster

The main responsibilities of the master node are as follows:

1. Assign Regions to RegionServers.

2. Maintain load balancing across the whole cluster.

3. Maintain the cluster's metadata, discover Regions that are no longer being served, and reassign those failed Regions to healthy RegionServers; when a RegionServer fails, coordinate the splitting of its HLog.

HRegionServer

An HRegionServer manages a set of HRegion objects. Each HRegion corresponds to the storage of one ColumnFamily of a table, i.e. one Store manages one column family (CF) of one Region. Each Store contains one MemStore and zero or more StoreFiles. The Store is HBase's storage core and is made up of the MemStore and the StoreFiles.

HLog

When data is written, a write-ahead log (WAL) is written first. All writes for all Regions served by an HRegionServer go into the same log file. Data is not written to HDFS immediately; it is buffered until a certain amount has accumulated and then written in batches, after which a mark is made in the log.

MemStore

The MemStore is a sorted in-memory buffer. Data written by users goes into the MemStore first; when the MemStore is full it is flushed into a StoreFile (whose on-disk format is the HFile). When the number of StoreFiles grows past a threshold, a Compact merge is triggered and multiple StoreFiles are merged into one. StoreFiles keep getting bigger through compaction, and when the total size of all StoreFiles (HFiles) in a Region exceeds the threshold (hbase.hregion.max.filesize), a Split is triggered: the current Region is split into two Regions, the parent Region is taken offline, and the two newly split child Regions are assigned by HMaster to appropriate HRegionServers, so that the load of the original single Region is spread across two Regions.

Region addressing

Addressing goes through Zookeeper and the .META. table, in the following steps:

1. The client asks ZK for the address of the RegionServer hosting .META..

2. The client asks the RegionServer hosting .META. for the address of the RegionServer hosting the data it wants, and caches the .META. information for faster access next time.

3. The client asks the RegionServer where the data lives and fetches the data it needs.

HDFS

HDFS provides HBase with its ultimate underlying data storage and also supports HBase's high availability (the HLog is stored on HDFS).

HBase components

Column Family (column family)

A Column Family is a group of columns. HBase organizes storage by column family, and a column family can contain any number of columns, which allows flexible data access. The column families must be specified when the table is created, just as you must specify the columns when creating a relational table. More column families is not better: the official recommendation is to keep the number of column families at three or fewer, and the scenarios we use generally have a single column family.

Rowkey

The concept of the Rowkey is exactly like that of the primary key in MySQL: HBase uses the Rowkey to uniquely identify a row. HBase supports only three ways of querying: a single-row lookup by Rowkey, a range scan by Rowkey, and a full table scan (a short sketch of all three follows).
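
As a minimal sketch, the three access patterns look roughly like this with the Java client (the table name is just an example; conn is an open Connection as in the API section further below):

Table table = conn.getTable(TableName.valueOf("user_info"));
// 1. single-row lookup by rowkey
Result one = table.get(new Get(Bytes.toBytes("001")));
// 2. range scan over a rowkey interval (start row inclusive, stop row exclusive)
ResultScanner range = table.getScanner(new Scan(Bytes.toBytes("001"), Bytes.toBytes("100")));
// 3. full table scan (expensive, avoid on large tables)
ResultScanner all = table.getScanner(new Scan());
table.close();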

Region (partition)

A Region is roughly what a partition or shard is in a relational database. HBase distributes the data of a large table into different Regions according to ranges of the Rowkey, and each Region is responsible for storing and serving a certain range of the data. So even a huge table has very low access latency, because it has been cut into different Regions.

TimeStamp (multiple versions)

The timestamp is the key to HBase's multi-version support. HBase uses different timestamps to identify different versions of the data under the same rowkey. When writing, if the user does not specify a timestamp, HBase automatically adds one that corresponds to the server time. Within HBase, versions of the same rowkey are sorted by timestamp in descending order, so the newest version is returned by default; users can specify a timestamp to read data from older versions.
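
A small sketch of writing and reading multiple versions, assuming the column family keeps more than one version (as with setMaxVersions(3) in the API section below); names and values are just examples:

// also needs org.apache.hadoop.hbase.CellUtil
Table table = conn.getTable(TableName.valueOf("user_info"));
// write with an explicit timestamp; if omitted, the server time is used
Put put = new Put(Bytes.toBytes("001"));
put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"),
        System.currentTimeMillis(), Bytes.toBytes("Xiaoyu"));
table.put(put);
// read up to 3 versions of the same cell, newest first
Get get = new Get(Bytes.toBytes("001"));
get.setMaxVersions(3); // newer clients call readVersions(3) instead
Result result = table.get(get);
for (Cell cell : result.getColumnCells(Bytes.toBytes("base_info"), Bytes.toBytes("username"))) {
    System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
}
table.close();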

HBase write logic

HBase write process

There are three main steps:

1. The client finds the RegionServer hosting the Region the data belongs to.

2. The write is appended to the HLog. The HLog is stored on HDFS, so when a RegionServer fails, the HLog can be used to recover the data.

3. The write is applied to the MemStore. Only when both the HLog and the MemStore have been written is the write request considered complete. The MemStore is then gradually flushed to HDFS.

MemStore flush

To improve HBase's write performance, a write is not flushed to disk as soon as it reaches the MemStore; flushing happens at certain points in time. Which scenarios trigger a flush? They can be summarized as follows:

1. A global parameter controls overall MemStore memory usage. When all MemStores together occupy the maximum allowed fraction of the heap, a flush is triggered. The parameter is hbase.regionserver.global.memstore.upperLimit and defaults to 40% of the heap. This does not mean that every MemStore is flushed; another parameter, hbase.regionserver.global.memstore.lowerLimit (default 35% of the heap), controls when flushing stops: once the MemStores have been flushed down to 35% of the heap, flushing stops. This reduces the impact of flushing on the business and smooths the system load.

2. When a single MemStore exceeds hbase.hregion.memstore.flush.size (128 MB by default), a flush is triggered.

3. As mentioned earlier, the HLog exists to guarantee HBase's data consistency. If there are too many HLogs, recovery takes too long, so HBase limits the maximum number of HLogs; when that maximum is reached, a flush is forced. The parameter is hbase.regionserver.maxlogs and defaults to 32.

4. A flush can be triggered manually through the HBase shell or the Java API (see the sketch after this list).

5. Shutting down a RegionServer normally triggers a flush; once all data has been flushed, the HLog is no longer needed for recovery.

6. When a RegionServer fails, the Regions on it are moved to other healthy RegionServers; after a Region's data has been recovered, a flush is triggered, and the Region is put back in service for the business only after the flush completes.
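
For point 4, a manual flush can be issued from the HBase shell with flush 'user_info', or, as a rough sketch, from the Java Admin API (the table name is just an example):

Admin admin = conn.getAdmin();
// flush all MemStores of the table to StoreFiles on HDFS
admin.flush(TableName.valueOf("user_info"));
admin.close();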

HBase Middle layer

Phoenix is an open-source SQL layer on top of HBase that lets you operate on HBase data through standard JDBC. Before Phoenix, the only way to access HBase was the Java API; compared with a query that can be expressed in one line of SQL, the HBase API is rather cumbersome. Phoenix's idea is to "put the SQL back into NoSQL": you can operate on HBase data with standard SQL, which also means you can use common persistence frameworks such as Spring Data JPA or MyBatis to work with HBase.

Phoenix's performance is also excellent. Its query engine converts a SQL query into one or more HBase scans and produces a standard JDBC result set by executing them in parallel. Because it uses the HBase API directly, together with coprocessors and custom filters, it can deliver millisecond latency for small queries and second-level latency for queries over tens of millions of rows. Phoenix also has features that HBase itself lacks, such as secondary indexes. For these reasons Phoenix has become the most popular SQL layer for HBase.
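
As a minimal sketch, a Phoenix query over JDBC looks roughly like this (the Zookeeper quorum and table name are just examples, and the Phoenix client jar is assumed to be on the classpath):

// uses the standard java.sql JDBC classes; the Phoenix driver registers itself for jdbc:phoenix: URLs
Connection conn = DriverManager.getConnection("jdbc:phoenix:n1,n2,n3:2181");
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM USER_INFO LIMIT 10")) {
    while (rs.next()) {
        System.out.println(rs.getString(1));
    }
}
conn.close();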

HBase installation and usage

Download the HBase package and unpack it first:

tar -zxvf hbase-0.98.6-hadoop2-bin.tar.gz

Open hbase-env.sh and configure JAVA_HOME:

export JAVA_HOME=/opt/modules/jdk1.7.0_79

Configure hbase-site.xml:

<configuration>
 <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-senior.shinelon.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-senior.shinelon.com</value>
  </property>
</configuration>

Change the host names above to your own, and then you can start HBase. The web UI looks like this:

HBase commands

Below are some of the HBase shell commands that Xiaoyu uses most often.
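
A small, illustrative sampling (the table and column-family names are just examples):

# enter the HBase shell
bin/hbase shell

create 'user_info', 'base_info', 'extra_info'        # create a table with two column families
list                                                  # list tables
describe 'user_info'                                  # show the table schema
put 'user_info', '001', 'base_info:username', 'xiaoyu'
get 'user_info', '001'                                # read one row
scan 'user_info', {STARTROW => '001', LIMIT => 10}    # range scan
count 'user_info'
disable 'user_info'                                   # a table must be disabled before it can be dropped
drop 'user_info'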

HBase API usage

The DDL API is used as follows:

package com.initialize;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
/**
 * DDL operations with the HBase client:
 * 1. build a connection
 * 2. get the DDL tool (Admin) from the connection
 * 3. admin.createTable(table descriptor);
 * 4. admin.disableTable(table name);
 * 5. admin.deleteTable(table name);
 * 6. admin.modifyTable(table name, table descriptor);
 */
public class HbaseClientDemo {
    Connection conn = null;
    @Before
    public void getConn() throws IOException {
        // Build a connection object
        Configuration conf = HBaseConfiguration.create();// automatically loads hbase-site.xml from the classpath
        conf.set("hbase.zookeeper.quorum","n1:2181,n2:2181,n3:2181");
        conn = ConnectionFactory.createConnection(conf);
    }
    /**
     * DDL
     *  Create table
     */
    @Test
    public void testCreateTable() throws  Exception{
        // get the DDL tool (Admin) from the connection
        Admin admin = conn.getAdmin();
        // create a table descriptor object
        HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("user_info"));
        // create column family descriptor objects
        HColumnDescriptor hColumnDescriptor_1 = new HColumnDescriptor("base_info");
        hColumnDescriptor_1.setMaxVersions(3);
        HColumnDescriptor hColumnDescriptor_2 = new HColumnDescriptor("extra_info");
        // Put the column family definition information object into the table definition object
        hTableDescriptor.addFamily(hColumnDescriptor_1);
        hTableDescriptor.addFamily(hColumnDescriptor_2);
        // use the DDL tool (admin) to create the table
        admin.createTable(hTableDescriptor);
        // Close the connection
        admin.close();
        conn.close();
    }
    /**
     *  Delete table
     */
    @Test
    public void testDropTable() throws  Exception{
        Admin admin = conn.getAdmin();
        // Disable table
        admin.disableTable(TableName.valueOf("user_info"));
        // Delete table
        admin.deleteTable(TableName.valueOf("user_info"));
        admin.close();
        conn.close();
    }
    /**
     *  Modify the table definition -- Add a column family
     */
    @Test
    public void testAlterTable() throws  Exception{
        Admin admin = conn.getAdmin();
        // Take out the old table definition information
        HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("user_info"));
        // A new definition of column family is constructed
        HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("other_info");
        hColumnDescriptor.setBloomFilterType(BloomType.ROWCOL);// Set the bloom filter type for the column family
        // Add column family definitions to table definition objects
        tableDescriptor.addFamily(hColumnDescriptor);
        // submit the modified table definition through admin
        admin.modifyTable(TableName.valueOf("user_info"), tableDescriptor);
        admin.close();
        conn.close();
    }
}

DML examples are as follows:

package com.initialize;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
public class HbaseClientDML {
    Connection conn = null;
    @Before
    public void getConn() throws IOException {
        // Build a connection object
        Configuration conf = HBaseConfiguration.create();// automatically loads hbase-site.xml from the classpath
        conf.set("hbase.zookeeper.quorum","n1:2181,n2:2181,n3:2181");
        conn = ConnectionFactory.createConnection(conf);
    }
    /**
     * Insert / update: a put simply overwrites the existing value
     */
    @Test
    public void testPut() throws  Exception{
        // Gets the... Of the specified table table object , Conduct DML operation
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // wrap the data to insert in Put objects (one Put corresponds to exactly one rowkey)
        Put put = new Put(Bytes.toBytes("001"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("Xiaoyu"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
        put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("Chengdu"));
        Put put2 = new Put(Bytes.toBytes("002"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("Xiaona"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("17"));
        put2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("Chengdu"));
        ArrayList<Put> puts = new ArrayList<>();
        puts.add(put);
        puts.add(put2);
        // Insert in
        table.put(puts);
        table.close();
        conn.close();
    }
    /***
     * Insert many rows in a loop
     */
    @Test
    public void testManyPuts() throws Exception{
        Table table = conn.getTable(TableName.valueOf("user_info"));
        ArrayList<Put> puts = new ArrayList<>();
        for(int i=0;i<10000;i++){
            Put put = new Put(Bytes.toBytes(""+i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("Xiaoyu" + i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes((18+i) + ""));
            put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("Chengdu"));
            puts.add(put);
        }
        table.put(puts);
    }
    /**
     *  Delete
     */
    @Test
    public void testDelete() throws Exception{
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // Construct an object to encapsulate the data information to be deleted
        Delete delete1 = new Delete(Bytes.toBytes("001"));
        Delete delete2 = new Delete(Bytes.toBytes("002"));
        delete2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"));
        ArrayList<Delete> dels = new ArrayList<>();
        dels.add(delete1);
        dels.add(delete2);
        table.delete(dels);
        table.close();
        conn.close();
    }
    /**
     * Read a single row (get)
     */
    @Test
    public void  testGet() throws Exception{
        Table table = conn.getTable(TableName.valueOf("user_info"));
        Get get = new Get("002".getBytes());
        Result result = table.get(get);
        // read the value of one caller-specified column from the result
        byte[] value = result.getValue("base_info".getBytes(), "age".getBytes());
        System.out.println(new String(value));
        System.out.println("======================");
        // iterate over all key-value cells of the whole row
        CellScanner cellScanner = result.cellScanner();
        while(cellScanner.advance()){
            Cell cell = cellScanner.current();
            byte[] rowArray = cell.getRowArray();            // backing array holding this cell's row key
            byte[] familyArray = cell.getFamilyArray();      // backing array holding the column family name
            byte[] qualifierArray = cell.getQualifierArray();// backing array holding the column name
            byte[] valueArray = cell.getValueArray();        // backing array holding the value

            System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
            System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
            System.out.println("column: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
            System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
        }
        }
        table.close();
        conn.close();
    }
    /**
     * Query data by row-key range (scan)
     */
    @Test
    public void testScan() throws Exception{
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // the start row key is inclusive and the stop row key is exclusive; if you really need the stop
        // row included, append an invisible character (\000) to it, as done below
        Scan scan = new Scan("10".getBytes(), "10000\001".getBytes());
        ResultScanner scanner =table.getScanner(scan);
        Iterator<Result> iterator = scanner.iterator();
        while(iterator.hasNext()){
            Result result =iterator.next();
            // iterate over all key-value cells of the whole row
            CellScanner cellScanner = result.cellScanner();
            while(cellScanner.advance()){
                Cell cell = cellScanner.current();
                byte[] rowArray = cell.getRowArray();            // backing array holding this cell's row key
                byte[] familyArray = cell.getFamilyArray();      // backing array holding the column family name
                byte[] qualifierArray = cell.getQualifierArray();// backing array holding the column name
                byte[] valueArray = cell.getValueArray();        // backing array holding the value
                System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
                System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
                System.out.println("column: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
                System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
            }
            }
            System.out.println("----------------------");
        }
    }
    @Test
    public void test(){
        String a = "000";
        String b = "000\0";
        System.out.println(a);
        System.out.println(b);
        byte[] bytes = a.getBytes();
        byte[] bytes2 = b.getBytes();
        
        System.out.println("");
    }
}

HBase application scenarios

Object storage

HBase MOB (Medium Object Storage) is a feature introduced in hbase-2.0.0 to address HBase's poor performance when storing medium-sized files (roughly 100 KB to 10 MB). It makes HBase suitable for storing pictures, documents, PDFs, and short videos.
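
A hedged sketch of enabling MOB on a column family, assuming an HBase 2.0+ client (the table, family name and threshold are just examples):

Admin admin = conn.getAdmin();
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("photos"));
HColumnDescriptor mobFamily = new HColumnDescriptor("pic");
mobFamily.setMobEnabled(true);           // treat large cells in this family as MOBs
mobFamily.setMobThreshold(100 * 1024L);  // cells bigger than 100 KB go down the MOB path
table.addFamily(mobFamily);
admin.createTable(table);
admin.close();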

OLAP storage

Kylin uses HBase as its underlying storage, mainly for its high concurrency and massive storage capacity. Building a cube in Kylin produces a large amount of pre-aggregated intermediate data with a high data expansion rate, which puts high demands on the storage capacity of the database.

Phoenix is a SQL engine built on HBase; through Phoenix you can operate on HBase directly over JDBC. Although it supports upsert operations, it is aimed more at OLAP scenarios, and its drawback is that it is not very flexible.

Time-series data

OpenTSDB records and displays metric values at each point in time and is commonly used for monitoring. It is an application built on top of HBase.

User profiling systems

These need dynamic columns and sparse column properties: the number of dimensions used to describe user features is indefinite and may grow dynamically (hobbies, gender, address, and so on), and not every feature dimension has data.

Message / order systems

These need strong consistency and good read performance, and HBase can guarantee strong consistency.

Feed stream storage

Feed stream systems are read-heavy and write-light, have a simple data model, high concurrency, pronounced traffic peaks and troughs, and need durable, reliable storage and ordered messages; HBase's rowkeys, sorted in lexicographic order, fit this scenario very well.

HBase optimization

Pre-partitioning (pre-splitting)

By default, creating an HBase table creates a single Region. When data is imported, all HBase clients write to this one Region until it grows large enough to be split. One way to speed up bulk writes is to create a number of empty Regions in advance, so that when data is written to HBase it is load balanced across the cluster according to the Region split points (a minimal sketch follows).
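
A minimal sketch of pre-splitting a table at creation time (the split keys are just examples and should match your own rowkey design):

Admin admin = conn.getAdmin();
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("user_info"));
table.addFamily(new HColumnDescriptor("base_info"));
// pre-create empty regions: (-inf,"2"), ["2","4"), ["4","6"), ["6","8"), ["8",+inf)
byte[][] splitKeys = {
        Bytes.toBytes("2"), Bytes.toBytes("4"),
        Bytes.toBytes("6"), Bytes.toBytes("8")
};
admin.createTable(table, splitKeys);
admin.close();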

Rowkey optimization

Rowkeys in HBase are stored in lexicographic order, so when designing a Rowkey, take full advantage of this sorting: store data that is frequently read together in one block, and put data that is likely to be accessed soon close together.

In addition, if Rowkeys are generated incrementally, it is not recommended to write them directly in increasing order; instead, reverse the Rowkey so that Rowkeys end up roughly evenly distributed. The benefit of this design is that it balances the load across RegionServers; otherwise all new data tends to pile up on a single RegionServer. This should also be combined with pre-splitting in the table design (a small illustration follows).
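
A tiny illustration of the idea (the order id is made up; real designs often also salt or hash the rowkey prefix):

// a monotonically increasing id would pile new writes onto one RegionServer...
String orderId = "20210915000123";
// ...so reverse it before using it as the rowkey, spreading writes across Regions
String rowkey = new StringBuilder(orderId).reverse().toString();
Put put = new Put(Bytes.toBytes(rowkey));
put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("order_id"), Bytes.toBytes(orderId));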

Reduce the number of column families

Do not define too many ColumnFamilies in one table. Currently HBase does not handle tables with more than two or three ColumnFamilies very well, because when one ColumnFamily is flushed, its neighboring ColumnFamilies are also flushed along with it, which ultimately causes the system to produce more I/O.

Cache policy

When creating a table, you can place it in the RegionServer's cache via HColumnDescriptor.setInMemory(true), so that reads are more likely to hit the cache.

Set a storage lifetime

When creating a table, you can set the storage lifetime of the data with HColumnDescriptor.setTimeToLive(int timeToLive); expired data is deleted automatically.

Hard disk configuration

Each RegionServer manages 10 to 1,000 Regions, each 1-2 GB in size, so each server holds at least 10 GB and at most about 1,000 * 2 GB = 2 TB; with 3 replicas that is 6 TB. Option one is three 2 TB disks; option two is twelve 500 GB disks. When bandwidth is sufficient, the latter provides higher throughput, finer-grained redundancy, and faster recovery of a single failed disk.

Allocate appropriate memory to the RegionServer service

As long as other services are not affected, the more the better. For example, add export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS" to hbase-env.sh in HBase's conf directory, where 16000m is the heap size allocated to the RegionServer.

Number of replicas for written data

The number of replicas is proportional to read performance, inversely proportional to write performance, and also affects availability. There are two ways to configure it. One is to copy hdfs-site.xml into HBase's conf directory and add or modify the dfs.replication item there, setting it to the desired number of replicas; this change takes effect for all HBase user tables. The other is to modify HBase's code so that HBase supports setting the replica count per column family; then, when the table is created, set the replica count on the column family (the default is 3), and that replica count only applies to the column family it was set on.

WAL (write-ahead log)

A switch controls whether HBase writes the log before writing data. It is on by default; turning it off improves write performance, but if the system fails (the RegionServer responsible for the insert goes down), data may be lost. To configure the WAL when writing through the Java API, set the WAL flag on the Put instance by calling Put.setWriteToWAL(boolean); newer clients express the same thing through the Durability setting, as sketched below.
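
A small sketch of skipping the WAL for a single Put in newer clients via org.apache.hadoop.hbase.client.Durability (only do this for data you can afford to lose):

Put put = new Put(Bytes.toBytes("001"));
put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("Xiaoyu"));
// skip the write-ahead log for this mutation: faster, but the write is lost
// if the RegionServer crashes before the MemStore has been flushed
put.setDurability(Durability.SKIP_WAL);
table.put(put);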

Batch writes

HBase's Put supports single inserts as well as batch inserts; generally speaking, batch writes are faster and save round-trip network overhead. When using the Java API on the client, first add the Puts of a batch to a List<Put> and then call the table's put(List<Put>) method to write them in one batch.

Finally

When learning about HBase, you will find that its design is actually very similar to Elasticsearch. For example, HBase's flush & compact mechanism follows essentially the same idea as Elasticsearch's, which makes it easier to understand.

Essentially, HBase is positioned as a distributed storage system and Elasticsearch as a distributed search engine. They are not the same thing, but they complement each other. HBase's search capability is limited: it only indexes by RowKey, and advanced features such as secondary indexes have to be built yourself. So in some cases HBase and Elasticsearch are combined to provide storage plus search: HBase makes up for Elasticsearch's weaker storage capability, and Elasticsearch makes up for HBase's weaker search capability.

In fact this is not limited to HBase and Elasticsearch. Any distributed framework or system shares certain commonalities; the differences lie in their respective focuses. Xiaoyu's feeling is that when learning a piece of distributed middleware, you should first clarify its core concerns, compare it with other middleware, extract the commonalities and the distinguishing features, and then deepen your understanding from there.

