Learning notes of Hadoop peripheral components

Data agency 2021-01-22 15:57:15
learning notes hadoop peripheral components

brief introduction

Hadoop It is a well-known excellent open source distributed file storage and processing framework , It makes it easy for users to develop applications that deal with massive amounts of data , Its main advantages are :

high reliability :Hadoop The ability to store and process data bit by bit is trustworthy .

High scalability :Hadoop To allocate data among available computer clusters and complete computing tasks , These clusters can be easily extended to a number of nodes .

Efficiency :Hadoop The ability to move data dynamically between nodes , And ensure the dynamic balance of each node , So it's very fast .

High fault tolerance :Hadoop Automatically save multiple copies of data , And automatically reallocate failed tasks .

Low cost : And integrated machine 、 Business data warehouse is better than ,Hadoop Open source software costs less .

As the version evolves ,Hadoop Achieve better resource isolation and add other features , Here's a comparison Hadoop 1.0 and 2.0 The differences in the features of different versions ,Hadoop 1.0 from HDFS and MapReduce Two systems make up , There are several disadvantages :

Static resource allocation : That is, each node is configured with available slot total , these slot Once the number is started, it cannot be dynamically modified ;

Resources can't be shared : take slot It is divided into Map slot and Reduce slot Two kinds of , And it's not allowed to share ;

Too large a resource partition : Based on no category slot The granularity of resource partition is still too rough , The resource utilization of nodes is often too high or too low ;

There is no effective resource isolation mechanism : The adoption is based on jvm The resource isolation mechanism of , Too rough , A lot of resources , Such as CPU There's no way to isolate , This will cause serious interference between tasks on the same node .

Hadoop 2.0 from HDFS、MapReduce and YARN Three systems , among YARN It's a resource management system , Responsible for cluster resource management and scheduling ,2.0 in YAR Allow each node to (NodeManager) Configure available CPU And total memory resources , The central scheduler allocates these resources to the application .


HDFS(Hadoop Distributed File System) ,Hadoop Distributed file system on , fit PB Storage of a large amount of data , Extensibility is strong , High fault tolerance ( Default 3 copy ).

As shown in the figure HDFS yes Master/Slave structure , Yes NameNode、Secondary NameNode、DataNode These characters , To understand its architecture and working principle, we need to clarify the concepts :

NameNode:Master node , Manage block mapping ; Handle client read and write requests ; Configure replica policy ; management HDFS The namespace of ;

Secondary NameNode: Share responsibility namenode workload ; yes NameNode Cold backup of ; Merge fsimage and fsedits And then send it to namenode.

DataNode:Slave node , Responsible for the storage client Data block sent block; Read and write data block .

Hot backup :b yes a Hot backup of , If a Broken . that b Run now instead of a The job of .

Cold backup :b yes a Cold backup of , If a Broken . that b It can't be replaced immediately a Work . however b On storage a Some information , Reduce a The loss after failure .

Fsimage: Metadata image file ( The directory tree of the file system .)

edits: Operation log of metadata ( Modification records made for the file system )

frame :HDFS A cluster consists of a large number of DataNode form , Nodes between different racks communicate through switches ,HDFS Through the rack awareness strategy , send NameNode Be able to identify each DataNode The rack to which it belongs ID, Use the replica storage policy , To improve the reliability of data 、 Availability and utilization of network bandwidth .

Data blocks (block):HDFS The most basic storage unit , Default 128M, Users can set their own .

Metadata : finger HDFS File system , File and directory attribute information .HDFS The image file is used in the implementation (Fsimage) + Log files (EditLog) Backup mechanism of . The image file contains : Modification time 、 Access time 、 Block size 、 Storage location information of the data blocks that make up the file . The contents of the image file of the directory include : Modification time 、 Access control rights and other information . The log file records :HDFS Update operation for .NameNode When it starts , The contents of the image file and the log file are merged in memory . Update the metadata in memory to the latest state .

HDFS Reading documents

1) The client explicitly calls open() Function to open a file .

2) Backstage through RPC call NN service , Get the file block information of the file to be opened and the data node where the file is located .

3) The client explicitly calls read() function , Read data from the first block , And select the copy closest to the client .

4) After selecting the copy closest to the client , Client directly from DN Reading data .

5) After reading the current data block, continue to connect to the next copy of this file DN.

6) After reading the data , The client explicitly calls close() function . Relative to reading local file system data ,HDFS The process of reading data is complicated , But for clients , The only functions that need to be explicitly called are open()、read() and close(), Basically the same as reading data from a local file system .

HDFS Writing documents

1) Client calls create() To create a file .

2) Backstage through RPC call NN service , Create a new file in the namespace of the file system .

3) The client starts writing data . Write the data to the local temporary file first , When it comes to 1 When it's a block size , The client will be from NN obtain 1 individual DN list , At the same time, the background will cut the file block into multiple packets (packet).

4) Every packet Pipelined to NN Back to DN.

5) the last one DN For each packet, Back in the opposite direction of the write pipeline ACK, confirm packet Successfully wrote all DN.

6) Client calls close() Function close file , All remaining data will be written to DN, And close with DN The connection of .

7) notice NN Write completed .

HDFS The writing process is just as complicated , But the client only needs to operate create()、write() and close() function , Similar to writing local file system data .


MapReduce It's simple , Easy to implement and extensible . It makes it easy to write programs that run on multiple hosts at the same time , have access to Ruby、Python、PHP and C++ Such as the Java Class language map and reduce Program .MapReduce It's good for dealing with large data sets , Because it will be processed by multiple hosts at the same time , It's usually faster .

stay Hadoop in , Used to perform MapReduce There are two machine roles for tasks : One is JobTracker; The other is TaskTracker.JobTracker It's for scheduling work ,TaskTracker It's for the execution of work . One Hadoop There's only one in the cluster JobTracker. stay hadoop in , Every MapReduce The task is initialized as a Job, Every Job It can be divided into two stages :map Phase and reduce Stage .

Hadoop The core of data processing is MapReduce Programming model . One Map/Reduce Homework (job) The input data set is usually divided into several data blocks , For individual data blocks , from map Mission (task) Processing them in full parallel . The framework will be right map Sort the output of , Then enter the results into reduce Mission . Usually, the input and output of jobs are stored in the file system . therefore , Programming is mainly mapper Phase and reducer Stage .

chart . MapReduce Control flow and data flow

chart . MapReduce Data flow

Word count

Calculate the frequency of each word in the file . The output is sorted in alphabetical order of words .

Data De duplication



MapReduce The result of weight removal

2017-12-9 a

2017-12-9 b

2017-12-9 a

2017-12-10 b

2017-12-10 a

2017-12-9 b

2017-12-11 c

2017-12-11 b

2017-12-10 a

2017-12-12 d

2017-12-12 d

2017-12-10 b

2017-12-13 a

2017-12-13 a

2017-12-11 b

2017-12-14 b

2017-12-14 c

2017-12-11 c

2017-12-15 c

2017-12-15 d

2017-12-12 d

2017-12-11 c

2017-12-11 c

2017-12-13 a

2017-12-14 b

2017-12-14 c

2017-12-15 c

reduce The input should be data key, And yes value-list There is no demand for . When reduce Received a <key ,value-list> When you do, you will directly key Copy to output key in , And will value Set to null . therefore map The task of the stage is to adopt Hadoop After the default job input method , take value Set to key, And output directly ( Here's the... In the output value Null value ).

Single table Association

input data

Output data























































demand : according to child and parent Find the relationship between grandparents and grandchildren



stay Map Stage , Father son relationship with opposite son father relationship , At the same time in each value Prefix before - And + Mark this key-value Medium value Is it positive or negative order , After entering context.MapReduce The same key Different value value , Put together , push to Reduce Stage . stay value Array , According to the prefix , We can easily learn that , Which is grandparent, Which is child.

More than a table

input data

Output data
















Beijing 2011 2019







Beijing 2010 1962







tianjin 2011 1355







tianjin 2010 1299


Inner Mongolia





shanxi 2011 3593







shanxi 2010 3574


Ji Lin




demand : That is to say id by key do join operation






stay Map Stage , Mark all the data as <key,value> In the form of , among key yes id,value Different forms are chosen according to different sources : originate A The record of ,value The value of is "a#"+name; originate B The record of ,value The value of is "b#"+score.

stay reduce Stage , Let's put each one first key Under the value The list is split into two parts, each from the table A And table B The two part , Put them in two vectors . Then go through two vectors and do Cartesian product , Form one final result after another .

Hadoop flow

Hadoop Flow provides API Allow users to write in any scripting language map Function or reduce function .Hadoop The key to flow is , It USES UNIX Standard flow as a program and Hadoop Interface between . therefore , Any program that can read data from a standard input stream , And can write data to the standard output stream , Then you can pass Hadoop Streams are written in other languages MapReduce programmatic map Function or reduce function .

bin/Hadoop jar contrib/streaming/Hadoop-0.20.2-streaming.jar –input input –output output –mapper /bin/cat –reducer usr/bin/wc

As you can see from this example ,Hadoop The packets introduced by the stream are Hadoop-0.20.2-streaming.jar, And it has the following command :

-input Indicates the input file path

-output Indicates the output file path

-mapper To develop map function

-reducer Appoint reduce function

Hadoop How flow works

When an executable makes mapper when , every last map The task starts the executable as a separate process , And then in map When the task is running , The input is split into lines and provided to the executable , And as its standard input (stdin) Content . When the executable file runs the result ,map From standard output (stdout) Data collection , And turn it into <key,value> Yes , As map Output .

reduce And map identical , If the executable file acts as reducer when ,reduce The task starts the executable , And will <key,value> Convert to lines as standard input to the executable (stdin). then reduce Will collect the standard output of this executable (stdout) Content , And turn each line into <key,value> Yes , As reduce Output .

map And reduce Convert the output to <key,value> The default method is , Add the first tab Symbol ( tabs ) The previous content serves as key, The following content is as value. without tab Symbol , So all the content of this line as key, and value The value is null, You can change .

Hadoop The command of flow

Hadoop The specific contents of the stream command are shown in the following table :


Optional / Mandatory

Parameter interpretation

-input <path>


Enter the data path . Specify job input ,path It can be a file or a directory , have access to * wildcard ,-input Option can be used to specify multiple files or directories multiple times as input .

-output <path>


Output data path . Specify the job output directory ,path Must not exist , And the user executing the job must have permission to create the directory ,-output Can only be used once .



mapper Executable program or Java class , Must be specified and unique .



reducer Executable program or Java class , Must be specified and unique .

-file <file>


Distribute local files

-cacheFile <file>


distribution HDFS file

-cacheArchive <file>


distribution HDFS Compressed files

-numReduceTasks <num>


Appoint reduce The number of tasks . If you set -numReduceTasks 0 perhaps -reducer NONE There is no reducer Program ,mapper The output of is directly used as the output of the whole job .

-jobconf | -D NAME=VALUE


Job configuration parameters . Specify job parameters ,NAME Is the parameter name ,VALUE Is the parameter value , Parameters that can be specified refer to hadoop-default.xml. It is especially recommended to use -jobconf mapred.job.name='My Job Name' Set the job name , Use -jobconf mapred.job.priority=VERY_HIGH | HIGH | NORMAL | LOW | VERY_LOW Set job priority , Use -jobconf mapred.job.map.capacity=M Set up to run at the same time M individual map Mission , Use -jobconf mapred.job.reduce.capacity=N Set up to run at the same time N individual reduce Mission .



Combiner Java class



Partitioner, How to handle the output file Java class



InputFormat, How to handle the input file Java class



OutputFormat Java class



InputReader To configure

-cmdenv <n>=<v>


Pass to mapper and reducer Environment variables of

-mapdebug <path>


mapper Running on failure debug Program

-reducedebug <path>


reducer Running on failure debug Program



Detailed output mode

Hadoop The circulation command is used to configure Hadoop The flow of Job Of . It should be noted that , If you use this part of the configuration , It must be placed before the stream command configuration , Otherwise the command will fail .


Optional / Mandatory


-archives <comma separated list of archives>


Separate the unfilled files in the calculation with commas . Only for JOB.

-conf <configuration file>


Develop the configuration file for the application Specify an application configuration file.

-D <property>=<value>


Use the given property value .

-files <comma separated list of files>


Comma separated files , copy to Map reduce machine , Only for JOB

-jt <local> or <resourcemanager:port>


To specify a ResourceManager. Only for JOB.

-libjars <comma seperated list of jars>


Will be separated by commas jar The path contains to classpath In the middle , Only for JOB.

Hadoop Streaming Advantages and disadvantages

advantage :

  1. You can write it in your favorite language MapReduce Program ( You don't have to use Java)
  2. It doesn't need to be like writing Java Of MR The program is like that import A bunch of Libraries , Do a lot of configuration in the code , A lot of things are abstracted to stdio On , A significant reduction in the amount of code .
  3. Because there is no library dependency , Debug is convenient , And can be separated from Hadoop First, simulate and debug with pipeline locally .

shortcoming :

  1. It can only be controlled by command line arguments MapReduce frame , Unlike Java Can be used in the code API, The control is weak .
  2. Because there's a layer of processing in the middle , Efficiency will be slower .

Hadoop Streaming It's more suitable for some simple tasks , For example, use Python Write a script with only one or two hundred lines . If the project is complex , Or it needs to be optimized in detail , Use Streaming It's easy to see places that are tied up .


In the absence of YARN Before ,Hadoop 1.0 Version time , MapReduce Do a lot of things ,Job Tracker( Job tracker ) Do both resource management and task scheduling / monitor ,Task Tracker The division of resources is too rough ,MapReduce Achieve task assignment 、 Resource allocation 、 The framework of batch computing is as follows :

First, the user program (JobClient) Submitted a job,job Your message will be sent to Job Tracker in ,Job Tracker yes Map-reduce The center of the frame , He does the assignment first , You need to know where the data is distributed , It means to be with HDFS Metadata Server Communications ; Secondly, according to the data distribution , Assign tasks to real machines , There are basically two steps , Determine which machines are alive first 、 How much resources are left ; in addition , According to their distribution with the data , Make a task assignment strategy ; Then start to assign tasks , To put MapReduce The logic is distributed to each machine ; then , We need to monitor all the machines Mapper、Reducer The task progress of the instance , If it fails, recycle and reallocate resources , Specify the data index , Re execution

Sum up : He needs to communicate regularly with the machines in the cluster (heart beat), Need to manage which programs should run on which machines , Need to manage all job Failure 、 Restart and other operations , Also need to communicate with the data metadata Center , Understand the data distribution and so on .

TaskTracker yes Map-reduce Each machine in the cluster has a part , What he does is mainly to monitor the resources of his own machine .TaskTracker At the same time, monitor the current machine's tasks Health .TaskTracker We need to get this information through heart beat Send to JobTracker,JobTracker This information will be collected for the new submission job Assign which machines to run on .

risk :JobTracker yes Map-reduce Centralized processing point of , There are many things to manage , There is a single point of failure , State consistency is not guaranteed , It's hard to do Secondary Standby, It's not convenient to do HA,JobTracker Too many tasks lead to too much resource consumption , When MR job A lot of times , This can cause significant memory overhead , Also increased JobTracker failed risk , This is also the industry generally summed up the old Hadoop Of Map-Reduce Can only support 4000 The upper limit of the node host .

stay TaskTracker End , With map/reduce task Number as a measure of resources is too simple , Not considered CPU/ Memory usage , If two large memory consumption task It's been dispatched to a place , It's easy to show up OOM. stay TaskTracker End , Force resources into map task slot and reduce task slot, If there are only map task Or just reduce task When , It wastes resources , That is the problem of cluster resource utilization mentioned above .

At the source level , You'll find the code very hard to read , Often because of a class Too much has been done , The amount of code is up to 3000 Multiple lines cause class My task is not clear , increase bug The difficulty of repair and version maintenance .

From an operational point of view MapReduce The framework is of any importance / Unimportant changes ( for example bug Repair , Performance improvement and characterization ) when , Will force system level upgrades . To make matters worse , Force each client of the distributed cluster system to update at the same time . These updates will allow users to verify that their previous applications are new Hadoop And waste a lot of time , In the long run, this has to be removed from the list of excellent open source systems .

Hadoop 2.0 Medium MapReduce It deals with data analysis ,YARN As a resource manager ,YARN Mainly by Resource Manager、Node Manager、Application Master and Container Etc , It's a master/slave structure , Pictured :

Resource manager yes master,Node manager yes slave node .Resource manager Be responsible for each node manger Manage and schedule resources in a unified way . When a user submits an application , You need to provide a... To track and manage this program Application Master, It is responsible for Resource Manager Application resources , And ask the Node Manger Start a task that can take up some resources . Because of different Application Master Distributed to different nodes , So they don't interact with each other .

Resource Manager yes Master Last independently running process , Responsible for the unified resource management of the cluster 、 Dispatch 、 Distribution, etc ;Node Manager yes Slave Last independently running process , Responsible for reporting the status of nodes ;App Master and Container Is running on the Slave Components on ,Container yes yarn A unit in which resources are allocated , Including memory 、CPU And so on ,yarn With Container Assign resources to units .

Client towards Resource Manager Every application submitted must have a Application Master, It passes by Resource Manager After allocating resources , Run on a certain Slave Node Container in , Do things specifically Task, Also run with a certain Slave Node Container in .RM,NM,AM Even ordinary Container Communication between , It's all used RPC Mechanism .

Resource manager

RM Is a global Explorer , There is only one role in the cluster , Responsible for resource management and allocation of the whole system , Including processing client requests 、 start-up / monitor APP master、 monitor node manager etc. . It is mainly composed of two components : Scheduler (Scheduler) And Application Manager (Applications Manager,ASM).


Scheduler according to capacity 、 Restrictions such as queues ( For example, each queue allocates a certain amount of resources , Perform a certain number of tasks at most ), Allocate resources in the system to running applications . It should be noted that , The scheduler is a “ Pure scheduler ”, It no longer does any work related to specific applications , For example, not responsible for monitoring or tracking the execution status of the application , It is also not responsible for restarting failed tasks caused by application execution failure or hardware failure , It's all up to the application Application Master complete .

The scheduler allocates resources only according to the resource requirements of each application , And the resource allocation unit uses an abstract concept “ Resource containers ”(Resource Container, abbreviation Container) Express ,Container Is a dynamic resource allocation unit , It will memory 、CPU、 disk 、 Network and other resources are encapsulated together , So as to limit the amount of resources used by each task . Besides , The scheduler is a pluggable component , Users can design new schedulers according to their own needs ,YARN Provides a variety of directly available schedulers , such as Fair Scheduler and Capacity Scheduler etc. .

Application Manager (Application Master)

Application manager is responsible for managing all the applications in the whole system , Including application submission 、 Negotiate resources with the scheduler to start Application Master、 monitor Application Master Run state and restart it in case of failure, etc .

Application Master(AM)

  • management YARN Each instance of the application running in .
  • Complete data segmentation , And request resources for the application and further assign them to internal tasks .
  • Responsible for coordination from resource manager Resources for , And pass node manager Monitor easy execution and resource usage .

Node Manager(NM)

Node manager The whole cluster has multiple , Responsible for resources and usage on each node . Responsible for resource management and tasks on a single node , The treatment comes from resource manager The order of , Processing from domain app master The order of .Node manager Managing Abstract containers , These abstract containers represent specific programs that use resources for each node .Node manager Regularly to RM Report the resource usage and each Container Operating state (CPU And memory )


Container yes YARN Resource abstraction in , It encapsulates the multi-dimensional resources on a node , Such as memory 、CPU、 disk 、 Network, etc , When AM towards RM When applying for resources ,RM by AM The returned resources are used Container It means .YARN Each task is assigned a Container, And this task can only use this Container Resources described in . It should be noted that ,Container differ MR v1 Medium slot, It is a dynamic resource partition unit , It is dynamically generated according to the requirements of the application . So far, ,YARN Support only CPU And memory , And use lightweight resource isolation mechanism Cgroups Isolate resources ;



In the age of big data , Application services consist of many independent programs , These independent programs run in all kinds of 、 On the ever-changing hardware , How to make multiple independent programs in an application work together is a difficult thing . Developing such an application is easy for developers to fall into the logic of how to make multiple programs work together , Finally, there is no time to think about and implement the application logic ; Or developers don't pay enough attention to collaborative logic , Developed a simple and fragile master coordinator , Leading to unreliable single failure point .

ZooKeeper It's a distributed system / The coordinator of software , Its design guarantees the robustness of distributed programs , So that application developers can pay more attention to the logic of the application itself , Instead of working together ,ZK It's the manager of the cluster , Monitoring the status of each node in the cluster , According to the feedback of the node, the next reasonable operation is carried out , Finally, it will be easy to use interface and performance efficient 、 A stable system is provided to the user .

Distributed applications can be based on ZooKeeper Implementation such as data publishing / subscribe 、 Load balancing 、 Naming service 、 Distributed coordination / notice 、 Cluster management 、Master The election 、 Configuration maintenance , Name service 、 Distributed synchronization 、 Distributed lock and distributed queue .

Hadoop Use in Zookeeper Event handling ensures that the entire cluster has only one NameNode, Store configuration information, etc ;HBase Use Zookeeper Event handling ensures that the entire cluster has only one HMaster, To detect HRegionServer Online and down , Store access control lists, etc ;Kafka Use ZK Monitoring node status 、 Store offset , And choose the main broker.



To put it simply ,zookeeper= file system + A notification mechanism .

file system :Zookeeper Maintain a file system like data structure :

Each subdirectory entry is like NameService It's called znode, Like the file system , Be able to freely increase 、 Delete znode, In a znode Add... Below 、 Delete child znode, The only difference is znode It can store data ,znode There are four types of :

  • PERSISTENT- Persistent directory node

The client and zookeeper After disconnection , The node still exists

  • PERSISTENT_SEQUENTIAL- Persist the sequential numbering of the directory nodes

The client and zookeeper After disconnection , The node still exists , It's just Zookeeper Number the node name in order

  • EPHEMERAL- Temporary directory node :

The client and zookeeper After disconnection , The node is deleted

  • EPHEMERAL_SEQUENTIAL- Temporary sequential numbering of directory nodes

The client and zookeeper After disconnection , The node is deleted , It's just Zookeeper Number the node name in order ;

A notification mechanism : The client registers to listen to the directory nodes it cares about , When the directory node changes ( Data change 、 Be deleted 、 Add and delete subdirectory nodes ) when ,zookeeper The client will be notified .

To configure

ZK Control management through configuration file (zoo.cfg The configuration file ), Some parameters are optional , Some parameters are required . These necessary parameters constitute ZooKeeper Minimum configuration requirements for configuration documentation , Native conf Not in directory zoo.cfg;

Zoo.cfg Configuration of :
# the port at which the clients will connect

//zookeeper External communication port , No modification is required by default


# The number of milliseconds of each tick
//Zookeeper Server heartbeat time , Unit millisecond 


# The number of ticks that the initial
# synchronization phase can take
// Vote for the new leader Initialization time of 
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
//Leader And Follower The unit of maximum response time between , Response exceeded syncLimit*tickTime,Leader Think Follwer Die , Remove... From the server list Follwer
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
// Data persistence Directory , There are also nodes ID Information , You need to create and specify 
// Log save path , This directory must be created manually to specify , Otherwise, error is reported when starting .
//Session Timeout limit , If the timeout set by the client is not in this range , It will be forced to set the maximum or minimum time . default Session The timeout is in 2 *tickTime ~ 20 * tickTime This range 
# The number of snapshots to retain in dataDir
// This parameter is used with the following parameters , This parameter specifies the number of files to keep . The default is to keep 3 individual .(No Java system property)New in 3.4.0
# Purge task interval in hours
# Set to "0" to disable auto purge feature
// As mentioned above ,3.4.0 And later ,ZK The function of automatically cleaning transaction log and snapshot file is provided , This parameter specifies the cleaning frequency , In hours , You need to configure a 1 Or a larger integer , The default is 0, Indicates that the automatic cleaning function is not turned on , But it can run bin/zkCleanup.sh To manually clean up zk journal .
// To configure zookeepe The port for communication and election between nodes in the cluster , among 2888 The port number is zookeeper Listening port for communication between services , and 3888 yes zookeeper Election communication port .server.N N Representing this node ID Number , You need to manually specify the number of each node , The number can't be repeated ;
 Configure the cluster node number myid
 New file myid( stay zoo.cfg Configured dataDir Under the table of contents , Here is /home/xxx/zookeeperxxx/data), bring myid The value of and server The same number as , such as namenode Upper myid:1;datanode1 Upper myid:2, And so on ;
 To configure log4j.properties:
 stay ~/zookeeper/conf/ Under the path log4j.properties file , Need modification host And others log Path configuration information ;

zoo.cfg Optimize


1: Default jvm No configuration Xmx、Xms Etc , Can be in conf Create under directory java.env file 
export JVMFLAGS="-Xms512m -Xmx512m $JVMFLAGS"
2:log4j To configure , because zk It's through nohup Starting up , There will be one. zookeeper.out Log files , This file records the output to console Log .log4j Just configure the output to console that will do ,zookeeper.out Over time, it's going to get bigger , Put it on a large disk .
3:zoo.cfg In file ,dataDir It's for storing snapshot data ,dataLogDir It's for pre write logs . These two directories should not be configured as one path , To configure to a different disk . If the disk is used raid, The system is just a disk , It can also be configured on a disk . The part of the pre write log has a great impact on the performance of the write request , Guarantee dataLogDir The performance of the disk is good .
4:zoo.cfg In file skipACL=yes, Ignore ACL verification , It can reduce the operation of authority verification , A little bit more performance .
5:zoo.cfg In file forceSync=no, This is very helpful to improve the performance of write requests , It means that every time you write a request, the data must be sent from pagecache It's fixed to disk , It's a successful return . When the number of write requests reaches a certain level , Subsequent write requests will wait for previous write requests forceSync operation , Cause a certain delay . If we pursue low latency write requests , To configure forceSync=no, Data written to pagecache Then go back to . But when the machine goes off ,pagecache There is a risk of data loss in .
6:zk Of dataDir and dataLogDir Under the path , If not configured zk Automatic cleaning , Will continue to add data files . It can be configured as zk The system automatically cleans up data files , But if we want the highest performance of the system , It is recommended to manually clean up the files :zkCleanup.sh -n 3 So keep three documents .
7: see zk Node status . Restart zk Before and after the node , Be sure to check the status 
echo ruok | nc host port
echo stat | nc host port
8: To configure fsync.warningthresholdms=20, In milliseconds , stay forceSync=yes When , If the data is solidified to disk fsync exceed 20ms When , Will be in zookeeper.out Output one in warn journal . This is currently zk Of 3.4.5 and 3.5 Version has bug, stay zoo.cfg Configuration does not work in . My approach is to conf/java.env Add java System attribute :
export JVMFLAGS="-Dfsync.warningthresholdms=20 $JVMFLAGS"

zkcli Use

because zookeeper File system like features , therefore ,zkCli The operation of is similar to the common operation in the file system : Additions and deletions 、 Resource management 、 Authority control and so on .

Establish a session connection :./zkCli.sh -timeout 0 -r -server ip:port

-timeout: Specifies the timeout for the current session .zookeeper Depending on the heartbeat with the client to determine whether a session is valid ,timeout If the server is in timeout The heartbeat packet of the client is not received within the specified time , I think this client is invalid . Unit millisecond .

-r:read-only.zookeeper Read only mode refers to zookeeper If you lose connection with half or more servers in the cluster , The server is no longer processing client requests , But when that happens , Machines can provide read services to the outside world , In this case, you can use read-only mode .

-server: Indicates the address and port of the server you want to connect to .

Enter the interface and start using zkClient, Press h View usage help : You know ,zkClient Common operations and file systems are much the same , Including view 、 newly added 、 modify 、 Delete 、 The quota 、 Authority control, etc .

zkClient The query includes the data of the node and the status of the node . There are mainly uses stat List the status of the node ; Use get Get the data of the node ; Use ls List the child nodes of the node ; Use ls2 List the list of child nodes and the status of the node at the same time ;

Get the status of the node , Usage method :stat path

stay zookeeper in , Every write to the data node ( Like creating a node ) It's considered a business , For each transaction, the system assigns a unique id To identify this transaction ,cZxid It means business id, Indicates in which transaction the node was created ;

ctime: Indicates when the node was created ;

mZxid: Transaction at last update id;

mtime: Last update time ;

pZxid: Represents the last modified transaction in the list of child nodes of the node id( Add children to the current node , Deleting one or more child nodes from the child nodes of the current node will change the list of child nodes of the node , The data content of the modified node is not in this column );

cversion = -1,dataVersion = 0,aclVersion = 0 It's been introduced in the first blog , Represents the version of the child node list , The version of the data content ,acl edition ;

ephemeralOwner: For temporary nodes , Represents the transaction that created the temporary node id, If the current node is permanent , This value is fixed , by 0;

datalength Indicates the length of the data stored in the current node ;

numChildren Represents the number of child nodes owned by the current node ;

Get the list of child nodes and stat The node , Usage method :ls path or ls2 path

Get the data of the node , The result is the value of the current node and stat The values of the path are put together . Usage method :get path

Use delete path Delete... Without child nodes node, Delete a node that has children :rmr path;

ACL Authority control

ACL Its full name is Access Control List Access control list , Used to control access to resources .ZK Similar file system ,Client You can create it on top 、 to update 、 Delete and other rights control ,5 The operation permissions are :CREATE、READ、WRITE、DELETE、ADMIN , That is to say, increase 、 Delete 、 Change 、 check 、 Administrative authority , this 5 The kind of permission is abbreviated as crwda( The initials of each word );

In a traditional file system , By default, a file or subdirectory inherits from the parent directory ACL. And in the Zookeeper in ,znode Of ACL There is no inheritance , Every znode All of them are independently controlled , Only the client can satisfy znode When setting permission requirements for , In order to complete the corresponding operation .

Zookeeper Of ACL, There are three dimensions :scheme、id、permission, Usually expressed as scheme:id:permission,schema On behalf of authorization policy ,id Table users ,permission Table permissions .


scheme Corresponding to which scheme is used for authority management ,zookeeper Of scheme They are classified as follows :

world: Default mode , It's almost universally accessible ;

auth: Represents an authenticated user (cli Through addauth digest user:pwd To add an authorized user in the current context )

digest: Use account : Password authentication , This is also the most commonly used in business systems ;

ip: Use Ip Address the authentication

1) Add an authenticated user

addauth digest user name : Passwords plaintext

example :addauth digest user1:password1

2) Set the permissions

setAcl /path auth: user name : Passwords plaintext : jurisdiction

example :setAcl /test auth:user1:password1:cdrwa

3) see Acl Set up :getAcl /path


id It's validation mode , Different scheme,id The value of is also different .scheme by ip when ,id The value of is the client's ip Address .scheme by world when ,id The value of is anyone.scheme by digest when ,id The value of is :username:BASE64(SHA1(password)).


zookeeper Currently, the following permissions are supported :

  • CREATE(c): Create permissions , Can be in the current node Create child node, That is, to the child nodes Create operation
  • DELETE(d): Delete permission , You can delete the current node, That is, to the child nodes Delete operation
  • READ(r): Read permission , You can get the current node The data of , Sure list At present node be-all child nodes, That is, for this node GetChildren and GetData operation
  • WRITE(w): Write permissions , To the present node Writing data , That is, for this node SetData operation
  • ADMIN(a): Administrative authority , You can set the current node Of permission, That is, for this node setAcl operation


HBase It's built on HDFS Distributed column storage system on , It is mainly used for massive structured data storage , Logically speaking ,HBase Put the data according to the table , Column , Line to store .

HBase And HDFS contrast

Both have good fault tolerance and scalability , Can be extended to hundreds of nodes .HDFS Suitable for batch processing scenarios , Random search of data is not supported , Not suitable for incremental data processing , Data update not supported ;

HBase characteristic

Big : A table can have billions of rows , Millions of column .

Modeless : Each row has a sortable primary key and any number of columns , Columns can be added dynamically as needed , Different rows in the same table can have different columns .

For the column : For the column ( family ) Storage and rights control for , Column ( family ) Independent search .

sparse : For the empty (null) The column of , It doesn't take up storage space , Tables can be designed very sparsely .

Multiple versions of data : Data in each cell can have multiple storage versions , By default, the version number is assigned automatically , Is the timestamp of the cell insertion time .

Single data type :HBase The data in is a string , There is no type .

HBase framework

Client: Include access HBase The interface of , And maintain cache To speed up HBase The interview of ;

Zookeeper: Ensure that there is only one in the cluster at any time Master, Store all Region Address entry for , Real-time monitoring Region server Online and offline information of , And inform Master, Storage HBase Of schema and table Metadata ;

Master: by Region server Distribute region; be responsible for Region server Load balancing of , Found inoperative Region server And reallocate the region, Manage user pairs table The operation of adding, deleting, modifying and checking ;

Region server:Region server maintain region, Deal with these region Of IO request ;Region server Responsible for the segmentation in the process of running become too large region;

HBase Data model

HBase Is based on Google BigTable Typical of model development key/value System .

HBase Logical view

RowKey: yes Byte array, It's every record in the table “ Primary key ”, Easy and fast search ,Rowkey The design of is very important ;

Column Family: Column family , Have a name (string), Contains one or more related columns ;

Column: Belong to one of columnfamily,familyName:columnName, Each record can be added dynamically ;

Version Number: The type is Long, The default is the system timestamp , It can be customized by users ;

Value(Cell):Byte array.

surface webtable It contains two lines (com.cnn.www and com.example.www) And called contents、anchor and people The three families of columns of . For the first line (com.cnn.www),anchor There are two columns (anchor:cssnsi.com,anchor:my.look.ca), also contents Include a column (contents:html). Ben

The line of key com.cnn.www Row has 5 A version , com.example.www There is a version of the line .contents:html The column qualifier contains the entire... Of a given web site HTML.anchor Each qualifier of the column family contains the external site linked to the site represented by the row and its anchor point in its link (anchor) Text used in .people Column families represent people associated with the site .

As agreed , A column name consists of its column family prefix and qualifier . for example , Column content : html By column family contents and html The qualifier consists of . The colon character (:) Separate column families from column family qualifiers .

webtable The table is shown below :

The empty cells in this table appear in HBase Not occupying space or actually existing in . This is what makes HBase “ sparse ” Why . A tabular view is not a view HBase The only possible way to get data , Even the most accurate .

HBase Physical view

Every column family Stored in HDFS In a separate file on the , Null values are not saved .Key and Version number At every column family There's one of them ;HBase A multi-level index is maintained for each value , namely :<key, columnfamily, columnname, timestamp>; The table is divided into multiple rows in the direction of the row Region;Region yes Hbase The smallest unit of distributed storage and load balancing in , Different Region Distributed to different RegionServer On .Region Divided by size , As the data grows ,Region Growing , When it reaches a threshold ,Region It will be divided into two new Region;

Region It's the smallest unit of distributed storage , But it's not the smallest unit of storage . Every Region It contains many Store object . Every Store Contains a MemStore Or some StoreFile,StoreFile Contains one or more HFile.MemStore Store in memory ,StoreFile Stored in HDFS On .

Although in HBase In the logical view , A table is treated as a set of sparse rows , But they are physically stored by column family . New column qualifiers can be changed at any time (column_family:column_qualifier) Add to an existing column family .

ColumnFamily anchor surface :

ColumnFamily contents surface :

HBase Empty cells in logical views do not store . So the timestamp is t8 Of contents:html Requests for column values will return no values . Again , At the time stamp t9 In a anchor:my.look.ca The request for a value does not return any value . however , If a timestamp is not provided , The latest value of a specific column is returned . Given multiple versions , The most recent and the first to find , Because timestamps are stored in descending order . therefore , If no time stamp is specified , That's right com.cnn.www The request for values for all columns in will be : Time stamp t6 Medium contents:html, Time stamp t9 in anchor:cnnsi.com Value , Time stamp t8 in anchor:my.look.ca Value .

HBase surface 、 Row and column family

HBase The middle table is in schema Defined in advance , You can use the following command to create a table , The table name and column family name must be specified here . stay HBase shell The syntax for creating a table in is as follows :

HBase The line in is a logical line , The physical model is a family of columns (colomn family) Separately accessed , Row keys are unexplained bytes , The rows are in alphabetical order , The lowest order appears first in the table . An empty byte array is used to represent the beginning and end of a table namespace .

Apache HBase Columns in are grouped into column families . All column members of a column family have the same prefix . for example ,courses:history and courses:math All are courses Members of the lineage . The colon character (:) Separate column families from column family qualifiers . The column family prefix must consist of printable characters . Limit the tail , The column family qualifier can consist of any byte . Must be in schema Declare the column family in advance when defining , And columns don't need to be in schema Time definition , But it can dynamically become a column when the table starts and runs .

In Physics , All column family members are stored on the file system together . Because of tuning (tunings) And storage (storage) The specification is done at the column family level , Therefore, it is recommended that all column family members have the same general access pattern and size characteristics .

from {row key, column( =<family> + <label>), version} The only identified unit .cell There is no type of data in , All in bytecode form .

ROOT Table and META surface

HBase All of the Region Metadata is stored in .META. In the table , With Region Increase of ,.META. The data in the table will also increase , And split into several new Region. In order to locate .META. Each of the tables Region The location of , hold .META. All in the table Region The metadata of is stored in -ROOT- In the table , Finally by Zookeeper Record -ROOT- Table location information . Before all clients access user data , Need to visit first Zookeeper get -ROOT- The location of , And then visit -ROOT- Watch get .META. The location of the table , According to the .META. The information in the table determines where the user data is stored , As shown in the figure below .

-ROOT- The watch will never be divided , It has only one Region, This can ensure that you can locate any one of the three jumps at most Region. In order to speed up the visit ,.META. All of the watch Region All in memory . The client will cache the queried location information , And the cache will not fail . If the client can't access the data according to the cache information , Ask about .META. Tabular Region The server , Where to try to get data , If it still fails , Ask -ROOT- Table related .META. Where is the watch . Last , If all the previous information fails , Through ZooKeeper Repositioning Region Information about . So if all the cache on the client is invalid , You need to do 6 Back and forth on the Internet , To locate the right Region.

HBase Data model operations

HBase There are four main data operations in , Namely :Get、Put、Scan and Delete.

Get( Read ):Get Specifies the return property of the row . Read through Table.get perform .

Get The syntax of the operation is as follows :

In the following get In the command example , We scanned emp First row of table :

Read the specified column : Here's how to use get Operation to read the specified column Syntax :

The example given below is for reading HBase Specific columns in the table :

Put( Write ):Put You can add new rows to the table ( If the item is new ) Or you can update an existing row ( If the item already exists ).Put The operation passes Table.put(non-writeBuffer) or Table.batch(non-writeBuffer) perform .

Put The command for the operation is as follows , In this grammar , You need to indicate the new value :

The new given value will replace the existing value , And update the line .

Put The operation sample : hypothesis HBase There is a table in EMP Have the following data :

The following command names the employee “raju” The city value of is updated to “Delhi”

The updated table is as follows :

Scan( scanning ):Scan Allows iterations of specified attributes over multiple rows .Scan The syntax of the operation is as follows :

Delete( Delete ):Delete Operation is used to delete a row from a table .Delete adopt Table.delete perform .HBase Data will not be modified , So by creating something called tombstones To handle Delete operation . these tombstones, And useless value , All cleaned up in heavy compaction . Use Delete The syntax of the command is as follows :

Here's an example of deleting a specific cell :

Delete all cells of the table : Use “deleteall” command , You can delete all cells in a row . The following is deleteall The syntax of the command :

Here is the use of “deleteall” Command deletion emp surface row1 An example of all the units of .

Use Scan Command validation table . The snapshot of the deleted table is as follows :

HBase Read and write flow

HBase Use MemStore and StoreFile Store updates to tables . The data is first written to HLog and MemStore.MemStore The data in is sorted , When MemStore Accumulate to a certain threshold , I'm going to create a new one MemStore, And will be old MemStore Add to Flush queue , By a separate thread Flush To disk , Become a StoreFile. meanwhile , The system will be in Zookeeper Record a CheckPoint, Indicates that the data changes before this time have been persistent . When the system has an accident , May lead to MemStore Data loss in , At this time to use HLog To restore CheckPoint Later data .

StoreFile Is read-only , Once created, it cannot be modified . therefore Hbase The update of is actually a continuous operation . When one Store Medium StoreFile When a certain threshold is reached , There will be a merge operation , Will be for the same key The changes are combined , Form a big one StoreFile. When StoreFile After a certain threshold value is reached , I'll be right again StoreFile Carry out segmentation operation , Divide equally into two StoreFile.

Write operations

1、Client adopt Zookeeper The scheduling , towards RegionServer Send a write data request , stay Region Write data in .

2、 Data is written to Region Of MemStore, until MemStore Reach a preset threshold .

3、MemStore The data in is Flush Become a StoreFile.

4、 With StoreFile The increasing number of documents , When its quantity increases to a certain threshold , Trigger Compact Merge operation , Will be multiple StoreFile Merge into one StoreFile, At the same time, version merging and data deletion .

5、StoreFiles Through constant Compact Merge operation , Gradually form a bigger and bigger StoreFile.

6、 Single StoreFile When the size exceeds a certain threshold , Trigger Split operation , Put the present Region Split become 2 A new one Region. Father Region Will be offline , new Split Out of 2 Height Region Will be HMaster Assign to the corresponding RegionServer On , Make original 1 individual Region The pressure is diverted to 2 individual Region On .

It can be seen that HBase Only add data , All updates and deletions are done later Compact In the course of , Make the user's write operation as long as the memory can be returned immediately , Realized HBase I/O The high function of .

Read operations

1、Client visit Zookeeper, lookup -ROOT- surface , obtain .META. Table information .

2、 from .META. Look up , Get the Region Information , To find the corresponding RegionServer.

3、 adopt RegionServer Get the data you need to find .

4、Regionserver The memory of is divided into MemStore and BlockCache Two parts ,MemStore Mainly used to write data ,BlockCache Mainly used to read data . Read the request first MemStore Middle search data , If you can't find it, you will find BlockCache Intermediate investigation , If we don't find out, we'll find StoreFile To read a , And put the results of the reading in BlockCache.

5、 Addressing process :client-->Zookeeper-->-ROOT- surface -->.META. surface -->RegionServer-->Region-->client



Hive It's based on Hadoop Data warehouse infrastructure on . It provides a series of tools for data extraction, transformation and loading (ETL), It's a way to store 、 Queries and analysis are stored in Hadoop The mechanism of large-scale data in .Hive A simple class is defined SQL query language , be called HQL, It allows familiarity SQL User query data for . meanwhile , The language also allows familiarity MapReduce Developer's development custom mapper and reducer To handle built-in mapper and reducer Complex analysis that can't be done .

Hive Its structure can be divided into the following parts :

① The user interface : Include CLI, Client, WUI

② Metadata Store : Usually stored in a relational database such as mysql, derby in

③ Interpreter 、 compiler 、 Optimizer 、 actuator

④ Hadoop: use HDFS For storage , utilize MapReduce Calculate

① There are three main user interfaces :CLI,Client and WUI. The most common one is CLI,Cli When it starts , It will start one at the same time Hive copy .Client yes Hive The client of , The user is connected to Hive Server. Start up Client In mode , Need to point out that Hive Server The node , And start at this node Hive Server.WUI It's through a browser Hive.

② Hive Store metadata in a database , Such as mysql、derby.Hive The metadata in includes the name of the table , The columns and partitions of the table and their properties , Table properties ( External table or not ), Table data directory, etc .

③ Interpreter 、 compiler 、 Optimizer complete HQL Lexical analysis of query statements 、 Syntax analysis 、 compile 、 Optimization and generation of query plan . The generated query plan is stored in HDFS in , And then MapReduce Calls to perform .

④ Hive Data stored in HDFS in , Most queries are made by MapReduce complete ( contain * Query for , such as select * from tbl Will not generate MapRedcue Mission ).

Hive and RDB similarities and differences

  • query language : because SQL It is widely used in data warehouse , therefore , Specifically for Hive The features of the design class SQL Query language of HQL. be familiar with SQL Developers can easily use Hive Development .
  • Data storage location :Hive It's based on Hadoop Above , all Hive The data is stored in HDFS Medium . Database can store data in block device or local file system .
  • data format :Hive There is no specific data format defined in , Data format can be specified by user , User defined data formats need to specify three properties : Column separator ( Usually a space 、”\t”、”\x001″)、 Line separator (”\n”) And how to read file data (Hive There are three file formats by default in TextFile,SequenceFile as well as RCFile). Because in the process of loading data , No need to go from user data format to Hive Conversion of defined data formats , therefore ,Hive No changes will be made to the data itself during loading , And just copy or move the data content to the corresponding HDFS Directory . And in the database , Different databases have different storage engines , Defined its own data format . All data will be stored in a certain organization , therefore , The process of loading data in a database is time consuming .
  • Data update : because Hive It is designed for data warehouse application , The content of data warehouse is read more and write less . therefore ,Hive Overwrites and additions to data are not supported , All data is determined at load time . And the data in the database usually needs to be modified frequently , So you can use INSERT INTO ... VALUES Add data , Use UPDATE ... SET Modifying data .
  • Indexes : I've said before ,Hive There is no processing of the data during loading , It doesn't even scan the data , Therefore, there is no data Key Index .Hive To access a specific value in the data that meets the criteria , Need to brutally scan the whole data , So access latency is high . because MapReduce The introduction of , Hive Data can be accessed in parallel , So even if there's no index , Access to large amounts of data ,Hive It still shows the advantages . In the database , Usually one or more columns are indexed , So access to a small amount of data with specific conditions , Databases can be very efficient , Lower delay . Because of the high latency of data access , To determine the Hive Not suitable for online data query .
  • perform :Hive Most queries in are executed through Hadoop Provided MapReduce To achieve ( similar select * from tbl The query for does not need MapReduce). The database usually has its own execution engine .
  • Execution delay : Mentioned before ,Hive When querying data , Because there is no index , The entire table needs to be scanned , So the delay is high . Another cause Hive The high delay factor is MapReduce frame . because MapReduce It has a high latency of its own , So I'm using MapReduce perform Hive When inquiring , There will also be higher delays . Relative , Database execution latency is low . Of course , This low is conditional , That is, the data scale is small , When the scale of data exceeds the processing capacity of database ,Hive The advantages of parallel computing are obvious .
  • Extensibility : because Hive It's based on Hadoop Above , therefore Hive The scalability of is and Hadoop The scalability of is consistent ( The biggest in the world Hadoop Cluster in Yahoo!,2009 The scale of the year is 4000 Station node left and right ). And the database ACID The strict limitation of semantics , Expansion lines are very limited . The most advanced parallel database at present Oracle In theory, the expansion ability is only 100 Around the table .
  • Data scale : because Hive Build on clusters and make use of MapReduce Parallel computation , So it can support a lot of data ; Corresponding , Database can support smaller data size .


Redis It's an open source , Use ANSI C language-written 、 Support network 、 Log type that can be memory based or persistent 、Key-Value database . Use as database 、 Caching and message broker . It is often referred to as a data structure server ,Redis Support for storage Value Types include String( character string )、list( Linked list )、set( aggregate )、zset(sorted set – Ordered set ) and hash( Hash type ), These data types support push/pop、add/remove And take intersection, union and difference sets and more abundant operations , And these operations are atomic .

Redis With built-in replication 、Lua Script 、LRU Take back 、 Transactions and different levels of persistence on disk , And pass Redis Sentinel and Redis Automatic partitioning of clusters provides high availability .

Redis Persistence

Redis There are two persistence schemes :RDB and AOF;

RDB: Create point in time based snapshots of data sets at certain intervals .RDB Persistence is to make the present Redis A snapshot of a data set in memory is written to disk , That is to say Snapshot snapshot ( All key value pairs in the database ). By default ,Redis Save a snapshot of the dataset to disk , be known as dump.rdb Binary file . Recovery is to read the snapshot file directly into memory .

AOF: Better persistence than snapshot , Because in use aof Persistent mode Redis Will pass every written order received write Function appended to file ( The default is appendonly.aof). When Redis During restart, the contents of the whole database will be rebuilt in memory by executing the write command saved in the file again . Of course, because os Will be cached in the kernel write Changes made , So it may not be written to disk immediately . such aof The persistence of the method is also likely to lose some modifications . But we can tell... Through the configuration file Redis We want to pass fsync The function forces os Time to write to disk .

Value data type

Redis Of Key It's a string type , because Key No binary safe String , So it's like “my key” and “mykey\n” This includes spaces and line breaks Key It's not allowed . Design Key A few rules for :

  • length :Key Don't be too short. , It affects readability . Don't be too long , It's not just memory , And the cost of finding such key values in the data is very high ;
  • Life cycle : If it's small db To use the , What may need to be considered is data persistence and consistency , But it's just that Redis If you use it as a cache , Then, we must be responsible for the relevant Key Do life cycle management ;
  • Prefix : It is suggested to prefix business name with , Colon segmentation is used to construct some regular Key name . Such as business name : Table name :ID;
  • character : Special characters are not allowed , Such as space 、 Line break 、 Single and double quotation marks and other escape characters .


The most conventional set/get operation ,Value It can be String It can also be numbers , Generally do some complex counting function cache .


here Value It's storing structured objects , It is more convenient to operate one of the fields . When bloggers do single sign on , This data structure is used to store user information , With cookieId As Key, Set up 30 Minutes is the cache expiration time , It can simulate the similar session The effect of .


Use List Data structure of , Can do simple message queuing function . There's another one , You can use lrange command , Based on Redis The paging function of , Excellent performance , Good user experience . I also use a scene , Very suitable --- Take market information . It's also a producer and consumer scenario .LIST It can complete the queue very well , The first in, first out principle .


because set It's a collection of distinct values . So we can do the function of global de duplication . Why not JVM Self contained Set Deduplication ? Because our system is generally a cluster deployment , Use JVM Self contained Set, More trouble , Do you want to do a whole thing for one , Another public service , It's too troublesome . in addition , Is to use intersection 、 Combine 、 Subtraction and so on , We can calculate common preferences , All my hobbies , Their own unique preferences and other functions .

sorted set

sorted set One more weight parameter score, The elements in the set can press score Arrange . Can do leaderboard application , take TOP N operation .

Expiration strategy

Redis The expiration strategy of is when Redis Cached Key Out of date ,Redis How to deal with it . There are usually three expiration strategies :

Timed expiration : Each set expiration time of Key All need to create a timer , It will be cleared immediately after the expiration time . This strategy can immediately clear expired data , It's very memory friendly ; But it will take up a lot of CPU Resources to process overdue data , This affects the response time and throughput of the cache .

Inertia expires : Only when visiting one Key when , Only then can we judge that Key Has it expired , If it expires, it will be removed . This strategy can maximize savings CPU resources , But it's not very memory friendly . In extreme cases, there may be a large number of overdue Key Not visited again , So it won't be removed , Take up a lot of memory .

Expire regularly : Every once in a while , Will scan a certain number of databases expires A certain number of in the dictionary Key, And remove the expired Key. The strategy is a compromise between the first two . By adjusting the time interval of timing scanning and the limited time of each scanning , Can make in different situations CPU And memory resources to achieve the best balance effect .

expires The dictionary will save all the expired Key The expiration date of , among ,Key Is a pointer to a key in the key space ,Value Is the millisecond precision of the key UNIX The expiration time indicated by the time stamp . Bond space refers to the Redis All keys stored in the cluster .

Redis Two expiration strategies, lazy expiration and periodic expiration, are used simultaneously in .

Problems with expiration policy : because Redis Regular deletion is a random sampling check , It's impossible to scan and remove all expired Key And delete , Then some Key Because it was not requested , Lazy deletion is not triggered either . such Redis It's going to take up more and more memory . At this point, we need the memory knockout mechanism .

Memory retirement strategy

Redis Our memory elimination strategy is Redis When there is not enough memory for the cache , How to deal with data that needs to be newly written and needs to apply for additional space .

The maximum memory is set by setting maxmemory To complete , The format is maxmemory bytes , When the memory currently in use exceeds the set maximum memory , It's time to release the memory , When memory release is needed , You need to use some kind of policy to delete the saved objects .Redis There are six strategies .

Redis When the memory exceeds the limit , According to the configuration strategy , Eliminate the corresponding Key-Value, So that memory can continue to leave enough space to save new data .Redis After determining a key value pair , Will delete the data and publish the data change message to the local (AOF Persistence ) And slave ( Master slave connection ).

volatile-lru: Use LRU Algorithm for data elimination , Only those with a set expiry date will be eliminated Key ;(LRU Algorithm : Least Recently Used Abbreviation , Elimination of the last use of the earliest , And the least used Key)

allkeys-lru: Use LRU Algorithm for data elimination , be-all Key Can be eliminated ;

volatile-random: Random elimination data , Only those with a set expiry date will be eliminated Key;

allkeys-random: Random elimination data , be-all Key Can be eliminated ;

volatile-ttl: Eliminate the shortest remaining period of validity Key;

no-enviction: Don't delete any data ( but Redis It will also be released according to the reference counter ), When there is not enough memory , It will directly return an error .

Redis colony

Redis stay 3.0 Cluster is not supported before version ,3.0 I want to build before the release Redis The cluster needs middleware to find the corresponding node of storage value and value .Redis There's a built-in... In the cluster 16384 Hash slot , When need is in Redis Place one in the cluster Key-Value when ,Redis First pair Key Use crc16 The algorithm works out a result , Then get the result right 16384 Mod , So each of them Key They all have a number in 0-16383 The Hashi trough between ,Redis The hash slot will be mapped to different nodes approximately equally according to the number of nodes .

Redis There are multiple in the cluster Redis Servers will inevitably hang up .Redis Cluster servers communicate with each other through ping-pong Determine whether the node can be connected . If more than half of the nodes go ping There is no response when a node , The cluster thinks that this node is down .

Master-slave structure

Redis It supports three master-slave structures , Namely :

One master to one slave : It is often used to write a large number of requests , And when you need persistence , Only on from node AOF Persistence , This not only ensures the performance of the master node, but also ensures the security of the data ; But when you restart the master node, you need to break the replication relationship of the slave node first , Otherwise, when the master node restarts, because there is no persistent data , So the data of the master node is empty , At this time, the data of the master node will be lost if the slave node resynchronizes the data of the master node .

One master to many followers : It is mostly used in the case of high read request , Through the separation of read and write, the read request is handed over to the slave node to share the pressure of the master node ; At the same time, some dangerous or time-consuming operations in the development can also be performed on the slave node ; disadvantages : When there are too many slave nodes , It will result in a piece of data from the master node to be sent to many slave nodes , Therefore, the load and bandwidth consumption of the primary node will be large .

Tree master slave structure : This structure solves the problem that the bandwidth consumption of the master node is too large when there are too many slave nodes mentioned above , The master writes data to fewer slaves , Then the slave node synchronizes to its own slave node .

Master slave copy

To achieve larger storage capacity and bear high concurrent access of distributed database , We will store the data of the original centralized database to multiple other network nodes .Redis To solve this single node problem , Multiple replicas of data replication will also be deployed to other nodes for replication , Realization Redis High availability , Realize redundant backup of data , To ensure high availability of data and services .

Master slave copy , It means to put one Redis Server data , Copy to other Redis The server . The former is called the main node (master), The latter is called the slave node (slave), Data replication is one-way , From master to slave only .

By default , Each station Redis Servers are all primary nodes ; And a master node can have multiple slave nodes ( Or no slave ), But a slave node can only have one master node , The role of master-slave replication :

data redundancy : Master-slave replication realizes hot backup of data , It's a way of data redundancy beyond persistence .

Fault recovery : When there is a problem with the master node , Can be served by a slave node , Fast fault recovery ; It's actually a redundancy of services .

Load balancing : On the basis of master-slave replication , Cooperate with the separation of reading and writing , Write service can be provided by the master node , Read service provided by slave node ( The write Redis Connect master node when data is applied , read Redis Apply connection from node when data ), Share server load ; Especially in the situation of less writing and more reading , Sharing read load through multiple slave nodes , Can be greatly improved Redis Concurrency of servers .

Read / write separation : Can be used to achieve read-write separation , Main library write 、 Read from library , Read write separation can not only improve the load capacity of the server , At the same time, according to the change of demand , Change the number of slave Libraries .

High availability cornerstone : In addition to the above functions , Master slave replication is also the foundation for sentinels and clusters to implement , So master-slave replication is Redis High availability Foundation .

Enable master-slave replication from the slave node , Yes 3 Ways of planting :

1、 The configuration file : Add to profile from server :


2、 Start command :redis-server Add after starting command :


3、 Client commands :Redis After the server starts , Execute commands directly from the client :

slaveof<masterip><masterport>, Then Redis Instance becomes slave .

adopt info replication Command to see some of the copied information .

The master-slave replication process can be roughly divided into 3 Stages : Connection establishment phase ( Preparation stage )、 Data synchronization phase 、 Command propagation stage . Execute from node slaveof After the command , The replication process starts , The following figure shows that the replication process can be roughly divided into 6 A process .

The process can also be seen in the log record after the master-slave configuration .

1、 Save master (master) Information : perform slaveof after Redis The following logs will be printed :

2、 Establish network connection between slave node and master node

From the node (slave) Internal maintenance of replication related logic through scheduled tasks running per second , When the scheduled task finds that there is a new master node , Will attempt to establish a network connection with this node .

Establish network connection between slave node and master node . The slave node will create a socket Socket , The slave node establishes a port as 51234 Socket , Dedicated to receiving replication commands sent by the primary node . Print the following log after the slave node is connected successfully :

If the slave cannot establish a connection , Scheduled tasks will be retried indefinitely until the connection is successful or executed slaveofnoone Cancel copy . About connection failure , Can be executed from node info replication see master_link_down_since_seconds indicators , It records the system time when the connection to the primary node fails . When the slave node fails to connect to the master node, the following logs will also be printed every second ;

3、 send out ping command : Send from node after successful connection establishment ping Request for first communication , ping The main purpose of the request is as follows :

  1. Check whether the network socket between master and slave is available .
  2. Check whether the master node currently accepts processing commands .

If sent ping After the command , The slave node does not receive the master node's pong Reply or timeout , For example, the network timeout or the master node is blocking and unable to respond to commands , Disconnect replication from node , Next scheduled task will initiate reconnection .

Sent from node ping Command returned successfully ,Redis Print log , And continue the subsequent replication process :

4、 Authority verification : If the master node is set requirepass Parameters , Password authentication is required , Slave node must be configured masterauth The parameters ensure that the same password as the master node can pass the verification . Replication will terminate if validation fails , Reinitiate replication process from node .

5、 Synchronize datasets : After the master-slave replication connection is in normal communication , For the scene where the replication is first established , The master node will send all the data held to the slave node , This part of the operation is the most time-consuming step .

6、 Command continuous replication : When the master node synchronizes the current data to the slave node , The replication process is completed . Next, the master node continuously sends the write command to the slave node , Ensure master-slave data consistency .

This article is from WeChat official account. - Data agency (DataClub)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the yunjia_community@tencent.com Delete .

Original publication time : 2021-01-18

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

本文为[Data agency]所创,转载请带上原文链接,感谢

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云