Some time ago, Xiaojun finished sharing the 12-part HDFS blog series. So Xiaojun plans to publish a set of classic HDFS interview questions; most of the content comes from the blogs shared earlier, and interested readers can browse them on their own — the links are at the end of the article~
Distributed systems need to be discussed from two aspects: computing and storage. Distributed computing is a computing approach that breaks an application into many small parts and assigns them to multiple computers for processing, which saves overall computing time and greatly improves efficiency. Distributed storage is a data storage technology that uses the disk space on each machine in the enterprise over the network; these scattered storage resources form one virtual storage device, and the data is stored on many servers throughout the enterprise.
a) HDFS
i. Manager: NameNode
ii. Workers: DataNodes
iii. Assistant manager: SecondaryNameNode
b) MapReduce
c) Yarn
i. Manager: ResourceManager
ii. Workers: NodeManagers
i. The first replica is placed on the node where the client resides.
ii. The second replica is placed, according to certain rules, on a different node in the same rack as the first.
iii. The third replica is placed, according to certain rules, on a random node in a different rack that is logically closest to the first two replicas.
a) Maintains the namespace (metadata) of the file system it manages.
b) Maintains the mapping from each of the user's file blocks to the specific DataNode nodes.
c) Maintains and manages the heartbeat information periodically reported by the DataNodes.
a) Reads and writes data.
b) Periodically reports heartbeat information to the NameNode (including block information and checksums). If a DataNode fails to send a heartbeat to the NameNode for more than 10 minutes, that DataNode is deemed to be down.
c) Performs pipelined replication of data.
Generally speaking, rack awareness means that the NameNode learns the rack information of each node by reading our configuration.
It is used when the NameNode assigns nodes (when pipelining data and when HDFS places replicas).
1. The client initiates a file upload request and establishes communication with the NameNode via RPC. The NameNode checks whether the target file already exists and whether the parent directory exists, and returns whether the upload is allowed.
2. The client asks which DataNode servers the first block should be uploaded to.
3. The NameNode allocates nodes according to the number of replicas specified in the configuration file and the rack-awareness policy, and returns the addresses of available DataNodes, e.g. A, B, C.
4. The client asks one of the three DataNodes, A, to upload the data (this is essentially an RPC call that establishes the pipeline). On receiving the request, A calls B, and B in turn calls C, until the whole pipeline is established; the acknowledgment is then returned to the client step by step.
5. The client starts uploading the first block to A (the data is first read from disk into a local memory cache) in units of packets (64 KB by default). When A receives a packet it passes it to B, and B passes it to C; every packet A sends is placed in a reply queue to wait for its acknowledgment.
6. The data is split into packets and sent one by one along the pipeline, while acks (acknowledgments of correct receipt) travel in the opposite direction; finally A, the first DataNode in the pipeline, sends the pipeline ack to the client.
7. The write stream is closed.
8. When one block has finished transferring, the client again asks the NameNode which servers to upload the next block to.
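To make the flow above concrete, here is a minimal client-side sketch in Java; the packet/ack pipeline of steps 4-6 runs inside the library, not in user code. The NameNode URI hdfs://node01:8020 and the path /demo/hello.txt are made-up placeholders.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "hdfs://node01:8020" is a placeholder NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), conf);
        // create() triggers the RPC to the NameNode (steps 1-3); the
        // packet/ack pipeline (steps 4-6) runs inside the output stream.
        try (FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        } // closing the stream flushes the remaining packets (step 7)
        fs.close();
    }
}
```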
1. The client calls the open() method of a FileSystem object to open the file it wants to read.
2. The client sends an RPC request to the NameNode to determine where the requested file's blocks are located.
3. The NameNode returns, as appropriate, a list of some or all of the file's blocks; for each block it returns the addresses of the DataNodes holding replicas of that block. These returned DataNode addresses are sorted by their distance from the client in the cluster topology, following two rules: DataNodes closest to the client in the network topology are ranked first; DataNodes whose status has been reported as STALE via the heartbeat timeout mechanism are ranked last.
4. The client selects the top-ranked DataNode to read each block from; if the client itself is a DataNode holding the block, it reads the data directly from local storage (the short-circuit read feature).
5. Under the hood, this builds a socket stream (FSDataInputStream) and repeatedly calls the read method of the parent class DataInputStream until all the data of the block has been read.
6. Blocks are read in parallel; if a read fails, it is retried.
7. After finishing the current list of blocks, if the file has not been fully read, the client asks the NameNode for the next batch of block locations and continues.
8. Finally the read stream is closed, and all the blocks that were read are merged into the complete final file.
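The read side is symmetric; a minimal sketch, again assuming the placeholder NameNode address and path from the write example:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), conf);
        // open() performs the block-location RPC (steps 1-3); reads then go
        // straight to the chosen DataNodes (steps 4-6).
        try (FSDataInputStream in = fs.open(new Path("/demo/hello.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false); // stream file to stdout
        }
        fs.close();
    }
}
```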
A checksum is calculated right after the data is written. The DataNode then periodically recalculates the checksum and compares the result with the original one: if they match, no data has been lost; if they differ, the data is corrupted, and the lost data is recovered from another replica. Likewise, the data is verified before a read by comparing against the original checksum: if they match, the data is intact and can be read; if they differ, the data is corrupted and the read goes to another replica.
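As an illustration of the idea only (this is not HDFS's internal code; HDFS actually checksums fixed-size chunks, 512 bytes by default, with CRC32/CRC32C), a minimal sketch of checksum verification:

```java
import java.util.zip.CRC32;

public class ChecksumDemo {
    // Compute a CRC32 checksum over a block of bytes.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = "some block data".getBytes();
        long storedAtWriteTime = checksum(block);   // recorded when written

        // Later (periodic scan, or before a read): recompute and compare.
        boolean intact = checksum(block) == storedAtWriteTime;
        System.out.println(intact ? "block intact" : "block corrupt, read another replica");
    }
}
```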
1. Massive data storage: HDFS is scalable, and the files it stores can reach PB-scale data volumes.
2. High fault tolerance: even when nodes are lost, the system remains available; data is kept in multiple replicas, and lost replicas are recovered automatically. It can be built on cheap machines (compared with minicomputers and mainframes) and scales linearly (as the number of nodes grows, the cluster's storage capacity and computing power grow with it).
3. Large file storage: HDFS stores data in blocks, splitting a large file into several smaller pieces for distributed storage.
1. Not suitable for low-latency data access: HDFS is optimized for reading large volumes of data in one pass, at the expense of latency.
2. Not suitable for storing large numbers of small files:
A: Because the NameNode stores the file system's metadata in memory, the total number of files the file system can hold is limited by the NameNode's memory capacity.
B: The storage information of each file, directory, and block takes about 150 bytes. For example, 10 million small files mean roughly 10 million file objects plus 10 million block objects, i.e. about 20 million x 150 bytes, or roughly 3 GB of NameNode memory.
For the above two reasons, HDFS is not suitable for storing a large number of small files.
3. File modification: HDFS does not suit scenarios with many writes and only a single read (or a small amount of reading).
4. Concurrent writes to the same file by multiple users are not supported.
The cluster enters safe mode automatically when it starts; alternatively, you can enter safe mode manually with the command hdfs dfsadmin -safemode enter.
a) Clients are not allowed to modify any files, including uploading files, deleting files, renaming, creating folders, and so on.
b) Clients are only allowed to read data.
a) Checks the integrity of the data blocks.
b) Merges fsimage and the edits log to restore the state before the last shutdown.
a) hdfs dfsadmin -safemode enter: enter safe mode manually
b) hdfs dfsadmin -safemode leave: leave safe mode manually
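The same switch can be queried or flipped from Java; a minimal sketch, assuming a Hadoop 2.x-style client API (DistributedFileSystem.setSafeMode, which is deprecated in newer releases) and the placeholder NameNode address used earlier:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class SafeModeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // SAFEMODE_GET only queries the current state; SAFEMODE_ENTER and
        // SAFEMODE_LEAVE mirror the dfsadmin commands above.
        boolean inSafeMode = dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("safe mode: " + inSafeMode);
        fs.close();
    }
}
```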
a) The fsimage file is in fact a permanent checkpoint of the Hadoop file system's metadata.
b) The edits file stores the clients' operations on the cluster. Together, these two files can be used to restore the state the cluster was in before shutdown!
a) When the NameNode starts, it loads the contents of the fsimage file into memory and then replays the operations in the edits file, so that the metadata in memory is synchronized with the actual state.
b) The SecondaryNameNode periodically pulls fsimage and edits and merges them to create a new fsimage.
a) The NameNode creates an Edits.new file.
b) The SecondaryNameNode (SNN) copies the Fsimage and Edits files from the NameNode node to the SNN, loads both files into memory, and merges them to create a new Fsimage.ckpt file.
c) The SNN sends the new Fsimage.ckpt to the NameNode node, where it is renamed Fsimage and replaces the original Fsimage; meanwhile the original Edits is replaced by the newly generated Edits.new file.
a) To merge Fsimage and Edits, reducing the size of the edits log and speeding up cluster startup.
b) To keep a backup of Fsimage and Edits, guarding against loss.
1. Time dimension: by default, once an hour. dfs.namenode.checkpoint.period: 3600
2. Transaction-count dimension: by default, when the number of HDFS operations reaches 1,000,000. dfs.namenode.checkpoint.txns: 1000000
3. Every 60 seconds, check whether the 1,000,000-operation threshold has been reached (dfs.namenode.checkpoint.check.period: 60).
Go into the SNN's data storage folder -----> copy the latest versions of Fsimage and Edits to the NameNode node, placing them in the NameNode's corresponding configuration directory -----> restart the cluster.
a) Turn off the firewall, disable SELinux, configure passwordless SSH login, configure the IP-to-hostname mapping, change the hostname, and install the JDK.
b) Create a whitelist file dfs.hosts and add all nodes to it; edit the hdfs-site.xml file to configure the dfs.hosts property.
c) Use hdfs dfsadmin -refreshNodes to refresh the NameNode.
d) Use yarn rmadmin -refreshNodes to refresh the ResourceManager.
e) Edit the slaves file and add the hostname of the new service node.
f) Start the new node.
g) View the new node's information in the browser (web UI).
h) Run start-balancer.sh to rebalance the load.
a) Use the -getmerge command provided by HDFS [HDFS -> local]
b) Traverse each small file, append it to one file, and then upload that file [local -> HDFS] (see the sketch below)
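A minimal sketch of approach b), assuming the small files sit in a made-up local directory ./small-files and the merged result goes to a placeholder HDFS path /demo/merged.txt. Rather than repeatedly appending on HDFS, it opens the HDFS output file once and streams each local file into it:

```java
import java.io.File;
import java.io.FileInputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), conf);
        // Open one HDFS output file and append every local small file to it.
        try (FSDataOutputStream out = fs.create(new Path("/demo/merged.txt"))) {
            for (File f : new File("./small-files").listFiles()) {
                try (FileInputStream in = new FileInputStream(f)) {
                    IOUtils.copyBytes(in, out, 4096, false); // false: keep out open
                }
            }
        }
        fs.close();
    }
}
```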
a) Instantiate a Configuration object.
b) Instantiate the file system object hdfs (a FileSystem).
c) Call hdfs's mkdirs() method, and that's it.
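Putting those three steps together in a minimal sketch (the NameNode URI and the folder path /demo/new-folder are placeholders):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                                 // a) configuration
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://node01:8020"), conf); // b) file system object
        boolean ok = hdfs.mkdirs(new Path("/demo/new-folder"));                   // c) create the folder
        System.out.println("created: " + ok);
        hdfs.close();
    }
}
```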
The purpose of each directory follows from the meaning of its English name; the names are self-describing.