This bigdata, Hadoop composition and ecology

InfoQ 2021-06-10 18:28:43
bigdata hadoop composition ecology


{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":" introduction ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" With the development of science and technology , We're leaving more and more data online , Big to online shopping 、 Commodity trading , As small as browsing the web 、 WeChat chat 、 The mobile phone automatically records the daily itinerary and so on , so to speak , In today's life , As long as you're still , You're going to generate data all the time , But can these data be called big data ? No , These are not big data yet . So what is big data ?","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/992fbf928758c3d011d94eca2bd41ac3.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":" Big data overview ","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":" Definition ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" The definition of Baidu Encyclopedia ","attrs":{}},{"type":"text","text":": Big data means ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Can't be in a certain time frame ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" Capture with regular software tools 、 Manage and handle ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Big data sets ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":", It needs new processing mode to have stronger decision-making power 、 Insight into the power of discovery and process optimization 、 High growth rate and diversified information assets .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" We can summarize the characteristics of big data :","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Large amount of data , It is necessary to take some tools to collect , And then do the analysis and calculation .","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Let's take an example :","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Simon I'm a Jackie Chan fan , He collected... On the website 100G Jackie Chan's classic movie **( collection )","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":", To collect these movies , He decided to store all these movies ","attrs":{}},{"type":"text","text":"( Storage )","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":", one day Simon Suddenly want to see Jackie Chan starring “ Chinese Zodiac ”, This movie is in his store 100G In the film , therefore Simon According to the year of release ","attrs":{}},{"type":"text","text":"( Analysis and calculation )** Found the movie and watched it .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" therefore , We can refine the definition of big data as follows :","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Big data mainly solves the problem of massive data ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" collection ","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Storage ","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Analysis and calculation ","attrs":{}},{"type":"text","text":".","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Big data , How to measure the scale of the data ?","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/19/19d8c29c27fb31f293b8d7175a7f5491.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":" Data units ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Measure the size of the data , We need to know the unit of data first . According to the data unit from small to large , In turn :bit、Byte、KB、MB、GB、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"TB","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PB","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"EB","attrs":{}},{"type":"text","text":"、ZB、YB、BB、NB、DB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" The conversion between units is as follows :","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 Byte =8 bit","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 KB = 1,024 Bytes = 8192 bit","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 MB = 1,024 KB = 1,048,576 Bytes","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 GB = 1,024 MB = 1,048,576 KB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 TB = 1,024 GB = 1,048,576 MB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 PB = 1,024 TB = 1,048,576 GB","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" For us , The most exposed may be KB、MB、GB Equal unit .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" But because of the difference of computer performance , If you put it on computers in ancient times , Let them deal with ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"GB","attrs":{}},{"type":"text","text":" Level data is the limit ; For now memory is generally 128G Server for , Multiple parallel processing ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"EB","attrs":{}},{"type":"text","text":" Level data is no problem .","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":" The meaning and value of data ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Every era has a definition of data , The key goal is to find out what's behind the data ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Meaning and value ","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" So much data , The formats are different , What's the point of processing data ?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Think about the , No matter how big the data is, it is also made up of many small data ,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" big data ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" Think of it as ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" A collection of data ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":", We can infer from this data that ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Approximate objective law ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":", This law can be used to predict the next occurrence of data ontology ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" probability ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":". For example, a user often watches Jackie Chan's movies on a movie website , So the next time the user visits the movie website , Put Jackie Chan's movies at the top of the recommended list , Because we found that he liked Jackie Chan's movies through his browsing data , And I believe that the user's interest will not change in a short time .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" This is a simple application of big data in life --","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Mining user preferences , Build a recommendation model ","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Data age , Our data has ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Massive , diversity , The real time , uncertainty ","attrs":{}},{"type":"text","text":" Other characteristics . that , We need to store , Dealing with these characteristics of massive data , In what way or platform is more suitable , After years of technological development and natural selection ,Hadoop Distributed models stand out .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" therefore , Learning big data can't be done without ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Hadoop","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":", But for students who have been exposed to big data for a short time or have not yet been exposed to big data , If we ask them, we should learn Hadoop That's the content of ,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Distributed storage ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Calculation ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" I'll say it , But these two concepts are still too general , So how should we control Hadoop How about the study of , Mo panic , And listen to ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Simon Lang ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" Come slowly .","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/39e95f95862e47771bb143f454827ccd.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":"Hadoop summary ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop It's a by Apache Developed by the foundation ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Distributed system infrastructure ","attrs":{}},{"type":"text","text":", It mainly solves the problem of massive data ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Storage ","attrs":{}},{"type":"text","text":" And massive data ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Analysis and calculation ","attrs":{}},{"type":"text","text":" problem , In a broad sense ,Hadoop Usually refer to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop ecosystem ","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a4464576c5dce9b8ad321dc8f239367f.png","alt":"image-20210322164226320","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" We see first ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop Structure of ","attrs":{}},{"type":"text","text":", Then introduce ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop ecosystem ","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hadoop form ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop The composition of the structure in ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.x","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2/3.x","attrs":{}},{"type":"text","text":" Somewhat different , As shown in the figure below ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/84a66a77b0614209644aff993ebcd3e2.png","alt":"image-20210520215604023","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop Mainly from : Just ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Calculation ","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Resource scheduling ","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" data storage ","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Auxiliary tool ","attrs":{}},{"type":"text","text":" form .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" stay Hadoop1.x Time ,Hadoop Medium MapReduce","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" At the same time, it deals with business logic operation and resource scheduling ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":", The coupling is great .","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" stay Hadoop2/3.x Time , Added ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Yarn","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":",","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Yarn Only responsible for the scheduling of resources ,MapReduce It's just about computation ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":".","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Note","attrs":{}},{"type":"text","text":": Resource scheduling refers to CPU、 Memory 、 The choice of server computing, etc ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Next , Let's introduce :","attrs":{}}]},{"type":"blockquote","content":[{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" For storage HDFS","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" For resource scheduling YARN","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Used for calculation MapReduce","attrs":{}}]}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"HDFS Architecture Overview ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop Distributed File System, abbreviation ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"HDFS","attrs":{}},{"type":"text","text":", Is a distributed file system ,HDFS The structure diagram of is as follows :","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9c/9c7e669d96e518146cf574f820e0cc61.png","alt":"image-20210520221117080","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS The architecture contains a ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NameNode、DataNode And spare SecondaryNode","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NameNode(nn)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NameNode(nn) Namely Master, It's a supervisor , managers , It mainly has the following functions :","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① management HDFS The namespace of ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Configure replica policy ( Such as data configuration several copies )","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ Manage data blocks (block) Mapping information for ","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Data exists Datanode In which data blocks , Distributed storage ","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ Handle client requests ","attrs":{}}]}],"attrs":{}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"DataNode(dn)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataNode Namely Slave,NameNode give a command ,DataNode Perform the actual operation .","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Store the actual data block ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Perform block reading / Write operations ","attrs":{}}]}],"attrs":{}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Secondary NameNode(2nn)","attrs":{}}]}]}],"attrs":{}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① auxiliary NameNode, Share their workload , Such as regular merger Fsimage and Edits, And push it to NameNode","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② In an emergency , Can assist in recovery NameNode.","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NOTE","attrs":{}},{"type":"text","text":":Secondary NameNode Is not NameNode Hot standby , When NameNode When I hang up , It does not It can be replaced immediately NameNode And provide services .","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"YARN Architecture Overview ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet Another Resource Negotiator abbreviation ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"YARN","attrs":{}},{"type":"text","text":", It is Hadoop Explorer for , Responsible for providing server computing resources for computing programs , It's equivalent to a distributed ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Operating system platform ","attrs":{}},{"type":"text","text":", and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce","attrs":{}},{"type":"text","text":" And so on ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Applications on top of the operating system ","attrs":{}},{"type":"text","text":". As shown in the figure below :","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/17/173d45121f7e5872f101ce59cd03bc51.png","alt":"image-20210520230922949","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"YARN Mainly by ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ResourceManager","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NodeManager","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ApplicationMaster","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Container","attrs":{}},{"type":"text","text":" Etc .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ResourceManager(RM): The whole cluster resource ( Memory 、cpu etc. ) Boss ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NodeManager(NM): The number of server resources in a single node ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ApplicationMaster(AM): The boss of a single task ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Container: Containers , It's like a stand-alone server , It encapsulates the resources needed for the task to run , Such as memory 、CPU、 disk 、 Network, etc .","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NOTE","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ResourceManager It's a Master, There is one below each child node NodeManager, from RM to NM Allocate resources . In every node there will be ApplicationMaster( Later referred to as" AM) Things that are . He will be responsible for working with RM Communication for resources , Also with NM Communication to start or stop the task .","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"MapReduce Architecture Overview ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MapReduce It's a ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Distributed computing programs ","attrs":{}},{"type":"text","text":" Programming framework of , It's user development “ be based on Hadoop Data analysis application ” The core framework of . His core function is to use ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Business logic code written by user ","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Comes with default components ","attrs":{}},{"type":"text","text":" Integrated into a complete distributed computing program , Running concurrently in a Hadoop On the cluster .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce","attrs":{}},{"type":"text","text":" The calculation process is divided into two stages :Map and Reduce","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8f99aaff7b82f0d0fa5770932178803b.png","alt":"image-20210520231933220","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Map Phase parallel processing of input data ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Reduce Phase pair Map Summarize the results ","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":" The relationship between the three ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"HDFS、YARN、MapReduce The relationship between the three ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d4f7846ce53ad9371c7af5d10778770.png","alt":"image-20210520233912648","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" from HDFS Read data from ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Yarn Resource scheduling processes this data ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce received Yarn Turn on the corresponding MapTask The tasks and ReduceTask Mission ","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" The processed data is stored in HDFS in ","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Do you think Hadoop It's over ,NO! NO! NO! Hadoop A wave of ecosphere understanding ~","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ok , Keep learning !","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c860aaabcc9d5a1beb6aa40ff962ca23.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hadoop ecosystem ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Let's take a look at one ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Hadoop The ecological system ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" Brain map of .","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e0/e08706d2ecee2e12acca9398737b0dda.png","alt":"image-20201201224043405","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Mom" , Why so much content , It's killing me . Don't be confused , Although it looks like a lot , But it can be summed up in one sentence :","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop","attrs":{}},{"type":"text","text":" It's a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Open source framework for Distributed Computing ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":", Provides a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Distributed systems ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":" subprojects (HDFS) And support MapReduce","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":" Distributed computing software architecture ","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":". Since there's a bit more content in the brain map , Let's just introduce a few in Hadoop Some of the most important components of the ecosystem , If the partner is interested in other components , You can look it up .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hive","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hive Is based on Hadoop A data warehouse tool , A structured data file can be mapped to a database table , By class SQL Statement quick to implement simple MapReduce Statistics , You don't have to develop anything special MapReduce application , It is very suitable for statistical analysis of data warehouse .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hbase","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hbase Is a high reliability 、 High performance 、 For the column 、 Scalable distributed storage system , utilize Hbase Technology can be cheap PC Server Set up a large structured storage cluster .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Sqoop","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sqoop It's a Hadoop And data transfer tools in relational databases , You can put a relational database (MySQL ,Oracle ,Postgres etc. ) The data in Hadoop Of HDFS in , Can also be HDFS The data in a relational database ","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Zookeeper","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zookeeper It's a distribution designed for distributed applications 、 Open source coordination services , It is mainly used to solve some data management problems often encountered in distributed applications , Simplify the coordination and management of distributed applications , Provide high-performance distributed services .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Ambari","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ambari It's based on Web Tools for , Support Hadoop The supply of clusters 、 Management and monitoring .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Oozie","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Oozie Is a workflow engine server , Used to manage and coordinate operations in Hadoop On the platform (HDFS、Pig and MapReduce) The task of .","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hue","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hue It's based on WEB The monitoring and management system , Realize to HDFS,MapReduce/YARN, HBase, Hive, Pig Of web Operation and management .","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":".........","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop So much about ecosystems , If you are interested in other contents, please add them by yourself .","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":" Learning resources ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" In order to facilitate you to learn big data , I organized a big data learning resource , Contains ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Learning route ","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" video ","attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" project ","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" Send me a private message or add my wechat :langyakun9768","attrs":{}}]}]}
版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://javamana.com/2021/05/20210522121535309n.html

  1. Mybatis framework (2)
  2. Basic usage of Linux nginx
  3. JQuery to add table rows, delete rows, call the date control
  4. About the workflow of mybatis and the comparison with JDBC and Hibernate
  5. JMeter HTTP get login
  6. Download and Install Apache Zookeeper on Ubuntu
  7. Java IO stream character stream FileReader, filewriter, copy of custom small array, BufferedReader, bufferedwriter, readline() and newline() methods, linenumberreader, use the specified code table to read and write characters
  8. Debugging errors encountered in Hadoop running
  9. The command of releasing memory in Linux
  10. Mybatis 3 source code analysis (3)
  11. Introduction of four switch types of rabbitmq
  12. JavaScript string (1) common functions
  13. Object oriented [day07]: introduction to class features (4)
  14. Linux DNS toolkit -- bind utils
  15. Oracle gets the first and last days of this week, this month, this quarter, and this year
  16. Install MySQL in Ubuntu
  17. Java implementation of sensitive word filtering (DFA algorithm)
  18. Image upload with spring MVC multipartfile
  19. Making Linux system installation disk under Ubuntu
  20. Introduction of Linux screen command
  21. On the problem of using ant script and maven pom.xml file in the project
  22. Environment installation of Kafka and strom clusters
  23. Zookeeper C API guide 1 (turn)
  24. Linux view system information command summary
  25. Making clock with native JavaScript
  26. Linux command line bash batch rename files
  27. Java video transcoding blog
  28. Log of MySQL 5.7 intimate parameters_ timestamps
  29. Solution to access denied when MySQL changes root password
  30. Simple dynamic string SDS of redis data structure
  31. Introduction to JavaScript classics-day02
  32. Notes on Linux learning: about environment variables
  33. Big data self taught 3-windows client dbvisualizer / squirrel configuration connection hive
  34. IOS learning - KVO design pattern
  35. Linux driver learning
  36. Main new features of Java 8
  37. Source code analysis of hashtable in Java
  38. Hadoop learning notes (2): simple operation
  39. Linux thread implementation model
  40. MySQL creates users and assigns user permissions
  41. Method of transforming complex JSON string into Java nested object
  42. JavaScript operators (Boolean, multiplicative, and additive)
  43. 2018-08-16 Chinese code spring boot add basic log
  44. Spring annotation
  45. Why learn and master Linux?
  46. Introduction of MySQL storage engine
  47. Adding, deleting, modifying and querying hibernate demo
  48. MySQL cannot start InnoDB: committed to open a previously opened tablespace
  49. Learning summary of week 7 of 201753122018-2019-2 Java programming
  50. I spring and Autumn "Baidu Cup" CTF competition in October
  51. Installation of mysqldb module in Python
  52. MySQL: date type
  53. Why is MyISAM faster than InnoDB query in MySQL
  54. HTTP Status 500 - org.apache.jasper.JasperException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException
  55. Wechat enterprise number jssdk wx.config reported invalid signature error, resulting in the API interface unable to be used
  56. Java gets the date of the current date according to the date of birth
  57. Ashx general handler and HttpHandler
  58. Struts.xml syntax
  59. < h1> 02_ Linux learning_ Command & lt/ h1>
  60. Feeling after reading the illustrated HTTP - (HTTP status code of returned result)