A deep dive into Hadoop serialization

ZSYL 2021-09-15 08:53:53


1. Serialization Overview

1) What is serialization?

Serialization converts objects in memory into a sequence of bytes (or another data transfer format) so that they can be stored on disk (persisted) and transmitted over the network.

Deserialization is the reverse: it takes a sequence of bytes (or another data transfer format), or data persisted on disk, and converts it back into objects in memory.
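
To see both directions in one place, here is a minimal standalone sketch (the RoundTripDemo class name is just for illustration) that serializes one of Hadoop's built-in Writable types into a byte array and reads it back:

import org.apache.hadoop.io.IntWritable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        IntWritable original = new IntWritable(42);   // a "live" object in memory

        // Serialization: object -> byte sequence
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialization: byte sequence -> object
        IntWritable restored = new IntWritable();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(restored.get());           // prints 42
    }
}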

2) Why serialize?

Generally speaking, "live" objects exist only in memory: turn off the power and they are gone. Moreover, a "live" object can only be used by the local process; it cannot be sent to another computer over the network.

Serialization, however, makes it possible to store "live" objects and to send them to a remote computer.

3) Why not use Java serialization?

Java serialization is a heavyweight framework (Serializable). When an object is serialized, a lot of extra information is attached to it (various checksums, headers, the inheritance hierarchy, and so on), which makes efficient network transmission difficult.

Therefore, Hadoop developed its own serialization mechanism (Writable).

4) Features of Hadoop serialization:

(1) Compact: makes efficient use of storage space (illustrated by the sketch just below).

(2) Fast: low overhead when reading and writing data.

(3) Interoperable: supports interaction across multiple languages.
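
To make feature (1) concrete, here is a small standalone sketch (the SizeComparison class name is just for illustration) that serializes the same long value with Java serialization and with Hadoop's LongWritable, then prints the byte counts; on a typical JVM the Java version is several times larger:

import org.apache.hadoop.io.LongWritable;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SizeComparison {
    public static void main(String[] args) throws IOException {
        // Java serialization: stream header, class descriptors, etc. are all written
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(Long.valueOf(42L));
        }

        // Hadoop Writable: only the raw 8-byte long is written
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new LongWritable(42L).write(new DataOutputStream(writableBytes));

        System.out.println("Java serialization: " + javaBytes.size() + " bytes");
        System.out.println("Hadoop Writable:    " + writableBytes.size() + " bytes"); // 8
    }
}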

2. Custom bean objects

A custom bean object implements the serialization interface (Writable).

The basic serialization types commonly used in enterprise development cannot meet every requirement. For example, to pass a bean object through the Hadoop framework, that object needs to implement the Writable interface.

Making a bean object serializable takes the following 7 steps:

(1) Implement the Writable interface.

(2) During deserialization, the framework uses reflection to call the no-argument constructor, so the class must have one:

public FlowBean() {
    super();
}

(3) Override the serialization method:

@Override
public void write(DataOutput out) throws IOException {
    out.writeLong(upFlow);
    out.writeLong(downFlow);
    out.writeLong(sumFlow);
}

(4) Override the deserialization method:

@Override
public void readFields(DataInput in) throws IOException {
    upFlow = in.readLong();
    downFlow = in.readLong();
    sumFlow = in.readLong();
}

(5) Note that the fields must be deserialized in exactly the same order in which they were serialized.

(6) To display the results in a file, you need to override toString(); separating the fields with "\t" makes them convenient to process later.

(7) If the custom bean is to be transmitted as a key, it must also implement the Comparable interface, because the Shuffle phase of the MapReduce framework requires keys to be sortable. (In practice Hadoop's WritableComparable interface combines Writable and Comparable; see the sketch after the example below.)

See the following sorting example:

@Override
public int compareTo(FlowBean o) {
    // Reverse order, from largest to smallest; Long.compare also handles equal values
    return Long.compare(o.getSumFlow(), this.sumFlow);
}
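
For completeness, here is what a bean's declaration looks like when it must travel as a key. This is only a sketch (the SortableFlowBean name is hypothetical, and only sumFlow is kept for brevity), not part of the case study below:

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch: a bean that can travel as a MapReduce key
public class SortableFlowBean implements WritableComparable<SortableFlowBean> {

    private long sumFlow; // total traffic (other fields omitted for brevity)

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(sumFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        sumFlow = in.readLong();
    }

    @Override
    public int compareTo(SortableFlowBean o) {
        // Descending order by total traffic; returns 0 for equal totals
        return Long.compare(o.getSumFlow(), this.sumFlow);
    }
}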

3. Serialization case study

1) Requirement

Count the total uplink traffic, total downlink traffic, and total traffic consumed by each phone number.

(1) Input data


1 13736230513 192.196.100.1 www.atguigu.com 2481 24681 200
2 13846544121 192.196.100.2 264 0 200
3 13956435636 192.196.100.3 132 1512 200
4 13966251146 192.168.100.1 240 0 404
5 18271575951 192.168.100.2 www.atguigu.com 1527 2106 200
6 84188413 192.168.100.3 www.atguigu.com 4116 1432 200
7 13590439668 192.168.100.4 1116 954 200
8 15910133277 192.168.100.5 www.hao123.com 3156 2936 200
9 13729199489 192.168.100.6 240 0 200
10 13630577991 192.168.100.7 www.shouhu.com 6960 690 200
11 15043685818 192.168.100.8 www.baidu.com 3659 3538 200
12 15959002129 192.168.100.9 www.atguigu.com 1938 180 500
13 13560439638 192.168.100.10 918 4938 200
14 13470253144 192.168.100.11 180 180 200
15 13682846555 192.168.100.12 www.qq.com 1938 2910 200
16 13992314666 192.168.100.13 www.gaga.com 3008 3720 200
17 13509468723 192.168.100.14 www.qinghua.com 7335 110349 404
18 18390173782 192.168.100.15 www.sogou.com 9531 2412 200
19 13975057813 192.168.100.16 www.baidu.com 11058 48243 200
20 13768778790 192.168.100.17 120 120 200
21 13568436656 192.168.100.18 www.alibaba.com 2481 24681 200
22 13568436656 192.168.100.19 1116 954 200

(2) Input data format:

7	13560436666	120.196.100.99	1116	954	200
id	phone number	network IP	uplink traffic	downlink traffic	status code

(3) Expected output data format

13560436666	1116	954	2070
phone number	uplink traffic	downlink traffic	total traffic

2) Requirement analysis

Map stage: read one line, split it on tabs, extract the phone number plus the uplink and downlink traffic, and output (phone number, FlowBean). Reduce stage: for each phone number, accumulate the uplink and downlink traffic and compute the total.

3) Write the MapReduce program

(1) Write the FlowBean object for traffic statistics

package com.zs.mapreduce.writable;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * 1. Define the class and implement the Writable interface
 * 2. Override the serialization and deserialization methods
 * 3. Provide a no-argument constructor
 * 4. Override toString()
 * Note: readFields() must read the fields in exactly the order
 * that write() wrote them (upFlow -- downFlow -- sumFlow).
 */
public class FlowBean implements Writable {

    private long upFlow;   // uplink traffic
    private long downFlow; // downlink traffic
    private long sumFlow;  // total traffic

    // No-argument constructor (required for deserialization via reflection)
    public FlowBean() {
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    // Overload: derive the total from the uplink and downlink traffic
    public void setSumFlow() {
        this.sumFlow = this.upFlow + this.downFlow;
    }

    // Serialization
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialization
    @Override
    public void readFields(DataInput in) throws IOException {
        this.upFlow = in.readLong();
        this.downFlow = in.readLong();
        this.sumFlow = in.readLong();
    }

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }
}
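
Step (5) above warned that the deserialization order must match the serialization order; a quick standalone round-trip check (the FlowBeanRoundTrip class name is just for illustration) makes this easy to verify:

package com.zs.mapreduce.writable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FlowBeanRoundTrip {
    public static void main(String[] args) throws IOException {
        FlowBean original = new FlowBean();
        original.setUpFlow(1116);
        original.setDownFlow(954);
        original.setSumFlow();

        // Serialize in write()'s order: upFlow, downFlow, sumFlow
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // Deserialize in the same order; a mismatched order would scramble the fields
        FlowBean restored = new FlowBean();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(restored); // 1116	954	2070
    }
}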

(2) Write the Mapper class

package com.zs.mapreduce.writable;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class FlowMapper extends Mapper<LongWritable, Text, Text, FlowBean> {

    private Text outK = new Text();
    private FlowBean outV = new FlowBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Get one line, e.g.:
        // 1	13736230513	192.196.100.1	www.atguigu.com	2481	24681	200
        String line = value.toString();

        // 2. Split on tabs
        String[] split = line.split("\t");

        // 3. Extract the fields we want; the traffic columns are counted from
        //    the end of the array because the URL column is optional
        String phone = split[1];
        String up = split[split.length - 3];
        String down = split[split.length - 2];

        // 4. Encapsulate the output key and value
        outK.set(phone);
        outV.setUpFlow(Long.parseLong(up));
        outV.setDownFlow(Long.parseLong(down));
        outV.setSumFlow();

        // 5. Write the result
        context.write(outK, outV);
    }
}
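
The mapper counts the traffic columns from the end of the split array because, as the input data shows, the URL column is missing from some records. A small standalone sketch makes the point (the SplitDemo class name is just for illustration; the sample strings assume tab-separated fields, as the mapper does):

public class SplitDemo {
    public static void main(String[] args) {
        // One record with a URL column, one without (shapes taken from the input data above)
        String withUrl    = "1\t13736230513\t192.196.100.1\twww.atguigu.com\t2481\t24681\t200";
        String withoutUrl = "2\t13846544121\t192.196.100.2\t264\t0\t200";

        for (String line : new String[]{withUrl, withoutUrl}) {
            String[] split = line.split("\t");
            // Counting from the end keeps the traffic columns stable in both shapes
            System.out.println(split[1] + "  up=" + split[split.length - 3]
                    + "  down=" + split[split.length - 2]);
        }
    }
}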

(3) Write the Reducer class

package com.zs.mapreduce.writable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class FlowReducer extends Reducer<Text, FlowBean, Text, FlowBean> {

    private FlowBean outV = new FlowBean();

    @Override
    protected void reduce(Text key, Iterable<FlowBean> values, Context context) throws IOException, InterruptedException {
        // 1. Traverse the values and accumulate the traffic
        long totalUp = 0;
        long totalDown = 0;
        for (FlowBean value : values) {
            totalUp += value.getUpFlow();
            totalDown += value.getDownFlow();
        }

        // 2. Encapsulate the output value
        outV.setUpFlow(totalUp);
        outV.setDownFlow(totalDown);
        outV.setSumFlow();

        // 3. Write the result
        context.write(key, outV);
    }
}

(4) Write the Driver class

package com.zs.mapreduce.writable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class FlowDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar
        job.setJarByClass(FlowDriver.class);

        // 3. Associate the Mapper and Reducer
        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);

        // 4. Set the Mapper output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);

        // 5. Set the final output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\software\\hadoop\\input\\inputflow"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\software\\hadoop\\output\\output2"));

        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
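
The driver above hard-codes local Windows paths, which is convenient for testing in an IDE. On a cluster the paths usually come from the command line instead; a sketch of how step 6 would change (everything else stays the same):

// 6. Take the input and output paths from the command-line arguments
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));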

Keep going!

Thank you for reading!

Keep striving!

Copyright notice: this article was written by [ZSYL]. Please include a link to the original when reprinting. Thank you.
https://javamana.com/2021/09/20210909142156652e.html
