Flink+Kafka:网易云音乐实时数仓建设实践

InfoQ 2021-01-21 07:36:13
网易 kafka Flink 音乐 flink+kafka


{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前,网易云音乐基于 Kafka+Flink 的实时任务达到了 500+,Kafka 集群总节点数达到 200+,单 Kafka 峰值 QPS 400W+。本文由网易云音乐实时计算平台研发工程师岳猛分享,主要从以下四个部分将为大家介绍 Flink + Kafka 在网易云音乐的应用实战:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"背景"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Flink + Kafka 平台化设计"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 在实时数仓中的应用"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"问题 & 改进"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景介绍"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 流平台通用框架"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前流平台通用的架构一般来说包括消息队列、计算引擎和存储三部分,通用架构如下图所示。客户端或者 web 的 log 日志会被采集到消息队列;计算引擎实时计算消息队列的数据;实时计算结果以 Append 或者 Update 的形式存放到实时存储系统中去。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前,我们常用的消息队列是 Kafka,计算引擎一开始我们采用的是 Spark Streaming,随着 Flink 在流计算引擎的优势越来越明显,我们最终确定了 Flink 作为我们统一的实时计算引擎。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/41\/41d6c12393e8d49cbb304329898abbcb.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 为什么选 Kafka?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 是一个比较早的消息队列,但是它是一个非常稳定的消息队列,有着众多的用户群体,网易也是其中之一。我们考虑 Kafka 作为我们消息中间件的主要原因如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高吞吐,低延迟:每秒几十万 QPS 且毫秒级延迟;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高并发:支持数千客户端同时读写;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容错性,可高性:支持数据备份,允许节点丢失;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可扩展性:支持热扩展,不会影响当前线上业务。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 为什么选择 Flink?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache Flink 是近年来越来越流行的一款开源大数据流式计算引擎,它同时支持了批处理和流处理,考虑 Flink 作为我们流式计算引擎的主要因素是:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高吞吐,低延迟,高性能;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高度灵活的流式窗口;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"状态计算的 Exactly-once 语义;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"轻量级的容错机制;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"支持 EventTime 及乱序事件;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流批统一引擎。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Kafka + Flink 流计算体系"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于 Kafka 和 Flink 的在消息中间件以及流式计算方面的耀眼表现,于是产生了围绕 Kafka 及 Flink 为基础的流计算平台体系,如下图所示:基于 APP、web 等方式将实时产生的日志采集到 Kafka,然后交由 Flink 来进行常见的 ETL,全局聚合以及Window 聚合等实时计算。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/49\/494603ae231cc70f5efb7c34586b7959.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. 网易云音乐使用 Kafka 的现状"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前我们有 10+个 Kafka 集群,各个集群的主要任务不同,有些作为业务集群,有些作为镜像集群,有些作为计算集群等。当前 Kafka 集群的总节点数达到 200+,单 Kafka 峰值 QPS 400W+。目前,网易云音乐基于 Kafka+Flink 的实时任务达到了 500+。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Flink+Kafka 平台化设计"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于以上情况,我们想要对 Kafka+Flink 做一个平台化的开发,减少用户的开发成本和运维成本。实际上在 2018 年的时候我们就开始基于 Flink 做一个实时计算平台,Kafka 在其中发挥着重要作用,今年,为了让用户更加方便、更加容易的去使用 Flink 和 Kafka,我们进行了重构。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于 Flink 1.0 版本我们做了一个 Magina 版本的重构,在 API 层次我们提供了 Magina SQL 和 Magina SDK 贯穿 DataStream 和 SQL 操作;然后通过自定义 Magina SQL Parser 会把这些 SQL 转换成 Logical Plan,在将 LogicalPlan 转化为物理执行代码,在这过程中会去通过 catalog 连接元数据管理中心去获取一些元数据的信息。我们在 Kafka 的使用过程中,会将 Kafka 元数据信息登记到元数据中心,对实时数据的访问都是以流表的形式。在 Magina 中我们对 Kafka 的使用主要做了三部分的工作:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集群 catalog 化;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Topic 流表化;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Message Schema 化。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/be\/beb829cdff7893ef86366bd00b89b19c.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户可以在元数据管理中心登记不同的表信息或者 catalog 信息等,也可以在 DB 中创建和维护 Kafka 的表,用户在使用的过程只需要根据个人需求使用相应的表即可。下图是对 Kafka 流表的主要引用逻辑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/47\/4779d9ccc8a07dc38f52a533e345136a.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kafka 在实时数仓中的应用"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 在解决问题中发展"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 在实时数仓使用的过程中,我们遇到了不同的问题,中间也尝试了不同的解决办法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在平台初期, 最开始用于实时计算的只有两个集群,且有一个采集集群,单 Topic 数据量非常大;不同的实时任务都会消费同一个大数据量的 Topic,Kafka 集群 IO 压力异常大;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,在使用的过程发现 Kafka 的压力异常大,经常出现延迟、I\/O 飙升。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们想到把大的 Topic 进行实时分发来解决上面的问题,基于 Flink 1.5 设计了如下图所示的数据分发的程序,也就是实时数仓的雏形。基于这种将大的 Topic 分发成小的 Topic 的方法,大大减轻了集群的压力,提升了性能,另外,最初使用的是静态的分发规则,后期需要添加规则的时候要进行任务的重启,对业务影响比较大,之后我们考虑了使用动态规则来完成数据分发的任务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/71cfe13f35a0488ac931bce530baef9b.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解决了平台初期遇到的问题之后,在平台进阶过程中 Kafka 又面临新的问题:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"虽然进行了集群的扩展,但是任务量也在增加,Kafka 集群压力仍然不断上升;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集群压力上升有时候出现 I\/O 相关问题,消费任务之间容易相互影响;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户消费不同的 Topic 过程没有中间数据的落地,容易造成重复消费;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任务迁移 Kafka 困难。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"针对以上问题,我们进行了如下图所示的 Kafka 集群隔离和数据分层处理。其过程简单来说,将集群分成 DS 集群、日志采集集群、分发集群,数据通过分发服务分发到 Flink 进行处理,然后通过数据清洗进入到 DW 集群,同时在 DW 写的过程中会同步到镜像集群,在这个过程中也会利用 Flink 进行实时计算的统计和拼接,并将生成的 ADS 数据写入在线 ADS 集群和统计 ADS 集群。通过上面的过程,确保了对实时计算要求比较高的任务不会受到统计报表的影响。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4c\/4c4d4814841c7d02e88309c72119123c.jpeg","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过上面的过程,确保了对实时计算要求比较高的任务不会受到统计报表的影响。但是我们分发了不同的集群以后就不可避免的面临新的问题:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何感知 Kafka 集群状态?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何快速分析 Job 消费异常?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"针对上面两个问题,我们做了一个 Kafka 监控系统,其监控分为如下两个维度,这样在出现异常的时候就可以进行具体判断出现问题的详细情况:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集群概况的监控:可以看到不同集群对应的 Topic 数量以及运行任务数量,以及每个 Topic 消费任务数据量、数据流入量、流入总量和平均每条数据大小;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指标监控:可以看到 Flink 任务以及对应的 Topic、GroupID、所属集群、启动时间、输入带宽、InTPS、OutTPS、消费延迟以及 Lag 情况。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Flink + Kafka 在 Lambda 架构下的运用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流批统一是目前非常火的概念,很多公司也在考虑这方面的应用,目前常用的架构要么是 Lambda 架构,要么是 Kappa 架构。对于流批统一来讲需要考虑的包括存储统一和计算引擎统一,由于我们当前基建没有统一的存储,那么我们只能选择了 Lamda 架构。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下图是基于 Flink 和 Kafka 的 Lambda 架构在云音乐的具体实践,上层是实时计算,下层是离线计算,横向是按计算引擎来分,纵向是按实时数仓来区分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5f\/5fa088d4908a41de9ef26343a29dd846.jpeg","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"问题&改进"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在具体的应用过程中,我们也遇到了很多问题,最主要的两个问题是:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"多 Sink 下 Kafka Source 重复消费问题;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同交换机流量激增消费计算延迟问题。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 多 Sink 下 Kafka Source 重复消费问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Magina 平台上支持多 Sink,也就是说在操作的过程中可以将中间的任意结果插入到不同的存储中。这个过程中就会出现一个问题,比如同一个中间结果,我们把不同的部分插入到不同的存储中,那么就会有多条 DAG,虽然都是临时结果,但是也会造成 Kafka Source 的重复消费,对性能和资源造成极大的浪费。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"于是我们想,是否可以避免临时中间结果的多次消费。在 1.9 版本之前,我们进行了 StreamGraph 的重建,将三个 DataSource 的 DAG 进行了合并;在 1.9 版本,Magina 自己也提供了一个查询和 Source 合并的优化;但是我们发现如果是在同一个 data update 中有对同一个表的多个 Source 的引用,它自己会合并,但是如果不是在同一个 data update 中,是不会立即合并的,于是在 1.9 版本之后中我们对 modifyOperations 做了一个 buffer 来解决这个问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6c\/6c57638fa56a57310a2d76539161f7a9.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 同交换机流量激增消费计算延迟问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个问题是最近才出现的问题,也可能不仅仅是同交换机,同机房的情况也可能。在同一个交换机下我们部署了很多机器,一部分机器部署了 Kafka 集群,还有一部分部署了 Hadoop 集群。在 Hadoop 上面我们可能会进行 Spark、Hive 的离线计算以及 Flink 的实时计算,Flink 也会消费 Kafka 进行实时计算。在运行的过程中我们发现某一个任务会出现整体延迟的情况,排查过后没有发现其他的异常,除了交换机在某一个时间点的浏览激增,进一步排查发现是离线计算的浏览激增,又因为同一个交换机的带宽限制,影响到了 Flink 的实时计算。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/da\/da9b2ff93ac071221448eb8bf7c56606.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为解决这个问题,我们就考虑要避免离线集群和实时集群的相互影响,去做交换机部署或者机器部署的优化,比如离线集群单独使用一个交换机,Kafka 和 Flink 集群也单独使用一个交换机,从硬件层面保证两者之间不会相互影响。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Q & A"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q1:Kafka 在实时数仓中的数据可靠吗?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A1:这个问题的答案更多取决于对数据准确性的定义,不同的标准可能得到不同的答案。自己首先要定义好数据在什么情况下是可靠的,另外要在处理过程中有一个很好的容错机制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q2:我们在学习的时候如何去学习这些企业中遇到的问题?如何去积累这些问题?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A2:个人认为学习的过程是问题推动,遇到了问题去思考解决它,在解决的过程中去积累经验和自己的不足之处。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q3:你们在处理 Kafka 的过程中,异常的数据怎么处理,有检测机制吗?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A3:在运行的过程中我们有一个分发的服务,在分发的过程中我们会根据一定的规则来检测哪些数据是异常的,哪些是正常的,然后将异常的数据单独分发到一个异常的 Topic 中去做查询等,后期用户在使用的过程中可以根据相关指标和关键词到异常的 Topic 中去查看这些数据。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文转载自:DataFunTalk(ID:datafuntalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文链接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/jCQ6lUejGPzsrHDaNTy6sw","title":"xxx","type":null},"content":[{"type":"text","text":"Flink+Kafka:网易云音乐实时数仓建设实践"}]}]}]}
版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://www.infoq.cn/article/JPrjR3fB7NhDNILZAsIj?utm_source=rss&utm_medium=article

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云