I've helped several large clients use Kafka as a messaging backbone for microservice-style architectures, and I have a good understanding of its capabilities and the use cases where it really works. But I'm certainly no Kafka apologist. Any technology that has enjoyed such a rapid adoption curve is bound to polarize its audience and rub some developers the wrong way, and Kafka is no exception. As with anything else, it takes considerable time to fully understand Kafka and event streaming before you become proficient and can use it to its full potential. Expect some frustration along the way.

I've compiled a list of shortcomings that tend to frustrate developers or catch unsuspecting beginners off guard. They are in no particular order:

Too many adjustable parameters

The number of configuration parameters in Kafka can be overwhelming, not just for beginners but for seasoned professionals too. With the possible exception of the JVM, I can't think of another piece of technology with so many configuration parameters.

This is not to say that configuration options are unnecessary. But one has to wonder how many of these parameters could be replaced by ergonomics, as Java did with G1. Instead of specifying a multitude of individual thresholds and tolerances, operators would set performance goals and let the system derive a set of values that meets them.
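
To make the idea of goal-driven configuration concrete, here is a minimal sketch. Kafka offers no such feature; the PROFILES table and settings_for function are hypothetical. The dotted keys are real producer property names, but the values are purely illustrative and are not tuning advice.

```python
# Hypothetical goal-to-settings table. Keys are real Kafka producer
# properties; the values are illustrative only, not recommendations.
PROFILES = {
    "low_latency": {
        "linger.ms": 0,
        "batch.size": 16384,
        "compression.type": "none",
    },
    "high_throughput": {
        "linger.ms": 50,
        "batch.size": 262144,
        "compression.type": "lz4",
    },
}


def settings_for(goal):
    """Return a copy of the settings associated with a named goal."""
    try:
        return dict(PROFILES[goal])
    except KeyError:
        raise ValueError(f"unknown goal: {goal!r}") from None
```

The operator would state only the goal, for example settings_for("low_latency"), and never touch the individual thresholds.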

Unsafe default values  

This is my biggest gripe with the configuration options. Kafka's authors make several bold claims about its ordering and the strength of its message delivery guarantees. You would be forgiven for assuming that the defaults are sensible, in that they favor safety over other competing qualities.

Kafka's defaults are generally optimized for performance, and when safety is critical you need to explicitly override them on the client. (Performance and safety pull in opposite directions, and Kafka's defaults favor performance; if safety matters more to you than performance, you cannot rely on these defaults.)

Fortunately, setting the properties that ensure safety has only a small impact on performance: Kafka is still a beast. Remember the first rule of optimization: don't do it. Kafka would have been even better had its creators given this more thought.

Some concrete examples :

  • enable.auto.commit: The default is true, which causes the consumer to commit offsets every 5 seconds (as set by auto.commit.interval.ms), whether or not it has finished processing the record. This is usually not what you want, because it can lead to mixed delivery semantics: if the consumer fails, some records may be delivered twice, while others may not be delivered at all. The default should have been false, leaving it to the client application to decide when to commit.

  • max.in.flight.requests.per.connection: The default is 5, which may result in messages being published out of order if one (or more) of the queued messages times out and is retried. The default should have been 1.
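
The two overrides can be written down as plain key/value pairs, and the reordering problem can be shown with a toy model. This is a rough sketch only: the dotted keys are the property names from the list above, while the dict names and the simulate_log_order function are hypothetical. The model ignores real networking and the idempotent producer.

```python
# Overrides for the two defaults discussed above. The dotted keys are Kafka
# property names taken from the text; the dict names are hypothetical.
SAFE_CONSUMER_OVERRIDES = {"enable.auto.commit": False}
SAFE_PRODUCER_OVERRIDES = {"max.in.flight.requests.per.connection": 1}


def simulate_log_order(batches, max_in_flight, fails_once):
    """Toy model: return the order in which batches land in a partition log.

    Up to `max_in_flight` batches are on the wire at once. A batch listed in
    `fails_once` times out on its first attempt and is resent afterwards.
    """
    log = []
    pending = list(batches)
    already_failed = set()
    while pending:
        window = pending[:max_in_flight]  # batches sent concurrently
        retries = []
        for batch in window:
            if batch in fails_once and batch not in already_failed:
                already_failed.add(batch)
                retries.append(batch)  # timed out; queue it for a retry
            else:
                log.append(batch)  # acknowledged and appended to the log
        pending = retries + pending[len(window):]
    return log
```

In this model, sending batches 1, 2, 3 with batch 1 timing out once yields the log order 2, 3, 1 when five requests may be in flight, and 1, 2, 3 when only one may be. The override dicts would be merged into whatever client configuration the application already builds.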

Appalling tools

Command-line parameters are named inconsistently, and the simple act of publishing a keyed message requires you to jump through hoops by passing obscure, undocumented properties. Some native capabilities, such as record headers, are not supported at all. The usability of the built-in tooling is a well-known pain point in the Kafka community.

It's a real shame. It's like buying a Ferrari, only to have it delivered with plastic hubcaps. Over time, most Kafka practitioners have abandoned the out-of-the-box CLI utilities in favor of other open-source tools (such as Kafdrop and Kafkacat) and third-party commercial products (such as Kafka Tool).

Complex bootstrapping process

The bootstrapping and service discovery process that clients use to establish broker connections is complex and easily confuses users. Initially, the client is given a list of broker addresses and ports. The client then connects directly to one of those addresses, discovers the remaining broker nodes, and establishes new connections directly to the nodes it has discovered.

In a simple, homogeneous network setup, where all connections from all clients and peers traverse a single ingress, this is straightforward. In heterogeneous networks, there may be multiple ingress points to segregate broker-to-broker communications, internal clients living on the same local network, and external clients that may be connecting over the Internet.

The bootstrap/discovery process then requires special configuration: you need dedicated listeners and a set of separate advertised listeners, which are the addresses presented to connecting clients.
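
The discovery flow can be sketched with a small model. Every hostname, port, and listener name below is invented for illustration, and discover_brokers is a hypothetical helper rather than any client library's API. The sketch assumes that the cluster answers with the addresses advertised for the listener on which the bootstrap request arrived.

```python
# Hypothetical two-broker cluster. Each broker advertises one address per
# listener name; all names and addresses are invented for illustration.
ADVERTISED = {
    0: {"INTERNAL": "broker0.lan:9092", "EXTERNAL": "kafka0.example.com:9093"},
    1: {"INTERNAL": "broker1.lan:9092", "EXTERNAL": "kafka1.example.com:9093"},
}

# The listener that each reachable bootstrap address belongs to.
BOOTSTRAP_LISTENER = {
    "broker0.lan:9092": "INTERNAL",
    "kafka0.example.com:9093": "EXTERNAL",
}


def discover_brokers(bootstrap_servers):
    """Mimic a client bootstrap.

    Contact the first reachable bootstrap address, then return the addresses
    that every broker advertises on that same listener.
    """
    for address in bootstrap_servers:
        listener = BOOTSTRAP_LISTENER.get(address)
        if listener is not None:
            return sorted(ads[listener] for ads in ADVERTISED.values())
    raise ConnectionError("no bootstrap server reachable")
```

An external client that bootstraps via kafka0.example.com:9093 is handed only the external addresses, while an internal client that bootstraps via broker0.lan:9092 is handed only the internal ones. If the advertised addresses are misconfigured, the initial connection succeeds but the follow-up connections fail, which is where much of the confusion comes from.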

Shaky client libraries

The quality and maturity of client libraries written in languages other than Java, Python, .NET, and C leave much to be desired. If you are a Java developer, you are sorted: that's where most of the development effort is concentrated. The Golang and other communities, however, have been struggling to get stable libraries, and although some of these "independent" libraries have been around for years, the number and severity of the bugs I have come across in these languages are really concerning.

Lack of true multi-tenancy

According to Kafka's apologists, it supports multi-tenancy. Its design is limited to using access control lists (ACLs) to segregate topics and to enforcing quotas, which gives clients the illusion of isolation but creates no isolation in the management plane. That is like saying your refrigerator supports multi-tenancy because it lets you store food on different shelves.

A true multi-tenant solution would provide multiple logical clusters within a larger physical cluster. These logical clusters could be managed separately; for example, an ACL misconfiguration in one logical cluster would have no effect on the others.

Lack of geo-awareness

Geo-replication is not built into the broker, and it is widely acknowledged that high-performance Kafka clusters and "stretch" topologies don't mix. There is an open-source project, MirrorMaker, which is essentially a pipe that pumps records from one cluster to another without preserving key metadata (such as offsets).

Confluent has its own proprietary tool, Replicator, which does preserve metadata, but it is licensed as part of the Confluent Enterprise suite.
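
Why losing offsets matters can be shown with a toy model of a pipe-style copy. The replicate function below is a hypothetical illustration and not how MirrorMaker is implemented; it assumes the target log hands out consecutive offsets starting from its current length.

```python
def replicate(source_log, target_log):
    """Append every source record to the target, as a simple pipe would.

    Each log is a list of (offset, value) pairs. Returns a mapping of
    source offset -> target offset for the copied records.
    """
    mapping = {}
    for source_offset, value in source_log:
        target_offset = len(target_log)  # the target assigns its own offset
        target_log.append((target_offset, value))
        mapping[source_offset] = target_offset
    return mapping


# A source partition whose older records have already been aged out,
# so its first surviving offset is 100 (an invented figure).
source = [(100, "a"), (101, "b"), (102, "c")]
target = []
offset_map = replicate(source, target)
```

Here a consumer that committed offset 101 on the source cluster cannot resume from 101 on the target, because the same record now lives at offset 1. Without a translation such as offset_map, the consumer's position does not carry over.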

In short, despite all of the above, I'm not saying Kafka is rubbish. Quite the contrary. Sure, Kafka is not without its flaws. The tooling is substandard, to put it mildly. The breadth of Kafka's configuration options is overwhelming, and its defaults are riddled with pitfalls that can trip up unsuspecting beginners at any moment.

However, as an event streaming platform, Kafka has changed the way we architect and build complex systems. It gives us options, and that's a good thing. Its benefits go far beyond the superficial and outweigh all the trouble that comes with a technology that has been adopted so aggressively.


