Design a high-concurrency, high-availability HTTP service for IP queries based on Flask

Bright brother in the workplace 2020-11-11 10:58:09

Architecture design

The stack is Flask + Gunicorn + load balancing. Load balancing comes in two forms: Alibaba Cloud's hardware load-balancing service, and software load balancing with nginx. Gunicorn is managed with supervisor.

Structure diagram: software load balancing with nginx

Structure diagram: Alibaba Cloud's hardware load-balancing service

Because the Flask app keeps the IP tree and the country/province/city dictionaries in memory, it consumes a lot of RAM: a single Gunicorn worker takes about 300 MB, while nginx's 4 workers use comparatively little (under 100 MB combined), so the service occupies about 1.3 GB (i.e., it needs a server with 2 GB of memory). When any Gunicorn node goes down or is being upgraded, the other node keeps serving, so the overall service is unaffected.
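As a quick sanity check on those figures, here is a tiny Python sketch of the memory budget. The 300 MB per-worker and 100 MB nginx numbers are the estimates quoted above, not measurements of any particular deployment:

```python
def memory_budget_mb(gunicorn_workers, worker_mb=300, nginx_mb=100):
    """Rough RSS estimate: each gunicorn worker holds its own copy of the
    in-memory IP tree (~300 MB), plus an allowance for the nginx workers."""
    return gunicorn_workers * worker_mb + nginx_mb

# Four gunicorn workers match the ~1.3 GB figure quoted above:
print(memory_budget_mb(4))  # 1300
```

This also makes the later scaling discussion concrete: every additional worker costs another full copy of the data.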

IP database

An IP library (also called an IP address database) is compiled by technical specialists over a long period using a variety of techniques, and is continuously updated, maintained, and extended by professionals.

IP database parsing and query code

Implementation based on a binary search tree

# -*- coding: utf-8 -*-
# Python 2 implementation: parses the binary IP database and builds
# one balanced search tree per first octet.
import os
import struct
from socket import inet_aton, inet_ntoa

_unpack_V = lambda b: struct.unpack("<L", b)
_unpack_N = lambda b: struct.unpack(">L", b)
_unpack_C = lambda b: struct.unpack("B", b)


class IpTree:
    def __init__(self):
        self.ip_dict = {}
        self.country_codes = {}
        self.china_province_codes = {}
        self.china_city_codes = {}

    def load_country_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    self.country_codes[data[0]] = data[1]
        except Exception as ex:
            print "cannot open file %s: %s" % (file_name, ex)

    def load_china_province_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    provinces = data[2].split('\r')
                    self.china_province_codes[provinces[0]] = data[0]
        except Exception as ex:
            print "cannot open file %s: %s" % (file_name, ex)

    def load_china_city_codes(self, file_name):
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                for line in f.readlines():
                    data = line.split('\t')
                    cities = data[3].split('\r')
                    self.china_city_codes[cities[0]] = data[0]
        except Exception as ex:
            print "cannot open file %s: %s" % (file_name, ex)

    def loadfile(self, file_name):
        ipdot0 = 254
        try:
            path = os.path.abspath(file_name)
            with open(path, "rb") as f:
                local_binary0 = f.read()
                local_offset, = _unpack_N(local_binary0[:4])
                local_binary = local_binary0[4:local_offset]
                # Walk the 256 first-octet buckets.
                while ipdot0 >= 0:
                    middle_ip = None
                    middle_content = None
                    lis = []
                    # Offsets of this bucket's entry in the 1024-byte header.
                    begin_offset = ipdot0 * 4
                    end_offset = (ipdot0 + 1) * 4
                    # Index range of this bucket's 8-byte records.
                    start_index, = _unpack_V(local_binary[begin_offset:begin_offset + 4])
                    start_index = start_index * 8 + 1024
                    end_index, = _unpack_V(local_binary[end_offset:end_offset + 4])
                    end_index = end_index * 8 + 1024
                    while start_index < end_index:
                        # 3-byte little-endian content offset, zero-padded to 4 bytes.
                        content_offset, = _unpack_V(local_binary[start_index + 4:start_index + 7] + chr(0))
                        content_length, = _unpack_C(local_binary[start_index + 7])
                        content_offset = local_offset + content_offset - 1024
                        content = local_binary0[content_offset:content_offset + content_length]
                        if middle_content != content and middle_content is not None:
                            contents = middle_content.split('\t')
                            lis.append((middle_ip, (contents[0], self.lookup_country_code(contents[0]),
                                                    contents[1], self.lookup_china_province_code(contents[1]),
                                                    contents[2], self.lookup_china_city_code(contents[2]),
                                                    contents[3], contents[4])))
                        middle_content = content
                        middle_ip = inet_ntoa(local_binary[start_index:start_index + 4])
                        start_index += 8
                    self.ip_dict[ipdot0] = self.generate_tree(lis)
                    ipdot0 -= 1
        except Exception as ex:
            print "cannot open file %s: %s" % (file_name, ex)

    def lookup_country(self, country_code):
        for item_country, item_country_code in self.country_codes.items():
            if country_code == item_country_code:
                return item_country, item_country_code
        return 'None', 'None'

    def lookup_country_code(self, country):
        try:
            return self.country_codes[country]
        except KeyError:
            return 'None'

    def lookup_china_province(self, province_code):
        for item_province, item_province_code in self.china_province_codes.items():
            if province_code == item_province_code:
                return item_province, item_province_code
        return 'None', 'None'

    def lookup_china_province_code(self, province):
        try:
            return self.china_province_codes[province.encode('utf-8')]
        except KeyError:
            return 'None'

    def lookup_china_city(self, city_code):
        for item_city, item_city_code in self.china_city_codes.items():
            if city_code == item_city_code:
                return item_city, item_city_code
        return 'None', 'None'

    def lookup_china_city_code(self, city):
        try:
            return self.china_city_codes[city]
        except KeyError:
            return 'None'

    def lookup(self, ip):
        ipdot = ip.split('.')
        if len(ipdot) != 4:
            return None
        ipdot0 = int(ipdot[0])
        if ipdot0 < 0 or ipdot0 > 255:
            return None
        try:
            d = self.ip_dict[ipdot0]
        except KeyError:
            return None
        if d is not None:
            return self.lookup1(inet_aton(ip), d)
        return None

    # Python 2 tuple-parameter unpacking: node is (key_ip, content, left, right).
    def lookup1(self, net_ip, (net_ip1, content, lefts, rights)):
        if net_ip < net_ip1:
            if lefts is None:
                return content
            return self.lookup1(net_ip, lefts)
        elif net_ip > net_ip1:
            if rights is None:
                return content
            return self.lookup1(net_ip, rights)
        return content

    def generate_tree(self, ip_list):
        length = len(ip_list)
        if length > 1:
            lefts = ip_list[:length / 2]
            rights = ip_list[length / 2:]
            (ip, content) = lefts[length / 2 - 1]
            return inet_aton(ip), content, self.generate_tree(lefts), self.generate_tree(rights)
        elif length == 1:
            (ip, content) = ip_list[0]
            return inet_aton(ip), content, None, None


if __name__ == "__main__":
    ip_tree = IpTree()
    print ip_tree.lookup('')
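To make the boundary-tree logic concrete, here is a minimal, self-contained Python 3 sketch of the same generate_tree / lookup1 idea with made-up data. Each entry stores the end IP of a range, as in the database records above; inet_aton yields 4 big-endian bytes, so byte-wise comparison matches numeric IP comparison:

```python
from socket import inet_aton  # 4 big-endian bytes: byte order == numeric order

def generate_tree(ip_list):
    """Balanced tree over sorted (range_end_ip, content) boundaries."""
    n = len(ip_list)
    if n > 1:
        lefts, rights = ip_list[:n // 2], ip_list[n // 2:]
        ip, content = lefts[-1]
        return (inet_aton(ip), content, generate_tree(lefts), generate_tree(rights))
    if n == 1:
        ip, content = ip_list[0]
        return (inet_aton(ip), content, None, None)
    return None

def lookup(net_ip, node):
    """Descend the tree; a missing child means net_ip falls in this node's range."""
    key, content, lefts, rights = node
    if net_ip < key:
        return content if lefts is None else lookup(net_ip, lefts)
    if net_ip > key:
        return content if rights is None else lookup(net_ip, rights)
    return content

# Hypothetical ranges: A ends at 1.0.0.255, B at 1.0.1.255, C at 1.0.3.255.
tree = generate_tree([("1.0.0.255", "A"), ("1.0.1.255", "B"), ("1.0.3.255", "C")])
print(lookup(inet_aton("1.0.1.5"), tree))  # B
```

1.0.1.5 is above A's end and below B's end, so the descent bottoms out at B: exactly the behavior the per-bucket trees rely on.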

HTTP interface

The IP query service is provided through both GET and POST requests.

@ip_app.route('/api/ip_query', methods=['POST'])
def ip_query():
    try:
        ip = request.json['ip']
    except KeyError as e:
        raise InvalidUsage('bad request: no key ip in your request json body. {}'.format(e), status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not an ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)


@ip_app.route('/api/ip_query', methods=['GET'])
def ip_query_get():
    ip = request.values.get('ip')
    if ip is None:
        raise InvalidUsage('bad request: no param ip in your request.', status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not an ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)
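The handlers above rely on an is_ip() validator that is not shown in this excerpt. A minimal sketch of what it might look like (assumed behavior: dotted-quad IPv4 only):

```python
def is_ip(s):
    """Return True if s looks like a valid dotted-quad IPv4 address."""
    parts = s.split('.')
    if len(parts) != 4:
        return False
    for part in parts:
        if not part.isdigit():  # rejects '', '-1', '1e3', etc.
            return False
        if int(part) > 255:
            return False
    return True

print(is_ip("8.8.8.8"), is_ip("256.1.1.1"), is_ip("abc"))  # True False False
```

InvalidUsage follows the standard Flask custom-exception pattern (an Exception subclass plus an errorhandler that serializes it to JSON); see the Flask documentation on application errors.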

A POST request must contain a JSON field like the following in the request body:

"ip": ""

A GET request takes the form:

Service deployment

Install the dependency libraries

The dependencies in requirements.txt are as follows:


Installation: pip install -r requirements.txt

Configure supervisor

vim /etc/supervisor/conf.d/ip_query_http_service.conf, with the following contents:

[program:ip_query_http_service]
directory = /root/qk_python/ip_query
command = gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent
autostart = true
startsecs = 5
autorestart = true
startretries = 3
user = root
; stdout_logfile and stderr_logfile should also be set here, pointing
; into directories that exist (see below).

After adding this, you need to create the directories that stdout_logfile and stderr_logfile point to, otherwise supervisor will fail to start. Then have supervisor pick up the update and start the ip_query_http_service process.

# start supervisord
supervisord -c /etc/supervisor/supervisord.conf
# re-read the config and apply changes
supervisorctl update

For common supervisor operations, see the resources at the end of the article.

install nginx

If you use the software load-balancing setup, you need to install nginx; for compiling and installing nginx, see the resources at the end of the article.

Configure nginx

vim /usr/local/nginx/nginx.conf, and modify the configuration file as follows:

#user nobody;
# Number of nginx worker processes; set equal to the number of CPU cores.
worker_processes 4;

# Global error log; levels: debug | info | notice | warn | error | crit
#error_log logs/error.log;
#error_log logs/error.log notice;
error_log logs/error.log info;

# PID file
pid logs/nginx.pid;

# Maximum number of file descriptors an nginx process may open. In theory this
# is the system maximum (ulimit -n) divided by the number of nginx processes,
# but nginx does not distribute requests evenly, so keep it consistent with
# ulimit -n.
worker_rlimit_nofile 65535;

events {
    # Event model; use epoll on Linux.
    use epoll;
    # Maximum connections per process (total = worker_connections * worker_processes).
    worker_connections 65535;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log logs/access.log main;

    sendfile on;
    #keepalive_timeout 0;
    keepalive_timeout 65;
    tcp_nopush on;   # reduce network congestion
    tcp_nodelay on;  # reduce network congestion
    #gzip on;

    server {
        # Port on which the aggregated proxy service is exposed.
        listen 9000;
        server_name localhost;

        #charset koi8-r;
        #access_log logs/host.access.log main;

        location / {
            # root html;
            # index index.html index.htm;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            # The backend web server can read the client's real IP from X-Forwarded-For.
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            client_max_body_size 10m;       # maximum client request body size
            client_body_buffer_size 128k;   # buffer size for client request bodies
            proxy_buffer_size 4k;           # buffer for upstream response headers
            proxy_temp_file_write_size 64k; # threshold before writing to temp files
            # The proxy_pass directive pointing at the gunicorn backend goes here.
        }

        #error_page 404 /404.html;
        # redirect server error pages to the static page /50x.html
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}

Pressure test

For a stress test, choosing the right tool is the prerequisite. Of the tools below, jmeter is mostly run on Windows machines; the other tools are best run on *nix machines.

Stress testing tool selection

Tool             Pros and cons                                                                      Verdict
ApacheBench(ab)  Simple commands, efficient, thorough statistics, light on pressure-machine memory  Recommended
locust           Written in Python; inefficient, limited by the GIL, requires a Python test script  Not recommended
wrk              Simple commands, efficient, concise statistics, few pitfalls and errors            Most recommended
jmeter           Java-based, Apache open source, graphical interface, easy to operate               Recommended
webbench         Easy to use, but does not support POST requests                                    Average
tsung            Written in Erlang; many config templates; relatively complex                       Not recommended

I have personally used all six tools above. Below, ab, wrk, and jmeter are chosen to briefly explain installation and usage; for the other tools, google as needed.



ab

Installation:

apt-get install apache2-utils

common options

Option  Meaning
-r      Do not exit when ab receives socket errors
-t      Maximum time to spend sending requests
-c      Concurrency: number of requests issued at a time
-n      Total number of requests to send
-p      postfile: file containing the POST data
-T      content-type: body content type for POST and PUT requests


Test a GET request:

ab -r -t 120 -c 5000

Test a POST request:

ab -r -t 120 -c 5000 -p /tmp/post_data.txt -T 'application/json'

where the file /tmp/post_data.txt contains the data to send, in the format specified by -T; here, JSON:

{"ip": ""}



wrk

Installation (wrk is built from source, so a make step is needed):

apt-get install libssl-dev
git clone
cd wrk
make
cp wrk /usr/sbin

common options

Option     Meaning
-c         Number of open connections, i.e. the concurrency
-d         Stress-test duration: maximum time to spend sending requests
-t         Number of threads used by the pressure machine
-s         Lua script to load
--latency  Print latency statistics


Test a GET request:

wrk -t10 -c5000 -d120s --latency

Test a POST request:

wrk -t50 -c5000 -d120s --latency -s /tmp/wrk_post.lua

where /tmp/wrk_post.lua is the Lua script to load; it specifies the POST path, headers, and body:

request = function()
    path = "/api/ip_query"
    wrk.headers["Content-Type"] = "application/json"
    wrk.body = "{\"ip\":\"\"}"
    return wrk.format("POST", path)
end



jmeter

Installing jmeter requires JDK 1.8. The jmeter package itself can be downloaded from the Apache official website.



The mind map above comes from a testing expert and is very detailed; the complete xmind file is available for download: jmeter-Zhang Bei.xmind

For jmeter you can also refer to the resources section at the end of the article: using Apache JMeter for concurrent stress testing.

Analysis of stress test results

wrk GET Request pressure test results

root@host:/tmp# wrk -t10 -c5000 -d60s --latency
Running 1m test @
10 threads and 5000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 897.19ms 322.83ms 1.99s 70.52%
Req/Sec 318.80 206.03 2.14k 68.84%
Latency Distribution
50% 915.29ms
75% 1.11s
90% 1.29s
99% 1.57s
187029 requests in 1.00m, 51.01MB read
Socket errors: connect 0, read 0, write 0, timeout 38
Requests/sec: 3113.27
Transfer/sec: 869.53KB

ab GET Request pressure test results

root@host:/tmp# ab -r -t 60 -c 5000
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: gunicorn/19.7.1
Server Hostname:
Server Port: 8080
Document Path: /api/ip_query?ip=
Document Length: 128 bytes
Concurrency Level: 5000
Time taken for tests: 19.617 seconds
Complete requests: 50000
Failed requests: 2
(Connect: 0, Receive: 0, Length: 1, Exceptions: 1)
Total transferred: 14050000 bytes
HTML transferred: 6400000 bytes
Requests per second: 2548.85 [#/sec] (mean)
Time per request: 1961.668 [ms] (mean)
Time per request: 0.392 [ms] (mean, across all concurrent requests)
Transfer rate: 699.44 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 597 1671.8 4 15500
Processing: 4 224 201.4 173 3013
Waiting: 4 223 200.1 172 2873
Total: 7 821 1694.4 236 15914
Percentage of the requests served within a certain time (ms)
50% 236
66% 383
75% 1049
80% 1155
90% 1476
95% 3295
98% 7347
99% 7551
100% 15914 (longest request)

jmeter GET Request pressure test results

Result analysis

The pressure-test results from the three tools above are basically consistent: RPS (requests per second) is around 3000. The machine has 4 cores and 4 GB of memory, gunicorn runs 10 workers, and memory usage is 3.2 GB. Only about 3000 RPS on a single machine of this configuration calls for further analysis of the cause. In the meantime, putting a second machine behind the load balancer brings throughput above 5000 RPS, which meets our requirements.
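The scaling estimate in the paragraph above can be written down directly. A sketch, where 3000 RPS per machine is the figure measured here and 5000 RPS is the stated target:

```python
import math

def machines_needed(target_rps, per_machine_rps):
    """Smallest machine count whose combined RPS meets the target."""
    return int(math.ceil(float(target_rps) / per_machine_rps))

print(machines_needed(5000, 3000))  # 2
```

This ignores load-balancer overhead and assumes throughput scales linearly across identical machines, which held here for two nodes.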

Notes on stress testing

Number of open files

Stress testing generally places demands on the pressure machine's open-file limit: as soon as more than 1024 open files are needed, you must raise the Linux open-file limit:

# show current limits, including the open-file count
ulimit -a
# raise the open-file limit
ulimit -n 500000
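The same limit can also be inspected from inside the service via the standard-library resource module (Unix-only), which is useful for logging the effective limit at startup:

```python
import resource

# RLIMIT_NOFILE is the per-process open-file limit that `ulimit -n` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is what the process actually hits; the hard limit is the ceiling to which an unprivileged process may raise it.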

SYN Flood attack protection

Linux has a kernel parameter, the net.ipv4.tcp_syncookies field in /etc/sysctl.conf. Its default value is 1, meaning the system detects SYN flood attacks and enables SYN-cookie protection. So during a stress test that sends a large volume of repetitive data, the target machine's SYN queue overflows, SYN cookies are enabled, and a large number of requests time out and fail. Alibaba Cloud's load balancer additionally performs SYN flood and DDoS attack detection, so pay attention to two things when stress testing:

  • During the test, consider temporarily disabling the net.ipv4.tcp_syncookies field on the load-balanced machines.
  • When generating test data, avoid large amounts of repetitive data, so the traffic is not identified as an attack.
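To check the flag before a test run, it can be read from procfs. A small helper, assuming the standard Linux /proc/sys layout (it returns None where procfs is unavailable, e.g. on macOS):

```python
import os

def read_sysctl(name, procfs_root="/proc/sys"):
    """Read a sysctl value such as net.ipv4.tcp_syncookies via procfs."""
    path = os.path.join(procfs_root, *name.split("."))
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None

print(read_sysctl("net.ipv4.tcp_syncookies"))
```

On most default Linux setups this prints '1'; after disabling the protection for a test run it would read '0'.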

gunicorn Introduction and tuning

For choosing gunicorn in the first place, you can refer to the benchmark report Python WSGI Server performance analysis.

After settling on gunicorn as the WSGI server, you need to choose an appropriate worker count and a worker-class for each worker, according to the machine.

Choosing the worker count

Each worker runs as a separate child process and holds its own copy of the in-memory data, so adding or removing a worker changes system memory usage by a large step. At first, gunicorn ran with 3 workers on a single machine and the system only sustained about 1000 RPS. After scaling out to 9 workers, it sustained about 3000 RPS. So as long as there is enough memory, you can increase the worker count.
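gunicorn's documentation suggests (2 × cores) + 1 workers as a starting point, which matches the 9 workers that worked well on this 4-core box; note that for a memory-heavy app like this one, the real ceiling is RAM, since each worker holds its own ~300 MB copy of the data:

```python
import multiprocessing

def suggested_workers(cpu_count=None):
    """gunicorn's rule of thumb: (2 x num_cores) + 1 workers."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return 2 * cpu_count + 1

print(suggested_workers(4))  # 9
```

Treat the formula as an upper bound here and cap the count at whatever fits in memory.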

Choosing the worker-class

You can refer to two articles listed at the end: common gunicorn settings, and a performance comparison of several gunicorn worker classes.

After changing the gunicorn worker-class at startup from the default sync to gevent, system RPS roughly doubled.

worker-class  Workers  RPS measured with ab
sync          3        573.90
gevent        3        1011.84

gevent is a dependency (gevent >= 0.13), so install it with pip. The gunicorn command that starts the Flask app then becomes:

gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent


Improve IP database accuracy

Trade some efficiency for accuracy: with a single IP database, some IPs return no result, and foreign IPs are generally accurate only to the country level. You can weigh the accuracy and coverage of several IP databases and query a second database when the first cannot produce an exact address.
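The multi-database fallback described above is a simple priority chain. A sketch with hypothetical database objects, each exposing the same lookup() interface as IpTree:

```python
def lookup_with_fallback(ip, databases):
    """Query each database in priority order; return the first hit."""
    for db in databases:
        res = db.lookup(ip)
        if res is not None:
            return res
    return None

class StubDb(object):
    """Hypothetical stand-in for an IP database with a lookup() method."""
    def __init__(self, table):
        self.table = table
    def lookup(self, ip):
        return self.table.get(ip)

primary = StubDb({"1.1.1.1": "exact address"})
fallback = StubDb({"2.2.2.2": "country only"})
print(lookup_with_fallback("2.2.2.2", [primary, fallback]))  # country only
```

Order the list by accuracy, so the most precise database always wins when it has an answer; the cost of a fallback query is only paid on a miss.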

Increase single-machine concurrency

From the initiation of a request, to WSGI server processing, to the application interface, to the IP lookup: analyze the achievable throughput of each stage separately, find the system's bottleneck, and improve single-machine concurrency at its root.

Reference material

