Mango milk 2021-02-23 16:05:34
k8s-prometheus prometheus disk

Prometheus storage

Layout on disk
Ingested samples are grouped into blocks of two hours. Each two-hour block is a directory containing one or more chunk files, which hold all time series samples for that time window, plus a metadata file and an index file (which indexes metric names and labels to the time series in the chunk files).
When series are deleted via the API, the deletion records are stored in separate tombstone files (instead of deleting the data from the chunk files immediately).
The block for the currently incoming samples is kept in memory and is not yet fully persisted. It is protected against crashes by a write-ahead log (WAL), which is replayed when the Prometheus server restarts after a crash.
Write-ahead log files are stored in the wal directory in 128MB segments. These files contain raw data that has not been compacted yet, so they are much larger than regular block files.
Prometheus keeps at least 3 write-ahead log files, but high-traffic servers may see more than three WAL files, because at least two hours of raw data must be kept.
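To make the layout more concrete, here is a minimal sketch that walks a data directory and reports the time span of each block and the number of WAL segments. The function name summarize_data_dir is made up for illustration. The sketch also assumes details that are not spelled out above: that each block directory contains a meta.json file with "minTime" and "maxTime" fields in milliseconds, and that WAL segments are files with purely numeric names inside wal/.

```python
import json
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return (blocks, wal_segment_count) for a Prometheus-style data directory.

    Assumptions (not stated in the article): every block directory holds a
    meta.json with "minTime"/"maxTime" in milliseconds, and WAL segments are
    numerically named files inside the wal/ directory.
    """
    root = Path(data_dir)

    blocks = []
    for meta_path in sorted(root.glob("*/meta.json")):
        meta = json.loads(meta_path.read_text())
        # Convert the block's time window from milliseconds to hours.
        span_hours = (meta["maxTime"] - meta["minTime"]) / 3_600_000
        blocks.append((meta_path.parent.name, span_hours))

    wal_dir = root / "wal"
    segments = []
    if wal_dir.is_dir():
        # Skip anything that is not a numbered segment file.
        segments = [p for p in wal_dir.iterdir() if p.is_file() and p.name.isdigit()]

    return blocks, len(segments)
```

On a freshly started server you would expect mostly two-hour blocks plus at least three WAL segments, for example `blocks, wal_count = summarize_data_dir("data/")`.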
The initial two-hour blocks are eventually compacted into longer blocks in the background.
Compaction creates larger blocks spanning up to 10% of the retention time, or 21 days, whichever is smaller.
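The "whichever is smaller" rule is easy to check with a few lines of Python. This is only a sketch of the rule as described here, not Prometheus's own code; max_block_span is a hypothetical helper name, and the 21-day cap is taken from the text above.

```python
from datetime import timedelta


def max_block_span(retention, cap=timedelta(days=21)):
    """Largest block span after compaction: 10% of retention, capped at `cap`."""
    return min(retention / 10, cap)


# With the default 15d retention, 10% is 1.5 days, well below the cap.
print(max_block_span(timedelta(days=15)))
# With one year of retention, 10% would be 36.5 days, so the cap applies.
print(max_block_span(timedelta(days=365)))
```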
--storage.tsdb.path: Determines where Prometheus writes its database. The default is data/.
--storage.tsdb.retention.time: Determines when old data is removed. The default is 15d. If this flag is set to anything other than the default, it overrides storage.tsdb.retention.
--storage.tsdb.retention.size: [EXPERIMENTAL] Determines the maximum number of bytes that storage blocks can use (note that this does not include the WAL size, which can be substantial). The oldest data is removed first. The default is 0, meaning disabled. This flag is experimental and may change in future releases. Supported units: KB, MB, GB, PB. For example: "512MB"
--storage.tsdb.retention: This flag is deprecated; use storage.tsdb.retention.time instead.
--storage.tsdb.wal-compression: Enables compression of the write-ahead log (WAL). Depending on your data, you can expect the WAL size to be halved with very little extra CPU load. Note that if you enable this flag and later downgrade Prometheus to a version below 2.11.0, you will need to delete the WAL, because it will be unreadable.
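As a rough illustration of how these flags fit together, the sketch below assembles them into a command line. The helper name prometheus_storage_args and the executable name prometheus are assumptions for the example; only the flag names and defaults come from the list above.

```python
def prometheus_storage_args(path="data/", retention_time="15d",
                            retention_size=None, wal_compression=False):
    """Build the storage-related flags described above as a list of arguments."""
    args = [
        f"--storage.tsdb.path={path}",
        f"--storage.tsdb.retention.time={retention_time}",
    ]
    if retention_size is not None:
        # Experimental flag; leaving it out keeps the default (0, disabled).
        args.append(f"--storage.tsdb.retention.size={retention_size}")
    if wal_compression:
        args.append("--storage.tsdb.wal-compression")
    return args


# "prometheus" as the executable name is an assumption about your install.
command = ["prometheus", *prometheus_storage_args(
    path="/prometheus", retention_time="30d",
    retention_size="512MB", wal_compression=True)]
print(" ".join(command))
```

The deprecated --storage.tsdb.retention flag is left out on purpose, since storage.tsdb.retention.time replaces it.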
On average, Prometheus uses only about 1-2 bytes per sample. So, to plan the capacity of a Prometheus server, you can use the following rough formula:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
To lower the rate of ingested samples per second, you can either reduce the number of time series you scrape (fewer targets, or fewer series per target) or increase the scrape interval. However, because samples within a series are compressed, reducing the number of series is likely to be more effective.
For example:
Number of monitored nodes, $$nodes=\{i \mid i>0\}$$: 7
Number of metrics (time series) on node i, $$metrics(i)$$: 50
Scrape interval of node i, $$interval(i)$$: 15s
retention_time_seconds: 15d*24h*60min*60s = 1296000s
ingested_samples_per_second: nodes * metrics / scrape_interval = 7 * 50 / 15s ≈ 23.3
needed_disk_space = 1296000 * (7 * 50 / 15) * 2 bytes = 60480000 bytes; 60480000 / 1024 / 1024 ≈ 57.68MB at 2 bytes per sample, or about half that (≈ 28.84MB) at 1 byte per sample
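The same estimate can be reproduced with a short script. This is only a sketch of the rough formula above; the function and parameter names are made up for the example, and it assumes every node exposes the same number of series at the same scrape interval.

```python
def needed_disk_space(retention_seconds, nodes, metrics_per_node,
                      scrape_interval_seconds, bytes_per_sample):
    """Rough disk estimate in bytes: retention * samples per second * bytes per sample."""
    total_series = nodes * metrics_per_node
    # Multiply before dividing so the example values stay exact.
    total_samples = retention_seconds * total_series / scrape_interval_seconds
    return total_samples * bytes_per_sample


retention_seconds = 15 * 24 * 60 * 60  # 15 days

# Prometheus uses roughly 1-2 bytes per sample, so compute both bounds.
low = needed_disk_space(retention_seconds, 7, 50, 15, 1)
high = needed_disk_space(retention_seconds, 7, 50, 15, 2)

# Roughly 28.8 to 57.7 when divided by 1024 * 1024.
print(f"{low / 1024 / 1024:.2f} to {high / 1024 / 1024:.2f}")
```

Doubling the scrape interval or halving the number of series both halve this raw estimate; the formula does not capture the extra benefit that fewer series have on compression.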
This article was written by [Mango milk]. Please include a link to the original when reposting. Thanks.

