Comparison of Oracle, NoSQL, and NewSQL Database Technologies

voltdb 2020-11-11 13:49:16


About the author
John Ryan is an experienced data warehouse architect, developer, and database administrator. He specializes in Kimball dimensional design on multi-terabyte Oracle systems and has more than 30 years of IT experience in industries as varied as mobile telephony and investment banking.
This article was first published as part of a series of articles on databases and big data.

01 The world has changed

Over the past 20 years, the world has changed dramatically. In 2000, only a few million people were online, mostly from a desktop computer over a 56k modem, and Amazon sold only books. Today, billions of people are online 24 hours a day, 7 days a week on smartphones or tablets, buy almost everything online, and interact with each other through social apps such as Facebook, Twitter, and Instagram. The trend is unstoppable.
Expectations have changed too. If a page doesn't load within a few seconds, we lose patience and move to another site. If a website is unreachable, we fear it is the end of civilization as we know it. If a major website goes down, it becomes global news.
Instant gratification takes too long!
- Ladawn Clare-Panton

Note: If you're not an experienced database architect, you may want to read my earlier articles on scalability and database architecture first.

02 What has changed?

These changes translate into the following requirements:

  • Scalability: Traffic can grow explosively, so IT systems must scale quickly to cope with exponential growth.
  • High availability: IT systems must run 24 hours a day, 7 days a week, and must be fault tolerant. (In 2011, Bank of America suffered an outage that affected 29 million customers for six days.)
  • High performance: As systems scale, performance must keep up and stay consistently fast. Amazon has estimated that, in the extreme case, each additional second of page load time costs the company $1.6 billion a year.
  • Velocity: More and more devices carry networked sensors (smartphones are the obvious example), so there may be millions of transactions to process per second.
  • Real-time analytics: Overnight batch processing and business intelligence are no longer enough. The line between analytical and operational processing is blurring, and the need for real-time decision-making is growing.

The Internet of Things is sharply increasing velocity!
- Dr. Stonebraker (MIT)

These needs have given rise to the wonderful marketing term "translytical database": a hybrid solution in which the same system handles both high-volume transactions and real-time analytics.

03 What's the problem?

Delivering high performance while cutting costs (perhaps on inexpensive commodity servers) is a challenge for every database vendor. The requirements, however, conflict with one another:

  • Performance: Minimize latency and complete transactions in milliseconds.
  • Availability: Keep running even if one or more nodes fail or are cut off from the network.
  • Scalability: Keep scaling to meet the demands of massive data volumes and transaction velocity.
  • Consistency: Provide consistent, accurate results, especially in the event of a network failure.
  • Durability: Ensure that a change, once committed, is never lost.
  • Flexibility: Provide a general-purpose database solution that supports both transactional and analytical workloads.

The only realistic way to scale massively and incrementally is to deploy a scale-out distributed system. To maximize availability, changes made on one node are usually replicated immediately to two or more other nodes. However, once data is spread across multiple servers, trade-offs become unavoidable.
For example:

3.1 Performance versus availability and durability

Many NoSQL databases replicate data to other nodes in the cluster to improve availability. If a database node crashes immediately after a write, the data is still held on other machines, so the change is durable. However, you can relax this requirement and return as soon as the write is accepted, without waiting for replication. This maximizes performance but risks losing the change: it may not be durable at all.
▲ Geographically distributed systems
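
The durability trade-off in section 3.1 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the behavior of any particular NoSQL product: `ToyCluster`, `wait_for_replicas`, and the other names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One database node with an in-memory key-value store."""
    name: str
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical primary-plus-replicas cluster, for illustration only."""

    def __init__(self, replica_count: int):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.unreplicated = []  # writes acknowledged but not yet copied

    def write(self, key, value, wait_for_replicas: bool) -> None:
        self.primary.data[key] = value
        if wait_for_replicas:
            # Durable path: copy to every replica before acknowledging.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Fast path: acknowledge immediately, replicate later.
            self.unreplicated.append((key, value))

    def crash_primary(self) -> None:
        # The primary's memory is gone, including anything not yet replicated.
        self.primary.data.clear()
        self.unreplicated.clear()

    def read_after_failover(self, key):
        # The first replica takes over and serves reads.
        return self.replicas[0].data.get(key)


durable = ToyCluster(replica_count=2)
durable.write("balance", 100, wait_for_replicas=True)
durable.crash_primary()
durable_read = durable.read_after_failover("balance")  # the change survives

fast = ToyCluster(replica_count=2)
fast.write("balance", 100, wait_for_replicas=False)
fast.crash_primary()
fast_read = fast.read_after_failover("balance")  # the change is lost
```

The only difference between the two runs is whether the write waits for the replicas before returning, which is exactly the performance-for-durability trade described above.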

3.2 Consistency versus availability

NoSQL databases support eventual consistency. For example, in the diagram above, if the network connection to New York is temporarily lost, there are two options:

  • Stop processing: New York's availability suffers.
  • Accept reads and writes: Differences are reconciled once the connection is restored. The risk is that stale or incorrect results are served in the meantime, and conflicting writes may have to be resolved.

Clearly, NoSQL databases trade consistency for availability.
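
The two options above can be sketched as follows. This is a minimal illustration, assuming a last-write-wins rule for reconciliation; real systems use a range of conflict-resolution strategies. The `Site` class, the London site, and the seat-booking key are all hypothetical.

```python
class Site:
    """Hypothetical data center holding key -> (timestamp, value)."""

    def __init__(self, name: str, accept_when_partitioned: bool):
        self.name = name
        self.accept_when_partitioned = accept_when_partitioned
        self.connected = True
        self.store = {}

    def write(self, key, value, timestamp: int) -> bool:
        if not self.connected and not self.accept_when_partitioned:
            return False  # option 1: stop processing, lose availability
        self.store[key] = (timestamp, value)  # option 2: accept the write
        return True


def reconcile(a: Site, b: Site) -> None:
    """Last-write-wins merge once the link is back: the older write is discarded."""
    for key in set(a.store) | set(b.store):
        versions = [site.store[key] for site in (a, b) if key in site.store]
        newest = max(versions, key=lambda version: version[0])
        a.store[key] = newest
        b.store[key] = newest


# Option 1: New York refuses writes while it is cut off.
strict_ny = Site("new-york", accept_when_partitioned=False)
strict_ny.connected = False
rejected = strict_ny.write("seat-12A", "bob", timestamp=2)

# Option 2: both sites keep accepting writes and reconcile afterwards.
london = Site("london", accept_when_partitioned=True)
new_york = Site("new-york", accept_when_partitioned=True)
new_york.connected = False
london.write("seat-12A", "alice", timestamp=1)
new_york.write("seat-12A", "bob", timestamp=2)
stale_read = london.store["seat-12A"][1]   # London still sees alice
new_york.connected = True
reconcile(london, new_york)
final_owner = london.store["seat-12A"][1]  # alice's write was silently dropped
```

With option 2 both sites stay available, but one of the two conflicting bookings disappears during reconciliation, which is the consistency cost the text describes.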

3.3 Flexibility versus scalability

Compared with general-purpose relational databases such as Oracle and DB2, NoSQL databases are relatively inflexible and, for example, do not support join operations. Besides the many databases that do not support SQL at all, some (for example Neo4J and MongoDB) are designed for a specific class of problem: graph processing and JSON data structures, respectively.
Even databases such as HBase, Cassandra, and Redis abandon relational joins, and many also restrict access to a single primary key, with no support for secondary indexes.
Many databases claim 100% support for ACID transactions,
but few actually provide formal ACID guarantees.
- Dr. Peter Bailis (Stanford University)

04 ACID versus eventual consistency

One of the main challenges in scaling a database solution is maintaining ACID consistency. With its DynamoDB database, Amazon relaxed the consistency constraints in exchange for speed. This solved the performance problem and gave rise to a large number of NoSQL databases.
Moreover, even the most successful databases (including Oracle) do not provide true ACID isolation. One paper studied 18 databases and found that only three (VoltDB, Ingres, and Berkeley DB) support serializability by default. The main reason is that serializability is hard to provide while maintaining performance.
Eventual consistency is a particularly weak model.

The system can return any data and still be eventually consistent.
- Dr. Peter Bailis (Stanford)

Eventual consistency, by contrast, provides almost no consistency guarantee. The figure below illustrates the problem. One user withdraws $1 million from a bank account, but before the change is replicated, another user checks the balance of the same account. The only guarantee is that, provided there are no further writes, the system will eventually return consistent results. How is that even useful, let alone acceptable?

▲ Cassandra: eventual consistency
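
The stale read in the bank account example can be reproduced with a small sketch. It assumes a single primary that replicates asynchronously to one replica; the class name, the account key, and the amounts are hypothetical and chosen only to mirror the scenario above.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical primary/replica pair with asynchronous replication."""

    def __init__(self, initial: dict):
        self.primary = dict(initial)
        self.replica = dict(initial)
        self.replication_log = deque()  # changes not yet applied to the replica

    def write(self, key, value) -> None:
        self.primary[key] = value
        self.replication_log.append((key, value))

    def read_from_replica(self, key):
        return self.replica[key]

    def replicate(self) -> None:
        # Apply every pending change to the replica, oldest first.
        while self.replication_log:
            key, value = self.replication_log.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore({"account-1": 1_000_000})
store.write("account-1", 0)                           # first user withdraws the money
stale_balance = store.read_from_replica("account-1")  # second user reads too early
store.replicate()                                     # replication catches up
fresh_balance = store.read_from_replica("account-1")
```

Until `replicate()` runs, the second user is shown a balance that no longer exists. The system only promises that the two copies converge at some later point.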

05 Rethinking the OLTP database

Ten years ago, Dr. Michael Stonebraker wrote the paper "The End of an Architectural Era", arguing that the 1970s-era database architectures from Oracle, Microsoft, and IBM are obsolete.
He proposed that an OLTP database should have the following characteristics:

  • Dedicated to one problem: Fast execution of short, predefined (not ad hoc) transactions with relatively simple query plans. In short, a specialized OLTP platform.
  • ACID compliant: All transactions run single-threaded, with full serializability by default.
  • Always available: High availability through data replication (rather than a hot standby), at almost no additional cost.
  • Geographically distributed: Runs seamlessly on a grid of dispersed machines, which further improves resilience and improves performance locally.
  • Shared-nothing architecture: Multiple machines are connected in a peer-to-peer grid and share the load. Adding a machine is a seamless operation with no downtime, and losing a node causes only a slight performance degradation rather than bringing down the whole system.
  • In-memory: Everything is held in memory for absolute speed, with durability provided by in-memory replication to other nodes.
  • Bottlenecks eliminated: The database internals are completely redesigned to run single-threaded, removing the need for redo logging, locking, and latching, which are the most significant constraints on database performance.
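
The single-threaded idea in the list above can be illustrated with a toy executor. This is a sketch of the general principle only, not H-Store's implementation: `Partition`, `transfer`, and the account names are hypothetical. Because one thread owns the partition and runs predefined procedures one after another, the execution is serial by construction and needs no locks or latches.

```python
from collections import deque


class Partition:
    """One slice of the data, owned by a single thread of execution."""

    def __init__(self, rows: dict):
        self.rows = rows
        self.queue = deque()  # pending (procedure, args) pairs

    def submit(self, procedure, *args) -> None:
        self.queue.append((procedure, args))

    def run(self) -> list:
        # Procedures run strictly one at a time, in arrival order.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.rows, *args))
        return results


def transfer(rows: dict, source: str, target: str, amount: int) -> str:
    """Predefined transaction: move money if the source can cover it."""
    if rows.get(source, 0) < amount:
        return "rejected"
    rows[source] -= amount
    rows[target] = rows.get(target, 0) + amount
    return "committed"


partition = Partition({"a": 100, "b": 0})
partition.submit(transfer, "a", "b", 70)
partition.submit(transfer, "a", "b", 70)  # would overdraw once the first has run
results = partition.run()
```

The second transfer sees the effect of the first and is rejected, without any locking code, because no two procedures ever touch the partition at the same time.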

To prove this was feasible, he built a prototype, the H-Store database, and demonstrated TPC-C benchmark performance 82 times that of a commercial competitor on the same hardware. The H-Store prototype processed 70,000 transactions per second, whereas the commercial competitor managed only 850 despite considerable tuning effort by database administrators.

06 Nothing in the world is difficult!

Dr. Stonebraker's achievement is impressive. The previous TPC-C world record was about 1,000 transactions per CPU core, but H-Store, running on a dual-core 2.8 GHz desktop computer, was 35 times faster. His 2008 paper "OLTP Through the Looking Glass" explains why commercial databases (including Oracle) perform so poorly.
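
As a quick sanity check, the quoted ratios follow from the figures given in the text. The calculation below assumes that the 70,000 transactions per second were measured on the dual-core desktop and that the per-core record is expressed in the same unit; both are assumptions, not statements from the source.

```python
hstore_tps = 70_000        # H-Store prototype, transactions per second
competitor_tps = 850       # commercial competitor on the same hardware
record_per_core = 1_000    # previous record per CPU core (unit assumed to match)
hstore_cores = 2           # dual-core desktop (assumed to be the test machine)

speedup_vs_competitor = hstore_tps / competitor_tps  # roughly 82
hstore_per_core = hstore_tps / hstore_cores          # 35,000 per core
speedup_vs_record = hstore_per_core / record_per_core

print(f"vs competitor: {speedup_vs_competitor:.1f}x")
print(f"vs per-core record: {speedup_vs_record:.0f}x")
```

Under those assumptions, both the 82-fold and the 35-fold figures are consistent with each other.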

▲ Processing resource consumption of relational database

As shown above, 93% of system overhead goes to traditional (legacy) database mechanisms, including locking, latching, and buffer management. Only 7% of machine resources are devoted to the task at hand.
Simply by eliminating these bottlenecks and using in-memory rather than disk-based processing, H-Store achieved the seemingly impossible: full ACID transactional consistency together with a speedup of several orders of magnitude.

07 NewSQL database technology


VoltDB, first released in 2010, is the commercial product derived from the H-Store prototype. It is a dedicated OLTP platform for web-scale transaction processing and real-time analytics. As the infographic shows, of some 250 commercial database solutions, only 13 are classed as NewSQL technology.

08 VoltDB

Like other NewSQL databases, VoltDB is designed to run entirely in memory, with the option of taking periodic disk snapshots. It runs on premises on 64-bit Linux or in the cloud on AWS, Google, and Azure, and uses a horizontally scalable architecture.
Traditional relational databases write changes to disk-based log files. VoltDB instead applies each change in memory on multiple machines at the same time. For example, a K-safety factor of 2 guarantees no data loss even if two machines fail, because the data is held in memory on at least three nodes.
Transactions are submitted as Java stored procedures, which can be executed asynchronously in the database. Data is automatically partitioned (sharded) across the nodes in the system, although reference data can be replicated to maximize join performance. Unusually, VoltDB also supports semi-structured data in the form of JSON data structures.
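
Partitioning and K-safety, as described in the paragraphs above, can be sketched together. This is a simplified illustration and not VoltDB's actual placement algorithm: the CRC32 hash, the "consecutive nodes" placement rule, and all function names are assumptions made for the example.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def nodes_for_partition(partition: int, node_count: int, k_safety: int) -> list:
    """Place K + 1 copies of a partition on consecutive nodes."""
    copies = k_safety + 1
    if copies > node_count:
        raise ValueError("need at least K + 1 nodes")
    return [(partition + offset) % node_count for offset in range(copies)]


def partition_available(holders: list, failed_nodes: set) -> bool:
    """A partition survives as long as one node holding a copy is still up."""
    return any(node not in failed_nodes for node in holders)


NODE_COUNT = 5
K_SAFETY = 2

partition = partition_for("customer-42", partition_count=NODE_COUNT)
holders = nodes_for_partition(partition, NODE_COUNT, K_SAFETY)

two_failures = set(holders[:2])   # two of the three copies are lost
three_failures = set(holders)     # all three copies are lost
survives_two = partition_available(holders, two_failures)
survives_three = partition_available(holders, three_failures)
```

With a K-safety factor of 2, every partition lives on three nodes, so any two failures leave at least one copy intact. A third failure among the same holders is what finally loses data.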
In terms of performance, a 2015 benchmark showed that VoltDB was at least twice as fast as the NoSQL database Cassandra, at only one sixth of the AWS cloud processing cost.
Finally, VoltDB version 6.4 passed the notoriously demanding Jepsen distributed safety tests.
By comparison, earlier tests of the NoSQL database Riak showed that, even with the strongest consistency settings, it dropped 30-70% of writes. Meanwhile, Cassandra lost up to 5% of writes when using lightweight transactions.

09 MemSQL

Like VoltDB, MemSQL is a scale-out, in-memory distributed database designed for fast data ingestion and real-time analytics. It too runs on premises or in the cloud, partitions data automatically across nodes, and executes queries in parallel on every CPU core.

▲ Processing resource consumption of relational database
Despite the many similarities with VoltDB, the figure above shows an important difference. MemSQL tries to balance the conflicting demands of real-time transactions and data-warehouse-style processing of historical data. To do so, it keeps data in memory in a row store, backed by a column-oriented store on disk, so that real-time (recent) data can be combined with historical results.
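
The row-store and column-store combination can be pictured with plain Python structures. This is only a conceptual sketch, not MemSQL's storage engine: the tables, the `insert_order` and `total_amount` helpers, and the sample data are hypothetical.

```python
# Recent data: row store, one record per row (cheap to insert and update).
recent_rows = [
    {"order_id": 3, "region": "EU", "amount": 40},
    {"order_id": 4, "region": "US", "amount": 15},
]

# Historical data: column store, one list per column (cheap to scan and aggregate).
historical_columns = {
    "order_id": [1, 2],
    "region": ["EU", "US"],
    "amount": [100, 250],
}


def insert_order(order_id: int, region: str, amount: int) -> None:
    """New transactions land in the row store."""
    recent_rows.append({"order_id": order_id, "region": region, "amount": amount})


def total_amount(region: str) -> int:
    """Combine recent rows with historical columns in one answer."""
    recent = sum(row["amount"] for row in recent_rows if row["region"] == region)
    historical = sum(
        amount
        for row_region, amount in zip(
            historical_columns["region"], historical_columns["amount"]
        )
        if row_region == region
    )
    return recent + historical


before = total_amount("EU")   # 40 recent + 100 historical
insert_order(5, "EU", 60)
after = total_amount("EU")    # the new order is visible immediately
```

The query touches only the two columns it needs on the historical side, while fresh transactions remain a single append on the row side.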
This gives it a firm footing in both the OLTP and data warehouse fields, although both solutions target the real-time data ingestion and analytics market.

10 Which applications need NewSQL technology?

Any application that needs very fast ingestion and response times (1-2 milliseconds on average) together with the transactional accuracy that ACID guarantees, for example customer billing.
Typical applications include:

  • Real-time authorization: For example, validating, recording, and authorizing mobile phone calls for analysis and billing. Typically, 99.999% of database operations must complete within 50 milliseconds.
  • Real-time fraud detection: Running complex analytical queries to determine accurately the likelihood of fraud before a transaction is authorized.
  • Gaming analytics: Dynamically adjusting game difficulty in real time based on a player's ability and typical behavior. The goal is to retain existing players and convert free customers into paying players. One customer, working under demanding speed, availability, and accuracy requirements, used these techniques to increase player spending by 40%.
  • Personalized web advertising: Dynamically selecting personalized web advertisements in real time, recording ad impression events for billing, and recording the results for later analysis.
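
The latency requirement quoted for real-time authorization (99.999% of operations within 50 milliseconds) is straightforward to check against measured samples. The function below is a hypothetical helper, and the sample data is synthetic, made up purely to show a passing and a failing case.

```python
def meets_latency_target(latencies_ms, limit_ms=50.0, required_fraction=0.99999):
    """Return True if enough operations finished within the latency limit."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within_limit = sum(1 for latency in latencies_ms if latency <= limit_ms)
    return within_limit / len(latencies_ms) >= required_fraction


# Synthetic samples: one million operations, most taking 2 ms.
good_run = [2.0] * 999_995 + [80.0] * 5    # 99.9995% within 50 ms
bad_run = [2.0] * 999_980 + [80.0] * 20    # 99.998% within 50 ms

good_ok = meets_latency_target(good_run)
bad_ok = meets_latency_target(bad_run)
```

At five nines, only ten slow operations in a million are allowed, so the run with twenty slow calls misses the target even though its average latency is almost identical.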

Compared with the vast majority of OLTP applications, none of this may look remarkable at first, but in an always-on internet world that runs 24 hours a day, 7 days a week, these applications open new frontiers for real-time analytics, and the rise of the Internet of Things brings great opportunities.

11 Conclusion

Although Hadoop is more closely associated with big data and has received a lot of attention lately, database technology remains the cornerstone of any IT system.
Similarly, NoSQL databases offer a fast, scalable alternative to relational databases, but for all the temptation of license-free open source, you still get what you pay for. Moreover, as VoltDB demonstrates, a NewSQL database may in fact work out cheaper than a NoSQL option in the long run.
In short, if you have web-scale, OLTP, and/or real-time analytics requirements, you should consider a NewSQL database.

If you are interested in VoltDB's low-latency big data solutions for the industrial Internet of Things, or in full-lifecycle management of real-time data platforms, send us a private message to join our official discussion group.

Copyright notice
This article was written by [voltdb]. Please include a link to the original when reposting. Thank you.
