Consider the past you shall know the future
Knowledge is a thing , It seems that we really need to review the old and learn the new , Never , Forget about it .
The business is simple ： Need to batch insert some data , The data source may be tables from other databases , It could also be an external excel Import of .
So here comes the question , Is it necessary to check before inserting each time , See if it's repeated , Filter the data in the code , The repeated ones will be filtered out ？
When inserting values into big data databases , Also judge whether the insertion is repeated , Then insert . How to improve efficiency ？
It seems that I am not alone in this problem .
There are many solutions , Different scenarios have different solutions , When the amount of data is small , Whatever you do , But when there's a lot of data , This is not a simple problem
Millions of data , It's impossible to find out, come and go, reprocess ！
Talk about me Google The solution to the problem .
1、insert ignore into
When inserting data , If there is an error , Like duplicate data , No errors will be returned , Only return... In the form of a warning . So use ignore Make sure there is no problem with the statement itself , Otherwise, it will be ignored . for example ：
INSERT IGNORE INTO user (name) VALUES ('telami')
This method is very simple , But there's a possibility , The insertion is not due to duplicate data , But for other reasons , Also ignored ～
2、on duplicate key update
When primary perhaps unique When repeated , execute update sentence , Such as update After that is the useless statement , Such as id=id, Same with 1 Function the same , But mistakes don't go unnoticed .
for example , In order to achieve name Repeated data insertion does not report error , You can use the following statement ：
INSERT INTO user (name) VALUES ('telami') ON duplicate KEY UPDATE id = id
There is a prerequisite for this approach , Namely , Constraints that need to be inserted , It needs to be a primary key or a unique constraint （ In your business, if you want to be the only judge, set that field as the only constraint, that is unique key）.
3、insert … select … where not exist
according to select Whether to insert , It's not just through primary and unique To judge , Other conditions can also be adopted . for example ：
INSERT INTO user (name) SELECT 'telami' FROM dual WHERE NOT EXISTS (SELECT id FROM user WHERE id = 1)
This method actually uses mysql The way to create a temporary table , But it uses subqueries , Efficiency also has a little impact , If you can use the above, don't use this .
If there is primary or unique The same record , Delete first . Insert a new record .
REPLACE INTO user SELECT 1, 'telami' FROM books
This method is whether there are the same records or not , Will be deleted first and then inserted .
The choice is the second way
<insert id="batchSaveUser" parameterType="list"> insert into user (id,username,mobile_number) values <foreach collection="list" item="item" index="index" separator=","> ( ) </foreach> ON duplicate KEY UPDATE id = id </insert>
It's used here Mybatis, An operation of batch insertion ,mobile_number A unique constraint has been imposed . In this way, during batch insertion , If the mobile phone number is the same , It won't be inserted any more .