JVM reuse is one of Hadoop's tuning parameters, and it has a very large impact on Hive performance, especially in scenarios where small files are hard to avoid or where a job has a very large number of tasks. Most tasks in such scenarios have short execution times.
By default, Hadoop usually forks a new JVM to execute each map and reduce task. The JVM startup cost can then be considerable, especially when a job contains hundreds or thousands of tasks. JVM reuse allows a JVM instance to be reused N times within the same job. The value of N can be configured in Hadoop's mapred-site.xml file. It is usually between 10 and 20; the right value needs to be determined by testing against the specific business scenario.
<property>
  <name>mapreduce.job.jvm.numtasks</name>
  <value>10</value>
  <description>How many tasks to run per jvm. If set to -1, there is no limit.</description>
</property>
We can also configure JVM reuse from within Hive, by setting this parameter in the Hive session.
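As a minimal sketch of what the Hive-side setting might look like, assuming the same property name as in the mapred-site.xml snippet above and an illustrative value of 10 (the exact command is not shown in this post, and older Hadoop 1 releases used the name mapred.job.reuse.jvm.num.tasks instead):

```sql
-- Assumption: reuses the property name from the mapred-site.xml example above.
-- Applies to the current Hive session only; 10 is an illustrative value, tune it by testing.
set mapreduce.job.jvm.numtasks=10;

-- Print the current value to confirm the setting took effect.
set mapreduce.job.jvm.numtasks;
```

Because the value is set per session, different queries can be tested with different values without editing the cluster-wide configuration.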
Of course, this feature also has a drawback. With JVM reuse enabled, the task slots in use stay occupied so that they can be reused, and they are not released until the job completes. If an "unbalanced" job has a few reduce tasks that take much longer to run than the other reduce tasks, the reserved slots sit idle yet cannot be used by other jobs. They are released only once all tasks have finished.
That's all for this post. If you found it useful, please remember to leave a like before you go ٩(๑>◡<๑)۶