[Original] Linux Virtualization KVM-Qemu Analysis (5): Memory Virtualization

King Luoyan 2020-11-07 23:59:06


  • Read the fucking source code! --By Lu xun
  • A picture is worth a thousand words. --By Gorky

Notes:

  1. KVM version: 5.9.1
  2. QEMU version: 5.0.0
  3. Tools: Source Insight 3.5, Visio
  4. Articles are also published on cnblogs: https://www.cnblogs.com/LoyenWang/

1. Overview

The earlier article 《Linux Virtualization KVM-Qemu Analysis (2): ARMv8 Virtualization》 described the general framework of memory virtualization. Let's review it:

  1. Memory access without virtualization

  • Before the CPU can access physical memory, a mapping table (virtual address to physical address) must be created; the access is then completed by walking that table. On ARMv8, the base address of the kernel page table is stored in TTBR1_EL1, and the base address of the user-space page table is stored in TTBR0_EL1;
  2. Memory access under virtualization

  • With virtualization, memory access is split into two stages. The Hypervisor uses Stage 2 to control the virtual machine's view of memory, deciding whether the virtual machine may access a given piece of physical memory, which achieves isolation;
  • Stage 1: VA (Virtual Address) -> IPA (Intermediate Physical Address). The Guest operating system controls the Stage 1 translation;
  • Stage 2: IPA (Intermediate Physical Address) -> PA (Physical Address). The Hypervisor controls the Stage 2 translation;

At first glance the two pictures above seem easy to understand; think about them carefully, though, and it turns out nothing is really understood. The goal of this article is to make the process clear.

Before going into the details, a few terms need to be defined:

gva - guest virtual address
gpa - guest physical address
hva - host virtual address
hpa - host physical address

  • Mapping virtual addresses to physical addresses inside the Guest OS (GVA->GPA) is a routine operation; see the earlier memory management series;

After all that preamble, here are the two topics of this article:

  1. GPA->HVA;
  2. HVA->HPA;

Let's get started!


2. GPA->HVA

Remember the sample code in the previous article, 《Linux Virtualization KVM-Qemu Analysis (4): CPU Virtualization (2)》?
In the KVM-Qemu scheme, the GPA->HVA translation is set up through the KVM_SET_USER_MEMORY_REGION ioctl command, as the figure shows:

Now that we have found the entry point, let's uncover what lies behind it.
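As a rough illustration of that entry point, the sketch below registers a block of anonymous host memory as guest RAM. The helper name register_guest_ram and its parameters are made up for this example and are not taken from Qemu; only struct kvm_userspace_memory_region and the KVM_SET_USER_MEMORY_REGION ioctl come from the kernel's uapi header. It assumes a Linux host with <linux/kvm.h> installed and a VM file descriptor obtained earlier through KVM_CREATE_VM.

```c
#define _GNU_SOURCE
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from Qemu): back `size` bytes of guest physical
 * memory starting at `gpa` with anonymous host memory and register the range
 * on the VM file descriptor `vm_fd`.
 * Returns the HVA of the backing memory, or NULL on failure.
 */
static void *register_guest_ram(int vm_fd, uint32_t slot, uint64_t gpa, size_t size)
{
    /* The HVA range lives in the calling process's own address space. */
    void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hva == MAP_FAILED)
        return NULL;

    struct kvm_userspace_memory_region region;
    memset(&region, 0, sizeof(region));
    region.slot = slot;                 /* slot ID used to find the memslot */
    region.flags = 0;
    region.guest_phys_addr = gpa;       /* start of the range as the guest sees it */
    region.memory_size = size;          /* 0 would request deletion of the slot */
    region.userspace_addr = (uint64_t)(uintptr_t)hva;

    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
        munmap(hva, size);
        return NULL;
    }
    return hva;
}
```

Calling it with an invalid descriptor simply fails and returns NULL, which makes the error path easy to exercise without /dev/kvm.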

2.1 Data structures

The key data structures are as follows:

  • A virtual machine organizes its physical memory in slots. Each slot corresponds to one struct kvm_memory_slot, and all the slots of a virtual machine together make up its physical address space;
  • User space uses struct kvm_userspace_memory_region to set up a memory slot; inside the kernel, struct kvm_memslots organizes the kvm_memory_slot structures;
  • struct kvm_userspace_memory_region contains the slot ID, which is used to find the corresponding slot, along with the start address and size of the guest physical memory and the HVA. The HVA is allocated in the address space of the user process, that is, it is a region in the Qemu process's address space;
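To make the relationship concrete, here is a minimal model of how a slot ties a GPA to an HVA. The struct model_memslot and the function model_gpa_to_hva are simplified stand-ins invented for this sketch; they only keep the three fields needed for the arithmetic and assume 4 KiB pages.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Simplified stand-in for struct kvm_memory_slot (illustrative only). */
struct model_memslot {
    uint64_t base_gfn;        /* first guest frame number covered by the slot */
    uint64_t npages;          /* slot size in pages */
    uint64_t userspace_addr;  /* HVA where the slot starts in the user process */
};

/* Translate a GPA inside the slot to its HVA; returns 0 if out of range. */
static uint64_t model_gpa_to_hva(const struct model_memslot *slot, uint64_t gpa)
{
    uint64_t gfn = gpa >> MODEL_PAGE_SHIFT;
    uint64_t offset = gpa & ((1ULL << MODEL_PAGE_SHIFT) - 1);

    if (gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
        return 0;

    /* Same page index inside the slot, same offset inside the page. */
    return slot->userspace_addr
         + ((gfn - slot->base_gfn) << MODEL_PAGE_SHIFT)
         + offset;
}
```

For a slot starting at guest frame 0x100 and backed at HVA 0x7f0000000000, the GPA 0x101234 lands one page into the slot, at HVA 0x7f0000001234.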

2.2 Flow analysis

The data structure section already outlined the relationships, so KVM_SET_USER_MEMORY_REGION essentially revolves around creating, deleting, and updating slots. Without further ado, here is the flow:

  • When user space sets up a memory region, the call eventually reaches __kvm_set_memory_region, which carries out all of the logic;
  • __kvm_set_memory_region first validates each field of the struct kvm_userspace_memory_region passed in, mainly checking address alignment and ranges;
  • The slot ID passed in by the user is used to look up the virtual machine's slot. There are only two outcomes: 1) an existing slot is found; 2) none is found, so a new slot is created;
  • If the memory_size passed in is 0, the corresponding slot is deleted;
  • Based on the user's parameters, the kind of change to apply to the slot is determined: KVM_MR_CREATE, KVM_MR_MOVE, or KVM_MR_FLAGS_ONLY;
  • Based on the user's parameters (the KVM_MEM_LOG_DIRTY_PAGES flag), decide whether a dirty-page bitmap must be allocated to track which pages of the slot have been written;
  • Finally, kvm_set_memslot is called to set up and update the slot information;
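The way the requested change is classified can be sketched as below. This is a simplified model, not the kernel function: the enum mirrors the kernel's enum kvm_mr_change with two extra model-only results, the struct and function names are hypothetical, and checks such as alignment, userspace_addr changes, and read-only flag changes are left out.

```c
#include <stdint.h>

/* Local mirror of enum kvm_mr_change, plus two model-only results. */
enum model_change {
    MODEL_MR_CREATE,
    MODEL_MR_DELETE,
    MODEL_MR_MOVE,
    MODEL_MR_FLAGS_ONLY,
    MODEL_NO_CHANGE,   /* request is identical to the existing slot */
    MODEL_INVALID      /* request is rejected */
};

struct model_slot {
    uint64_t base_gfn;
    uint64_t npages;   /* 0 means "slot does not exist yet" */
    uint32_t flags;
};

/* Decide what a request `req` means, given the current slot state `old`. */
static enum model_change model_classify(const struct model_slot *old,
                                        const struct model_slot *req)
{
    if (req->npages == 0)                 /* memory_size == 0: delete */
        return old->npages ? MODEL_MR_DELETE : MODEL_INVALID;
    if (old->npages == 0)                 /* nothing there yet: create */
        return MODEL_MR_CREATE;
    if (req->npages != old->npages)       /* resizing in place is refused */
        return MODEL_INVALID;
    if (req->base_gfn != old->base_gfn)   /* same size, new guest address */
        return MODEL_MR_MOVE;
    if (req->flags != old->flags)
        return MODEL_MR_FLAGS_ONLY;
    return MODEL_NO_CHANGE;
}
```

Ordering the checks this way means deletion is decided first, so a request with memory_size of 0 never reaches the create or move paths.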

2.2.1 kvm_set_memslot

The actual memslot setup is done in kvm_set_memslot. The slot operation flow is as follows:

  • First, a new memslots is allocated and the contents of the original memslots are copied into it;
  • If the operation on the slot is a delete or a move, the original slot is first found in memslots by the old slot ID and marked invalid, and then the memslots is installed back. "Installing" here means an RCU assignment; if this is unfamiliar, the earlier RCU series is worth a look. Since the slot is now invalid, its Stage 2 mapping has to be torn down;
  • kvm_arch_prepare_memory_region handles the case where the new slot may span multiple VMAs of the user process; if a VMA is a device region, that region also has to be mapped into the Guest IPA space;
  • update_memslots updates the whole memslots. The array is kept sorted by guest frame number (base_gfn), and add, delete, and move operations all preserve that order. Because it is ordered, lookups can use binary search;
  • The memslots with the new slot added is installed back into KVM;
  • kvfree releases the original memslots;
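The binary search mentioned above can be sketched as follows. The names are hypothetical and the model assumes the array is sorted by base_gfn in descending order with non-overlapping slots; it is a stand-alone illustration rather than the kernel's lookup code.

```c
#include <stdint.h>

/* Simplified slot: only the fields needed for the lookup (illustrative). */
struct model_sorted_slot {
    uint64_t base_gfn;
    uint64_t npages;
};

/*
 * Find the slot containing `gfn` in an array sorted by base_gfn, largest
 * first. Returns the array index, or -1 if no slot covers the frame.
 */
static int model_search_memslots(const struct model_sorted_slot *slots,
                                 int used, uint64_t gfn)
{
    int start = 0, end = used;

    /* Narrow down to the first slot whose base_gfn is <= gfn. */
    while (start < end) {
        int mid = start + (end - start) / 2;

        if (gfn >= slots[mid].base_gfn)
            end = mid;
        else
            start = mid + 1;
    }

    /* The candidate still has to actually cover the frame. */
    if (start < used &&
        gfn >= slots[start].base_gfn &&
        gfn < slots[start].base_gfn + slots[start].npages)
        return start;

    return -1;
}
```

A frame that falls into a hole between two slots finds a candidate in the loop but fails the final range check, so it is reported as not found.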

2.2.2 kvm_delete_memslot

kvm_delete_memslot actually calls kvm_set_memslot as well, just with the slot operation set to KVM_MR_DELETE, so it is not covered again here.


3. HVA->HPA

So far we only have GPA->HVA, which still seems to have nothing to do with the Hypervisor. How does physical memory actually get accessed? We have not yet seen any page table mapping being created.
Follow me, and let's set off with these questions in mind!

As the memory management articles mentioned, after a user-mode program is given a virtual address range (a vma), the actual mapping to physical memory is established during the page fault. By the same reasoning, is the HVA->HPA mapping also created during exception handling? The answer is clearly yes.

Let's revisit a figure from the previous article, 《Linux Virtualization KVM-Qemu Analysis (4): CPU Virtualization (2)》:

  • When user space invokes kvm_arch_vcpu_ioctl_run, the Guest OS runs on top of the Hypervisor. When an exception in the Guest OS causes an exit to the Host, handle_exit deals with the reason for the exit;

The exception handler table arm_exit_handlers is shown below. Which handler gets called is determined by the value of ESR_EL2, the Exception Syndrome Register (EL2).

static exit_handle_fn arm_exit_handlers[] = {
	[0 ... ESR_ELx_EC_MAX]	= kvm_handle_unknown_ec,
	[ESR_ELx_EC_WFx]	= kvm_handle_wfx,
	[ESR_ELx_EC_CP15_32]	= kvm_handle_cp15_32,
	[ESR_ELx_EC_CP15_64]	= kvm_handle_cp15_64,
	[ESR_ELx_EC_CP14_MR]	= kvm_handle_cp14_32,
	[ESR_ELx_EC_CP14_LS]	= kvm_handle_cp14_load_store,
	[ESR_ELx_EC_CP14_64]	= kvm_handle_cp14_64,
	[ESR_ELx_EC_HVC32]	= handle_hvc,
	[ESR_ELx_EC_SMC32]	= handle_smc,
	[ESR_ELx_EC_HVC64]	= handle_hvc,
	[ESR_ELx_EC_SMC64]	= handle_smc,
	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
	[ESR_ELx_EC_SVE]	= handle_sve,
	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
	[ESR_ELx_EC_WATCHPT_LOW]= kvm_handle_guest_debug,
	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
};

Scan the function table with your big, bright eyes and you will spot ESR_ELx_EC_IABT_LOW and ESR_ELx_EC_DABT_LOW. Aren't these exactly the instruction abort and the data abort? So we make a bold guess: the HVA->HPA mapping is established in kvm_handle_guest_abort.
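The table-driven dispatch can be modelled in a few lines of portable C. Everything prefixed with model_ or MODEL_ is invented for this sketch; only the EC position (ESR bits [31:26]) and the three EC values follow the architecture. Unlike the kernel table, which uses a GNU range initializer for the default entry, this model leaves unlisted entries NULL and falls back at lookup time.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_EC_SHIFT     26
#define MODEL_EC_MASK      0x3Fu
#define MODEL_EC_WFX       0x01u   /* trapped WFI/WFE */
#define MODEL_EC_IABT_LOW  0x20u   /* instruction abort from a lower EL */
#define MODEL_EC_DABT_LOW  0x24u   /* data abort from a lower EL */

typedef const char *(*model_exit_fn)(void);

/* Placeholder handlers that only report which path was taken. */
static const char *model_unknown_ec(void)  { return "unknown"; }
static const char *model_wfx(void)         { return "wfx"; }
static const char *model_guest_abort(void) { return "guest_abort"; }

/* Entries not listed stay NULL and fall back to model_unknown_ec below. */
static const model_exit_fn model_handlers[MODEL_EC_MASK + 1] = {
    [MODEL_EC_WFX]      = model_wfx,
    [MODEL_EC_IABT_LOW] = model_guest_abort,
    [MODEL_EC_DABT_LOW] = model_guest_abort,
};

/* Pick the handler from the exception class encoded in the syndrome. */
static model_exit_fn model_get_exit_handler(uint32_t esr)
{
    uint32_t ec = (esr >> MODEL_EC_SHIFT) & MODEL_EC_MASK;
    model_exit_fn fn = model_handlers[ec];

    return fn ? fn : model_unknown_ec;
}
```

Both abort classes resolve to the same handler, which is why a single function has to tell instruction aborts and data aborts apart later on.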

3.1 kvm_handle_guest_abort

Some background first, to make what follows easier to understand:

  1. When the Guest OS executes a sensitive instruction, an exception to EL2 is raised; the CPU switches mode and jumps to the EL2 exception entry el1_sync (arch/arm64/kvm/hyp/hyp-entry.S);
  2. The CPU's ESR_EL2 register records the cause of the exception;
  3. After the Guest exits to kvm, kvm handles the exception according to its cause.

A brief look at the ESR_EL2 register:

  • EC: Exception Class, which identifies the cause of the exception;
  • ISS: Instruction Specific Syndrome; the ISS field carries more detailed information about the exception;
  • kvm_handle_guest_abort inspects these fields in several places to classify and handle the exception;
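A small decoder shows how these fields are pulled out of a syndrome value. The struct and function names are hypothetical; the bit positions (EC in [31:26], ISS in [24:0], WnR in ISS bit 6 for data aborts) and the three fault type values are my reading of the Armv8-A register description, so treat the constants as assumptions to verify against the manual.

```c
#include <stdint.h>

#define MODEL_ESR_EC_SHIFT  26
#define MODEL_ESR_EC_MASK   0x3Fu        /* EC  = ESR[31:26] */
#define MODEL_ESR_ISS_MASK  0x01FFFFFFu  /* ISS = ESR[24:0]  */
#define MODEL_ESR_FSC_TYPE  0x3Cu        /* status code without the level bits */
#define MODEL_ESR_WNR       (1u << 6)    /* data aborts: write, not read */

#define MODEL_FSC_FAULT   0x04u  /* translation fault */
#define MODEL_FSC_ACCESS  0x08u  /* access flag fault */
#define MODEL_FSC_PERM    0x0Cu  /* permission fault  */

struct model_syndrome {
    uint32_t ec;          /* exception class */
    uint32_t iss;         /* instruction specific syndrome */
    uint32_t fault_type;  /* one of MODEL_FSC_* for the handled aborts */
    int is_write;         /* meaningful for data aborts only */
};

/* Split a raw ESR value into the fields the abort handler looks at. */
static struct model_syndrome model_decode_esr(uint32_t esr)
{
    struct model_syndrome s;

    s.ec = (esr >> MODEL_ESR_EC_SHIFT) & MODEL_ESR_EC_MASK;
    s.iss = esr & MODEL_ESR_ISS_MASK;
    s.fault_type = esr & MODEL_ESR_FSC_TYPE;
    s.is_write = (esr & MODEL_ESR_WNR) != 0;
    return s;
}
```

Masking with 0x3C drops the two low bits of the status code, which encode the page table level, so a translation fault at any level compares equal to MODEL_FSC_FAULT.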

kvm_handle_guest_abort handles address access exceptions, which fall into two categories:

  1. Ordinary memory access exceptions, including missing page table mappings, read/write permission faults, and so on;
  2. IO memory access exceptions; IO emulation is usually handled by Qemu;

First, the comment on kvm_handle_guest_abort:

/**
 * kvm_handle_guest_abort - handles all 2nd stage aborts
 *
 * Any abort that gets to the host is almost guaranteed to be caused by a
 * missing second stage translation table entry, which can mean that either the
 * guest simply needs more memory and we must allocate an appropriate page or it
 * can mean that the guest tried to access I/O memory, which is emulated by user
 * space. The distinction is based on the IPA causing the fault and whether this
 * memory region has been registered as standard RAM by user space.
 */
  • An abort that reaches the Host is almost always caused by a missing Stage 2 translation table entry. Either the Guest needs more memory and a page must be allocated for it, or the Guest tried to access IO space, which is emulated by user space. The two cases are told apart by whether the IPA that triggered the fault has been registered as standard RAM by user space;

Here is the call flow:

  • kvm_vcpu_trap_get_fault_type gets the fault status code of the data or instruction abort from ESR_EL2; it is part of the ISS field;
  • kvm_vcpu_get_fault_ipa gets the IPA that triggered the exception;
  • kvm_vcpu_trap_is_iabt reads the exception class, that is, the EC field of ESR_EL2, and checks whether it is ESR_ELx_EC_IABT_LOW, the instruction abort type;
  • kvm_vcpu_dabt_isextabt checks whether this is a synchronous external abort. In that case, if RAS is supported, the Host can handle the exception itself and nothing needs to be injected into the Guest;
  • If the fault is not one of FSC_FAULT, FSC_PERM, or FSC_ACCESS, an error is returned directly;
  • gfn_to_memslot and gfn_to_hva_memslot_prot use the IPA to obtain the corresponding memslot and HVA. This is where the address relationship established in Chapter 2 comes in: because of it, the HVA can be found from the IPA;
  • If the address was registered as RAM, a valid HVA is obtained; for an IO memory access, the HVA is set to KVM_HVA_ERR_BAD. When kvm_is_error_hva is true or (write_fault && !writable) holds, there are two cases: 1) an instruction abort, for which an instruction abort is injected into the Guest; 2) an IO access, which comes in two kinds: 2.1) a cache maintenance instruction, which is simply skipped; 2.2) a normal IO instruction, for which io_mem_abort is called to emulate the IO;
  • handle_access_fault handles access faults (FSC_ACCESS): the Stage 2 entry for the page is updated so that the page can be accessed again;
  • user_mem_abort is what "allocates more memory"; in practice it establishes the Stage 2 page table mapping from the faulting IPA to the physical page behind the already known HVA. The details are not shown here.
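The decision flow above can be condensed into a small model. It is a sketch under stated assumptions: the struct fields stand in for the results of the helper calls listed above, the enum names are invented, and the synchronous external abort path is left out.

```c
#include <stdbool.h>

enum model_fsc {
    MODEL_FSC_FAULT  = 0x04,
    MODEL_FSC_ACCESS = 0x08,
    MODEL_FSC_PERM   = 0x0C
};

enum model_action {
    MODEL_RETURN_ERROR,      /* fault type is none of the three handled ones */
    MODEL_INJECT_IABT,       /* instruction fetch from a non-RAM address */
    MODEL_SKIP_INSTRUCTION,  /* cache maintenance on a non-RAM address */
    MODEL_IO_MEM_ABORT,      /* hand the IO access over for emulation */
    MODEL_ACCESS_FAULT,      /* corresponds to handle_access_fault */
    MODEL_USER_MEM_ABORT     /* corresponds to user_mem_abort */
};

/* Inputs the real handler derives from ESR_EL2 and the memslot lookup. */
struct model_abort {
    unsigned fault_type;     /* one of enum model_fsc, or anything else */
    bool is_iabt;            /* instruction abort rather than data abort */
    bool is_cache_maint;     /* faulting instruction is cache maintenance */
    bool hva_valid;          /* IPA falls inside a registered RAM slot */
    bool write_fault;
    bool writable;           /* slot allows writes */
};

static enum model_action model_handle_guest_abort(const struct model_abort *a)
{
    if (a->fault_type != MODEL_FSC_FAULT &&
        a->fault_type != MODEL_FSC_PERM &&
        a->fault_type != MODEL_FSC_ACCESS)
        return MODEL_RETURN_ERROR;

    /* Not backed by RAM, or a write to a read-only slot. */
    if (!a->hva_valid || (a->write_fault && !a->writable)) {
        if (a->is_iabt)
            return MODEL_INJECT_IABT;
        if (a->is_cache_maint)
            return MODEL_SKIP_INSTRUCTION;
        return MODEL_IO_MEM_ABORT;
    }

    if (a->fault_type == MODEL_FSC_ACCESS)
        return MODEL_ACCESS_FAULT;

    return MODEL_USER_MEM_ABORT;
}
```

A translation fault on an address inside a registered RAM slot is the case that ends in the Stage 2 mapping being built; everything else is filtered out before that point.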

With the overall flow now clear, I'll wrap up in a hurry here. See you next time.

References

《Arm Architecture Registers Armv8, for Armv8-A architecture profile》

You are welcome to follow my WeChat official account, where I share technical articles from time to time.

This article was written by [King Luoyan]. Please include a link to the original when reposting. Thanks.
