A few things about QEMU (6): QEMU source code analysis, the vCPU


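As I mentioned in an earlier post, I created a new directory under the root of the QEMU source tree specifically for analyzing and debugging the source.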

OK, let's open that directory: qemu/bin/debug/native

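If you have read the earlier post, "A few things about QEMU (4): downloading and building the QEMU source, and fdt", you know that I built the QEMU source in this directory. The folder used to be empty; now it is stuffed with a pile of things: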

[root@localhost qemu]# ls bin/debug/native/
  chardev                  hmp.d           pc-bios              qemu-bridge-helper    qmp-commands.h    tests
accel.d            config-all-devices.mak   hmp.o           po                   qemu-bridge-helper.d  qmp.d             tpm.d
accel.o            config-all-disas.mak     hw              qapi                 qemu-bridge-helper.o  qmp-introspect.c  tpm.o
audio              config-host.h            io              qapi-event.c         qemu-ga               qmp-introspect.d  trace
backends           config-host.h-timestamp  iothread.d      qapi-event.d         qemu-img              qmp-introspect.h  trace-events-all
block              config-host.mak          iothread.o      qapi-event.h         qemu-img-cmds.h       qmp-introspect.o  trace-root.c
block.d            config.log               ivshmem-client  qapi-event.o         qemu-img.d            qmp-marshal.c     trace-root.c-timestamp
blockdev.d         config.status            ivshmem-server  qapi-generated       qemu-img.o            qmp-marshal.d     trace-root.d
blockdev-nbd.d     contrib                  libqemustub.a   qapi-types.c         qemu-io               qmp-marshal.o     trace-root.h
blockdev-nbd.o     cpus-common.d            libqemuutil.a   qapi-types.d         qemu-io-cmds.d        qmp.o             trace-root.h-timestamp
blockdev.o         cpus-common.o            linux-headers   qapi-types.h         qemu-io-cmds.o        qobject           trace-root.o
blockjob.d         crypto                   linux-user      qapi-types.o         qemu-io.d             qom               ui
blockjob.o         device-hotplug.d         Makefile        qapi-visit.c         qemu-io.o             replay            util
block.o            device-hotplug.o         migration       qapi-visit.d         qemu-nbd              replication.d     vl.d
     disas                    module_block.h  qapi-visit.h         qemu-nbd.d            replication.o     vl.o
bt-host.d          dma-helpers.d            nbd             qapi-visit.o         qemu-nbd.o            roms              x86_64-softmmu
bt-host.o          dma-helpers.o            net             qdev-monitor.d       qemu-options.def      slirp             x86_64-softmmu-config-devices.mak.d
bt-vhci.d          docs                     os-posix.d      qdev-monitor.o       qemu-version.h        stubs
bt-vhci.o          fsdev                    os-posix.o      qdict-test-data.txt  qga                   target
[root@localhost qemu]#

The qemu-system-x86_64 binary under x86_64-softmmu in this directory is the compiled executable. Right, time to fire up gdb!

The first thing to look at is the main function in vl.c, which starts at line 2971:

2968     return 0;
2969 }
2970
2971 int main(int argc, char **argv, char **envp)
2972 {
2973     int i;
2974     int snapshot, linux_boot;
2975     const char *initrd_filename;
2976     const char *kernel_filename, *kernel_cmdline;
2977     const char *boot_order = NULL;
2978     const char *boot_once = NULL;
2979     DisplayState *ds;
2980     int cyls, heads, secs, translation;
2981     QemuOpts *opts, *machine_opts;
2982     QemuOpts *hda_opts = NULL, *icount_opts = NULL, *accel_opts = NULL;
2983     QemuOptsList *olist;
... ...
 
What is the first thing to do when debugging with gdb?

Set a breakpoint, of course!

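Where to put a breakpoint is an art in itself. Well-placed breakpoints make debugging faster and more efficient. Enough chit-chat: where should our first breakpoint go?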

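Skimming through main in vl.c, a large part of the beginning is initialization of variables, arrays and structs, registration of some functions, and argument parsing. Only at line 4082 is the first pass of argument parsing finished. It is only a first pass because options that carry sub-options have not been fully parsed yet, or rather have not been processed any further. A simple example is the parsing and handling of the smp option later on, at line 4201: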

4201     smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
4202
4203     machine_class->max_cpus = machine_class->max_cpus ?: 1; /* Default to UP */
4204     if (max_cpus > machine_class->max_cpus) {
4205         error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
4206                      "supported by machine '%s' (%d)", max_cpus,
4207                      machine_class->name, machine_class->max_cpus);
4208         exit(1);

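Let's leave that aside for now, go back to line 4082 and read on. What follows is initialization of VM parameters and registration and checking of devices and files; nothing that looks like vCPU creation at all. No rush, take it slowly, and keep reading past the smp check at line 4201. Nothing very promising there either: all sorts of checks and settings for display, daemonize, serial ports, cdrom and so on. None of that is our focus; our focus is still the vCPU.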

Reading on, at around line 4400 we see some code related to booting the guest OS:

4399     machine_opts = qemu_get_machine_opts();
4400     kernel_filename = qemu_opt_get(machine_opts, "kernel");
4401     initrd_filename = qemu_opt_get(machine_opts, "initrd");
4402     kernel_cmdline = qemu_opt_get(machine_opts, "append");
4403     bios_name = qemu_opt_get(machine_opts, "firmware");
4404
4405     opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
4406     if (opts) {
4407         boot_order = qemu_opt_get(opts, "order");
4408         if (boot_order) {
4409             validate_bootdevices(boot_order, &error_fatal);
4410         }
... .... 
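That means we are not far from what we are looking for, so keep going. Around line 4587 the initialization for the guest is basically done and creation is about to begin; the next few lines initialize some hardware devices: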

4587     current_machine->ram_size = ram_size;
4588     current_machine->maxram_size = maxram_size;
4589     current_machine->ram_slots = ram_slots;
4590     current_machine->boot_order = boot_order;
4591     current_machine->cpu_model = cpu_model;
4592
4593     machine_run_board_init(current_machine);
4594
4595     realtime_init();
4596
... ...
 
And then at line 4701:

4700
4701     qdev_machine_creation_done();
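So we can be fairly sure that vCPU creation and initialization happen somewhere inside the call at line 4593. Set a breakpoint there. OK, time to start gdb for real.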

Now run the simplest possible command inside gdb:

r  --enable-kvm -smp 2 -m 2048M -cpu host -hda /root/test/rhel7.qcow -monitor stdio
Then just continue (c) until the breakpoint is hit:

4593     machine_run_board_init(current_machine);
Stepping in, we see this:

736 void machine_run_board_init(MachineState *machine)
737 {
738     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
739
740     if (nb_numa_nodes) {
741         machine_numa_validate(machine);
742     }
743     machine_class->init(machine);
744 }

Next we step into
machine_class->init(machine);
and see:

(gdb) s
pc_init_v2_10 (machine=0x555556720000) at /root/qemu-2017-0531/qemu/hw/i386/pc_piix.c:449
449     DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
(gdb) l
444         m->alias = "pc";
445         m->is_default = 1;
446         m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
447     }
448
449     DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
450                           pc_i440fx_2_10_machine_options);
451
Let's look at what the macro on line 449 actually is:

 419
 420 #define DEFINE_I440FX_MACHINE(suffix, name, compatfn, optionfn) \
 421     static void pc_init_##suffix(MachineState *machine) \
 422     { \
 423         void (*compat)(MachineState *m) = (compatfn); \
 424         if (compat) { \
 425             compat(machine); \
 426         } \
 427         pc_init1(machine, TYPE_I440FX_PCI_HOST_BRIDGE, \
 428                  TYPE_I440FX_PCI_DEVICE); \
 429     } \
 430     DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
 431

Clearly what ultimately runs is pc_init1. Stepping into it, at line 150 there is a call to pc_cpus_init. Stepping in again, here is pc_cpus_init:

1137
1138 void pc_cpus_init(PCMachineState *pcms)
1139 {
1140     int i;
1141     CPUClass *cc;
1142     ObjectClass *oc;
1143     const char *typename;
1144     gchar **model_pieces;
1145     const CPUArchIdList *possible_cpus;
1146     MachineState *machine = MACHINE(pcms);
1147     MachineClass *mc = MACHINE_GET_CLASS(pcms);
1148
1149     /* init CPUs */
1150     if (machine->cpu_model == NULL) {
1151 #ifdef TARGET_X86_64
1152         machine->cpu_model = "qemu64";
... ... 
...
...
1182     possible_cpus = mc->possible_cpu_arch_ids(machine);
1183     for (i = 0; i < smp_cpus; i++) {
1184         pc_new_cpu(typename, possible_cpus->cpus[i].arch_id, &error_fatal);
1185     }


What we are looking for is inside pc_new_cpu. Everything before it is just vCPU parameter and type setup. Stepping into line 1184 with gdb, we see:

pc_new_cpu (typename=0x555556699690 "qemu64-x86_64-cpu", apic_id=0, errp=0x555556683790 ) at /root/qemu-2017-0531/qemu/hw/i386/pc.c:1097

Expanded, pc_new_cpu looks like this:

1096    static void pc_new_cpu(const char *typename, int64_t apic_id, Error **errp)
1097    {
1098        Object *cpu = NULL;
1099        Error *local_err = NULL;
1100
1101        cpu = object_new(typename);
(gdb)
1102
1103        object_property_set_int(cpu, apic_id, "apic-id", &local_err);
1104        object_property_set_bool(cpu, true, "realized", &local_err);
1105
1106        object_unref(cpu);
1107        error_propagate(errp, local_err);
1108    }
1109
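Single-stepping further in gdb, nothing visible changes after line 1103, but after line 1104 executes a new thread appears. QEMU itself is just a userspace program, and it talks to KVM through the kvm_ioctl interface, that is, by issuing ioctl calls on /dev/kvm. So a VM started by QEMU is really just a process, and its vCPUs are threads of that process.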

So it is reasonable to conclude that vCPU creation and initialization are done on line 1104. Keep stepping in with gdb:

object_property_set_bool (obj=0x555556779580, value=true, name=0x555555c453d0 "realized", errp=0x7fffffffdd68) at /root/qemu-2017-0531/qemu/qom/object.c:1162
1162        QBool *qbool = qbool_from_bool(value);
1158
1159    void object_property_set_bool(Object *obj, bool value,
1160                                  const char *name, Error **errp)
1161    {
1162        QBool *qbool = qbool_from_bool(value);
1163        object_property_set_qobject(obj, QOBJECT(qbool), name, errp);
1164
1165        QDECREF(qbool);
1166    }
Step into line 1163:

object_property_set_qobject (obj=0x555556779580, value=0x555556794a10, name=0x555555c453d0 "realized", errp=0x7fffffffdd68)
    at /root/qemu-2017-0531/qemu/qom/qom-qobject.c:26
26          v = qobject_input_visitor_new(value);
(gdb) l
21      void object_property_set_qobject(Object *obj, QObject *value,
22                                       const char *name, Error **errp)
23      {
24          Visitor *v;
25
26          v = qobject_input_visitor_new(value);
27          object_property_set(obj, v, name, errp);
28          visit_free(v);
29      }
On to line 27:

object_property_set (obj=0x555556779580, v=0x555556796150, name=0x555555c453d0 "realized", errp=0x7fffffffdd68) at /root/qemu-2017-0531/qemu/qom/object.c:1086
1086        ObjectProperty *prop = object_property_find(obj, name, errp);
1083    void object_property_set(Object *obj, Visitor *v, const char *name,
1084                             Error **errp)
1085    {
1086        ObjectProperty *prop = object_property_find(obj, name, errp);
1087        if (prop == NULL) {
1088            return;
1089        }
1090
(gdb)
1091        if (!prop->set) {
1092            error_setg(errp, QERR_PERMISSION_DENIED);
1093        } else {
1094            prop->set(obj, v, name, prop->opaque, errp);
1095        }
1096    }
Then on to line 1094:

property_set_bool (obj=0x555556779580, v=0x555556796150, name=0x555555c453d0 "realized", opaque=0x55555673e240, errp=0x7fffffffdd68)
    at /root/qemu-2017-0531/qemu/qom/object.c:1849
1849    {
(gdb) l
1844        visit_type_bool(v, name, &value, errp);
1845    }
1846
1847    static void property_set_bool(Object *obj, Visitor *v, const char *name,
1848                                  void *opaque, Error **errp)
1849    {
1850        BoolProperty *prop = opaque;
1851        bool value;
1852        Error *local_err = NULL;
1853
(gdb)
1854        visit_type_bool(v, name, &value, &local_err);
1855        if (local_err) {
1856            error_propagate(errp, local_err);
1857            return;
1858        }
1859
1860        prop->set(obj, value, errp);
1861    }
On to line 1860:

device_set_realized (obj=0x555556779580, value=true, errp=0x7fffffffdd68) at /root/qemu-2017-0531/qemu/hw/core/qdev.c:879
879     {
(gdb) l
874
875         return true;
876     }
877
878     static void device_set_realized(Object *obj, bool value, Error **errp)
879     {
880         DeviceState *dev = DEVICE(obj);
881         DeviceClass *dc = DEVICE_GET_CLASS(dev);
882         HotplugHandler *hotplug_ctrl;
883         BusState *bus;
(gdb)
Keep going down to line 917:

915
916             if (dc->realize) {
917                 dc->realize(dev, &local_err);
918             }
919
920             if (local_err != NULL) {
Step into line 917:

x86_cpu_realizefn (dev=0x555556779580, errp=0x7fffffffdbb0) at /root/qemu-2017-0531/qemu/target/i386/cpu.c:3487
3487    {
(gdb) l
3482                               (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
3483    #define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
3484                             (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
3485                             (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
3486    static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
3487    {
3488        CPUState *cs = CPU(dev);
3489        X86CPU *cpu = X86_CPU(dev);
3490        X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
3491        CPUX86State *env = &cpu->env;
(gdb)
Now we see the function x86_cpu_realizefn. Inside it, at line 3648, how QEMU creates a vCPU finally shows its true face:

3648        qemu_init_vcpu(cs);


 (gdb) s
qemu_init_vcpu (cpu=0x555556779580) at /root/qemu-2017-0531/qemu/cpus.c:1750
1750        cpu->nr_cores = smp_cores;
1748    void qemu_init_vcpu(CPUState *cpu)
1749    {
1750        cpu->nr_cores = smp_cores;
1751        cpu->nr_threads = smp_threads;
1752        cpu->stopped = true;
1753
1754        if (!cpu->as) {
(gdb)
1755            /* If the target cpu hasn't set up any address spaces itself,
1756             * give it the default one.
1757             */
1758            AddressSpace *as = address_space_init_shareable(cpu->memory,
1759                                                            "cpu-memory");
1760            cpu->num_ases = 1;
1761            cpu_address_space_init(cpu, as, 0);
1762        }
1763
1764        if (kvm_enabled()) {
(gdb)
1765            qemu_kvm_start_vcpu(cpu);
1766        } else if (hax_enabled()) {
1767            qemu_hax_start_vcpu(cpu);
1768        } else if (tcg_enabled()) {
1769            qemu_tcg_init_vcpu(cpu);
1770        } else {
1771            qemu_dummy_start_vcpu(cpu);
1772        }
1773    } 
From line 1764 onward is the vCPU creation proper. With KVM enabled, qemu_kvm_start_vcpu on line 1765 is called, so let's look at that function:

qemu_kvm_start_vcpu (cpu=0x555556779580) at /root/qemu-2017-0531/qemu/cpus.c:1717

1715
1716    static void qemu_kvm_start_vcpu(CPUState *cpu)
1717    {
1718        char thread_name[VCPU_THREAD_NAME_SIZE];
1719
1720        cpu->thread = g_malloc0(sizeof(QemuThread));
1721        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
(gdb)
1722        qemu_cond_init(cpu->halt_cond);
1723        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
1724                 cpu->cpu_index);
1725        qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
1726                           cpu, QEMU_THREAD_JOINABLE);
1727        while (!cpu->created) {
1728            qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
1729        }
1730    }
There, now it is clear: a vCPU is just a thread. Let's step into qemu_thread_create on line 1725 as well:

qemu_thread_create (thread=0x5555567a2210, name=0x7fffffffdaa0 "CPU 0/KVM", start_routine=0x555555791756 , arg=0x555556779580, mode=0)
    at /root/qemu-2017-0531/qemu/util/qemu-thread-posix.c:468
465     void qemu_thread_create(QemuThread *thread, const char *name,
466                            void *(*start_routine)(void*),
467                            void *arg, int mode)
468     {
469         sigset_t set, oldset;
470         int err;
471         pthread_attr_t attr;
472
473         err = pthread_attr_init(&attr);
474         if (err) {
475             error_exit(err, __func__);
476         }
477
478         /* Leave signal handling to the iothread.  */
479         sigfillset(&set);
480         pthread_sigmask(SIG_SETMASK, &set, &oldset);
481         err = pthread_create(&thread->thread, &attr, start_routine, arg);
482         if (err)
(gdb)
483             error_exit(err, __func__);
484
485         if (name_threads) {
486             qemu_thread_set_name(thread, name);
487         }
488
489         if (mode == QEMU_THREAD_DETACHED) {
490             err = pthread_detach(thread->thread);
491             if (err) {
492                 error_exit(err, __func__);
(gdb)
493             }
494         }
495         pthread_sigmask(SIG_SETMASK, &oldset, NULL);
496
497         pthread_attr_destroy(&attr);
498     }
Next let's look at qemu_kvm_cpu_thread_fn. The kvm_init_vcpu call inside it is the part that, with KVM enabled, is ultimately carried out by KVM:

1092 static void *qemu_kvm_cpu_thread_fn(void *arg)
1093 {
1094     CPUState *cpu = arg;
1095     int r;
1096
1097     rcu_register_thread();
1098
1099     qemu_mutex_lock_iothread();
1100     qemu_thread_get_self(cpu->thread);
1101     cpu->thread_id = qemu_get_thread_id();
1102     cpu->can_do_io = 1;
1103     current_cpu = cpu;
1104
1105     r = kvm_init_vcpu(cpu);
1106     if (r < 0) {
1107         fprintf(stderr, "kvm_init_vcpu failed: %s\n", strerror(-r));
1108         exit(1);
1109     }
1110
1111     kvm_init_cpu_signals(cpu);
I won't expand kvm_init_vcpu any further here.

Then we let the program run on, and see:

[New Thread 0x7fffeffff700 (LWP 16755)]

Continuing.
[New Thread 0x7fffeffff700 (LWP 16756)]
[New Thread 0x7fffee1ff700 (LWP 16758)]
[New Thread 0x7fffed9fe700 (LWP 16759)]
VNC server running on ::1:5900
(qemu) info cpus
* CPU #0: pc=0x00000000000082ea thread_id=16555 
CPU #1: pc=0x00000000000fd406 (halted) thread_id=16756
(qemu) [Thread 0x7fffee1ff700 (LWP 16758) exited]

This shows that the thread IDs of the two vCPUs we created are 16555 and 16756. Let's check with pstree:

[root@localhost ~]# ps -ef | grep qemu
root     13695 13557  0 Jun07 pts/1    00:00:10 gdb x86_64-softmmu/qemu-system-x86_64
root     15616 13695  0 00:31 pts/1    00:00:16 /root/qemu/bin/debug/native/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 2 -m 2048M -hda /root/test/rhel7_cpu2006.qcow -monitor stdio
root     16779 14422  0 01:25 pts/5    00:00:00 grep --color=auto qemu
[root@localhost ~]# pstree -p 15616
qemu-system-x86(15616)─┬─{qemu-system-x86}(15617)  
                        ├─{qemu-system-x86}(16555)
                        ├─{qemu-system-x86}(16756)
                        └─{qemu-system-x86}(16759)
 
  
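That pretty much shows what a vCPU is: from the host's point of view it is a thread, a thread, and nothing but a thread. A userspace thread at that.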

In summary, when QEMU starts a VM, the vCPU creation flow is:
main(...) ==> machine_run_board_init(current_machine) ==> pc_init_v2_10(...) ==> pc_init1(...) ==> pc_cpus_init(...) ==> pc_new_cpu(...)
==> object_property_set_bool(...) ==> object_property_set_qobject(...) ==> object_property_set(...) ==> property_set_bool ==> device_set_realized
==> x86_cpu_realizefn ==> qemu_init_vcpu ==> qemu_kvm_start_vcpu ==> qemu_thread_create ==> qemu_kvm_cpu_thread_fn ==> kvm_init_vcpu

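In fact, the way the second line above ends up in x86_cpu_realizefn is actually implemented in the code like this, similar to a C++ constructor: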
==>type_init(x86_cpu_register_types)
==>x86_cpu_register_types(void)
==>  type_register_static(&x86_cpu_type_info);
==>  static const TypeInfo x86_cpu_type_info = {}
==>   .class_init = x86_cpu_common_class_init,
==>   x86_cpu_common_class_init(ObjectClass *oc, void *data)
==>  dc->realize = x86_cpu_realizefn;
==>  x86_cpu_realizefn(DeviceState *dev, Error **errp)


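If you read the source carefully, you will find that the QEMU folks have stubbornly implemented a whole bunch of classes in plain C, together with their constructors and a pile of templates and whatnot. What I want to say is: wouldn't it have been fine to just use C++?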

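It takes a huge, laborious detour to get there, and the code is really awkward to read. If I have time later, I'll write about how KVM implements the vCPU.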


