[hellogcc] Re: [投稿] QEMU Internal - Block Chaining 1/3

From: Yao Qi <qiyaoltc@xxxxxxxxx>
To: hellogcc@xxxxxxxxxxxxx
Date: Sat, 1 Oct 2011 22:08:26 +0800
On Sat, Oct 01, 2011 at 07:00:29PM +0800, 陳韋任 wrote:
>            Copyright (c) 2011 陳韋任 (Chen Wei-Ren)
> 

我对qemu一窍不通，有些低级的问题。

> 2. Block Chaining
> 
>   先複習一下 QEMU 的流程，目前的情況如底下這樣:
> 
>     QEMU -> code cache -> QEMU -> code cache -> ...
> 
> 之前提到，QEMU 是以一個 translation block 為單位進行翻譯並執行。
> 這也就是說，每當在 code cache 執行完一個 translation block 之後
> ，控制權必須交還給 QEMU。這很沒有效率。理想情況下，大部分時間
> 應該花在 code cache 中執行，只要在必要情況下才需要回到 QEMU。
> 基本想法是把 code cache 中的 translation block 串接起來。只要
> translation block 執行完後，它的跳躍目標確定且該跳躍目標也已經
> 在 code cache 裡，那我們就把這兩個 translation block 串接起來。
> 這個就叫做 block chaining/linking。
> 

这个在qemu是能控制的吗？ block chaining/linking 可以打开或者关闭吗？

>   在 QEMU 中，達成 block chaining 有兩種做法: 第一，採用 direct
> jump。此法直接修改 code cache 中分支指令的跳躍目標，因此依據
> host 有不同的 patch 方式。第二，則是透過修改 TranslationBlock
> 的 tb_next 欄位達成 block chaining。exec-all.h 中定義那些 host
> 支援使用 direct jump。
> 
>   這裡只介紹 direct jump。我們先回到 cpu_exec (cpu-exec.c) 的內
> 層迴圈。
> 
>   tb = tb_find_fast(env);
>  
>   if (tb_invalidated_flag) {
>       next_tb = 0; // 注意! next_tb 也被用來控制是否要做 block chaining。
>       tb_invalidated_flag = 0;
>   }
>  
>   // 注意!! next_tb 的名字會讓人誤解。block chaining 的方向為: next_tb -> tb。
>   // next_tb 不為 NULL 且 tb (guest binary) 不跨 guest page 的話，做 block
>   // chaining。原因之後再講。
>   if (next_tb != 0 && tb->page_addr[1] == -1) {
>       // 這邊利用 TranlationBlock 指針的最低有效位後兩位指引 block chaining 的方向。
>       tb_add_jump((TranslationBlock *)(next_tb & ~3), next_tb & 3, tb);
>   }
>  
>   // 執行 TB，也就是 tc_ptr 所指到的位址。注意，產生 TCG IR 的過程中，在 block 的最後會是
>   // exit_tb addr，此 addr 是正在執行的 TranslationBlock 指針，同時也是 tcg_qemu_tb_exec 的
>   // 回傳值。該位址後兩位會被填入 0、1 或 2 以指示 block chaining 的方向。
>   next_tb = tcg_qemu_tb_exec(tc_ptr);
> 
> 是不是有點暈頭轉向? 我們來仔細檢驗 guest binary -> TCG IR -> host binary 到底怎麼做的。
> 在一個 translation block 的結尾，TCG 都會產生 TCG IR "exit_tb"。我們來看看。:-)
> 
>   - tcg_gen_exit_tb (tcg/tcg-op.h) 呼叫 tcg_gen_op1i 生成 TCG IR，其 opcode 為
> INDEX_op_exit_tb (還記得 tcg.i 裡的 TCGOpcode 嗎?)，operand 為 val。
> 
> // 一般是這樣呼叫: tcg_gen_exit_tb((tcg_target_long)tb + tb_num);
> // 注意! 請留意其參數: (tcg_target_long)tb + tb_num。
> static inline void tcg_gen_exit_tb(tcg_target_long val)
> {
>     // 將 INDEX_op_exit_tb 寫入 gen_opc_buf; val 寫入 gen_opparam_buf。
>     tcg_gen_op1i(INDEX_op_exit_tb, val);
> }
> 

不管是否支持block chaining/linking，这里应该都是这样吧。

>   - tcg/ARCH/tcg-target.c 根據 TCG IR 產生對應 host binary。以 i386 為例:
>     (關於每一行的作用，我是憑經驗用猜的。如有錯請指正。)
> 
> static inline void tcg_out_op(TCGContext *s, int opc,
>                               const TCGArg *args, const int *const_args)
> {
>     // 總和效果: 返回 QEMU。
>     //           next_tb = tcg_qemu_tb_exec(tc_ptr);
>     //
>     // 圖示:
>     // struct TranslationBlock*         code cache
>     //        next_tb->tc_ptr    ->        tb
>     //
>     // next_tb 的末兩位編碼 next_tb 條件分支的方向。
>     // 等待下一次迴圈取得 tb = tb_find_fast()，
>     // 試圖做 block chaining: next_tb -> tb 
>     //
>     case INDEX_op_exit_tb:
>         // QEMU 把跳至 code cache 執行當作是函式呼叫，EAX 存放返回值。
>         // 將 val 寫進 EAX，val 是 (tcg_target_long)tb + tb_num。 
>         tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, args[0]);
> 

这里相当于生成了这样一条指令 mov eax, #(tcg_target_long)tb + tb_num ?

>         // e9 是 jmp 指令，後面的 operand 為相對偏移量，將會加上 eip。
>         // 底下兩條的總和效果是跳回 code_gen_prologue 中 prologue 以後的位置。
>         tcg_out8(s, 0xe9); /* jmp tb_ret_addr */

这里可以多介绍一下jmp指令，可能跟好理解。jmp指令一共五个字节，第一个字节是opcode，
应该就是 e9，这里生成一个字节的opcode

>         // tb_ret_addr 在 tcg_target_qemu_prologue 初始成指向 code_gen_prologue
>         // 中 prologue 以後的位置。
>         // 生成 host binary 的同時，s->code_ptr 會移向下一個 code buffer 的位址。
>         // 所以要減去 4。
>         tcg_out32(s, tb_ret_addr - s->code_ptr - 4);

然后生成跳转地址偏移 tb_ret_addr - s->code_ptr - 4，所以就是这样一条指令

  jmp （tb_ret_addr - s->code_ptr - 4）

这个 - 4 我不是和明白，是因为上边那条 mov eax, #((tcg_target_long)tb +
tb_num)，所以 - 4 吗？


这里想到于生成了这样的两个指令


   mov eax, #(tcg_target_long)tb + tb_num
   jmp （tb_ret_addr - s->code_ptr - 4）

jmp过去以后，eax的值有什么用吗？ 

>         break;
> }
> 
>   o tcg_out_movi 將 arg 移至 ret 代表的暫存器。
>

这个有什么用吗？

 
> static inline void tcg_out_movi(TCGContext *s, TCGType type,
>                                 int ret, int32_t arg)
> {
>     if (arg == 0) {
>         /* xor r0,r0 */
>         tcg_out_modrm(s, 0x01 | (ARITH_XOR << 3), ret, ret);
>     } else {
>         // move arg 至 ret
>         tcg_out8(s, 0xb8 + ret); // 0xb8 為 move，ret 代表目地暫存器。0xb8 + ret 合成一個 
> opcode。

这里是不是  0xb8 | ret 更好理解一些？

>         tcg_out32(s, arg);
>     }
> }
> 

-- 
Yao Qi <qiyaoltc AT gmail DOT com>

If two people love each other, there can be no happy end to it.
                -- Ernest Hemingway
Follow-Ups:
- [hellogcc] Re: [投稿] QEMU Internal - Block Chaining 1/3
  - From: 陳韋任
References:
- [hellogcc] [投稿] QEMU Internal - Block Chaining 1/3
  - From: 陳韋任
[hellogcc] Re: [投稿] QEMU Internal - Block Chaining 1/3

Other related posts: