[安全] ADCTF 2024 rkvm 题解
rkvm 题解
说明
下文大概率使用已经恢复命名的符号,是为了方便讲解前后一致
已经重命名的变量或者函数也会在分析后给出命名的理由
因为引用代码已经重命名,推荐对照未重命名的代码进行食用
有时候实际上并不需要把函数或者程序整个逆向,需要逆向的可能只是那一小个加密函数
这样,其它的都可以抄
VM 函数与结构分析
仔细遍历一下 main 调用的非标准库的函数,发现大概遵循以下结构
fn ixoy(args) -> T {
void reg;
rkvm_regset(number, ®, args);
// ...
run(address, len, ®);
return rkvm_regval(number, ®);
}
有两种数据的返回形式,一种是通过返回值返回,另一种是通过指针修改数值
在后期的分析中抽象出来的伪代码函数均使用 返回值返回 的形式
既然是 rkvm_regset 和 rkvm_regval,那很明显和 reg 有关,于是将以指针传入的数据命名为 reg
进入到 rkvm_regset 和 rkvm_regval,发现很明显的相似结构
fn rkvm_regouo(BinRegInfo: int, reg: pointer, ...) {
// 指定寄存器和位数
let regptr: pointer = rkvm_regptr(BinRegInfo, reg);
let regsiz: int = rkil_regsize(BinRegInfo);
// 根据传入的 setval 设置指定寄存器的值
// 或者返回寄存器中存储的值
}
深入到两个使用了 BinRegInfo 的函数里面查看,发现都是根据该变量的值来确定使用哪一个寄存器以及使用该寄存器的低多少位的
根据逻辑可以把这两个函数由 switch 形式转化得更加方便(其实不转变也行,毕竟我们并不需要逆向这个函数,可以抄)
def rkil_regsize(BinRegInfo: int) -> int:
return (1 << (3 - BinRegInfo % 4))
def rkvm_regptr(BinRegInfo: int) -> int:
return (BinRegInfo // 4)
BinRegInfo 是 二进制寄存器信息 的意思,就是说把关于寄存器的信息都压缩到一个二进制数之类的意思
回到 rkvm_regset 和 rkvm_regval
观察其中的结构可以发现它们的作用如下
regptr: 指定使用哪个寄存器regsiz: 指定使用寄存器以及输入值的低 regsiz 个位regset: 将setval的低regsiz个位赋值给*regptrregval: 返回*regptr的低regsiz个位形成的值
因此可以复现这两个函数
def rkvm_regset(BinRegInfo: int, setval: int | str) -> None:
regid = rkvm_regptr(BinRegInfo)
regsz = rkil_regsize(BinRegInfo)
print(f"reg[{regid}] as u{regsz * 8:<2} := ({setval}) as u{regsz * 8:<2}")
def rkvm_regval(BinRegInfo: int) -> str:
regid = rkvm_regptr(BinRegInfo)
regsz = rkil_regsize(BinRegInfo)
return f"reg[{regid}] as u{regsz * 8:<2}"
这里使用了 print 进行输出以及用 str 表示返回值
因为实际上做 vm 题一般都要对操作进行 log,因此对 reg 修改的 log 集成在修改函数里
我认为 log 应该紧跟行为之后,这样语义性好很多
至于返回值使用 str 则涉及到 vm 翻译器运行中值的表示的考量
由于部分值我们通常能从字节码中得到(立即数),另外一部分则是不运行就很难得出或者与输入有关的
不是立即数的部分,我希望它能用符号来表示,自由的符号表示就是 str 了,人能看得懂,也好打印
因此 vm 翻译器中的值类型我设置成了 int | str,表示已知的二进制数值或者符号
接下来回到函数 ixoy 的解析当中
看了一下 run 函数,明显和 vm 运行有关了,vm 的运行都是统一的,main 函数调用的函数大多数都是对 vm 运行的一个包装
即包含了
- 寄存器设置
- 运行 vm (固定位置的 vm 代码)
- 从寄存器中取返回值
三步的包装函数,它们应该有一个统一的命名
我以 ixoy 的格式进行命名,表示 input x args && output y retuan_values 的意思
且由于它们运行固定的 vm 代码,我猜功能应该不会在程序的不同位置改变,因此每一个函数分别分析的方向是合理的
接下来正式看 run
int64_t run(void* bincode_address, uint64_t length, void* reg)
{
FILE* fp = fmemopen(bincode_address, length, "r");
struct VMSPACE vmspace;
rkvme_load(fp, &vmspace);
rkvm_interpret(&vmspace, reg);
rkvme_free(&vmspace);
return fclose(fp);
}
bincode_address 和 length 的命名很明显了,这里一定存放着虚拟机相关的信息
fp 是一个文件指针,指向加载的虚拟机信息,实际上是一块 u8 内存
那么 rkvme_load 自然是将 fp 中的信息解析到 vmspace 中,因此用 vmspace 表示 vm 相关的空间
rkvm_interpret 自然就是解释 vmspace 中可能包含的字节码,这个函数里面通常有字节码的解释,通常是一个 while 配上一个大 switch
rkvme_free 应该是清除 vmspace 创建过程中申请的内存
最后关闭文件指针
接下来看 rkvme_load
uint64_t rkvme_load(FILE* fp, struct VMSPACE* vmspace)
// fread 是 C 标准库函数,声明在 <stdio.h> 中。
// 签名:
// size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
// 功能: 从文件流 stream 中读取 nmemb
// 个元素(每个元素大小为 size 字节),写入到 ptr
// 指向的缓冲区。返回实际成功读取的元素个数。文件位置指示器会随后移已读取的字节数。
{
fread(&vmspace->element_counts, 8, 1, fp);
// 16 * vmspace[1] bytes
// or maybe vmspace[1] u16 s
vmspace->base_address = calloc(vmspace->element_counts, 0x10);
uint64_t n;
for (int64_t i = 0; i < vmspace->element_counts; i += 1)
{
struct Node* lineptr = &vmspace->base_address[i];
lineptr->c_string = 0;
// getdelim(&lineptr->c_string, &n, 0, fp);
getdelim(lineptr, &n, 0, fp);
fread(&lineptr->add_data, 8, 1, fp);
}
uint64_t count = 0x400;
int64_t var_20 = 0x400;
vmspace->vmcode = malloc(0x400);
vmspace->code_size = 0;
uint64_t result; // read once only
do
{
if (var_20 - vmspace->code_size < count)
// ?扩容
vmspace->vmcode = realloc(vmspace->vmcode, (var_20 >> 1) * 3);
n = fread(vmspace->vmcode + vmspace->code_size, 1, count, fp);
vmspace->code_size += n;
result = n;
} while (result >= count);
return result;
}
当没有合适的结构体命名的使用应该是类似 vmspace[0] 这样的指针下标使用
不过通读函数之后大概可以发现它在干嘛
fp中读取一个u64,这个u64应该是元素数量,命名为element_counts,然后下面calloc开了element_counts个大小为 16 bytes 的元素的空间- 然后是从
fp中读取element_counts个元素,每个元素第一位是一个指向字符串的指针,先读一个以0结尾的 c 式字符串,然后在读一个u64,这两个值作为一个 16 bytes 的元素
至此我们可以知道 vmspace 前两个的结构
struct VMSPACE {
base_address : pointer_sz64;
element_counts: u64;
};
同时也知道 base_address 实际上是一个 Node 数组
struct Node {
c_string: pointer_char_sz64;
offset : u64;
};
let base_address: [Node; element_counts] = {};
命名稍后解释
继续分析,下面的这个循环大概是一个带扩容的数据读入,是一直读到 fp 尾的,我猜这就是字节码了,因为前面那个明显不是
可惜这个扩容似乎写错了,没有对 var_20 进行更新,而且我猜读 0x400 的 3/2 并不能容纳下两个 0x400 ?
按下不表,我们仍然可以知道 VMSPACE 的后两个位置,是字节码和字节码长度
struct VMSPACE {
base_address : pointer_sz64;
element_counts: u64;
vmcode : pointer;
code_size : u64;
};
let vmspace: VMSPACE = {};
既然 vmspace 的结构已经解明,那么接下来看 rkvm_interpret 应该会轻松不少
接下来大概会把 rkvm_interpret 拆成几块来讲
uint8_t* rkvm_interpret(struct VMSPACE* vmspace, void* reg)
{
/*
获取字节码的头指针和尾指针,然后开始大循环
*/
uint8_t* vmcode_ptr = vmspace->vmcode;
uint8_t* end_vmcode = vmspace->code_size + vmspace->vmcode;
while (vmcode_ptr < end_vmcode)
{
char var_11_1 = 0;
/*
这样的结构大概就是取一个 u8 的值然后 ++ptr
可以理解为
opcode = *vmcode_ptr;
vmcode_ptr++;
*/
uint8_t* vmcode_2 = vmcode_ptr;
vmcode_ptr = &vmcode_2[1]; // next
uint32_t opcode = (uint32_t)*(uint8_t*)vmcode_2;
uint32_t BinRegInfo;
int64_t set_val;
char* symbol_str_ptr;
/*
一些关于字节码的判定,反正这部分也是抄,不太用管,大概是做了一些分类讨论
*/
if (opcode <= 0x44)
{
if (opcode >= 8)
{
int64_t rdx_2 = 1 << ((uint8_t)opcode - 8);
int64_t rdx_3;
(uint8_t)rdx_3 = 0x220242002222 & rdx_2;
if ((uint8_t)rdx_3)
{
/*
取 BinRegInfo, 得到要操作的 reg 的 size
根据 regsize 在字节码中读取对应长度的立即数
读取过程也是抄就好了
setval 从字节码来
*/
label_40176e:
uint8_t* vmcode_ptr_1 = vmcode_ptr;
vmcode_ptr = &vmcode_ptr_1[1]; // next
BinRegInfo = (uint32_t)*(uint8_t*)vmcode_ptr_1;
char bytes_len = rkil_regsize(BinRegInfo);
// 按照 bytes_len 压缩一个 setval
set_val = 0;
for (char i = 0; i < bytes_len; i += 1)
{
uint8_t* vmcode_ptr_2 = vmcode_ptr;
vmcode_ptr = &vmcode_ptr_2[1];
set_val +=
(int64_t)((uint32_t)*(uint8_t*)vmcode_ptr_2 << i << 3);
}
}
else
{
int64_t rdx_4;
(uint8_t)rdx_4 = 0x110121001111 & rdx_2;
if ((uint8_t)rdx_4)
{
/*
读取了 BinRegInfo 和 setval_BinRegInfo
要操作的目标寄存器由 BinRegInfo 指定
setval 由 setval_BinRegInfo 指定,从对应寄存器中读取
setval 从寄存器中来
*/
label_401734: // next
char* setval_BinRegInfo = &vmcode_ptr[1];
BinRegInfo = (uint32_t)*(uint8_t*)vmcode_ptr;
vmcode_ptr = &setval_BinRegInfo[1]; // next
set_val =
rkvm_regval((uint32_t)*(uint8_t*)setval_BinRegInfo, reg);
}
else
{
int64_t rax_9;
(uint8_t)rax_9 = rdx_2 & 0x1f00000000000000;
if ((uint8_t)rax_9)
{
/*
没有读取 setval
从字节码中读了一个字符串,0结尾
*/
symbol_str_ptr = vmcode_ptr;
vmcode_ptr = &vmcode_ptr[strlen(symbol_str_ptr) + 1];
}
}
}
}
else
{
if (opcode == 5)
goto label_40176e;
if (opcode == 4)
goto label_401734;
if (!opcode)
goto label_401734;
if (opcode == 1)
goto label_40176e;
}
}
/*
上面大概是根据 opcode 类型操作 setval
下面就是实际操作了
一些基础运算,都很简单,看两眼就懂
*/
if (opcode <= 0x44)
switch (jump_table_403220[(uint64_t)opcode])
{
case 0x40180c: // set
{
rkvm_regset(BinRegInfo, reg, set_val);
break;
}
case 0x401826: // add
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) + set_val);
break;
}
case 0x40185c: // minus
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) - set_val);
break;
}
case 0x40188f: // product
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) * set_val);
break;
}
case 0x4018c6: // divide
{
rkvm_regset(BinRegInfo, reg,
COMBINE(0, rkvm_regval(BinRegInfo, reg)) / set_val);
break;
}
case 0x4018fe: // test
{
int64_t rax_52 = rkvm_regval(BinRegInfo, reg);
int64_t var_50_6 = rax_52 - set_val;
*(uint8_t*)((char*)reg + 0x40) = !var_50_6;
*(uint8_t*)((char*)reg + 0x41) = rax_52 < var_50_6;
break;
}
case 0x401949: // bitwise_or
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) | set_val);
break;
}
case 0x40197c: // bitwise_and
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) & set_val);
break;
}
case 0x4019af: // bitwise_xor
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) ^ set_val);
break;
}
case 0x4019e2: // rshift
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) >> (uint8_t)set_val);
break;
}
case 0x401a1a: // lshift
{
rkvm_regset(BinRegInfo, reg,
rkvm_regval(BinRegInfo, reg) << (uint8_t)set_val);
break;
}
case 0x401a4f: // true find_symbol?
{
var_11_1 = 1;
break;
}
case 0x401a55: // 0x40
{
var_11_1 = *(uint8_t*)((char*)reg + 0x40);
break;
}
case 0x401a62: // not 0x40
{
uint32_t rax_82;
(uint8_t)rax_82 = (uint32_t)*(uint8_t*)((char*)reg + 0x40);
var_11_1 = ((uint8_t)rax_82 ^ 1) & 1; // not
break;
}
case 0x401a81: // 0x41
{
var_11_1 = *(uint8_t*)((char*)reg + 0x41);
break;
}
case 0x401a8e: // not 0x41
{
uint32_t rax_89;
(uint8_t)rax_89 = (uint32_t)*(uint8_t*)((char*)reg + 0x41);
var_11_1 = ((uint8_t)rax_89 ^ 1) & 1; // not
break;
}
}
/*
var_11_1 由某些 opcode 指定,由两个标志位确定,标志位由 test opcode 确定
如果 var_11_1 为真,那么查表,确认跳转偏移量,对 vmcode_ptr 进行跳转
*/
if (var_11_1) // jmp symbol->address
{
struct Node* node;
rkvm_find_symbol(vmspace->base_address, vmspace->element_counts,
symbol_str_ptr, &node);
vmcode_ptr = node->offset + vmspace->vmcode;
}
}
return vmcode_ptr;
}
我们看看 rkvm_find_symbol 函数
void** rkvm_find_symbol(void* vmspace_base_address, int64_t vmspace_element_counts, char* symbol_str_ptr, void** node_c_string)
{
*(uint64_t*)node_c_string = vmspace_base_address;
int64_t i = 0;
void** result;
while (true)
{
if (i >= vmspace_element_counts)
{
result = node_c_string;
*(uint64_t*)result = nullptr;
break;
}
result = strcmp(**(uint64_t**)node_c_string, symbol_str_ptr);
if (!(uint32_t)result)
break;
i += 1;
*(uint64_t*)node_c_string += 0x10;
}
return result;
}
遍历 Node 数组,找到字符串比对完全相同的,返回这个 Node 对象
然后跳转过程使用了 Node 的第二个数据作为跳转目标,因此第二个数据命名为 offset
到这里 vm 函数和 vm 结构就差不多分析完了,接下来需要写反编译器,将每一块 vm 字节码的意义解明,同时对应到每一个 ixoy 包装函数中
vm 字节码翻译器
基本上分为这几个部分
- 基础库和工具
- VM 结构编写
- VM 函数的模拟(对就是抄)
- 包装函数的组装
基本上手撕 vm 也是这几步,把函数和结构抄一遍之后在你觉得需要输出的地方打 log,一般是解释字节码的时候
每一条语句(每一个循环)都打印一次,基本上就能知道这段字节码在干嘛
我个人感觉代码写得比较明白了,对着反编译看也很清晰,也有明确分类,不做过多解释
对了 BytesIO 库可以在需要模拟 fp 操作的时候使用
import binaryninja
from io import BytesIO
from dataclasses import dataclass
import struct
from collections import defaultdict
from typing import DefaultDict
#########################################################
# Utils
#########################################################
def u8(x: int) -> int:
return x & 0xFF
def u16(x: int) -> int:
return x & 0xFFFF
def u32(x: int) -> int:
return x & 0xFFFFFFFF
def u64(x: int) -> int:
return x & 0xFFFFFFFFFFFFFFFF
#########################################################
# VM structure
#########################################################
@dataclass
class Node:
c_string: bytes
offset : int
@dataclass
class VMSPACE:
base_address : list[Node]
elements_counts : int
vmcode : bytes
code_size : int
rkvm_interpret_base_address = 0x00401631
jump_table = {
0x00 : rkvm_interpret_base_address+0x1db,
0x01 : rkvm_interpret_base_address+0x1db,
0x02 : rkvm_interpret_base_address+0x47b,
0x03 : rkvm_interpret_base_address+0x47b,
0x04 : rkvm_interpret_base_address+0x1f5,
0x05 : rkvm_interpret_base_address+0x1f5,
0x06 : rkvm_interpret_base_address+0x47b,
0x07 : rkvm_interpret_base_address+0x47b,
0x08 : rkvm_interpret_base_address+0x22b,
0x09 : rkvm_interpret_base_address+0x22b,
0x0a : rkvm_interpret_base_address+0x47b,
0x0b : rkvm_interpret_base_address+0x47b,
0x0c : rkvm_interpret_base_address+0x25e,
0x0d : rkvm_interpret_base_address+0x25e,
0x0e : rkvm_interpret_base_address+0x47b,
0x0f : rkvm_interpret_base_address+0x47b,
0x10 : rkvm_interpret_base_address+0x295,
0x11 : rkvm_interpret_base_address+0x295,
0x12 : rkvm_interpret_base_address+0x47b,
0x13 : rkvm_interpret_base_address+0x47b,
0x14 : rkvm_interpret_base_address+0x2cd,
0x15 : rkvm_interpret_base_address+0x2cd,
0x16 : rkvm_interpret_base_address+0x47b,
0x17 : rkvm_interpret_base_address+0x47b,
0x18 : rkvm_interpret_base_address+0x47b,
0x19 : rkvm_interpret_base_address+0x47b,
0x1a : rkvm_interpret_base_address+0x47b,
0x1b : rkvm_interpret_base_address+0x47b,
0x1c : rkvm_interpret_base_address+0x47b,
0x1d : rkvm_interpret_base_address+0x47b,
0x1e : rkvm_interpret_base_address+0x47b,
0x1f : rkvm_interpret_base_address+0x47b,
0x20 : rkvm_interpret_base_address+0x318,
0x21 : rkvm_interpret_base_address+0x318,
0x22 : rkvm_interpret_base_address+0x47b,
0x23 : rkvm_interpret_base_address+0x47b,
0x24 : rkvm_interpret_base_address+0x47b,
0x25 : rkvm_interpret_base_address+0x34b,
0x26 : rkvm_interpret_base_address+0x34b,
0x27 : rkvm_interpret_base_address+0x47b,
0x28 : rkvm_interpret_base_address+0x37e,
0x29 : rkvm_interpret_base_address+0x37e,
0x2a : rkvm_interpret_base_address+0x47b,
0x2b : rkvm_interpret_base_address+0x47b,
0x2c : rkvm_interpret_base_address+0x47b,
0x2d : rkvm_interpret_base_address+0x47b,
0x2e : rkvm_interpret_base_address+0x47b,
0x2f : rkvm_interpret_base_address+0x47b,
0x30 : rkvm_interpret_base_address+0x3b1,
0x31 : rkvm_interpret_base_address+0x3b1,
0x32 : rkvm_interpret_base_address+0x47b,
0x33 : rkvm_interpret_base_address+0x47b,
0x34 : rkvm_interpret_base_address+0x3e9,
0x35 : rkvm_interpret_base_address+0x3e9,
0x36 : rkvm_interpret_base_address+0x47b,
0x37 : rkvm_interpret_base_address+0x47b,
0x38 : rkvm_interpret_base_address+0x47b,
0x39 : rkvm_interpret_base_address+0x47b,
0x3a : rkvm_interpret_base_address+0x47b,
0x3b : rkvm_interpret_base_address+0x47b,
0x3c : rkvm_interpret_base_address+0x47b,
0x3d : rkvm_interpret_base_address+0x47b,
0x3e : rkvm_interpret_base_address+0x47b,
0x3f : rkvm_interpret_base_address+0x47b,
0x40 : rkvm_interpret_base_address+0x41e,
0x41 : rkvm_interpret_base_address+0x424,
0x42 : rkvm_interpret_base_address+0x431,
0x43 : rkvm_interpret_base_address+0x450,
0x44 : rkvm_interpret_base_address+0x45d,
}
reversed_jump_table: DefaultDict[int, list[int]] = defaultdict(list)
for k, v in jump_table.items():
reversed_jump_table[v].append(k)
#########################################################
# VM functions
#########################################################
def rkil_regsize(BinRegInfo: int) -> int:
return (1 << (3 - BinRegInfo % 4))
def rkvm_regptr(BinRegInfo: int) -> int:
return (BinRegInfo // 4)
def rkvm_regset(BinRegInfo: int, setval: int | str) -> None:
regid = rkvm_regptr(BinRegInfo)
regsz = rkil_regsize(BinRegInfo)
print(f"reg[{regid}] as u{regsz * 8:<2} := ({setval}) as u{regsz * 8:<2}")
def rkvm_regval(BinRegInfo: int) -> str:
regid = rkvm_regptr(BinRegInfo)
regsz = rkil_regsize(BinRegInfo)
return f"reg[{regid}] as u{regsz * 8:<2}"
def rkvme_load(bincode: bytes) -> VMSPACE:
buf = BytesIO(bincode)
element_counts = struct.unpack("<Q", buf.read(8))[0]
base_address = []
for _ in range(element_counts):
s = b""
c = buf.read(1)
while c != b'\0':
s += c
c = buf.read(1)
offset = struct.unpack("<Q", buf.read(8))[0]
base_address.append(Node(s, offset))
vmcode = buf.read()
code_size = len(vmcode)
return VMSPACE(base_address, element_counts, vmcode, code_size)
def rkvm_find_symbol(base_address: list[Node], element_counts: int, symbol_str: bytes) -> int:
for i in range(element_counts):
if base_address[i].c_string == symbol_str:
return base_address[i].offset
return 0
def rkvm_interpret(vmspace: VMSPACE) -> None:
buf = BytesIO(vmspace.vmcode)
while buf.tell() < len(vmspace.vmcode):
IP = buf.tell()
opcode = buf.read(1)[0]
BinRegInfo: int
set_val: int | str
symbol_str: bytes
jmp_flag: str | None = None
if opcode <= 0x44:
if opcode >= 8:
rdx_2 = 1 << (u8(opcode) - 8)
rdx_3 = 0x220242002222 & rdx_2
if rdx_3:
BinRegInfo = buf.read(1)[0]
bytes_len = rkil_regsize(BinRegInfo)
set_val = 0
for i in range(bytes_len):
tmpv = buf.read(1)[0]
set_val += (tmpv << (i << 3))
else:
rdx_4 = 0x110121001111 & rdx_2
if rdx_4:
BinRegInfo = buf.read(1)[0]
sv_BinRegInfo = buf.read(1)[0]
set_val = rkvm_regval(sv_BinRegInfo)
else:
rax_9 = rdx_2 & 0x1f00000000000000
if rax_9:
symbol_str = b""
c = buf.read(1)
while c != b'\0':
symbol_str += c
c = buf.read(1)
else:
if opcode == 5:
BinRegInfo = buf.read(1)[0]
bytes_len = rkil_regsize(BinRegInfo)
set_val = 0
for i in range(bytes_len):
tmpv = u8(buf.read(1)[0])
set_val += (tmpv << (i << 3))
if opcode == 4:
BinRegInfo = buf.read(1)[0]
sv_BinRegInfo = buf.read(1)[0]
set_val = rkvm_regval(sv_BinRegInfo)
if opcode == 0:
BinRegInfo = buf.read(1)[0]
sv_BinRegInfo = buf.read(1)[0]
set_val = rkvm_regval(sv_BinRegInfo)
if opcode == 1:
BinRegInfo = buf.read(1)[0]
bytes_len = rkil_regsize(BinRegInfo)
set_val = 0
for i in range(bytes_len):
tmpv = u8(buf.read(1)[0])
set_val += (tmpv << (i << 3))
if opcode <= 0x44:
ch = jump_table[opcode]
if ch == 0x40180c:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, set_val)
elif ch == 0x401826:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) + ({set_val})")
elif ch == 0x40185c:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) - ({set_val})")
elif ch == 0x40188f:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) * ({set_val})")
elif ch == 0x4018c6:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) / ({set_val})")
elif ch == 0x4018fe:
reg_val = rkvm_regval(BinRegInfo)
print(f"[{IP:03d}]\t", end="")
print(f"reg[{0x40}] := (({set_val}) == ({reg_val})) as u{8}")
print(f"[{IP:03d}]\t", end="")
print(f"reg[{0x41}] := (({reg_val}) < ({set_val})) as u{8}")
elif ch == 0x401949:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) | ({set_val})")
elif ch == 0x40197c:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) & ({set_val})")
elif ch == 0x4019af:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) ^ ({set_val})")
elif ch == 0x4019e2:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) >> ({set_val})")
elif ch == 0x401a1a:
print(f"[{IP:03d}]\t", end="")
rkvm_regset(BinRegInfo, f"({rkvm_regval(BinRegInfo)}) << ({set_val})")
elif ch == 0x401a4f:
jmp_flag = "True"
print(f"[{IP:03d}]\t", end="")
print(f"jmp_flag = {jmp_flag}")
elif ch == 0x401a55:
jmp_flag = f"reg[{0x40}]"
print(f"[{IP:03d}]\t", end="")
print(f"jmp_flag = {jmp_flag}")
elif ch == 0x401a62:
jmp_flag = f"not reg[{0x40}]"
print(f"[{IP:03d}]\t", end="")
print(f"jmp_flag = {jmp_flag}")
elif ch == 0x401a81:
jmp_flag = f"reg[{0x41}]"
print(f"[{IP:03d}]\t", end="")
print(f"jmp_flag = {jmp_flag}")
elif ch == 0x401a8e:
jmp_flag = f"not reg[{0x41}]"
print(f"[{IP:03d}]\t", end="")
print(f"jmp_flag = {jmp_flag}")
if jmp_flag is not None:
print(f"[{IP:03d}]\tif {jmp_flag}:", end="")
offset = rkvm_find_symbol(vmspace.base_address, vmspace.elements_counts, symbol_str)
print(f"\tjmp to [{offset}]")
def run(bv: binaryninja.BinaryView, bincode_address: int, length: int) -> None:
# assume that there is reg
bincode = bv.read(bincode_address, length)
vmspace = rkvme_load(bincode)
# beautiful print for base_address
print("#######################")
print("Symbol-Offset Table: ")
for node in vmspace.base_address:
print(f"{node.c_string}\t:\t{node.offset}")
print("#######################")
rkvm_interpret(vmspace)
# free and free ovo
#########################################################
# Wrapped function
#########################################################
def i4o1(bv: binaryninja.BinaryView) -> None:
print(" d[0] d[1] d[2] d[3] | low bit ")
print("to compress 4 u8 data like above ^")
print()
print("I4O1(data: [u8; 4]) -> u64")
rkvm_regset(3, "data[0]")
rkvm_regset(7, "data[1]")
rkvm_regset(0xb, "data[2]")
rkvm_regset(0xf, "data[3]")
run(bv, 0x4035b0, 0x63)
print(f"return ({rkvm_regval(0)} >> {0x20})")
def i2o1(bv: binaryninja.BinaryView) -> None:
print(" arg1 arg2 | low bit ")
print("to compress 2 u32 into a u64 like above ^")
print()
print("I2O1(arg1: u32, arg2: u32) -> u64")
rkvm_regset(1, "arg1")
rkvm_regset(5, "arg2")
run(bv, 0x403630, 0x15)
print(f"return ({rkvm_regval(0)})")
def i1o2(bv: binaryninja.BinaryView) -> None:
print("[ 0: high 32 bits, 1: low 32 bits]")
print("unpack a u32 list with 2 elements from a u64 like above")
print()
print("I1O2(arg1: u64) -> [u32; 2]")
rkvm_regset(0, "arg1")
run(bv, 0x403660, 0x15)
print(f"return [({rkvm_regval(1)}), ({rkvm_regval(5)})]")
"""
r0 = arg1
r1 = arg2
r2 = arg3
i = 0
r4 = 0
r5 = 7
while i < 8:
r5 = (r5 + 33) & 0xFF
r4 <<= 8
r0.8 = (r0.8 * r1.8) & 0xFF
r0.8 ^= r1.8
r2.8 ^= r0.8
r4.8 |= r2.8
r0 >>= 8
r1 >>= 8
r2 >>= 8
i += 1
return r4
"""
def i3o1(bv: binaryninja.BinaryView) -> None:
print("I3O1(arg1: u64, arg2: u64, arg3: u64) -> u64")
rkvm_regset(0, "arg1")
rkvm_regset(4, "arg2")
rkvm_regset(8, "arg3")
run(bv, 0x403530, 0x68)
print(f"return ({rkvm_regval(0x10)})")
def i6o2(bv: binaryninja.BinaryView) -> None:
print("Standard TEA")
print()
print("I6O2(list1: [u32; 2], list2: [u32; 4]) -> [u32; 2]")
rkvm_regset(0, "i2o1(list2[0], list2[1])")
rkvm_regset(4, "i2o1(list2[2], list2[3])")
rkvm_regset(8, "i2o1(list1[0], list1[1])")
run(bv, 0x403450, 0xcb)
print(f"return i1o2({rkvm_regval(8)})")
#########################################################
# RUN in Binary Ninja to read the translations
#########################################################
翻译代码分析
I4O1
I4O1(data: [u8; 4]) -> u64
reg[0] as u8 := (data[0]) as u8
reg[1] as u8 := (data[1]) as u8
reg[2] as u8 := (data[2]) as u8
reg[3] as u8 := (data[3]) as u8
#######################
Symbol-Offset Table:
#######################
[000] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[010] reg[0] as u8 := (reg[1] as u8 ) as u8
[013] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[023] reg[0] as u8 := (reg[2] as u8 ) as u8
[026] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[036] reg[0] as u8 := (reg[3] as u8 ) as u8
[039] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[049] reg[0] as u8 := (reg[4] as u8 ) as u8
[052] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[062] reg[0] as u8 := (reg[5] as u8 ) as u8
[065] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[075] reg[0] as u8 := (reg[6] as u8 ) as u8
[078] reg[0] as u64 := ((reg[0] as u64) << (8)) as u64
[088] reg[0] as u8 := (reg[7] as u8 ) as u8
return (reg[0] as u64 >> 32)
压缩八个 u8 到一个 u64 上,然后返回高 32 位,本质上是压缩四个 u8 到一个 u32 上
所以注释如下
d[0] d[1] d[2] d[3] | low bit
to compress 4 u8 data like above ^
I2O1
I2O1(arg1: u32, arg2: u32) -> u64
reg[0] as u32 := (arg1) as u32
reg[1] as u32 := (arg2) as u32
#######################
Symbol-Offset Table:
#######################
[000] reg[0] as u64 := ((reg[0] as u64) << (32)) as u64
[010] reg[0] as u32 := (reg[1] as u32) as u32
return (reg[0] as u64)
压缩两个 u32 到一个 u64 上,注释如下
arg1 arg2 | low bit
to compress 2 u32 into a u64 like above ^
I1O2
I1O2(arg1: u64) -> [u32; 2]
reg[0] as u64 := (arg1) as u64
#######################
Symbol-Offset Table:
#######################
[000] reg[1] as u32 := (reg[0] as u32) as u32
[003] reg[0] as u64 := ((reg[0] as u64) >> (32)) as u64
return [(reg[0] as u32), (reg[1] as u32)]
将一个 u64 解包成两个 u32, 注释如下
[ 0: high 32 bits, 1: low 32 bits]
unpack a u32 list with 2 elements from a u64 like above
I3O1
I3O1(arg1: u64, arg2: u64, arg3: u64) -> u64
reg[0] as u64 := (arg1) as u64
reg[1] as u64 := (arg2) as u64
reg[2] as u64 := (arg3) as u64
#######################
Symbol-Offset Table:
b'loop' : 16
#######################
[000] reg[3] as u8 := (0) as u8
[003] reg[4] as u64 := (0) as u64
[013] reg[5] as u8 := (7) as u8
[016] reg[5] as u8 := ((reg[5] as u8 ) + (33)) as u8
[019] reg[4] as u64 := ((reg[4] as u64) << (8)) as u64
[029] reg[0] as u8 := ((reg[0] as u8 ) * (reg[1] as u8 )) as u8
[032] reg[0] as u8 := ((reg[0] as u8 ) ^ (reg[1] as u8 )) as u8
[035] reg[2] as u8 := ((reg[2] as u8 ) ^ (reg[0] as u8 )) as u8
[038] reg[4] as u8 := (reg[2] as u8 ) as u8
[041] reg[0] as u64 := ((reg[0] as u64) >> (8)) as u64
[051] reg[1] as u64 := ((reg[1] as u64) >> (8)) as u64
[061] reg[2] as u64 := ((reg[2] as u64) >> (8)) as u64
[071] reg[3] as u8 := ((reg[3] as u8 ) + (1)) as u8
[074] reg[64] := ((8) == (reg[3] as u8 )) as u8
[074] reg[65] := ((reg[3] as u8 ) < (8)) as u8
[077] jmp_flag = reg[65]
[077] if reg[65]: jmp to [16]
return (reg[4] as u64)
对三个数字的每一个字节进行乘法和异或的扰乱,然后反着存到返回值里面
查看了这个函数的引用,发现与输入无关,因此使用 python 复现一遍就可以了
def i3o1(arg1: int, arg2: int, arg3: int) -> int:
r0, r1, r2 = arg1, arg2, arg3
r4 = 0
for _ in range(8):
r4 <<= 8
low0, low1, low2 = u8(r0), u8(r1), u8(r2)
low0 = u8(low0 * low1) ^ low1
r4 |= low2 ^ low0
r0 >>= 8
r1 >>= 8
r2 >>= 8
return r4
I6O2
I6O2(list1: [u32; 2], list2: [u32; 4]) -> [u32; 2]
reg[0] as u64 := (i2o1(list2[0], list2[1])) as u64
reg[1] as u64 := (i2o1(list2[2], list2[3])) as u64
reg[2] as u64 := (i2o1(list1[0], list1[1])) as u64
#######################
Symbol-Offset Table:
b'loop' : 9
#######################
[000] reg[3] as u32 := (421101824) as u32
[006] reg[4] as u8 := (0) as u8
[009] reg[3] as u32 := ((reg[3] as u32) + (2233333945)) as u32
[015] reg[7] as u64 := (reg[2] as u64) as u64
[018] reg[6] as u64 := (reg[0] as u64) as u64
[021] reg[6] as u64 := ((reg[6] as u64) >> (32)) as u64
[031] reg[5] as u32 := (reg[7] as u32) as u32
[034] reg[5] as u32 := ((reg[5] as u32) << (4)) as u32
[040] reg[5] as u32 := ((reg[5] as u32) + (reg[6] as u32)) as u32
[043] reg[6] as u32 := (reg[7] as u32) as u32
[046] reg[6] as u32 := ((reg[6] as u32) + (reg[3] as u32)) as u32
[049] reg[5] as u32 := ((reg[5] as u32) ^ (reg[6] as u32)) as u32
[052] reg[6] as u64 := (reg[0] as u64) as u64
[055] reg[7] as u32 := ((reg[7] as u32) >> (5)) as u32
[061] reg[7] as u32 := ((reg[7] as u32) + (reg[6] as u32)) as u32
[064] reg[5] as u32 := ((reg[5] as u32) ^ (reg[7] as u32)) as u32
[067] reg[6] as u64 := (reg[2] as u64) as u64
[070] reg[6] as u64 := ((reg[6] as u64) >> (32)) as u64
[080] reg[6] as u32 := ((reg[6] as u32) + (reg[5] as u32)) as u32
[083] reg[6] as u64 := ((reg[6] as u64) << (32)) as u64
[093] reg[6] as u32 := (reg[2] as u32) as u32
[096] reg[2] as u64 := (reg[6] as u64) as u64
[099] reg[7] as u64 := (reg[2] as u64) as u64
[102] reg[7] as u64 := ((reg[7] as u64) >> (32)) as u64
[112] reg[6] as u64 := (reg[1] as u64) as u64
[115] reg[6] as u64 := ((reg[6] as u64) >> (32)) as u64
[125] reg[5] as u32 := (reg[7] as u32) as u32
[128] reg[5] as u32 := ((reg[5] as u32) << (4)) as u32
[134] reg[5] as u32 := ((reg[5] as u32) + (reg[6] as u32)) as u32
[137] reg[6] as u32 := (reg[7] as u32) as u32
[140] reg[6] as u32 := ((reg[6] as u32) + (reg[3] as u32)) as u32
[143] reg[5] as u32 := ((reg[5] as u32) ^ (reg[6] as u32)) as u32
[146] reg[6] as u64 := (reg[1] as u64) as u64
[149] reg[7] as u32 := ((reg[7] as u32) >> (5)) as u32
[155] reg[7] as u32 := ((reg[7] as u32) + (reg[6] as u32)) as u32
[158] reg[5] as u32 := ((reg[5] as u32) ^ (reg[7] as u32)) as u32
[161] reg[2] as u32 := ((reg[2] as u32) + (reg[5] as u32)) as u32
[164] reg[3] as u32 := ((reg[3] as u32) + (421101824)) as u32
[170] reg[4] as u8 := ((reg[4] as u8 ) + (1)) as u8
[173] reg[64] := ((32) == (reg[4] as u8 )) as u8
[173] reg[65] := ((reg[4] as u8 ) < (32)) as u8
[176] jmp_flag = reg[65]
[176] if reg[65]: jmp to [9]
return i1o2(reg[2] as u64)
最复杂的一个函数
但是呢有很多很好玩的特征,根据猜测和简单的还原可以发现其实是标准的 TEA
reg3 就是 sum 了,一开始先设置为 421101824,然后每轮循环头加 2233333945,每轮循环末加 421101824
循环之后没有用,相当于每轮循环头加 421101824 + 2233333945 = 2654435769
这正是标准 TEA 的 delta 常数
据此所有包装函数分析完毕,可以分析主函数的调用了
主函数分析
int32_t main(int32_t argc, char** argv, char** envp)
{
puts(&encrypted[0x12]);
char input[0x50];
memset(&input, 0, 0x48);
fgets(&input, 0x48, __TMC_END__);
input[strcspn(&input, "\r\n")] = 0;
int64_t block_cnt = (((strlen(&input) + 3) >> 2) + 1) & 0xfffffffffffffffe;
if (block_cnt == 0x10) // 57 58 59 60 61 62 63 64
{
int32_t var_b8;
__builtin_strncpy(&var_b8, "oR0yuki5g3n0ww1@", 0x10);
int64_t key0 = var_b8;
int32_t var_b0;
int64_t key1 = var_b0;
int64_t key2 = ((uint64_t)i4o1(&input) << 0x20) + 0x77333163;
int64_t key3;
__builtin_strncpy(&key3, "!DA0t3m0", 8);
int64_t i = 0;
while (true)
{
if (i >= block_cnt)
{
puts("congratulations.");
break;
}
int64_t var_a0 = i2o1((uint32_t)key0, *(uint32_t*)((char*)&key0 + 4));
i3o1(key2, key3, &var_a0);
i1o2(var_a0, &key0);
var_a0 = i2o1((uint32_t)key1, *(uint32_t*)((char*)&key1 + 4));
i3o1(key3, key2, &var_a0);
i1o2(var_a0, &key1);
int32_t cp_input[0x2];
cp_input[0] = i4o1(&input[i << 2])[0];
cp_input[1] = i4o1(&input[(i + 1) << 2]);
int32_t enc_input[0x2] = cp_input;
i6o2(&enc_input, &key0);
if (enc_input != *(uint64_t*)((i << 2) + &encrypted))
{
puts("nope.");
break;
}
i += 2;
}
}
else
puts("nope.");
return 0;
}
首先一开始那几个常量猜测是 key 相关的量,然后进行了输入,并使用输入的前 4 个字节作为了 key 的一部分
这里猜测此四字节为 flag
实际上我们发现整个循环和输入有关的只有这一部分
int32_t cp_input[0x2];
cp_input[0] = i4o1(&input[i << 2])[0];
cp_input[1] = i4o1(&input[(i + 1) << 2]);
int32_t enc_input[0x2] = cp_input;
i6o2(&enc_input, &key0);
要逆向的算法只有标准的 TEA 加密,其它都可以抄
solve.py
对主函数的逻辑以及各包装函数的逻辑进行了模拟,对关键加密函数 I6O2(标准 TEA 加密)进行了逆向
#########################################################
# Utils
#########################################################
def u8(x: int) -> int:
return x & 0xFF
def u16(x: int) -> int:
return x & 0xFFFF
def u32(x: int) -> int:
return x & 0xFFFFFFFF
def u64(x: int) -> int:
return x & 0xFFFFFFFFFFFFFFFF
#########################################################
# VM wrapped functions
#########################################################
"""
compress 2 u32 as a u64
'arg1 arg2 | low bit' <
"""
def i2o1(arg1: int, arg2: int) -> int:
return u32(arg2) | (u32(arg1) << 32)
"""
unpack a u32 list with 2 elements from a u64 like below
[ 0: high 32 bits, 1: low 32 bits]
"""
def i1o2(arg1: int) -> tuple[int, int]:
return (u32(arg1 >> 32), u32(arg1))
"""
to compress 4 u8 data like below
'd[0] d[1] d[2] d[3] | low bit'<
"""
def i4o1(data: list[int]) -> int:
return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]
"""
complex uwu
"""
def i3o1(arg1: int, arg2: int, arg3: int) -> int:
r0, r1, r2 = arg1, arg2, arg3
r4 = 0
for _ in range(8):
r4 <<= 8
low0, low1, low2 = u8(r0), u8(r1), u8(r2)
low0 = u8(low0 * low1) ^ low1
r4 |= low2 ^ low0
r0 >>= 8
r1 >>= 8
r2 >>= 8
return r4
def revTEA(encrypted: list[int], key: list[int]) -> list[int]:
x, y = encrypted
delta = 2654435769
sumxd = u32(delta << 5)
for _ in range(32):
y = u32(y - (((x << 4) + key[2]) ^ (x + sumxd) ^ ((x >> 5) + key[3])))
x = u32(x - (((y << 4) + key[0]) ^ (y + sumxd) ^ ((y >> 5) + key[1])))
sumxd = u32(sumxd - delta)
return [x, y]
#########################################################
# Main logic and Decrypt
#########################################################
encrypted_b64 = "9ZIAZkgASBs0Lo1UlwlJv8+zy+LOvztsjOfzWB2SccV+dqzjFP/baFQtqr+lVn3Xejh/oCzbpPIrgAPFheqCpHI="
import base64, struct
encrypted_byt = base64.b64decode(encrypted_b64)[:64]
encrypted = [t[0] for t in struct.iter_unpack("<I", encrypted_byt)]
key0 = [t[0] for t in struct.iter_unpack("<I", b"oR0yuki5g3n0ww1@")]
key2 = [i4o1(list(b"flag")), 0x77333163]
key3 = [t[0] for t in struct.iter_unpack("<I", b"!DA0t3m0")][::-1]
flag = b""
for i in range(0, 16, 2):
var_a0 = i2o1(key0[0], key0[1])
var_a0 = i3o1(i2o1(*key2), i2o1(*key3), var_a0)
key0[0], key0[1] = i1o2(var_a0)
var_a0 = i2o1(key0[2], key0[3])
var_a0 = i3o1(i2o1(*key3), i2o1(*key2), var_a0)
key0[2], key0[3] = i1o2(var_a0)
decrypted = revTEA(encrypted[i:i+2], key0)
flag += b"".join(struct.pack(">I", v) for v in decrypted)
print(flag)
得到 flag 为
b'flag{d0_U_iMp13meN7_A_d1s@s3mb13r_4_th3_i7t3rmEd1@t3_1@ngUag3?}\x00'
心得和总结
我能从这里学到什么?
值的类型
值的类型可以设置为 int | str,方便立即数和符号表示
其实感觉也可以直接设置成 str 了…
需要进行运算的使用使用 f-string 来拼接
字节操作
可以使用 BytesIO 来模拟 fread 的效果,from io import BytesIO
对于每次读 x 个字节并移动指针这个操作来说很便捷
配合 struct.unpack() 可以做到基本无痛的字节操作和类型转换
感觉这里应该记录一下常用的 struct 库和 BytesIO 库的 api ?
动静分析
某些量的计算是静态的,也就是说是程序预处理的,与输入无关,这样的量我们可以模拟或者借用程序的代码来实现,不用逆向
直接抄逻辑或者打 hook 复用代码都是可以的
需要逆向的算法一定只和输入有关,理论上我甚至可以用 frida 给除了 i6o2 的函数打 hook 来调用替代我 solve.py 里面的实现
依赖性
另外你可以看到我有关 vm 翻译的部分使用了 binaryninja 的 api,实际上我只用了一个 bv.read,将其替换为静态数据也是可以分析的