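The converted JSON data falls into four main categories: Event, Subject, Object, and Principal, which represent system events, subjects, objects, and users respectively. The number of subtypes in each category depends on the CDM version. ShadeWatcher uses the E3 data, which follows CDM18, whereas E5 uses CDM20 by default. Compared with CDM18, CDM20 changes some fields and adds some types; most of the new types are system calls that refine the event types. This post analyzes the data format using ShadeWatcher and a small converted sample as references, so there may be omissions. A sample of the JSON data (EVENT records only) can be downloaded from my GitHub, or you can convert the data yourself by following my earlier post on parsing the DARPA TC Engagement 5 dataset into JSON and writing it to local disk.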
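The E3 dataset is parsed with CDM18 and contains 23 Event types in total. ShadeWatcher uses 19 of them and discards four: EVENT_BOOT, EVENT_MMAP, EVENT_OTHER, and EVENT_MPROTECT. CDM20 adds five more types; at a glance, each of them accounts for only a small share of the data. The 19 retained types map to ShadeWatcher's syscall types as follows: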
syscallMap["EVENT_EXECUTE"] = SyscallType_t::Execve;
syscallMap["EVENT_CLONE"] = SyscallType_t::Clone;
syscallMap["EVENT_FORK"] = SyscallType_t::Clone;
syscallMap["EVENT_OPEN"] = SyscallType_t::Open;
syscallMap["EVENT_CLOSE"] = SyscallType_t::Close;
syscallMap["EVENT_CONNECT"] = SyscallType_t::Connect;
syscallMap["EVENT_UNLINK"] = SyscallType_t::Delete;
syscallMap["EVENT_READ"] = SyscallType_t::Read;
syscallMap["EVENT_WRITE"] = SyscallType_t::Write;
syscallMap["EVENT_RECVFROM"] = SyscallType_t::Recvfrom;
syscallMap["EVENT_SENDTO"] = SyscallType_t::Sendto;
syscallMap["EVENT_RECVMSG"] = SyscallType_t::Recvmsg;
syscallMap["EVENT_SENDMSG"] = SyscallType_t::Sendmsg;
syscallMap["EVENT_RENAME"] = SyscallType_t::Rename;
syscallMap["EVENT_READ_SOCKET_PARAMS"] = SyscallType_t::Recv;
syscallMap["EVENT_WRITE_SOCKET_PARAMS"] = SyscallType_t::Send;
syscallMap["EVENT_LOADLIBRARY"] = SyscallType_t::Load;
syscallMap["EVENT_CREATE_OBJECT"] = SyscallType_t::Create;
syscallMap["EVENT_UPDATE"] = SyscallType_t::Update;
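If you want to apply the same filter outside ShadeWatcher, the mapping above can be mirrored in a few lines of Python. This is a minimal sketch, not ShadeWatcher's code: the dictionary copies the 19 entries listed above, and `classify_event_type` is a hypothetical helper name. Any type that is not in the table, including the four discarded ones and the types added in CDM20, yields `None`.

```python
# Mirror of the 19 entries in the C++ map above.
KEPT_EVENT_TYPES = {
    "EVENT_EXECUTE": "Execve",
    "EVENT_CLONE": "Clone",
    "EVENT_FORK": "Clone",
    "EVENT_OPEN": "Open",
    "EVENT_CLOSE": "Close",
    "EVENT_CONNECT": "Connect",
    "EVENT_UNLINK": "Delete",
    "EVENT_READ": "Read",
    "EVENT_WRITE": "Write",
    "EVENT_RECVFROM": "Recvfrom",
    "EVENT_SENDTO": "Sendto",
    "EVENT_RECVMSG": "Recvmsg",
    "EVENT_SENDMSG": "Sendmsg",
    "EVENT_RENAME": "Rename",
    "EVENT_READ_SOCKET_PARAMS": "Recv",
    "EVENT_WRITE_SOCKET_PARAMS": "Send",
    "EVENT_LOADLIBRARY": "Load",
    "EVENT_CREATE_OBJECT": "Create",
    "EVENT_UPDATE": "Update",
}

# The four CDM18 types that ShadeWatcher discards, per the text above.
DROPPED_EVENT_TYPES = {"EVENT_BOOT", "EVENT_MMAP", "EVENT_OTHER", "EVENT_MPROTECT"}


def classify_event_type(event_type):
    """Return the mapped syscall name, or None if the event should be skipped."""
    return KEPT_EVENT_TYPES.get(event_type)


if __name__ == "__main__":
    for name in ("EVENT_FORK", "EVENT_MMAP"):
        print(name, classify_event_type(name))
```

Note that EVENT_CLONE and EVENT_FORK both map to the same syscall type, so 19 event types collapse into 18 distinct values.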
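A sample record looks like this:
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_EVENT",
"#":"Session number. ShadeWatcher reads this field but ignores its value, treating each graph as one session instead",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.Event": {
"#":"Second object. update and rename events have two objects (one edge each); for all other events this field is all zeros",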
"predicateObject2": {
"com.bbn.tc.schema.avro.cdm20.UUID": "00000000-0000-0000-0000-000000000000"
},
"predicateObjectPath": null,
"subject": {
"com.bbn.tc.schema.avro.cdm20.UUID": "2A266F68-012B-5E22-9CA7-575CE8BEE27C"
},
"programPoint": null,
"properties": {
"map": {}
},
"predicateObject": {
"com.bbn.tc.schema.avro.cdm20.UUID": "B5AF11CE-7902-5F60-8E72-4ECB30FDAEDA"
},
"threadId": {
"int": 1958
},
"predicateObject2Path": null,
"type": "EVENT_READ",
"uuid": "FD4496E1-54A8-598C-9408-5E123500A8D4",
"size": {
"long": 272
},
"timestampNanos": "1557235299707",
"names": null,
"parameters": null,
"#":"Logical order of this event relative to other events in the same thread of execution",
"sequence": {
"long": 1
},
"location": null
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2019-05-07T13:21:39.707Z"
}
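The sample above shows two things a parser has to handle: values wrapped in a single-key object that names their type (the Avro JSON encoding of union fields, such as `{"long": 272}`), and the all-zero UUID that marks a missing second object. The sketch below turns one event record into one or two edges. It is my own illustration rather than ShadeWatcher's implementation, and `unwrap` and `event_edges` are hypothetical names.

```python
NULL_UUID = "00000000-0000-0000-0000-000000000000"


def unwrap(value):
    """Strip the single-key type wrapper, e.g. {"long": 272} -> 272."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value  # None, or a value that is already plain


def event_edges(record):
    """Return (subject_uuid, event_type, object_uuid) tuples for one RECORD_EVENT."""
    event = unwrap(record["datum"])
    subject = unwrap(event.get("subject"))
    if subject is None:
        return []
    edges = []
    for key in ("predicateObject", "predicateObject2"):
        obj = unwrap(event.get(key))
        # Skip absent objects and the all-zero placeholder UUID.
        if obj is not None and obj != NULL_UUID:
            edges.append((subject, event["type"], obj))
    return edges


# Trimmed copy of the EVENT_READ sample, without the "#" annotation keys.
UUID_KEY = "com.bbn.tc.schema.avro.cdm20.UUID"
sample = {
    "type": "RECORD_EVENT",
    "datum": {
        "com.bbn.tc.schema.avro.cdm20.Event": {
            "predicateObject2": {UUID_KEY: NULL_UUID},
            "subject": {UUID_KEY: "2A266F68-012B-5E22-9CA7-575CE8BEE27C"},
            "predicateObject": {UUID_KEY: "B5AF11CE-7902-5F60-8E72-4ECB30FDAEDA"},
            "type": "EVENT_READ",
        }
    },
}

print(event_edges(sample))
```

`unwrap` is only applied to fields known to be union-encoded. Applying it blindly to every dictionary would also collapse `"properties": {"map": {}}`, which is a plain nested object.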
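Subject records come in exactly one type here: the process.
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_SUBJECT",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.Subject": {
"privilegeLevel": null,
"unitId": null,
"#":"ppid is the parent process ID and path is the program path; the remaining fields are mostly user and group IDs (process credentials)",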
"properties": {
"map": {
"sgid": "1003",
"suid": "1003",
"egid": "1003",
"gid": "1003",
"uid": "1003",
"tgid": "1911",
"fsgid": "1003",
"fsuid": "1003",
"euid": "1003",
"path": "/usr/lib/gvfs/gvfs-afc-volume-monitor",
"ppid": "1"
}
},
"iteration": null,
"type": "SUBJECT_PROCESS",
"uuid": "C6A9DF04-D14A-57F2-97C3-3CBE2C0FF4FF",
"parentSubject": {
"com.bbn.tc.schema.avro.cdm20.UUID": "DD56B598-9E74-58C3-B3E8-2C623780B8ED"
},
"importedLibraries": null,
"#":"Process ID",
"cid": 1912,
"localPrincipal": {
"com.bbn.tc.schema.avro.cdm20.UUID": "991869FF-5610-5CCB-9BA4-346353351B12"
},
"startTimestampNanos": {
"long": 1557235386887758779
},
"count": null,
"#":"Command line, i.e. the command that started this process",
"cmdLine": {
"string": "/usr/lib/gvfs/gvfs-afc-volume-monitor"
},
"exportedLibraries": null
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2023-08-08T02:36:47.351Z"
}
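To make the Subject layout concrete, here is a small sketch that pulls the fields annotated above (process ID, parent process ID, path, command line) out of a record and converts `startTimestampNanos` to a UTC datetime. The function names and the shape of the returned dictionary are my own choices for illustration, not something ShadeWatcher defines.

```python
from datetime import datetime, timezone


def unwrap(value):
    """Strip the single-key type wrapper, e.g. {"string": "x"} -> "x"."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def parse_subject(record):
    """Extract the main process attributes from one RECORD_SUBJECT."""
    subject = unwrap(record["datum"])
    props = subject["properties"]["map"]
    start_ns = unwrap(subject.get("startTimestampNanos"))
    start = None
    if start_ns is not None:
        # Integer division keeps the conversion exact down to the second.
        start = datetime.fromtimestamp(int(start_ns) // 1_000_000_000, tz=timezone.utc)
    return {
        "uuid": subject["uuid"],
        "pid": subject["cid"],
        "ppid": int(props["ppid"]) if "ppid" in props else None,
        "path": props.get("path"),
        "cmdline": unwrap(subject.get("cmdLine")),
        "parent_uuid": unwrap(subject.get("parentSubject")),
        "start": start,
    }


# Trimmed copy of the Subject sample, without the "#" annotation keys.
sample = {
    "type": "RECORD_SUBJECT",
    "datum": {
        "com.bbn.tc.schema.avro.cdm20.Subject": {
            "properties": {"map": {"path": "/usr/lib/gvfs/gvfs-afc-volume-monitor", "ppid": "1"}},
            "type": "SUBJECT_PROCESS",
            "uuid": "C6A9DF04-D14A-57F2-97C3-3CBE2C0FF4FF",
            "parentSubject": {
                "com.bbn.tc.schema.avro.cdm20.UUID": "DD56B598-9E74-58C3-B3E8-2C623780B8ED"
            },
            "cid": 1912,
            "startTimestampNanos": {"long": 1557235386887758779},
            "cmdLine": {"string": "/usr/lib/gvfs/gvfs-afc-volume-monitor"},
        }
    },
}

info = parse_subject(sample)
print(info["pid"], info["ppid"], info["start"].isoformat())
```

All values inside `properties.map` are strings, so numeric fields such as `ppid` need an explicit conversion.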
There are four kinds of Object: RECORD_MEMORY_OBJECT, RECORD_IPC_OBJECT, RECORD_FILE_OBJECT, and RECORD_NET_FLOW_OBJECT. ShadeWatcher uses only the last two.
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_MEMORY_OBJECT",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.MemoryObject": {
"pageNumber": null,
"baseObject": {
"epoch": null,
"properties": {
"map": {
"rc": "0"
}
},
"permission": null
},
"uuid": "B83AA80F-B1CD-5E10-B8A4-365281753277",
"memoryAddress": 139867157049344,
"pageOffset": null,
"size": {
"long": 2327040
}
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2023-08-08T02:36:40.315Z"
}
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_IPC_OBJECT",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.IpcObject": {
"uuid1": null,
"baseObject": {
"epoch": null,
"properties": {
"map": {
"path": "@/tmp/.X11-unix/X0"
}
},
"permission": null
},
"type": "IPC_OBJECT_SOCKET_ABSTRACT",
"uuid": "B5AF11CE-7902-5F60-8E72-4ECB30FDAEDA",
"fd1": null,
"uuid2": null,
"fd2": null
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2023-08-08T02:36:40.327Z"
}
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_FILE_OBJECT",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.FileObject": {
"fileDescriptor": null,
"hashes": null,
"peInfo": null,
"localPrincipal": {
"com.bbn.tc.schema.avro.cdm20.UUID": "B6C54489-38A0-5F50-A60A-FD8D76219CAE"
},
"baseObject": {
"epoch": null,
"properties": {
"map": {
"uid": "0",
"inode": "0x520ca3",
"mode": "0",
"dev": "0xfd00001",
"#":"A small number of file objects lack this field",
"filename": "/lib/x86_64-linux-gnu/libutil-2.15.so",
"ids": "0/0",
"gid": "0"
}
},
"permission": null
},
"type": "FILE_OBJECT_BLOCK",
"uuid": "0100D00F-A30C-5200-0000-0000BB90005A",
"size": null
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2023-08-08T02:36:40.827Z"
}
{
"CDMVersion": "20",
"source": "SOURCE_LINUX_THEIA",
"type": "RECORD_NET_FLOW_OBJECT",
"sessionNumber": 5,
"datum": {
"com.bbn.tc.schema.avro.cdm20.NetFlowObject": {
"fileDescriptor": null,
"localAddress": {
"string": "10.0.6.60"
},
"remoteAddress": {
"string": "10.0.4.2"
},
"localPort": {
"int": 22
},
"remotePort": {
"int": 36764
},
"ipProtocol": null,
"baseObject": {
"epoch": null,
"properties": {
"map": {}
},
"permission": null
},
"uuid": "0A00063C-1600-0A00-0402-9C8F00000000",
"initTcpSeqNum": null
}
},
"hostId": "37345038-89F2-5899-8FD2-B6D0844A7DBF",
"@timestamp": "2023-08-08T02:36:40.324Z"
}
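The four object samples can be handled with a simple dispatch on the record type. The sketch below keeps only file and network-flow objects, as ShadeWatcher does according to the text above, and returns `None` for memory and IPC objects. The tuple layout and the `local->remote` naming of network flows are assumptions made for illustration, not ShadeWatcher's actual node format.

```python
def unwrap(value):
    """Strip the single-key type wrapper, e.g. {"int": 22} -> 22."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def object_node(record):
    """Return (uuid, kind, name) for file and netflow records, else None."""
    body = unwrap(record["datum"])
    if record["type"] == "RECORD_FILE_OBJECT":
        props = body["baseObject"]["properties"]["map"]
        # A few file objects have no "filename", so the name may be None.
        return (body["uuid"], "file", props.get("filename"))
    if record["type"] == "RECORD_NET_FLOW_OBJECT":
        local = f'{unwrap(body["localAddress"])}:{unwrap(body["localPort"])}'
        remote = f'{unwrap(body["remoteAddress"])}:{unwrap(body["remotePort"])}'
        return (body["uuid"], "netflow", f"{local}->{remote}")
    return None  # memory and IPC objects are skipped


# Trimmed copy of the NetFlowObject sample.
sample = {
    "type": "RECORD_NET_FLOW_OBJECT",
    "datum": {
        "com.bbn.tc.schema.avro.cdm20.NetFlowObject": {
            "localAddress": {"string": "10.0.6.60"},
            "remoteAddress": {"string": "10.0.4.2"},
            "localPort": {"int": 22},
            "remotePort": {"int": 36764},
            "uuid": "0A00063C-1600-0A00-0402-9C8F00000000",
        }
    },
}

print(object_node(sample))
```

If an address or port is `null` in the data, this sketch formats it as the string `None`; a stricter parser might drop such records instead.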