[NAME]
ALL.daovm.architecture.bytecode

[TITLE]
Bytecode Format

[DESCRIPTION]

This document specifies the bytecode format for the Dao virtual machine.

In this bytecode format, integers are always stored in big-endian byte order. In the following specifications and examples, each byte is represented by two hexadecimal digits, unless it is enclosed in quotation marks.

1 Header Section

The header section contains 32 bytes, which are divided as follows:

    Byte       # ESC, 0x1B;
    Byte       # 0x44, namely 'D';
    Byte       # 0x61, namely 'a';
    Byte       # 0x6F, namely 'o';
    Byte       # major version number, 0x2;
    Byte       # minor version number, 0x0;
    Byte       # Carriage Return (CR), 0x0D;
    Byte       # Line Feed (LF), 0x0A;
    Byte       # format class, 0x0 for the official one;
    Byte       # size of integer type, default 0x4;
    Byte[4]    # format hash (rotating hash of the ASM tags and VM opcodes);
    Byte[16]   # 16 reserved bytes;
    Byte       # Carriage Return (CR), 0x0D;
    Byte       # Line Feed (LF), 0x0A;

The ninth byte is the format class, where 0x0 is reserved for the official format and 0x1 for the encrypted format (only the main section is encrypted, see below for more information).

The four bytes of the format hash serve as a signature for the format in which the bytecode is encoded. The hash is the rotating hash value of a string constructed from the bytecode tag indices and names, and the virtual machine opcode indices and names:

    TagIndex1:TagName1;TagIndex2:TagName2;...; OpcodeIndex1:OpcodeName1;...

Each index is separated from its corresponding name by a colon, and each index-name pair is followed by a semicolon. The substrings for bytecode tags and opcodes are separated by a blank space.
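As a rough illustration of the header layout and the signature string described above, the following Python sketch assembles a 32-byte header. It uses the rotating hash whose pseudocode follows this sketch in the document. The function names, and the tiny tag and opcode tables with their index values, are hypothetical placeholders chosen for the example; the real tables and indices come from the Dao VM itself. The sketch also assumes the signature string is plain ASCII.

```python
import struct


def rotating_hash(text: bytes) -> int:
    """Rotating hash, following the pseudocode given in this document."""
    h = len(text)
    for byte in text:
        h = ((h << 4) ^ (h >> 28) ^ byte) & 0x7FFFFFFF
    return h


def signature_string(tags, opcodes) -> str:
    """Build 'Index:Name;' pairs; the tag and opcode parts are joined by one space."""
    tag_part = "".join(f"{index}:{name};" for index, name in tags)
    opcode_part = "".join(f"{index}:{name};" for index, name in opcodes)
    return tag_part + " " + opcode_part


def build_header(tags, opcodes, format_class=0x0, int_size=0x4) -> bytes:
    """Assemble the 32-byte header section in the order listed above."""
    signature = signature_string(tags, opcodes).encode("ascii")
    header = b"\x1bDao"                       # ESC, 'D', 'a', 'o'
    header += bytes([0x02, 0x00])             # major and minor version
    header += b"\r\n"                         # CR, LF
    header += bytes([format_class, int_size])
    header += struct.pack(">I", rotating_hash(signature))  # big-endian hash
    header += bytes(16)                       # reserved bytes, assumed zero
    header += b"\r\n"                         # CR, LF
    return header


# Hypothetical, truncated tables: names appear in this document,
# but the index values here are made up for illustration.
EXAMPLE_TAGS = [(1, "ASM_COPY"), (2, "ASM_TYPEOF")]
EXAMPLE_OPCODES = [(0, "GETCG"), (1, "GETCL")]

header = build_header(EXAMPLE_TAGS, EXAMPLE_OPCODES)
print(len(header), header[:4])
```

A reader of the format could run the same computation over its own tag and opcode tables and compare the result with bytes 11 to 14 of the header to check that the file was produced for a compatible format.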
The rotating hash is computed by

    hash = length( text );
    for(byte in text) hash = ((hash<<4)^(hash>>28)^byte)&0x7fffffff;
    return hash;

2 Source Path Section

    Byte[2]    # length of the source path;
    Byte[]     # source path (null-terminated);
    Byte       # Carriage Return (CR), 0x0D;
    Byte       # Line Feed (LF), 0x0A;

3 Main Section

The main section is encoded as structured blocks. Each block is divided into chunks of 9 bytes, where the first byte always stores a tag that identifies the chunk type. The remaining 8 bytes are used to store data.

There are the following types of chunks:

    ASM_COPY
    ASM_TYPEOF
    ASM_TYPEDEF
    ASM_ROUTINE
    ASM_CLASS
    ASM_INTERFACE
    ASM_ENUM
    ASM_TYPE
    ASM_VALUE
    ASM_EVAL
    ASM_BASES
    ASM_DECOS
    ASM_PATTERNS
    ASM_CONSTS
    ASM_TYPES
    ASM_CODE
    ASM_END
    ASM_LOAD
    ASM_USE
    ASM_VERBATIM
    ASM_CONST
    ASM_STATIC
    ASM_GLOBAL
    ASM_VAR
    ASM_DATA
    ASM_DATA2
    ASM_SEEK

4 Chunk Specifications:

4.1 Values:

    int:
        ASM_VALUE(1Byte): DAO_INTEGER(1Byte), Zeros(7Bytes);
        ASM_END(1B): Value(4B/8B), Zeros(4B/0B);

    float:
        ASM_VALUE(1B): DAO_FLOAT(1B), Zeros(7B);
        ASM_END(1B): Value(4B), Zeros(4B);

    double:
        ASM_VALUE(1B): DAO_DOUBLE(1B), Zeros(7B);
        ASM_END(1B): Value(8B);

    complex:
        ASM_VALUE(1B): DAO_COMPLEX(1B), Zeros(7B);
        ASM_DATA(1B): Real(8B);
        ASM_END(1B): Imag(8B);

    long:
        ASM_VALUE(1B): DAO_LONG(1B), Base(1B), Sign(1B), SizeMod16(1B), Digits(4B);
        ASM_DATA(1B): Digits(8B);
        ASM_END(1B): Digits(8B);

    string:
        ASM_VALUE(1B): DAO_STRING(1B), MBS/WCS(1B), SizeMod16(1B), Bytes(5B);
        ASM_DATA(1B): Bytes(8B);
        ASM_END(1B): Bytes(8B);

    enum symbol:
        ASM_VALUE(1B): DAO_ENUM(1B), Zeros(1B), Type-Index(2B), Zeros(4B);
        ASM_END(1B): Value(4B), Zeros(4B);

Notes: A "Type-Index" refers to a previous block, located backward by that "index" offset. Only blocks that represent values are indexed, and each such index is stored as a two-byte short.
If a short is not sufficient to represent such an index, an intermediate indexing chunk can be used:

    ASM_SEEK(1B): New-Index(2B), Zeros(6B);

Here "New-Index" is also a backward offset, and it is relative to the seek chunk.

    array:
        ASM_VALUE(1B): DAO_ARRAY(1B), Numeric-Type(1B), Dimensions(2B), Size(4B);
        ASM_DATA(1B): Dim1(4B), Dim2(4B);
        ASM_DATA(1B): More dimensions;
        ASM_DATA(1B): Data(4B), Data(4B); Or Data(8B);
        ASM_DATA(1B): More Data;
        ASM_END(1B): Data(8B);

    list:
        ASM_VALUE(1B): DAO_LIST(1B), Zeros(1B), Type-Index(2B), Size(4B);
        ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    map:
        ASM_VALUE(1B): DAO_MAP(1B), Zeros(1B), Type-Index(2B), Hash-Seed(4B);
        ASM_DATA(1B): Key-Index(2B), Value-Index(2B), Key-Index(2B), Value-Index(2B);
        ASM_END(1B): Key-Index(2B), Value-Index(2B), Key-Index(2B), Value-Index(2B);

        Each "Key-Index" and "Value-Index" pair represents one key-value pair; zero marks the end.

    tuple:
        ASM_VALUE(1B): DAO_TUPLE(1B), SubTypeID(1B), Type-Index(2B), Size(2B), Value-Index(2B);
        ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    namevalue:
        ASM_VALUE(1B): DAO_PAR_NAMED(1B), Zeros(1B), Name-Index(2B), Value-Index(2B), Type-Index(2B);
        ASM_END(1B): Zeros(8B);

    specialized ctype:
        ASM_VALUE(1B): DAO_CTYPE(1B), Zeros(1B), Value-Index(2B), Type-Index(2B) X 2;
        ASM_DATA(1B): Type-Index(2B) X 4;
        ASM_END(1B): Type-Index(2B) X 4;

4.2 Other Values

    copied value:
        ASM_COPY(1B): Value-Index(2B), Zeros(6B);

    type of a value:
        ASM_TYPEOF(1B): Value-Index(2B), Zeros(6B);

    type alias:
        ASM_TYPEDEF(1B): Name-Index(2B), Type-Index(2B), Zeros(4B);

4.3 Structures:

    routine:
        ASM_ROUTINE(1B): Name-Index(2B), Type-Index(2B), Host-Index(2B), Attrib(2B);
        ...
        ASM_END(1B): RegCount(2B), Zeros(4B), DefaultConstructor(1B), Permission(1B);

    class:
        ASM_CLASS(1B): Name/Decl-Index(2B), Parent-Index(2B), Attrib(4B);
        ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ...
        ASM_END(1B): LineDef(2B), Zeros(5B), Permission(1B);

    interface:
        ASM_INTERFACE(1B): Name/Decl-Index(2B), Parent-Count(2B), Zeros(4B);
        ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ...
        ASM_END(1B): LineDef(2B), Zeros(5B), Permission(1B);

    enum:
        ASM_ENUM(1B): Name-Index(2B), Enum/Flag(2B), Count(4B);
        ASM_DATA(1B): Name-Index(2B), Value(4B), Zeros(2B);
        ASM_END(1B): Name-Index(2B), Value(4B), Zeros(2B);

    type:
        ASM_TYPE(1B): Name-Index(2B), TypeID(2B), Aux-Index(2B), CodeBlockType-Index(2B);
        ASM_DATA(1B): Type-Index(2B) X 4;
        ASM_END(1B): Type-Index(2B) X 4;

        Note 1: the list of nested types is terminated by a zero Type-Index;
        Note 2: "Aux-Index" can be the index of the returned type, a class block, etc.;

    type alias:
        ASM_TYPE(1B): Name-Index(2B), Type-Index(2B), Zeros(4B);

    typeof:
        ASM_TYPE(1B): Value-Index(2B), Zeros(6B);

    value:
        See above;

    evaluation:
        ASM_EVAL(1B): Opcode(2B), OpB(2B), Type-Index(2B), Zeros(2B);
        ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    bases (mixin components or interface parents):
        ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_DATA(1B): Value-Index(2B) X 4;
        ASM_END(1B): Value-Index(2B) X 4;

    decorators for the current routine:
        ASM_DECOS(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);
        ASM_DATA(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);
        ASM_END(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);

    patterns for automatic decorator application:
        ASM_PATTERNS(1B): PatternString-Index(2B) X 4;
        ASM_DATA(1B): PatternString-Index(2B) X 4;
        ASM_END(1B): PatternString-Index(2B) X 4;

    consts:
        ASM_CONSTS(1B): Count(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
        ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    types:
        ASM_TYPES(1B): Count(2B), Zeros(2B), Var-Index(2B), Type-Index(2B);
        ASM_DATA(1B): Var-Index(2B), Type-Index(2B), Var-Index(2B), Type-Index(2B);
        ASM_END(1B): Var-Index(2B), Type-Index(2B), Var-Index(2B), Type-Index(2B);

    code:
        ASM_CODE(1B): CodeNum(2B), Line-Num-Count(2B), LineNum(2B), Count(2B);
        ASM_DATA(1B): LineDiff(2B), Count(2B), LineDiff(2B), Count(2B);
        ASM_DATA(1B): Opcode(2B), A(2B), B(2B), C(2B);
        ASM_END(1B): Opcode(2B), A(2B), B(2B), C(2B);

4.4 Statement:

    load statement:
        ASM_LOAD(1B): File-Path-Index(2B), Optional-Name-Index(2B), Zeros(4B);

    use namespace:
        ASM_USE(1B): DAO_NAMESPACE(2B), Value-Index(2B), Zeros(4B);

    use enum constants:
        ASM_USE(1B): DAO_ENUM(2B), Type-Index(2B), Zeros(4B);

    use constructors:
        ASM_USE(1B): DAO_ROUTINE(2B), Routine-Index(2B), Zeros(4B);

    verbatim:
        ASM_VERBATIM(1B): Tag-Index(2B), Mode-Index(2B), Text-Index(2B), LineNum(2B);

    var declaration:
        ASM_VAR(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    const declaration:
        ASM_CONST(1B): Name-Index(2B), Value-Index(2B), Zeros(2B), Scope(1B), Permission(1B);

    static declaration:
        ASM_STATIC(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    global declaration:
        ASM_GLOBAL(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    seek:
        ASM_SEEK(1B): New-Index(2B), Zeros(6B);

5 Samples

Input code:

    io.writeln( 'Hello Dao!' );

Output disassembled bytecode:

    ASM_ROUTINE: 0, 0, 0, 512;
    ASM_VALUE: DAO_STRING, 0, 2, 'io';
    ASM_END: '';

    ASM_EVAL: GETCG, 1, 0, 0;
    ASM_END: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 7, 'write';
    ASM_END: 'ln';

    ASM_EVAL: GETF, 2, 0, 0;
    ASM_END: 2, 1, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 10, 'Hello';
    ASM_END: ' Dao!';

    ASM_CONSTS: 2, 2, 1, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 6, 1, 1, 6;
    ASM_DATA: GETCG , 1, 5, 0;
    ASM_DATA: GETCL , 0, 0, 1;
    ASM_DATA: LOAD , 0, 0, 2;
    ASM_DATA: GETCL , 0, 1, 3;
    ASM_DATA: MCALL , 1, 16386, 4;
    ASM_END: RETURN , 0, 0, 0;
    ASM_END: ;

Input code:

    load web.cgi

    enum Bool
    {
        False,
        True
    }

    use enum Bool

    static abc = random_string( 100 )

    global index = 123 + %abc

    class Klass
    {
        const name = "abc";
        var index = 123;

        routine Method( a :int ){
        }
    }

    routine Func()
    {
        name = index
    }

    klass = Klass()

Output disassembled bytecode:

    ASM_ROUTINE: 0, 0, 0, 1536;
    ASM_VALUE: DAO_STRING, 0, 7, 'web/c';
    ASM_END: 'gi';

    ASM_LOAD: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 4, 'Bool';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 0, 5, 'False';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 0, 4, 'True';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 0, 0, 'enum<';
    ASM_DATA: 'False,Tr';
    ASM_END: 'ue>';

    ASM_ENUM: 1, 0, 2;
    ASM_DATA: 3, 0;
    ASM_END: 2, 1;

    ASM_TYPEDEF: 5, 1, 0, 0;

    ASM_USE: 7, 1, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 13, 'rando';
    ASM_END: 'm_string';

    ASM_EVAL: GETCG, 1, 0, 0;
    ASM_END: 1, 0, 0, 0;

    ASM_VALUE: DAO_INTEGER;
    ASM_END: 100 ;

    ASM_VALUE: DAO_STRING, 0, 1, '?';
    ASM_END: '';

    ASM_TYPE: 1, 66, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_EVAL: CALL, 1, 1, 0;
    ASM_END: 4, 3, 0, 0;

    ASM_TYPEOF: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 3, 'abc';
    ASM_END: '';

    ASM_STATIC: 1, 3, 2, 0;

    ASM_VALUE: DAO_STRING, 0, 5, 'index';
    ASM_END: '';

    ASM_GLOBAL: 1, 0, 5, 3;

    ASM_VALUE: DAO_STRING, 0, 5, 'Klass';
    ASM_END: '';

    ASM_CLASS: 1, 0, 0, 0;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 0, 0, 'inter';
    ASM_DATA: 'face<Kla';
    ASM_END: 'ss>';

    ASM_INTERFACE: 2, 0, 0, 0;
    ASM_BASES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;
    ASM_END: ;

    ASM_CLASS: 3, 0, 0, 1;
    ASM_BASES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 4, 'name';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 1, 3, "a";
    ASM_END: "bb";

    ASM_CONST: 2, 1, 0, 3;

    ASM_VALUE: DAO_INTEGER;
    ASM_END: 123 ;

    ASM_VALUE: DAO_STRING, 0, 3, 'int';
    ASM_END: '';

    ASM_TYPE: 1, 1, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VAR: 11, 3, 1, 3;

    ASM_TYPE: 10, 12, 6, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 10, 'self:';
    ASM_END: 'Klass';

    ASM_TYPE: 1, 30, 2, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 5, 'a:int';
    ASM_END: '';

    ASM_TYPE: 1, 30, 5, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 12, 'routi';
    ASM_DATA: 'ne<self:';
    ASM_DATA: 'Klass,a:';
    ASM_END: 'int=>?>';

    ASM_TYPE: 1, 18, 21, 0;
    ASM_END: 4, 2, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 6, 'Metho';
    ASM_END: 'd';

    ASM_ROUTINE: 1, 2, 8, 1;
    ASM_END: ;

    ASM_ROUTINE: 1, 3, 9, 1;
    ASM_CONSTS: 2, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 2, 0, 0, 8;
    ASM_END: 1, 11, 0, 0;

    ASM_CODE: 1, 1, 0, 1;
    ASM_END: RETURN , 0, 0, 0;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 0, 0, 'routi';
    ASM_DATA: 'ne<=>Kla';
    ASM_END: 'ss>';

    ASM_TYPE: 1, 18, 11, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 12, 'Klass';
    ASM_END: '::Klass';

    ASM_ROUTINE: 1, 2, 13, 256;
    ASM_CONSTS: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 1, 1, 0, 1;
    ASM_END: RETURN , 0, 0, 0;
    ASM_END: ;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 0, 12, 'routi';
    ASM_END: 'ne<=>?>';

    ASM_TYPE: 1, 18, 30, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 4, 'Func';
    ASM_END: '';

    ASM_ROUTINE: 1, 2, 0, 0;
    ASM_CONSTS: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 3, 1, 26, 3;
    ASM_DATA: GETVG , 0, 29, 0;
    ASM_DATA: MOVE_XX , 0, 0, 1;
    ASM_END: RETURN , 0, 0, 0;
    ASM_END: ;

    ASM_CONSTS: 1, 24, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 10, 3, 9, 1;
    ASM_DATA: 4, 5, 16, 4;
    ASM_DATA: GETCG , 0, 43, 0;
    ASM_DATA: DATA_I , 1, 123, 1;
    ASM_DATA: GETVS , 0, 0, 2;
    ASM_DATA: SIZE , 2, 0, 3;
    ASM_DATA: ADD_III , 1, 3, 4;
    ASM_DATA: SETVG_II , 4, 7, 0;
    ASM_DATA: GETCL , 0, 0, 5;
    ASM_DATA: CALL , 5, 0, 6;
    ASM_DATA: MOVE_PP , 6, 0, 7;
    ASM_END: RETURN , 0, 0, 0;
    ASM_END: ;
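
The string values in the samples above appear to follow a simple splitting rule: the ASM_VALUE chunk carries the first 5 bytes, each following chunk carries up to 8 bytes, the last piece always goes into an ASM_END chunk (possibly empty), and the third field is the string size modulo 16. The Python sketch below reproduces that display form. The rule is inferred from the string layout in section 4.1 and from the sample listings, not from the Dao sources; the function names are hypothetical, and the constant 0 in the second field is assumed to be the MBS/WCS flag shown for single-quoted strings in the samples.

```python
FIRST_PAYLOAD = 5   # string bytes stored in the ASM_VALUE chunk
CHUNK_PAYLOAD = 8   # string bytes stored in each ASM_DATA / ASM_END chunk


def split_string(data: bytes):
    """Split a string into per-chunk pieces.

    Returns (size_mod_16, pieces): pieces[0] belongs to ASM_VALUE,
    pieces[-1] to ASM_END, and anything in between to ASM_DATA chunks.
    """
    pieces = [data[:FIRST_PAYLOAD]]
    rest = data[FIRST_PAYLOAD:]
    while len(rest) > CHUNK_PAYLOAD:
        pieces.append(rest[:CHUNK_PAYLOAD])
        rest = rest[CHUNK_PAYLOAD:]
    pieces.append(rest)  # closing piece; empty for strings of 5 bytes or fewer
    return len(data) % 16, pieces


def format_string_block(text: str):
    """Render a string value in the style of the disassembled samples."""
    size_mod_16, pieces = split_string(text.encode("ascii"))
    lines = [f"ASM_VALUE: DAO_STRING, 0, {size_mod_16}, '{pieces[0].decode()}';"]
    for piece in pieces[1:-1]:
        lines.append(f"ASM_DATA: '{piece.decode()}';")
    lines.append(f"ASM_END: '{pieces[-1].decode()}';")
    return lines


for line in format_string_block("routine<self:Klass,a:int=>?>"):
    print(line)
```

Running the same function on 'writeln' or 'enum<False,True>' gives the corresponding blocks from the sample listings, which is a quick way to check a hand-written decoder against the examples in this document.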