raye~

Raye's Journey

While I am still free and not yet old, let me indulge in a little wildness.

Exploration of JS Memory Triggered by a Question

The title isn't really accurate, but let's leave it like this for now.


Starting from a Memory Rule

It started while I was researching WeChat mini-games: I found that a rule has long been circulating online, namely that game value * 2 = memory value.

For example, see the article "CE Game Memory Modification - Love God Garden" (CE is Cheat Engine, a memory scanner and editor).

[Screenshot from that article]

This doubling is puzzling. As I understand compilers, when source code is eventually compiled into machine code, optimizations such as constant folding may take place, but a value in the code is still stored in memory as that same value. For example, when we modify Plants vs. Zombies with CE, the game does not encrypt the sun count, so we can locate it simply by searching for its value: [re Introduction] CE Modification of Plants vs. Zombies

If the game deliberately obfuscated its values, this would make sense, but the game code does no such thing (and if it did, the obfuscation would hardly be a plain multiplication by 2, right? 🤔)
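To make "search for the value directly" concrete: a native game typically keeps an integer like the sun count as a plain 4-byte little-endian int, and that byte pattern is what a CE "4 Bytes" exact-value scan compares against. The sketch below reproduces the encoding with a DataView; the helper name `int32LEBytes` is my own, and little-endian order is an assumption that holds for x86 and for ARM as commonly configured.

```javascript
// Hypothetical helper: the 4 bytes a native program would typically use
// to store a 32-bit signed integer on a little-endian machine.
function int32LEBytes(n) {
  const buf = new ArrayBuffer(4);
  new DataView(buf).setInt32(0, n, true); // true = little-endian
  return Array.from(new Uint8Array(buf));
}

// 1237 = 0x000004D5, so the bytes in memory are D5 04 00 00.
console.log(int32LEBytes(1237)); // [ 213, 4, 0, 0 ]
```

Because the stored bytes are just the number itself, searching for the on-screen value is enough in a native game.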

With this as my starting point, I began to study how JS stores values in memory.

v8 Engine

Under the hood, WeChat mini-program code also runs on the v8 engine, so I became curious about how JS code is turned, step by step, into machine code.

I assumed Node.js was essentially v8, since Node.js is built on v8, but it turns out the two still differ. v8 ships its own shell, d8, which is the most faithful way to run plain v8.

You can build the d8 shell locally.

Steps reference: https://gist.github.com/Becavalier/58da63744f255abe5717e23dacc673e5

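# Download depot_tools (Chromium's build tooling)
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
# Put depot_tools on PATH (add this line to ~/.bash_profile to make it permanent, then reload it)
export PATH=/path/depot_tools:$PATH
source ~/.bash_profile

# Create a directory where you want to install v8, then fetch the source inside it
mkdir chromev8 && cd chromev8
gclient          # the first run bootstraps depot_tools
fetch v8
cd v8
gclient sync

# Generate an optimized debug build configuration and compile it
tools/dev/v8gen.py x64.optdebug
ninja -C out.gn/x64.optdebug

# The resulting d8 binary is at
/path/chromev8/v8/out.gn/x64.optdebug/d8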

How does v8 compile and execute JS code?

The basic pipeline is the classic one: lexical analysis -> syntax analysis (building an AST) -> intermediate code (bytecode) generation, and so on.

On top of that, v8 adds its own machinery, namely the Ignition interpreter and the TurboFan optimizing compiler, but those are beyond the scope of this post.

[Figure: how v8 compiles JS code]

From the diagram, we can see that the final result is bytecode. So what does this bytecode look like?

Let's write a piece of code for testing:

function test( obj ) {
  return obj.money + 10;
}

class UserData {
  constructor( money, power ) {
    this.money = money;
    this.power = power;
  }
}
u = new UserData(50, 100);
console.log(u.money, u.power);
for(let i = 0; i < 100000000; i++){
  test( u );
}

Print the bytecode:

d8 --print-bytecode test1.js

Here is the complete bytecode output:

[generated bytecode for function:  (0x35e90011ae01 <SharedFunctionInfo>)]
Bytecode length: 131
Parameter count 1
Register count 7
Frame size 56
         0x35e90011afec @    0 : 13 00             LdaConstant [0]
         0x35e90011afee @    2 : c3                Star2
         0x35e90011afef @    3 : 19 fe f7          Mov <closure>, r3
         0x35e90011aff2 @    6 : 65 68 01 f8 02    CallRuntime [DeclareGlobals], r2-r3
         0x35e90011aff7 @   11 : 82 01             CreateBlockContext [1]
         0x35e90011aff9 @   13 : 1a f8             PushContext r2
         0x35e90011affb @   15 : 10                LdaTheHole
         0x35e90011affc @   16 : bf                Star6
         0x35e90011affd @   17 : 81 03 00 00       CreateClosure [3], [0], #0
         0x35e90011b001 @   21 : c2                Star3
         0x35e90011b002 @   22 : 13 02             LdaConstant [2]
         0x35e90011b004 @   24 : c1                Star4
         0x35e90011b005 @   25 : 19 f7 f5          Mov r3, r5
         0x35e90011b008 @   28 : 65 2a 00 f6 03    CallRuntime [DefineClass], r4-r6
         0x35e90011b00d @   33 : 1b f8             PopContext r2
         0x35e90011b00f @   35 : 0b f5             Ldar r5
         0x35e90011b011 @   37 : 25 02             StaCurrentContextSlot [2]
         0x35e90011b013 @   39 : 16 02             LdaCurrentContextSlot [2]
         0x35e90011b015 @   41 : c3                Star2
         0x35e90011b016 @   42 : 0d 32             LdaSmi [50]
         0x35e90011b018 @   44 : c2                Star3
         0x35e90011b019 @   45 : 0d 64             LdaSmi [100]
         0x35e90011b01b @   47 : c1                Star4
         0x35e90011b01c @   48 : 0b f8             Ldar r2
         0x35e90011b01e @   50 : 69 f8 f7 02 00    Construct r2, r3-r4, [0]
         0x35e90011b023 @   55 : 23 04 02          StaGlobal [4], [2]
         0x35e90011b026 @   58 : 21 05 04          LdaGlobal [5], [4]
         0x35e90011b029 @   61 : c2                Star3
         0x35e90011b02a @   62 : 2d f7 06 06       GetNamedProperty r3, [6], [6]
         0x35e90011b02e @   66 : c3                Star2
         0x35e90011b02f @   67 : 21 04 08          LdaGlobal [4], [8]
         0x35e90011b032 @   70 : c1                Star4
         0x35e90011b033 @   71 : 2d f6 07 0a       GetNamedProperty r4, [7], [10]
         0x35e90011b037 @   75 : c1                Star4
         0x35e90011b038 @   76 : 21 04 08          LdaGlobal [4], [8]
         0x35e90011b03b @   79 : c0                Star5
         0x35e90011b03c @   80 : 2d f5 08 0c       GetNamedProperty r5, [8], [12]
         0x35e90011b040 @   84 : c0                Star5
         0x35e90011b041 @   85 : 5f f8 f7 f6 f5 0e CallProperty2 r2, r3, r4, r5, [14]
         0x35e90011b047 @   91 : 0c                LdaZero
         0x35e90011b048 @   92 : c4                Star1
         0x35e90011b049 @   93 : 0e                LdaUndefined
         0x35e90011b04a @   94 : c5                Star0
         0x35e90011b04b @   95 : 01 0d 00 e1 f5 05 LdaSmi.ExtraWide [100000000]
         0x35e90011b051 @  101 : 6d f9 10          TestLessThan r1, [16]
         0x35e90011b054 @  104 : 9a 18             JumpIfFalse [24] (0x35e90011b06c @ 128)
         0x35e90011b056 @  106 : 21 09 11          LdaGlobal [9], [17]
         0x35e90011b059 @  109 : c3                Star2
         0x35e90011b05a @  110 : 21 04 08          LdaGlobal [4], [8]
         0x35e90011b05d @  113 : c2                Star3
         0x35e90011b05e @  114 : 62 f8 f7 13       CallUndefinedReceiver1 r2, r3, [19]
         0x35e90011b062 @  118 : c5                Star0
         0x35e90011b063 @  119 : 0b f9             Ldar r1
         0x35e90011b065 @  121 : 50 15             Inc [21]
         0x35e90011b067 @  123 : c4                Star1
         0x35e90011b068 @  124 : 8a 1d 00 16       JumpLoop [29], [0], [22] (0x35e90011b04b @ 95)
         0x35e90011b06c @  128 : 0b fa             Ldar r0
         0x35e90011b06e @  130 : aa                Return
Constant pool (size = 10)
0x35e90011af9d: [FixedArray] in OldSpace
 - map: 0x35e900000089 <Map(FIXED_ARRAY_TYPE)>
 - length: 10
           0: 0x35e90011ae71 <FixedArray[2]>
           1: 0x35e90011ae55 <ScopeInfo CLASS_SCOPE>
           2: 0x35e90011af79 <FixedArray[7]>
           3: 0x35e90011aebd <SharedFunctionInfo UserData>
           4: 0x35e900002b85 <String[1]: #u>
           5: 0x35e9000044a9 <String[7]: #console>
           6: 0x35e900311cad <String[3]: #log>
           7: 0x35e90011adc5 <String[5]: #money>
           8: 0x35e90011add9 <String[5]: #power>
           9: 0x35e900310aed <String[4]: #test>
Handler Table (size = 0)
Source Position Table (size = 0)
[generated bytecode for function: UserData (0x35e90011aebd <SharedFunctionInfo UserData>)]
Bytecode length: 14
Parameter count 3
Register count 0
Frame size 0
         0x35e90011b1f8 @    0 : 0b 03             Ldar a0
         0x35e90011b1fa @    2 : 32 02 00 00       SetNamedProperty <this>, [0], [0]
         0x35e90011b1fe @    6 : 0b 04             Ldar a1
         0x35e90011b200 @    8 : 32 02 01 02       SetNamedProperty <this>, [1], [2]
         0x35e90011b204 @   12 : 0e                LdaUndefined
         0x35e90011b205 @   13 : aa                Return
Constant pool (size = 2)
0x35e90011b1c9: [FixedArray] in OldSpace
 - map: 0x35e900000089 <Map(FIXED_ARRAY_TYPE)>
 - length: 2
           0: 0x35e90011adc5 <String[5]: #money>
           1: 0x35e90011add9 <String[5]: #power>
Handler Table (size = 0)
Source Position Table (size = 0)
50 100
[generated bytecode for function: test (0x35e90011ae81 <SharedFunctionInfo test>)]
Bytecode length: 8
Parameter count 2
Register count 0
Frame size 0
         0x35e90011b374 @    0 : 2d 03 00 01       GetNamedProperty a0, [0], [1]
         0x35e90011b378 @    4 : 44 0a 00          AddSmi [10], [0]
         0x35e90011b37b @    7 : aa                Return
Constant pool (size = 1)
0x35e90011b349: [FixedArray] in OldSpace
 - map: 0x35e900000089 <Map(FIXED_ARRAY_TYPE)>
 - length: 1
           0: 0x35e90011adc5 <String[5]: #money>
Handler Table (size = 0)
Source Position Table (size = 0)

Anyone seeing this bytecode for the first time will need a while to make sense of it, so let's simply hand it to ChatGPT and see whether it can reconstruct the original code:

> The following response is from GPT4

This bytecode represents a more complex JavaScript code. Let's analyze it step by step.

### Main Function Bytecode

#### Constant Pool:

1. Other fixed arrays and metadata
2. ScopeInfo for Class
3. Other fixed arrays and metadata
4. Shared function information for `UserData`
5. String `u`
6. String `console`
7. String `log`
8. String `money`
9. String `power`
10. String `test`

#### Analysis:

1. Defines a class `UserData` with two properties: `money` and `power`.
2. Creates an instance of `UserData`, passing `50` and `100` as initialization values for `money` and `power`.
3. Assigns this instance to the variable `u`.
4. Calls `console.log(u.money, u.power)`.
5. Creates a loop that runs `100000000` times, calling the function `test` with `u` as the parameter.

### UserData Constructor Function Bytecode

#### Analysis:

1. Sets `this.money` to the first parameter passed in.
2. Sets `this.power` to the second parameter passed in.

### Test Function Bytecode

#### Analysis:

1. Retrieves the `money` property from the passed object.
2. Adds 10 to the retrieved value.
3. Returns this result.

### Restored JavaScript Code:

```javascript
class UserData {
    constructor(money, power) {
        this.money = money;
        this.power = power;
    }
}

let u = new UserData(50, 100);
console.log(u.money, u.power);

function test(obj) {
    return obj.money + 10;
}

for (let i = 0; i < 100000000; i++) {
    test(u);
}
```

This is the JavaScript code restored from the given bytecode.

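GPT-4 is really amazing 🥹. Apart from declaring `u` with `let` (my original assigns an implicit global, which is why the bytecode uses StaGlobal), its reconstruction matches my code; it could replace me outright.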

Overall, the bytecode is fairly readable: after a few passes, and with the instruction format at hand, it isn't hard to analyze.
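Ignition is a register machine with an implicit accumulator: most instructions read or write that one special register, and `Return` hands back whatever the accumulator holds. The toy interpreter below replays the three instructions of `test` under that model. It is only an illustration: the instruction tuples are my own simplified format, and real Ignition's feedback-vector slots (the trailing `[0]`, `[1]` operands), inline caches and tagged values are not modeled.

```javascript
// Toy model of Ignition's accumulator-based execution (illustrative only).
// Each instruction reads/writes `acc`; args[0], args[1], ... play the role of a0, a1, ...
function run(bytecode, args, constants) {
  let acc;
  for (const [op, ...operands] of bytecode) {
    switch (op) {
      case "GetNamedProperty": { // acc = aN[constants[k]]
        const [argIndex, constIndex] = operands;
        acc = args[argIndex][constants[constIndex]];
        break;
      }
      case "AddSmi":             // acc = acc + immediate
        acc = acc + operands[0];
        break;
      case "Return":             // hand the accumulator back to the caller
        return acc;
      default:
        throw new Error(`unsupported op ${op}`);
    }
  }
}

// Bytecode of `test` from the dump, without the feedback-slot operands:
//   GetNamedProperty a0, [0]   ; constant 0 is "money"
//   AddSmi [10]
//   Return
const testBytecode = [["GetNamedProperty", 0, 0], ["AddSmi", 10], ["Return"]];
console.log(run(testBytecode, [{ money: 50, power: 100 }], ["money"])); // 60
```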

Notice these two instructions: they look a lot like moving an immediate value in assembly, so surely the value is kept intact in memory.

         0x35e90011b016 @   42 : 0d 32             LdaSmi [50]

         0x35e90011b019 @   45 : 0d 64             LdaSmi [100]
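The hex bytes next to each mnemonic are the encoded instruction. Here opcode 0x0d is LdaSmi and the following byte is its immediate, so 0x32 = 50 and 0x64 = 100. Larger immediates get a prefix byte: the loop bound is encoded as `01 0d 00 e1 f5 05`, i.e. the ExtraWide prefix, the same opcode, and a 4-byte little-endian operand 0x05F5E100 = 100000000. The decoder below is a sketch built from this dump alone: the opcode value 0x0d changes between v8 versions, and the prefix values (0x00 for Wide, 0x01 for ExtraWide) are my assumption. Note that the immediate is stored raw in the bytecode stream; that alone says nothing about how the value is represented once it lives in a register or on the heap.

```javascript
// Sketch: decode an LdaSmi instruction from the raw bytes in the
// --print-bytecode dump. Opcode values come from this particular dump
// and are NOT stable across v8 versions.
const LDA_SMI = 0x0d;
const PREFIX_WIDTH = { 0x00: 2, 0x01: 4 }; // assumed: Wide, ExtraWide

function decodeLdaSmi(bytes) {
  let i = 0;
  let width = 1;                           // no prefix: 1-byte operand
  if (bytes[i] in PREFIX_WIDTH) width = PREFIX_WIDTH[bytes[i++]];
  if (bytes[i++] !== LDA_SMI) throw new Error("not an LdaSmi instruction");
  const view = new DataView(Uint8Array.from(bytes).buffer);
  // Operands are signed and little-endian in this dump.
  if (width === 1) return view.getInt8(i);
  if (width === 2) return view.getInt16(i, true);
  return view.getInt32(i, true);
}

console.log(decodeLdaSmi([0x0d, 0x32]));                         // 50
console.log(decodeLdaSmi([0x0d, 0x64]));                         // 100
console.log(decodeLdaSmi([0x01, 0x0d, 0x00, 0xe1, 0xf5, 0x05])); // 100000000
```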

Since d8 has no setInterval (surprisingly; it is a host API provided by browsers and Node.js, not part of JavaScript itself), I ran this experiment interactively in the d8 shell:

The value in memory is clearly exactly twice the runtime value (2474 = 1237 * 2).

[Screenshot: CE locating the value 2474 in d8's memory]

After editing that memory location, the value in JS changes immediately.

[Screenshot: the value after editing memory]
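A CE exact-value scan amounts to sliding over memory and comparing every 4-byte window with the target. The sketch below simulates that on a small buffer standing in for the heap: it stores 1237 the way the ×2 rule describes (as 2474) and then scans for both numbers. The buffer, the offset 16 and the function name are hypothetical; real CE scans the target process's mapped memory.

```javascript
// Toy "exact value" scan: report every byte offset whose 4 bytes,
// read as a little-endian int32, equal the target.
function scanInt32(buffer, target) {
  const view = new DataView(buffer);
  const hits = [];
  for (let off = 0; off + 4 <= buffer.byteLength; off++) {
    if (view.getInt32(off, true) === target) hits.push(off);
  }
  return hits;
}

// Fake 64-byte "heap" holding the runtime value 1237 stored doubled, as observed in d8.
const heap = new ArrayBuffer(64);
new DataView(heap).setInt32(16, 1237 * 2, true); // offset 16 chosen arbitrarily

console.log(scanInt32(heap, 1237)); // []  (the runtime value itself is never found)
console.log(scanInt32(heap, 2474)); // [ 16 ]
```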

Comparing with Node.js

Given that, let's see how Node.js behaves.

Custom Class

I ran the same code in the Node.js shell and instantiated an object user. Note that I set user.money to a deliberately odd value so that it would be unlikely to appear elsewhere in memory (three copies still turned up). I modified all three, and the effect was obvious: afterwards, user.money had changed.

So here there's no doubling at all? 🤔 Everything looks normal: find the value and edit it directly.

[Screenshots: locating and modifying user.money in Node.js]

Built-in Class

For a custom class, the values can be found with a direct search. Do built-in classes such as Map behave differently? Let's test one.

class UserData {
    constructor(money, power) {
        this.test_map = new Map();
        this.test_map.set('money', money);
        this.test_map.set('power', power);
    }
}

CE can still locate it:

[Screenshot: CE locating the value stored in the Map]

And it can also be modified:

[Screenshot: the Map value after modification]

Comparing with Browsers

Only now did I realize that Chrome DevTools can take memory (heap) snapshots. The feature is mainly meant for tracking down memory leaks, but it is also very handy for studying memory layout.

Chrome Memory Panel

Let's write a sample page:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <script>
    // A global var with a distinctive name, easy to find in the heap snapshot
    var _____testArray_____ = [ {value: 'hello'}, {money: 9999} ];
    console.info(_____testArray_____);
    </script>
</body>
</html>

Open the Memory panel and take a snapshot by clicking "Take snapshot".

[Screenshot: the Memory panel]

With the snapshot open, press Cmd + F (Ctrl + F on Windows) to open the search box at the bottom.

Our code contains the string hello, so we can search for it directly. Once it is found, right-click it and reveal it in the Summary view, which shows which constructor it belongs to.

[Screenshot: the hello string in the Summary view]

From the memory structure, we can read off the following chain:

  • The hello string @17037 (the @ number can be read as an ID for a chunk of heap memory)
    • is stored in Object@92971 (corresponding to {value: 'hello'}),
      • which is stored in Array@92961 (corresponding to [ {value: 'hello'}, {money: 9999} ]),
        • which is held by the variable _____testArray_____. Since it is a global variable, it hangs off Window@92867.

This matches the fact that top-level var declarations become properties of the global window object.

What is an SMI?

If you looked closely, you may have noticed that the number 9999 is labeled as an SMI number.

[Screenshot: 9999 labeled as an SMI number]

The following is GPT4's response:
In the V8 engine of JavaScript (mainly used in Google Chrome and Node.js), SMI (Small Integer) is a special integer representation method. This representation method is mainly used for performance optimization. SMI is a "tagged" integer type, where part of the bits is used to store the actual integer value, while other bits are used to tag this number as SMI type.
The specific implementation may depend on the platform and environment, but on a typical 64-bit system, an SMI may use 31 bits to represent the integer value, while the remaining bits are used for other purposes, such as garbage collection marking, type information, etc. Because these extra bits are used, the range of integers that SMI can represent is smaller than that of typical 32-bit or 64-bit integers.

According to GPT, an SMI uses 31 bits for the integer value and the remaining bit for other purposes. In v8 that remaining bit is the lowest one: it is the tag, and it is 0 for an SMI. So the stored word is the value shifted left by one bit, i.e. multiplied by 2! This is exactly the memory rule from the beginning of the article.
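The shift-by-one encoding can be written out directly. The sketch below assumes the 31-bit layout (what v8 uses when pointer compression is enabled, as in Chrome): the payload occupies the upper 31 bits of a 32-bit word and the lowest bit is the tag, 0 for an SMI and 1 for a heap-object pointer. The function names are mine; v8 does this in C++ internally.

```javascript
// Sketch of 31-bit SMI tagging (v8 with pointer compression). Names are illustrative.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

function isSmiRange(n) {
  return Number.isInteger(n) && n >= SMI_MIN && n <= SMI_MAX;
}

// Tagged word = value shifted left by 1; the low bit 0 marks "this is an SMI".
function tagSmi(n) {
  if (!isSmiRange(n)) throw new RangeError(`${n} does not fit in a 31-bit SMI`);
  return n << 1; // still a valid int32 because |n| <= 2^30
}

// An arithmetic right shift restores the value, sign included.
function untagSmi(word) {
  if ((word & 1) !== 0) throw new Error("low bit is 1: a heap pointer, not an SMI");
  return word >> 1;
}

console.log(tagSmi(1237));        // 2474: the "value * 2" seen in CE
console.log(tagSmi(9999));        // 19998
console.log(untagSmi(2474));      // 1237
console.log(isSmiRange(2 ** 30)); // false: v8 would store it as a HeapNumber instead
```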

One question remains, though: why doesn't Node.js seem to use SMIs? At least in the tests above, Node.js appears to store values as-is.

GPT didn't give a good answer here either, so all I can do is speculate that Node.js changes something about v8 behind the scenes. 🤣
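One possible explanation, which I have not verified against the exact Node.js build used in these tests: Node.js does use SMIs, but official Node.js builds do not enable v8's pointer compression, whereas Chrome and a default x64 d8 build typically do. Without pointer compression, v8 on 64-bit platforms stores an SMI as a full 64-bit word with the 32-bit value in the upper half and zeros in the lower half, so a CE 4-byte scan finds the raw number at the upper half and no doubling is visible. The sketch below compares the two layouts for the same value; the helper names are mine, and little-endian byte order (x86-64) is assumed.

```javascript
// Compare how the same SMI looks in memory under two v8 configurations.
// Helper names are illustrative; little-endian byte order is assumed.

// Pointer compression ON (Chrome, default x64 d8): 32-bit slot, value << 1.
// Assumes n is within the 31-bit SMI range.
function compressedSmiBytes(n) {
  const buf = new ArrayBuffer(4);
  new DataView(buf).setInt32(0, n << 1, true);
  return Array.from(new Uint8Array(buf));
}

// Pointer compression OFF (typical Node.js build): 64-bit slot, value << 32.
function fullSmiBytes(n) {
  const buf = new ArrayBuffer(8);
  new DataView(buf).setBigInt64(0, BigInt(n) << 32n, true);
  return Array.from(new Uint8Array(buf));
}

console.log(compressedSmiBytes(1237).join(" ")); // "170 9 0 0": CE finds 2474
console.log(fullSmiBytes(1237).join(" "));       // "0 0 0 0 213 4 0 0": bytes 4..7 read as 1237
```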

Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.