Welcome back to the series on common mistakes in Go. In this article, we will look at memory leaks related to slices and arrays. Unlike C/C++, Go has a garbage collector (GC), so we don’t need to allocate or release memory manually. However, precisely because we rely on the GC, we need to understand how it works to avoid unintentional memory leaks. This article covers one such case: leaking memory by slicing a slice.
1. Scenario
Suppose we have a consumer service that receives data as byte slices. The first 5 elements of each slice identify the type of the data, and the service extracts those 5 elements to do some further processing.
Below is a simple example.
```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Start
	printAlloc()

	dataReceived := make([]byte, 1024*1024) // Assume we receive a data slice of 1 MB
	printAlloc()

	typeData := getTypeOfData(dataReceived) // Take the first 5 elements of the data
	// Do something with typeData
	// storeTypeDataInCache(typeData)

	runtime.GC()                // Run the garbage collector
	runtime.KeepAlive(typeData) // Keep the typeData variable alive
	printAlloc()
	// End
}

func getTypeOfData(data []byte) []byte {
	return data[:5] // slice the slice
}

// printAlloc prints the number of bytes of allocated heap objects, in MB.
func printAlloc() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d MB\n", m.Alloc/1024/1024)
}
```
- For simplicity, I initialize dataReceived with a size of 1 MB (imagine that we receive 1 MB of data from a request or from another service).
- After receiving dataReceived, I take the first 5 elements of the slice by slicing it, as in the code above.
- Finally, I use runtime.KeepAlive to keep typeData from being collected by the GC. This simulates storing typeData in the program's in-memory cache.
- After each step, I print the memory of the process to see how much it currently consumes.
The code looks fine and is easy to understand. Before running the program, try to guess how much memory it uses at the start and at the end. In theory, dataReceived has about 1 million elements, which takes up 1 MB of memory, while typeData has 5 elements and should take up about 5 bytes. Run the program to see if that is correct.
Oops: the memory after receiving the data and the memory after the GC runs are the same, not 5 bytes as we expected. Does that mean typeData is holding on to the full 1 MB? If so, a service that receives 1,000 payloads like this and caches the type of each one needs about 1 GB of memory. At this point, either the code has a problem or our assumption above is wrong. Let’s find out which.
2. How slices work
First, we need to understand how slices work.
Slices in Golang are fat pointers. You can read this article to understand more about fat pointers. The structure of the slice includes:
```go
type SliceHeader struct {
	Data uintptr // memory address of the slice's underlying array
	Len  int     // length of the slice
	Cap  int     // maximum size allocated for the memory region the slice points to
}
```
Basically, you can understand that when we have a slice, we have a pointer to the underlying array of that slice.
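To make the pointer part concrete, here is a minimal sketch showing that a sub-slice gets its own header but points at the same underlying array. The helper name sharesArray and the sample values are assumptions made up for this illustration; they are not part of the original example.

```go
package main

import "fmt"

// sharesArray (hypothetical helper) reports whether two non-empty byte
// slices start at the same address, i.e. whether both headers point at
// the same underlying array.
func sharesArray(a, b []byte) bool {
	return &a[0] == &b[0]
}

func main() {
	original := []byte{10, 20, 30, 40, 50, 60}
	prefix := original[:3] // new slice header, same underlying array

	prefix[0] = 99 // write through the sub-slice

	fmt.Println(original[0])                   // 99: the write is visible through the original
	fmt.Println(sharesArray(original, prefix)) // true
}
```

Because both headers carry a pointer into the same array, that array stays reachable for as long as either slice is reachable.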
3. Reason
After understanding how slices work, let’s go back to visualize the problem above.
When dataReceived is initialized, an array of about 1 million elements is allocated in memory, and dataReceived holds a pointer to it.
Then we create another slice, typeData, from dataReceived by slicing it. Slicing does not create a new underlying array; Go simply points the new slice at the existing underlying array. As a result, although typeData has a length of only 5 elements, its capacity is still about 1 million elements.
Finally, consider what happens when the process ends.
We keep typeData alive with runtime.KeepAlive, while dataReceived is no longer used and becomes eligible for collection. But the underlying array of dataReceived is still referenced by typeData, so the GC cannot collect it; it stays in memory until no slice points to it any more. This is why we still see 1 MB in memory at the end of the process. So what is the solution to this problem?
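The length and capacity described above can be checked directly. This is a small sketch that reuses the slicing version of getTypeOfData from the scenario; the marker value 42 is an arbitrary choice for the illustration.

```go
package main

import "fmt"

// getTypeOfData mirrors the slicing version from the scenario.
func getTypeOfData(data []byte) []byte {
	return data[:5]
}

func main() {
	dataReceived := make([]byte, 1024*1024)
	dataReceived[len(dataReceived)-1] = 42 // mark the last byte

	typeData := getTypeOfData(dataReceived)
	fmt.Println(len(typeData), cap(typeData)) // 5 1048576

	// Re-slicing up to the capacity recovers the whole array, which shows
	// that all of it is still reachable through typeData.
	full := typeData[:cap(typeData)]
	fmt.Println(full[len(full)-1]) // 42
}
```

Since every byte of the array can still be reached through typeData, the GC has to treat the whole array as live.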
4. Solution
Above, the 2 slices point to the same underlying array. If we give each slice its own underlying array, the problem goes away: the GC can collect the array behind dataReceived completely and keep only the one behind typeData.
To implement this, we use the built-in copy function instead of slicing the slice as before.
Because we copy into a new slice, typeData has length 5 and capacity 5 instead of about 1 million, and it keeps 5 bytes in memory instead of 1 MB.
Modify the getTypeOfData function as follows.
```go
func getTypeOfData(data []byte) []byte {
	dataType := make([]byte, 5)
	copy(dataType, data)
	return dataType
}
```
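As a quick check that the copy really is independent of the original data, the sketch below changes the source after copying. The "IMAGE" type tag is a made-up value for this illustration, not something the original service defines.

```go
package main

import "fmt"

// getTypeOfData is the copy-based version shown above.
func getTypeOfData(data []byte) []byte {
	dataType := make([]byte, 5)
	copy(dataType, data)
	return dataType
}

func main() {
	dataReceived := make([]byte, 1024*1024)
	copy(dataReceived, "IMAGE") // hypothetical 5-byte type tag

	typeData := getTypeOfData(dataReceived)
	fmt.Println(len(typeData), cap(typeData)) // 5 5

	dataReceived[0] = 'X'         // change the original after copying
	fmt.Println(string(typeData)) // IMAGE: the copy is unaffected
}
```

Note that copy transfers only as many elements as the shorter of the two slices holds, so for an input shorter than 5 bytes this version returns a result padded with zero bytes.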
Run it again and look at the results.
Oops, the result after updating the code is 0 MB. This is because printAlloc() divides the byte count by 1024*1024 to convert it to MB, and integer division truncates anything below 1 MB to 0. I will modify the function a bit so that it also prints the number of bytes.
```go
func printAlloc() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d bytes - %d MB\n", m.Alloc, m.Alloc/1024/1024)
}
```
Run it again to get the result.
When the process starts, the allocated memory is 104800 bytes; at the end it is 110408 bytes, a difference of 5608 bytes. Our 5 bytes of typeData are somewhere within those 5608 bytes. The rest comes from allocations the runtime makes behind the scenes while the program runs, so the numbers will vary slightly. Still, it is nice to see that the program no longer holds on to 1 MB as before.
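Instead of comparing raw totals by eye, the growth of the live heap can be measured per strategy. The following is a rough sketch, not a precise benchmark: retainedBytes, bySlicing, byThreeIndex and byCopy are hypothetical names introduced here, and the exact numbers depend on the Go version and platform. The byThreeIndex case is an addition that the article itself does not cover: a full slice expression data[:5:5] limits the capacity to 5 but still points at the original array.

```go
package main

import (
	"fmt"
	"runtime"
)

// retainedBytes (hypothetical helper) allocates a 1 MB slice, keeps only
// what extract returns, forces a GC, and reports how much the live heap
// grew compared to before the allocation.
func retainedBytes(extract func([]byte) []byte) int64 {
	var before, after runtime.MemStats

	runtime.GC()
	runtime.ReadMemStats(&before)

	kept := extract(make([]byte, 1024*1024))

	runtime.GC()
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(kept) // keep the result reachable until after the measurement

	// Signed difference, so a slightly smaller heap does not underflow.
	return int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

// bySlicing shares the 1 MB array and keeps its full capacity.
func bySlicing(data []byte) []byte { return data[:5] }

// byThreeIndex caps the capacity at 5 but still shares the 1 MB array.
func byThreeIndex(data []byte) []byte { return data[:5:5] }

// byCopy allocates a separate 5-byte array.
func byCopy(data []byte) []byte {
	dataType := make([]byte, 5)
	copy(dataType, data)
	return dataType
}

func main() {
	fmt.Printf("slicing:     %d bytes\n", retainedBytes(bySlicing))
	fmt.Printf("three-index: %d bytes\n", retainedBytes(byThreeIndex))
	fmt.Printf("copy:        %d bytes\n", retainedBytes(byCopy))
}
```

On a typical run, the first two strategies should report roughly 1 MB of growth and the copy only a small amount, which suggests that limiting the capacity alone is not enough: only a separate underlying array lets the GC reclaim the large one.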
5. Recap
In short, keep in mind that slicing a large slice or array can lead to memory leaks. The underlying array is not collected by the GC as long as something still points to it, so we can end up keeping a very large underlying array in memory while using only a few of its elements. Copying the elements we need into a new slice avoids this situation.
6. References
Harsanyi, T. (2022) 100 Go Mistakes and How to Avoid Them. Shelter Island: Manning Publications.