Mistakes when slicing slices in Golang

Sunday, January 8, 2023

Tram Ho

Welcome back to the series on common mistakes in Golang. In this article, we will look at memory leaks related to slices and arrays in Go, specifically leaks caused by slicing slices. Unlike C/C++, Go has a garbage collector (GC), so we don't need to allocate or free memory manually. However, precisely because the GC does this work for us, we need to understand how it works to avoid unintentional memory leaks.

1. Scenario

Suppose we have a consumer service that receives data as byte slices, where the first 5 elements of each slice encode the type of the data. The service takes these first 5 elements to perform some function, for example storing them in a cache.

Below is a simple example.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Start
	printAlloc()

	dataReceived := make([]byte, 1024*1024) // Assume we receive a 1 MB data slice
	printAlloc()

	typeData := getTypeOfData(dataReceived) // Take the first 5 elements of the data
	// Do something with typeData
	// storeTypeDataInCache(typeData)

	runtime.GC()                // Run a garbage collection
	runtime.KeepAlive(typeData) // Keep typeData reachable up to this point
	printAlloc()
	// End
}

func getTypeOfData(data []byte) []byte {
	return data[:5] // re-slice the input slice
}

func printAlloc() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d MB\n", m.Alloc/1024/1024)
}

Here, for simplicity, I initialize dataReceived with a size of 1 MB (imagine that we receive 1 MB of data from a request or from another service :v).
After receiving dataReceived, I take its first 5 elements by slicing, as in the code above.
Finally, I call runtime.KeepAlive so that typeData is not collected by the GC before the last measurement; this simulates storing typeData in the program's in-memory cache.
After each step, I print the allocated heap memory to see how much space the process currently uses.

The code has no obvious problems and is easy to understand. Before running the program, try to guess how much memory is in use at the start and at the end. In theory, dataReceived has 1,048,576 elements, which take up 1 MB of memory, while typeData has only 5 elements, so it should take up about 5 bytes. Let's run the program to see if that is correct.

Oops: the memory in use after receiving the data and after the GC runs is the same, not 5 bytes as we expected. Does that mean typeData is holding on to the whole 1 MB? If so, a service that receives and keeps 1,000 pieces of data like this would need about 1 GB of memory. At this point, either the code has a problem or our assumption above is wrong. Let's find out why.

2. How slices work

First, we need to understand how slices work.

In Go, a slice is a fat pointer: a small header that holds a pointer to the data together with the length and capacity. You can read this article to understand more about fat pointers. The structure of a slice header is:

type SliceHeader struct {    // same layout as reflect.SliceHeader
    Data uintptr // address of the slice's underlying array
    Len  int     // length of the slice
    Cap  int     // capacity: number of elements available in the underlying array starting at Data
}

In other words, a slice does not hold its elements itself; it holds a pointer to an underlying array. Assigning or re-slicing a slice copies only this small header, never the array behind it.
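A small sketch of what this means in practice: re-slicing produces a new header with a different length but the same data pointer, so both slices read and write the same bytes. The variable names and the helper lenCap are only for illustration.

```go
package main

import "fmt"

// lenCap reports the length and capacity stored in a slice header.
func lenCap(s []byte) (int, int) {
	return len(s), cap(s)
}

func main() {
	data := make([]byte, 10)
	head := data[:5] // new header: same Data pointer, Len 5, Cap 10

	l, c := lenCap(head)
	fmt.Println(l, c) // 5 10

	// Both headers point to the same first element of the same array.
	fmt.Println(&head[0] == &data[0]) // true

	head[0] = 42         // write through head...
	fmt.Println(data[0]) // ...and read it back through data: 42
}
```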

3. Reason

Now that we understand how slices work, let's go back and visualize the problem above.

When dataReceived is initialized, an array of 1,048,576 elements is allocated in memory, and dataReceived holds a pointer to it.

Then we create another slice, typeData, by slicing dataReceived. Slicing does not create a new underlying array; Go simply makes typeData point to the existing underlying array. So although typeData has only 5 elements, its capacity is 1,048,576 elements.
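The sketch below checks these numbers directly. As an extra experiment that the article does not use, it also tries the full slice expression dataReceived[:5:5]: this limits the capacity to 5, yet the result still points into the same 1 MB array, so the array stays reachable. Limiting the capacity alone therefore does not free memory; giving typeData its own array, as in the Solution section, does.

```go
package main

import "fmt"

func main() {
	dataReceived := make([]byte, 1024*1024)

	// Plain slicing, as in getTypeOfData: Len 5, Cap = rest of the array.
	typeData := dataReceived[:5]
	fmt.Println(len(typeData), cap(typeData)) // 5 1048576

	// Full slice expression: the capacity is capped at 5...
	capped := dataReceived[:5:5]
	fmt.Println(len(capped), cap(capped)) // 5 5

	// ...but it still points into the same 1 MB array, which therefore
	// stays reachable for as long as capped is reachable.
	fmt.Println(&capped[0] == &dataReceived[0]) // true
}
```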

Finally, let's look at what happens at the end of the program.

We keep typeData alive with runtime.KeepAlive, while dataReceived is no longer used and becomes eligible for collection. But because the underlying array of dataReceived is still referenced by typeData, the GC cannot collect it; the array stays in memory until no slice points to it anymore. This is why we still see 1 MB in use at the end of the program. So what is the solution to this problem?

4. Solution

The 2 slices above point to the same underlying array, so if we give them 2 separate underlying arrays, the problem is solved: the GC can then collect the array of dataReceived completely and keep only typeData.

To implement this solution, we copy the slice instead of slicing it as before.

Because we use copy, typeData will have length 5 and capacity 5 instead of 1M, and it keeps only 5 bytes in memory instead of 1 MB.

Modify the getTypeOfData function.

func getTypeOfData(data []byte) []byte {
	dataType := make([]byte, 5) // new, separate 5-byte array
	copy(dataType, data)        // copies min(len(dataType), len(data)) elements
	return dataType
}
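The same copy can also be written by appending to a nil slice, a common Go idiom. In the sketch below, getTypeOfDataAppend is a hypothetical name for that variant; it assumes data has at least 5 elements, and its capacity may be rounded up by the allocator, so only the make-and-copy version is guaranteed to have capacity exactly 5. Comparing element addresses shows that neither result shares the 1 MB array.

```go
package main

import "fmt"

// getTypeOfData is the copy-based version from the article.
func getTypeOfData(data []byte) []byte {
	dataType := make([]byte, 5)
	copy(dataType, data)
	return dataType
}

// getTypeOfDataAppend is a hypothetical alternative: appending to a nil
// slice allocates a fresh backing array for the copied elements.
func getTypeOfDataAppend(data []byte) []byte {
	return append([]byte(nil), data[:5]...)
}

func main() {
	dataReceived := make([]byte, 1024*1024)
	dataReceived[0] = 7

	a := getTypeOfData(dataReceived)
	b := getTypeOfDataAppend(dataReceived)

	fmt.Println(len(a), cap(a), a[0])      // 5 5 7
	fmt.Println(&a[0] == &dataReceived[0]) // false: a has its own array
	fmt.Println(len(b), b[0])              // 5 7
	fmt.Println(&b[0] == &dataReceived[0]) // false: b has its own array
}
```

Since Go 1.20, bytes.Clone(data[:5]) is another way to get such a copy.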

Run it again and see the results.

Oops, the result after updating the code is 0 MB. This is because printAlloc() divides the byte count by 1024*1024 to convert it to MB, and integer division truncates the result toward zero. Let me modify the function a bit so that it also prints the number of bytes.

func printAlloc() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d bytes - %d MB\n", m.Alloc, m.Alloc/1024/1024)
}

Running it again, we get the following result.

At startup, the allocated memory is 104800 bytes, and at the end it is 110408 bytes, a difference of 5608 bytes. Our 5 bytes of typeData are included in these 5608 bytes. The rest comes from other parts of the program and runtime that allocate memory behind the scenes, which we don't see, so the numbers differ slightly. But the nice part is that the memory in use is no longer 1 MB as before, isn't it :v.
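To compare both versions in a single program, here is a minimal sketch built around a hypothetical helper, retainedBytes, which measures how much more heap is live after a GC while the extracted slice is still kept alive. sliceType and copyType are illustrative names for the two versions of getTypeOfData. It reads runtime.MemStats.HeapAlloc, which reports the same value as Alloc; the exact numbers will vary with Go version and platform, so treat them as rough.

```go
package main

import (
	"fmt"
	"runtime"
)

// sliceType keeps a view into the original array (the leaky version).
func sliceType(data []byte) []byte { return data[:5] }

// copyType copies the first 5 bytes into a new array (the fixed version).
func copyType(data []byte) []byte {
	out := make([]byte, 5)
	copy(out, data)
	return out
}

// retainedBytes allocates a 1 MB buffer, extracts the type with extract,
// runs the GC and reports how much more heap is live than before.
func retainedBytes(extract func([]byte) []byte) int64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	data := make([]byte, 1024*1024)
	kept := extract(data)

	runtime.GC() // data is no longer used; only kept may still reference its array
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(kept) // keep the extracted slice live past the measurement

	return int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

func main() {
	fmt.Printf("slicing: %d bytes retained\n", retainedBytes(sliceType))
	fmt.Printf("copying: %d bytes retained\n", retainedBytes(copyType))
}
```

With slicing, the difference should be roughly the full 1 MB buffer; with copying, only a few bytes plus runtime noise.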

5. Recap

In short, keep in mind that slicing a large slice or array can lead to memory leaks. The underlying array will not be collected by the GC as long as some slice still points to it, so we can end up keeping a very large underlying array in memory while using only a few of its elements. Copying the needed elements into a new slice avoids this situation.

6. References

Harsanyi, T. (2022) 100 Go Mistakes and How to Avoid Them. Shelter Island: Manning Publications.

Source: Viblo
