Erlang binary data garbage collection


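In Erlang, binary data is stored in memory in one of two ways. A binary of up to 64 bytes is stored directly on the process heap. A larger binary is stored outside the process in a shared heap, where all processes on the node can share it.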
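The size rule above fits in a few lines. The article has no code of its own, so this uses Python purely as an illustration. The names `HEAP_BIN_LIMIT` and `storage_for` are hypothetical, and only the 64-byte threshold comes from the text. In reality the runtime makes this decision internally, not user code.

```python
# Hypothetical model of where the runtime places a newly created binary.
HEAP_BIN_LIMIT = 64  # bytes; the threshold described in the text


def storage_for(data: bytes) -> str:
    """Return "heap" for a binary kept on the process heap,
    or "refc" for one kept off-heap in the shared binary area."""
    return "heap" if len(data) <= HEAP_BIN_LIMIT else "refc"


print(storage_for(b"x" * 64))  # heap
print(storage_for(b"x" * 65))  # refc
```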

Erlang has two kinds of binary containers: heap binaries and refc binaries.

heap binaries

Heap binaries are small binaries, up to 64 bytes, that are stored directly on the process heap. They will be copied when the process is garbage collected and when they are sent as a message. They don't require any special handling by the garbage collector.
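Here is a toy Python model of the copying behaviour just described. It assumes a hypothetical `Process` class whose heap is a plain list. A `bytearray` stands in for the (immutable) Erlang binary only so the copies can be observed with `is`. The runtime does not represent terms this way.

```python
class Process:
    """Hypothetical process that owns a private heap of terms."""

    def __init__(self, name: str):
        self.name = name
        self.heap = []  # terms (here: heap binaries) on this process heap

    def send_heap_binary(self, to: "Process", data: bytearray) -> bytearray:
        # A heap binary lives inside the sender's heap, so sending it
        # puts a full copy of its bytes on the receiver's heap.
        copy = bytearray(data)
        to.heap.append(copy)
        return copy

    def gc(self) -> None:
        # A copying collector moves live terms to a fresh heap; in this
        # sketch every term is treated as live, so every binary is copied.
        self.heap = [bytearray(term) for term in self.heap]


a, b = Process("a"), Process("b")
small = bytearray(b"hello")  # bytearray only so that copies are observable
a.heap.append(small)

received = a.send_heap_binary(b, small)
print(received == small, received is small)  # True False

a.gc()
print(a.heap[0] == small, a.heap[0] is small)  # True False
```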


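Since R13B03, Erlang has also used a virtual binary heap (bin vheap) to reclaim binary data sooner. This mechanism concerns refc binaries (described below), not heap binaries. Heap binaries are ordinary heap terms and are already counted in the process heap:

OTP-8202  A new garbage collecting strategy for binaries which is more aggressive than the previous implementation. Binaries now has a virtual binary heap tied to each process. When binaries are created or received to a process it will check if the heap limit has been reached and if a reclaim should be done. This imitates the behavior of ordinary Erlang terms. The virtual heaps are grown and shrunk like ordinary heaps. This will lessen the memory footprint of binaries in a system.

In other words, the off-heap part of a refc binary does not count toward the size of the process heap, so on its own it would never trigger a garbage collection. The vheap keeps a running total of the off-heap binaries a process references. When that total exceeds the vheap limit, it triggers a collection, just as ordinary Erlang terms do when the heap fills up, so binaries that are no longer used are released sooner.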
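The accounting idea can be sketched as a toy model. It rests on stated assumptions:

- `Binary`, `ProcBin` and `ProcessModel` are hypothetical names.
- A ProcBin is the small on-heap handle to a refc binary (covered in the next section).
- The doubling growth policy is invented for illustration. The runtime also shrinks the limit, which is not modeled here.

The sketch shows that the binary bytes never enter the process heap, yet they still push the process toward a collection.

```python
class Binary:
    """Hypothetical off-heap binary: only its size and refcount matter here."""

    def __init__(self, size: int):
        self.size = size
        self.refc = 0


class ProcBin:
    """Hypothetical on-heap handle pointing at an off-heap Binary."""

    def __init__(self, binary: Binary):
        self.binary = binary
        self.reachable = True  # set to False once the process drops it


class ProcessModel:
    def __init__(self, vheap_limit: int):
        if vheap_limit <= 0:
            raise ValueError("vheap_limit must be positive")
        self.vheap_limit = vheap_limit  # bytes of off-heap data before a GC
        self.vheap_used = 0             # bytes referenced through ProcBins
        self.procbins = []
        self.gc_count = 0

    def add_binary(self, binary: Binary) -> ProcBin:
        """Create or receive a refc binary and account for its size."""
        binary.refc += 1
        pb = ProcBin(binary)
        self.procbins.append(pb)
        self.vheap_used += binary.size
        if self.vheap_used > self.vheap_limit:  # limit reached: reclaim
            self.gc()
        return pb

    def gc(self) -> None:
        self.gc_count += 1
        survivors = []
        for pb in self.procbins:
            if pb.reachable:
                survivors.append(pb)
            else:
                pb.binary.refc -= 1  # off-heap data can be freed at refc 0
        self.procbins = survivors
        self.vheap_used = sum(pb.binary.size for pb in survivors)
        # Hypothetical growth policy: double until the live data fits.
        while self.vheap_used > self.vheap_limit:
            self.vheap_limit *= 2


p = ProcessModel(vheap_limit=1000)
big = [Binary(400) for _ in range(3)]
first = p.add_binary(big[0])
first.reachable = False  # the process no longer uses this binary
p.add_binary(big[1])     # 800 bytes accounted, no GC yet
p.add_binary(big[2])     # 1200 > 1000: GC runs and releases big[0]
print(p.gc_count, p.vheap_used, big[0].refc)  # 1 800 0
```

Without the vheap counter, the three 400-byte binaries would add only three small ProcBins to the heap. The dead one could then stay referenced until some unrelated allocation happened to cause a collection.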

refc binaries

Refc binaries consist of two parts: an object stored on the process heap, called a ProcBin, and the binary object itself stored outside all process heaps. The binary object can be referenced by any number of ProcBins from any number of processes; the object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
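The reference counting described above can be modeled as follows. `RefcBinary` and `ProcBin` are hypothetical Python stand-ins for the runtime's C structures. The sketch also assumes, as the runtime does, that sending a refc binary to another process creates a new ProcBin rather than copying the data.

```python
class RefcBinary:
    """Hypothetical off-heap binary object with a reference counter."""

    def __init__(self, data: bytes):
        self.data = data
        self.refc = 0
        self.freed = False

    def incref(self) -> None:
        self.refc += 1

    def decref(self) -> None:
        self.refc -= 1
        if self.refc == 0:
            self.freed = True  # last ProcBin gone: memory can be released


class ProcBin:
    """Hypothetical small handle on a process heap, pointing at the binary."""

    def __init__(self, binary: RefcBinary):
        self.binary = binary
        binary.incref()

    def release(self) -> None:
        self.binary.decref()


shared = RefcBinary(b"x" * 1024)
in_a = ProcBin(shared)  # created in process A
in_b = ProcBin(shared)  # A sends it to B: a new ProcBin, the data is not copied
print(shared.refc, in_a.binary.data is in_b.binary.data)  # 2 True

in_a.release()
print(shared.freed)  # False
in_b.release()
print(shared.freed)  # True
```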

Finally, a word about two other kinds of binary data in Erlang: sub binaries and match contexts.

A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.
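A sub binary can be pictured as an (original, offset, size) triple. The Python class `SubBinary` and the function `split_binary` below are hypothetical analogues of the Erlang concepts, not their implementation. The constructor follows the rule above by always pointing at the underlying binary, never at another sub binary.

```python
from typing import Union


class SubBinary:
    """Hypothetical reference to bytes [offset, offset + size) of orig."""

    def __init__(self, orig: Union[bytes, "SubBinary"], offset: int, size: int):
        limit = orig.size if isinstance(orig, SubBinary) else len(orig)
        if offset < 0 or size < 0 or offset + size > limit:
            raise ValueError("range lies outside the referenced binary")
        if isinstance(orig, SubBinary):
            # Never reference a sub binary: point at its underlying binary.
            offset += orig.offset
            orig = orig.orig
        self.orig = orig
        self.offset = offset
        self.size = size

    def to_bytes(self) -> bytes:
        # Bytes are copied only when explicitly requested.
        return self.orig[self.offset:self.offset + self.size]


def split_binary(b: Union[bytes, SubBinary], pos: int):
    """Hypothetical analogue of Erlang's split_binary/2: no data is copied."""
    size = b.size if isinstance(b, SubBinary) else len(b)
    return SubBinary(b, 0, pos), SubBinary(b, pos, size - pos)


data = b"hello, world"
head, tail = split_binary(data, 5)
print(head.to_bytes(), tail.to_bytes())  # b'hello' b', world'

inner = SubBinary(tail, 2, 5)  # built from a sub binary, but points at data
print(inner.orig is data, inner.offset)  # True 7
print(inner.to_bytes())                  # b'world'
```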

A match context is similar to a sub binary, but is optimized for binary matching; for instance, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context will be incremented.

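As mentioned earlier, Erlang goes out of its way to avoid the time and space cost of copying binary data, and here it goes further still. Sub binaries and match contexts are actually reference objects that point into the data of a heap binary or a refc binary.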

So what is the difference between a sub binary and a match context?

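A sub binary is a piece of binary data that is split off from another binary or produced by matching a binary. It has the properties and operations common to all binary data. A match context is created while Erlang matches a binary. If the matched-out binary data is used afterwards, Erlang converts the match context into a sub binary. In other words, user code never uses a match context directly: it is intermediate data that Erlang uses to optimize binary matching.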
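To make the difference concrete, here is a toy model of matching `<<Len:16, Payload:Len/binary, Rest/binary>>`. `MatchContext` and `SubBinary` are hypothetical Python classes. The match context is a cursor that holds a direct reference to the data and a position that advances with each matched field. Only the fields bound to variables (`Payload`, `Rest`) become sub binaries. In the real VM, user code never sees the match context; it is exposed here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubBinary:
    """Hypothetical reference to bytes [offset, offset + size) of orig."""
    orig: bytes
    offset: int
    size: int

    def to_bytes(self) -> bytes:
        return self.orig[self.offset:self.offset + self.size]


class MatchContext:
    """Hypothetical cursor used internally while a binary pattern is matched."""

    def __init__(self, data: bytes):
        self.data = data  # direct reference to the binary data
        self.pos = 0      # incremented after each matched field

    def _need(self, n: int) -> None:
        if self.pos + n > len(self.data):
            raise ValueError("no match: binary too short")

    def match_uint(self, nbytes: int) -> int:
        """Match an unsigned big-endian integer field of nbytes bytes."""
        self._need(nbytes)
        value = int.from_bytes(self.data[self.pos:self.pos + nbytes], "big")
        self.pos += nbytes
        return value

    def match_binary(self, nbytes: int) -> SubBinary:
        """Match a binary field; it is used afterwards, so it becomes a sub binary."""
        self._need(nbytes)
        sub = SubBinary(self.data, self.pos, nbytes)
        self.pos += nbytes
        return sub

    def rest(self) -> SubBinary:
        """Match the remaining bytes (like Rest/binary) as a sub binary."""
        return self.match_binary(len(self.data) - self.pos)


packet = bytes([0, 3]) + b"abcXY"
ctx = MatchContext(packet)
length = ctx.match_uint(2)          # Len:16
payload = ctx.match_binary(length)  # Payload:Len/binary
rest = ctx.rest()                   # Rest/binary
print(length, payload.to_bytes(), rest.to_bytes(), ctx.pos)  # 3 b'abc' b'XY' 7
```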


