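Is there a hole in the OTP supervisor's monitor_child?
Problem description
To keep a naughty child from breaking the link from its own end, the OTP supervisor calls unlink(Child) and switches to a monitor before shutting the child down, so that the child can no longer terminate the supervisor's supervision of it. This is done by monitor_child/1, which is implemented as follows: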
    %% Help function to shutdown/2 switches from link to monitor approach
    monitor_child(Pid) ->

        %% Do the monitor operation first so that if the child dies
        %% before the monitoring is done causing a 'DOWN'-message with
        %% reason noproc, we will get the real reason in the 'EXIT'-message
        %% unless a naughty child has already done unlink...
        erlang:monitor(process, Pid),
        unlink(Pid),

        receive
            %% If the child dies before the unlink we must empty
            %% the mail-box of the 'EXIT'-message and the 'DOWN'-message.
            {'EXIT', Pid, Reason} ->
                receive
                    {'DOWN', _, process, Pid, _} ->
                        {error, Reason}
                end
        after 0 ->
            %% If a naughty child did unlink and the child dies before
            %% monitor the result will be that shutdown/2 receives a
            %% 'DOWN'-message with reason noproc.
            %% If the child should die after the unlink there
            %% will be a 'DOWN'-message with a correct reason
            %% that will be handled in shutdown/2.
            ok
        end.
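To make the "naughty child" scenario concrete, here is a minimal sketch. It is not part of OTP: the module name naughty_child_demo and the function run/0 are made up for illustration. The child removes the link from its own side, after which only the monitor tells the parent that the child has died.

```erlang
-module(naughty_child_demo).
-export([run/0]).

%% Hypothetical demo, not OTP code: a child cuts the link itself,
%% so the parent only learns about its death through a monitor.
run() ->
    Old = process_flag(trap_exit, true),
    Parent = self(),
    Child = spawn_link(fun() ->
                               unlink(Parent),              % the "naughty" step
                               Parent ! {self(), unlinked},
                               receive stop -> exit(done) end
                       end),
    %% Wait until the child has removed the link.
    receive {Child, unlinked} -> ok end,
    Ref = erlang:monitor(process, Child),
    Child ! stop,
    %% The monitor still reports the death, with the real exit reason.
    Down = receive
               {'DOWN', Ref, process, Child, Reason} -> {down, Reason}
           end,
    %% The link is gone, so no 'EXIT' message is expected.
    Exit = receive
               {'EXIT', Child, _} -> got_exit
           after 100 -> no_exit
           end,
    process_flag(trap_exit, Old),
    {Down, Exit}.
```

Calling naughty_child_demo:run() should return {{down, done}, no_exit}: a child cannot remove a monitor that another process has placed on it.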
But something looks odd here. Right after the unlink, monitor_child/1 contains a strange piece of code:
    receive
        %% If the child dies before the unlink we must empty
        %% the mail-box of the 'EXIT'-message and the 'DOWN'-message.
        {'EXIT', Pid, Reason} ->
            receive
                {'DOWN', _, process, Pid, _} ->
                    {error, Reason}
            end
    after 0 ->
        ok
    end
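The comment means that if the child has already died before the unlink, the Reason in the 'EXIT' message is the real one, whereas the 'DOWN' message produced afterwards by monitor/2 carries noproc because the target process can no longer be found. This raises a concern: the receive with "after 0" scans the mailbox once and returns immediately, yet an 'EXIT' message sent before the unlink might not have arrived by then.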
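The noproc case mentioned in the comment is easy to reproduce. The following is a small sketch (the module noproc_demo and its run/0 are hypothetical names, not OTP code): a monitor set up while the process is alive reports the real exit reason, and a monitor set up after its death reports noproc.

```erlang
-module(noproc_demo).
-export([run/0]).

%% Hypothetical demo: compare the 'DOWN' reason of a monitor created
%% before the process died with one created after it died.
run() ->
    {Pid, MRef} = spawn_monitor(fun() -> exit(real_reason) end),
    %% This monitor existed while the process was alive, so it
    %% delivers the real exit reason.
    First = receive
                {'DOWN', MRef, process, Pid, R1} -> R1
            end,
    %% The process is gone now; monitoring it again yields noproc.
    Ref = erlang:monitor(process, Pid),
    Second = receive
                 {'DOWN', Ref, process, Pid, R2} -> R2
             end,
    {First, Second}.
```

noproc_demo:run() should return {real_reason, noproc}, which is why monitor_child/1 prefers the Reason from the 'EXIT' message when one is available.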
Resolution
So does supervisor really have this problem? The Erlang/OTP documentation describes unlink/1 as follows:
Once unlink(Id) has returned it is guaranteed that the link between the caller and the entity referred to by Id has no effect on the caller in the future (unless the link is setup again). If caller is trapping exits, an {'EXIT', Id, _} message due to the link might have been placed in the caller's message queue prior to the call, though. Note, the {'EXIT', Id, _} message can be the result of the link, but can also be the result of Id calling exit/2. Therefore, it may be appropriate to cleanup the message queue when trapping exits after the call to unlink(Id), as follow:

    unlink(Id),
    receive
        {'EXIT', Id, _} ->
            true
    after 0 ->
        true
    end

Note: Prior to OTP release R11B (erts version 5.5) unlink/1 behaved completely asynchronous, i.e., the link was active until the "unlink signal" reached the linked entity. This had one undesirable effect, though. You could never know when you were guaranteed not to be effected by the link. Current behavior can be viewed as two combined operations: asynchronously send an "unlink signal" to the linked entity and ignore any future results of the link.
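The last sentence shows that the current semantics of unlink/1 cover not only breaking the link but also ignoring any later 'EXIT' signal caused by it. So if an 'EXIT' message from the link is still in the mailbox after unlink/1 returns, it must have arrived before unlink/1 took effect. In other words, no 'EXIT' message caused by the link can arrive after unlink/1, so the stray 'EXIT' message problem analyzed above cannot occur.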
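The guarantee can be observed with a short sketch that follows the same monitor, unlink, flush order as monitor_child/1. The module name unlink_flush_demo and run/0 are assumptions for illustration only, and the 100 ms wait is just a convenient bound for the demo, not something the supervisor does.

```erlang
-module(unlink_flush_demo).
-export([run/0]).

%% Hypothetical demo: after unlink/1 returns, a later death of the
%% former link partner produces a 'DOWN' message but no 'EXIT' message.
run() ->
    Old = process_flag(trap_exit, true),
    Child = spawn_link(fun() -> receive stop -> exit(late_exit) end end),
    %% Same order as monitor_child/1: monitor first, then unlink.
    Ref = erlang:monitor(process, Child),
    unlink(Child),
    %% The documented idiom: flush an 'EXIT' that was queued before the
    %% unlink. The child is still alive here, so nothing is queued.
    Flushed = receive
                  {'EXIT', Child, _} -> flushed
              after 0 -> nothing_queued
              end,
    %% Let the child die only now, after the unlink.
    Child ! stop,
    Down = receive
               {'DOWN', Ref, process, Child, Reason} -> Reason
           end,
    %% The link no longer has any effect on the caller.
    Late = receive
               {'EXIT', Child, _} -> late_exit_seen
           after 100 -> no_late_exit
           end,
    process_flag(trap_exit, Old),
    {Flushed, Down, Late}.
```

unlink_flush_demo:run() should return {nothing_queued, late_exit, no_late_exit}: the real reason arrives through the monitor, and a single scan with "after 0" is enough to deal with anything the link left behind.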
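At first I wondered how Erlang's implementation could be so careless. It turns out that although the Erlang code looks simple, the underlying implementation has carefully accounted for many such issues.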