Understanding RFS


2024-08-09 16:50:11, 213 views

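1. Background

When a NIC receives a packet, processing goes through three stages:

- The NIC raises a hardware interrupt to tell the CPU that a packet has arrived

- A softirq processes the packet

- The user-space program consumes the packet

On an SMP system these three stages may run on three different CPUs, as shown in the figure below:

The goal of RFS (Receive Flow Steering) is to raise the CPU cache hit rate and thereby reduce network latency. With RFS enabled, the result looks like this: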

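2. How it works

When a user program calls recvmsg() or sendmsg(), RFS stores the id of the CPU the program is running on in a hash table.

When a packet belonging to that program arrives, RFS tries to look up the corresponding CPU id in the hash table and places the packet on that CPU's queue, so that protocol processing and the application share the same CPU and its cache.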
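The record-then-look-up idea above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel implementation: the names `toy_flow_table`, `toy_record_flow`, `toy_lookup_cpu`, the table size and the `NO_CPU` sentinel are all hypothetical, and the flow hash is assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_TABLE_ENTRIES 256u   /* hypothetical size; must be a power of two */
#define NO_CPU 0xffffu            /* hypothetical sentinel: no CPU recorded yet */

/* Toy stand-in for the global socket flow table: flow hash -> CPU id. */
struct toy_flow_table {
    uint32_t mask;                       /* entries - 1, folds the hash into a slot */
    uint16_t ents[FLOW_TABLE_ENTRIES];
};

/* Mark every slot as empty. */
static void toy_flow_table_init(struct toy_flow_table *t)
{
    assert((FLOW_TABLE_ENTRIES & (FLOW_TABLE_ENTRIES - 1)) == 0);
    t->mask = FLOW_TABLE_ENTRIES - 1;
    for (uint32_t i = 0; i < FLOW_TABLE_ENTRIES; i++)
        t->ents[i] = NO_CPU;
}

/* recvmsg()/sendmsg() path: remember which CPU the application runs on. */
static void toy_record_flow(struct toy_flow_table *t, uint32_t flow_hash,
                            uint16_t cpu)
{
    t->ents[flow_hash & t->mask] = cpu;
}

/* Packet-arrival path: which CPU should handle this flow? */
static uint16_t toy_lookup_cpu(const struct toy_flow_table *t,
                               uint32_t flow_hash)
{
    return t->ents[flow_hash & t->mask];
}
```

Because the slot is chosen with `hash & mask`, two flows whose hashes share the same low bits land in the same slot and overwrite each other. The sketch accepts that, trading occasional mis-steering for a lookup with no chaining or locking.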

3. Key data structures

    /*
     * The rps_sock_flow_table contains mappings of flows to the last CPU
     * on which they were processed by the application (set in recvmsg).
     */
    struct rps_sock_flow_table {
        unsigned int mask;
        u16 ents[0];
    };
    #define RPS_SOCK_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_sock_flow_table) + \
        ((_num) * sizeof(u16)))


The rps_sock_flow_table structure implements a hash table. RFS declares one global instance of it, which stores the CPU associated with every socket.
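The size macro exists because the table is a header followed by a variable number of entries in a single allocation. The following user-space sketch shows how such a layout might be allocated and initialised. It assumes `uint16_t` in place of `u16`, `malloc` in place of a kernel allocator, and `0xffff` as the "no CPU" marker. The names `sock_flow_table`, `SOCK_FLOW_TABLE_SIZE` and `sock_flow_table_alloc` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space copy of the layout above, with uint16_t standing in for u16. */
struct sock_flow_table {
    unsigned int mask;
    uint16_t ents[];              /* C99 flexible array member */
};

#define SOCK_FLOW_TABLE_SIZE(num) \
    (sizeof(struct sock_flow_table) + (num) * sizeof(uint16_t))

/* Allocate a table with `num` slots; `num` must be a power of two so that
 * `hash & mask` always yields a valid index. Returns NULL on failure. */
static struct sock_flow_table *sock_flow_table_alloc(unsigned int num)
{
    assert(num != 0 && (num & (num - 1)) == 0);

    struct sock_flow_table *t = malloc(SOCK_FLOW_TABLE_SIZE(num));
    if (t == NULL)
        return NULL;

    t->mask = num - 1;
    for (unsigned int i = 0; i < num; i++)
        t->ents[i] = 0xffff;      /* assumed marker: no CPU recorded yet */
    return t;
}
```

A caller would use `sock_flow_table_alloc(1024)` to get a 1024-slot table and release it with `free()`.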

    /*
     * The rps_dev_flow structure contains the mapping of a flow to a CPU, the
     * tail pointer for that CPU's input queue at the time of last enqueue, and
     * a hardware filter index.
     */
    struct rps_dev_flow {
        u16 cpu;                   // CPU last used for this flow
        u16 filter;
        unsigned int last_qtail;   // tail position of that CPU's input queue at this flow's last enqueue
    };
    #define RPS_NO_FILTER 0xffff

    /*
     * The rps_dev_flow_table structure contains a table of flow mappings.
     */
    struct rps_dev_flow_table {
        unsigned int mask;
        struct rcu_head rcu;
        struct rps_dev_flow flows[0];   // the hash table itself
    };
    #define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
        ((_num) * sizeof(struct rps_dev_flow)))


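The rps_dev_flow_table structure is per device receive queue: each RX queue holds its own table (reached through rxqueue->rps_flow_table), in contrast to the single global rps_sock_flow_table.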

4. Implementation

The CPU id is recorded when the user program calls recvmsg() or sendmsg().

    int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
                     size_t size, int flags)
    {
        struct sock *sk = sock->sk;
        int addr_len = 0;
        int err;

        sock_rps_record_flow(sk);   // record the current CPU id for this flow

        err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                                   flags & ~MSG_DONTWAIT, &addr_len);
        if (err >= 0)
            msg->msg_namelen = addr_len;
        return err;
    }
    EXPORT_SYMBOL(inet_recvmsg);


When a packet arrives, get_rps_cpu() is called to choose a suitable CPU id. The key part of the code is:

    3117     hash = skb_get_hash(skb);
    3118     if (!hash)
    3119         goto done;
    3120
    3121     flow_table = rcu_dereference(rxqueue->rps_flow_table);     // per-queue (device) hash table
    3122     sock_flow_table = rcu_dereference(rps_sock_flow_table);    // global hash table
    3123     if (flow_table && sock_flow_table) {
    3124         u16 next_cpu;
    3125         struct rps_dev_flow *rflow;
    3126
    3127         rflow = &flow_table->flows[hash & flow_table->mask];
    3128         tcpu = rflow->cpu;
    3129
    3130         next_cpu = sock_flow_table->ents[hash & sock_flow_table->mask];   // CPU id the user program runs on
    3131
    3132         /*
    3133          * If the desired CPU (where last recvmsg was done) is
    3134          * different from current CPU (one in the rx-queue flow
    3135          * table entry), switch if one of the following holds:
    3136          *   - Current CPU is unset (equal to RPS_NO_CPU).
    3137          *   - Current CPU is offline.
    3138          *   - The current CPU's queue tail has advanced beyond the
    3139          *     last packet that was enqueued using this table entry.
    3140          *     This guarantees that all previous packets for the flow
    3141          *     have been dequeued, thus preserving in order delivery.
    3142          */
    3143         if (unlikely(tcpu != next_cpu) &&
    3144             (tcpu == RPS_NO_CPU || !cpu_online(tcpu) ||
    3145              ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
    3146               rflow->last_qtail)) >= 0)) {
    3147             tcpu = next_cpu;
    3148             rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
    3149         }
    3150
    3151         if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
    3152             *rflowp = rflow;
    3153             cpu = tcpu;
    3154             goto done;
    3155         }
    3156     }


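Line 3145 in the code above is the hardest part to follow. The softnet_data structure manages a CPU's incoming and outgoing traffic, and two of its fields matter here: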

    #ifdef CONFIG_RPS
        /* Elements below can be accessed between CPUs for RPS */
        struct call_single_data csd ____cacheline_aligned_in_smp;
        struct softnet_data *rps_ipi_next;
        unsigned int        cpu;
        unsigned int        input_queue_head;   // queue head, i.e. the dequeue position
        unsigned int        input_queue_tail;   // queue tail, i.e. the enqueue position
    #endif


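In the expression (int)(per_cpu(softnet_data, tcpu).input_queue_head - rflow->last_qtail), input_queue_head is the number of packets dequeued so far on CPU tcpu, and rflow->last_qtail is the queue position at which this flow's most recent packet was enqueued. If the dequeue count has reached or passed that position (the difference is >= 0), every packet of this flow that was queued on tcpu has already been processed, so moving the flow to another CPU cannot cause packets to be handled out of order. The subtraction is done on unsigned counters and then cast to int, so the comparison stays correct when the counters wrap around.

The if statement at line 3143 exists to prevent out-of-order packets. This matters because when several processes or threads service the same socket, the CPU id recorded for that socket keeps changing.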
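To make the head/tail comparison concrete, here is a small user-space model of the check. It is a sketch under stated assumptions, not kernel code: the `toy_` names and the `NO_CPU` value are hypothetical, the counters are modelled as plain `uint32_t` values that only ever increase, and the tail position is recorded after incrementing it. The unsigned-to-signed cast relies on two's-complement behaviour, which mainstream compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU 0xffffu            /* hypothetical "unset" marker */

/* Toy per-CPU backlog counters; both only increase and may wrap around. */
struct toy_backlog {
    uint32_t input_queue_head;    /* packets dequeued so far */
    uint32_t input_queue_tail;    /* packets enqueued so far */
};

/* Toy per-flow state, mirroring the cpu and last_qtail fields. */
struct toy_flow {
    uint16_t cpu;
    uint32_t last_qtail;
};

/* Enqueue one packet of `flow` on `q` and remember the new tail position. */
static void toy_enqueue(struct toy_backlog *q, struct toy_flow *flow)
{
    q->input_queue_tail++;
    flow->last_qtail = q->input_queue_tail;
}

/* True once every packet the flow queued on its current CPU was dequeued. */
static bool toy_flow_drained(const struct toy_backlog *q,
                             const struct toy_flow *flow)
{
    /* The signed difference stays correct across 32-bit wraparound. */
    return (int32_t)(q->input_queue_head - flow->last_qtail) >= 0;
}

/* Same shape as the condition at line 3143: switch only when it is safe. */
static bool toy_should_switch(const struct toy_backlog *q,
                              const struct toy_flow *flow,
                              uint16_t next_cpu, bool cur_online)
{
    return flow->cpu != next_cpu &&
           (flow->cpu == NO_CPU || !cur_online || toy_flow_drained(q, flow));
}
```

Comparing `head >= last_qtail` directly would give the wrong answer once the tail counter wraps past zero while the head has not. Taking the difference first and then testing its sign avoids that, provided the two counters are never more than 2^31 apart.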

References:

http://www.pagefault.info/?p=115

http://syuu.dokukino.com/2013/05/linux-kernel-features-for-high-speed.html

https://www.kernel.org/doc/Documentation/networking/scaling.txt
