[RDMA] Configuring PFC over RoCE v1

2021-04-10 19:58:31  Source: Internet

Tags: PFC standalone skprio priority v1 switch RDMA master 6bd534


Environment:


Two hosts (each with one dual-port 40 Gbps ConnectX-3 NIC, driver version 4.1-1.0.2.0, running Ubuntu 16.04)

One 32-port Mellanox Spectrum SN2700 switch running Onyx 3.6.8102.

 

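PFC background:

PFC: https://blog.csdn.net/bandaoyu/article/details/115346857

According to Juniper's introduction to PFC, priority-based flow control (PFC), IEEE standard 802.1Qbb, is a link-level flow control mechanism. It is similar to the IEEE 802.3x PAUSE mechanism, but it pauses the traffic of one priority on the link (each priority is a virtual lane, and a single virtual lane is paused) rather than pausing the entire link. PFC lets you selectively pause traffic according to its class.

 

So, compared with IEEE 802.3x, which pauses the whole link, PFC works at a finer granularity: it pauses a single virtual lane. Configuring PFC therefore amounts to mapping application traffic onto one particular priority. Depending on where the traffic is marked, there are two modes: Trust L2 (the PCP field of the VLAN tag) and Trust L3 (the DSCP field of the IP header). ConnectX-3 supports only RoCE v1, whose packets carry no IP header, so this article covers Trust L2 only.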
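To make "pausing one priority" concrete, here is a minimal Python sketch that builds the bytes of a PFC frame. The layout (MAC Control EtherType 0x8808, opcode 0x0101, a class-enable vector, then eight 16-bit pause timers) comes from the 802.1Qbb standard rather than from this article, and the function name build_pfc_frame is an illustrative assumption.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame (without FCS) for the given priorities.

    pause_quanta maps a priority (0-7) to a pause time in quanta
    (one quantum is 512 bit times); a time of 0 asks the peer to resume.
    build_pfc_frame is a hypothetical helper name, not a library API.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << prio  # bit n marks timer n as valid
        timers[prio] = quanta
    header = PFC_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    body = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    # Pad to the 60-byte minimum Ethernet frame size (64 bytes with FCS).
    return (header + body).ljust(60, b"\x00")


# Pause priority 4 for the maximum time; all other priorities keep flowing.
frame = build_pfc_frame(bytes(6), {4: 0xFFFF})
```

Only the timer whose bit is set in the class-enable vector is acted on, which is why traffic on the other seven priorities is unaffected.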

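On the end host, the mapping chain is:

ToS -> skb_priority -> VLAN QoS (also called User Priority, or UP; its value is the PCP field of the VLAN tag) -> TC (traffic class).

On the switch, the mapping chain is:

PCP + DEI -> switch-priority -> ingress Port Group (PG). The PG holds the PFC threshold configuration.

This article uses TC 4 and switch-priority 4 as the example.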
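The PCP and DEI bits that the switch trusts live in the 16-bit Tag Control Information (TCI) field of the 802.1Q VLAN tag: 3 bits of PCP, 1 bit of DEI and 12 bits of VLAN ID. The following Python sketch packs and unpacks that field; the helper names pack_tci and unpack_tci are illustrative, and the VLAN ID 10 is the one used later in this article.

```python
def pack_tci(pcp: int, dei: int, vid: int) -> int:
    """Pack PCP (3 bits), DEI (1 bit) and VLAN ID (12 bits) into a 16-bit TCI."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must be in 0..4095")
    return (pcp << 13) | (dei << 12) | vid


def unpack_tci(tci: int) -> tuple:
    """Split a 16-bit TCI back into (pcp, dei, vid)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF


# UP/PCP 4, DEI 0, VLAN 10: the marking this article maps to switch-priority 4.
tci = pack_tci(pcp=4, dei=0, vid=10)
print(hex(tci), unpack_tci(tci))
```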

 

Configuration:


Configure the switch first:

0. Enter configuration mode:

switch-6bd534 [standalone: master] > enable
switch-6bd534 [standalone: master] # configure terminal


1. Create the VLAN and set the switch ports to hybrid mode:

switch-6bd534 [standalone: master] (config) # vlan 10
switch-6bd534 [standalone: master] (config vlan 10) # exit
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 switchport mode hybrid
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 switchport hybrid allowed-vlan add 10


2. Disable flow control on all ports:

switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 flowcontrol send off force
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 flowcontrol receive off force

3. Enable PFC for priority 4 and turn PFC on for all ports:

switch-6bd534 [standalone: master] (config) # dcb priority-flow-control enable
This action might cause traffic loss while shutting down a port with priority-flow-control mode on
Type 'yes' to confirm enable pfc globally: yes
switch-6bd534 [standalone: master] (config) # dcb priority-flow-control priority 4 enable
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 dcb priority-flow-control mode on force

Note: to disable PFC again:

switch-6bd534 [standalone: master] (config) # no dcb priority-flow-control enable
This action might cause traffic loss while shutting down a port with priority-flow-control mode on
Type 'yes' to confirm disable pfc globally: yes
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 no dcb priority-flow-control mode force

4. Adjust the port buffer configuration and map switch-priority to the PG buffer:

switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg0 map pool iPool0 type lossy reserved 20K shared alpha 8
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg4 map pool iPool0 type lossless reserved 70K xoff 17K xon 17K shared alpha 2
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 egress-buffer  ePort.tc4 map pool ePool0 reserved 1500 shared alpha inf
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg4 bind switch-priority 4

5. Map PCP + DEI to switch-priority:

switch-6bd534 [standalone: master] (config) # qos trust L2
switch-6bd534 [standalone: master] (config) # interface ethernet 1/1-1/32 qos map pcp 4 dei 0 to switch-priority 4


The switch side is now configured.
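For repeatability, the switch steps can be collected into one list of commands. The Python sketch below only reassembles the commands shown in steps 1 to 5. The function name onyx_pfc_commands is hypothetical, the buffer sizes are copied unchanged from step 4, and using the same number for the PG, the TC and the switch-priority is an assumption that this article only exercises for priority 4. The interactive "yes" confirmation after enabling PFC globally is not handled.

```python
def onyx_pfc_commands(ports: str = "1/1-1/32", vlan: int = 10, prio: int = 4) -> list:
    """Return the config-mode commands from steps 1-5 as a list of strings.

    Hypothetical helper: it formats the article's commands and talks to no switch.
    Assumes PG index == TC index == switch-priority, as in the prio 4 example.
    """
    if not 1 <= prio <= 7:
        # pg0 is configured as the lossy group in step 4, so 0 is excluded here.
        raise ValueError("prio must be in 1..7")
    iface = f"interface ethernet {ports}"
    return [
        # Step 1: VLAN and hybrid mode
        f"vlan {vlan}",
        "exit",
        f"{iface} switchport mode hybrid",
        f"{iface} switchport hybrid allowed-vlan add {vlan}",
        # Step 2: flow control off
        f"{iface} flowcontrol send off force",
        f"{iface} flowcontrol receive off force",
        # Step 3: PFC on (the switch asks for a 'yes' confirmation here)
        "dcb priority-flow-control enable",
        f"dcb priority-flow-control priority {prio} enable",
        f"{iface} dcb priority-flow-control mode on force",
        # Step 4: buffers, sizes copied from the article
        f"{iface} ingress-buffer iPort.pg0 map pool iPool0 type lossy reserved 20K shared alpha 8",
        f"{iface} ingress-buffer iPort.pg{prio} map pool iPool0 type lossless"
        " reserved 70K xoff 17K xon 17K shared alpha 2",
        f"{iface} egress-buffer ePort.tc{prio} map pool ePool0 reserved 1500 shared alpha inf",
        f"{iface} ingress-buffer iPort.pg{prio} bind switch-priority {prio}",
        # Step 5: trust L2 and PCP mapping
        "qos trust L2",
        f"{iface} qos map pcp {prio} dei 0 to switch-priority {prio}",
    ]


commands = onyx_pfc_commands()
print("\n".join(commands))
```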

Next, configure the end hosts:

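1. Set the pfctx and pfcrx parameters:

# vim /etc/modprobe.d/mlx4.conf

Add:

options mlx4_en pfctx=0x10 pfcrx=0x10


Note that pfctx and pfcrx are each an 8-bit bitmap in which bit n enables PFC for priority n, so enabling priority 4 alone gives 1 << 4 = 0x10 (16 in decimal).

Then restart the NIC driver:

# /etc/init.d/openibd restart

Verify:

# RX=`cat /sys/module/mlx4_en/parameters/pfcrx`;printf "0x%x\n" $RX

An output of 0x10 means the setting took effect.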
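Since pfctx and pfcrx are bitmaps with one bit per priority, the value for any set of priorities can be computed rather than guessed. A minimal Python sketch, assuming bit n corresponds to priority n (the helper names are illustrative):

```python
def pfc_bitmap(priorities) -> int:
    """Return the 8-bit PFC bitmap with bit n set for each priority n."""
    mask = 0
    for prio in priorities:
        if not 0 <= prio <= 7:
            raise ValueError("priority must be in 0..7")
        mask |= 1 << prio
    return mask


def priorities_in(mask: int) -> list:
    """Decode a PFC bitmap back into the list of enabled priorities."""
    return [prio for prio in range(8) if mask & (1 << prio)]


print(f"priority 4 only -> 0x{pfc_bitmap([4]):02x}")
print(f"0x16 decodes to priorities {priorities_in(0x16)}")
```

Under this reading, priority 4 alone is 0x10, while a value such as 0x16 would enable priorities 1, 2 and 4.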

2. Create the VLAN interface and assign an IP address:

# modprobe 8021q
# vconfig add eth2 10
Added VLAN with VID == 10 to IF -:eth2:-
# ifconfig eth2.10 10.10.10.5/24 up


3. For TCP/IP traffic, map skb_priority to UP, sending every skb_priority to UP 4:

# for i in {0..7}; do vconfig set_egress_map eth2.10 $i 4 ; done
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
Set egress mapping on device -:eth2.10:- Should be visible in /proc/net/vlan/eth2.10
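The vconfig output points at /proc/net/vlan/eth2.10 for checking the result. The Python sketch below parses the egress mapping out of that file's text. The sample text is an assumption about what the file looks like after the loop above (it was not captured from the original setup), and parse_egress_map is an illustrative helper name.

```python
import re

# Assumed excerpt of /proc/net/vlan/eth2.10 after the loop above; not real captured output.
SAMPLE = """Device: eth2
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
 EGRESS priority mappings: 0:4 1:4 2:4 3:4 4:4 5:4 6:4 7:4
"""


def parse_egress_map(proc_text: str) -> dict:
    """Return {skb_priority: UP} from the 'EGRESS priority mappings' line."""
    for line in proc_text.splitlines():
        if "EGRESS priority mappings:" in line:
            pairs = re.findall(r"(\d+):(\d+)", line.split("mappings:", 1)[1])
            return {int(skprio): int(up) for skprio, up in pairs}
    return {}


egress = parse_egress_map(SAMPLE)
unmapped = [skprio for skprio in range(8) if egress.get(skprio) != 4]
print("skb priorities not mapped to UP 4:", unmapped)
```

On a real host the text would come from reading the file itself, for example with open("/proc/net/vlan/eth2.10").read().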


4. For traffic that bypasses the kernel, i.e. RDMA traffic, map skb_priority to UP, again sending every skb_priority to UP 4:

# tc_wrap.py -i eth2 -u 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
skprio2up is available only for RoCE in kernels that don't support set_egress_map
Traffic classes are set to 8
UP  0
UP  1
UP  2
UP  3
UP  4
        skprio: 0
        skprio: 1
        skprio: 2 (tos: 8)
        skprio: 3
        skprio: 4 (tos: 24)
        skprio: 5
        skprio: 6 (tos: 16)
        skprio: 7
        skprio: 8
        skprio: 9
        skprio: 10
        skprio: 11
        skprio: 12
        skprio: 13
        skprio: 14
        skprio: 15
        skprio: 0 (vlan 10)
        skprio: 1 (vlan 10)
        skprio: 2 (vlan 10 tos: 8)
        skprio: 3 (vlan 10)
        skprio: 4 (vlan 10 tos: 24)
        skprio: 5 (vlan 10)
        skprio: 6 (vlan 10 tos: 16)
        skprio: 7 (vlan 10)
UP  5
UP  6
UP  7


5. Map UP to TC so that UP 4 goes to TC 4 and every other UP goes to the TC with the same number, and enable PFC on priority 4:

# mlnx_qos -i eth2 -p 0,1,2,3,4,5,6,7 -f 0,0,0,0,1,0,0,0
Priority trust mode is not supported on your system
Priority trust mode: none
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0
 
tc: 0 ratelimit: unlimited, tsa: vendor
         priority:  0
tc: 1 ratelimit: unlimited, tsa: vendor
         priority:  1
tc: 2 ratelimit: unlimited, tsa: vendor
         priority:  2
tc: 3 ratelimit: unlimited, tsa: vendor
         priority:  3
tc: 4 ratelimit: unlimited, tsa: vendor
         priority:  4
tc: 5 ratelimit: unlimited, tsa: vendor
         priority:  5
tc: 6 ratelimit: unlimited, tsa: vendor
         priority:  6
tc: 7 ratelimit: unlimited, tsa: vendor
         priority:  7
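With all host-side mappings in place, the full chain ToS -> skb_priority -> UP -> TC can be traced in a few lines. The Python sketch below uses only values visible in the outputs above: the three ToS values printed by tc_wrap.py, the all-to-4 skprio mapping, and the -p / -f lists passed to mlnx_qos. Treating every other ToS value as skb_priority 0 is an assumption made for illustration, as is the function name classify.

```python
# ToS -> skb_priority pairs as printed by tc_wrap.py (tos 8, 24, 16).
TOS_TO_SKPRIO = {8: 2, 24: 4, 16: 6}
# Steps 3 and 4: every skb_priority is mapped to UP 4.
SKPRIO_TO_UP = {skprio: 4 for skprio in range(16)}
# Step 5, "-p 0,1,2,3,4,5,6,7": UP n goes to TC n.
UP_TO_TC = dict(enumerate([0, 1, 2, 3, 4, 5, 6, 7]))
# Step 5, "-f 0,0,0,0,1,0,0,0": PFC enabled on priority 4 only.
PFC_ENABLED = [0, 0, 0, 0, 1, 0, 0, 0]


def classify(tos: int) -> dict:
    """Trace one ToS value through the host-side mapping chain."""
    skprio = TOS_TO_SKPRIO.get(tos, 0)  # assumption: unlisted ToS values use skprio 0
    up = SKPRIO_TO_UP[skprio]
    tc = UP_TO_TC[up]
    return {"skprio": skprio, "up": up, "tc": tc, "pfc": bool(PFC_ENABLED[up])}


for tos in (0, 8, 16, 24):
    print(tos, classify(tos))
```

Because every skb_priority lands on UP 4, all traffic on eth2.10 ends up in TC 4 with PFC enabled, whatever its ToS.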


The configuration is now complete.

Finally, save the switch configuration so that it survives a reboot:

switch-6bd534 [standalone: master] (config) # write memory

Verification
Test with ib_write_bw (connections are established with rdma_cm), using one host as the sender and the other as the receiver.

receiver:

$ ib_write_bw -d mlx4_0 -i 2 -x 2 -S 4 --report_gbits -D 10
sender:

$ ib_write_bw 10.10.10.6 -d mlx4_0 -i 2 -x 2 -S 4 --report_gbits -D 10


Then check on the switch whether PG 4 received the traffic:

switch-6bd534 [standalone: master] (config) # show interfaces ethernet 1/5 counters pg 4
 
PG 4:
  44321827              packets
  48853700404           bytes
  0                     queue depth
  0                     no buffer discard
  0                     shared buffer discard


Or check the PFC counters (note that PFC will not necessarily be triggered):

 

switch-6bd534 [standalone: master] (config) # show interfaces ethernet 1/5 counters pfc prio 4
 
PFC 4:
  Rx:
    0                     pause packets
    0                     pause duration
 
  Tx:
    18                    pause packets
    4                     pause duration


On the end host, check the priority 4 counters:

$ ethtool -S eth2 | grep prio_4
     rx_pause_prio_4: 88
     rx_pause_duration_prio_4: 0
     rx_pause_transition_prio_4: 0
     tx_pause_prio_4: 0
     tx_pause_duration_prio_4: 11
     tx_pause_transition_prio_4: 44
     rx_prio_4_packets: 9155756
     rx_prio_4_bytes: 752828084
     tx_prio_4_packets: 862787989
     tx_prio_4_bytes: 950840867498
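When this check is repeated across several runs, it helps to turn the counter text into numbers. The Python sketch below parses "name: value" lines in the style of the ethtool -S output above; the sample text is copied from that output, and parse_ethtool_stats is an illustrative helper name, not part of ethtool.

```python
SAMPLE = """\
     rx_pause_prio_4: 88
     rx_pause_duration_prio_4: 0
     rx_pause_transition_prio_4: 0
     tx_pause_prio_4: 0
     tx_pause_duration_prio_4: 11
     tx_pause_transition_prio_4: 44
     rx_prio_4_packets: 9155756
     rx_prio_4_bytes: 752828084
     tx_prio_4_packets: 862787989
     tx_prio_4_bytes: 950840867498
"""


def parse_ethtool_stats(text: str) -> dict:
    """Parse 'name: value' lines into {name: int}, skipping non-numeric lines."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


stats = parse_ethtool_stats(SAMPLE)
if stats["rx_pause_prio_4"] > 0:
    print("received", stats["rx_pause_prio_4"], "PFC pause frames on priority 4")
```

Comparing two snapshots taken before and after a test run shows whether the pause counters moved during that run.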
 

References:

HowTo Run RoCE over L2 Enabled with PFC 

How to Enable PFC on Mellanox Switches (Spectrum)

HowTo Configure PFC on ConnectX-4

Mellanox support

Original article: https://blog.csdn.net/u013431916/article/details/82385641

Source: https://blog.csdn.net/bandaoyu/article/details/115582637
