[FUG-BR] Network tuning - FreeBSD 7.3

Wednesday February 23 12:18:30 BRT 2011

Good morning, everyone.

I have a FreeBSD 7.3 server with 8 network cards using Intel and Broadcom
chipsets. It is the gateway server for my network, and it runs:

pf
named (base)
snmp
openbgp

I have three full BGP sessions and one partial session.

The problem is that I am getting very poor performance on the Broadcom card.
I have a CMTS connected directly to the card, with no switch in between, and
I am losing packets on the path to the CMTS. I have already swapped the cable
and that did not solve it.

gw# ifconfig bce0
bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
        ether 1c:c1:de:08:de:90
        inet 10.20.0.1 netmask 0xfffffffc broadcast 10.20.0.3
        media: Ethernet 1000baseTX <full-duplex>
        status: active
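As a quick cross-check of the ifconfig output, the `options=1bb` mask and the hex netmask can be decoded by hand. This is a minimal Python sketch, assuming the IFCAP_* bit values from FreeBSD's `<net/if.h>` of the 7.x era (only the low nine bits are listed); `decode_ifcap` and `describe_link` are hypothetical helper names. If the bit table is right, the decoded names should match exactly what ifconfig printed.

```python
import ipaddress

# Assumed IFCAP_* bit values from FreeBSD <net/if.h> (7.x era), low nine bits only.
IFCAP_BITS = {
    0x0001: "RXCSUM",
    0x0002: "TXCSUM",
    0x0004: "NETCONS",
    0x0008: "VLAN_MTU",
    0x0010: "VLAN_HWTAGGING",
    0x0020: "JUMBO_MTU",
    0x0040: "POLLING",
    0x0080: "VLAN_HWCSUM",
    0x0100: "TSO4",
}

def decode_ifcap(options_hex):
    """Turn the hex value after 'options=' into capability names, lowest bit first."""
    value = int(options_hex, 16)
    return [name for bit, name in sorted(IFCAP_BITS.items()) if value & bit]

def describe_link(addr, netmask_hex):
    """Build the IPv4 network from ifconfig's inet address and hex netmask."""
    mask = str(ipaddress.IPv4Address(int(netmask_hex, 16)))  # 0xfffffffc -> 255.255.255.252
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)

if __name__ == "__main__":
    print(",".join(decode_ifcap("1bb")))
    net = describe_link("10.20.0.1", "0xfffffffc")
    print(net, net.broadcast_address, [str(h) for h in net.hosts()])
```

The link is a /30 (10.20.0.0/30) whose only two usable hosts are 10.20.0.1 (bce0) and 10.20.0.2, the address pinged further down.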

Checking with 'top -S', interrupt usage on bce0 is quite high. Traffic on
this card is above 100Mbps most of the time, so it naturally consumes more
CPU. With the sysctl below I managed to reduce that usage, but I am still
losing packets on the path to the CMTS:

net.isr.direct=0

Right now 'top -S' shows the following:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   13 root        1 171 ki31     0K     8K CPU1    1  45.3H 96.48% idle: cpu1
   12 root        1 171 ki31     0K     8K RUN     2  43.0H 90.77% idle: cpu2
   14 root        1 171 ki31     0K     8K RUN     0  40.0H 77.20% idle: cpu0
   11 root        1 171 ki31     0K     8K RUN     3  38.1H 73.00% idle: cpu3
   15 root        1 -44    -     0K     8K WAIT    3  19.8H 52.49% swi1: net
   29 root        1 -68    -     0K     8K WAIT    2 151:40  5.57% irq257: bce0
   40 root        1 -68    -     0K     8K WAIT    2  85:46  2.10% irq265: em3

If I enable the sysctl above, I get the following:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        1 171 ki31     0K     8K CPU3    3  38.2H 92.87% idle: cpu3
   13 root        1 171 ki31     0K     8K CPU1    1  45.4H 84.38% idle: cpu1
   14 root        1 171 ki31     0K     8K RUN     0  40.1H 79.39% idle: cpu0
   12 root        1 171 ki31     0K     8K CPU2    2  43.1H 62.35% idle: cpu2
   29 root        1 -68    -     0K     8K WAIT    2 151:55 25.59% irq257: bce0
   40 root        1 -68    -     0K     8K WAIT    2  85:55 21.58% irq265: em3
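One way to compare the two top -S snapshots is through the idle threads: each CPU's busy share is 100% minus the WCPU of its "idle: cpuN" thread, summed over the four CPUs. In the first snapshot the network work appears in `swi1: net` (the netisr software-interrupt thread), while in the second it appears in the irq257/irq265 interrupt threads. This Python sketch only reuses the numbers shown above; the dictionary and function names are illustrative, and a single snapshot per setting is a noisy sample.

```python
# WCPU % of the four "idle: cpuN" kernel threads in each top -S snapshot above.
FIRST_IDLE = {"cpu0": 77.20, "cpu1": 96.48, "cpu2": 90.77, "cpu3": 73.00}
SECOND_IDLE = {"cpu0": 79.39, "cpu1": 84.38, "cpu2": 62.35, "cpu3": 92.87}

# Network-related threads visible in each snapshot (WCPU %).
FIRST_NET = {"swi1: net": 52.49, "irq257: bce0": 5.57, "irq265: em3": 2.10}
SECOND_NET = {"irq257: bce0": 25.59, "irq265: em3": 21.58}

def busy_percent(idle_by_cpu):
    """Busy time summed over CPUs: each CPU contributes 100% minus its idle thread."""
    return sum(100.0 - idle for idle in idle_by_cpu.values())

def thread_total(threads):
    """Sum of WCPU % for the listed threads."""
    return sum(threads.values())

if __name__ == "__main__":
    snapshots = (("first", FIRST_IDLE, FIRST_NET), ("second", SECOND_IDLE, SECOND_NET))
    for label, idle, net in snapshots:
        capacity = 100.0 * len(idle)
        print(f"{label}: busy {busy_percent(idle):.2f}% of {capacity:.0f}%, "
              f"listed network threads {thread_total(net):.2f}%")
```

With these figures the first snapshot comes out at about 62.6% busy (out of 400%) and the second at about 81.0%, so moving the work from swi1: net into the interrupt threads does not obviously lower the total CPU cost.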

Below is the netstat output for the interface:

gw# netstat -I bce0 -w 1
            input         (bce0)           output
   packets  errs      bytes    packets  errs      bytes colls
     15311     0    4469692      20459     0   17440474     0
     15631     0    4589699      21200     0   18848188     0
     15608     0    4497235      20683     0   17818101     0
     15529     0    4446912      20156     0   17090265     0
     14414     0    4071791      17597     0   14750674     0
     14713     0    4270162      18578     0   15301508     0
     15030     0    4332616      18052     0   15021321     0
     13900     0    4053945      17033     0   13895997     0
     14095     0    4051387      18936     0   16074572     0
     15515     0    4518106      20720     0   17377395     0
     15606     0    4468322      20386     0   17128122     0
     15494     0    4611991      20409     0   17379323     0
     15375     0    4584624      20574     0   17758892     0
     15610     0    4435950      21340     0   18658094     0
     15277     0    4573331      20444     0   17695037     0
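To put numbers on "above 100Mbps", the per-second samples can be summarized: bytes × 8 gives bits per second, and bytes ÷ packets gives the average packet size. This Python sketch parses the 15 rows shown above and assumes each row covers exactly one second (`-w 1`); `parse` and `summarize` are illustrative names.

```python
# Data rows from `netstat -I bce0 -w 1` above, one row per 1-second interval.
SAMPLES = """\
     15311     0    4469692      20459     0   17440474     0
     15631     0    4589699      21200     0   18848188     0
     15608     0    4497235      20683     0   17818101     0
     15529     0    4446912      20156     0   17090265     0
     14414     0    4071791      17597     0   14750674     0
     14713     0    4270162      18578     0   15301508     0
     15030     0    4332616      18052     0   15021321     0
     13900     0    4053945      17033     0   13895997     0
     14095     0    4051387      18936     0   16074572     0
     15515     0    4518106      20720     0   17377395     0
     15606     0    4468322      20386     0   17128122     0
     15494     0    4611991      20409     0   17379323     0
     15375     0    4584624      20574     0   17758892     0
     15610     0    4435950      21340     0   18658094     0
     15277     0    4573331      20444     0   17695037     0
"""

def parse(text):
    """Return (in_pkts, in_errs, in_bytes, out_pkts, out_errs, out_bytes) per data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not all(f.isdigit() for f in fields):
            continue  # skip headers and anything that is not a data row
        rows.append(tuple(int(f) for f in fields[:6]))
    return rows

def summarize(rows, interval_s=1.0):
    """Average throughput in Mbit/s, minimum output rate, average packet sizes, errors."""
    seconds = len(rows) * interval_s
    in_pkts = sum(r[0] for r in rows)
    in_bytes = sum(r[2] for r in rows)
    out_pkts = sum(r[3] for r in rows)
    out_bytes = sum(r[5] for r in rows)
    return {
        "in_mbps_avg": in_bytes * 8 / seconds / 1e6,
        "out_mbps_avg": out_bytes * 8 / seconds / 1e6,
        "out_mbps_min": min(r[5] for r in rows) * 8 / interval_s / 1e6,
        "in_avg_pkt_bytes": in_bytes / in_pkts,
        "out_avg_pkt_bytes": out_bytes / out_pkts,
        "errors": sum(r[1] + r[4] for r in rows),
    }

if __name__ == "__main__":
    for key, value in summarize(parse(SAMPLES)).items():
        print(f"{key:18s} {value:10.2f}")
```

On these samples the output direction averages roughly 134.5 Mbit/s and never drops below about 111 Mbit/s, input averages about 35 Mbit/s, the average packet is around 850 bytes out and 290 bytes in, and both errs columns stay at 0 throughout.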

Here are some sysctls whose values I changed:

kern.ipc.nmbclusters=65536
kern.ipc.nsfbufs=10240
kern.ipc.maxsockbuf=8388608
net.inet.tcp.rfc1323=1
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
kern.ipc.somaxconn=1024
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.link.ether.inet.log_arp_wrong_iface=0
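A simple sanity check on this list is the memory ceiling implied by kern.ipc.nmbclusters: assuming standard 2 KB clusters (MCLBYTES = 2048 on FreeBSD), 65536 clusters can use at most 128 MiB. This Python sketch parses the settings as sysctl.conf-style name=value lines; `parse_settings` and `cluster_ceiling_bytes` are hypothetical names, and jumbo clusters have their own separate limits that this does not cover.

```python
# The sysctl values listed above, in sysctl.conf "name=value" form.
SETTINGS = """\
kern.ipc.nmbclusters=65536
kern.ipc.nsfbufs=10240
kern.ipc.maxsockbuf=8388608
net.inet.tcp.rfc1323=1
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
kern.ipc.somaxconn=1024
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.link.ether.inet.log_arp_wrong_iface=0
"""

# Assumption: standard mbuf cluster size (MCLBYTES = 2048 bytes on FreeBSD).
MCLBYTES = 2048

def parse_settings(text):
    """Parse name=value lines into a dict of ints, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value line: {raw!r}")
        values[name.strip()] = int(value.strip())
    return values

def cluster_ceiling_bytes(values):
    """Upper bound on memory the regular (2 KB) mbuf cluster zone may use."""
    return values["kern.ipc.nmbclusters"] * MCLBYTES

if __name__ == "__main__":
    settings = parse_settings(SETTINGS)
    ceiling = cluster_ceiling_bytes(settings)
    print(f"{len(settings)} settings; cluster ceiling {ceiling} bytes = {ceiling // 2**20} MiB")
```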

The ping starts with low values, then rises and never comes back down. While
this ping is high, I open a new shell and run another ping. It starts low for
a while, under 1ms, then rises until it matches the first ping.

64 bytes from 10.20.0.2: icmp_seq=22 ttl=255 time=0.321 ms
64 bytes from 10.20.0.2: icmp_seq=23 ttl=255 time=0.291 ms
64 bytes from 10.20.0.2: icmp_seq=24 ttl=255 time=0.305 ms
64 bytes from 10.20.0.2: icmp_seq=25 ttl=255 time=0.304 ms
64 bytes from 10.20.0.2: icmp_seq=26 ttl=255 time=0.197 ms
64 bytes from 10.20.0.2: icmp_seq=27 ttl=255 time=0.361 ms
64 bytes from 10.20.0.2: icmp_seq=28 ttl=255 time=0.277 ms
64 bytes from 10.20.0.2: icmp_seq=29 ttl=255 time=0.233 ms
64 bytes from 10.20.0.2: icmp_seq=30 ttl=255 time=10.524 ms
64 bytes from 10.20.0.2: icmp_seq=31 ttl=255 time=23.673 ms
64 bytes from 10.20.0.2: icmp_seq=32 ttl=255 time=44.541 ms
64 bytes from 10.20.0.2: icmp_seq=33 ttl=255 time=10.661 ms
64 bytes from 10.20.0.2: icmp_seq=34 ttl=255 time=6.178 ms
64 bytes from 10.20.0.2: icmp_seq=35 ttl=255 time=7.265 ms
64 bytes from 10.20.0.2: icmp_seq=36 ttl=255 time=5.732 ms
64 bytes from 10.20.0.2: icmp_seq=37 ttl=255 time=5.907 ms
64 bytes from 10.20.0.2: icmp_seq=38 ttl=255 time=8.711 ms
64 bytes from 10.20.0.2: icmp_seq=39 ttl=255 time=19.407 ms
64 bytes from 10.20.0.2: icmp_seq=40 ttl=255 time=65.276 ms
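The pattern described above (sub-millisecond replies, then a jump that does not come back down) can be measured from the ping output itself. This Python sketch parses the reply lines, finds the first reply above 1 ms (an arbitrary threshold matching "under 1ms"), compares the median RTT before and after that point, and lists any gaps in icmp_seq, since a missing sequence number inside the run means a lost or late reply. The function names are illustrative.

```python
import re
import statistics

PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")

# Reply lines copied from the ping run above.
PING_OUTPUT = """\
64 bytes from 10.20.0.2: icmp_seq=22 ttl=255 time=0.321 ms
64 bytes from 10.20.0.2: icmp_seq=23 ttl=255 time=0.291 ms
64 bytes from 10.20.0.2: icmp_seq=24 ttl=255 time=0.305 ms
64 bytes from 10.20.0.2: icmp_seq=25 ttl=255 time=0.304 ms
64 bytes from 10.20.0.2: icmp_seq=26 ttl=255 time=0.197 ms
64 bytes from 10.20.0.2: icmp_seq=27 ttl=255 time=0.361 ms
64 bytes from 10.20.0.2: icmp_seq=28 ttl=255 time=0.277 ms
64 bytes from 10.20.0.2: icmp_seq=29 ttl=255 time=0.233 ms
64 bytes from 10.20.0.2: icmp_seq=30 ttl=255 time=10.524 ms
64 bytes from 10.20.0.2: icmp_seq=31 ttl=255 time=23.673 ms
64 bytes from 10.20.0.2: icmp_seq=32 ttl=255 time=44.541 ms
64 bytes from 10.20.0.2: icmp_seq=33 ttl=255 time=10.661 ms
64 bytes from 10.20.0.2: icmp_seq=34 ttl=255 time=6.178 ms
64 bytes from 10.20.0.2: icmp_seq=35 ttl=255 time=7.265 ms
64 bytes from 10.20.0.2: icmp_seq=36 ttl=255 time=5.732 ms
64 bytes from 10.20.0.2: icmp_seq=37 ttl=255 time=5.907 ms
64 bytes from 10.20.0.2: icmp_seq=38 ttl=255 time=8.711 ms
64 bytes from 10.20.0.2: icmp_seq=39 ttl=255 time=19.407 ms
64 bytes from 10.20.0.2: icmp_seq=40 ttl=255 time=65.276 ms
"""

def parse_ping(text):
    """Return (icmp_seq, rtt_ms) pairs for every reply line."""
    return [(int(m.group(1)), float(m.group(2))) for m in PING_LINE.finditer(text)]

def missing_seqs(samples):
    """Sequence numbers absent between the first and last reply (lost or late)."""
    seen = {seq for seq, _ in samples}
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

def onset(samples, threshold_ms=1.0):
    """First icmp_seq whose RTT exceeds threshold_ms, or None if none does."""
    return next((seq for seq, rtt in samples if rtt > threshold_ms), None)

if __name__ == "__main__":
    samples = parse_ping(PING_OUTPUT)
    start = onset(samples)
    if start is None:
        print("no reply above the threshold")
    else:
        before = [rtt for seq, rtt in samples if seq < start]
        after = [rtt for seq, rtt in samples if seq >= start]
        print("jump at icmp_seq", start)
        print(f"median before {statistics.median(before):.3f} ms, "
              f"after {statistics.median(after):.3f} ms")
    print("missing seqs:", missing_seqs(samples))
```

For this excerpt the jump happens at icmp_seq=30, the median goes from about 0.30 ms to about 10.5 ms, and there are no gaps between seq 22 and 40, so this particular stretch shows latency growth rather than loss.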

Here is the output of 'vmstat -i':

interrupt                          total       rate
irq28: ciss0                     2050498         11
irq1: atkbd0                          10          0
irq17: atapci0+                      116          0
irq22: uhci0                           2          0
cpu0: timer                    356305937       2000
irq256: em0                    431705246       2423
irq257: bce0                  2101722530      11797
irq258: bce1                    26351598        147
irq259: em1                       697982          3
irq260: em1                       456930          2
irq261: em1                            2          0
irq262: em2                    410075391       2301
irq263: em2                    461421780       2590
irq264: em2                         4101          0
irq265: em3                   1020986798       5731
irq266: em3                   1083892349       6084
irq267: em3                       483373          2
irq268: bge0                   317083675       1779
irq269: bge1                   256196830       1438
cpu1: timer                    356297628       2000
cpu3: timer                    356297780       2000
cpu2: timer                    356297339       2000
Total                         7538327895      42315
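Grouping the vmstat -i rates by device makes the comparison easier: em1, em2 and em3 each spread their interrupts over three vectors, while bce0 takes all of its interrupts on the single irq257. This Python sketch sums the rate column per device name; it assumes the four-column layout shown above, and the helper names are hypothetical.

```python
from collections import defaultdict

# `vmstat -i` output from above: source, device, total count, rate per second.
VMSTAT_I = """\
interrupt                          total       rate
irq28: ciss0                     2050498         11
irq1: atkbd0                          10          0
irq17: atapci0+                      116          0
irq22: uhci0                           2          0
cpu0: timer                    356305937       2000
irq256: em0                    431705246       2423
irq257: bce0                  2101722530      11797
irq258: bce1                    26351598        147
irq259: em1                       697982          3
irq260: em1                       456930          2
irq261: em1                            2          0
irq262: em2                    410075391       2301
irq263: em2                    461421780       2590
irq264: em2                         4101          0
irq265: em3                   1020986798       5731
irq266: em3                   1083892349       6084
irq267: em3                       483373          2
irq268: bge0                   317083675       1779
irq269: bge1                   256196830       1438
cpu1: timer                    356297628       2000
cpu3: timer                    356297780       2000
cpu2: timer                    356297339       2000
Total                         7538327895      42315
"""

def rates_by_device(text):
    """Sum the per-second rate column for each device across all of its vectors."""
    rates = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "irq257: bce0 <total> <rate>"; header and Total have 3 fields.
        if len(fields) != 4 or not fields[0].endswith(":"):
            continue
        _source, device, _total, rate = fields
        rates[device] += int(rate)
    return dict(rates)

def reported_total_rate(text):
    """Rate from the 'Total' row printed by vmstat -i."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Total":
            return int(fields[-1])
    raise ValueError("no Total row found")

if __name__ == "__main__":
    rates = rates_by_device(VMSTAT_I)
    total = reported_total_rate(VMSTAT_I)
    for device, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{device:10s} {rate:6d}/s {100.0 * rate / total:5.1f}%")
```

With these numbers bce0 (11797/s on one vector) and em3 (11817/s over three vectors, at most 6084/s on any one of them) carry about the same interrupt load, and bce0 alone is roughly 28% of the 42315/s total.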

netstat -nm:

3191/4114/7305 mbufs in use (current/cache/total)
3189/3471/6660/65536 mbuf clusters in use (current/cache/total/max)
3188/2700 mbuf+clusters out of packet secondary zone in use (current/cache)
0/119/119/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
7218K/8446K/15664K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/10240 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
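The netstat -nm output also suggests that mbuf clusters are not the limiting factor: 6660 of the 65536 allowed clusters are allocated and no requests were denied. This short Python check reads just those two lines (pasted from above); `cluster_usage` and `mbuf_denials` are hypothetical helper names.

```python
import re

# The two relevant lines from the netstat -nm output above.
NETSTAT_M = """\
3189/3471/6660/65536 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

def cluster_usage(text):
    """Return (total, max) from the 'mbuf clusters in use' line."""
    m = re.search(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.M)
    if m is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    _current, _cache, total, maximum = map(int, m.groups())
    return total, maximum

def mbuf_denials(text):
    """Return the (mbufs, clusters, mbuf+clusters) denied counters."""
    m = re.search(r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", text, re.M)
    if m is None:
        raise ValueError("no 'requests for mbufs denied' line found")
    return tuple(map(int, m.groups()))

if __name__ == "__main__":
    total, maximum = cluster_usage(NETSTAT_M)
    print(f"clusters {total}/{maximum} = {100.0 * total / maximum:.1f}% of the limit")
    print("denied (mbufs/clusters/mbuf+clusters):", mbuf_denials(NETSTAT_M))
```

That is roughly 10% of the kern.ipc.nmbclusters limit, with all denial counters at zero.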

I don't know what else to do here. I have already checked the CMTS and the
problem is not there: if I take a workstation and ping it directly from the
CMTS, there is no packet loss, but if I ping the FreeBSD server's IP, there
is loss.
I believe some tuning (sysctl) could solve this or improve performance a lot.
Could someone tell me which sysctls I should change, and to what values? If
you need more information, I'm happy to provide it.

Thanks in advance.