[FUG-BR] [off-topic] second attempt at migrating manicomio-share to FreeBSD [SOLVED]

Marcelo Gondim gondim at bsdinfo.com.br
Sun Jan 13 23:54:59 BRST 2013


On 13/01/13 22:59, Antônio Pessoa wrote:
> 2013/1/13 Marcelo Gondim <gondim at bsdinfo.com.br>
>> Folks,
>>
>> I think I have found something that may be causing the whole problem. Since
>> the KVM-IP was attached I have also been noticing the following more
>> clearly in the logs:
>>
>> MCA: Bank 8, Status 0xcc1949000001009f
>> MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
>> MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 0
>> MCA: CPU 0 COR (25892) OVER RD channel ?? memory error
>> MCA: Address 0x5480c7b40
>> MCA: Misc 0x4670220100010386
>>
>> This message shows up every now and then, and it appears whenever mysql
>> spikes on the CPU. From what I can tell, this may be a problem with one of
>> the server's memory banks. Am I right?
>> Even the apache children are being killed, with these messages:
>>
>> [Wed Jan 09 23:49:40 2013] [notice] child pid 54806 exit signal Illegal
>> instruction (4)
>> [Wed Jan 09 23:49:40 2013] [notice] child pid 54308 exit signal Illegal
>> instruction (4)
>> [Wed Jan 09 23:49:40 2013] [notice] child pid 53252 exit signal Illegal
>> instruction (4)
>> [Wed Jan 09 23:49:40 2013] [notice] child pid 53120 exit signal Illegal
>> instruction (4)
>>
>> It has already corrupted a mysql database once, and a partition too,
>> forcing me into a manual fsck. The machine has also rebooted in the middle
>> of boot, and a couple of times it hung at ACPI and took almost 1 hour to
>> get past it.
>>
>> I asked them to check the server's memory. Let's see; after this one there
>> is still light at the end of the tunnel. lol
>
>
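The MCA record quoted above can be checked by hand against the status word. What follows is a minimal Python sketch, not FreeBSD's own decoder: it assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM (flag bits 63..57, corrected-error count in bits 52:38, compound error code in the low 16 bits), and the function and field names are made up for illustration.

```python
# Hypothetical helper; bit positions assume the architectural
# IA32_MCi_STATUS layout documented in the Intel SDM.
MEM_REQUEST = {0: "generic", 1: "read", 2: "write",
               3: "address/command", 4: "scrub"}

def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA bank status word into its main fields."""
    code = status & 0xFFFF                      # MCA error code, bits 15:0
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),   # earlier errors were overwritten
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        # Only meaningful when the CPU reports corrected-error counting.
        "corrected_count": (status >> 38) & 0x7FFF,
        "error_code": code,
    }
    # Memory-controller errors use the pattern 000F 0000 1MMM CCCC;
    # bit 12 (F) is masked out before comparing.
    if (code & 0xEF80) == 0x0080:
        channel = code & 0xF
        info["memory_request"] = MEM_REQUEST.get((code >> 4) & 0x7, "reserved")
        info["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return info

if __name__ == "__main__":
    for key, value in decode_mca_status(0xCC1949000001009F).items():
        print(f"{key}: {value}")
```

Fed the status from the log, this reproduces the kernel's summary line: a corrected (not uncorrected) memory read error, overflow flag set, channel unspecified, with a corrected-error count of 25892. A count that high on a single bank is consistent with the suspicion of a failing memory module, though it does not identify which DIMM.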
> Are you able to run a full memtest on that server?
> It would be worthwhile, even with the result from the data center's support.
Hmm, I'll try. The problem is also that the datacenter's support is not
that good: they take a long time, and they are 7 hours ahead of us.
Luckily these outages are not common; I only did it this time to
try to migrate to FreeBSD, and I think I ended up discovering a
hardware problem.
I also did some tuning. Here is how things stand:

sysctl.conf:
=========
kern.ipc.somaxconn=4096
kern.ipc.shmall=262144
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
net.inet.udp.blackhole=1
net.inet.tcp.blackhole=2
net.inet6.icmp6.nodeinfo=0
net.inet6.ip6.use_tempaddr=1
net.inet6.ip6.prefer_tempaddr=1
net.inet6.icmp6.rediraccept=0
net.inet.ip.fw.dyn_max=65536
net.inet.icmp.icmplim=500

loader.conf:
==========
loader_logo="beastie"
kern.maxusers=1024
kern.ipc.nmbclusters=32768
kern.ipc.semmnu=256
kern.ipc.semmns=1024
kern.ipc.semmni=520
kern.ipc.semume=100
kern.ipc.shmmni=256
kern.ipc.msgseg=32767
kern.ipc.msgssz=32
kern.ipc.msgmnb=65535
kern.ipc.msgtql=2046

netstat -m:
=========
8659/13361/22020 mbufs in use (current/cache/total)
8551/4127/12678/32768 mbuf clusters in use (current/cache/total/max)
8551/4121 mbuf+clusters out of packet secondary zone in use (current/cache)
89/905/994/16384 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/8192 9k jumbo clusters in use (current/cache/total/max)
0/0/0/4096 16k jumbo clusters in use (current/cache/total/max)
19622K/15214K/34837K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
681 requests for I/O initiated by sendfile
0 calls to protocol drain routines
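
To relate the netstat -m output above to the kern.ipc.nmbclusters=32768 setting in loader.conf, the cluster line can be turned into a utilization figure. This is only a rough sketch: the function name is made up, and it assumes the line format shown in this output.

```python
import re

def cluster_usage(line: str) -> float:
    """Return total/max from an 'mbuf clusters in use' line of netstat -m."""
    m = re.match(r"\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
    if m is None:
        raise ValueError("not an 'mbuf clusters in use' line")
    current, cache, total, maximum = map(int, m.groups())
    return total / maximum

if __name__ == "__main__":
    line = "8551/4127/12678/32768 mbuf clusters in use (current/cache/total/max)"
    print(f"{cluster_usage(line):.1%} of the cluster limit allocated")
```

For the line above this comes to about 38.7% of the limit, which, together with the zero "denied" counters, suggests the mbuf tuning is not the bottleneck here.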

ipcs -a:
======
Message Queues:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP                 CBYTES                 QNUM QBYTES        LSPID        LRPID STIME    RTIME    CTIME

Shared Memory:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP         NATTCH        SEGSZ         CPID         LPID ATIME    DTIME    CTIME

Semaphores:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP          NSEMS OTIME    CTIME

gstat:
=====
dT: 1.002s  w: 1.000s
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     0      2      0      0    0.0      2     64    0.4    0.1| mfid0
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p1
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p2
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p3
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p4
     0      2      0      0    0.0      2     64    0.4    0.1| mfid0p5
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p6
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p7
     0      0      0      0    0.0      0      0    0.0    0.0| mfid0p8
     0      0      0      0    0.0      0      0    0.0    0.0| gptid/f315c6e7-5a5d-11e2-97d0-001e67036860
     0      0      0      0    0.0      0      0    0.0    0.0| label/rootfs
     0      0      0      0    0.0      0      0    0.0    0.0| label/swap
     0      0      0      0    0.0      0      0    0.0    0.0| label/usr
     0      2      0      0    0.0      2     64    0.4    0.1| label/var
     0      0      0      0    0.0      0      0    0.0    0.0| label/mysql
     0      0      0      0    0.0      0      0    0.0    0.0| label/home
     0      0      0      0    0.0      0      0    0.0    0.0| label/tmp

hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
hw.ncpu: 12
hw.byteorder: 1234
hw.physmem: 51457007616
hw.usermem: 44779876352
hw.pagesize: 4096
hw.floatingpoint: 1
hw.machine_arch: amd64
hw.realmem: 53418655744

FreeBSD ms.manicomio-share.com 9.1-STABLE FreeBSD 9.1-STABLE #0 r245225: Wed Jan  9 16:28:50 BRST 2013 root@ms.manicomio-share.com:/usr/obj/usr/src/sys/MANICOMIO  amd64

last pid: 30230;  load averages:  0.91,  0.95, 0.87 up 0+23:09:37  23:53:53
520 processes: 2 running, 517 sleeping, 1 zombie
CPU 0:   2.4% user,  0.0% nice,  3.5% system,  0.4% interrupt, 93.7% idle
CPU 1:   3.5% user,  0.0% nice,  2.4% system,  0.4% interrupt, 93.7% idle
CPU 2:   2.0% user,  0.0% nice,  0.8% system,  0.8% interrupt, 96.5% idle
CPU 3:   1.6% user,  0.0% nice,  1.6% system,  0.8% interrupt, 96.1% idle
CPU 4:   3.5% user,  0.0% nice,  2.0% system,  0.8% interrupt, 93.7% idle
CPU 5:   3.9% user,  0.0% nice,  2.4% system,  0.0% interrupt, 93.7% idle
CPU 6:   4.3% user,  0.0% nice,  3.1% system,  0.0% interrupt, 92.5% idle
CPU 7:   2.0% user,  0.0% nice,  2.0% system,  0.8% interrupt, 95.3% idle
CPU 8:   2.7% user,  0.0% nice,  4.3% system,  0.8% interrupt, 92.2% idle
CPU 9:   4.3% user,  0.0% nice,  2.7% system,  0.0% interrupt, 93.0% idle
CPU 10:  5.1% user,  0.0% nice,  3.9% system,  0.0% interrupt, 91.0% idle
CPU 11:  3.5% user,  0.0% nice,  3.5% system,  0.0% interrupt, 92.9% idle
Mem: 3185M Active, 21G Inact, 6442M Wired, 4917M Buf, 16G Free
Swap: 16G Total, 16G Free

   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
10470 mysql       510  20    0 10130M  7461M sbwait  7 122:22  1.03% mysqld
29487 root          1  20    0 72052K 10660K select  7   0:10  0.63% sshd
30144 www           1  20    0   308M 37664K select 10   0:00  0.10% httpd
29962 www           1  20    0   308M 38776K select  0   0:00  0.05% httpd
30001 www           1  20    0   308M 38828K select  4   0:00  0.05% httpd
30174 www           1  20    0   308M 37500K select  0   0:00  0.05% httpd
30181 www           1  20    0   308M 37580K select  9   0:00  0.05% httpd
30179 www           1  20    0   308M 37632K select 11   0:00  0.05% httpd
.
.
.

That's it :)



