

Handling an Abnormally Down NIC on an Exadata DB Node

IT那活兒

Background

A routine inspection found that the ib1 interface on a DB node was down. The link state was also down, and cycling the interface with ifconfig ib1 down/up did not bring it back.
  • Environment
    Exadata X8-2
    Image version: 21.2.6
1.1 The ib1 interface is not in the RUNNING state
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
inet 192.168.XX.35 netmask 255.255.252.0 broadcast 192.168.XX.255
inet6 fe80::ba59:9f03:91:7fd1 prefixlen 64 scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband 80:00:02:08:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 441801234 bytes 135108307679 (125.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 571055390 bytes 386080892808 (359.5 GiB)
TX errors 0 dropped 200 overruns 0 carrier 0 collisions 0

ib1: flags=4099<UP,BROADCAST,MULTICAST> mtu 65520
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband 80:00:02:09:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 44482253 bytes 15202598429 (14.1 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16442693 bytes 4497934510 (4.1 GiB)
TX errors 0 dropped 16 overruns 0 carrier 0 collisions 0

ib0:P02: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
inet 192.168.XX.36 netmask 255.255.252.0 broadcast 192.168.XX.255
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband 80:00:02:08:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 64915590345 bytes 15540910225759 (14.1 TiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 64915590345 bytes 15540910225759 (14.1 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
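
The numeric flags value on each interface line is a bit mask, so the missing RUNNING state on ib1 can be read straight from the numbers 4163 and 4099. The following is a minimal sketch that decodes them, assuming the standard Linux interface flag bits. Only the bits relevant to this output are listed, and the helper names decode_flags and is_running are made up for illustration.

```python
# Subset of the Linux interface flag bits (IFF_* values from <linux/if.h>).
# Only the bits that appear in the output above are listed here.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}

IFF_RUNNING = 0x40


def decode_flags(value: int) -> list[str]:
    """Return the names of the known flag bits that are set in value."""
    return [name for bit, name in IFF_FLAGS.items() if value & bit]


def is_running(value: int) -> bool:
    """True when the RUNNING bit is set, i.e. the driver reports a usable link."""
    return bool(value & IFF_RUNNING)


if __name__ == "__main__":
    # Flag values copied from the ifconfig output above.
    for name, flags in [("ib0", 4163), ("ib1", 4099), ("lo", 73)]:
        state = "running" if is_running(flags) else "NOT running"
        print(f"{name}: {flags} -> {decode_flags(flags)} ({state})")
```

ib1 is administratively UP but lacks RUNNING, which matches a port that is enabled on the host but has no link.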
1.2 The link state shows DOWN
Infiniband device mlx4_0 port 1 status:
default gid: fe80:0000:0000:0000:b859:9f03:0091:7fd1
base lid: 0x2b
sm lid: 0x13
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
link_layer: InfiniBand

Infiniband device mlx4_0 port 2 status:
default gid: fe80:0000:0000:0000:b859:9f03:0091:7fd2
base lid: 0x2c
sm lid: 0x13
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand
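
When many nodes have to be checked, a per-port status listing like the one above can be scanned automatically. The sketch below assumes the text layout shown here (a header line per port followed by "key: value" lines). The function names parse_ibstatus and unhealthy_ports are hypothetical, and the sample is abbreviated from the output above.

```python
import re

HEADER_RE = re.compile(r"Infiniband device (\S+) port (\d+) status:")

SAMPLE = """\
Infiniband device mlx4_0 port 1 status:
default gid: fe80:0000:0000:0000:b859:9f03:0091:7fd1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)

Infiniband device mlx4_0 port 2 status:
default gid: fe80:0000:0000:0000:b859:9f03:0091:7fd2
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
"""


def parse_ibstatus(text: str) -> list[dict]:
    """Split the status text into one dict per device port."""
    ports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"device": header.group(1), "port": int(header.group(2))}
            ports.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; values such as GIDs contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def unhealthy_ports(ports: list[dict]) -> list[dict]:
    """Ports whose logical state is not ACTIVE or whose physical state is not LinkUp."""
    return [
        p for p in ports
        if not p.get("state", "").endswith("ACTIVE")
        or not p.get("phys state", "").endswith("LinkUp")
    ]


if __name__ == "__main__":
    for p in unhealthy_ports(parse_ibstatus(SAMPLE)):
        print(p["device"], "port", p["port"], "->", p["state"], "/", p["phys state"])
```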
1.3 The DB node alert log reports an abnormal IB switch port
359_1 2022-02-05T22:28:34+08:00 critical "InfiniBand port 
HCA-1:2 is in an invalid state. State : Down Physical Link 
State : Polling Data Rate : 10 Gps Symbol Errors : 0 Received Errors : 0"

At this point a damaged NIC can be ruled out, so the next step is to check the status of every IB switch.


Root Cause Analysis

Running the checks on all IB switches showed that iba01 had a problem.
Environment test started:
Starting Environment Daemon test:
Environment daemon running
Environment Daemon test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.25 V
Measured 3.3V Standby = 3.37 V
Measured 12V = 11.97 V
Measured 5V = 4.99 V
Measured VBAT = 3.03 V
Measured 2.5V = 2.48 V
Measured 1.8V = 1.78 V
Measured I4 1.2V = 1.21 V
Voltage test returned OK
Starting PSU test:
PSU 0 present OK
PSU 1 present OK
PSU test returned OK
Starting Temperature test:
Back temperature 29
Front temperature 31
SP temperature 46
Switch temperature 57, maxtemperature 60
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 15478
Fan 2 running at rpm 15696
Fan 3 running at rpm 15696
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Starting SSD test:
SSD test returned OK
Starting Auto-link-disable test:
WARNING Autodisabled ports
Auto-link-disable test returned 1 faults
Environment test FAILED
# listlinkup
Connector 0A Present <-> Switch Port 20 is up (Enabled)
Connector 1A Present <-> Switch Port 22 is down (AutomaticHighErrorRate)
Connector 2A Present <-> Switch Port 24 is down (Enabled)
Connector 3A Present <-> Switch Port 26 is up (Enabled)
Connector 4A Present <-> Switch Port 28 is up (Enabled)
Connector 5A Present <-> Switch Port 30 is up (Enabled)
# showunhealthy
WARNING Autodisabled ports
FAILURE - 1 sensors NOT OK
vendid=0x2c9
devid=0x1003
sysimgguid=0xb8599f0300917fd3
caguid=0xb8599f0300917fd0
Ca 2 "H-b8599f0300917fd0" # "exdadbadm10 S 192.168.XX.35,192.168.XX.36 HCA-1"
[1](b8599f0300917fd1) "S-0010e0cdd353a0a0"[22] # lid 43 lmc 0 "SUN DCS 36P QDR exdasw-ibb01 168.168.XX.37" lid 26 4xQDR

This confirms that switch port 22, the port ib1 connects to, reported errors and was shut down because of a high error rate.
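
The distinction that matters in the listlinkup output is between a port that is merely down and one the switch disabled on its own. A rough sketch of that filter follows, assuming the line format shown above and assuming that automatic shutdown reasons start with "Automatic" (only AutomaticHighErrorRate appears in this case). The names parse_listlinkup and auto_disabled are illustrative.

```python
import re

# Matches lines such as:
# Connector 1A Present <-> Switch Port 22 is down (AutomaticHighErrorRate)
LINK_RE = re.compile(
    r"Connector (\S+) Present <-> Switch Port (\d+) is (up|down) \((\w+)\)"
)

SAMPLE = """\
Connector 0A Present <-> Switch Port 20 is up (Enabled)
Connector 1A Present <-> Switch Port 22 is down (AutomaticHighErrorRate)
Connector 2A Present <-> Switch Port 24 is down (Enabled)
Connector 3A Present <-> Switch Port 26 is up (Enabled)
"""


def parse_listlinkup(text: str) -> list[dict]:
    """Return one record per connector line that matches the expected format."""
    links = []
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if m:
            connector, port, state, reason = m.groups()
            links.append(
                {"connector": connector, "port": int(port),
                 "state": state, "reason": reason}
            )
    return links


def auto_disabled(links: list[dict]) -> list[dict]:
    """Ports that are down because the switch disabled them automatically.

    Assumption: automatic reasons share the "Automatic" prefix.
    """
    return [
        l for l in links
        if l["state"] == "down" and l["reason"].startswith("Automatic")
    ]


if __name__ == "__main__":
    for link in auto_disabled(parse_listlinkup(SAMPLE)):
        print(f"port {link['port']} ({link['connector']}): {link['reason']}")
```

Port 24 is also down but still marked Enabled, so it is not reported: nothing has disabled it on the switch side.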


Resolution Steps

3.1 First, clear the error information
[root@exdasw-iba01 ~]# ibclearerrors

## Summary: 35 nodes cleared 0 errors
[root@exdasw-iba01 ~]#
[root@exdasw-iba01 ~]#
[root@exdasw-iba01 ~]# ibclearcounters

## Summary: 35 nodes cleared 0 errors
3.2 Enable the IB switch port and verify
[root@exdasw-iba01 ~]# enableswitchport --automatic 22

[root@exdasw-iba01 ~]# listlinkup
Connector 0A Present <-> Switch Port 20 is up (Enabled)
Connector 1A Present <-> Switch Port 22 is up (Enabled)

Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Starting SSD test:
SSD test returned OK
Starting Auto-link-disable test:
Auto-link-disable test returned OK
The status is back to normal.
Checking the DB node's ib1 interface again shows that it is already in the RUNNING state, with no manual intervention needed.
3.3 Clear the alert log
[root@exdasw-iba01 ~]# spsh

Oracle(R) Integrated Lights Out Manager

Version 2.2.16-3 ILOM 3.2.11 r137127

Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: exdasw-iba01

-> show faulty
Target | Property | Value
----------------------------------+----------------------------------------+-----------------------------------------------------------
/SP/faultmgmt/0                   | fru | /SYS
/SP/faultmgmt/0/faults/0          | class | fault.device.ib.auto-link-disable
/SP/faultmgmt/0/faults/0          | sunw-msg-id | ---
/SP/faultmgmt/0/faults/0          | component | /SYS
/SP/faultmgmt/0/faults/0          | uuid | 8f0fd5cf-661b-e0f5-db37-cfbcbd7b673a
/SP/faultmgmt/0/faults/0          | timestamp | 2022-02-02/10:18:11


-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2022-02-05/22:22:57 57471550-2815-eba0-ef6b-f46fa7788bad IBSWITCH-8000-D6 Major

Fault class : fault.chassis.device.ib.link-error

FRU : /SYS
(Part Number: 7305544)
(Serial Number: AK00417399)

Description : One or more ports have been auto-disabled due to high error
rate or bad link speed or width.

Response : Illuminate service-required LED on the chassis.

Impact : One or more ports have been auto-disabled due to high error
rate or bad link speed or width.

Action : Please refer to the associated reference document at %s for
the latest service procedures and policies regarding this
diagnosis.

faultmgmtsp> fmadm repair 57471550-2815-eba0-ef6b-f46fa7788bad
faultmgmtsp> fmadm faulty
No problems found
faultmgmtsp> show faulty
Invalid command show - type help for a list of commands.

faultmgmtsp> exit
-> show faulty
Target | Property | Value
------------------------------------------------+--------------------------------------------------------+---------------------------------------------------------------------------------

->
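
The repair step above takes the UUID printed by fmadm faulty. If several faults are listed, the UUIDs can be pulled out of the captured text instead of being copied by hand. This is only a sketch under the assumption that each fault row has the four columns shown above (time, UUID, msgid, severity). It builds the command strings and does not run anything, and the names parse_faults and repair_commands are hypothetical.

```python
import re

# A fault row looks like:
# 2022-02-05/22:22:57 57471550-2815-eba0-ef6b-f46fa7788bad IBSWITCH-8000-D6 Major
ROW_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(\S+)\s+(\S+)$"
)

SAMPLE = """\
------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2022-02-05/22:22:57 57471550-2815-eba0-ef6b-f46fa7788bad IBSWITCH-8000-D6 Major

Fault class : fault.chassis.device.ib.link-error
"""


def parse_faults(text: str) -> list[dict]:
    """Extract the fault rows; separator, header and detail lines are skipped."""
    faults = []
    for raw in text.splitlines():
        m = ROW_RE.match(raw.strip())
        if m:
            time, uuid, msgid, severity = m.groups()
            faults.append(
                {"time": time, "uuid": uuid, "msgid": msgid, "severity": severity}
            )
    return faults


def repair_commands(text: str) -> list[str]:
    """Build one 'fmadm repair <uuid>' line per fault for manual review."""
    return [f"fmadm repair {f['uuid']}" for f in parse_faults(text)]


if __name__ == "__main__":
    for cmd in repair_commands(SAMPLE):
        print(cmd)
```

Printing the commands rather than executing them leaves room to confirm that the underlying port problem is really fixed before a fault is marked repaired.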
Summary
When a NIC misbehaves, the hardware is not necessarily damaged. Check the running state of every component along the link.


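Author: 湯杰 (上海新炬中北 team)

Source: the “IT那活兒” public account

Copyright belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite this article's address: http://m.specialneedsforspecialkids.com/yun/129136.html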
