Manual Recovery from an OpenStack Compute Node Failure

January 29, 2021

When a compute node fails in a running OpenStack deployment, recovery is surprisingly simple as long as the backups are in good shape.

If you still have each instance's UUID directory from the compute node, the instances can be recovered with a few database edits.

This procedure was tested on the OpenStack Mitaka and Queens releases, where recovery completed without problems. The test assumed that one specific compute node had failed.

  1. Check the current state of the instances

    # nova list
    +--------------------------------------+------+---------+------------+-------------+------------------------------------+
    | ID                                   | Name | Status  | Task State | Power State | Networks                           |
    +--------------------------------------+------+---------+------------+-------------+------------------------------------+
    | 1ae13320-cee8-4bfe-922f-7d84bf60dba4 | jyh1 | SHUTOFF | -          | Running     | jyh1=192.168.1.134, 115.xx.xxx.223 |
    | 6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03 | jyh2 | SHUTOFF | -          | Running     | jyh1=192.168.1.135, 115.xx.xxx.224 |
    | ca46b44b-4bf7-4bde-92df-606f3ad2f084 | jyh3 | SHUTOFF | -          | Running     | jyh1=192.168.1.136, 115.xx.xxx.225 |
    | 1351f22d-95a6-4a25-8299-1befa56ec690 | jyh4 | SHUTOFF | -          | Running     | jyh1=192.168.1.137, 115.xx.xxx.226 |
    | d0a7eaab-b9bf-4021-a855-25a3cfee4964 | jyh5 | SHUTOFF | -          | Running     | jyh1=192.168.1.138, 115.xx.xxx.227 |
    | 43e6a6d4-d5dc-45c9-853a-6b45188f564f | jyh6 | SHUTOFF | -          | Running     | jyh1=192.168.1.139, 115.xx.xxx.228 |
    +--------------------------------------+------+---------+------------+-------------+------------------------------------+

    # for i in $(nova list --all-tenants | awk '{print $2}'); do echo $i; nova show $i | grep hyper; done
    1ae13320-cee8-4bfe-922f-7d84bf60dba4
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute2 |

    6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |

    ca46b44b-4bf7-4bde-92df-606f3ad2f084
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |

    1351f22d-95a6-4a25-8299-1befa56ec690
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute2 |

    d0a7eaab-b9bf-4021-a855-25a3cfee4964
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute2 |

    43e6a6d4-d5dc-45c9-853a-6b45188f564f
    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |

    # Check which host each instance runs on. Shut the instances down where possible to protect data integrity.
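The per-instance check above can be narrowed down to just the UUIDs that live on the failed node. The following is a minimal sketch and not part of the original procedure: the function name `instances_on_host` is hypothetical, and it assumes its input has the same shape as the loop output shown above (a bare UUID line followed by that instance's `hypervisor_hostname` row).

```shell
# Hypothetical helper: print the UUIDs whose hypervisor_hostname equals $1.
# Reads the loop output on stdin (a UUID line, then its hypervisor_hostname row).
instances_on_host() {
  awk -v host="$1" -F'|' '
    # A non-empty line without "|" is taken to be a UUID.
    NF == 1 {
      line = $1
      gsub(/[ \t]/, "", line)
      if (line != "") uuid = line
      next
    }
    # Field 3 of the table row holds the hostname.
    /hypervisor_hostname/ {
      h = $3
      gsub(/[ \t]/, "", h)
      if (h == host && uuid != "") print uuid
      uuid = ""
    }
  '
}
```

Piping the loop from this step into `instances_on_host compute1` should then list only the instances that need to be moved.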

  2. Copy the instance files from the failed node (compute1 → compute2)

    root@compute1:/var/lib/nova/instances# scp -r 43e6a6d4-d5dc-45c9-853a-6b45188f564f nova@compute2:/var/lib/nova/instances/
    Warning: Permanently added '[compute2],[172.16.210.47]' (ECDSA) to the list of known hosts.
    console.log 100% 35KB 34.8KB/s 00:00
    disk 100% 20MB 19.7MB/s 00:00
    disk.info 100% 79 0.1KB/s 00:00
    libvirt.xml 100% 2648 2.6KB/s 00:00

    root@compute1:/var/lib/nova/instances# scp -r 6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03 nova@compute2:/var/lib/nova/instances/
    Warning: Permanently added '[compute2],[172.16.210.47]' (ECDSA) to the list of known hosts.
    console.log 100% 35KB 35.1KB/s 00:00
    disk 100% 20MB 19.9MB/s 00:00
    disk.info 100% 79 0.1KB/s 00:00
    libvirt.xml 100% 2648 2.6KB/s 00:00

    root@compute1:/var/lib/nova/instances# scp -r ca46b44b-4bf7-4bde-92df-606f3ad2f084/ nova@compute2:/var/lib/nova/instances/
    Warning: Permanently added '[compute2],[172.16.210.47]' (ECDSA) to the list of known hosts.
    console.log 100% 35KB 35.0KB/s 00:00
    disk 100% 20MB 19.7MB/s 00:00
    disk.info 100% 79 0.1KB/s 00:00
    libvirt.xml

    # The backing OS images in the _base directory are not copied. If scp fails with I/O errors, use rsync instead.
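For the rsync fallback mentioned above, a loop along the following lines could be used. This is a sketch under assumptions: the helper name `print_copy_commands` is hypothetical, the destination `nova@compute2:/var/lib/nova/instances/` is taken from the scp commands above, and the rsync options (`-a` to preserve permissions and timestamps, `--partial` to keep partially transferred files, `--sparse` to handle sparse disk files efficiently) are a suggestion rather than what the original test used. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical helper: print one rsync command per instance UUID.
# $1 is the destination host; the remaining arguments are UUIDs.
print_copy_commands() {
  dest_host="$1"
  shift
  for uuid in "$@"; do
    # No trailing slash on the source, so the UUID directory itself is copied.
    printf 'rsync -a --partial --sparse /var/lib/nova/instances/%s nova@%s:/var/lib/nova/instances/\n' \
      "$uuid" "$dest_host"
  done
}

print_copy_commands compute2 \
  43e6a6d4-d5dc-45c9-853a-6b45188f564f \
  6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03 \
  ca46b44b-4bf7-4bde-92df-606f3ad2f084
```

Printing instead of executing keeps the step auditable; once the output looks right, it can be piped to `sh` on the failed node.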

  3. Update the instance records in the database

    [nova]> select launched_on,node,host from instances where uuid='43e6a6d4-d5dc-45c9-853a-6b45188f564f';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute1    | compute1 | compute1 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)

    [nova]> select launched_on,node,host from instances where uuid='6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute1    | compute1 | compute1 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)

    [nova]> select launched_on,node,host from instances where uuid='ca46b44b-4bf7-4bde-92df-606f3ad2f084';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute1    | compute1 | compute1 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)

    [nova]> update instances set launched_on='compute2',node='compute2',host='compute2' where uuid='43e6a6d4-d5dc-45c9-853a-6b45188f564f';
    Query OK, 1 row affected (0.03 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    [nova]> update instances set launched_on='compute2',node='compute2',host='compute2' where uuid='6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    [nova]> update instances set launched_on='compute2',node='compute2',host='compute2' where uuid='ca46b44b-4bf7-4bde-92df-606f3ad2f084';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    [nova]> select launched_on,node,host from instances where uuid='43e6a6d4-d5dc-45c9-853a-6b45188f564f';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute2    | compute2 | compute2 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)

    [nova]> select launched_on,node,host from instances where uuid='6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute2    | compute2 | compute2 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)

    [nova]> select launched_on,node,host from instances where uuid='ca46b44b-4bf7-4bde-92df-606f3ad2f084';
    +-------------+----------+----------+
    | launched_on | node     | host     |
    +-------------+----------+----------+
    | compute2    | compute2 | compute2 |
    +-------------+----------+----------+
    1 row in set (0.00 sec)
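When many instances have to be moved, typing each UPDATE by hand invites mistakes. The sketch below generates the same statements as in this step for any list of UUIDs. The helper name `gen_update_sql` is hypothetical; the column names (`launched_on`, `node`, `host`) come from the queries above. It validates the host name and each UUID before printing, since the values end up inside SQL string literals.

```shell
# Hypothetical helper: print UPDATE statements that point instances at a new host.
# $1 is the target host; the remaining arguments are instance UUIDs.
gen_update_sql() {
  target="$1"
  shift
  # Accept only plain hostname characters for the target.
  case "$target" in
    ''|*[!A-Za-z0-9._-]*) echo "invalid host: $target" >&2; return 1 ;;
  esac
  for uuid in "$@"; do
    # Skip anything that does not look like a UUID.
    if ! printf '%s\n' "$uuid" | grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
      echo "skipping invalid uuid: $uuid" >&2
      continue
    fi
    printf "update instances set launched_on='%s',node='%s',host='%s' where uuid='%s';\n" \
      "$target" "$target" "$target" "$uuid"
  done
}

gen_update_sql compute2 \
  43e6a6d4-d5dc-45c9-853a-6b45188f564f \
  6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03 \
  ca46b44b-4bf7-4bde-92df-606f3ad2f084
```

The output is meant to be reviewed and then pasted into the `[nova]>` prompt, ideally after taking a database backup.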

  4. Start the instances

    # nova start 6ca379b3-0c2b-4f2b-8274-e7ad97f7fc03
    # nova start ca46b44b-4bf7-4bde-92df-606f3ad2f084
    # nova start 43e6a6d4-d5dc-45c9-853a-6b45188f564f

  5. Update the hypervisor information in the database

    [nova]> select vcpus_used from compute_nodes where hypervisor_hostname="compute2";
    +------------+
    | vcpus_used |
    +------------+
    | 0          |
    +------------+
    1 row in set (0.01 sec)

    [nova]> select running_vms from compute_nodes where hypervisor_hostname="compute2";
    +-------------+
    | running_vms |
    +-------------+
    | 0           |
    +-------------+
    1 row in set (0.00 sec)

    [nova]> update compute_nodes set running_vms="3" where hypervisor_hostname="compute2";
    Query OK, 1 row affected (0.01 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    [nova]> update compute_nodes set vcpus_used="6" where hypervisor_hostname="compute2";
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    [nova]> select vcpus_used from compute_nodes where hypervisor_hostname="compute2";
    +------------+
    | vcpus_used |
    +------------+
    | 6          |
    +------------+
    1 row in set (0.00 sec)

    [nova]> select running_vms from compute_nodes where hypervisor_hostname="compute2";
    +-------------+
    | running_vms |
    +-------------+
    | 3           |
    +-------------+
    1 row in set (0.01 sec)
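The values 3 and 6 above were entered by hand. As a rough cross-check, the counts can also be derived from the `instances` table. The sketch below only prints such a query. It is not part of the original procedure: the helper name `gen_usage_sql` is hypothetical, and it assumes the `instances` table has `vcpus` and `deleted` columns and that every non-deleted instance on the host should be counted.

```shell
# Hypothetical helper: print a query that counts the instances and vCPUs
# recorded against host $1, as a cross-check for the hand-entered values.
# Assumes the instances table has `vcpus` and `deleted` columns.
gen_usage_sql() {
  # Accept only plain hostname characters.
  case "$1" in
    ''|*[!A-Za-z0-9._-]*) echo "invalid host: $1" >&2; return 1 ;;
  esac
  printf "select count(*) as running_vms, coalesce(sum(vcpus),0) as vcpus_used from instances where host='%s' and deleted=0;\n" "$1"
}

gen_usage_sql compute2
```

If the result differs from what was written to `compute_nodes`, the difference is worth investigating before relying on either number.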

Category: Virtualization/Cloud

Jang Smile
