Ceph is a distributed storage system: open-source software for storing and managing large volumes of data reliably and at scale. It provides object storage, block storage, and more, making data easy to store and manage, and it ensures reliability by distributing data across many nodes.
Ceph-Ansible is a tool for deploying and maintaining Ceph clusters. It is an open-source tool that uses YAML files to configure and manage many servers, and with it a Ceph cluster can be deployed quickly and efficiently.
Ceph components
Monitor (mon):
The Monitor maintains the cluster state and map information and interacts with the other Ceph daemons. At least three Monitors are required for a highly available configuration.
Manager (mgr):
The Manager collects and exposes cluster state, performance statistics, and monitoring information.
Object Storage Daemon (OSD):
OSDs store and manage the cluster's actual data. They distribute, replicate, and recover data, guaranteeing its reliability and availability.
## Set the hostname (node0, node1, node2)
$ hostnamectl set-hostname $hostname
## Disable SELinux
$ vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled        # changed from enforcing
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
## Firewall and port configuration
# Either stop firewalld/iptables, or add allow rules for these ports:
# 80/tcp 2003/tcp 4505/tcp 4506/tcp 6800-7300/tcp 6789/tcp 8443/tcp 9100/tcp 9283/tcp 3000/tcp 9092/tcp 9093/tcp 9094/tcp 9094/udp 8080/tcp
# This article does not use a firewall.
$ systemctl stop firewalld
$ systemctl disable firewalld
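If you keep firewalld instead, the allow rules would look roughly like the following (only part of the port list above is shown; adjust to the ports you actually need):
$ firewall-cmd --permanent --add-port=6789/tcp --add-port=6800-7300/tcp --add-port=8443/tcp --add-port=9283/tcp --add-port=3000/tcp
$ firewall-cmd --reload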
Package installation
Ansible performs the deployment over SSH, so we first install the minimum required packages and register the hosts so that Ansible can reach every node.
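A minimal sketch of that preparation, assuming node0 is the deploy node and using the host addresses from the inventory shown later (the package names are examples and vary by distribution):
## node0 (deploy)
$ apt install -y git python3-venv python3-pip    # or the dnf/yum equivalents
$ cat >> /etc/hosts << EOF
10.101.0.14 node0
10.101.0.4  node1
10.101.0.15 node2
EOF
$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
$ ssh-copy-id root@node1
$ ssh-copy-id root@node2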
Because each node has only one network interface, a VIP is created on each node to use for the cluster_network.
## node0
$ ifconfig ens3:1 192.168.102.2 netmask 255.255.255.0 up
## node1
$ ifconfig ens3:1 192.168.102.3 netmask 255.255.255.0 up
## node2
$ ifconfig ens3:1 192.168.102.4 netmask 255.255.255.0 up
## Alternatively, use netplan to create a VLAN in a format like the one below.
network:
  version: 2
  vlans:
    vlan.2:
      id: 2
      link: ens3
      addresses: [192.168.102.2/24]
Comment – 23.11.23
I intended to create a VLAN and use it as the cluster network, but ceph-ansible would not accept the VLAN interface (the interface name contains a special character, which causes a syntax error; it looks solvable, but I have not found out how yet). So instead of a VLAN, the public IP is used.
Downloading Ceph-ansible
ceph-ansible is distributed via a git repository and can be obtained with git clone.
## node0(deploy)
$ cd ~
$ git clone https://github.com/ceph/ceph-ansible.git
$ cd ~/ceph-ansible/
$ git branch -a
~
remotes/origin/stable-2.1
remotes/origin/stable-2.2
remotes/origin/stable-3.0
remotes/origin/stable-3.1
remotes/origin/stable-3.2
remotes/origin/stable-4.0
remotes/origin/stable-5.0
remotes/origin/stable-6.0
remotes/origin/stable-6.0-ceph-volume-tests
remotes/origin/stable-7.0
~
$ git checkout stable-6.0
Switched to branch 'stable-6.0'
Your branch is up to date with 'origin/stable-6.0'.
Ceph is deployed from a stable branch, and each branch supports different Ceph versions:
stable-3.0 Supports Ceph versions jewel and luminous. This branch requires Ansible version 2.4.
stable-3.1 Supports Ceph versions luminous and mimic. This branch requires Ansible version 2.4.
stable-3.2 Supports Ceph versions luminous and mimic. This branch requires Ansible version 2.6.
stable-4.0 Supports Ceph version nautilus. This branch requires Ansible version 2.9.
stable-5.0 Supports Ceph version octopus. This branch requires Ansible version 2.9.
stable-6.0 Supports Ceph version pacific. This branch requires Ansible version 2.10.
stable-7.0 Supports Ceph version quincy. This branch requires Ansible version 2.12.
main Supports the main (devel) branch of Ceph. This branch requires Ansible version 2.12.
Switch to the branch that matches the version you want to deploy. I initially used stable-7.0 to get quincy, but unexplained syntax errors kept occurring, so I installed pacific from stable-6.0 instead.
Comment – 23.11.23
The syntax error I hit while using the quincy version turned out to be because the container tag was set to latest-quincy in the group_vars/all.yml file described below; after changing it to latest-master, the quincy version also installs normally.
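For reference, the variable in question is ceph-ansible's container image tag in group_vars/all.yml; a minimal sketch of the fix described above:
# group_vars/all.yml
ceph_docker_image: ceph/daemon
ceph_docker_image_tag: latest-master    # latest-quincy triggered the syntax error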
Python-venv settings
Deploying Ceph with ceph-ansible requires packages such as ansible, six, and netaddr. While looking for a way to keep separate environments so I could test different Ceph versions, I came across this tip: python-venv creates an isolated Python virtual environment. The environment only applies while you are inside it, so it does not affect anything else. Set it up as shown below, then install the contents of ceph-ansible's requirements.txt.
$ cd ~
$ python -m venv stable-6.0
$ source ~/stable-6.0/bin/activate # enter the virtual environment; type deactivate to leave it
(stable-6.0)$ cd ~/ceph-ansible
(stable-6.0)$ pip3 install -r requirements.txt # installs the required packages (ansible, netaddr, six)
If you want to test another version, deactivate the current environment, create a new venv, switch the ceph-ansible branch to the desired version, and install its requirements again. Managing several versions directly on the local system is cumbersome, so this is a convenient way to run multiple versions side by side.
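For example, switching to stable-7.0 would look roughly like this:
$ deactivate
$ python -m venv stable-7.0
$ source ~/stable-7.0/bin/activate
(stable-7.0)$ cd ~/ceph-ansible
(stable-7.0)$ git checkout stable-7.0
(stable-7.0)$ pip3 install -r requirements.txt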
Writing the Ceph-ansible inventory
List each node's host information and role. The directory location and file name do not matter.
$ cat inventory
[mons]
node0 ansible_host=10.101.0.14
node1 ansible_host=10.101.0.4
node2 ansible_host=10.101.0.15
[osds]
node0
node1
node2
[mgrs]
node0
node1
node2
[grafana-server]
node0
# Combine the mons, osds, and mgrs groups into a single all group
# With the lines below, every host listed under mons, osds, and mgrs becomes a member of the all group
[all:children]
mons
osds
mgrs
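Before deploying, it is worth confirming that Ansible can actually reach every host in the inventory; a quick check with the built-in ping module:
(stable-6.0)$ ansible -i inventory all -m ping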
Writing group_vars/all.yml
This file is where you configure the networks to use, the Ceph version to install, the dashboard settings, and so on.
I used the settings below. The Ceph version must always match the branch you checked out.
$ cp group_vars/all.yml.sample group_vars/all.yml
$ cat group_vars/all.yml
dummy:
cluster: ceph
ntp_service_enabled: true
ntp_daemon_type: chronyd
monitor_interface: ens3
#monitor_address: 192.168.101.2 # not needed from the octopus release onward
osd_auto_discovery – if set to true, devices are reportedly discovered automatically without having to define devices; I left it disabled.
osd_scenario: lvm – OSDs use LVM by default, so this sets partitioning to LVM.
osd_objectstore: bluestore
This is described as the setting for the object store backend, but to be honest I do not fully understand it yet. The link below seems to explain it well.
https://www.ibm.com/docs/en/storage-ceph/5?topic=components-ceph-bluestore
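For completeness, here is a hedged sketch of the other variables this deployment relies on. The networks are assumed from the inventory addresses, the credentials and device names are placeholders, and the variable names come from ceph-ansible's all.yml.sample and osds.yml.sample:
# group_vars/all.yml (continued)
containerized_deployment: true
ceph_docker_image: ceph/daemon
ceph_docker_image_tag: latest-pacific
public_network: 10.101.0.0/24
cluster_network: 10.101.0.0/24
dashboard_enabled: true
dashboard_admin_user: admin
dashboard_admin_password: p@ssw0rd
grafana_admin_user: admin
grafana_admin_password: p@ssw0rd
# group_vars/osds.yml
osd_auto_discovery: false
devices:
  - /dev/vdb
  - /dev/vdc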
Occasionally the LVM volumes are not removed properly even when purge-container-cluster.yml is used. In that case, check the active LVM volumes with lvs or lvmdiskscan, delete them with lvremove $DEVICE, and then redeploy.
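A sketch of that cleanup-and-redeploy cycle (the volume group and logical volume names are placeholders; check the real ones in the lvs output):
(stable-6.0)$ ansible-playbook -i inventory infrastructure-playbooks/purge-container-cluster.yml
$ lvs
$ lvremove -y ceph-xxxx/osd-block-xxxx    # placeholder VG/LV names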
Comment – option descriptions
-i : specifies the inventory (the target hosts the playbook runs against)
-e : passes extra variables, either as key=value pairs or from a file
-vvv : verbose; shows detailed output (it saved me a lot of debugging time)
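For reference, the deployment itself is run with the containerized site playbook, roughly as follows (site-container.yml is copied from the sample file shipped in the ceph-ansible repository):
(stable-6.0)$ cp site-container.yml.sample site-container.yml
(stable-6.0)$ ansible-playbook -vvv -i inventory site-container.yml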
Once all of the ceph-ansible tasks have completed successfully, check the deployed containers on each node with the docker command.
## node0
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
faf8f5a2cb8e grafana/grafana:6.7.4 "/run.sh" 26 minutes ago Up 26 minutes grafana-server
740f6c4fa0d5 prom/prometheus:v2.7.2 "/bin/prometheus --c…" 32 minutes ago Up 32 minutes prometheus
6ca225e5605f prom/alertmanager:v0.16.2 "/bin/alertmanager -…" 32 minutes ago Up 32 minutes alertmanager
913268b46a99 prom/node-exporter:v0.17.0 "/bin/node_exporter …" 33 minutes ago Up 33 minutes node-exporter
d419d8fe49d3 ceph/daemon:latest-pacific "/opt/ceph-container…" 33 minutes ago Up 33 minutes ceph-osd-3
2179f191cb9c ceph/daemon:latest-pacific "/opt/ceph-container…" 33 minutes ago Up 33 minutes ceph-osd-0
cce8ff7ad886 ceph/daemon:latest-pacific "/opt/ceph-container…" 35 minutes ago Up 35 minutes ceph-mgr-node0
c79447d4db87 ceph/daemon:latest-pacific "/opt/ceph-container…" 36 minutes ago Up 36 minutes ceph-mon-node0
## node1
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a1311404428d prom/node-exporter:v0.17.0 "/bin/node_exporter …" 33 minutes ago Up 33 minutes node-exporter
2b6e9dd65411 ceph/daemon:latest-pacific "/opt/ceph-container…" 34 minutes ago Up 34 minutes ceph-osd-5
84d2ed4ca50e ceph/daemon:latest-pacific "/opt/ceph-container…" 34 minutes ago Up 34 minutes ceph-osd-2
bb0c4af6651a ceph/daemon:latest-pacific "/opt/ceph-container…" 35 minutes ago Up 35 minutes ceph-mgr-node1
a1593bd9f46f ceph/daemon:latest-pacific "/opt/ceph-container…" 36 minutes ago Up 36 minutes ceph-mon-node1
## node2
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ffa62c2f513 prom/node-exporter:v0.17.0 "/bin/node_exporter …" 33 minutes ago Up 33 minutes node-exporter
50f208b3b737 ceph/daemon:latest-pacific "/opt/ceph-container…" 34 minutes ago Up 34 minutes ceph-osd-4
6728621fbd6a ceph/daemon:latest-pacific "/opt/ceph-container…" 34 minutes ago Up 34 minutes ceph-osd-1
8aa31cf5484b ceph/daemon:latest-pacific "/opt/ceph-container…" 35 minutes ago Up 35 minutes ceph-mgr-node2
d9dacfa21734 ceph/daemon:latest-pacific "/opt/ceph-container…" 37 minutes ago Up 37 minutes ceph-mon-node2
Once the containers have been deployed successfully, you can run ceph -s inside a container to check that the OSDs came up properly.
docker exec ceph-mon-node0 ceph -s
cluster:
id: 467ae2d9-123d-4bba-b391-dc3258094daa
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
services:
mon: 3 daemons, quorum node1,node0,node2 (age 34m)
mgr: node0(active, since 23m), standbys: node1, node2
osd: 6 osds: 6 up (since 32m), 6 in (since 32m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 32 MiB used, 300 GiB / 300 GiB avail
pgs: 1 active+clean
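The HEALTH_WARN about insecure global_id reclaim is common on a freshly deployed pacific cluster; once all clients are up to date, it can generally be cleared with the setting below (a side note, not required for the deployment itself):
docker exec ceph-mon-node0 ceph config set mon auth_allow_insecure_global_id_reclaim false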
Dashboard
For managing the nodes, the Ceph dashboard and Grafana are deployed as well. You can access the Ceph dashboard on port 8443 of the server where Grafana was deployed, using the account credentials configured in group_vars/all.yml.
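In this setup that means node0, for example (the credentials are whatever dashboard_admin_user / dashboard_admin_password were set to in group_vars/all.yml):
https://10.101.0.14:8443/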
Errors encountered during deployment…
With little hands-on experience in Ansible and Ceph, it was an eventful process. I am writing the errors down so I can look them up if the same problems occur again.
Issue caused by pre-partitioned disks on the target nodes
The following error occurred while deploying with ansible-playbook:
stderr: '--> RuntimeError: Device /dev/vdb has partitions.'
It was resolved by deleting the partitions on the nodes' disks.
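A sketch of wiping a disk before redeploying (this destroys everything on /dev/vdb; sgdisk comes from the gdisk package):
$ wipefs --all /dev/vdb
$ sgdisk --zap-all /dev/vdb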
Issue where the mon node keyrings could not be exchanged
TASK [ceph-mon : ceph monitor mkfs with keyring] ********************************************************
Wednesday 22 November 2023 14:27:27 +0900 (0:00:00.030) 0:01:02.642 ****
fatal: [osd1]: FAILED! => changed=true
cmd:
- docker
- run
- --rm
- --net=host
- -v
- /var/lib/ceph/:/var/lib/ceph:z
- -v
- /etc/ceph/:/etc/ceph/:z
- --entrypoint=ceph-mon
- docker.io/ceph/daemon:latest-pacific
- --cluster
- ceph
- --setuser
- '167'
- --setgroup
- '167'
- --mkfs
- -i
- osd1
- --fsid
- fe7f37b4-f783-4f36-8574-73c3e9c2d90e
- --keyring
- /var/lib/ceph/tmp/ceph.mon..keyring
delta: '0:00:00.629752'
end: '2023-11-22 14:27:28.673441'
msg: non-zero return code
rc: 1
start: '2023-11-22 14:27:28.043689'
stderr: |-
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mon/MonMap.h: In function 'void MonMap::add(const mon_info_t&)' thread 7f0537d64700 time 2023-11-22T05:27:28.459109+0000
TASK [ceph-osd : wait for all osd to be up] *************************************************************
Wednesday 22 November 2023 14:44:14 +0900 (0:00:01.942) 0:04:00.910 ****
skipping: [osd1]
FAILED - RETRYING: wait for all osd to be up (60 retries left).
FAILED - RETRYING: wait for all osd to be up (59 retries left).
FAILED - RETRYING: wait for all osd to be up (58 retries left).
FAILED - RETRYING: wait for all osd to be up (57 retries left).
FAILED - RETRYING: wait for all osd to be up (56 retries left).
FAILED - RETRYING: wait for all osd to be up (55 retries left).
FAILED - RETRYING: wait for all osd to be up (54 retries left).
FAILED - RETRYING: wait for all osd to be up (53 retries left).
FAILED - RETRYING: wait for all osd to be up (52 retries left).
FAILED - RETRYING: wait for all osd to be up (51 retries left).
FAILED - RETRYING: wait for all osd to be up (50 retries left).
FAILED - RETRYING: wait for all osd to be up (49 retries left).
FAILED - RETRYING: wait for all osd to be up (48 retries left).
FAILED - RETRYING: wait for all osd to be up (47 retries left).
FAILED - RETRYING: wait for all osd to be up (46 retries left).
FAILED - RETRYING: wait for all osd to be up (45 retries left).
FAILED - RETRYING: wait for all osd to be up (44 retries left).
FAILED - RETRYING: wait for all osd to be up (43 retries left).
FAILED - RETRYING: wait for all osd to be up (42 retries left).
FAILED - RETRYING: wait for all osd to be up (41 retries left).
FAILED - RETRYING: wait for all osd to be up (40 retries left).
FAILED - RETRYING: wait for all osd to be up (39 retries left).
FAILED - RETRYING: wait for all osd to be up (38 retries left).
FAILED - RETRYING: wait for all osd to be up (37 retries left).
FAILED - RETRYING: wait for all osd to be up (36 retries left).
FAILED - RETRYING: wait for all osd to be up (35 retries left).
FAILED - RETRYING: wait for all osd to be up (34 retries left).
FAILED - RETRYING: wait for all osd to be up (33 retries left).
FAILED - RETRYING: wait for all osd to be up (32 retries left).
FAILED - RETRYING: wait for all osd to be up (31 retries left).
FAILED - RETRYING: wait for all osd to be up (30 retries left).
FAILED - RETRYING: wait for all osd to be up (29 retries left).
FAILED - RETRYING: wait for all osd to be up (28 retries left).
FAILED - RETRYING: wait for all osd to be up (27 retries left).
FAILED - RETRYING: wait for all osd to be up (26 retries left).
FAILED - RETRYING: wait for all osd to be up (25 retries left).
FAILED - RETRYING: wait for all osd to be up (24 retries left).
FAILED - RETRYING: wait for all osd to be up (23 retries left).
FAILED - RETRYING: wait for all osd to be up (22 retries left).
FAILED - RETRYING: wait for all osd to be up (21 retries left).
FAILED - RETRYING: wait for all osd to be up (20 retries left).
FAILED - RETRYING: wait for all osd to be up (19 retries left).
FAILED - RETRYING: wait for all osd to be up (18 retries left).
FAILED - RETRYING: wait for all osd to be up (17 retries left).
FAILED - RETRYING: wait for all osd to be up (16 retries left).
FAILED - RETRYING: wait for all osd to be up (15 retries left).
FAILED - RETRYING: wait for all osd to be up (14 retries left).
FAILED - RETRYING: wait for all osd to be up (13 retries left).
FAILED - RETRYING: wait for all osd to be up (12 retries left).
FAILED - RETRYING: wait for all osd to be up (11 retries left).
FAILED - RETRYING: wait for all osd to be up (10 retries left).
FAILED - RETRYING: wait for all osd to be up (9 retries left).
FAILED - RETRYING: wait for all osd to be up (8 retries left).
FAILED - RETRYING: wait for all osd to be up (7 retries left).
FAILED - RETRYING: wait for all osd to be up (6 retries left).
FAILED - RETRYING: wait for all osd to be up (5 retries left).
FAILED - RETRYING: wait for all osd to be up (4 retries left).
FAILED - RETRYING: wait for all osd to be up (3 retries left).
FAILED - RETRYING: wait for all osd to be up (2 retries left).
FAILED - RETRYING: wait for all osd to be up (1 retries left).
fatal: [osd2 -> osd1]: FAILED! => changed=false
attempts: 60
cmd:
- docker
- exec
- ceph-mon-osd1
- ceph
- --cluster
- ceph
- osd
- stat
- -f
- json
delta: '0:00:00.352410'
end: '2023-11-22 14:54:43.851913'
msg: ''
rc: 0
start: '2023-11-22 14:54:43.499503'
stderr: ''
stderr_lines: <omitted>
stdout: |2-
{"epoch":8,"num_osds":2,"num_up_osds":0,"osd_up_since":0,"num_in_osds":1,"osd_in_since":1700632444,"num_remapped_pgs":0}
stdout_lines: <omitted>
This too was a network interface issue; it was resolved after explicitly designating the network interface for the cluster_network.
Reportedly this error usually occurs when hardware resources are insufficient or the network is unstable.
ansible-playbook syntax error
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use
callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead. This feature
will be removed in version 2.16. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
ERROR! couldn't resolve module/action 'openstack.config_template.config_template'. This often indicates a misspelling, missing collection, or incorrect module path.
The error appears to be in '/root/ceph-ansible/roles/ceph-config/tasks/main.yml': line 138, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: "generate {{ cluster }}.conf configuration file"
^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value. For instance:
with_items:
- {{ foo }}
Should be written as:
with_items:
- "{{ foo }}"
This was caused by ansible.posix not being installed in a version matching the branch being deployed; it was fixed by installing it with pip3 install ansible.posix (it is also listed in requirements.txt).
The same kind of syntax error also appears when the git branch, the Ceph version being deployed, and the Ansible version do not match.
It was resolved after double-checking which versions were in use and aligning them.
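Depending on the branch, the required Ansible collections (ansible.posix, the config_template module, and so on) can also be pulled in with ansible-galaxy from the requirements.yml shipped in the repository, which may be the cleaner route:
(stable-6.0)$ ansible-galaxy install -r requirements.yml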
fsid error
I no longer have the log, but the symptom was that the deployment waited for a very long time while retrieving the fsid and then looped back into the keyring error described above.
It was resolved after wiping all of the deployed nodes and redefining the network interface as described in the keyring error section.