Latest revision as of 03:54, 22 April 2024
'''Subpage Table of Contents'''
{{Special:PrefixIndex/{{PAGENAME}}/}}
== Ceph ==
== Hardware Recommendations ==
Hardware Recommendations — Ceph Documentation:
https://docs.ceph.com/en/quincy/start/hardware-recommendations/
== Status ==
 ceph status
 # OR: ceph -s
Example:
<pre>
# ceph status
  cluster:
    id:     ff74f760-84b2-4dc4-b518-8408e3f10779
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum vm-05,vm-06,vm-07 (age 12m)
    mgr: vm-07(active, since 47m), standbys: vm-06, vm-05
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4m), 3 in (since 4m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 3.68k objects, 13 GiB
    usage:   38 GiB used, 3.7 TiB / 3.7 TiB avail
    pgs:     97 active+clean

  io:
    client: 107 KiB/s rd, 4.0 KiB/s wr, 0 op/s rd, 0 op/s wr
</pre>
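When scripting against this output, the interesting fields can be pulled out with awk. A minimal sketch, assuming the text has been captured to a variable (a sample from the status above stands in for a live <code>ceph status</code> call):

```shell
# Sketch: extract the cluster id and health from captured "ceph status" text.
# On a live cluster, use instead: status_text=$(ceph status)
status_text='  cluster:
    id:     ff74f760-84b2-4dc4-b518-8408e3f10779
    health: HEALTH_OK'

cluster_id=$(printf '%s\n' "$status_text" | awk '/^ *id:/ {print $2}')
health=$(printf '%s\n' "$status_text" | awk '/^ *health:/ {print $2}')
echo "cluster $cluster_id is $health"
```

For anything beyond quick one-liners, structured output (e.g. <code>ceph status --format json</code>) is less fragile than parsing the human-readable text.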
== Health ==
Health summary:
 ceph health
 # good health:
 HEALTH_OK
 # bad health:
 HEALTH_WARN Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 54 slow ops, oldest one blocked for 212 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.lmt-vm-05] have slow ops.
Health details:
 ceph health detail
 # good health:
 HEALTH_OK
<pre>
# bad health:
HEALTH_WARN 1 osds down; 1 host (1 osds) down; Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 49 slow ops, oldest one blocked for 306 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.prox-05] have slow ops.
[WRN] OSD_DOWN: 1 osds down
    osd.5 (root=default,host=prox-06) is down
[WRN] OSD_HOST_DOWN: 1 host (1 osds) down
    host prox-06 (root=default) (1 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 47 pgs inactive, 47 pgs peering
    pg 3.0 is stuck peering for 6m, current state peering, last acting [3,5,4]
    pg 3.3 is stuck peering for 7w, current state peering, last acting [5,1,0]
    ...
</pre>
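To see just the stuck placement groups from a long health report, the detail text can be filtered. A minimal sketch, with a sample of the output above standing in for a live <code>ceph health detail</code> call:

```shell
# Sketch: list only the stuck PG ids from captured "ceph health detail" text.
# On a live cluster, use instead: detail_text=$(ceph health detail)
detail_text='[WRN] PG_AVAILABILITY: Reduced data availability: 47 pgs inactive, 47 pgs peering
    pg 3.0 is stuck peering for 6m, current state peering, last acting [3,5,4]
    pg 3.3 is stuck peering for 7w, current state peering, last acting [5,1,0]'

# "pg <id> is stuck ..." -> print the second field, the PG id
stuck_pgs=$(printf '%s\n' "$detail_text" | awk '/is stuck/ {print $2}')
echo "$stuck_pgs"
```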
== Watch ==
Watch live changes:
 ceph -w
== OSD ==
=== List OSDs ===
==== volume lvm list ====
Note: this only lists OSDs local to the current host. [1]
 ceph-volume lvm list
Example:
<pre>
====== osd.0 =======

  [block]       /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7

      block device              /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7
      block uuid                uvsm7p-c9KU-iaVe-GJGv-NBRM-xGrr-XPf3eB
      cephx lockbox secret
      cluster fsid              ff74f760-84b2-4dc4-b518-8408e3f10779
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  ff991dbd-7698-44ab-ad90-102340ec05c7
      osd id                    0
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/fioa
</pre>
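The verbose listing can be reduced to an "OSD id → backing device" table with awk. A minimal sketch, with a trimmed sample of the output above standing in for a live <code>ceph-volume lvm list</code> call:

```shell
# Sketch: map OSD ids to their backing devices from captured
# "ceph-volume lvm list" text.
# On a live host, use instead: lvm_text=$(ceph-volume lvm list)
lvm_text='====== osd.0 =======
      osd id                    0
      devices                   /dev/fioa'

# Remember the last "osd id" seen, then emit it when "devices" appears.
mapping=$(printf '%s\n' "$lvm_text" | awk '
  /^ *osd id/  { id = $3 }
  /^ *devices/ { print "osd." id " -> " $2 }')
echo "$mapping"
```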
==== osd tree ====
 ceph osd tree
Example:
<pre>
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         3.69246  root default
-3         1.09589      host vm-05
 0    ssd  1.09589          osd.0       up   1.00000  1.00000
-7         1.09589      host vm-06
 2    ssd  1.09589          osd.2     down         0  1.00000
-5         1.50069      host vm-07
 1    ssd  1.50069          osd.1       up   1.00000  1.00000
</pre>
List only the OSD tree nodes that are down: [2]
 ceph osd tree down
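The same information can be derived from the plain tree output, e.g. to count down OSDs in a monitoring script. A minimal sketch, with two OSD rows from the example above standing in for a live <code>ceph osd tree</code> call:

```shell
# Sketch: count OSDs reported "down" in captured "ceph osd tree" text.
# On a live cluster, use instead: tree_text=$(ceph osd tree)
tree_text=' 0    ssd  1.09589          osd.0       up   1.00000  1.00000
 2    ssd  1.09589          osd.2     down         0  1.00000'

# In OSD rows, field 4 is the name (osd.N) and field 5 is the status.
down_count=$(printf '%s\n' "$tree_text" |
  awk '$4 ~ /^osd\./ && $5 == "down" {n++} END {print n+0}')
echo "down OSDs: $down_count"
```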
==== osd stat ====
 ceph osd stat
==== osd dump ====
 ceph osd dump
=== Mark OSD Online (In) ===
 ceph osd in [OSD-NUM]
=== Mark OSD Offline (Out) ===
 ceph osd out [OSD-NUM]
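Before host maintenance it is common to take several OSDs out in one go. A dry-run sketch (the helper name <code>mark_osds_out</code> is just an illustration): the commands are printed rather than executed, so it can be reviewed first; drop the <code>echo</code> to run them for real:

```shell
# Dry-run sketch: print the "ceph osd out" commands for a list of OSD ids.
# Remove the echo to actually execute against the cluster.
mark_osds_out() {
  for id in "$@"; do
    echo "ceph osd out osd.$id"
  done
}
mark_osds_out 2 5
```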
=== Delete OSD ===
First mark it out:
 ceph osd out osd.{osd-num}
Then mark it down:
 ceph osd down osd.{osd-num}
Remove it:
 ceph osd rm osd.{osd-num}
Check the tree to confirm the removal:
 ceph osd tree
----
If you get an error that the OSD is busy: [3]

On the host that owns the OSD, stop its service:
 systemctl stop ceph-osd@{osd-num}
Remove it again:
 ceph osd rm osd.{osd-num}
Check the tree to confirm the removal:
 ceph osd tree
If ''ceph osd tree'' reports DNE (does not exist), then do the following:

Remove it from the CRUSH map:
 ceph osd crush rm osd.{osd-num}
Clear its authentication key:
 ceph auth del osd.{osd-num}
ref: [4]
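The whole removal procedure above can be collected into one dry-run helper (the name <code>remove_osd</code> is just an illustration). It prints each step, including the CRUSH and auth cleanup for the DNE case, so the sequence can be reviewed before anything destructive runs:

```shell
# Dry-run sketch of the full OSD removal sequence described above.
# Commands are printed, not executed; run them by hand (or remove the
# echoes) once the OSD number has been double-checked.
remove_osd() {
  id="$1"
  echo "ceph osd out osd.$id"
  echo "ceph osd down osd.$id"
  echo "ceph osd rm osd.$id"
  echo "ceph osd crush rm osd.$id"
  echo "ceph auth del osd.$id"
}
remove_osd 5
```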
=== Create OSD ===
Create the OSD: [5]
 pveceph osd create /dev/sd[X]
If the disk was in use before (for example, for ZFS or as an OSD), you first need to zap all traces of that usage:
 ceph-volume lvm zap /dev/sd[X] --destroy
Create an OSD ID:
 ceph osd create   # will generate the next ID in sequence
Mount the disk as the OSD data directory:
 mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
Initialize the data directory:
 ceph-osd -i {osd-num} --mkfs --mkkey
Register the OSD's key:
 ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
Add it to the CRUSH map:
 ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
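The manual steps above can be sketched as a single dry-run helper (the name <code>provision_osd</code> and the sample arguments are illustrative, not part of the official procedure). Commands are printed rather than executed:

```shell
# Dry-run sketch of the manual OSD provisioning steps described above.
# id/dev/weight/host are placeholders; commands are printed, not run.
provision_osd() {
  id="$1"; dev="$2"; weight="$3"; host="$4"
  echo "mount -o user_xattr $dev /var/lib/ceph/osd/ceph-$id"
  echo "ceph-osd -i $id --mkfs --mkkey"
  echo "ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-$id/keyring"
  echo "ceph osd crush add osd.$id $weight host=$host"
}
provision_osd 3 /dev/sdb 1.0 vm-05
```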
== POOL ==
=== Pool Stats ===
 ceph osd pool stats
== References ==
# https://docs.ceph.com/en/quincy/ceph-volume/lvm/list/
# https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/
# https://medium.com/@george.shuklin/how-to-remove-osd-from-ceph-cluster-b4c37cc0ec87
# Adding/Removing OSDs — Ceph Documentation: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/
# https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_osd_create