Latest revision as of 03:54, 22 April 2024
'''Subpage Table of Contents'''
{{Special:PrefixIndex/{{PAGENAME}}/}}
== Ceph ==
== Hardware Recommendations ==
Hardware Recommendations — Ceph Documentation:
https://docs.ceph.com/en/quincy/start/hardware-recommendations/
== Status ==
 ceph status
 # OR: ceph -s
Example:
<pre>
# ceph status
  cluster:
    id:     ff74f760-84b2-4dc4-b518-8408e3f10779
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum vm-05,vm-06,vm-07 (age 12m)
    mgr: vm-07(active, since 47m), standbys: vm-06, vm-05
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4m), 3 in (since 4m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 3.68k objects, 13 GiB
    usage:   38 GiB used, 3.7 TiB / 3.7 TiB avail
    pgs:     97 active+clean

  io:
    client: 107 KiB/s rd, 4.0 KiB/s wr, 0 op/s rd, 0 op/s wr
</pre>
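When scripting against this output, the interesting fields can be pulled out with awk. A minimal sketch, assuming the text has been captured to a variable (a sample from the status above stands in for a live <code>ceph status</code> call):

```shell
# Sketch: extract the cluster id and health from captured "ceph status" text.
# On a live cluster, use instead: status_text=$(ceph status)
status_text='  cluster:
    id:     ff74f760-84b2-4dc4-b518-8408e3f10779
    health: HEALTH_OK'

cluster_id=$(printf '%s\n' "$status_text" | awk '/^ *id:/ {print $2}')
health=$(printf '%s\n' "$status_text" | awk '/^ *health:/ {print $2}')
echo "cluster $cluster_id is $health"
```

For anything beyond quick one-liners, structured output (e.g. <code>ceph status --format json</code>) is less fragile than parsing the human-readable text.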
== Health ==
Health summary:
 ceph health
 # good health:
 HEALTH_OK
 # bad health:
 HEALTH_WARN Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 54 slow ops, oldest one blocked for 212 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.lmt-vm-05] have slow ops.
Health details:
 ceph health detail
 # good health:
 HEALTH_OK
<pre>
# bad health:
HEALTH_WARN 1 osds down; 1 host (1 osds) down; Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 49 slow ops, oldest one blocked for 306 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.prox-05] have slow ops.
[WRN] OSD_DOWN: 1 osds down
    osd.5 (root=default,host=prox-06) is down
[WRN] OSD_HOST_DOWN: 1 host (1 osds) down
    host prox-06 (root=default) (1 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 47 pgs inactive, 47 pgs peering
    pg 3.0 is stuck peering for 6m, current state peering, last acting [3,5,4]
    pg 3.3 is stuck peering for 7w, current state peering, last acting [5,1,0]
    ...
</pre>
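To see just the stuck placement groups from a long health report, the detail text can be filtered. A minimal sketch, with a sample of the output above standing in for a live <code>ceph health detail</code> call:

```shell
# Sketch: list only the stuck PG ids from captured "ceph health detail" text.
# On a live cluster, use instead: detail_text=$(ceph health detail)
detail_text='[WRN] PG_AVAILABILITY: Reduced data availability: 47 pgs inactive, 47 pgs peering
    pg 3.0 is stuck peering for 6m, current state peering, last acting [3,5,4]
    pg 3.3 is stuck peering for 7w, current state peering, last acting [5,1,0]'

# "pg <id> is stuck ..." -> print the second field, the PG id
stuck_pgs=$(printf '%s\n' "$detail_text" | awk '/is stuck/ {print $2}')
echo "$stuck_pgs"
```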
== Watch ==
Watch live changes:
 ceph -w
== OSD ==
=== List OSDs ===
==== volume lvm list ====
Note: this only lists OSDs local to the current host. [1]
 ceph-volume lvm list
Example:
<pre>
====== osd.0 =======

  [block]       /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7

      block device              /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7
      block uuid                uvsm7p-c9KU-iaVe-GJGv-NBRM-xGrr-XPf3eB
      cephx lockbox secret
      cluster fsid              ff74f760-84b2-4dc4-b518-8408e3f10779
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  ff991dbd-7698-44ab-ad90-102340ec05c7
      osd id                    0
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/fioa
</pre>
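The verbose listing can be reduced to an "OSD id → backing device" table with awk. A minimal sketch, with a trimmed sample of the output above standing in for a live <code>ceph-volume lvm list</code> call:

```shell
# Sketch: map OSD ids to their backing devices from captured
# "ceph-volume lvm list" text.
# On a live host, use instead: lvm_text=$(ceph-volume lvm list)
lvm_text='====== osd.0 =======
      osd id                    0
      devices                   /dev/fioa'

# Remember the last "osd id" seen, then emit it when "devices" appears.
mapping=$(printf '%s\n' "$lvm_text" | awk '
  /^ *osd id/  { id = $3 }
  /^ *devices/ { print "osd." id " -> " $2 }')
echo "$mapping"
```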
==== osd tree ====
 ceph osd tree
Example:
<pre>
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         3.69246  root default
-3         1.09589      host vm-05
 0    ssd  1.09589          osd.0       up   1.00000  1.00000
-7         1.09589      host vm-06
 2    ssd  1.09589          osd.2     down         0  1.00000
-5         1.50069      host vm-07
 1    ssd  1.50069          osd.1       up   1.00000  1.00000
</pre>
List only the OSD tree nodes that are down: [2]
 ceph osd tree down
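The same information can be derived from the plain tree output, e.g. to count down OSDs in a monitoring script. A minimal sketch, with two OSD rows from the example above standing in for a live <code>ceph osd tree</code> call:

```shell
# Sketch: count OSDs reported "down" in captured "ceph osd tree" text.
# On a live cluster, use instead: tree_text=$(ceph osd tree)
tree_text=' 0    ssd  1.09589          osd.0       up   1.00000  1.00000
 2    ssd  1.09589          osd.2     down         0  1.00000'

# In OSD rows, field 4 is the name (osd.N) and field 5 is the status.
down_count=$(printf '%s\n' "$tree_text" |
  awk '$4 ~ /^osd\./ && $5 == "down" {n++} END {print n+0}')
echo "down OSDs: $down_count"
```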
==== osd stat ====
 ceph osd stat
==== osd dump ====
 ceph osd dump
=== Mark OSD Online (In) ===
 ceph osd in [OSD-NUM]
=== Mark OSD Offline (Out) ===
 ceph osd out [OSD-NUM]
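Before host maintenance it is common to take several OSDs out in one go. A dry-run sketch (the helper name <code>mark_osds_out</code> is just an illustration): the commands are printed rather than executed, so it can be reviewed first; drop the <code>echo</code> to run them for real:

```shell
# Dry-run sketch: print the "ceph osd out" commands for a list of OSD ids.
# Remove the echo to actually execute against the cluster.
mark_osds_out() {
  for id in "$@"; do
    echo "ceph osd out osd.$id"
  done
}
mark_osds_out 2 5
```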
=== Delete OSD ===
First mark it out:
 ceph osd out osd.{osd-num}
Then mark it down:
 ceph osd down osd.{osd-num}
Remove it:
 ceph osd rm osd.{osd-num}
Check the tree to confirm the removal:
 ceph osd tree
----
If you get an error that the OSD is busy: [3]

On the host that owns the OSD, stop its service:
 systemctl stop ceph-osd@{osd-num}
Remove it again:
 ceph osd rm osd.{osd-num}
Check the tree to confirm the removal:
 ceph osd tree
If ''ceph osd tree'' reports DNE (does not exist), then do the following:

Remove it from the CRUSH map:
 ceph osd crush rm osd.{osd-num}
Clear its authentication key:
 ceph auth del osd.{osd-num}
ref: [4]
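The whole removal procedure above can be collected into one dry-run helper (the name <code>remove_osd</code> is just an illustration). It prints each step, including the CRUSH and auth cleanup for the DNE case, so the sequence can be reviewed before anything destructive runs:

```shell
# Dry-run sketch of the full OSD removal sequence described above.
# Commands are printed, not executed; run them by hand (or remove the
# echoes) once the OSD number has been double-checked.
remove_osd() {
  id="$1"
  echo "ceph osd out osd.$id"
  echo "ceph osd down osd.$id"
  echo "ceph osd rm osd.$id"
  echo "ceph osd crush rm osd.$id"
  echo "ceph auth del osd.$id"
}
remove_osd 5
```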
=== Create OSD ===
Create the OSD: [5]
 pveceph osd create /dev/sd[X]
If the disk was in use before (for example, for ZFS or as an OSD), you first need to zap all traces of that usage:
 ceph-volume lvm zap /dev/sd[X] --destroy
Create an OSD ID:
 ceph osd create   # will generate the next ID in sequence
Mount the disk as the OSD data directory:
 mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-number}
Initialize the data directory:
 ceph-osd -i {osd-num} --mkfs --mkkey
Register the OSD's key:
 ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
Add it to the CRUSH map:
 ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
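The manual steps above can be sketched as a single dry-run helper (the name <code>provision_osd</code> and the sample arguments are illustrative, not part of the official procedure). Commands are printed rather than executed:

```shell
# Dry-run sketch of the manual OSD provisioning steps described above.
# id/dev/weight/host are placeholders; commands are printed, not run.
provision_osd() {
  id="$1"; dev="$2"; weight="$3"; host="$4"
  echo "mount -o user_xattr $dev /var/lib/ceph/osd/ceph-$id"
  echo "ceph-osd -i $id --mkfs --mkkey"
  echo "ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-$id/keyring"
  echo "ceph osd crush add osd.$id $weight host=$host"
}
provision_osd 3 /dev/sdb 1.0 vm-05
```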
== POOL ==
=== Pool Stats ===
 ceph osd pool stats
== References ==
# https://docs.ceph.com/en/quincy/ceph-volume/lvm/list/
# https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/
# https://medium.com/@george.shuklin/how-to-remove-osd-from-ceph-cluster-b4c37cc0ec87
# Adding/Removing OSDs — Ceph Documentation: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/
# https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_osd_create