CentOS 7 – ZFS installation notes

First, check which release you are currently running:

# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

Then install the matching zfs-release package for that release, replacing el7_X with el7_6 in the URL:

# yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm

Next, install zfs. The repository currently defaults to the DKMS packages, so remember to install dkms first. Alternatively, edit /etc/yum.repos.d/zfs.repo to use the kmod packages instead: change enabled=1 to 0 in the dkms section and enabled=0 to 1 in the kmod section, as in the sketch below.
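
A rough sketch of the relevant part of /etc/yum.repos.d/zfs.repo after switching to kmod (section names follow the upstream repo file; only the enabled= lines change, everything else stays as shipped):

[zfs]
# ... other lines unchanged ...
enabled=0

[zfs-kmod]
# ... other lines unchanged ...
enabled=1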

# yum install zfs

Then reboot if you like; a reboot will load the zfs kernel module. Alternatively, load it with the following command:

# modprobe zfs
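
To confirm the module is actually loaded, you can check with lsmod:

# lsmod | grep zfs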

You can list your current disks with the lsblk command:

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 558.4G 0 disk
├─sda1 8:1 0 37.3G 0 part /
├─sda2 8:2 0 7.5G 0 part [SWAP]
└─sda3 8:3 0 513.7G 0 part /data
sdc 8:32 0 6.5T 0 disk
└─sdc1 8:33 0 6.5T 0 part
sde 8:64 0 5.5T 0 disk
├─sde1 8:65 0 5.5T 0 part
└─sde9 8:73 0 8M 0 part
sdf 8:80 0 5.5T 0 disk
├─sdf1 8:81 0 5.5T 0 part
└─sdf9 8:89 0 8M 0 part
sdg 8:96 0 5.5T 0 disk
├─sdg1 8:97 0 5.5T 0 part
└─sdg9 8:105 0 8M 0 part
sdh 8:112 0 5.5T 0 disk
├─sdh1 8:113 0 5.5T 0 part
└─sdh9 8:121 0 8M 0 part
sdi 8:128 0 5.5T 0 disk
├─sdi1 8:129 0 5.5T 0 part
└─sdi9 8:137 0 8M 0 part
sdj 8:144 0 5.5T 0 disk
├─sdj1 8:145 0 5.5T 0 part
└─sdj9 8:153 0 8M 0 part
sdk 8:160 0 5.5T 0 disk
├─sdk1 8:161 0 5.5T 0 part
└─sdk9 8:169 0 8M 0 part
sdl 8:176 0 5.5T 0 disk
├─sdl1 8:177 0 5.5T 0 part
└─sdl9 8:185 0 8M 0 part
sdm 8:192 0 5.5T 0 disk
├─sdm1 8:193 0 5.5T 0 part
└─sdm9 8:201 0 8M 0 part
sdn 8:208 0 5.5T 0 disk
├─sdn1 8:209 0 5.5T 0 part
└─sdn9 8:217 0 8M 0 part
sdo 8:224 0 5.5T 0 disk
├─sdo1 8:225 0 5.5T 0 part
└─sdo9 8:233 0 8M 0 part
sdp 8:240 0 5.5T 0 disk
└─sdp1 8:241 0 5.5T 0 part

ZFS lets you build three kinds of pools:

  • striped pool
  • mirrored pool
  • raid pool (raidz)

First, let's build a striped pool to play with. Use however many disks you have; here I grab 10 disks to create zdisk1. You can name the zfs pool whatever you like; I use the name zdisk1.

# zpool create zdisk1 /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn

After it is created, list the pools to check that it succeeded:

# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zdisk1 54.4T 112K 54.4T - 0% 0% 1.00x ONLINE -

Then run df, and something magical happens: the pool is already mounted. Easier than I expected.

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 38G 16G 22G 42% /
devtmpfs 95G 0 95G 0% /dev
tmpfs 95G 0 95G 0% /dev/shm
tmpfs 95G 18M 95G 1% /run
tmpfs 95G 0 95G 0% /sys/fs/cgroup
/dev/sda3 514G 34M 514G 1% /data
tmpfs 19G 0 19G 0% /run/user/0
zdisk1 53T 0 53T 0% /zdisk1

Test the disk speed; remember to cd into /zdisk1 first:

# dd if=/dev/zero of=test bs=64k count=64k conv=fdatasync
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB) copied, 7.41308 s, 579 MB/s
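
That covers write throughput. For a rough read test (not part of the original run), you can drop the page cache first so the file is read back from disk rather than from memory:

# echo 3 > /proc/sys/vm/drop_caches
# dd if=test of=/dev/null bs=64k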

Add one more disk to the pool with the following command:

# zpool add zdisk1 /dev/sdo

You can check the current state with zpool status:

# zpool status
pool: zdisk1
  state: ONLINE
   scan: none requested
 config:

NAME        STATE     READ WRITE CKSUM
zdisk1      ONLINE       0     0     0
  sde       ONLINE       0     0     0
  sdf       ONLINE       0     0     0
  sdg       ONLINE       0     0     0
  sdh       ONLINE       0     0     0
  sdi       ONLINE       0     0     0
  sdj       ONLINE       0     0     0
  sdk       ONLINE       0     0     0
  sdl       ONLINE       0     0     0
  sdm       ONLINE       0     0     0
  sdn       ONLINE       0     0     0
  sdo       ONLINE       0     0     0

Next, let's practice creating a mirror pool:

# zpool create zdisk2 mirror /dev/sdp /dev/sdq

Check how it turned out:

# zpool status zdisk2
   pool: zdisk2
  state: ONLINE
   scan: none requested
 config:

NAME        STATE     READ WRITE CKSUM
zdisk2      ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    sdp     ONLINE       0     0     0
    sdq     ONLINE       0     0     0

I was curious whether ZFS would mount automatically after a reboot, so I rebooted and checked with zpool list. To my surprise, zdisk1 was gone. I recovered it with the following command:

# zpool import zdisk1

If the pool is not mounted, you can mount it with:

# zfs mount zdisk1

To unmount it, use:

# zfs unmount zdisk1

You can destroy a zfs pool with the following commands:

# zpool destroy zdisk1
# zpool destroy zdisk2

Next, let's build a raid pool. A raid pool lets you choose how many failed disks it can tolerate: raidz1 tolerates one failed disk, roughly equivalent to RAID5; raidz2 tolerates two, roughly RAID6; raidz3 tolerates three. At least that's my understanding; I'm still new at this, so I hope I'm not leading anyone astray.
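
For reference, the parity level is simply the vdev keyword passed to zpool create. A minimal sketch with placeholder pool and device names (not the disks used in this setup):

# zpool create ztest1 raidz1 /dev/sdx /dev/sdy /dev/sdz
# zpool create ztest3 raidz3 /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy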

This time, create a raidz2 pool and assign one spare disk, with the following command:

# zpool create zdisk1 raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp spare /dev/sdq

Then pull out two disks and see whether any data is lost:

# zpool status
   pool: zdisk1
  state: DEGRADED
 status: One or more devices are faulted in response to persistent errors.
     Sufficient replicas exist for the pool to continue functioning in a
     degraded state.
 action: Replace the faulted device, or use 'zpool clear' to mark the device
     repaired.
   scan: none requested

NAME        STATE     READ WRITE CKSUM
zdisk1      DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0     0
    sdc     ONLINE       0     0     0
    sdd     ONLINE       0     0     0
    sde     ONLINE       0     0     0
    sdf     ONLINE       0     0     0
    sdg     ONLINE       0     0     0
    sdh     ONLINE       0     0     0
    sdi     ONLINE       0     0     0
    sdj     ONLINE       0     0     0
    sdk     ONLINE       0     0     0
    sdl     ONLINE       0     0     0
    sdm     ONLINE       0     0     0
    sdn     ONLINE       0     0     0
    sdo     FAULTED      0     0     0  too many errors
    sdp     FAULTED      0     0     0  too many errors
spares
  sdq         AVAIL

Good, no data was lost. I had expected the spare to rebuild automatically, but judging from the output above it did not. No problem; we can kick off the rebuild manually:

# zpool replace zdisk1 sdp sdq

After running the command, check the status again:

# zpool status
pool: zdisk1
  state: DEGRADED
 status: One or more devices are faulted in response to persistent errors.
     Sufficient replicas exist for the pool to continue functioning in a
     degraded state.
 action: Replace the faulted device, or use 'zpool clear' to mark the device
     repaired.
   scan: resilvered 418M in 0h0m with 0 errors on Wed Aug  7 15:37:00 2019
 config:

NAME        STATE     READ WRITE CKSUM
zdisk1      DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0     0
    sdc     ONLINE       0     0     0
    sdd     ONLINE       0     0     0
    sde     ONLINE       0     0     0
    sdf     ONLINE       0     0     0
    sdg     ONLINE       0     0     0
    sdh     ONLINE       0     0     0
    sdi     ONLINE       0     0     0
    sdj     ONLINE       0     0     0
    sdk     ONLINE       0     0     0
    sdl     ONLINE       0     0     0
    sdm     ONLINE       0     0     0
    sdn     ONLINE       0     0     0
    sdo     FAULTED      0     0     0  too many errors
    spare-11  DEGRADED     0     0     0
      sdp     FAULTED      0     0     0  too many errors
      sdq     ONLINE       0     0     0
spares
  sdq         INUSE     currently in use

After plugging the disks back in, ZFS repairs itself (resilver), provided the disk is not actually dead.
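
If the reattached device stays FAULTED instead of resilvering on its own, you can try bringing it online and clearing its error counters manually (a sketch, using the same device name as in this example):

# zpool online zdisk1 sdp
# zpool clear zdisk1 sdp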

# zpool status
pool: zdisk1
  state: ONLINE
 status: One or more devices is currently being resilvered.  The pool will
     continue to function, possibly in a degraded state.
 action: Wait for the resilver to complete.
   scan: resilver in progress since Wed Aug  7 16:43:19 2019
     30.3G scanned out of 72.6G at 270M/s, 0h2m to go
     2.50G resilvered, 41.75% done
 config:

NAME          STATE     READ WRITE CKSUM
zdisk1        ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
    sdh       ONLINE       0     0     0
    sdi       ONLINE       0     0     0
    sdj       ONLINE       0     0     0
    sdk       ONLINE       0     0     0
    sdl       ONLINE       0     0     0
    sdm       ONLINE       0     0     0
    sdn       ONLINE       0     0     0
    sdo       ONLINE       0     0     0
    spare-11  ONLINE       0     0     0
      sdp     ONLINE       0     0     0  (resilvering)
      sdq     ONLINE       0     0     0
spares
  sdq         INUSE     currently in use

Then remove the failed disk from the pool with the following commands:

# zpool offline zdisk1 sdp
# zpool detach zdisk1 sdp

To add the replacement disk back into the pool as a spare, use:

# zpool add zdisk1 spare /dev/sdp

If you want rebuilds to happen automatically (the spare kicks in on its own), set the following property:

# zpool set autoreplace=on zdisk1

In addition, the zpool get all command shows the current pool properties:

# zpool get all
NAME PROPERTY VALUE SOURCE
zdisk1 size 65T -
zdisk1 capacity 0% -
zdisk1 altroot - default
zdisk1 health DEGRADED -
zdisk1 guid 8790863350631634119 -
zdisk1 version - default
zdisk1 bootfs - default
zdisk1 delegation on default
zdisk1 autoreplace on local
zdisk1 cachefile - default
zdisk1 failmode wait default
zdisk1 listsnapshots off default
zdisk1 autoexpand off default
zdisk1 dedupditto 0 default
zdisk1 dedupratio 1.00x -
zdisk1 free 64.9T -
zdisk1 allocated 72.6G -
zdisk1 readonly off -
zdisk1 ashift 0 default
zdisk1 comment - default
zdisk1 expandsize - -
zdisk1 freeing 0 -
zdisk1 fragmentation 0% -
zdisk1 leaked 0 -
zdisk1 multihost off default
zdisk1 feature@async_destroy enabled local
zdisk1 feature@empty_bpobj enabled local
zdisk1 feature@lz4_compress active local
zdisk1 feature@multi_vdev_crash_dump enabled local
zdisk1 feature@spacemap_histogram active local
zdisk1 feature@enabled_txg active local
zdisk1 feature@hole_birth active local
zdisk1 feature@extensible_dataset active local
zdisk1 feature@embedded_data active local
zdisk1 feature@bookmarks enabled local
zdisk1 feature@filesystem_limits enabled local
zdisk1 feature@large_blocks enabled local
zdisk1 feature@large_dnode enabled local
zdisk1 feature@sha512 enabled local
zdisk1 feature@skein enabled local
zdisk1 feature@edonr enabled local
zdisk1 feature@userobj_accounting active local

Adding a cache or log device can help improve read/write performance; these are usually SSDs or NVMe drives used as caches. I still haven't fully figured these two out XD

# zpool add zdisk1 cache /dev/sdr
# zpool add zdisk1 log /dev/sds

Checking I/O statistics is also important; you can use the following command:

# zpool iostat -v
                                            capacity     operations     bandwidth
pool                                        alloc   free   read  write   read  write
------------------------------------------  -----  -----  -----  -----  -----  -----
zdisk1                                      9.97G  65.0T     70    568  3.69M  56.7M
  raidz2                                    9.97G  65.0T     70    546  3.69M  53.9M
    scsi-3600d02310000c75f75aa5ada42cbbeb9      -      -      6     45   343K  4.57M
    scsi-3600d02310000c75f47199b320c4da724      -      -      6     46   343K  4.57M
    scsi-3600d02310000c75f6a6ea613172ca9ce      -      -      6     45   344K  4.57M
    sdh                                         -      -      6     45   343K  4.57M
    sdi                                         -      -      6     46   343K  4.57M
    sdj                                         -      -      6     45   343K  4.57M
    sdk                                         -      -      6     45   343K  4.57M
    sdl                                         -      -      6     45   343K  4.57M
    sdm                                         -      -      6     45   343K  4.57M
    sdn                                         -      -      6     45   343K  4.57M
    sdo                                         -      -      6     45   343K  4.57M
    sdp                                         -      -      0     60    790  5.26M
logs                                            -      -      -      -      -      -
  sds                                        128K   548G      0    153    920  19.1M
cache                                           -      -      -      -      -      -
  sdr                                       3.66G   547G      0     95    132  11.8M
------------------------------------------  -----  -----  -----  -----  -----  -----

To remove the cache and log devices, use the following commands:

# zpool remove zdisk1 sds
# zpool remove zdisk1 sdr

That wraps up these beginner's notes. If anything here is wrong, corrections are welcome. Thanks.

Addendum: to convert an existing single-disk zfs pool into a mirror, you can follow the steps below:

# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:51 with 0 errors on Sun Aug 14 00:24:52 2022
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  sdb       ONLINE       0     0     0

# zpool attach data /dev/sdb /dev/sdc

# zpool status
  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Aug 14 07:04:40 2022
	31.0G scanned at 7.76G/s, 1.77G issued at 453M/s, 31.0G total
	1.76G resilvered, 5.70% done, 0 days 00:01:06 to go
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0  (resilvering)

errors: No known data errors

# zpool status
  pool: data
 state: ONLINE
  scan: resilvered 31.1G in 0 days 00:13:27 with 0 errors on Sun Aug 14 07:18:07 2022
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0

errors: No known data errors

After a reboot, running "zpool status" may report "no pools available"; just use the following command to bring the pool back:

# zpool import zdisk1
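
To avoid importing by hand after every reboot, the usual approach on CentOS 7 with ZFS on Linux is to enable the ZFS systemd units so pools are imported and mounted at boot (unit names come from the zfs package; verify they exist on your install):

# systemctl enable zfs-import-cache zfs-mount zfs.target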