[linux-lvm] LVM + software Raid 5 : freeze when using snapshots.

Serge Rossi Serge.Rossi at renault.com
Wed Jun 4 10:53:02 UTC 2003

I'm using two IBM file servers (Samba, NFS...) running RH 7.2 (with the
latest updates) and LVM over software RAID 5.

RH Kernel 2.4.20-13.7 (Including LVM 1.0.3)
lvm tools and lib 1.0.5

Hardware: Dual Xeon MP (HT enabled) + 1 GB RAM

Each server has 14 * 73 GB disks for the data.
md0 and md1 are RAID 5 units of 7 disks each, both included in one VG.
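
For reference, a rough sketch of the usable capacity this layout implies. It assumes standard RAID 5 (one disk's worth of parity per array) and ignores md and LVM metadata overhead; the function name is only illustrative.

```python
def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable capacity of one RAID 5 array: (n - 1) disks' worth of data."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


# Two arrays (md0 and md1) of 7 x 73 GB disks each, both in the same VG.
per_array = raid5_usable_gb(7, 73)
vg_total = 2 * per_array
print(per_array, vg_total)  # 438 876
```

So the VG holds roughly 876 GB before any overhead, shared between the LVs and whatever is left free.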

The VG is subdivided into several 100 GB LVs, plus some free space for
the snapshots.

Without snapshots, everything runs perfectly: 2 months without problems
with 1500 Windows PCs (SMB clients) and 500 SGI workstations (NFS
clients). Load on the servers is between 0.1 and 1.0.

We tried to snapshot 3 LVs each night. After 2 or 3 days (6 or 9
snapshots stored in the VG), the load on the servers was between 2.0 and
3.0, and sometimes one of the servers stopped responding.
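
To make the nightly job concrete, here is a minimal sketch of the kind of commands involved. It is not the exact script in use: the VG name, LV names, snapshot size and naming scheme are all hypothetical, and the code only builds the `lvcreate -s` command lines without running them.

```python
from datetime import date


def snapshot_commands(vg, lvs, size_gb, day):
    """Build one `lvcreate -s` command per origin LV (nothing is executed)."""
    stamp = day.strftime("%Y%m%d")
    return [
        ["lvcreate", "-s", "-L", f"{size_gb}G",
         "-n", f"{lv}_snap_{stamp}", f"/dev/{vg}/{lv}"]
        for lv in lvs
    ]


# Hypothetical names and size: three origin LVs in a VG called "vg0".
cmds = snapshot_commands("vg0", ["lv1", "lv2", "lv3"], 5, date(2003, 6, 4))
for cmd in cmds:
    print(" ".join(cmd))
```

Run once per night without removing older snapshots, this leaves 3 more snapshots in the VG each day, which is how 6 or 9 accumulate after 2 or 3 days.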

The kernel still responds to ping and SysRq, but the command line does
not respond, even on the console.

No activity at all on the disks.

SysRq S only syncs the system disk (not in the VG); it does not sync the
LVs. Reboot is the only way out.

Nothing special in messages (no Oops).

We removed the snapshots and the servers are stable again.
I have not tried to determine how many snapshots it takes to
freeze the servers (difficult on production servers!).

Does anybody have any idea about this problem?

Are the servers too weak to handle the load caused by the snapshots?

