[Linux-cluster] GFS2 DLM problem on NVMes

Thu Nov 23 05:36:02 UTC 2017

Hi Dave,

When the errors started to appear, the system slowed down (performance degraded) and many error messages showed up repeatedly. Specifically, when a large amount of slab memory was reclaimed, for example from 9GB down to 6GB, about 30 error messages came out.

The ‘send_repeat_remove’ message was also printed intermittently, about 5 times, but the system didn’t get stuck.

We are running the JMeter tool to simulate CDN workloads, and there are 2 million files (3MB per file) in my storage that are read by 4 host servers.

A bandwidth of 160Gbps was reached using 16 client servers at 10Gb each and 4 host servers at 40Gb each that run GFS. I hope this helps you understand my usage.
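
As a quick sanity check on the figures above, the aggregate bandwidth on each side and the total data set size can be worked out as follows. This is only a sketch: the variable names are made up for illustration, it assumes every server drives its link at full line rate, and it takes 1 TB = 1,000,000 MB.

```python
# Figures taken from the description above; names are illustrative only.
client_servers = 16
client_link_gbps = 10      # per client server
host_servers = 4
host_link_gbps = 40        # per GFS host server

files = 2_000_000
file_size_mb = 3           # per file

# Aggregate bandwidth on each side, assuming full line rate (an assumption).
client_total_gbps = client_servers * client_link_gbps
host_total_gbps = host_servers * host_link_gbps

# Total data set size, assuming decimal units (1 TB = 1,000,000 MB).
dataset_tb = files * file_size_mb / 1_000_000

print(f"client side: {client_total_gbps} Gbps")
print(f"host side:   {host_total_gbps} Gbps")
print(f"data set:    {dataset_tb} TB")
```

Both sides come out to the same aggregate, which matches the 160Gbps we measured.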

eric

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com]
Sent: Thursday, November 23, 2017 12:04 AM
To: 장홍석/SW-Defined Storage Lab <echang at sk.com>
Cc: linux-cluster at redhat.com; swhiteho at redhat.com; mferrell at redhat.com; 성백재/SW-Defined Storage Lab <bj.sung at sk.com>; 윤진혁/SW-Defined Storage Lab <jhyoon01 at sk.com>; 민항준/SW-Defined Storage Lab <hangjun.min at sk.com>
Subject: Re: [Linux-cluster] GFS2 DLM problem on NVMes

On Wed, Nov 22, 2017 at 04:32:13AM +0000, Eric H. Chang wrote:

> We've tested with different 'toss_secs' as advised. When we
> configured it as 1000, we saw the 'send_repeat_remove' log after
> 1000sec. We can test with other values on 'toss_secs', but we think
> it would have the same problem potentially when freeing up the slab
> after the configured sec.

Do you see many of these messages?  Do gfs operations become stuck after they appear?
