[Linux-cachefs] 'bad page state' error
Rob Bos
rbos at sfu.ca
Sun Jun 23 19:51:13 UTC 2013
Got to work today, reproduced the crash on a second VM in a
different cluster under RHEL6.4, with the 373 kernel patch set.

Doing full (non-canceled) file copies is generally okay and doesn't seem
to crash anything. Canceling a copy still generates a 'bad page state'
error.

I ran a torrent session, which downloaded to 99%, and then the VM
crashed. There were no 'bad page state' errors during the torrent
transfer, but once the client hit the final few bits, the VM crashed
with the error dump on the console shown in the attached screenshot.

It also looks like caching isn't working properly with torrented data.
'cp' works fine retrieving the data from the cache, but torrent chunks
seem to be getting pulled from the CIFS share, not the cache. Eventually
the disk fills up and the VM crashes instantly after mounting the share.
So there's probably some problem with identifying whether a bit of data
is in the cache or not.

I should probably just shelve this project for now.
On 6/19/2013 11:56 AM, Rob Bos wrote:
> On 6/19/2013 3:03 AM, David Howells wrote:
>> Rob Bos <rbos at sfu.ca> wrote:
>>
>>> I applied the 373 patchset and compiled a version with CIFS_FSCACHE
>>> enabled.
>>>
>>> Same problem. Got ~2GiB into a cp before it started generating 'bad page
>>> state' errors.
>> Okay, thanks. Looks like there's still another bug in there :-(
>>
>> When you say you got 2GiB into a cp, were you actually copying a file of
>> that size? Or was this cumulative?
>
> Bunch of small files. Matlab installer, specifically.
>
> I was working on duplicating the error messages this morning by
> copying files of known size in a controlled fashion, and crashed the
> VM (another process was reading data from it at the time). Had the
> VMware guy capture a screenshot of the oops, attached, but alas, no
> scrollback.
>
> When I get some time I'll write up a quick script to repeatedly
> write/read files of a fixed size and stop when an error is found in
> dmesg, under more controlled circumstances. That might tell us if it's
> a certain file access doing it, or a certain amount of writes, or reads.
>
>>
>> If I could work out how to reproduce this reliably, there's a good chance
>> I'll be able to fix it.
>>
>> David
>
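The reproduction script described in the quoted message (repeatedly
write/read files of a fixed size and stop when an error shows up in
dmesg) might look roughly like the sketch below. It is only an
illustration under assumptions: it needs a Python 3 interpreter, the
function names and the `probe_N.bin` file naming are made up for this
example, and it matches on the text "bad page state" in the `dmesg`
output.

```python
import os
import subprocess

# Assumed marker text; matched case-insensitively against the kernel log.
PATTERN = "bad page state"


def count_errors(log_text, pattern=PATTERN):
    """Count case-insensitive occurrences of the pattern in kernel log text."""
    return log_text.lower().count(pattern.lower())


def read_dmesg():
    """Return the kernel ring buffer as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


def write_then_read(path, size, chunk=1 << 20):
    """Write `size` zero bytes to `path`, read the file back, return bytes read."""
    block = b"\0" * chunk
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the share before reading
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total


def run(target_dir, size, max_iterations, log_reader=read_dmesg):
    """Repeat write/read cycles of a fixed size.

    Returns the iteration number at which a new error appeared in the log
    (or a short read happened), or None if all iterations were clean.
    """
    baseline = count_errors(log_reader())  # ignore errors logged before the run
    for i in range(1, max_iterations + 1):
        path = os.path.join(target_dir, "probe_%d.bin" % i)
        if write_then_read(path, size) != size:
            return i
        if count_errors(log_reader()) > baseline:
            return i
    return None
```

Something like `run("/mnt/share/probe", 64 << 20, 100)` would then do up
to 100 cycles of 64 MiB each (the mount path here is hypothetical). Note
that a read straight after a write is likely served from the page cache
rather than from the cache on disk, so a real test would probably need
to drop caches (e.g. `echo 3 > /proc/sys/vm/drop_caches` as root) or
remount between the write and the read.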
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2013-06-23_115403.png
Type: image/png
Size: 120272 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cachefs/attachments/20130623/f7b68fca/attachment.png>