[Linux-cachefs] 'bad page state' error

Rob Bos rbos at sfu.ca
Sun Jun 23 19:51:13 UTC 2013


Got to work today and reproduced the crash on a second VM in a
different cluster under RHEL 6.4, with the 373 kernel patch set.

Full (non-canceled) file copies are generally okay and don't seem to
crash anything. Canceling a copy still generates a 'bad page state' error.

I ran a torrent session, which downloaded to 99%, and then the VM
crashed. There were no 'bad page state' errors during the torrent
transfer, but once the client hit the last few pieces, the VM crashed
with the error dump shown in the attached console screenshot.

It also looks like caching isn't working properly with torrented data.
'cp' retrieves the data from the cache fine, but torrent chunks seem to
be pulled from the CIFS share rather than the cache. Eventually the
disk fills up, and the VM crashes immediately after mounting the share.

So there's probably a problem with determining whether a given piece
of data is in the cache or not.

I should probably just shelve this project for now.

On 6/19/2013 11:56 AM, Rob Bos wrote:
> On 6/19/2013 3:03 AM, David Howells wrote:
>> Rob Bos <rbos at sfu.ca> wrote:
>>
>>> I applied the 373 patchset and compiled a version with CIFS_FSCACHE 
>>> enabled.
>>>
>>> Same problem. Got ~2GiB into a cp before it started generating
>>> 'bad page state' errors.
>> Okay, thanks.  Looks like there's still another bug in there:-(
>>
>> When you say you got 2GiB into a cp, were you actually copying a
>> file of that size?  Or was this cumulative?
>
> Bunch of small files. Matlab installer, specifically.
>
> I was working on reproducing the error messages this morning by
> copying files of known size in a controlled fashion, and crashed the
> VM (another process was reading data from it at the time). I had the
> VMware guy capture a screenshot of the oops (attached), but alas,
> there's no scrollback.
>
> When I get some time I'll write up a quick script to repeatedly
> write/read files of a fixed size and stop when an error appears in
> dmesg, under more controlled circumstances. That might tell us
> whether it's triggered by a particular file access, or by a certain
> amount of writes or reads.
>
>>
>> If I could work out how to reproduce this reliably, there's a good
>> chance I'd be able to fix it.
>>
>> David
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2013-06-23_115403.png
Type: image/png
Size: 120272 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cachefs/attachments/20130623/f7b68fca/attachment.png>