<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">2016-10-11 11:56 GMT+03:00 Pino Toscano <span dir="ltr"><<a href="mailto:ptoscano@redhat.com" target="_blank">ptoscano@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Saturday, 8 October 2016 18:27:21 CEST Matteo Cafasso wrote:<br> > Patch ready for merging.<br> ><br> > v4:<br> ><br> > - check return code of tsk_fs_attr_walk<br> > - pass TSK_FS_FILE_WALK_FLAG_NOSPARSE as additional flag to<br> > tsk_fs_attr_walk<br> ><br> > After discussing with TSK authors the behaviour is clear. [1]<br> <br> </span>Thanks, this improves the situation a bit.<br> <span class="gmail-"><br> > In case of COMPRESSED blocks, the callback will be called for all the<br> > attributes no matter whether they are on disk or not (sparse). In<br> > such cases, the block address will be 0. [2]<br> <br> </span>Note that the API docs say:<br> For compressed and sparse attributes, the address *may* be zero.<br> (emphasis is mine)<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> My concern is that, if the address in such cases is "unspecified", then<br> the comparisons in "attrwalk_callback" are done against a<br> random/uninitialized value (which would be bad).<br></blockquote><div><br></div><div>I understand your concerns. The data will not be wrong. It is the API documentation that is misleading.<br></div><div>The address *will* be 0 for SPARSE blocks and *might or might not* be 0 for compressed blocks, depending on certain criteria. 
See below.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Also, if the block address would be zero, what's the point of having it<br> among the blocks tsk_fs_attr_walk() iterates over?<br></blockquote><div><br></div><div>This is due to the way NTFS organizes information and handles compression, and to the way the API loops over the attributes' blocks.<br><br>For each file or directory, there is an MFT (Master File Table) record, typically 1 KB in size, which consists of a linear sequence of attributes. <br></div><div>Attributes can be resident within the MFT record or non-resident, according to their size. The $DATA attribute, which stores the actual file content, is typically non-resident.<br><br></div><div>Non-resident attributes are stored on disk in what are referred to as data runs (contiguous blocks), which are then mapped within the attribute itself. For a typical file larger than about 800 bytes, the $DATA attribute contains a map of data runs with their locations on disk. If the map itself is too big to fit in the record (this can happen if the content is heavily fragmented), then extra records are created and their mapping is placed in a special attribute called $ATTRIBUTE_LIST. 
[1]<br><br></div><div>When the given file is compressed (native NTFS compression, not application-level compression), the algorithm goes over each data run within the attribute and: [2]<br></div><div> 1. if the data run is zero-filled, marks the corresponding blocks as sparse and sets their address to 0.<br></div><div> 2. if compressing the data run does not save any disk block, leaves it as is.<br></div><div> 3. if compressing the data run saves one or more blocks, marks the spared blocks as sparse and sets their address to 0.<br><br></div><div>Note that the entire attribute will be marked as compressed no matter what happened to the clusters on disk.<br><br></div><div>The logic loops through all non-resident attributes (which is what we want: all the disk blocks allocated for that file). For each attribute, it loops over all the blocks that the attribute maps and calls the callback.<br><br></div><div>Our issue is where the sparse flag information originates: flags might come from the block (BAD, ALLOC, UNALLOC) or from the file metadata (RAW, SPARSE, COMPRESSED, CONT, META). [3]<br><br></div><div>tsk_fs_attr_walk() walks over the given attribute's blocks. When we inspect attributes of compressed files, the flag reports the *file* status (COMPRESSED), yet it cannot tell us what the compression algorithm did to that block (case 1, 2 or 3). 
It will still correctly give us the address: 0 if sparse (case 1 or 3) or the correct number otherwise (case 2 or 3).<br></div><div><br>[1] <a href="https://en.wikipedia.org/wiki/NTFS#Attribute_lists.2C_attributes.2C_and_streams">https://en.wikipedia.org/wiki/NTFS#Attribute_lists.2C_attributes.2C_and_streams</a><br>[2] <a href="http://www.digital-evidence.org/fsfa/">http://www.digital-evidence.org/fsfa/</a> - Chapter 11<br></div><div>[3] <a href="http://www.sleuthkit.org/sleuthkit/docs/api-docs/4.2/tsk__fs_8h.html#a1e6bf157f5d258191bf5d8ae31ee7148">http://www.sleuthkit.org/sleuthkit/docs/api-docs/4.2/tsk__fs_8h.html#a1e6bf157f5d258191bf5d8ae31ee7148</a> - "Note that some of these are set only by file_walk because they are file-level details, such as compression and sparse."<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> Thanks,<br> <span class="gmail-HOEnZb"><font color="#888888">--<br> Pino Toscano</font></span><br>______________________________<wbr>_________________<br> Libguestfs mailing list<br> <a href="mailto:Libguestfs@redhat.com">Libguestfs@redhat.com</a><br> <a href="https://www.redhat.com/mailman/listinfo/libguestfs" rel="noreferrer" target="_blank">https://www.redhat.com/<wbr>mailman/listinfo/libguestfs</a><br></blockquote></div><br></div></div>