Outlining large inline files doesn't always work in lfs_dir_orphaningcommit #1200

@jmaksymowicz

Tested on commit 6cb4e86

As I understand it, any open "large inline" file (an inline file too large to fit into the cache under the current settings) must be "outlined" (evicted into a CTZ list) whenever a commit is made to the directory the file resides in. However, this process doesn't always work, and when it fails, the call that triggered the commit (in my case, creating another file in the same directory) returns LFS_ERR_CORRUPT.
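For illustration, the "large inline" condition described above can be sketched as a small predicate. This is a minimal sketch under assumptions, not the littlefs implementation: `struct open_file` and `needs_outline` are hypothetical names, and the comparison against the cache size is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few open-file fields involved;
 * this is not the real lfs_file_t layout. */
struct open_file {
    bool     is_inline;  /* file data is stored inline in the metadata */
    uint32_t size;       /* current file size in bytes */
};

/* True when an open inline file no longer fits in the cache and would
 * therefore have to be outlined before a commit to its directory. */
static inline bool needs_outline(const struct open_file *f,
                                 uint32_t cache_size) {
    return f->is_inline && f->size > cache_size;
}
```

With a 256-byte cache, a 512-byte inline file satisfies the predicate, while a file of exactly 256 bytes does not.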

To reproduce this issue I created the attached PoC. It can be run with argument 0, 1 or 2.

  • 0 is a sanity check that should always pass. In this case we never create a situation that contains "large inline" files.
    The test passes printing successful run
  • 1 leads to a situation where a "large inline" file is created. The file is then opened. Then another file is created in the same directory. Creation fails with LFS_ERR_CORRUPT when it shouldn't.
    The test fails printing error -84 line 97
  • 2 is similar to 1, but after the "large inline" file is opened, a seek to the end of the file is performed, which for some reason makes the outlining work correctly. To ensure that the outlining was successful, the file is reopened and verified at the end.
    The test passes printing successful run

It uses the RAM block device defined in this repo from bd/lfs_rambd.c.

The PoC code: main.c

I managed to track the LFS_ERR_CORRUPT value down to lfs_bd_read. Below is the gdb backtrace from the moment this incorrect read happens:

#0  lfs_bd_read (lfs=lfs@entry=0x5555555613e0 <lfs>, pcache=pcache@entry=0x0, rcache=rcache@entry=0x7fffffffceb8, hint=4096, block=4294967294, off=0, buffer=0x7fffffffce6f, size=1) at lfs.c:51
#1  0x00005555555573b4 in lfs_file_flushedread (lfs=lfs@entry=0x5555555613e0 <lfs>, file=file@entry=0x7fffffffce70, buffer=buffer@entry=0x7fffffffce6f, size=size@entry=1) at lfs.c:3544
#2  0x000055555555885c in lfs_file_flush (lfs=lfs@entry=0x5555555613e0 <lfs>, file=file@entry=0x555555561360 <file>) at lfs.c:3380
#3  0x00005555555599df in lfs_dir_orphaningcommit (lfs=lfs@entry=0x5555555613e0 <lfs>, dir=0x5555555612ec <file2+12>, attrs=0x7fffffffd020, attrcount=3) at lfs.c:2422
#4  0x000055555555a24d in lfs_dir_commit (lfs=0x5555555613e0 <lfs>, dir=<optimised out>, attrs=<optimised out>, attrcount=<optimised out>) at lfs.c:2604
#5  0x000055555555a888 in lfs_file_opencfg_ (lfs=lfs@entry=0x5555555613e0 <lfs>, file=file@entry=0x5555555612e0 <file2>, path=<optimised out>, path@entry=0x55555555dd54 "/some_file", flags=flags@entry=256, cfg=cfg@entry=0x55555555dfb0 <defaults>) at lfs.c:3125
#6  0x000055555555b86e in lfs_file_open_ (flags=256, path=0x55555555dd54 "/some_file", file=0x5555555612e0 <file2>, lfs=0x5555555613e0 <lfs>) at lfs.c:3242
#7  0x00005555555553b6 in main (argc=<optimised out>, argv=0x7fffffffd1e8) at main.c:97

It appears that in lfs_file_flushedread the file->block member has the value LFS_BLOCK_INLINE (block=4294967294 in frame #0), which is then treated as if it were a valid block number and passed to lfs_bd_read.
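The block value in frame #0 is consistent with this reading. The sketch below assumes a 32-bit block type and that LFS_BLOCK_INLINE is defined as (lfs_block_t)-2; `BLOCK_INLINE`, `ERR_CORRUPT`, and `check_block` are hypothetical stand-ins, and the range check only approximates what a block-device read would have to reject.

```c
#include <stdint.h>

/* Assumed to mirror the inline sentinel, (lfs_block_t)-2, with a
 * 32-bit block type. */
#define BLOCK_INLINE ((uint32_t)-2)

/* Assumed to mirror LFS_ERR_CORRUPT, matching the -84 in the PoC output. */
#define ERR_CORRUPT (-84)

/* Hypothetical range check: a block index at or beyond block_count is
 * reported as corrupt. A sentinel near UINT32_MAX always lands here. */
static inline int check_block(uint32_t block, uint32_t block_count) {
    return (block >= block_count) ? ERR_CORRUPT : 0;
}
```

Under these assumptions, BLOCK_INLINE equals 4294967294, the same value gdb shows for block, and check_block(BLOCK_INLINE, 128) yields -84.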

I understand that "large inline" files are considered somewhat "anomalous", so to speak, but if the code has provisions for handling them, then they should work correctly. In our use case, external tools create an LFS image which is then read on an embedded device. Setting the cache size explicitly is an option, but there is a risk of user error if the cache size gets out of sync between the host tools and the embedded device.

If this is fixed then I suggest adding a test for this condition to the test suite - currently the inline eviction code path (around lfs.c:2421) is never triggered during the tests.
