
UBSAN: shift-out-of-bounds spew #14777

Description

@adamdmoss

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 22.04.2 |
| Kernel Version | Linux version 5.19.0-1022-lowlatency (buildd@lcy02-amd64-044) (x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #23~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 29 15:33:15 UTC 2 |
| Architecture | x64 |
| OpenZFS Version | 135d9a9 / master |

Describe the problem you're observing

Kernel log spew, seemingly when mounting ZFS filesystems (not certain of the trigger):

UBSAN: shift-out-of-bounds in /var/lib/dkms/zfs/2.1.99/build/module/zfs/zio.c:5008:28
[  390.361064] shift exponent -5 is negative

Full spew:

[  390.361053] ================================================================================
[  390.361060] UBSAN: shift-out-of-bounds in /var/lib/dkms/zfs/2.1.99/build/module/zfs/zio.c:5008:28
[  390.361064] shift exponent -5 is negative
[  390.361066] CPU: 5 PID: 10457 Comm: z_rd_int_0 Tainted: P           OE     5.19.0-1022-lowlatency #23~22.04.1-Ubuntu
[  390.361069] Hardware name: Gigabyte Technology Co., Ltd. Z68MA-D2H-B3/Z68MA-D2H-B3, BIOS F10 02/23/2012
[  390.361071] Call Trace:
[  390.361073]  <TASK>
[  390.361076]  show_stack+0x52/0x69
[  390.361082]  dump_stack_lvl+0x49/0x6d
[  390.361087]  dump_stack+0x10/0x18
[  390.361089]  ubsan_epilogue+0x9/0x43
[  390.361093]  __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef
[  390.361098]  ? __sbitmap_get_word+0x36/0x90
[  390.361104]  zbookmark_compare.cold+0x20/0x66 [zfs]
[  390.361278]  zbookmark_subtree_completed+0x60/0x90 [zfs]
[  390.361423]  dsl_scan_check_prefetch_resume+0x6b/0xa0 [zfs]
[  390.361565]  ? abd_fletcher_4_fini+0x58/0x70 [zfs]
[  390.361680]  ? abd_fletcher_4_native+0x92/0xd0 [zfs]
[  390.361824]  ? dma_map_page_attrs+0x35/0x90
[  390.361829]  ? ktime_get+0x43/0xc0
[  390.361832]  ? __rq_qos_issue+0x26/0x50
[  390.361837]  ? blk_mq_start_request+0x3d/0x150
[  390.361842]  ? nvme_prep_rq.part.0+0xac/0x120 [nvme]
[  390.361847]  ? nvme_queue_rqs+0x1e0/0x290 [nvme]
[  390.361852]  ? aggsum_add+0x1af/0x1d0 [zfs]
[  390.361976]  ? kmem_cache_alloc+0x1b3/0x340
[  390.361979]  ? spl_kmem_cache_alloc+0x121/0x790 [spl]
[  390.361991]  ? buf_cons+0x65/0x80 [zfs]
[  390.362115]  ? arc_buf_fill+0x983/0xd90 [zfs]
[  390.362240]  dsl_scan_prefetch+0x8e/0x280 [zfs]
[  390.362382]  dsl_scan_prefetch_cb+0x158/0x310 [zfs]
[  390.362525]  arc_read_done+0x2d9/0x590 [zfs]
[  390.362650]  l2arc_read_done+0x7cf/0xae0 [zfs]
[  390.362768]  ? zio_wait_for_children+0xb2/0x140 [zfs]
[  390.362905]  zio_done+0x412/0x12a0 [zfs]
[  390.363038]  zio_execute+0x94/0x170 [zfs]
[  390.363178]  taskq_thread+0x27a/0x490 [spl]
[  390.363190]  ? wake_up_q+0xa0/0xa0
[  390.363197]  ? zio_gang_tree_free+0x70/0x70 [zfs]
[  390.363339]  ? taskq_thread_spawn+0x60/0x60 [spl]
[  390.363349]  kthread+0xeb/0x120
[  390.363353]  ? kthread_complete_and_exit+0x20/0x20
[  390.363357]  ret_from_fork+0x1f/0x30
[  390.363362]  </TASK>
[  390.363408] ================================================================================

The spew points to this code in zio.c:

int
zbookmark_compare(uint16_t dbss1, uint8_t ibs1, uint16_t dbss2, uint8_t ibs2,
    const zbookmark_phys_t *zb1, const zbookmark_phys_t *zb2)
{
	/*
	 * These variables represent the "equivalent" values for the zbookmark,
	 * after converting zbookmarks inside the meta dnode to their
	 * normal-object equivalents.
	 */
	uint64_t zb1obj, zb2obj;
	uint64_t zb1L0, zb2L0;
	uint64_t zb1level, zb2level;

	if (zb1->zb_object == zb2->zb_object &&
	    zb1->zb_level == zb2->zb_level &&
	    zb1->zb_blkid == zb2->zb_blkid)
		return (0);

	IMPLY(zb1->zb_level > 0, ibs1 >= SPA_MINBLOCKSHIFT);
	IMPLY(zb2->zb_level > 0, ibs2 >= SPA_MINBLOCKSHIFT);

	/*
	 * BP_SPANB calculates the span in blocks.
	 */
	zb1L0 = (zb1->zb_blkid) * BP_SPANB(ibs1, zb1->zb_level);
	zb2L0 = (zb2->zb_blkid) * BP_SPANB(ibs2, zb2->zb_level);

... and that final line is line 5008, matching the zio.c:5008:28 location in the report:
zb2L0 = (zb2->zb_blkid) * BP_SPANB(ibs2, zb2->zb_level);

BP_SPANB is

#define	BP_SPANB(indblkshift, level) \
	(((uint64_t)1) << ((level) * ((indblkshift) - SPA_BLKPTRSHIFT)))

spa.h says:
#define SPA_BLKPTRSHIFT 7

so at the failing call, level * (indblkshift - 7) == -5, which is the negative shift exponent UBSAN reports.
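To make the arithmetic concrete, here is a minimal sketch that computes the same exponent in signed arithmetic so it can be inspected without triggering the undefined shift. The helper names `span_shift_exponent` and `bp_spanb_checked` are hypothetical and not part of OpenZFS, and the sketch assumes the level is a signed 64-bit value. Several input pairs produce -5, for example level 1 with indblkshift 2, or level -1 with indblkshift 12; the report does not say which one applies here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SPA_BLKPTRSHIFT	7	/* value quoted from spa.h above */

/*
 * Hypothetical helper: the exponent BP_SPANB would shift by, computed
 * as a signed 64-bit value so a negative result is visible, not UB.
 */
static int64_t
span_shift_exponent(uint8_t indblkshift, int64_t level)
{
	return (level * ((int64_t)indblkshift - SPA_BLKPTRSHIFT));
}

/*
 * Hypothetical checked variant of BP_SPANB: stores the span and
 * returns 0, or returns -1 when the exponent is negative or >= 64
 * (both are undefined behaviour for a 64-bit shift in C).
 */
static int
bp_spanb_checked(uint8_t indblkshift, int64_t level, uint64_t *span)
{
	int64_t exp = span_shift_exponent(indblkshift, level);

	assert(span != NULL);
	if (exp < 0 || exp >= 64)
		return (-1);
	*span = (uint64_t)1 << exp;
	return (0);
}
```

With an indblkshift of 17 and level 1 the exponent is 10 and the span is 1024 blocks; with either of the -5 pairs above, `bp_spanb_checked` refuses to shift.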

I think those IMPLYs are trying to catch such a situation but I'm not on a debug kernel... :)
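As a rough illustration of why a non-debug build stays quiet until UBSAN catches the shift, assertion macros of this kind are commonly compiled to no-ops unless a debug flag is set. The sketch below is a simplified stand-in, not OpenZFS's actual definition: `MY_DEBUG`, `MY_IMPLY`, and `span_precondition_passed` are made-up names, and `SPA_MINBLOCKSHIFT` is assumed to be 9.

```c
#include <assert.h>
#include <stdint.h>

#define	SPA_MINBLOCKSHIFT	9	/* assumed value, for illustration */

#ifdef MY_DEBUG
/* Debug build: "a implies b" is enforced and aborts on violation. */
#define	MY_IMPLY(a, b)	assert(!(a) || (b))
#else
/* Non-debug build: operands are type-checked but never evaluated. */
#define	MY_IMPLY(a, b)	((void)sizeof (a), (void)sizeof (b))
#endif

/*
 * Mirrors the shape of the precondition in zbookmark_compare():
 * a bookmark above level 0 should carry a sane indirect block shift.
 * Returns 1 whenever execution gets past the macro.
 */
static int
span_precondition_passed(int64_t level, uint8_t ibs)
{
	MY_IMPLY(level > 0, ibs >= SPA_MINBLOCKSHIFT);
	return (1);
}
```

Built without `-DMY_DEBUG`, `span_precondition_passed(1, 2)` returns 1 even though the precondition is violated, so execution would continue into the bad shift; built with `-DMY_DEBUG`, the same call aborts.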

I don't see anything clearly related in recent git history so I guess my pool has some interesting corruption, but I'm reporting it just in case.

Describe how to reproduce the problem

Unsure.

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
