Commit ceb77b1
fix ksh login crash on disk full (rhbz#1212992)
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch
Prior discussion:
https://www.mail-archive.com/[email protected]/msg01037.html
https://www.mail-archive.com/[email protected]/msg01038.html
https://www.mail-archive.com/[email protected]/msg01042.html
https://bugzilla.redhat.com/1212992
On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, creating a chroot on a file-backed
> disk image with a test user, and slowly filling that
> file-backed filesystem, e.g.
>
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until only around 12K was left, I managed to reproduce the
> problem and debug it with valgrind and vgdb.
> Debugging under these conditions is tricky, as one cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code far enough, I learned that in the
> places where it handles SH_JMPEXIT, there was almost no
> handling of SH_JMPERREXIT.
>
> ksh would eventually crash because the struct
> subshell allocated on the stack in sh/subshell.c:sh_subshell
> was still referenced by the global subshell_data after it
> siglongjmp'ed back up the stack, as the out-of-disk-space
> errors were not fully handled. It would print a message every time
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory, e.g. the
> references to stack data from sh_subshell in the global
> subshell_data. One thing that looked strange to me in the coredump
> analysis was that the subshell_data prev field was pointing to itself
> when it eventually crashed, which was later understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths where SH_JMPEXIT is handled, and the failed login on a
> full disk ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730== /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730== at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730== by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730== by 0x493C8B: fs3d (fs3d.c:57)
> ==17730== by 0x423E41: sh_init (init.c:1302)
> ==17730== by 0x405CD3: sh_main (main.c:141)
> ==17730== by 0x405B84: main (pmain.c:45)
> ==17730== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> #1 0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> #2 0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> #3 0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> #4 0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> #5 0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---

1 parent e3d7bf1 commit ceb77b1
1 file changed, +2 -2 lines changed
Hunks: lines 418-424 (line 421 changed) and lines 594-600 (line 597 changed).
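
The failure mode described in the quoted message can be illustrated with a small, self-contained C sketch. This is not the ksh source and not the actual patch: `JMP_EXIT`, `JMP_ERREXIT`, `struct frame`, `current_frame`, `run_nested` and `leaves_dangling_frame` are hypothetical names standing in for `SH_JMPEXIT`, `SH_JMPERREXIT`, `struct subshell`, `subshell_data` and `sh_subshell`, and the numeric jump codes are made up. It only shows why a cleanup path that recognises one jump code but not the other can leave a global pointing at a dead stack frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for SH_JMPEXIT and SH_JMPERREXIT; values are made up. */
enum { JMP_EXIT = 1, JMP_ERREXIT = 2 };

struct frame { struct frame *prev; };

static struct frame *current_frame;  /* global, in the role of subshell_data */
static int live_frames;              /* frames the global chain still claims */
static sigjmp_buf *jump_target;      /* innermost active jump buffer */

/* Simulates a failed write that unwinds with the given jump code. */
static void fail_write(int code)
{
    siglongjmp(*jump_target, code);
}

/* Unlinks a stack-allocated frame from the global chain. */
static void pop_frame(struct frame *f)
{
    current_frame = f->prev;
    live_frames--;
}

/* Links a stack-allocated frame into the global chain, then "fails".
 * With handle_errexit == 0, only JMP_EXIT triggers the cleanup. */
static void run_nested(int code, int handle_errexit)
{
    struct frame local;                      /* lives on this stack frame */
    sigjmp_buf buf;
    sigjmp_buf *saved_target = jump_target;

    local.prev = current_frame;
    current_frame = &local;
    live_frames++;
    jump_target = &buf;

    switch (sigsetjmp(buf, 0)) {
    case 0:
        fail_write(code);                    /* does not return here */
        break;
    case JMP_EXIT:
        pop_frame(&local);
        break;
    case JMP_ERREXIT:
        if (handle_errexit)                  /* the "patched" behaviour */
            pop_frame(&local);
        break;
    }
    jump_target = saved_target;
}

/* Returns 1 if the global chain still references run_nested's dead frame. */
int leaves_dangling_frame(int code, int handle_errexit)
{
    current_frame = NULL;
    live_frames = 0;
    run_nested(code, handle_errexit);
    return live_frames != 0;
}
```

Under these assumptions, `leaves_dangling_frame(JMP_ERREXIT, 0)` reports a stale frame, while `leaves_dangling_frame(JMP_EXIT, 0)` and `leaves_dangling_frame(JMP_ERREXIT, 1)` do not. The sketch tracks a counter rather than dereferencing the stale pointer, so it stays well-defined while still showing the imbalance.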