Discussion:
perl 5.10 threads on 1.5.25 = instant crash
Steven Hartland
2009-07-14 16:44:01 UTC
Been looking around but can't find any mention of it, so asking here:
is there a known issue with the latest 1.5.25 + perl 5.10 threads?
Doing anything with threads here causes an instant crash.

[test]
#!/bin/perl -w

use warnings;
use strict;
use threads;

print STDERR "Testing threads...\n";
my $thrd = threads->create( \&dothread );
$thrd->join();
print STDERR "Testing done\n";

sub dothread { print STDERR "I'm a thread!\n" }
[/test]

Environment details:
$ uname -a
CYGWIN_NT-6.1-WOW64 ibm 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

$ perl -V
Summary of my perl5 (revision 5 version 10 subversion 0 patch 34065) configuration:
Platform:
osname=cygwin, osvers=1.5.25(0.15642), archname=cygwin-thread-multi-64int
uname='cygwin_nt-5.1 reini 1.5.25(0.15642) 2008-06-12 19:34 i686 cygwin '
config_args='-de -Dmksymlinks -Dusethreads -Dmad=y -Dusedevel'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=undef, uselongdouble=undef
usemymalloc=y, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -fno-strict-aliasing -pipe -I/usr/local/include',
optimize='-O3',
cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='3.4.4 (cygming special, gdc 0.12, using dmd 0.125)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='g++', ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--stack,8388608 -Wl,--enable-auto-image-base -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib /lib
libs=-lgdbm -ldb -ldl -lcrypt -lgdbm_compat
perllibs=-ldl -lcrypt
libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' --shared -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--stack,8388608 -Wl,--enable-auto-image-base -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY MYMALLOC PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_MAD PERL_MALLOC_WRAP
PERL_USE_SAFE_PUTENV USE_64_BIT_INT USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
Locally applied patches:
MAINT34065
CYG11 no-bs
CYG12 no archlib in otherlibdirs
CYG14 Dynaloader
CYG15 static-Win32CORE
Bug#55162 File::Spec::case_tolerant performance
Built under cygwin
Compiled at Jun 30 2008 16:05:15
%ENV:
CYGWIN=""
@INC:
/usr/lib/perl5/5.10/i686-cygwin
/usr/lib/perl5/5.10
/usr/lib/perl5/site_perl/5.10/i686-cygwin
/usr/lib/perl5/site_perl/5.10
/usr/lib/perl5/vendor_perl/5.10/i686-cygwin
/usr/lib/perl5/vendor_perl/5.10
/usr/lib/perl5/vendor_perl/5.10
/usr/lib/perl5/site_perl/5.8
/usr/lib/perl5/vendor_perl/5.8
.
[/quote]

Regards
Steve
Steven Hartland
2009-07-14 17:33:55 UTC
Also happens on 5.8.8 :(
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=cygwin, osvers=1.5.24(0.15642), archname=cygwin-thread-multi-64int
uname='cygwin_nt-5.1 reini 1.5.24(0.15642) 2007-01-31 10:57 i686 cygwin '
config_args='-de -Dmksymlinks -Duse64bitint -Dusethreads -Uusemymalloc -Doptimize=-O3 -Dman3ext=3pm -Dusesitecustomize -Dusedevel'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement',
optimize='-O3',
cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -Wdeclaration-after-statement'
ccversion='', gccversion='3.4.4 (cygming special, gdc 0.12, using dmd 0.125)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='ld2', ldflags =' -s -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib /lib
libs=-lgdbm -ldb -ldl -lcrypt -lgdbm_compat
perllibs=-ldl -lcrypt -lgdbm_compat
libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' -s'
cccdlflags=' ', lddlflags=' -s -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
PERL_MALLOC_WRAP PERL_USE_SAFE_PUTENV
USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
USE_PERLIO USE_REENTRANT_API USE_SITECUSTOMIZE
Locally applied patches:
CYG01 - hints.cygwin.sh ldflags -s
CYG02 - lib-ExtUtils-Embed insensitive against leading \s
CYG03 - lib-Test-Harness-Straps $ENV{PERL5LIB} = ''
CYG04 - major.version.cygwin.sh cygperl-5_8.dll and not cygperl-5_8_x.dll
CYG05 - add Win32CORE to core
CYG07 - File-Spec-Cygwin-TMPDIR.patch
Bug#38628 - allow legacy Cwd->cwd()
Bug#40103 - File-Spec-case_tolerant.patch from 5.9.5
Built under cygwin
Compiled at Jul 8 2007 19:12:08
%ENV:
CYGWIN=""
@INC:
/usr/lib/perl5/5.8/cygwin
/usr/lib/perl5/5.8
/usr/lib/perl5/site_perl/5.8/cygwin
/usr/lib/perl5/site_perl/5.8
/usr/lib/perl5/site_perl/5.8
/usr/lib/perl5/vendor_perl/5.8/cygwin
/usr/lib/perl5/vendor_perl/5.8
/usr/lib/perl5/vendor_perl/5.8
.
Dave Korn
2009-07-14 18:16:46 UTC
Post by Steven Hartland
[test]
#!/bin/perl -w
use warnings;
use strict;
use threads;
print STDERR "Testing threads...\n";
my $thrd = threads->create( \&dothread );
$thrd->join();
print STDERR "Testing done\n";
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
WFM with perl 5.10.0 on both 1.5 and 1.7.

cheers,
DaveK
Steven Hartland
2009-07-14 18:18:44 UTC
----- Original Message -----
Post by Dave Korn
Post by Steven Hartland
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
WFM with perl 5.10.0 on both 1.5 and 1.7.
Thanks Dave, maybe a Windows Server 2008 R2 64bit issue?

Regards
Steve
Christopher Faylor
2009-07-14 18:55:06 UTC
Post by Steven Hartland
Post by Dave Korn
Post by Steven Hartland
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
WFM with perl 5.10.0 on both 1.5 and 1.7.
Thanks Dave, maybe a Windows Server 2008 R2 64bit issue?
FYI, 1.5.25 is rumored to have problems on Windows Server 2008 64bit.

At the very least, it is known to have inexplicable hangs not seen
in 1.7.

cgf
Corinna Vinschen
2009-07-14 19:04:36 UTC
Post by Christopher Faylor
Post by Steven Hartland
Post by Dave Korn
Post by Steven Hartland
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
WFM with perl 5.10.0 on both 1.5 and 1.7.
Thanks Dave, maybe a Windows Server 2008 R2 64bit issue?
FYI, 1.5.25 is rumored to have problems on Windows Server 2008 64bit.
At the very least, it is known to have inexplicable hangs not seen
in 1.7.
... and the script works fine on Windows 7 Build 7201 running under
Cygwin 1.7.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Steven Hartland
2009-07-14 19:38:01 UTC
----- Original Message -----
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Steven Hartland
Post by Dave Korn
Post by Steven Hartland
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
WFM with perl 5.10.0 on both 1.5 and 1.7.
Thanks Dave, maybe a Windows Server 2008 R2 64bit issue?
FYI, 1.5.25 is rumored to have problems on Windows Server 2008 64bit.
At the very least, it is known to have inexplicable hangs not seen
in 1.7.
... and the script works fine on Windows 7 Build 7201 running under
Cygwin 1.7.
Thanks guys, looks like an update to 1.7 is in order then. Any tips
or gotchas I should be aware of?

Regards
Steve
Corinna Vinschen
2009-07-14 20:06:31 UTC
----- Original Message ----- From: "Corinna Vinschen"
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Steven Hartland
Post by Steven Hartland
sub dothread { print STDERR "I'm a thread!\n" }
[/test]
Post by Dave Korn
WFM with perl 5.10.0 on both 1.5 and 1.7.
Thanks Dave, maybe a Windows Server 2008 R2 64bit issue?
FYI, 1.5.25 is rumored to have problems on Windows Server 2008 64bit.
At the very least, it is known to have inexplicable hangs not seen
in 1.7.
... and the script works fine on Windows 7 Build 7201 running under
Cygwin 1.7.
Thanks guys looks like an update to 1.7 is in order then, any tips
or gotchas I should be aware of?
The fine manual at
http://cygwin.com/1.7/cygwin-ug-net/cygwin-ug-net.html
is probably a big help.


Corinna
Steven Hartland
2009-07-14 20:36:24 UTC
----- Original Message -----
Post by Corinna Vinschen
... and the script works fine on Windows 7 Build 7201 running under
Cygwin 1.7.
No joy here with 1.7 either, just crashes out instantly.

Running Windows Server 2008 R2 Standard 64bit build 7100 on dual E5520
with 18GB RAM, showing 16 cores, if that is of interest.


gdb is not much help, so not sure what to try next?

[log]
Starting program: /usr/bin/perl threads.pl
[New thread 560.0x734]
Error: dll starting at 0x778e0000 not found.
Error: dll starting at 0x76c70000 not found.
Error: dll starting at 0x778e0000 not found.
Error: dll starting at 0x777e0000 not found.
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[New thread 560.0xdc]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
Testing threads...

Program exited with code 030000000005.
(gdb)
[/log]
Christopher Faylor
2009-07-14 20:56:17 UTC
Post by Steven Hartland
----- Original Message -----
Post by Corinna Vinschen
... and the script works fine on Windows 7 Build 7201 running under
Cygwin 1.7.
No joy here with 1.7 either, just crashes out instantly.
Running Windows Server 2008 R2 Standard 64bit build 7100 on dual E5520
with 18GB RAM showing 16 cores if that may be of interest.
gdb is not much help, so not sure what to try next?
1) rebaseall
2) http://cygwin.com/problems.html
Steven Hartland
2009-07-14 21:29:45 UTC
----- Original Message -----
Post by Christopher Faylor
Post by Steven Hartland
No joy here with 1.7 either, just crashes out instantly.
Running Windows Server 2008 R2 Standard 64bit build 7100 on dual E5520
with 18GB RAM showing 16 cores if that may be of interest.
gdb is not much help, so not sure what to try next?
1) rebaseall
2) http://cygwin.com/problems.html
Already tried rebaseall just in case, unfortunately.

Reinstalled 1.5.25 with just defaults + perl 5.10, ran rebaseall, confirmed
still fails.

cygcheck.out attached.

Threw the VC++ debugger at it and it errors in the main cygwin dll, so it
seems like it's something deep in the bowels :(

If there's anything else I can do here let me know. Going off now to
dig for instructions on compiling cygwin with debugging symbols so
I can try to find you more info on the problem.

Regards
Steve
Steven Hartland
2009-07-14 23:36:56 UTC
This may or may not help:

According to VC++ debugger it always dies with:
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation reading location 0x00000004.

According to gdb 0x610d089d = thread.cc:113

Running it through gdb, it hits this breakpoint ~280 times before it exits:


[gdb]
Breakpoint 1, pthread_setspecific (key=0x19e9e88, value=0x19e9768)
at /netrel/src/cygwin-snapshot-20090711-1/winsup/cygwin/thread.cc:113
113 if ((*object)->magic != magic)
(gdb)
307 ~myfault () __attribute__ ((always_inline)) { _my_tls.reset_fault (sebastian); }
(gdb)
285 andreas._myfault = old_j._myfault;
(gdb)
307 ~myfault () __attribute__ ((always_inline)) { _my_tls.reset_fault (sebastian); }
(gdb)
285 andreas._myfault = old_j._myfault;
(gdb)
286 andreas._myfault_errno = old_j._myfault_errno;
(gdb)
209 int set (const void *value) {TlsSetValue (tls_index, (void *) value); return 0;}
(gdb)
2259 }
(gdb)
0x610b3108 in _sigbe () from /usr/bin/cygwin1.dll
(gdb)
Single stepping until exit from function _sigbe,
which has no line number information.
0x6ce32ea3 in XS_threads_create () from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
(gdb)
Single stepping until exit from function XS_threads_create,
which has no line number information.

Program exited with code 030000000005.
[/gdb]
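
The two debuggers appear to be reporting the same status: gdb prints the exit code in octal, and 030000000005 in octal is 0xC0000005, the access violation code shown by the VC++ debugger. A minimal sketch of that conversion in C follows; the helper name `octal_exit_code` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse an exit code printed in octal, as in
   "Program exited with code 030000000005", into a 32-bit value. */
uint32_t octal_exit_code(const char *text)
{
    assert(text != NULL);
    /* Base 8: the leading 0 is simply another octal digit. */
    return (uint32_t) strtoul(text, NULL, 8);
}
```

Passing the string from the gdb log to `octal_exit_code` and printing the result with `%#x` gives the hexadecimal form that can be compared with the VC++ message.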
Christopher Faylor
2009-07-15 00:03:31 UTC
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation reading location 0x00000004.
No, sorry, it really doesn't help. The VC++ debugger doesn't know how
to handle cygwin exceptions.

cgf
Steven Hartland
2009-07-15 00:41:18 UTC
----- Original Message -----
From: "Christopher Faylor" <cgf-use-the-mailinglist-***@cygwin.com>
To: <***@cygwin.com>
Sent: Wednesday, July 15, 2009 1:03 AM
Subject: Re: perl threads on 2008 R2 64bit = crash ( was: perl 5.10 threads on 1.5.25 = instant crash )
Post by Christopher Faylor
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation reading location 0x00000004.
No, sorry, it really doesn't help. The VC++ debugger doesn't know how
to handle cygwin exceptions.
Was just trying to get a hint of the area of the problem. Since gdb doesn't
actually break when it happens, this seemed to be the only way to get that
info.

Any pointers on how I can help narrow down the issue?

Regards
Steve
Corinna Vinschen
2009-07-15 15:21:39 UTC
----- Original Message ----- From: "Christopher Faylor"
<cgf-use...>
http://cygwin.com/acronyms/#PCYMTNQREAIYR
Post by Christopher Faylor
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation reading location 0x00000004.
No, sorry, it really doesn't help. The VC++ debugger doesn't know how
to handle cygwin exceptions.
Was just trying to get a hint of the area of the problem since gdb doesn't
actually break when it happens this seemed to be the only way to get that
info.
Any pointers on how I can help narrow down the issue?
I can reproduce the problem on my 2008 R2 box. It works fine on Windows
7 x64, though, so it's a Server thingy.

What happens is that this statement

if ((*object)->magic != magic)

in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
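
This would also explain why the reported faulting address is 0x00000004 rather than 0: if *object is NULL and the magic field sits 4 bytes into the object, the load touches address 4. A small sketch of that arithmetic in C, assuming a 32-bit layout with one 4-byte slot (for example a vtable pointer) ahead of the magic number; `object_model` and `magic_load_address` are hypothetical names, not Cygwin's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit object layout: one 4-byte slot
   (standing in for a vtable pointer) followed by the magic number. */
struct object_model {
    uint32_t vptr_slot;
    uint32_t magic;
};

/* The model only makes sense if magic really lands at offset 4. */
static_assert(offsetof(struct object_model, magic) == 4,
              "magic is expected at offset 4 in this model");

/* Address that a load of ->magic would touch for an object at `base`. */
uint32_t magic_load_address(uint32_t base)
{
    return base + (uint32_t) offsetof(struct object_model, magic);
}
```

With `base` set to 0 (a NULL object pointer) the function returns 4, which matches the location in the VC++ message.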

To debug this further I created a STC(TM)(*) which does the same as the
Perl testcase, just in pure C:

==== SNIP ====
#include <stdio.h>
#include <string.h>
#include <pthread.h>

pthread_attr_t attr;

/* Thread body: print a message and exit. */
void *thr (void *arg)
{
  printf ("I'm a thread\n");
  return NULL;
}

int main()
{
  pthread_t t;
  int i, r;
  void *ret;

  fprintf (stderr, "Testing threads...\n");
  i = pthread_attr_init (&attr);
  printf ("i = %d\n", i);
  r = pthread_create (&t, &attr, thr, NULL);
  if (r)
    /* pthread_create returns the error number; it does not set errno. */
    fprintf (stderr, "pthread_create: %d %s\n", r, strerror (r));
  else
    pthread_join (t, &ret);
  fprintf (stderr, "Testing done\n");
  return 0;
}
==== SNAP ====

The problem is, this testcase works fine, even on 2008 R2. It must
have something to do with the way Perl creates threads or does its
own exception handling. I just don't know what to look for.
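
One difference that could be tested: the gdb trace earlier in the thread stops in pthread_setspecific, reached from XS_threads_create, while the testcase above never touches thread-specific data. The sketch below adds that call path. Whether this path has anything to do with the crash is an assumption, and `slot_key`, `worker` and `run_tls_thread` are made-up names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Key shared by the creating thread and the worker (hypothetical name). */
static pthread_key_t slot_key;

/* Worker: store its argument in thread-specific data and read it back. */
static void *worker(void *arg)
{
    if (pthread_setspecific(slot_key, arg) != 0)
        return NULL;
    return pthread_getspecific(slot_key);
}

/* Create a key, set it in the calling thread, then set and read it in a
   new thread. Returns 0 on success or a pthread error number; *seen
   receives the value the worker read back. */
int run_tls_thread(void *value, void **seen)
{
    pthread_t thread;
    int rc;

    assert(seen != NULL);
    rc = pthread_key_create(&slot_key, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_setspecific(slot_key, value);
    if (rc == 0)
        rc = pthread_create(&thread, NULL, worker, value);
    if (rc == 0)
        rc = pthread_join(thread, seen);
    pthread_key_delete(slot_key);
    return rc;
}
```

If a caller built around `run_tls_thread` also runs cleanly on 2008 R2, that would point away from the thread-specific-data calls themselves and back towards something Perl does around them.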


Corinna


(*) http://cygwin.com/acronyms/#STC
Dave Korn
2009-07-15 16:03:43 UTC
Post by Corinna Vinschen
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
So, set a "*object == 0" conditional breakpoint on that line and see what
the SEH chain looks like?

cheers,
DaveK
Christopher Faylor
2009-07-15 16:22:43 UTC
Post by Dave Korn
Post by Corinna Vinschen
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
So, set a "*object == 0" conditional breakpoint on that line and see what
the SEH chain looks like?
But the point is that this shouldn't have caused a SEGV.

cgf
Dave Korn
2009-07-15 16:58:25 UTC
Post by Christopher Faylor
Post by Dave Korn
Post by Corinna Vinschen
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
So, set a "*object == 0" conditional breakpoint on that line and see what
the SEH chain looks like?
But the point is that this shouldn't have caused a SEGV.
Don't understand quite what you're alluding to. Where did Corinna refer to
a SEGV? Unless we're using the words differently, a SEGV is a signal, which
is a cygwin posix construct generated in response to an unhandled x86 access
violation exception. Corinna said that the call to v_o_i caused an
*exception*, as dereferencing a NULL pointer always does, and that it should
have been covered by the myfault handler (which as far as I know works by
wrapping an SEH handler around the block of protected code, and using it to
catch exceptions and longjmp back to the receiver) and which might lead to a
SEGV signal being generated somewhere a long way down the road if it failed to
catch the exception, but I'm just concentrating on the point of failure.
Hence my suggestion to breakpoint it just before the exception happens and see
what the state of the SEH chain looks like.
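
The guard-and-jump-back pattern described here can be sketched with POSIX signals in place of Windows SEH. This is an analogy only and not Cygwin's myfault code: `check_object`, `on_fault` and the `struct object` layout are invented for illustration, and the single static jump buffer makes it unsuitable for concurrent use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

enum object_state { INVALID_OBJECT, VALID_OBJECT };

/* Hypothetical object carrying a magic number. */
struct object { long magic; };

/* Where the fault handler jumps back to. */
static sigjmp_buf fault_return;

/* Fault handler: abandon the faulting access and return to the guard. */
static void on_fault(int signo)
{
    (void) signo;
    siglongjmp(fault_return, 1);
}

/* Validate the object that *slot points to. A faulting dereference
   (for example when *slot is NULL) is reported as INVALID_OBJECT. */
enum object_state check_object(struct object *const *slot, long magic)
{
    struct sigaction guard, previous;
    volatile enum object_state state = INVALID_OBJECT;

    assert(slot != NULL);
    guard.sa_handler = on_fault;
    guard.sa_flags = 0;
    sigemptyset(&guard.sa_mask);
    sigaction(SIGSEGV, &guard, &previous);

    /* sigsetjmp returns 0 when arming the guard, 1 after a fault. */
    if (sigsetjmp(fault_return, 1) == 0) {
        /* volatile pointer: the compiler must really perform the load. */
        struct object *volatile candidate = *slot;
        if (candidate->magic == magic)
            state = VALID_OBJECT;
    }

    sigaction(SIGSEGV, &previous, NULL);
    return state;
}
```

Calling `check_object` with a slot that holds NULL returns INVALID_OBJECT instead of killing the process, but only if the handler is actually invoked. That is the step in question here.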

cheers,
DaveK
Christopher Faylor
2009-07-15 18:56:36 UTC
Post by Dave Korn
Post by Christopher Faylor
Post by Dave Korn
Post by Corinna Vinschen
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
So, set a "*object == 0" conditional breakpoint on that line and see what
the SEH chain looks like?
But the point is that this shouldn't have caused a SEGV.
Don't understand quite what you're alluding to. Where did Corinna refer to
a SEGV? Unless we're using the words differently, a SEGV is a signal, which
is a cygwin posix construct generated in response to an unhandled x86 access
violation exception. Corinna said that the call to v_o_i caused an
*exception*, as dereferencing a NULL pointer always does, and that it should
have been covered by the myfault handler (which as far as I know works by
wrapping an SEH handler around the block of protected code, and using it to
catch exceptions and longjmp back to the receiver) and which might lead to a
SEGV signal being generated somewhere a long way down the road if it failed to
catch the exception, but I'm just concentrating on the point of failure.
Hence my suggestion to breakpoint it just before the exception happens and see
what the state of the SEH chain looks like.
The point is that this is generating the equivalent of a SEGV without
ever hitting Cygwin's "SEH" code. Setting a breakpoint on the line
would likely just show you the call stack but would not provide any
insight into why the myfault was not invoked.

cgf
Steven Hartland
2009-07-15 19:14:25 UTC
----- Original Message -----
From: "Christopher Faylor"
Post by Christopher Faylor
The point is that this is generating the equivalent of a SEGV without
ever hitting Cygwin's "SEH" code. Setting a breakpoint on the line
would likely just show you the call stack but would not provide any
insight into why the myfault was not invoked.
Of note: when running 1.5.25 I did get the Windows application error
dialog, but with 1.7 and the latest snapshot I don't, so maybe using
1.5.25 might help?

Regards
Steve
Dave Korn
2009-07-15 19:32:38 UTC
Post by Christopher Faylor
Post by Dave Korn
Post by Christopher Faylor
Post by Dave Korn
Post by Corinna Vinschen
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
So, set a "*object == 0" conditional breakpoint on that line and see what
the SEH chain looks like?
But the point is that this shouldn't have caused a SEGV.
Don't understand quite what you're alluding to. Where did Corinna refer to
a SEGV? Unless we're using the words differently, a SEGV is a signal, which
is a cygwin posix construct generated in response to an unhandled x86 access
violation exception. Corinna said that the call to v_o_i caused an
*exception*, as dereferencing a NULL pointer always does, and that it should
have been covered by the myfault handler (which as far as I know works by
wrapping an SEH handler around the block of protected code, and using it to
catch exceptions and longjmp back to the receiver) and which might lead to a
SEGV signal being generated somewhere a long way down the road if it failed to
catch the exception, but I'm just concentrating on the point of failure.
Hence my suggestion to breakpoint it just before the exception happens and see
what the state of the SEH chain looks like.
The point is that this is generating the equivalent of a SEGV without
ever hitting Cygwin's "SEH" code. Setting a breakpoint on the line
would likely just show you the call stack but would not provide any
insight into why the myfault was not invoked.
Yes. That's why I said "examine the SEH chain", not "look at the call
stack". I reckoned that doing so might provide any insight into why the
myfault was not invoked. For instance, you might see something hooked into
the SEH chain ahead of Cygwin's handler and start to look at what it was and
where it came from; and if not, you would be able to infer that the SEH chain
was not being invoked and start looking at the various SEH security
enhancements in recent windows versions and wondering which one might make it
think it shouldn't call handlers from a non-registered stack-based SEH
registration record.

cheers,
DaveK
Corinna Vinschen
2009-07-15 19:45:40 UTC
Post by Dave Korn
Yes. That's why I said "examine the SEH chain", not "look at the call
stack". I reckoned that doing so might provide any insight into why the
myfault was not invoked. For instance, you might see something hooked into
the SEH chain ahead of Cygwin's handler and start to look at what it was and
where it came from; and if not, you would be able to infer that the SEH chain
was not being invoked and start looking at the various SEH security
enhancements in recent windows versions and wondering which one might make it
think it shouldn't call handlers from a non-registered stack-based SEH
registration record.
I'm not opposed to get some help with this stuff...


Corinna
Dave Korn
2009-07-15 20:42:07 UTC
Post by Corinna Vinschen
Post by Dave Korn
Yes. That's why I said "examine the SEH chain", not "look at the call
stack". I reckoned that doing so might provide any insight into why the
myfault was not invoked. For instance, you might see something hooked into
the SEH chain ahead of Cygwin's handler and start to look at what it was and
where it came from; and if not, you would be able to infer that the SEH chain
was not being invoked and start looking at the various SEH security
enhancements in recent windows versions and wondering which one might make it
think it shouldn't call handlers from a non-registered stack-based SEH
registration record.
I'm not opposed to get some help with this stuff...
I don't have 2k8 to test it on myself, but if you can get this reproducing
under the debugger, then use a command like

(gdb) list 'verifyable_object_isvalid(void const*, long, void*, void*, void*)'

94 paranoid_printf ("threadcount %d. unlocked", MT_INTERFACE->threadcount);
95 }
96
97 static inline verifyable_object_state
98 verifyable_object_isvalid (void const *objectptr, long magic, void *static_ptr1,
99 void *static_ptr2, void *static_ptr3)
100 {
101 myfault efault;
102 /* Check for NULL pointer specifically since it is a cheap test and avoids the
103 overhead of setting up the fault handler. */
(gdb)
104 if (!objectptr || efault.faulted ())
105 return INVALID_OBJECT;
106
107 verifyable_object **object = (verifyable_object **) objectptr;
108
109 if ((static_ptr1 && *object == static_ptr1) ||
110 (static_ptr2 && *object == static_ptr2) ||
111 (static_ptr3 && *object == static_ptr3))
112 return VALID_STATIC_OBJECT;
113 if ((*object)->magic != magic)
(gdb)

check which line number the dereference is on, in my case 113, so set a
breakpoint there

(gdb) b 113 if ((*object) == 0)
No symbol "object" in current context.
(gdb)

Ah, that's bad. It might work on a DLL compiled with -O0 -g, but here we
have a problem that the function gets inlined everywhere it's called. So
instead I set an unconditional breakpoint there and let it run until I hit it:

(gdb) b 113
Breakpoint 3 at 0x610d0411: file /gnu/winsup/src/winsup/cygwin/thread.cc, line 113. (18 locations)
(gdb) disa 2
(gdb) c
Continuing.

Because that breakpoint is set on every inlined instance of the function,
you might need to continue it several times, until it hits the particular
inlined instance in the particular function that is blowing up. Let us say
for the sake of argument that it was in pthread_key_create;

Breakpoint 3, pthread_key_create (key=0x43b0a0,
destructor=0x408e00 <eh_globals_dtor>)
at /gnu/winsup/src/winsup/cygwin/thread.cc:113
113 if ((*object)->magic != magic)

... so I check the disassembly to see what register was being dereferenced for
comparison to the magic number:

(gdb) disass $eip $eip+10
Dump of assembler code from 0x610d7c46 to 0x610d7c50:
0x610d7c46 <pthread_key_create+214>: mov (%esi),%eax
0x610d7c48 <pthread_key_create+216>: cmpl $0xdf0df047,0x4(%eax)
0x610d7c4f <pthread_key_create+223>: jne 0x610d7c06 <pthread_key_create+150>
End of assembler dump.
(gdb)

... and set a breakpoint using the assembler parameters:

(gdb) b *0x610d7c48 if ($eax == 0)
Breakpoint 5 at 0x610d7c48: file /gnu/winsup/src/winsup/cygwin/thread.cc, line 113.
(gdb) disa 3
(gdb) c
Continuing.
Caught integer 2.

Program exited normally.
(gdb)

... and then my program exited normally, because it didn't ever try to
dereference a NULL pointer at that point. But, if the breakpoint did trip,
you could then examine the SEH chain. The SEH chain head lives at [fs:0], so
look up the base of the $fs selector using "info w32 selector"

(gdb) info w32 selectors
Undefined info w32 command: "selectors". Try "help info w32".
(gdb) info w32 selector
Selector $cs
0x01b: base=0x00000000 limit=0xffffffff 32-bit Code (Exec/Read, N.Conf)
Priviledge level = 3. Page granular.
Selector $ds
0x023: base=0x00000000 limit=0xffffffff 32-bit Data (Read/Write, Exp-up)
Priviledge level = 3. Page granular.
Selector $es
0x023: base=0x00000000 limit=0xffffffff 32-bit Data (Read/Write, Exp-up)
Priviledge level = 3. Page granular.
Selector $ss
0x023: base=0x00000000 limit=0xffffffff 32-bit Data (Read/Write, Exp-up)
Priviledge level = 3. Page granular.
Selector $fs
0x038: base=0x7ffde000 limit=0x00000fff 32-bit Data (Read/Write, Exp-up)
Priviledge level = 3. Byte granular.
Selector $gs
0x000: Segment not present
(gdb)

... get the head pointer:

(gdb) x/xw 0x7ffde000
0x7ffde000: 0x0022ce68

... which is on the stack, as you might expect. Then walk the chain; the first
word of each record is the 'next' pointer and the second is the handler function:

(gdb) x/2xw 0x0022ce68
0x22ce68: 0x0022ffe0 0x61028770
(gdb) x 0x61028770
0x61028770 <_ZN7_cygtls17handle_exceptionsEP17_EXCEPTION_RECORDP15_exception_listP8_CONTEXTPv>: 0x57e58955
(gdb) x/2xw 0x0022ffe0
0x22ffe0: 0xffffffff 0x7c4ff0b4
(gdb) x 0x7c4ff0b4
0x7c4ff0b4 <SetProcessPriorityBoost+86>: 0x83ec8b55
(gdb)

0xffffffff in the chain pointer means final entry, and 0x7c4ff0b4 is
somewhere in kernel32.dll; it's presumably the last-resort fault handler. The
important point is that we verified that the Cygwin exception handler is first
in the chain, so we'd expect it to be called by the NULL dereference when we
step into it (set a breakpoint there too, just in case something goes wrong
shortly after it enters). If there was something else first, we'd know where
to start looking; if not, we'd have to suspect the OS has decided not to call
the SEH chain at all for some reason.
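
The walk above can be sketched in C. This is only a model under stated
assumptions: struct seh_record, SEH_CHAIN_END and seh_find_handler are
made-up names for illustration (not Cygwin or Windows identifiers), and the
records are ordinary structs rather than the real list at [fs:0].

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for an x86 SEH registration record: the first
   word links to the next record, the second is the handler.  */
struct seh_record
{
  struct seh_record *next;
  void *handler;
};

/* An all-ones pointer marks the final entry (0xffffffff on 32-bit x86).  */
#define SEH_CHAIN_END ((struct seh_record *) (uintptr_t) -1)

/* Return the position (0 = head) of the first record whose handler is
   WANTED, or -1 if the chain ends, hits a NULL link, or exceeds
   MAX_DEPTH records without a match.  */
int
seh_find_handler (const struct seh_record *head, const void *wanted,
                  int max_depth)
{
  const struct seh_record *r = head;
  for (int pos = 0; pos < max_depth; pos++)
    {
      if (r == NULL || r == SEH_CHAIN_END)
        return -1;
      if (r->handler == wanted)
        return pos;
      r = r->next;
    }
  return -1;
}
```

A result of 0 for the Cygwin handler corresponds to what was verified above;
anything greater would mean something else is hooked in ahead of it.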

cheers,
DaveK
Corinna Vinschen
2009-07-15 21:04:25 UTC
Post by Dave Korn
Post by Corinna Vinschen
Post by Dave Korn
Yes. That's why I said "examine the SEH chain", not "look at the call
stack". I reckoned that doing so might provide any insight into why the
myfault was not invoked. For instance, you might see something hooked into
the SEH chain ahead of Cygwin's handler and start to look at what it was and
where it came from; and if not, you would be able to infer that the SEH chain
was not being invoked and start looking at the various SEH security
enhancements in recent windows versions and wondering which one might make it
think it shouldn't call handlers from a non-registered stack-based SEH
registration record.
I'm not opposed to get some help with this stuff...
I don't have 2k8 to test it on myself, but if you can get this reproducing
under the debugger, then use a command like
[...]
Thanks for your help. I'm too tired right now to follow through.
I'll look into it tomorrow.


Thanks again,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Steven Hartland
2009-07-15 21:28:00 UTC
----- Original Message -----
From: "Dave Korn"
Post by Dave Korn
(gdb) b 113 if ((*object) == 0)
No symbol "object" in current context.
(gdb)
Ah, that's bad. It might work on a DLL compiled with -O0 -g, but here we
have a problem that the function gets inlined everywhere it's called. So
From my experience last night you should be able to use something like:-
b 113 if ( 0 == (**(verifyable_object **)objectptr) )

If not here at least it hits that break ~ 280 times before blowing up so
setting a conditional on that occurrence should help.

Unfortunately I'm currently testing a non-threaded compile on the machine
so can't check myself just now.
Dave Korn
2009-07-15 22:07:10 UTC
----- Original Message ----- From: "Dave Korn"
Post by Dave Korn
(gdb) b 113 if ((*object) == 0)
No symbol "object" in current context.
(gdb)
Ah, that's bad. It might work on a DLL compiled with -O0 -g, but here we
have a problem that the function gets inlined everywhere it's called. So
From my experience last night you should be able to use something like:-
b 113 if ( 0 == (**(verifyable_object **)objectptr) )
I did try it, but objectptr was out of scope as well. I'm using 1.7 and
gcc-4.3.2, so it might well be that there's more inlining going on for me than
for you, or changes in the debug info generation that account for it.
If not here at least it hits that break ~ 280 times before blowing up so
setting a conditional on that occurrence should help.
:) That's the general idea!

cheers,
DaveK
Corinna Vinschen
2009-07-16 16:12:20 UTC
Post by Dave Korn
(gdb) x/xw 0x7ffde000
0x7ffde000: 0x0022ce68
... on the stack, as you might expect, and walk the chain, first word of each
(gdb) x/2xw 0x0022ce68
0x22ce68: 0x0022ffe0 0x61028770
(gdb) x 0x61028770
0x61028770 <_ZN7_cygtls17handle_exceptionsEP17_EXCEPTION_RECORDP15_exception_listP8_CONTEXTPv>: 0x57e58955
(gdb) x/2xw 0x0022ffe0
0x22ffe0: 0xffffffff 0x7c4ff0b4
(gdb) x 0x7c4ff0b4
0x7c4ff0b4 <SetProcessPriorityBoost+86>: 0x83ec8b55
(gdb)
0xffffffff in the chain pointer means final entry, and 0x7c4ff0b4 is
somewhere in kernel32.dll, it's presumably the last resort fault handler. The
important point was we verified that the cygwin exception handler is first in
the chain, so we'd expect it to be called by the NULL dereference (set a
breakpoint there too, just in case something goes wrong shortly after it
enters) when we step into it. If there was something else first, we'd know
where to start looking, if not, we'd have to suspect the OS has decided not to
call the SEH chain at all for some reason.
Thanks again for your help. I had the funny idea to examine the
SEH chain before the myfault handler gets installed. That's what
I get in my C testcase:

(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0028ce68
(gdb) x/2xw 0x0028ce68
0x28ce68: 0x0028ffc4 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0028ffc4
0x28ffc4: 0x0028ffe4 0x77cc03dd <-- OS
(gdb) x/2xw 0x0028ffe4
0x28ffe4: 0xffffffff 0x77d16900 <-- OS

And that's what I get in the Perl testcase:

(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2xw 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0088400c
0x88400c: 0x00000000 0x00000001 <-- Huh?

This looks wrong, doesn't it? The question is now, how and why does
that happen?

Examining the SEH chain on Windows XP in the same situation looks quite
different, though not necessarily correct:

(gdb) x/xw 0x7ffdd000
0x7ffdd000: 0x0082ce68
(gdb) x/2w 0x0082ce68
0x82ce68: 0x00823c48 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2w 0x00823c48
0x823c48: 0x00823ef4 0x7c90e920 <-- OS
(gdb) x/2w 0x00823ef4
0x823ef4: 0x0082419c 0x00823ee8 <-- Perl?!?
(gdb) x/2w 0x0082419c
0x82419c: 0x610e207f 0x6117194c <-- ?!?
(gdb) x/2w 0x610e207f
0x610e207f <_ZN4muto7acquireEm+155>: 0x0674c085 0x01e345c6
(gdb) x/2w 0x0674c085
0x674c085: Cannot access memory at address 0x674c085

Something's fishy. However, it seems to work on XP and other systems.
Where's the 0x00000000 pointer coming from on 2008? Is it possible that
the OS overwrote the entry because it appears to be an address in Perl's
stack, so it's a potential security threat?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Dave Korn
2009-07-16 16:47:29 UTC
Post by Corinna Vinschen
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2xw 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0088400c
0x88400c: 0x00000000 0x00000001 <-- Huh?
This looks wrong, doesn't it? The question is now, how and why does
that happen?
Where's the 0x00000000 pointer coming from on 2008? Is it possible that
the OS overwrote the entry because it appears to be an address in Perl's
stack, so it's a potential security threat?
The addresses are in the wrong order; SEH registration records should
always nest in the same way as stack call frames, i.e. unwinding toward
ascending memory addresses, but the second record is at a lower address than
the first, so the chain has been mangled somehow. Perhaps that breaks an
integrity check in the kernel? Where actually is $esp at the time; is the
bogus one in an already-released frame below $esp?

You might want to try again with a watchpoint:

watch *(unsigned int*)0x88ce68

... and see how and where that head entry gets set up and whether it
subsequently gets overwritten somehow.
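
The ordering rule can be written down as a small check. A minimal sketch,
assuming a downward-growing stack so that each record's link must point to a
strictly higher address; struct seh_record, SEH_CHAIN_END and
seh_chain_is_ascending are illustrative names, not anything from Cygwin or
the Windows headers, and this is not the check the kernel itself performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of a registration record (link first, handler second).  */
struct seh_record
{
  struct seh_record *prev;
  void *handler;
};

/* An all-ones pointer terminates the chain.  */
#define SEH_CHAIN_END ((struct seh_record *) (uintptr_t) -1)

/* Return 1 if every link leads to a strictly higher address and the chain
   reaches SEH_CHAIN_END within MAX_DEPTH records, 0 otherwise.  */
int
seh_chain_is_ascending (const struct seh_record *head, int max_depth)
{
  const struct seh_record *r = head;
  if (r == NULL)
    return 0;
  for (int i = 0; i < max_depth && r != SEH_CHAIN_END; i++)
    {
      const struct seh_record *older = r->prev;
      /* The terminator is the highest possible address, so it passes.  */
      if (older == NULL || (uintptr_t) older <= (uintptr_t) r)
        return 0;
      r = older;
    }
  return r == SEH_CHAIN_END;
}
```

On the chain quoted above, the link from 0x88ce68 down to 0x88400c is the
step such a check would reject.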

cheers,
DaveK
Corinna Vinschen
2009-07-16 19:55:52 UTC
Post by Dave Korn
Post by Corinna Vinschen
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2xw 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0088400c
0x88400c: 0x00000000 0x00000001 <-- Huh?
This looks wrong, doesn't it? The question is now, how and why does
that happen?
Where's the 0x00000000 pointer coming from on 2008? Is it possible that
the OS overwrote the entry because it appears to be an address in Perl's
stack, so it's a potential security threat?
The addresses are in the wrong order; SEH registration records should
always nest in the same way as stack call frames, i.e. unwinding toward
ascending memory addresses, but the second record is at a lower address than
the first, so the chain has been mangled somehow. Perhaps that breaks an
integrity check in the kernel? Where actually is $esp at the time; is the
bogus one in an already-released frame below $esp?
Seems so. $esp is 0x88c8c0.
Post by Dave Korn
watch *(unsigned int*)0x88ce68
... and see how and where that head entry gets set up and whether it
subsequently gets overwritten somehow.
That was really helpful, Dave. Thank you!

Here's the result:

(gdb) br pthread_attr_init
Breakpoint 2 at 0x610f42dc: file /home/corinna/src/cygwin/vanilla/winsup/cygwin/thread.cc, line 1909.
(gdb) watch *(unsigned int*)0x88ce68
Hardware watchpoint 3: *(unsigned int *) 8965736
(gdb) c
Continuing.
Hardware watchpoint 3: *(unsigned int *) 8965736

Old value = 8978372
New value = 8929292
_cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
244 _except_list = &el;
Current language: auto; currently c++
(gdb) p/x 8978372
$1 = 0x88ffc4
(gdb) p/x 8929292
$2 = 0x88400c
(gdb) p $esp
$3 = (void *) 0x883e78
(gdb) bt
#0 _cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
#1 0x61033ff5 in dll_dllcrt0_1 (x=0x883edc)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:321
#2 0x6103414f in dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
#3 0x6eb77acf in ***@12 ()
from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So this exception handler is installed as part of the Perl threads DLL
initialization. But apparently the address is not valid anymore when
leaving the DLL initialization.

For testing I disabled the

_my_tls.init_exception_handler (_cygtls::handle_exceptions);

call in dll_init.cc:dll_dllcrt0_1() and re-ran the Perl testcase.
Now it runs fine:

$ perl ./perlthread.pl
Testing threads...
I'm a thread!
Testing done

Is it possible that we have to remove the exception handler before
dll_dllcrt0_1 returns?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2009-07-16 21:18:23 UTC
Post by Corinna Vinschen
Post by Dave Korn
Post by Corinna Vinschen
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2xw 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0088400c
0x88400c: 0x00000000 0x00000001 <-- Huh?
This looks wrong, doesn't it? The question is now, how and why does
that happen?
Where's the 0x00000000 pointer coming from on 2008? Is it possible that
the OS overwrote the entry because it appears to be an address in Perl's
stack, so it's a potential security threat?
The addresses are in the wrong order; SEH registration records should
always nest in the same way as stack call frames, i.e. unwinding toward
ascending memory addresses, but the second record is at a lower address than
the first, so the chain has been mangled somehow. Perhaps that breaks an
integrity check in the kernel? Where actually is $esp at the time; is the
bogus one in an already-released frame below $esp?
Seems so. $esp is 0x88c8c0.
Post by Dave Korn
watch *(unsigned int*)0x88ce68
... and see how and where that head entry gets set up and whether it
subsequently gets overwritten somehow.
That was really helpful, Dave. Thank you!
(gdb) br pthread_attr_init
Breakpoint 2 at 0x610f42dc: file /home/corinna/src/cygwin/vanilla/winsup/cygwin/thread.cc, line 1909.
(gdb) watch *(unsigned int*)0x88ce68
Hardware watchpoint 3: *(unsigned int *) 8965736
(gdb) c
Continuing.
Hardware watchpoint 3: *(unsigned int *) 8965736
Old value = 8978372
New value = 8929292
_cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
244 _except_list = &el;
Current language: auto; currently c++
(gdb) p/x 8978372
$1 = 0x88ffc4
(gdb) p/x 8929292
$2 = 0x88400c
(gdb) p $esp
$3 = (void *) 0x883e78
(gdb) bt
#0 _cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
#1 0x61033ff5 in dll_dllcrt0_1 (x=0x883edc)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:321
#2 0x6103414f in dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So this exception handler is installed as part of the Perl threads DLL
initialization. But apparently the address is not valid anymore when
leaving the DLL initialization.
For testing I disabled the
_my_tls.init_exception_handler (_cygtls::handle_exceptions);
call in dll_init.cc:dll_dllcrt0_1() and re-ran the Perl testcase.
$ perl ./perlthread.pl
Testing threads...
I'm a thread!
Testing done
Is it possible that we have to remove the exception handler before
dll_dllcrt0_1 returns?
Are you saying that perl is not cleaning up after itself here? If so, that sounds
like a perl bug.

cgf
Corinna Vinschen
2009-07-17 08:57:27 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Dave Korn
watch *(unsigned int*)0x88ce68
... and see how and where that head entry gets set up and whether it
subsequently gets overwritten somehow.
[...]
(gdb) bt
#0 _cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:244
#1 0x61033ff5 in dll_dllcrt0_1 (x=0x883edc)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:321
#2 0x6103414f in dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So this exception handler is installed as part of the Perl threads DLL
initialization. But apparently the address is not valid anymore when
leaving the DLL initialization.
For testing I disabled the
_my_tls.init_exception_handler (_cygtls::handle_exceptions);
call in dll_init.cc:dll_dllcrt0_1() and re-ran the Perl testcase.
$ perl ./perlthread.pl
Testing threads...
I'm a thread!
Testing done
Is it possible that we have to remove the exception handler before
dll_dllcrt0_1 returns?
Are you saying that perl is not cleaning up after itself here? If so,
that sounds like a perl bug.
I'm not saying that. Maybe it is a Perl bug, but it looks like a Cygwin
bug to me.

After having started Perl, at the start of main(), the SEH chain
looks entirely normal:

(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2x 0x0088ce68
0x88ce68: 0x0088ffc4 0x6103ce20
(gdb) x/2x 0x0088ffc4
0x88ffc4: 0x0088ffe4 0x77cc03dd
(gdb) x/2x 0x0088ffe4
0x88ffe4: 0xffffffff 0x77d16900

Note that the start of the SEH chain is already at the address which
gets changed erroneously in the later DLL initialization. It's our
_my_tls.el entry.

Now I set a breakpoint to the start of the dll_dllcrt0 function, which
is called when the DLL gets loaded:

(gdb) br "dll_init.cc:302"
Breakpoint 3 at 0x61034144: file /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc, line 302.
(gdb) c
Continuing.

Breakpoint 3, dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
302 dll_dllcrt0_1 (&x);
Current language: auto; currently c++
(gdb) bt
#0 dll_dllcrt0 (h=0x6eb70000, p=0x6eb79070)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:302
#1 0x6eb77acf in ***@12 ()
from /usr/lib/perl5/5.10/i686-cygwin/auto/threads/threads.dll
#2 0x77c897c0 in ntdll!RtlQueryInformationActiveActivationContext ()
from /cygdrive/c/Windows/system32/ntdll.dll

Ok, so the loaded DLL is the threads.dll lib. What does the SEH chain
look like now?

(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088400c
(gdb) x/2x 0x0088400c
0x88400c: 0x00884178 0x77cc03dd
(gdb) x/2x 0x00884178
0x884178: 0x0088ce68 0x77cc03dd
(gdb) x/2x 0x0088ce68
0x88ce68: 0x0088ffc4 0x6103ce20
(gdb) x/2x 0x0088ffc4
0x88ffc4: 0x0088ffe4 0x77cc03dd
(gdb) x/2x 0x0088ffe4
0x88ffe4: 0xffffffff 0x77d16900

As you can see, the OS has added two handlers to the chain. Now I step
to the code which is supposed to add the Cygwin exception handler:

(gdb) s
dll_dllcrt0_1 (x=0x883edc)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/dll_init.cc:311
311 HMODULE& h = ((dllcrt0_info *)x)->h;
(gdb) n
312 per_process*& p = ((dllcrt0_info *)x)->p;
(gdb)
313 int& res = ((dllcrt0_info *)x)->res;
(gdb)
321 _my_tls.init_exception_handler (_cygtls::handle_exceptions);
(gdb) s
_cygtls::init_exception_handler (this=0x88ce64,
eh=0x6103ce20 <_cygtls::handle_exceptions(_EXCEPTION_RECORD*, _exception_list*, _CONTEXT*, void*)>)
at /home/corinna/src/cygwin/vanilla/winsup/cygwin/cygtls.cc:231
231 el.handler = eh;

Ok, so _my_tls.el, the SEH chain entry, gets overwritten now with the
new entries. Where is el?

(gdb) p/x &el
$2 = 0x88ce68

Yes, that's still the same _my_tls.el. That's also the watch address and
it's now an entry in the middle of the current SEH chain.

(gdb) s
243 el.prev = _except_list;
(gdb)
244 _except_list = &el;
(gdb) p/x el
$3 = {prev = 0x88400c, handler = 0x6103ce20}

Now the new prev address points to an address lower than the current
address and...

(gdb) s
245 }
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68

Now the SEH entry address has been moved to the new address and the
SEH chain is invalid since it's a circular list:

(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2x 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20
(gdb) x/2x 0x0088400c
0x88400c: 0x00884178 0x77cc03dd
(gdb) x/2x 0x00884178
0x884178: 0x0088ce68 0x77cc03dd
(gdb) x/2x 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20

AFAICS, the problem is that _my_tls.el is not the active SEH handler at
this point, but it is already part of the chain.
_cygtls::init_exception_handler doesn't check for that and just
overwrites the entry in an invalid way.
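
To make the failure concrete, here is a minimal model of it in C. It is a
sketch under assumptions, not the Cygwin code: struct seh_record,
link_unguarded and seh_chain_has_cycle are illustrative names, the chain head
is an ordinary pointer instead of %fs:0, and link_unguarded only mirrors the
two assignments seen at cygtls.cc lines 243 and 244 in the session above.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of a registration record.  */
struct seh_record
{
  struct seh_record *prev;
  void *handler;
};

/* An all-ones pointer terminates the chain.  */
#define SEH_CHAIN_END ((struct seh_record *) (uintptr_t) -1)

/* Unconditional link step: point EL at the current head, then make EL
   the head, without checking whether EL is already on the chain.  */
void
link_unguarded (struct seh_record **head, struct seh_record *el, void *eh)
{
  el->handler = eh;
  el->prev = *head;
  *head = el;
}

/* Advance one record; NULL and the terminator both count as the end.  */
static const struct seh_record *
step (const struct seh_record *r)
{
  return (r == NULL || r == SEH_CHAIN_END) ? SEH_CHAIN_END : r->prev;
}

/* Return 1 if following the links from HEAD revisits a record
   (two-pointer cycle detection), 0 if the chain terminates.  */
int
seh_chain_has_cycle (const struct seh_record *head)
{
  const struct seh_record *slow = head, *fast = head;
  for (;;)
    {
      fast = step (step (fast));
      slow = step (slow);
      if (fast == SEH_CHAIN_END)
        return 0;
      if (fast == slow)
        return 1;
    }
}
```

Building the chain as two newer records in front of el and then calling
link_unguarded on el again reproduces the loop el -> newest -> next -> el
shown in the dump.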

If it's correct to set %fs:0 to our _my_tls.el address in this case,
thus just ignoring the OS handlers, then it seems incorrect in this
specific situation to change el.prev, because it already points to a
valid address. Actually, the entire el SEH entry is already set
correctly, just `_except_list = &el;' would have to be called to skip
the OS handlers.

If it's not correct to just skip the OS handlers, we would have to
invent a new SEH entry at a lower stack address, rather than reusing
the _my_tls entry, which is already in use.

Assuming that skipping the OS handlers is OK, I have applied this
(too?) simple patch to have some crude sanity check:

Index: cygtls.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/cygtls.cc,v
retrieving revision 1.67
diff -u -p -r1.67 cygtls.cc
--- cygtls.cc 7 Jul 2009 08:07:38 -0000 1.67
+++ cygtls.cc 17 Jul 2009 08:54:07 -0000
@@ -240,6 +240,7 @@ _cygtls::init_exception_handler (excepti
Windows 2008, which irremediably gets into an endless loop, taking 100%
CPU. That's why we reverted to a normal SEH chain and changed the way
the exception handler returns to the application. */
- el.prev = _except_list;
+ if (_except_list > el.prev)
+ el.prev = _except_list;
_except_list = &el;
}

With this patch, the Perl testcase works fine. I'm sure there's
a better way to implement a sanity check, though.
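
For comparison, the guarded link step from the patch can be modeled the same
way. Again a sketch under assumptions: struct seh_record, link_guarded and
seh_chain_length are illustrative names, the head is a plain pointer rather
than %fs:0, and the address comparison only makes sense if newer records sit
at lower addresses and a never-linked record starts with a NULL prev.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of a registration record.  */
struct seh_record
{
  struct seh_record *prev;
  void *handler;
};

/* An all-ones pointer terminates the chain.  */
#define SEH_CHAIN_END ((struct seh_record *) (uintptr_t) -1)

/* Guarded link step: only repoint EL->prev when the current head lies
   above the record EL already links to; then make EL the head.  */
void
link_guarded (struct seh_record **head, struct seh_record *el, void *eh)
{
  el->handler = eh;
  if ((uintptr_t) *head > (uintptr_t) el->prev)
    el->prev = *head;
  *head = el;
}

/* Count records up to the terminator; -1 if a NULL link is found or the
   chain does not end within MAX_DEPTH records.  */
int
seh_chain_length (const struct seh_record *head, int max_depth)
{
  int n = 0;
  for (const struct seh_record *r = head; r != SEH_CHAIN_END; r = r->prev)
    {
      if (r == NULL || n == max_depth)
        return -1;
      n++;
    }
  return n;
}
```

In this model, re-linking el while two lower-address records sit in front of
it leaves el.prev untouched and simply moves the head back to el, so those
two records drop off the chain instead of forming a loop.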


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Dave Korn
2009-07-17 13:38:45 UTC
Post by Corinna Vinschen
AFAICS, the problem is that _my_tls.el is not the active SEH handler at
this point, but it is already part of the chain,
_cygtls::init_exception_handler doesn't check for validity and just
overwrites the entry in an invalid way.
Argh. Yes, there's no way we can repeatedly enter the same registration
record into the chain. I see this has already been anticipated:

/* Windows apparently installs a bunch of exception handlers prior to
this function getting called and one of them may trip before cygwin
gets to it. So, install our own exception handler only.
FIXME: It is possible that we may have to save state of the
previous exception handler chain and restore it, if problems
are noted. */

Time to fix the fixme, I guess.
Post by Corinna Vinschen
If it's not correct to just skip the OS handlers, we would have to
invent a new SEH entry at a lower stack address, rather than reusing
the _my_tls entry, which is already in use.
Assuming that skipping the OS handlers is OK,
What if they are try....finally handlers rather than try....except? Bad
things might happen? It might be useful to know what those functions are
(lookup in windbg?), but then again it might be fragile if we devised a
solution that relied on that knowledge. I guess the answer is that if we only
want a last-chance exception handler we should just make
_my_tls.init_exception_handler() idempotent so it only installs one handler at
the start of the chain, but if we want to intercept exceptions ahead of those
OS handlers (as I think is the intent of the code here) then we need to set up
and tear down a new SEH record. The SEH chain has to be a strict stack, with
entries unlinked in the reverse order to when they're linked; we can't go
re-ordering it when there are foreign handlers in the mix.
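
As a rough illustration of the strict-stack discipline, assuming nothing
about how Cygwin would actually implement it: seh_push and seh_pop below are
hypothetical helpers over a model chain (struct seh_record and SEH_CHAIN_END
are likewise made up for the sketch), where a fresh record is linked at the
head and may only be unlinked while it is still the head.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of a registration record.  */
struct seh_record
{
  struct seh_record *prev;
  void *handler;
};

/* An all-ones pointer terminates the chain.  */
#define SEH_CHAIN_END ((struct seh_record *) (uintptr_t) -1)

/* Link a fresh record REC in front of the current head.  */
void
seh_push (struct seh_record **head, struct seh_record *rec, void *handler)
{
  rec->handler = handler;
  rec->prev = *head;
  *head = rec;
}

/* Unlink REC only if it is the current head.  Returns 1 on success and 0
   if something else was linked in front of it, in which case the chain
   is left untouched.  */
int
seh_pop (struct seh_record **head, struct seh_record *rec)
{
  if (*head != rec)
    return 0;
  *head = rec->prev;
  return 1;
}
```

A caller would push a record of its own on entry and pop that same record on
exit; a failed pop signals that a foreign handler is still registered ahead
of it and the chain should not be reordered.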


cheers,
DaveK
Corinna Vinschen
2009-07-17 13:41:14 UTC
Post by Dave Korn
Post by Corinna Vinschen
AFAICS, the problem is that _my_tls.el is not the active SEH handler at
this point, but it is already part of the chain,
_cygtls::init_exception_handler doesn't check for validity and just
overwrites the entry in an invalid way.
Argh. Yes, there's no way we can repeatedly enter the same registration
[...]
Post by Corinna Vinschen
If it's not correct to just skip the OS handlers, we would have to
invent a new SEH entry at a lower stack address, rather than reusing
the _my_tls entry, which is already in use.
Assuming that skipping the OS handlers is OK,
What if they are try....finally handlers rather than try....except? Bad
things might happen? It might be useful to know what those functions are
If you look again, you see that it's always the same address, always the
same default exception handler in ntdll.dll. I guess it might be more
correct to add another handler to the chain, rather than to strip the OS
handlers from the chain, but the fact that the OS SEH validity check was
happy with the chain and the testcase worked is kind of relaxing.
Post by Dave Korn
(lookup in windbg?), but then again it might be fragile if we devised a
solution that relied on that knowledge. I guess the answer is that if we only
want a last-chance exception handler we should just make
_my_tls.init_exception_handler() idempotent so it only installs one handler at
the start of the chain, but if we want to intercept exceptions ahead of those
OS handlers (as I think is the intent of the code here) then we need to set up
and tear down a new SEH record. The SEH chain has to be a strict stack, with
entries unlinked in the reverse order to when they're linked; we can't go
re-ordering it when there are foreign handlers in the mix.
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Dave Korn
2009-07-17 14:23:16 UTC
Post by Corinna Vinschen
Post by Dave Korn
Post by Corinna Vinschen
Assuming that skipping the OS handlers is OK,
What if they are try....finally handlers rather than try....except? Bad
things might happen? It might be useful to know what those functions are
If you look again, you see that it's always the same address, always the
same default exception handler in ntdll.dll. I guess it might be more
correct to add another handler to the chain, rather than to strip the OS
handlers from the chain, but the fact that the OS SEH validity check was
happy with the chain and the testcase worked is kind of relaxing.
I'm surprised. Whatever it was (presumably the part of the OS dynamic
loader that is responsible for invoking DllMain) that put those two OS
handlers at the front of the chain before we arrived in ***@12()
is liable to try and unlink them after we return, isn't it? Perhaps it has a
bit of robustness code so it doesn't unlink them if the head pointer doesn't
match what it's expecting. Hmm.
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
Nope, they don't, but that will probably not be the case forever, there are
(long-term) moves afoot to get SEH support into the compiler. However, we're
in early startup-and-init here; we don't need to worry about what the
application will do once it finally gets going.


cheers,
DaveK
Corinna Vinschen
2009-07-17 15:29:20 UTC
Post by Dave Korn
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
Nope, they don't, but that will probably not be the case forever, there are
(long-term) moves afoot to get SEH support into the compiler. However, we're
in early startup-and-init here; we don't need to worry about what the
application will do once it finally gets going.
Sorry, but AFAICS we are not in early startup-and-init. The threads.dll
library is a run-time loaded DLL via dlopen due to the

use threads;

statement in the script. This situation can occur at any point
during the runtime of an application.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2009-07-17 15:44:31 UTC
Post by Corinna Vinschen
Post by Dave Korn
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
Nope, they don't, but that will probably not be the case forever, there are
(long-term) moves afoot to get SEH support into the compiler. However, we're
in early startup-and-init here; we don't need to worry about what the
application will do once it finally gets going.
Sorry, but AFAICS we are not in early startup-and-init. The threads.dll
library is a run-time loaded DLL via dlopen due to the
use threads;
statement in the script. This situation can occur at any point
during the runtime of an application.
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers. We can't possibly know how every Cygwin app
does this. Obviously there's at least one app out there which has
decided that it needs to use Windows-specific methods to accomplish a
goal. I'm not exactly thrilled to see code which has decided to dig
deep into Windows internals. That's what Cygwin is supposed to prevent.

cgf
Eric Blake
2009-07-17 16:22:01 UTC
Post by Christopher Faylor
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers. We can't possibly know how every Cygwin app
does this. Obviously there's at least one app out there which has
decided that it needs to use Windows-specific methods to accomplish a
goal. I'm not exactly thrilled to see code which has decided to dig
deep into Windows internals. That's what Cygwin is supposed to prevent.
If cygwin were to provide a working sigaltstack, then libsigsegv would be able
to use that instead of digging into Windows internals (because that's the
interface that libsigsegv expects to be able to use on Linux). Until that
point, at least all clients of libsigsegv are messing with Windows handlers;
but at least libsigsegv is small and self-contained enough to track potential
portability problems if cygwin makes changes in this area. On the other hand,
even patching things to allow libsigsegv to use standardized interfaces won't
help apps like perl if they are doing SEH manipulations without the use of
libsigsegv.
--
Eric Blake
Christopher Faylor
2009-07-17 16:29:24 UTC
Post by Christopher Faylor
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers. We can't possibly know how every Cygwin
app does this. Obviously there's at least one app out there which has
decided that it needs to use Windows-specific methods to accomplish a
goal. I'm not exactly thrilled to see code which has decided to dig
deep into Windows internals. That's what Cygwin is supposed to prevent.
If cygwin were to provide a working sigaltstack,...
Yes. I got that the first time you said it. That has no effect on my
thrilledness.

cgf
Corinna Vinschen
2009-07-17 16:36:16 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Dave Korn
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
Nope, they don't, but that will probably not be the case forever, there are
(long-term) moves afoot to get SEH support into the compiler. However, we're
in early startup-and-init here; we don't need to worry about what the
application will do once it finally gets going.
Sorry, but AFAICS we are not in early startup-and-init. The threads.dll
library is a run-time loaded DLL via dlopen due to the
use threads;
statement in the script. This situation can occur at any point
during the runtime of an application.
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers. We can't possibly know how every Cygwin app
I didn't make a claim, I asked.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2009-07-17 18:22:32 UTC
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Dave Korn
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
Nope, they don't, but that will probably not be the case forever, there are
(long-term) moves afoot to get SEH support into the compiler. However, we're
in early startup-and-init here; we don't need to worry about what the
application will do once it finally gets going.
Sorry, but AFAICS we are not in early startup-and-init. The threads.dll
library is a run-time loaded DLL via dlopen due to the
use threads;
statement in the script. This situation can occur at any point
during the runtime of an application.
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers. We can't possibly know how every Cygwin app
I didn't make a claim, I asked.
You didn't make the claim. I was responding to "Nope, they don't"

cgf
Steven Hartland
2009-07-26 13:03:07 UTC
Permalink
Just wanted to confirm upgrading to 1.7.0-25 ( with the SEH fix )
fixes this issue under Windows 2008 R2.

Thanks again to those involved in identifying and fixing this issue.

Regards
Steve

Dave Korn
2009-07-17 17:56:26 UTC
Permalink
Post by Christopher Faylor
Post by Dave Korn
Nope, they don't, but that will probably not be the case forever,
Right, and I don't know how you could make the claim that Cygwin apps
don't install SEH handlers.
Nah, it was a generalisation, and in fact I did have another example I was
going to mention but forgot while writing the email: TCL in gdb.

cheers,
DaveK
Eric Blake
2009-07-17 14:15:04 UTC
Permalink
Post by Corinna Vinschen
Do we have to take other handlers than the OS handlers and the Cygwin
handlers into account? Cygwin apps don't install SEH handlers, do
they? Or do C++ apps?
I believe that libsigsegv does its magic by installing an SEH handler.
libsigsegv is a C library, used (among others) by the cygwin build of m4
1.4.13. I also know that one of the new features in the recently announced
beta version of gawk is the addition of support for using libsigsegv. At any
rate, these snippets from the libsigsegv source are somewhat telling:

/* In Cygwin programs, SetUnhandledExceptionFilter has no effect because Cygwin
installs a global exception handler. We have to dig deep in order to install
our main_exception_filter. */

/* Data structures for the current thread's exception handler chain.
On the x86 Windows uses register fs, offset 0 to point to the current
exception handler; Cygwin mucks with it, so we must do the same... :-/ */

/* Magic taken from winsup/cygwin/include/exceptions.h. */

/* Cygwin's original exception handler. */
static int (*cygwin_exception_handler) (EXCEPTION_RECORD *, void *, CONTEXT *,
                                        void *);

/* Our exception handler. */
static int
libsigsegv_exception_handler (EXCEPTION_RECORD *exception, void *frame,
                              CONTEXT *context, void *dispatch)
{
  EXCEPTION_POINTERS ExceptionInfo;
  ExceptionInfo.ExceptionRecord = exception;
  ExceptionInfo.ContextRecord = context;
  if (main_exception_filter (&ExceptionInfo) == EXCEPTION_CONTINUE_SEARCH)
    return cygwin_exception_handler (exception, frame, context, dispatch);
  else
    return 0;
}


[m4 wouldn't need to use libsigsegv if cygwin provided sigaltstack, but that's
an entirely different can of worms.]
--
Eric Blake
Dave Korn
2009-07-17 14:36:34 UTC
Permalink
Post by Eric Blake
static int (*cygwin_exception_handler) (EXCEPTION_RECORD *, void *, CONTEXT *,
                                        void *);
/* Our exception handler. */
static int
libsigsegv_exception_handler (EXCEPTION_RECORD *exception, void *frame,
                              CONTEXT *context, void *dispatch)
{
  EXCEPTION_POINTERS ExceptionInfo;
  ExceptionInfo.ExceptionRecord = exception;
  ExceptionInfo.ContextRecord = context;
  if (main_exception_filter (&ExceptionInfo) == EXCEPTION_CONTINUE_SEARCH)
    return cygwin_exception_handler (exception, frame, context, dispatch);
  else
    return 0;
}
That looks fairly robust to me, shouldn't give us any problems. Question
is, what does the code that hooks and unhooks the exception handler look like,
and where does it get called from?
Post by Eric Blake
[m4 wouldn't need to use libsigsegv if cygwin provided sigaltstack, but that's
an entirely different can of worms.]
Nuns! Nuns! Reverse! Reverse! Reverse!

cheers,
DaveK
Eric Blake
2009-07-17 14:52:07 UTC
Permalink
Post by Dave Korn
That looks fairly robust to me, shouldn't give us any problems. Question
is, what does the code that hooks and unhooks the exception handler look like,
and where does it get called from?
static void
do_install_main_exception_filter ()
{
  /* We cannot insert any handler into the chain, because such handlers
     must lie on the stack (?). Instead, we have to replace(!) Cygwin's
     global exception handler. */
  cygwin_exception_handler = _except_list->handler;
  _except_list->handler = libsigsegv_exception_handler;
}

static void
install_main_exception_filter ()
{
  static int main_exception_filter_installed = 0;

  if (!main_exception_filter_installed)
    {
      do_install_main_exception_filter ();
      main_exception_filter_installed = 1;
    }
}

It looks like it is installed, never uninstalled. And although the current
release of libsigsegv is a static-only library, Bruno is proud of the fact that
his libsigsegv package can be provided as a dynamic library even on cygwin (in
other words, the current cygwin maintainer of the libsigsegv package could
decide to pass the right configure options to make libsigsegv a .dll, at which
point a rebuild of m4 would then be subject to issues of a .dll playing with
the exception filter). Is there a chance that this represents a bug in
libsigsegv SEH handling that needs to be reported upstream?
--
Eric Blake
Reini Urban
2009-07-22 08:24:34 UTC
Permalink
Post by Eric Blake
  That looks fairly robust to me, shouldn't give us any problems.  Question
is, what does the code that hooks and unhooks the exception handler look like,
and where does it get called from?
static void
do_install_main_exception_filter ()
{
 /* We cannot insert any handler into the chain, because such handlers
    must lie on the stack (?).  Instead, we have to replace(!) Cygwin's
    global exception handler.  */
 cygwin_exception_handler = _except_list->handler;
 _except_list->handler = libsigsegv_exception_handler;
}
static void
install_main_exception_filter ()
{
 static int main_exception_filter_installed = 0;
 if (!main_exception_filter_installed)
   {
     do_install_main_exception_filter ();
     main_exception_filter_installed = 1;
   }
}
It looks like it is installed, never uninstalled.  And although the current
release of libsigsegv is a static-only library, Bruno is proud of the fact that
his libsigsegv package can be provided as a dynamic library even on cygwin (in
other words, the current cygwin maintainer of the libsigsegv package could
decide to pass the right configure options to make libsigsegv a .dll, at which
point a rebuild of m4 would then be subject to issues of a .dll playing with
the exception filter).
Good catch! I'll try the dll ASAP
(putting clisp-2.48, parrot-1.4.0 and postgresql-8.4.0 back in the pipeline)
Post by Eric Blake
Is there a chance that this represents a bug in
libsigsegv SEH handling that needs to be reported upstream?
I'll report that, if it turns out so.
--
Reini Urban
http://phpwiki.org/ http://murbreak.at/
Eric Blake
2009-07-22 11:26:59 UTC
Permalink
Post by Reini Urban
Post by Eric Blake
Is there a chance that this represents a bug in
libsigsegv SEH handling that needs to be reported upstream?
I'll report that, if it turns out so.
I've already mentioned it to Bruno, and am still working on a fix. I have
a simple testcase - on cygwin 1.5 or 1.7, calling open(NULL,O_RDONLY)
before installing the libsigsegv handler returns -1 with EFAULT, but
calling it after installing the handler kills the app with a spurious
claim of a sigsegv. But on Solaris, the same test case returns -1 with
EFAULT in both places.

Since SEH triggers for more reasons than SIGSEGV: the fix HAS to be that
the libsigsegv SEH handler inspects the faulting address, and if it is
stack overflow deals with it immediately (since cygwin has no sigaltstack
for libsigsegv to deal with it after SIGSEGV has been raised), but for
_all other addresses_, libsigsegv must let the address propagate onto the
cygwin SEH handler, and deal with all other faults only if they are
re-raised via a SIGSEGV handler (stack overflow is the only form of
SIGSEGV where an alternate stack is important; all other synchronous SEGV
can be dealt with in-place, and libsigsegv should not change behavior for
SEH faults that cygwin decides are not worthy of a SIGSEGV).

--
Don't work too hard, make some time for fun as well!

Eric Blake ***@byu.net
Eric Blake
2009-07-22 17:23:58 UTC
Permalink
Post by Eric Blake
Post by Reini Urban
Post by Eric Blake
Is there a chance that this represents a bug in
libsigsegv SEH handling that needs to be reported upstream?
I'll report that, if it turns out so.
I've already mentioned it to Bruno, and am still working on a fix.
FWIW, rebuilding m4 1.4.13 picks up the new libsigsegv0 dll, but it still
exhibits the crash when used under cygwin-1.7.0-51 (ie. the bug is not whether
libsigsegv was linked in static or dynamic, but that libsigsegv is over-eager
to claim that all SEH faults should be handled like SEGV faults). The
particular m4 crash is gone with cygwin1.dll built today (thanks to my recent
newlib fflush patch). Meanwhile, I will be packaging m4-1.4.13-2 to pick up
the new libsigsegv0 (so that when Bruno does fix libsigsegv, I don't have to
rebuild m4-1.4.13-3 to relink against the fix), and that will also include a
one-liner patch (currently in m4.git) that makes the particular m4 crash go
away even if you don't have a self-built cygwin1.dll.
--
Eric Blake
Dave Korn
2009-07-16 23:11:31 UTC
Permalink
Post by Corinna Vinschen
So this exception handler is installed as part of the Perl threads DLL
initialization. But apparently the address is not valid anymore when
leaving the DLL initialization.
Is it possible that we have to remove the exception handler before
dll_dllcrt0_1 returns?
I wouldn't think so. I would guess this is an utterly bogus bit of code in
Perl, the equivalent of returning the address-of a stack auto variable when
you return from the function in whose scope it's allocated, and needs tracking
down in the upstream sources. It should unlink its exception registration
before it returns.

cheers,
DaveK
Christopher Faylor
2009-07-16 16:49:05 UTC
Permalink
Post by Corinna Vinschen
Post by Dave Korn
(gdb) x/xw 0x7ffde000
0x7ffde000: 0x0022ce68
... on the stack, as you might expect, and walk the chain, first word of each
(gdb) x/2xw 0x0022ce68
0x22ce68: 0x0022ffe0 0x61028770
(gdb) x 0x61028770
0x61028770 <_ZN7_cygtls17handle_exceptionsEP17_EXCEPTION_RECORDP15_exception_listP8_CONTEXTPv>: 0x57e58955
(gdb) x/2xw 0x0022ffe0
0x22ffe0: 0xffffffff 0x7c4ff0b4
(gdb) x 0x7c4ff0b4
0x7c4ff0b4 <SetProcessPriorityBoost+86>: 0x83ec8b55
(gdb)
0xffffffff in the chain pointer means final entry, and 0x7c4ff0b4 is
somewhere in kernel32.dll, it's presumably the last resort fault handler. The
important point was we verified that the cygwin exception handler is first in
the chain, so we'd expect it to be called by the NULL dereference (set a
breakpoint there too, just in case something goes wrong shortly after it
enters) when we step into it. If there was something else first, we'd know
where to start looking, if not, we'd have to suspect the OS has decided not to
call the SEH chain at all for some reason.
Thanks again for your help. I had the funny idea to examine the
SEH chain before the myfault handler gets installed. That's what
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0028ce68
(gdb) x/2xw 0x0028ce68
0x28ce68: 0x0028ffc4 0x6103ce20 <-- Cygwin exception handler
tP8_CONTEXTPv>: 0x57e58955
(gdb) x/2xw 0x0028ffc4
0x28ffc4: 0x0028ffe4 0x77cc03dd <-- OS
(gdb) x/2xw 0x0028ffe4
0x28ffe4: 0xffffffff 0x77d16900 <-- OS
(gdb) x/xw 0x7efdd000
0x7efdd000: 0x0088ce68
(gdb) x/2xw 0x0088ce68
0x88ce68: 0x0088400c 0x6103ce20 <-- Cygwin exception handler
(gdb) x/2xw 0x0088400c
0x88400c: 0x00000000 0x00000001 <-- Huh?
This looks wrong, doesn't it? The question is now, how and why does
that happen?
I don't have the output in front of me but I saw something that had three things
in the chain. The first was ours, the second was an OS function which
seemed somehow thread related, the third looked bogus but not bogus like
the above.

cgf
Christopher Faylor
2009-07-15 16:23:17 UTC
Permalink
Post by Corinna Vinschen
----- Original Message ----- From: "Christopher Faylor"
<cgf-use...>
http://cygwin.com/acronyms/#PCYMTNQREAIYR
Post by Christopher Faylor
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation reading location 0x00000004.
No, sorry, it really doesn't help. The VC++ debugger doesn't know how
to handle cygwin exceptions.
Was just trying to get a hint of the area of the problem since gdb doesn't
actually break when it happens this seemed to be the only way to get that
info.
Any pointers on how I can help narrow down the issue?
I can reproduce the problem on my 2008 R2 box. It works fine on Windows
7 x64, though, so it's a Server thingy.
What happens is that this statement
if ((*object)->magic != magic)
in the function thread.cc:verifyable_object_isvalid throws an exception
because *object is NULL. This should be covered by the myfault handler
in this function but for some reason it isn't.
To debug this further I created a STC(TM)(*) which does the same as the
==== SNIP ====
#include <stdio.h>
#include <string.h>   /* strerror */
#include <pthread.h>

pthread_attr_t attr;

void *thr (void *arg)
{
  printf ("I'm a thread\n");
  return NULL;
}

int main()
{
  pthread_t t;
  int i, r;
  void *ret;

  fprintf (stderr, "Testing threads...\n");
  i = pthread_attr_init (&attr);
  printf ("i = %d\n", i);
  /* pthread_create returns the error number directly; it does not set errno. */
  r = pthread_create (&t, &attr, thr, NULL);
  if (r)
    fprintf (stderr, "pthread_create: %d %s\n", r, strerror (r));
  else
    pthread_join (t, &ret);
  fprintf (stderr, "Testing done\n");
  return 0;
}
==== SNAP ====
I can't try this right now myself but what about trying various settings
for a SIGSEGV signal handler?

cgf
Corinna Vinschen
2009-07-15 17:29:57 UTC
Permalink
Post by Christopher Faylor
Post by Corinna Vinschen
==== SNIP ====
#include <stdio.h>
#include <string.h>   /* strerror */
#include <pthread.h>

pthread_attr_t attr;

void *thr (void *arg)
{
  printf ("I'm a thread\n");
  return NULL;
}

int main()
{
  pthread_t t;
  int i, r;
  void *ret;

  fprintf (stderr, "Testing threads...\n");
  i = pthread_attr_init (&attr);
  printf ("i = %d\n", i);
  /* pthread_create returns the error number directly; it does not set errno. */
  r = pthread_create (&t, &attr, thr, NULL);
  if (r)
    fprintf (stderr, "pthread_create: %d %s\n", r, strerror (r));
  else
    pthread_join (t, &ret);
  fprintf (stderr, "Testing done\n");
  return 0;
}
==== SNAP ====
I can't try this right now myself but what about trying various settings
for a SIGSEGV signal handler?
No SIGSEGV setting has any visible effect. In the Perl testcase
_cygtls::handle_exceptions is just not called, in the C testcase
it's always called.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Reini Urban
2009-07-16 02:07:23 UTC
Permalink
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation
reading location 0x00000004.
According to gdb 0x610d089d = thread.cc:113
Thanks!

This almost certainly looks like a simple perl bug. Threads has been Jerry Hedden's
working area lately, but there are complicated things going on in core.
If it's easily reproducible, the best approach would be to start with a debugging perl
and break at the point which tries to read from 0x4.

BTW: I thought about adding -debug packages in general (to cygport) as
with fedora,
but got distracted somewhere.
--
Reini Urban
Reini Urban
2009-07-16 02:21:41 UTC
Permalink
Post by Reini Urban
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation
reading location 0x00000004.
According to gdb 0x610d089d = thread.cc:113
Thanks!
This almost certainly looks like a simple perl bug. Threads has been Jerry Hedden's
working area lately, but there are complicated things going on in core.
If it's easily reproducible, the best approach would be to start with a debugging perl
and break at the point which tries to read from 0x4.
Sorry, cannot reproduce either
with the following perls: 5.8.5 5.8.5d 5.8.6 5.8.8 5.10.0 5.10.0d 5.11.0d
under cygwin-1.5.25 and XP SP2
and neither under latest cygwin-1.7.0
Corinna Vinschen
2009-07-16 09:03:26 UTC
Permalink
Post by Reini Urban
Post by Reini Urban
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation
reading location 0x00000004.
According to gdb 0x610d089d = thread.cc:113
Thanks!
This almost certainly looks like a simple perl bug. Threads has been Jerry Hedden's
working area lately, but there are complicated things going on in core.
If it's easily reproducible, the best approach would be to start with a debugging perl
and break at the point which tries to read from 0x4.
Sorry, cannot reproduce either
with the following perls: 5.8.5 5.8.5d 5.8.6 5.8.8 5.10.0 5.10.0d 5.11.0d
under cygwin-1.5.25 and XP SP2
and neither under latest cygwin-1.7.0
It's a Windows Server 2008 thingy. MSFT added some code to the OS which
checks the validity of the SEH chain. The code is not compiled into
the client OSes, only into the Server OSes. We checked in a patch back
in early 2008 to deal with this mechanism. That's probably the reason
that the C testcase works fine. Apparently Perl adds something which
makes the SEH chain invalid again from 2008's point of view.

Btw., *everybody* can test on Windows Server 2008:
http://www.microsoft.com/downloads/details.aspx?familyid=B6E99D4C-A40E-4FD2-A0F7-32212B520F50


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Dave Korn
2009-07-16 09:42:14 UTC
Permalink
Post by Corinna Vinschen
http://www.microsoft.com/downloads/details.aspx?familyid=B6E99D4C-A40E-4FD2-A0F7-32212B520F50
Ooh, that's handy. Perhaps more relevant: the R2 candidate?

http://www.microsoft.com/downloads/details.aspx?familyid=A4E21E2E-E992-4AEC-9ED4-086DE21632A2

cheers,
DaveK
Corinna Vinschen
2009-07-16 09:42:42 UTC
Permalink
Post by Dave Korn
Post by Corinna Vinschen
http://www.microsoft.com/downloads/details.aspx?familyid=B6E99D4C-A40E-4FD2-A0F7-32212B520F50
Ooh, that's handy. Perhaps more relevant: the R2 candidate?
http://www.microsoft.com/downloads/details.aspx?familyid=A4E21E2E-E992-4AEC-9ED4-086DE21632A2
The problem occurs on any 2008 so it shouldn't matter. I like testing
on 32 bit systems first to take the 64 bit stuff out of the picture for
a start. R2 is only available as 64 bit system unfortunately.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2009-07-16 03:01:43 UTC
Permalink
Post by Reini Urban
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation
reading location 0x00000004.
According to gdb 0x610d089d = thread.cc:113
Thanks!
This almost certainly looks like a simple perl bug. Threads has been Jerry Hedden's
working area lately, but there are complicated things going on in core.
If it's easily reproducible, the best approach would be to start with a debugging perl
and break at the point which tries to read from 0x4.
I can reproduce it. It looks like perl (or more likely Windows) is adding
something extra to the SEH chain. I'm too tired to track it down any further
tonight though.

cgf
Corinna Vinschen
2009-07-16 16:13:30 UTC
Permalink
Post by Christopher Faylor
Post by Reini Urban
Post by Steven Hartland
Unhandled exception at 0x610d089d in perl.exe: 0xC0000005: Access violation
reading location 0x00000004.
According to gdb 0x610d089d = thread.cc:113
Thanks!
This almost certainly looks like a simple perl bug. Threads has been Jerry Hedden's
working area lately, but there are complicated things going on in core.
If it's easily reproducible, the best approach would be to start with a debugging perl
and break at the point which tries to read from 0x4.
I can reproduce it. It looks like perl (or more likely Windows) is adding
something extra to the SEH chain. I'm too tired to track it down any further
tonight though.
http://cygwin.com/ml/cygwin/2009-07/msg00584.html

Do you have an idea?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat