Discussion:
File operations really slow in emacs
Ryan Johnson
2012-02-11 01:18:24 UTC
Hi all,

For some reason file operations have become very slow inside emacs
starting yesterday. It's especially painful when saving a file that's
managed by mercurial (more than 20 seconds!), but I've seen it on the
command line as well (x-server takes a similar amount of time to start,
for example). I'm running the latest everything and I've run rebaseall.
I verified that Windows Defender did not silently re-enable itself since
I last disabled it (you can't actually uninstall it) and no other BLODA
are present on my machine. The problem persists across reboots.

I have vague memories that this has turned up in the past (maybe 12-15
months ago?) but Google isn't turning up anything. Attaching strace to
emacs during the save makes it take a full 35 seconds and reports the
following:

$ cat emacs.strace | awk '{if ($1 > 1000000) { print }}' | grep -v timer_thread
26910790 26912157 [main] emacs-X11 5188 child_copy: dll bss - hp 0x264 low 0x611FC000, high 0x61230770, res 1
1128419 2125655 [main] python2.6 5188 read: read(5, 0x8009DB60, 65536) blocking
25850184 32830582 [main] python2.6 5188 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)
2672291 36527941 [main] python2.6 5188 fhandler_disk_file::readdir: 0 = readdir(0x800C84F0, 0x284BDC) (L"bookmarks.pyc" > "bookmarks.pyc") (attr 0x20 > type 8)
1076207 64638764 [main] emacs 6568 fhandler_base_overlapped::wait_overlapped: wfres 0, wores 1, bytes 16

There's some seriously long latencies going on there... 27s for
child_copy, 26s for stat_worker, and 2.7s for readdir? FYI, that call to
read() that blocks for 1.1s is accessing a .pyc module in /usr/share,
not related to the file I actually tried to save.

Relevant package versions, courtesy of cygcheck:
cygwin 1.7.10-1 OK
emacs 23.4-1 OK
emacs-X11 23.4-1 OK
mercurial 1.9.3-1 OK
python 2.6.7-1 OK

Any ideas what I might try next?

Thanks,
Ryan
Corinna Vinschen
2012-02-11 10:11:58 UTC
Post by Ryan Johnson
[...]
25850184 32830582 [main] python2.6 5188 stat_worker: 0 =
(\??\C:\cygwin\cygdrive,0x28BB68)
^^^^^^^^^^^^^^^^^^^^^^^
This looks suspicious. I assume you're suffering from SMB network
scanning.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-02-11 14:21:11 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
[...]
25850184 32830582 [main] python2.6 5188 stat_worker: 0 =
(\??\C:\cygwin\cygdrive,0x28BB68)
^^^^^^^^^^^^^^^^^^^^^^^
This looks suspicious. I assume you're suffering from SMB network
scanning.
Hmm. I'm feeling both confused and enlightened now...

1. What about child_copy? (see below)

2. Running that same stat operation from the shell is equally painful:

$ time strace -mall -o stat.strace stat /cygdrive
File: `/cygdrive'
Size: 0 Blocks: 0 IO Block: 65536 directory
Device: 620000h/6422528d Inode: 2 Links: 4
Access: (0555/dr-xr-xr-x) Uid: ( 1000/ Ryan) Gid: ( 513/ None)
Access: 2012-02-11 09:17:12.000000000 -0500
Modify: 2012-02-11 09:17:12.000000000 -0500
Change: 2006-11-30 19:00:00.000000000 -0500
Birth: 2006-11-30 19:00:00.000000000 -0500

real 0m26.186s
user 0m0.030s
sys 0m0.015s

3. How might I diagnose what network activity could be the culprit? I
didn't think I was hosting or mounting any SMB shares... and certainly
not through cygwin (Q: below is my ThinkPad's recovery partition):

$ mount
C:/cygwin/bin on /usr/bin type ntfs (binary,auto)
C:/cygwin/lib on /usr/lib type ntfs (binary,auto)
C:/cygwin on / type ntfs (binary,auto)
C: on /cygdrive/c type ntfs (binary,posix=0,user,noumount,auto)
Q: on /cygdrive/q type ntfs (binary,posix=0,user,noumount,auto)
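
One way to narrow down which drive letter is stalling is to time a stat of each candidate path separately instead of timing `stat /cygdrive` as a whole. The following is a minimal sketch in Python 3, assuming the default /cygdrive prefix; the function name `slow_paths`, the one-second threshold, and the injectable `probe` and `clock` parameters are illustrative and not part of any Cygwin tool.

```python
import os
import time


def slow_paths(paths, threshold_s=1.0, probe=os.stat, clock=time.monotonic):
    """Return (path, seconds) for every path whose probe took >= threshold_s.

    probe and clock are injectable so the logic can be exercised without a
    real stalled drive; by default each path is stat()ed and timed.
    """
    slow = []
    for path in paths:
        start = clock()
        try:
            probe(path)
        except OSError:
            # A failing stat is still timed: a path may fail only after a
            # long wait, and that wait is what we are looking for.
            pass
        elapsed = clock() - start
        if elapsed >= threshold_s:
            slow.append((path, elapsed))
    return slow


if __name__ == "__main__":
    # Hypothetical scan over all drive letters under the default prefix.
    candidates = ["/cygdrive/%s" % c for c in "abcdefghijklmnopqrstuvwxyz"]
    for path, seconds in slow_paths(candidates):
        print("%-12s %.1fs" % (path, seconds))
```

Any path that shows up in the output is a candidate for closer inspection; a path that does not exist fails quickly and is not reported.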

Thoughts?
Ryan

=== More details about #1 ===

So, what about the 26.9s call to child_copy? I ran a few more times and
the two don't strike me as strongly correlated. It's more like the true
cause sometimes hits both together:

$ strace -mall -p $(pgrep emacs) | awk '{if ($1 > 1000000) { print }}' | grep -v timer_thread
[-- open hg-managed file -- ]
25799065 25801126 [main] emacs-X11 8016 child_copy: dll bss - hp 0x288 low 0x611FC000, high 0x61230770, res 1
32048267 32049470 [main] emacs-X11 7584 child_copy: dll bss - hp 0x278 low 0x611FC000, high 0x61230770, res 1
1127799 1841167 [main] python2.6 7584 read: read(5, 0x8009DB60, 65536) blocking
1272387 38372453 [main] emacs 7284 fhandler_base_overlapped::wait_overlapped: wfres 0, wores 1, bytes 3
[-- save --]
50382655 50383904 [main] emacs-X11 1248 child_copy: dll bss - hp 0x290 low 0x611FC000, high 0x61230770, res 1
1095874 56238299 [main] emacs 7284 fhandler_base_overlapped::wait_overlapped: wfres 0, wores 1, bytes 16
[-- save --]
87436648 87439221 [main] emacs-X11 7668 child_copy: dll bss - hp 0x278 low 0x611FC000, high 0x61230770, res 1
26064678 31598419 [main] python2.6 7668 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)
1575028 168999100 [main] emacs 7284 select_stuff::wait: woke up. wait_ret 2. verifying
[-- save --]
170053330 170056051 [main] emacs-X11 7000 child_copy: dll bss - hp 0x22C low 0x611FC000, high 0x61230770, res 1
1065439 1965054 [main] python2.6 7000 read: read(5, 0x8009DB60, 65536) blocking
25893986 30974179 [main] python2.6 7000 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)


Also, here's a run that traces all mentions of C:\cygwin\cygdrive (this
time I created a new file in an hg-managed dir):

$ strace -mall -p $(pgrep emacs) | awk '/C:.cygwin.cygdrive/ { print } {if ($1 > 1000000) { print }}' | grep -v timer_thread
41 4893387 [main] emacs 7652 mount_info::conv_to_win32_path: src_path /cygdrive, dst C:\cygwin\cygdrive, flags 0x3000A, rc 0
36 4910555 [main] emacs 7652 mount_info::conv_to_win32_path: src_path /cygdrive, dst C:\cygwin\cygdrive, flags 0x3000A, rc 0
34 4957910 [main] emacs 7652 mount_info::conv_to_win32_path: src_path /cygdrive, dst C:\cygwin\cygdrive, flags 0x3000A, rc 0
9686534 9689074 [main] emacs-X11 5736 child_copy: dll bss - hp 0x27C low 0x611FC000, high 0x61230770, res 1
1063066 1765712 [main] python2.6 5736 read: read(5, 0x8009DB60, 65536) blocking
33 5904760 [main] python2.6 5736 mount_info::conv_to_win32_path: src_path /cygdrive, dst C:\cygwin\cygdrive, flags 0x3000A, rc 0
37 5904838 [main] python2.6 5736 stat_worker: (\??\C:\cygwin\cygdrive, 0x28BB68, 0x612666A4), file_attributes 17
25766283 31671159 [main] python2.6 5736 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)
25766283 31671159 [main] python2.6 5736 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)
1368187 42267858 [main] emacs 7652 fhandler_base_overlapped::wait_overlapped: wfres 0, wores 1, bytes 36
42558979 42560468 [main] emacs-X11 7988 child_copy: dll bss - hp 0x26C low 0x611FC000, high 0x61230770, res 1
1071523 1569246 [main] python2.6 7988 read: read(5, 0x8009DB60, 65536) blocking
40 5738526 [main] python2.6 7988 mount_info::conv_to_win32_path: src_path /cygdrive, dst C:\cygwin\cygdrive, flags 0x3000A, rc 0
41 5738609 [main] python2.6 7988 stat_worker: (\??\C:\cygwin\cygdrive, 0x28BB68, 0x612666A4), file_attributes 17
455 5739102 [main] python2.6 7988 stat_worker: 0 = (\??\C:\cygwin\cygdrive,0x28BB68)

So... why would python/hg feel a need to look at /cygdrive at all, and
why does looking at it take such a long time?
Christopher Faylor
2012-02-11 17:24:16 UTC
Post by Ryan Johnson
[...]
So, what about the 26.9s call to child_copy?
[...]
I don't see anything in the above which indicates a 26.9s call to
child_copy. The delta number you see in front of an strace line doesn't
mean "this is how long this operation took". It means "this is how long
it's been since the previous line in the file."

cgf
Ryan Johnson
2012-02-11 19:50:31 UTC
Post by Christopher Faylor
[...]
I don't see anything in the above which indicates a 26.9s call to
child_copy. The delta number you see in front of an strace line doesn't
mean "this is how long this operation took". It means "this is how long
it's been since the previous line in the file."
So for situations where cpu usage is ~0% and the latency is measured in
seconds, is it reasonable to infer that the preceding line with the same
pid might be the culprit?

Ryan
Christopher Faylor
2012-02-11 21:01:40 UTC
Post by Ryan Johnson
[...]
So for situations where cpu usage is ~0% and the latency is measured in
seconds, is it reasonable to infer that the preceding line with the same
pid might be the culprit?
It completely depends on what the previous line was. There could be a long
period of frantic activity which was never straced so you can't necessarily
infer anything.
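
The reading of the delta column described above can be sketched as a small Python 3 filter. It pairs every line whose delta exceeds a threshold with the last traced line from the same pid, which is only a hint about where the time went, since untraced activity may sit in the gap. The line layout in `LINE_RE` is an assumption inferred from the excerpts in this thread, and `slow_gaps` is a hypothetical helper name, not part of strace.

```python
import re

# Assumed layout of a Cygwin strace line, inferred from the excerpts in this
# thread: "<delta_us> <elapsed_us> [thread] <program> <pid> <message>".
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+\[([^\]]+)\]\s+(\S+)\s+(\d+)\s+(.*)$")


def slow_gaps(lines, min_delta_us=1000000):
    """Pair each line whose delta is >= min_delta_us with its predecessor.

    Returns a list of (delta_us, previous_line, line). previous_line is the
    last parsed line from the same pid, or None if there was none. The delta
    is the time since the previous trace line, not the duration of the call
    named on the line, so the predecessor is only a hint.
    """
    last_by_pid = {}
    gaps = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match is None:
            continue  # wrapped continuation lines, markers, blank lines
        delta_us = int(match.group(1))
        pid = match.group(5)
        if delta_us >= min_delta_us:
            gaps.append((delta_us, last_by_pid.get(pid), line))
        last_by_pid[pid] = line
    return gaps
```

Feeding it an open trace file, for example `slow_gaps(open("emacs.strace"))`, yields the gaps together with the line that preceded each one. Unlike the awk filter used earlier in the thread, the preceding line is kept even when its own delta is small.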

cgf
Ryan Johnson
2012-02-13 13:31:30 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
[...]
25850184 32830582 [main] python2.6 5188 stat_worker: 0 =
(\??\C:\cygwin\cygdrive,0x28BB68)
^^^^^^^^^^^^^^^^^^^^^^^
This looks suspicious. I assume you're suffering from SMB network
scanning.
Turns out you were right after all. I have Z: mapped to a SMB share
that's only visible when I'm connected to the VPN it lives on; I hadn't
used it in a few months and it didn't show up in Explorer, but it was on
the list of drives returned by GetLogicalDriveStrings(). Connecting the
drive makes everything run at normal speed, and disconnecting it makes
the problem return a short time later.
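
For anyone who wants to check their own drive list, here is a rough sketch in Python 3 that asks Windows for the same list via GetLogicalDriveStringsW and keeps the entries that GetDriveTypeW classifies as network drives. It assumes a native Windows Python (under Cygwin's Python, or anywhere else, it simply returns an empty list), and the helper names `split_drive_strings` and `remote_drive_roots` are made up for illustration. Whether GetDriveTypeW itself avoids the timeout for a disconnected share is not established in this thread.

```python
import ctypes
import sys

DRIVE_REMOTE = 4  # GetDriveType() return value for a network drive


def split_drive_strings(buffer_text):
    """Split the NUL-separated, double-NUL-terminated drive list."""
    return [root for root in buffer_text.split("\0") if root]


def remote_drive_roots():
    """Return roots such as 'Z:\\' that Windows reports as network drives."""
    if sys.platform != "win32":
        return []  # only meaningful under a native Windows Python
    kernel32 = ctypes.windll.kernel32
    # First call with an empty buffer returns the required length.
    needed = kernel32.GetLogicalDriveStringsW(0, None)
    buf = ctypes.create_unicode_buffer(needed + 1)
    written = kernel32.GetLogicalDriveStringsW(needed + 1, buf)
    roots = split_drive_strings(buf[:written])
    return [r for r in roots if kernel32.GetDriveTypeW(r) == DRIVE_REMOTE]


if __name__ == "__main__":
    for root in remote_drive_roots():
        print(root)
```

A mapped drive that does not show up in Explorer but is printed here would be a candidate for the kind of stale mapping described above.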

Oddly, I've never observed the effect when running from an elevated
prompt -- I can reliably fire off 'stat /cygdrive' in a normal prompt,
open an elevated mintty, run the same command there, get the results,
and close the window, all well before the first stat completes.

So, three questions:
- why is the elevated prompt unaffected?
- why does hg feel a need to access /cygdrive?
- is there a workaround? Neither "always run elevated" nor "always keep
all network drives mounted" seems like a reasonable requirement, but
that elevated prompt is looking mighty nice right about now. I suppose I
could also drop the for loop from fhandler_cygdrive::fstat and set
st_nlink to either 3 or 2+ndrives (on the assumption that the change
won't affect anyone's decision to unlink /cygdrive and that the number
is otherwise meaningless).

Thoughts?
Ryan
Corinna Vinschen
2012-02-13 14:58:57 UTC
Post by Ryan Johnson
[...]
- why is the elevated prompt unaffected?
Because the elevated token is not connected to the drives of the
non-elevated token by default. There's a registry key that allows
changing that, but off the top of my head I don't know it. Search MSDN;
this question has come up ever since the first Vista release candidates.


Corinna
Ryan Johnson
2012-02-14 13:37:51 UTC
Bump?
Post by Ryan Johnson
[...]
is there a workaround? Neither "always run elevated" nor "always keep
all network drives mounted" seems like a reasonable requirement
Corinna Vinschen
2012-02-14 13:52:46 UTC
Bump?
Stagger!
Post by Ryan Johnson
[...]
is there a workaround? Neither "always run elevated" nor "always
keep all network drives mounted" seems like a reasonable
requirement
What are you expecting? Was my reply in
http://cygwin.com/ml/cygwin/2012-02/msg00375.html not sufficient?


Corinna
Ryan Johnson
2012-02-14 14:44:39 UTC
Post by Corinna Vinschen
Bump?
Stagger!
Post by Ryan Johnson
[...]
is there a workaround? Neither "always run elevated" nor "always
keep all network drives mounted" seems like a reasonable
requirement
What are you expecting? Was my reply in
http://cygwin.com/ml/cygwin/2012-02/msg00375.html not sufficient?
The reply explains why running elevated avoids the problem -- apparently
a side-effect of Windows' user token handling.

It does not explain why it's a good idea to always run elevated to get a
side effect that compensates for bad behavior which is arguably a bug
(though that's what I'm doing right now for lack of a better option -- I
often work off-grid, so I can't always have all network drives mapped).

AFAICT, `stat /cygdrive` runs into trouble because it enumerates all
drive letters using GetFileAttributes, and only counts local drives as
"links" to the "directory" : 2 + ndrives - nfloppies - nnonlocal. This
relies on the fact (a side effect?) that GetFileAttributes returns
ERROR_BAD_NETPATH for network shares (but apparently only after timing
out an attempt to connect disconnected ones). Not sure what happens for
USB drives (are they "floppies" ?). Is there no other way to enumerate
the local drives, and even if there isn't, does anybody actually care
about that particular link count? AFAIK, directory link counts only
matter when you want to run fsck (which cygwin doesn't have) or delete a
directory. Even if cygwin's rm pays attention to link counts, which I
doubt, anyone issuing `rm -rf /cygdrive` has far bigger problems on
their hands.

Ryan
Corinna Vinschen
2012-02-14 15:17:45 UTC
Post by Ryan Johnson
[...]
AFAICT, `stat /cygdrive` runs into trouble because it enumerates all
drive letters using GetFileAttributes, and only counts local drives
as "links" to the "directory" : 2 + ndrives - nfloppies - nnonlocal.
That's only for stat and, yes, that can be removed and the link
set to 1, as for disk-based directories.

But that's not all. GetFileAttributes is called in readdir as well, and
if it works, the subsequent code tries to open the drive and fetch the
inode number. The inode number is important because otherwise find(1)
and other tools might print confused warnings.

So, even if we fix fstat, it doesn't solve the problem for readdir. The
GetFileAttributes call is obviously supposed to find out if the drive is
accessible. If not, it's omitted from the cygdrive dir. Unfortunately...

Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?


Corinna
Ryan Johnson
2012-02-14 15:47:04 UTC
Post by Corinna Vinschen
[...]
But that's not all. GetFileAttributes is called in readdir as well, and
if it works, the subsequent code tries to open the drive and fetch the
inode number. The inode number is important because otherwise find(1)
and other tools might print confused warnings.
So, even if we fix fstat, it doesn't solve the problem for readdir. The
GetFileAttributes call is obviously supposed to find out if the drive is
accessible. If not, it's omitted from the cygdrive dir. Unfortunately...
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
I was also thinking about this readdir vs. stat thing after my last
post... I've never noticed `ls /cygdrive` being a problem. This is why I
thought it was emacs at first, and why I didn't notice z: at first.
Strangely, bash auto-completion for `/cygdrive/^I` sometimes is fast and
sometimes is slow.

I was going to suggest doing in fhandler_cygdrive::fstat whatever
fhandler_cygdrive::readdir does, but source diving confirms that the two
functions do essentially the same thing (huh???). Even more strangely,
none of my open terminals exhibits the problem right this minute, even
though some of them have been open this whole time. There must be some
external factor that makes Windows sometimes try to connect those drives
and sometimes not.

What if we parsed the mount table instead of calling readdir? I don't
know how that's computed, but it's never been a performance problem, it
only shows drives that are actually connected, and everything in
/cygdrive should be mounted (if not, fhandler_cygdrive::readdir and stat
are both broken).

Thoughts?
Ryan
Corinna Vinschen
2012-02-14 16:26:56 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
[...]
What if we parsed the mount table instead of calling readdir? I
don't know how that's computed, but it's never been a performance
problem, it only shows drives that are actually connected [...]
What mount table? Cygwin's? It calls GetFileAttributes on the drive's
root dir as well...


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-02-14 17:46:41 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
[...]
What if we parsed the mount table instead of calling readdir? I
don't know how that's computed, but it's never been a performance
problem, it only shows drives that are actually connected [...]
What mount table? Cygwin's? It calls GetFileAttributes on the drive's
root dir as well...
This is bizarre... what would cause calls to the same Windows API
function to behave so differently when called by stat vs ls vs
bash-autocomplete? I'm happy to accept that there's some weirdness on my
box, but I would have expected that weirdness to be consistent at any
given instant in time (either all go slow or all behave normally).

At least the problem has gone back into hiding for the moment... I guess
I'll just have to hope Windows doesn't change its mind again for a while.

Thanks for looking into this.
Ryan
Corinna Vinschen
2012-02-14 17:57:35 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
[...]
What if we parsed the mount table instead of calling readdir? I
don't know how that's computed, but it's never been a performance
problem, it only shows drives that are actually connected [...]
What mount table? Cygwin's? It calls GetFileAttributes on the drive's
root dir as well...
This is bizarre... what would cause calls to the same Windows API
function to behave so differently when called by stat vs ls vs
bash-autocomplete? I'm happy to accept that there's some weirdness
on my box, but I would have expected that weirdness to be consistent
at any given instant in time (either all go slow or all behave
normally).
SMB just is not consistent. More often than not the timing behaviour is
just plain puzzling. And, btw., in *my* testing I got hangs in mount
as well if I disabled the remote share. But only once. Subsequent
calls were fast. And after enabling the remote share, mount happily
ignored that fact for about a minute or so. Caching, anybody?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-02-14 19:50:04 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
[...]
What if we parsed the mount table instead of calling readdir? I
don't know how that's computed, but it's never been a performance
problem, it only shows drives that are actually connected [...]
What mount table? Cygwin's? It calls GetFileAttributes on the drive's
root dir as well...
This is bizarre... what would cause calls to the same Windows API
function to behave so differently when called by stat vs ls vs
bash-autocomplete? I'm happy to accept that there's some weirdness
on my box, but I would have expected that weirdness to be consistent
at any given instant in time (either all go slow or all behave
normally).
SMB just is not consistent. More often than not the timing behaviour is
just plain puzzling. And, btw., in *my* testing I got hangs in mount
as well if I disabled the remote share. But only once. Subsequent
calls were fast. And after enabling the remote share, mount happily
ignored that fact for about a minute or so. Caching, anybody?
Heisenberg? Impossible to know both what SMB share a mount points to and
whether it's currently connected?

It's really unfortunate Windows doesn't have a GetLocalDrives() or
GetAccessibleDriveLetters() function. Actually, isn't there a function
to convert DOS paths to those funky //?/ paths? Maybe that would be both
fast and give enough information to keep stat() happy; readdir() would
still be out of luck tho.

Ryan
Corinna Vinschen
2012-02-15 09:24:39 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
[...]
What if we parsed the mount table instead of calling readdir? I
don't know how that's computed, but it's never been a performance
problem, it only shows drives that are actually connected [...]
What mount table? Cygwin's? It calls GetFileAttributes on the drive's
root dir as well...
This is bizarre... what would cause calls to the same Windows API
function to behave so differently when called by stat vs ls vs
bash-autocomplete? I'm happy to accept that there's some weirdness
on my box, but I would have expected that weirdness to be consistent
at any given instant in time (either all go slow or all behave
normally).
SMB just is not consistent. More often than not the timing behaviour is
just plain puzzling. And, btw., in *my* testing I got hangs in mount
as well if I disabled the remote share. But only once. Subsequent
calls were fast. And after enabling the remote share, mount happily
ignored that fact for about a minute or so. Caching, anybody?
Heisenberg? Impossible to know both what SMB share a mount points to
and whether it's currently connected?
It's really unfortunate Windows doesn't have a GetLocalDrives() or
GetAccessibleDriveLetters() function. Actually, isn't there a
function to convert DOS paths to those funky //?/ paths? Maybe that
\\?\ is nothing but a Win32 path prefix which tells the kernel32
routines to omit the step to convert to native NT paths. The problem is
that the conversion buffers have a fixed size of MAX_PATH characters,
so Win32 paths without the prefix are restricted to 259 chars. So
in fact, there's no difference between the paths other than to omit
a conversion step. Apart from that the paths are equivalent:

standard Win32      C:\dir\file       \\server\share\file
"long-path" Win32   \\?\C:\dir\file   \\?\UNC\server\share\file
native NT           \??\C:\dir\file   \??\UNC\server\share\file
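
The mapping between the three path forms is purely textual, which can be sketched as below. This is a minimal illustration in Python, not Cygwin code: the function names `to_long_path` and `to_native_nt` are made up for this example, and it only handles absolute drive-letter and UNC paths.

```python
def to_long_path(win32_path: str) -> str:
    """Return the "long-path" Win32 form of a standard absolute Win32 path."""
    if win32_path.startswith("\\\\?\\"):
        return win32_path                       # already carries the prefix
    if win32_path.startswith("\\\\"):
        # UNC path: \\server\share\file -> \\?\UNC\server\share\file
        return "\\\\?\\UNC\\" + win32_path[2:]
    # Drive path: C:\dir\file -> \\?\C:\dir\file
    return "\\\\?\\" + win32_path


def to_native_nt(win32_path: str) -> str:
    """Return the native NT form: the long-path form with \\??\\ as prefix."""
    return "\\??\\" + to_long_path(win32_path)[4:]


if __name__ == "__main__":
    for p in (r"C:\dir\file", r"\\server\share\file"):
        print(p, "|", to_long_path(p), "|", to_native_nt(p))
```

Since only the prefix changes, all three spellings name the same object, so none of them can be expected to behave differently against an unreachable share.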
Post by Ryan Johnson
would be both fast and give enough information to keep stat() happy;
Not at all. It's all the same file and the underlying NT functions
will do the same in all cases.

But I already changed cygdrive::fstat yesterday to set st_nlink to 1
without calling GetFileAttributes in a loop.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-02-16 11:09:44 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
Heisenberg? Impossible to know both what SMB share a mount points to
and whether it's currently connected?
Almost. You have to access the share to find out if it is really
connected because the information returned from NetUseGetInfo that the
drive is disconnected could be outdated. This is deep within the way
SMB works. Either you wait up to, I'm not quite sure, 60 minutes I
think, or you just plunge into the drive and try to access it and
potentially suffer network timeouts.
Post by Corinna Vinschen
Post by Ryan Johnson
It's really unfortunate Windows doesn't have a GetLocalDrives() or
GetAccessibleDriveLetters() function. Actually, isn't there a
function to convert DOS paths to those funky //?/ paths? Maybe that
\\?\ is nothing but a Win32 path prefix which tells the kernel32
routines to omit the step to convert to native NT paths. The problem is
that the conversion buffers have a fixed size of MAX_PATH characters,
so Win32 paths without the prefix are restricted to 259 chars. So
in fact, there's no difference between the paths other than to omit
a conversion step. Apart from that the paths are equivalent:
standard Win32      C:\dir\file       \\server\share\file
"long-path" Win32   \\?\C:\dir\file   \\?\UNC\server\share\file
native NT           \??\C:\dir\file   \??\UNC\server\share\file
Post by Ryan Johnson
would be both fast and give enough information to keep stat() happy;
Not at all. It's all the same file and the underlying NT functions
will do the same in all cases.
But I already changed cygdrive::fstat yesterday to set st_nlink to 1
without calling GetFileAttributes in a loop.
I just applied a patch which calls NetUseGetInfo on SMB drives in
the cygdrive::readdir call. As I mentioned above, if the function
returns OK, we fetch the inode number. If the function returns
"Disconnected", we just omit the drive from the cygdrive directory.
If the drive is available again, it might not be noticed by the
NetUseGetInfo function for a long while. But as soon as you access
the drive successfully, the info will be updated in the OS, and the
NetUseGetInfo function will happily return OK again. This new
behaviour is not a swiss army knife since that's impossible with
SMB. But it might be better suited than the former code. I'm
just going to create a new snapshot. Please test.
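
The readdir behaviour described above can be sketched roughly as follows. This is a simplified Python model, not the actual patch: `cached_status` stands in for the NetUseGetInfo lookup, `fetch_inode` for opening the drive and reading its inode number, and the accessibility check for local drives is left out. All names here are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

OK, DISCONNECTED = "OK", "Disconnected"


def cygdrive_entries(
    drives: Iterable[Tuple[str, bool]],      # (drive letter, is it an SMB drive?)
    cached_status: Callable[[str], str],     # stand-in for NetUseGetInfo
    fetch_inode: Callable[[str], int],       # opens the drive; may hit the network
) -> Iterator[Tuple[str, int]]:
    """Yield (letter, inode) for each drive to show in the cygdrive directory."""
    for letter, is_smb in drives:
        if is_smb and cached_status(letter) == DISCONNECTED:
            # Reported as disconnected: omit it without touching the network.
            continue
        yield letter, fetch_inode(letter)


# Demo with fake data: z: is an SMB drive the OS believes is disconnected.
status = {"y": OK, "z": DISCONNECTED}
touched = []


def fake_inode(letter: str) -> int:
    touched.append(letter)                   # record which drives were opened
    return 1000 + ord(letter)


entries = list(
    cygdrive_entries([("c", False), ("y", True), ("z", True)],
                     status.__getitem__, fake_inode)
)
print(entries, touched)
```

The trade-off is visible in the demo: the disconnected drive is never opened, so there is no timeout, but it also stays hidden until the cached status changes.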


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-02-16 13:17:35 UTC
Hi Corinna,
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Ryan Johnson
Heisenberg? Impossible to know both what SMB share a mount points to
and whether it's currently connected?
Almost. You have to access the share to find out if it is really
connected because the information returned from NetUseGetInfo that the
drive is disconnected could be outdated. This is deep within the way
SMB works. Either you wait up to, I'm not quite sure, 60 minutes I
think, or you just plunge into the drive and try to access it and
potentially suffer network timeouts.
Post by Corinna Vinschen
Post by Ryan Johnson
It's really unfortunate Windows doesn't have a GetLocalDrives() or
GetAccessibleDriveLetters() function. Actually, isn't there a
function to convert DOS paths to those funky //?/ paths? Maybe that
\\?\ is nothing but a Win32 path prefix which tells the kernel32
routines to omit the step to convert to native NT paths. The problem is
that the conversion buffers have a fixed size of MAX_PATH characters,
so Win32 paths without the prefix are restricted to 259 chars. So
in fact, there's no difference between the paths other than to omit
a conversion step. Apart from that the paths are equivalent:
standard Win32      C:\dir\file       \\server\share\file
"long-path" Win32   \\?\C:\dir\file   \\?\UNC\server\share\file
native NT           \??\C:\dir\file   \??\UNC\server\share\file
Post by Ryan Johnson
would be both fast and give enough information to keep stat() happy;
Not at all. It's all the same file and the underlying NT functions
will do the same in all cases.
But I already changed cygdrive::fstat yesterday to set st_nlink to 1
without calling GetFileAttributes in a loop.
I just applied a patch which calls NetUseGetInfo on SMB drives in
the cygdrive::readdir call. As I mentioned above, if the function
returns OK, we fetch the inode number. If the function returns
"Disconnected", we just omit the drive from the cygdrive directory.
If the drive is available again, it might not be noticed by the
NetUseGetInfo function for a long while. But as soon as you access
the drive successfully, the info will be updated in the OS, and the
NetUseGetInfo function will happily return OK again. This new
behaviour is not a swiss army knife since that's impossible with
SMB. But it might be better suited than the former code. I'm
just going to create a new snapshot. Please test.
That sounds like a reasonable approach (how do you figure out which
drive letters are network drives before calling NetUseGetInfo, btw? That
would allow stat /cygdrive to return proper link counts).

Unfortunately, the fingerprint reader on my machine is buggy and
sometimes crashes winlogon... machine technically still running but
utterly inaccessible; I've not been able to repro since rebooting.

I'd really like to know what caused the slowdowns before... I kind of
doubt the fingerprint reader was at fault.

BTW, this latency problem has been observed before [1]. There's no real
solution, but one reader suggested using a second thread to call
CancelSynchronousIo if you lose patience before the call returns. From
the docs on MSDN[2], though, there's a long list of pretty icky caveats
that may limit its usefulness in practice. Others [3] have suggested
that calling FindFirstFile first eliminates the latency, though I have
to wonder if that would actually be helpful.

[1] http://stackoverflow.com/questions/1142080/how-to-avoid-network-stalls-in-getfileattributes
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/aa363789%28v=vs.85%29.aspx
[3] http://embarcadero.newsgroups.archived.at/public.delphi.nativeapi/201004/1004223088.html

Ryan
Corinna Vinschen
2012-02-16 13:44:14 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
I just applied a patch which calls NetUseGetInfo on SMB drives in
the cygdrive::readdir call. As I mentioned above, if the function
returns OK, we fetch the inode number. If the function returns
"Disconnected", we just omit the drive from the cygdrive directory.
If the drive is available again, it might not be noticed by the
NetUseGetInfo function for a long while. But as soon as you access
the drive successfully, the info will be updated in the OS, and the
NetUseGetInfo function will happily return OK again. This new
behaviour is not a swiss army knife since that's impossible with
SMB. But it might be better suited than the former code. I'm
just going to create a new snapshot. Please test.
That sounds like a reasonable approach (how do you figure out which
drive letters are network drives before calling NetUseGetInfo, btw?
http://sourceware.org/cgi-bin/cvsweb.cgi/src/winsup/cygwin/mount.cc.diff?cvsroot=src&r1=1.86&r2=1.87
Post by Ryan Johnson
That would allow stat /cygdrive to return proper link counts).
Nah, that's not needed. All GNU tools are fine with directories with
link count of 1. That's what we get from NTFS drives anyway, so why
bother?
Post by Ryan Johnson
BTW, this latency problem has been observed before [1]. There's no
real solution, but one reader suggested using a second thread to
call CancelSynchronousIo if you lose patience before the call
Not an option. CancelSynchronousIo is only available starting with
Vista. Running the read in a thread and killing the thread if a signal
arrives is probably better. That's how the network scanning code
for // is implemented.
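
The general idea of pushing the blocking call into a worker thread can be sketched like this. It is a Python illustration under loud assumptions, not how Cygwin does it: the sketch gives up after a timeout rather than on a signal, it abandons the worker instead of killing it (Python cannot kill threads), and `call_with_timeout` and the probe functions are invented for the example.

```python
import threading
import time


def call_with_timeout(func, timeout, default=None):
    """Run func() in a daemon thread; return default if it is not done in time."""
    result = {}

    def worker():
        result["value"] = func()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)                 # wait at most `timeout` seconds
    if t.is_alive():
        return default              # give up; the worker is simply abandoned
    return result.get("value", default)


def fast_probe():
    return True                     # stands in for a drive that answers at once


def slow_probe():
    time.sleep(0.5)                 # stands in for a stalled network access
    return True


if __name__ == "__main__":
    print(call_with_timeout(fast_probe, 1.0, default=False))
    print(call_with_timeout(slow_probe, 0.05, default=False))
```

The caller stays responsive either way, at the cost of a stray thread that lingers until the underlying call finally returns.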
Post by Ryan Johnson
Others [3] have suggested that calling FindFirstFile first
eliminates the latency, though I have to wonder if that would
actually be helpful.
Not to get the inode number of the share directory.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-02-14 17:18:15 UTC
Post by Corinna Vinschen
So, even if we fix fstat, it doesn't solve the problem for readdir. The
GetFileAttributes call is obviously supposed to find out if the drive is
accessible. If not, it's omitted from the cygdrive dir. Unfortunately...
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
I just looked into this and I really don't see a way. While there's a
NetUseGetInfo call, which is pretty fast even for unavailable drives,
it's not reliable. Even if the drive is available again, it can take
minutes in which it still returns a status of "Session lost". I'm not
sure this is what we want.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-02-14 17:26:57 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
So, even if we fix fstat, it doesn't solve the problem for readdir. The
GetFileAttributes call is obviously supposed to find out if the drive is
accessible. If not, it's omitted from the cygdrive dir. Unfortunately...
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
I just looked into this and I really don't see a way. While there's a
NetUseGetInfo call, which is pretty fast even for unavailable drives,
it's not reliable. Even if the drive is available again, it can take
minutes in which it still returns a status of "Session lost". I'm not
sure this is what we want.
...and the call doesn't work for NFS drives. Too bad.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Mark Geisert
2012-02-15 04:35:58 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
I just looked into this and I really don't see a way. While there's a
NetUseGetInfo call, which is pretty fast even for unavailable drives,
it's not reliable. Even if the drive is available again, it can take
minutes in which it still returns a status of "Session lost". I'm not
sure this is what we want.
...and the call doesn't work for NFS drives. Too bad.
Does WNetGetConnection() do any better? It's referenced on the NetUseGetInfo()
page in MSDN. Claims to support other providers besides SMB.

Apart from that, is "net use" the mount table Ryan was referring to? Can we
tell what it's doing to identify connected and disconnected drives?

..mark
Corinna Vinschen
2012-02-15 09:33:52 UTC
Post by Mark Geisert
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Corinna Vinschen
Does anybody know a system call which allows to fetch the network drive
state (connected/not connected) without a billion microsecond timeout?
I just looked into this and I really don't see a way. While there's a
NetUseGetInfo call, which is pretty fast even for unavailable drives,
it's not reliable. Even if the drive is available again, it can take
minutes in which it still returns a status of "Session lost". I'm not
sure this is what we want.
...and the call doesn't work for NFS drives. Too bad.
Does WNetGetConnection() do any better? It's referenced on the NetUseGetInfo()
page in MSDN. Claims to support other providers besides SMB.
Yes, that's right. Alas, BTDT. The function returns success even
if the share becomes unavailable.
Post by Mark Geisert
Apart from that, is "net use" the mount table Ryan was referring to? Can we
tell what it's doing to identify connected and disconnected drives?
Given its import table, it uses all functions available. I see at
least these:

NetUseEnum
NetUseGetInfo
WNetCloseEnum
WNetEnumResourceW
WNetOpenEnumW
WNetGetConnectionW
WNetGetLastErrorW
WNetCancelConnection2W
WNetAddConnection2W

But "net use" is not quite accurate either. If you switch off a
remote share it recognizes the disconnection pretty fast, but if
the share becomes available again, it stays in the disconnected state
for quite some time, just like the NetUseGetInfo function.

But, on second thought, maybe that's ok for us. It would at least
help for SMB drives. I'll look into that again at one point.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat