Discussion:
Can't access files with names longer than 127 UTF-8 symbols
Nikolay Ilychev
2013-12-10 07:15:48 UTC
Hello!

When using Cygwin, I can't list, copy, or remove files and directories
whose names are 128 or more UTF-8 symbols long.

Some artificial examples that illustrate the problem:

It works with Latin symbols:

$ a="$(perl -e 'print "x"x255')"; touch "$a" && { ls "$a"; rm "$a"; }
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

$ a="$(perl -e 'print "x"x256')"; touch "$a" && { ls "$a"; rm "$a"; }
touch: cannot touch
`xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx':
File name too long

But with Cyrillic, the maximum length is just 127 symbols:

$ a="$(perl -e 'print "\xd0\xaf"x127')"; touch "$a" && { ls "$a"; rm "$a"; }
ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ

$ a="$(perl -e 'print "\xd0\xaf"x128')"; touch "$a" && { ls "$a"; rm "$a"; }
touch: cannot touch
`ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ':
File name too long


But users can create files with names of 250+ Cyrillic symbols using
cmd.exe, powershell.exe, or explorer.exe:

$ a="$(perl -e 'print "\xd0\xaf"x251')"; cmd /C "echo > $a" && ls -l
ls: cannot access
ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ-:
No such file or directory
total 0
-????????? ? ? ? ? ?
ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ?

And with Cygwin I have no access at all to these files or directories:

$ rm *; ls -l
rm: cannot remove
`ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ\320':
No such file or directory
ls: cannot access
ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ-:
No such file or directory
total 0
-????????? ? ? ? ? ?
ЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯЯ?

The same problem occurs with other tools from the Cygwin repo: find, perl, rsync.

Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.

Thanks.
Andrey Repin
2013-12-10 09:38:09 UTC
Greetings, Nikolay Ilychev!
Post by Nikolay Ilychev
Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.
You can't just "make MAX_PATH"; it is an operating system (i.e. Windows, not
Cygwin) constant.


--
WBR,
Andrey Repin (***@yandex.ru) 10.12.2013, <13:37>

Sorry for my terrible english...
Corinna Vinschen
2013-12-10 10:27:55 UTC
Post by Nikolay Ilychev
Hello!
When using cygwin, i can't list, copy, remove files and directories
with 128 utf-8 symbol long names.
[...]
same problem with other tools - find, perl, rsync from cygwin repo.
Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.
Easier said than done.

First of all, this is NOT about MAX_PATH. MAX_PATH (260 chars) is the
number of characters allowed in the Win32 ANSI file API for a complete
path, including the terminating null. Cygwin is using the native NT API
and, occasionally, the Win32 UNICODE file API, which allows paths of up
to 32767 chars.

The problem here is about NAME_MAX. NAME_MAX is per POSIX[1] the
"maximum number of bytes in a filename (not including the terminating
null)."

Note the word *bytes*. Not characters, bytes. UTF-8 chars are 1 to 4
bytes in length. Thus, the maximum number of UTF-8 chars in a filename
is potentially less than NAME_MAX:

A filename of chars only from the basic latin charset (1 byte in UTF-8)
may consist of NAME_MAX characters, a filename solely constructed from
chars of the latin-1 supplement (2 byte chars) may consist of NAME_MAX /
2 characters, a filename constructed from emoticons (4 byte chars) only
of NAME_MAX / 4 chars.
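
As a rough sketch of the arithmetic above (not part of the original mail), the helper below counts the UTF-8 bytes of a single character with `wc -c` and divides the NAME_MAX of 255 used in this thread by that count. The function name `max_chars` and the sample characters are illustrative assumptions.

```shell
# Sketch only: NAME_MAX=255 is the value stated in this thread for Cygwin.
name_max=255

# max_chars CHAR: how many copies of CHAR fit into NAME_MAX bytes.
# (Hypothetical helper; wc -c counts bytes, not characters.)
max_chars() {
    bytes=$(printf '%s' "$1" | wc -c)
    echo $(( name_max / bytes ))
}

max_chars 'x'                              # basic Latin, 1 byte  -> 255
max_chars "$(printf '\320\257')"           # Cyrillic Я, 2 bytes  -> 127
max_chars "$(printf '\360\237\230\200')"   # an emoticon, 4 bytes -> 63
```

The 127 agrees with the report at the top of the thread: 127 Cyrillic symbols work, while 128 (256 bytes) fail with "File name too long".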

Ok, so we all know that Windows is not using a byte representation of
filenames, rather the OS uses UTF-16 to store and handle filenames
internally. Filenames on Windows filesystems may consist of up to 255
UTF-16 chars[2].

How do you represent this in a byte-oriented POSIX system? What do you
set NAME_MAX to? You can't get it right due to the unfortunate multibyte
vs. UTF-16 encoding issue.

To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then,
applications relying on NAME_MAX will be surprised by ENAMETOOLONG
errors for perfectly valid POSIX filenames.

If you make it 255, applications will be surprised by ENAMETOOLONG
errors for perfectly valid Windows filenames.

If you make it 255 on the application level but then return filenames
longer than 255 multibyte chars to the application, they will crash
due to buffer overflow issues. After all, NAME_MAX is a contractual
obligation.

There was also the backward compatibility issue. Back in the pre-Cygwin
1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
Changing that to a bigger value might have resulted in the
aforementioned application crashes due to buffer overflows as well.

So we decided to keep NAME_MAX at the same value as it always was, 255.
This restricts the actual filename length when using multibyte
characters just as on any other POSIX system with the downside that,
occasionally, a Windows filename will be too long to handle.

Sorry if that is frustrating in your current situation, but this
isn't something we can just change at a whim and go ahead. It would
break compatibility with all existing Cygwin executables.


Corinna


[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
[2] However, this does *not* cover NFS or other filesystems using a
byte representation for storing filenames.
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Noel Grandin
2013-12-10 14:48:12 UTC
Post by Corinna Vinschen
Sorry if that is frustrating in your current situation, but this
isn't something we can just change at a whim and go ahead. It
would break compatibility with all existing Cygwin executables.
Maybe this is something that could be fixed only in the 64-bit version of Cygwin?

That would limit the compatibility damage.
Corinna Vinschen
2013-12-10 15:32:59 UTC
Post by Noel Grandin
Post by Corinna Vinschen
Sorry if that is frustrating in your current situation, but this
isn't something we can just change at a whim and go ahead. It
would break compatibility with all existing Cygwin executables.
Maybe this is something that could be fixed only in the 64-bit version of Cygwin?
Did you really read my mail? There is no fix. You can handle this
wrongly one way or the other. If in doubt, I prefer the POSIXly correct
way.


Corinna
Christopher Faylor
2013-12-10 16:29:15 UTC
Post by Corinna Vinschen
Post by Noel Grandin
Sorry if that is frustrating in your current situation, but this isn't
something we can just change at a whim and go ahead. It would break
compatibility with all existing Cygwin executables.
Maybe this is something that could be fixed only in the 64-bit version of Cygwin?
Did you really read my mail? There is no fix. You can handle this
wrongly one way or the other. If in doubt, I prefer the POSIXly
correct way.
Also, even if it was something that could be "fixed", since the 64-bit
version of Cygwin has been out for some time now, breaking backwards
compatibility would still be bad.
Andrey Repin
2013-12-11 09:15:25 UTC
Greetings, Corinna Vinschen!
Post by Corinna Vinschen
Post by Noel Grandin
Post by Corinna Vinschen
Sorry if that is frustrating in your current situation, but this
isn't something we can just change at a whim and go ahead. It
would break compatibility with all existing Cygwin executables.
Maybe this is something that could be fixed only in the 64-bit version of Cygwin?
Did you really read my mail? There is no fix. You can handle this
wrongly one way or the other. If in doubt, I prefer the POSIXly correct
way.
After an off-list discussion, Nikolay partially solved this issue by
setting a locale-appropriate single-byte encoding in LANG.
In this case:

LANG=ru_RU.CP1251

It is far from a perfect solution, but at least it lets him access the
files in question.
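
A rough illustration of why a single-byte locale can help (a sketch, assuming `iconv` is installed and recognizes the charset name CP1251): the same 251-symbol Cyrillic name takes 502 bytes in UTF-8 but only 251 bytes in CP1251, which fits under the NAME_MAX of 255 discussed in this thread. The helper name `byte_len` is hypothetical.

```shell
# Build a name of 251 Cyrillic "Я" symbols as UTF-8 bytes (octal \320\257).
ya=$(printf '\320\257')
name=''
i=0
while [ "$i" -lt 251 ]; do
    name="$name$ya"
    i=$(( i + 1 ))
done

# byte_len CHARSET: byte length of $name after converting UTF-8 -> CHARSET.
# (Hypothetical helper; assumes iconv knows the charset name.)
byte_len() {
    n=$(printf '%s' "$name" | iconv -f UTF-8 -t "$1" | wc -c)
    echo $(( n ))
}

byte_len UTF-8    # 502 bytes: over a NAME_MAX of 255
byte_len CP1251   # 251 bytes: fits
```

This only helps for names whose characters all exist in the chosen single-byte charset, which is presumably part of why it is called a partial solution.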


--
WBR,
Andrey Repin (***@yandex.ru) 11.12.2013, <13:12>

Sorry for my terrible english...
Andrey Repin
2013-12-11 07:04:39 UTC
Greetings, Corinna Vinschen!
Post by Corinna Vinschen
The problem here is about NAME_MAX. NAME_MAX is per POSIX[1] the
"maximum number of bytes in a filename (not including the terminating
null)."
Does this mean that POSIX standard is not compatible with real life?
No surprise I had a hard time copying a rather simple directory
structure to a UNIX server: just 2 levels deep, with 4-5 words in each
element name.
--
WBR,
Andrey Repin (***@yandex.ru) 11.12.2013, <10:55>

Sorry for my terrible english...
Corinna Vinschen
2013-12-11 11:24:01 UTC
Post by Andrey Repin
Greetings, Corinna Vinschen!
Post by Corinna Vinschen
The problem here is about NAME_MAX. NAME_MAX is per POSIX[1] the
"maximum number of bytes in a filename (not including the terminating
null)."
Does this mean that POSIX standard is not compatible with real life?
Are you asking nonsensical questions for fun or did you not read my mail
closely, either? I made the effort to reply to the OP with a detailed mail
explaining the issue. I don't understand what this snide reaction is
supposed to accomplish. POSIX and Windows are not naturally compatible.
Cygwin tries hard to bridge the gap, but sometimes the gap is really
wide.

But thanks anyway for providing a solution to the problem by setting the
locale environment variables. That might make a good FAQ entry, *iff*
somebody has the incentive to write one.


Corinna
Mikhail Usenko
2013-12-11 13:49:22 UTC
On Tue, 10 Dec 2013 11:27:55 +0100
Post by Corinna Vinschen
Easier said than done.
Cygwin is using the native NT API
and, occasionally, the Win32 UNICODE file API, which allows paths of up
to 32767 chars.
...
How do you represent this in a byte-oriented POSIX system? What do you
set NAME_MAX to? You can't get it right due to the unfortunate multibyte
vs. UTF-16 encoding issue.
To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then,
applications relying on NAME_MAX will be surprised by ENAMETOOLONG
errors for perfectly valid POSIX filenames.
If you make it 255, applications will be surprised by ENAMETOOLONG
errors for perfectly valid Windows filenames.
Strictly speaking, the POSIX NAME_MAX and PATH_MAX limits must be
32767*4 bytes, that is ~128K, on Windows systems. With such a value, no
Cygwin application running on Windows would ever hit the ENAMETOOLONG
error, because no actual filenames of this length exist (and hence no
POSIX filenames either). Did I understand correctly?

--
Corinna Vinschen
2013-12-11 14:09:07 UTC
Post by Mikhail Usenko
On Tue, 10 Dec 2013 11:27:55 +0100
Post by Corinna Vinschen
Easier said than done.
Cygwin is using the native NT API
and, occasionally, the Win32 UNICODE file API, which allows paths of up
to 32767 chars.
...
How do you represent this in a byte-oriented POSIX system? What do you
set NAME_MAX to? You can't get it right due to the unfortunate multibyte
vs. UTF-16 encoding issue.
To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then,
applications relying on NAME_MAX will be surprised by ENAMETOOLONG
errors for perfectly valid POSIX filenames.
If you make it 255, applications will be surprised by ENAMETOOLONG
errors for perfectly valid Windows filenames.
Strictly speaking, the NAME_MAX and PATH_MAX POSIX' limits must be
32767*4 bytes, that is ~128K on Windows systems. With such a value no
Strictly speaking you're wrong. NAME_MAX is the length of a single
path component, not the length of a path:

   NAME_MAX
     vvv
/foo/bar/baz\0
^^^^^^^^^^^^^^
   PATH_MAX

Also, PATH_MAX is NOT the maximum length of a path, but the

"Maximum number of bytes the implementation will store as a pathname
in a user-supplied buffer of unspecified size, including the
terminating null character."

That does not mean that longer paths are impossible, just that you
have to use, for instance, relative paths rather than absolute paths if
the absolute path becomes longer than PATH_MAX, and that the system
does not guarantee to return paths if they are longer than PATH_MAX.
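
To look at the two limits on a given system, `getconf` can report them for a path, and the byte length of each path component can be checked against NAME_MAX separately from the total path length. A minimal sketch follows. The helper name `longest_component` is hypothetical, and the values printed by `getconf` depend on the system and filesystem.

```shell
# Query the limits for the filesystem holding "/" (values vary by system).
getconf NAME_MAX /
getconf PATH_MAX /

# longest_component PATH: byte length of the longest "/"-separated component.
# LC_ALL=C makes awk's length() count bytes rather than characters.
longest_component() {
    printf '%s\n' "$1" | tr '/' '\n' |
        LC_ALL=C awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }'
}

path=/foo/bar/baz
longest_component "$path"                        # 3, compare against NAME_MAX
echo $(( $(printf '%s' "$path" | wc -c) + 1 ))   # 13 bytes incl. the null, compare against PATH_MAX
```

A path can therefore fail on NAME_MAX because of one overlong component even when its total length is far below PATH_MAX.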


Corinna
Mikhail Usenko
2013-12-11 15:02:52 UTC
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on a Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
What causes the ENAMETOOLONG error? In most POSIX functions,
ENAMETOOLONG is returned if the length of a pathname component is
longer than {NAME_MAX} or the length of a pathname exceeds {PATH_MAX}.
On NTFS there are no files with a pathname component longer than 1020
bytes, and the length of the full pathname is limited by the Unicode
API (32767 chars * 4 bytes = 128 KiB).
--
Corinna Vinschen
2013-12-11 15:23:39 UTC
Post by Mikhail Usenko
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on the Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
Read my mail again. NAME_MAX is 255.


Corinna
Mikhail Usenko
2013-12-11 15:27:54 UTC
On Wed, 11 Dec 2013 16:23:39 +0100
Post by Corinna Vinschen
Post by Mikhail Usenko
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on the Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
Read my mail again. NAME_MAX is 255.
Corinna
Corinna, why not 1020?


--
Corinna Vinschen
2013-12-11 16:21:37 UTC
Post by Mikhail Usenko
On Wed, 11 Dec 2013 16:23:39 +0100
Post by Corinna Vinschen
Post by Mikhail Usenko
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on the Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
Read my mail again. NAME_MAX is 255.
Corinna
Corinna, why not 1020?
That's answered in my original mail.


Corinna
Christopher Faylor
2013-12-11 16:30:32 UTC
Post by Corinna Vinschen
Post by Mikhail Usenko
On Wed, 11 Dec 2013 16:23:39 +0100
Post by Corinna Vinschen
Post by Mikhail Usenko
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on the Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
Read my mail again. NAME_MAX is 255.
Corinna
Corinna, why not 1020?
That's answered in my original mail.
Perhaps this will require reiteration and reclarification on Thursday,
feline-permitting.

YMMV!

cgf
Corinna Vinschen
2013-12-11 17:01:03 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Mikhail Usenko
On Wed, 11 Dec 2013 16:23:39 +0100
Post by Corinna Vinschen
Post by Mikhail Usenko
I couldn't figure out how a POSIX filename passed to a Cygwin
application running on the Windows system may become longer than
NAME_MAX=1020 bytes if the maximum filename length in NTFS is 255
UTF-16 symbols (i.e. 1020 bytes for the biggest 4 byte UTF-8 code
unit)?
Read my mail again. NAME_MAX is 255.
Corinna
Corinna, why not 1020?
That's answered in my original mail.
Perhaps this will require reiteration and reclarification on Thursday,
feline-permitting.
And it's not even my WJM week. Can we move that to Thursday next week?


Corinna
Christopher Faylor
2013-12-11 17:49:22 UTC
Post by Corinna Vinschen
Post by Christopher Faylor
Perhaps this will require reiteration and reclarification on Thursday,
feline-permitting.
And it's not even my WJM week. Can we move that to Thursday next week?
Sorry, no. I can't allow that. But, then, it's my week.

cgf
