Limited regex support in newlib cripples syntax highlighting in nano
Tomi Belan
2008-08-01 08:03:24 UTC
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary). This causes a usability bug with the nano editor, which
uses these sequences extensively in most of its syntax highlighting
rules. For example, in /usr/share/nano/c.nanorc, one of the rules goes
like this:
color brightyellow "\<(for|if|while|do|else|case|default|switch)\>"
\< and \> are used so that other words, e.g. 'fifo', 'elsewhere',
'undo' don't match, as they aren't keywords.
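As a rough illustration of why the rule needs word anchors, here is a minimal Python sketch. It is a stand-in, not nano's own matcher: Python's re module has no \< or \>, so \b is assumed on both sides, which should be equivalent here because every keyword starts and ends with a word character. The names KEYWORDS, bounded, unbounded and hits are made up for this example.

```python
import re

# Keyword list copied from the c.nanorc rule quoted above.
KEYWORDS = r"(for|if|while|do|else|case|default|switch)"

# Assumption: \b stands in for \< and \>, which Python's re does not have.
bounded = re.compile(r"\b" + KEYWORDS + r"\b")
unbounded = re.compile(KEYWORDS)


def hits(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    words = "fifo elsewhere undo"
    print(hits(unbounded, words))  # keywords found inside other words
    print(hits(bounded, words))    # nothing, since none is a whole word
    print(hits(bounded, "if (x) do_it(); else return;"))
```

Without the anchors, the pattern picks up "if", "else" and "do" inside the three non-keywords; with them, only whole words are matched.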

Test case 1:
Open nano and write a test message, for example "if a fifo is full".
Do a regexp search (using ^W M-R) for "\<if\>". No matches will be
found, because \< and \> don't have their special meaning and simply
match the characters < and >.
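The same symptom can be reproduced outside nano. The sketch below uses Python's re module purely as an analogy: it also has no \< or \> and treats them as escaped literal characters, which mirrors the behaviour described in this test case. It says nothing about how Cygwin's regex is implemented internally, and the helper name found is hypothetical.

```python
import re

# Assumption: Python's re is used only as an analogy for an engine
# that lacks the \< and \> word anchors.
literal_pattern = re.compile(r"\<if\>")  # parsed as the literal text "<if>"
boundary_pattern = re.compile(r"\bif\b")  # a real word-boundary match


def found(pattern, text):
    """True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    message = "if a fifo is full"
    print(found(literal_pattern, message))    # no "<if>" in the message
    print(found(literal_pattern, "a <if> b"))  # matches the literal brackets
    print(boundary_pattern.search(message).span())
```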

Test case 2:
To include C syntax rules, run:
echo 'include "/usr/share/nano/c.nanorc"' >> ~/.nanorc
Open a C source file in nano. Enable syntax highlighting with M-Y.
Strings and preprocessor instructions are highlighted (because those
rules don't contain \< \>), but keywords (e.g. if, for, return)
aren't.

I tested both nano 2.0.6 (currently in cygwin) and 2.0.7 (latest
stable version, compiled manually from source). The problem wasn't
present in Linux glibc nano 2.0.7. When I found out that Cygwin uses
newlib instead of glibc, it led me to believe that insufficient regex
support in newlib might be the cause.

I attached the output of cygcheck -s -r -v, but I doubt this is a
problem with my system configuration.

Thank you for your time reading this.
Tomi Belan
Corinna Vinschen
2008-08-07 13:45:11 UTC
Post by Tomi Belan
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary).
Cygwin's regex implements POSIX regular expressions as described in,
for instance,
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html

What you're looking for are perl regular expressions and they are
only available if nano is built against the perl regex library which
apparently it isn't.
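
One possible workaround that stays within plain POSIX constructs, sketched here as an assumption and not taken from nano or Cygwin, is to spell the boundaries out with explicit character classes. Python's re is used only to make the sketch runnable; in a POSIX ERE the class could be written [^[:alnum:]_]. The names KEYWORDS, PORTABLE and keyword_spans are hypothetical.

```python
import re

KEYWORDS = "for|if|while|do|else|case|default|switch"

# Emulated word boundaries: line start or a non-word character before the
# keyword, a non-word character or line end after it.
PORTABLE = re.compile(
    r"(^|[^A-Za-z0-9_])(" + KEYWORDS + r")([^A-Za-z0-9_]|$)"
)


def keyword_spans(line):
    """Return (start, end) offsets of whole-word keywords in line."""
    spans = []
    pos = 0
    while True:
        m = PORTABLE.search(line, pos)
        if m is None:
            break
        spans.append(m.span(2))
        # Resume right after the keyword so the separator that ended this
        # match can also start the next one.
        pos = m.end(2)
    return spans


if __name__ == "__main__":
    print(keyword_spans("if a fifo is full"))
    print(keyword_spans("do x; else undo"))
```

The drawback of this emulation is that the match includes the neighbouring characters, so a highlighter that colours the whole match would presumably colour them too; the sketch sidesteps that by reporting only the keyword group.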


Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
Lapo Luchini
2008-08-28 21:00:16 UTC
Post by Corinna Vinschen
Post by Tomi Belan
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary).
What you're looking for are perl regular expressions and they are
only available if nano is built against the perl regex library which
apparently it isn't.
There isn't any configure option to use pcre from nano, and including
pcreposix.h instead of regex.h strangely produces compilation errors.

I'll take a deeper look and produce a nano-2.0.8 package ASAP.

Lapo
Lapo Luchini
2008-08-29 09:43:42 UTC
Post by Lapo Luchini
Post by Corinna Vinschen
Post by Tomi Belan
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary).
What you're looking for are perl regular expressions and they are
only available if nano is built against the perl regex library which
apparently it isn't.
There isn't any configure option to use pcre from nano, and including
pcreposix.h instead of regex.h strangely produces compilation errors.
Nope, that was a silly typo; I managed to produce the package.
Unfortunately some of the distributed .nanorc syntax files are not
compatible with PCRE syntax so I'll have to work on it a bit more.
Expect a new release shortly.
--
Lapo Luchini - http://lapo.it/

“ECC curves are divided into three groups, weak curves, inefficient
curves, and curves patented by Certicom” (Peter Gutmann, 2001-08-10)
Reini Urban
2008-08-29 10:39:58 UTC
2008/8/1 Tomi Belan wrote as "Limited regex support in newlib cripples
syntax highlighting in nano"
Post by Tomi Belan
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary). This causes a usability bug with the nano editor, which
uses these sequences extensively in most of its syntax highlighting
rules.
POSIX regex is much faster than perl-style pcre regex. Syntax
highlighters usually prefer fast over complete. So the term "crippled"
should be used with care.

See e.g. http://swtch.com/~rsc/regexp/regexp1.html - Regular
Expression Matching Can Be Simple And Fast (but is slow in Java, Perl,
PHP, Python, Ruby, ...) - which also complains about the typical POSIX
Spencer implementation.
--
Reini Urban
http://phpwiki.org/ http://murbreak.at/
Lapo Luchini
2008-09-02 06:17:45 UTC
Post by Reini Urban
2008/8/1 Tomi Belan wrote as "Limited regex support in newlib cripples
syntax highlighting in nano"
Post by Tomi Belan
Cygwin regex.h implementation doesn't support some special sequences,
for example \< (beginning of word), \> (end of word) and \b (word
boundary). This causes a usability bug with the nano editor, which
uses these sequences extensively in most of its syntax highlighting
rules.
POSIX regex is much faster than perl-style pcre regex. Syntax
highlighters usually prefer fast over complete. So the term "crippled"
should be used with care.
See e.g. http://swtch.com/~rsc/regexp/regexp1.html - Regular
Expression Matching Can Be Simple And Fast (but is slow in Java, Perl,
PHP, Python, Ruby, ...) - which also complains about the typical POSIX
Spencer implementation.
Interesting paper!

Judging from the linked efficient implementations, it could be
interesting to have TRE library <http://laurikari.net/tre/> in Cygwin.

But judging from our own "man regexp" it should already have
back-references (??):
"Regexec is largely insensitive to RE complexity except that back
references are massively expensive."
