X-Git-Url: http://ftp.carnet.hr/carnet-debian/scm?a=blobdiff_plain;ds=sidebyside;f=src%2Fexternal%2Fpcre2-10.32%2Fdoc%2Fpcre2syntax.3;fp=src%2Fexternal%2Fpcre2-10.32%2Fdoc%2Fpcre2syntax.3;h=c392bfb01cd07a7f829cda0a3f269e365a2638d4;hb=3f728675941dc69d4e544d3a880a56240a6e394a;hp=0000000000000000000000000000000000000000;hpb=927951d1c1ad45ba9e7325f07d996154a91c911b;p=ossec-hids.git diff --git a/src/external/pcre2-10.32/doc/pcre2syntax.3 b/src/external/pcre2-10.32/doc/pcre2syntax.3 new file mode 100644 index 0000000..c392bfb --- /dev/null +++ b/src/external/pcre2-10.32/doc/pcre2syntax.3 @@ -0,0 +1,626 @@ +.TH PCRE2SYNTAX 3 "02 September 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" +.rs +.sp +The full syntax and semantics of the regular expressions that are supported by +PCRE2 are described in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. This document contains a quick-reference summary of the syntax. +. +. +.SH "QUOTING" +.rs +.sp + \ex where x is non-alphanumeric is a literal x + \eQ...\eE treat enclosed characters as literal +. +. +.SH "ESCAPED CHARACTERS" +.rs +.sp +This table applies to ASCII and Unicode environments. +.sp + \ea alarm, that is, the BEL character (hex 07) + \ecx "control-x", where x is any ASCII printing character + \ee escape (hex 1B) + \ef form feed (hex 0C) + \en newline (hex 0A) + \er carriage return (hex 0D) + \et tab (hex 09) + \e0dd character with octal code 0dd + \eddd character with octal code ddd, or backreference + \eo{ddd..} character with octal code ddd.. + \eU "U" if PCRE2_ALT_BSUX is set (otherwise is an error) + \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only) + \euhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set) + \exhh character with hex code hh + \ex{hh..} character with hex code hh.. +.sp +Note that \e0dd is always an octal code. 
The treatment of backslash followed by +a non-zero digit is complicated; for details see the section +.\" HTML +.\" +"Non-printing characters" +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation, where details of escape processing in EBCDIC environments are +also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not +supported in EBCDIC environments. Note that \eN not followed by an opening +curly bracket has a different meaning (see below). +.P +When \ex is not followed by {, from zero to two hexadecimal digits are read, +but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to +be recognized as a hexadecimal escape; otherwise it matches a literal "x". +Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits, +it matches a literal "u". +. +. +.SH "CHARACTER TYPES" +.rs +.sp + . any character except newline; + in dotall mode, any character whatsoever + \eC one code unit, even in UTF mode (best avoided) + \ed a decimal digit + \eD a character that is not a decimal digit + \eh a horizontal white space character + \eH a character that is not a horizontal white space character + \eN a character that is not a newline + \ep{\fIxx\fP} a character with the \fIxx\fP property + \eP{\fIxx\fP} a character without the \fIxx\fP property + \eR a newline sequence + \es a white space character + \eS a character that is not a white space character + \ev a vertical white space character + \eV a character that is not a vertical white space character + \ew a "word" character + \eW a "non-word" character + \eX a Unicode extended grapheme cluster +.sp +\eC is dangerous because it may leave the current matching point in the middle +of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by +setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2 +with the use of \eC permanently disabled. 
+.P +By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode +or in the 16-bit and 32-bit libraries. However, if locale-specific matching is +happening, \es and \ew may also match characters with code points in the range +128-255. If the PCRE2_UCP option is set, the behaviour of these escape +sequences is changed to use Unicode properties and they match many more +characters. +. +. +.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP" +.rs +.sp + C Other + Cc Control + Cf Format + Cn Unassigned + Co Private use + Cs Surrogate +.sp + L Letter + Ll Lower case letter + Lm Modifier letter + Lo Other letter + Lt Title case letter + Lu Upper case letter + L& Ll, Lu, or Lt +.sp + M Mark + Mc Spacing mark + Me Enclosing mark + Mn Non-spacing mark +.sp + N Number + Nd Decimal number + Nl Letter number + No Other number +.sp + P Punctuation + Pc Connector punctuation + Pd Dash punctuation + Pe Close punctuation + Pf Final punctuation + Pi Initial punctuation + Po Other punctuation + Ps Open punctuation +.sp + S Symbol + Sc Currency symbol + Sk Modifier symbol + Sm Mathematical symbol + So Other symbol +.sp + Z Separator + Zl Line separator + Zp Paragraph separator + Zs Space separator +. +. +.SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep and \eP" +.rs +.sp + Xan Alphanumeric: union of properties L and N + Xps POSIX space: property Z or tab, NL, VT, FF, CR + Xsp Perl space: property Z or tab, NL, VT, FF, CR + Xuc Universally-named character: one that can be + represented by a Universal Character Name + Xwd Perl word: property Xan or underscore +.sp +Perl and POSIX space are now the same. Perl added VT to its space character set +at release 5.18. +. +. 
+.SH "SCRIPT NAMES FOR \ep AND \eP" +.rs +.sp +Adlam, +Ahom, +Anatolian_Hieroglyphs, +Arabic, +Armenian, +Avestan, +Balinese, +Bamum, +Bassa_Vah, +Batak, +Bengali, +Bhaiksuki, +Bopomofo, +Brahmi, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Carian, +Caucasian_Albanian, +Chakma, +Cham, +Cherokee, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Dogra, +Duployan, +Egyptian_Hieroglyphs, +Elbasan, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Grantha, +Greek, +Gujarati, +Gunjala_Gondi, +Gurmukhi, +Han, +Hangul, +Hanifi_Rohingya, +Hanunoo, +Hatran, +Hebrew, +Hiragana, +Imperial_Aramaic, +Inherited, +Inscriptional_Pahlavi, +Inscriptional_Parthian, +Javanese, +Kaithi, +Kannada, +Katakana, +Kayah_Li, +Kharoshthi, +Khmer, +Khojki, +Khudawadi, +Lao, +Latin, +Lepcha, +Limbu, +Linear_A, +Linear_B, +Lisu, +Lycian, +Lydian, +Mahajani, +Makasar, +Malayalam, +Mandaic, +Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, +Meetei_Mayek, +Mende_Kikakui, +Meroitic_Cursive, +Meroitic_Hieroglyphs, +Miao, +Modi, +Mongolian, +Mro, +Multani, +Myanmar, +Nabataean, +New_Tai_Lue, +Newa, +Nko, +Nushu, +Ogham, +Ol_Chiki, +Old_Hungarian, +Old_Italic, +Old_North_Arabian, +Old_Permic, +Old_Persian, +Old_Sogdian, +Old_South_Arabian, +Old_Turkic, +Oriya, +Osage, +Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, +Phags_Pa, +Phoenician, +Psalter_Pahlavi, +Rejang, +Runic, +Samaritan, +Saurashtra, +Sharada, +Shavian, +Siddham, +SignWriting, +Sinhala, +Sogdian, +Sora_Sompeng, +Soyombo, +Sundanese, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tai_Tham, +Tai_Viet, +Takri, +Tamil, +Tangut, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Tirhuta, +Ugaritic, +Vai, +Warang_Citi, +Yi, +Zanabazar_Square. +. +. +.SH "CHARACTER CLASSES" +.rs +.sp + [...] positive character class + [^...] 
negative character class + [x-y] range (can be used for hex characters) + [[:xxx:]] positive POSIX named set + [[:^xxx:]] negative POSIX named set +.sp + alnum alphanumeric + alpha alphabetic + ascii 0-127 + blank space or tab + cntrl control character + digit decimal digit + graph printing, excluding space + lower lower case letter + print printing, including space + punct printing, excluding alphanumeric + space white space + upper upper case letter + word same as \ew + xdigit hexadecimal digit +.sp +In PCRE2, POSIX character set names recognize only ASCII characters by default, +but some of them use Unicode properties if PCRE2_UCP is set. You can use +\eQ...\eE inside a character class. +. +. +.SH "QUANTIFIERS" +.rs +.sp + ? 0 or 1, greedy + ?+ 0 or 1, possessive + ?? 0 or 1, lazy + * 0 or more, greedy + *+ 0 or more, possessive + *? 0 or more, lazy + + 1 or more, greedy + ++ 1 or more, possessive + +? 1 or more, lazy + {n} exactly n + {n,m} at least n, no more than m, greedy + {n,m}+ at least n, no more than m, possessive + {n,m}? at least n, no more than m, lazy + {n,} n or more, greedy + {n,}+ n or more, possessive + {n,}? n or more, lazy +. +. +.SH "ANCHORS AND SIMPLE ASSERTIONS" +.rs +.sp + \eb word boundary + \eB not a word boundary + ^ start of subject + also after an internal newline in multiline mode + (after any newline if PCRE2_ALT_CIRCUMFLEX is set) + \eA start of subject + $ end of subject + also before newline at end of subject + also before internal newline in multiline mode + \eZ end of subject + also before newline at end of subject + \ez end of subject + \eG first matching position in subject +. +. +.SH "REPORTED MATCH POINT SETTING" +.rs +.sp + \eK set reported start of match +.sp +\eK is honoured in positive assertions, but ignored in negative ones. +. +. +.SH "ALTERNATION" +.rs +.sp + expr|expr|expr... +. +. +.SH "CAPTURING" +.rs +.sp + (...) capturing group + (?<name>...) named capturing group (Perl) + (?'name'...) 
named capturing group (Perl) + (?P<name>...) named capturing group (Python) + (?:...) non-capturing group + (?|...) non-capturing group; reset group numbers for + capturing groups in each alternative +. +. +.SH "ATOMIC GROUPS" +.rs +.sp + (?>...) atomic, non-capturing group +. +. +.SH "COMMENT" +.rs +.sp + (?#....) comment (not nestable) +. +. +.SH "OPTION SETTING" +.rs +Changes of these options within a group are automatically cancelled at the end +of the group. +.sp + (?i) caseless + (?J) allow duplicate names + (?m) multiline + (?n) no auto capture + (?s) single line (dotall) + (?U) default ungreedy (lazy) + (?x) extended: ignore white space except in classes + (?xx) as (?x) but also ignore space and tab in classes + (?-...) unset option(s) + (?^) unset imnsx options +.sp +Unsetting x or xx unsets both. Several options may be set at once, and a +mixture of setting and unsetting such as (?i-x) is allowed, but there may be +only one hyphen. Setting (but not unsetting) is allowed after (?^, for example +(?^in). An option setting may appear at the start of a non-capturing group, for +example (?i:...). +.P +The following are recognized only at the very start of a pattern or after one +of the newline or \eR options with similar syntax. More than one of them may +appear. For the first three, d is a decimal number. 
+.sp + (*LIMIT_DEPTH=d) set the backtracking limit to d + (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes + (*LIMIT_MATCH=d) set the match limit to d + (*NOTEMPTY) set PCRE2_NOTEMPTY when matching + (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching + (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) + (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR) + (*NO_JIT) disable JIT optimization + (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE) + (*UTF) set appropriate UTF mode for the library in use + (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc) +.sp +Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of +the limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, +not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The +application can lock out the use of (*UTF) and (*UCP) by setting the +PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time. +. +. +.SH "NEWLINE CONVENTION" +.rs +.sp +These are recognized only at the very start of the pattern or after option +settings with a similar syntax. +.sp + (*CR) carriage return only + (*LF) linefeed only + (*CRLF) carriage return followed by linefeed + (*ANYCRLF) all three of the above + (*ANY) any Unicode newline sequence + (*NUL) the NUL character (binary zero) +. +. +.SH "WHAT \eR MATCHES" +.rs +.sp +These are recognized only at the very start of the pattern or after option +setting with a similar syntax. +.sp + (*BSR_ANYCRLF) CR, LF, or CRLF + (*BSR_UNICODE) any Unicode newline sequence +. +. +.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS" +.rs +.sp + (?=...) positive look ahead + (?!...) negative look ahead + (?<=...) positive look behind + (? reference by name (Perl) + \ek'name' reference by name (Perl) + \eg{name} reference by name (Perl) + \ek{name} reference by name (.NET) + (?P=name) reference by name (Python) +. +. 
+.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)" +.rs +.sp + (?R) recurse whole pattern + (?n) call subpattern by absolute number + (?+n) call subpattern by relative number + (?-n) call subpattern by relative number + (?&name) call subpattern by name (Perl) + (?P>name) call subpattern by name (Python) + \eg<name> call subpattern by name (Oniguruma) + \eg'name' call subpattern by name (Oniguruma) + \eg<n> call subpattern by absolute number (Oniguruma) + \eg'n' call subpattern by absolute number (Oniguruma) + \eg<+n> call subpattern by relative number (PCRE2 extension) + \eg'+n' call subpattern by relative number (PCRE2 extension) + \eg<-n> call subpattern by relative number (PCRE2 extension) + \eg'-n' call subpattern by relative number (PCRE2 extension) +. +. +.SH "CONDITIONAL PATTERNS" +.rs +.sp + (?(condition)yes-pattern) + (?(condition)yes-pattern|no-pattern) +.sp + (?(n) absolute reference condition + (?(+n) relative reference condition + (?(-n) relative reference condition + (?(<name>) named reference condition (Perl) + (?('name') named reference condition (Perl) + (?(name) named reference condition (PCRE2, deprecated) + (?(R) overall recursion condition + (?(Rn) specific numbered group recursion condition + (?(R&name) specific named group recursion condition + (?(DEFINE) define subpattern for reference + (?(VERSION[>]=n.m) test PCRE2 version + (?(assert) assertion condition +.sp +Note the ambiguity of (?(R) and (?(Rn) which might be named reference +conditions or recursion tests. Such a condition is interpreted as a reference +condition if the relevant named group exists. +. +. +.SH "BACKTRACKING CONTROL" +.rs +.sp +All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the +name is mandatory, for the others it is optional. (*SKIP) changes its behaviour +if :NAME is present. The others just set a name for passing back to the caller, +but this is not a name that (*SKIP) can see. 
The following act immediately they +are reached: +.sp + (*ACCEPT) force successful match + (*FAIL) force backtrack; synonym (*F) + (*MARK:NAME) set name to be passed back; synonym (*:NAME) +.sp +The following act only when a subsequent match failure causes a backtrack to +reach them. They all force a match failure, but they differ in what happens +afterwards. Those that advance the start-of-match point do so only if the +pattern is not anchored. +.sp + (*COMMIT) overall failure, no advance of starting point + (*PRUNE) advance to next starting character + (*SKIP) advance to current matching position + (*SKIP:NAME) advance to position corresponding to an earlier + (*MARK:NAME); if not found, the (*SKIP) is ignored + (*THEN) local failure, backtrack to next alternation +.sp +The effect of one of these verbs in a group called as a subroutine is confined +to the subroutine call. +. +. +.SH "CALLOUTS" +.rs +.sp + (?C) callout (assumed number 0) + (?Cn) callout with numerical data n + (?C"text") callout with string data +.sp +The allowed string delimiters are ` ' " ^ % # $ (which are the same for the +start and the end), and the starting delimiter { matched with the ending +delimiter }. To encode the ending delimiter within the string, double it. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3), +\fBpcre2matching\fP(3), \fBpcre2\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 02 September 2018 +Copyright (c) 1997-2018 University of Cambridge. +.fi
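A few rows from the QUANTIFIERS, CAPTURING, LOOKAHEAD AND LOOKBEHIND ASSERTIONS, and OPTION SETTING tables above can be tried out quickly. The sketch below is only an illustration and rests on one loud assumption: it uses Python's standard `re` module, which is a different engine from PCRE2, and it restricts itself to constructs that the summary marks as Python syntax or that both engines are expected to accept. The subject strings and variable names are hypothetical.

```python
import re

# ASSUMPTION: Python's `re` is used as a stand-in; it is NOT PCRE2.
# Only constructs that appear in the summary and that `re` also accepts
# are exercised here.

subject = "<b>bold</b> and <i>italic</i>"

# Greedy `*` takes as much as it can; lazy `*?` takes as little as it can.
greedy = re.search(r"<.*>", subject).group(0)
lazy = re.search(r"<.*?>", subject).group(0)

# Python-style named capturing group with a Python-style named backreference.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was the the best")

# Positive look behind and positive look ahead around the digits.
price = re.search(r"(?<=\$)\d+(?=\.\d\d)", "total: $42.50")

# Option setting at the start of a non-capturing group: caseless for "pcre"
# only, while the trailing "2" is matched literally.
caseless = re.fullmatch(r"(?i:pcre)2", "PcRe2")

print(greedy)
print(lazy)
print(doubled.group("word"))
print(price.group(0))
print(caseless is not None)
```

Constructs that are specific to PCRE2, such as \eK, (*SKIP), or the \ep{Xan} properties, are deliberately left out because Python's `re` does not implement them; checking those requires the PCRE2 library itself.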