new upstream release (3.3.0); modify package compatibility for Stretch

[ossec-hids.git] / src / external / pcre2-10.32 / doc / html / pcre2test.html
diff --git a/src/external/pcre2-10.32/doc/html/pcre2test.html b/src/external/pcre2-10.32/doc/html/pcre2test.html

new file mode 100644 (file)

index 0000000..af2b18c
--- /dev/null
+++ b/src/external/pcre2-10.32/doc/html/pcre2test.html
@@ -0,0 +1,2013 @@
+<html>
+<head>
+<title>pcre2test specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2test man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
+<li><a name="TOC3" href="#SEC3">INPUT ENCODING</a>
+<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
+<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
+<li><a name="TOC6" href="#SEC6">COMMAND LINES</a>
+<li><a name="TOC7" href="#SEC7">MODIFIER SYNTAX</a>
+<li><a name="TOC8" href="#SEC8">PATTERN SYNTAX</a>
+<li><a name="TOC9" href="#SEC9">SUBJECT LINE SYNTAX</a>
+<li><a name="TOC10" href="#SEC10">PATTERN MODIFIERS</a>
+<li><a name="TOC11" href="#SEC11">SUBJECT MODIFIERS</a>
+<li><a name="TOC12" href="#SEC12">THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC13" href="#SEC13">DEFAULT OUTPUT FROM pcre2test</a>
+<li><a name="TOC14" href="#SEC14">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
+<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
+<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
+<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
+<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
+<li><a name="TOC20" href="#SEC20">AUTHOR</a>
+<li><a name="TOC21" href="#SEC21">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>pcre2test [options] [input file [output file]]</b>
+<br>
+<br>
+<b>pcre2test</b> is a test program for the PCRE2 regular expression libraries,
+but it can also be used for experimenting with regular expressions. This
+document describes the features of the test program; for details of the regular
+expressions themselves, see the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+documentation. For details of the PCRE2 library function calls and their
+options, see the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation.
+</P>
+<P>
+The input for <b>pcre2test</b> is a sequence of regular expression patterns and
+subject strings to be matched. There are also command lines for setting
+defaults and controlling some special actions. The output shows the result of
+each match attempt. Modifiers on external or internal command lines, the
+patterns, and the subject lines specify PCRE2 function options, control how the
+subject is processed, and what output is produced.
+</P>
+<P>
+As the original fairly simple PCRE library evolved, it acquired many different
+features, and as a result, the original <b>pcretest</b> program ended up with a
+lot of options in a messy, arcane syntax for testing all the features. The
+move to the new PCRE2 API provided an opportunity to re-implement the test
+program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
+are still many obscure modifiers, some of which are specifically designed for
+use in conjunction with the test script and data files that are distributed as
+part of PCRE2. All the modifiers are documented here, some without much
+justification, but many of them are unlikely to be of use except when testing
+the libraries.
+</P>
+<br><a name="SEC2" href="#TOC1">PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
+<P>
+Different versions of the PCRE2 library can be built to support character
+strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
+all three of these libraries may be simultaneously installed. The
+<b>pcre2test</b> program can be used to test all the libraries. However, its own
+input and output are always in 8-bit format. When testing the 16-bit or 32-bit
+libraries, patterns and subject strings are converted to 16-bit or 32-bit
+format before being passed to the library functions. Results are converted back
+to 8-bit code units for output.
+</P>
+<P>
+In the rest of this document, the names of library functions and structures
+are given in generic form, for example, <b>pcre_compile()</b>. The actual
+names used in the libraries have a suffix _8, _16, or _32, as appropriate.
+<a name="inputencoding"></a></P>
+<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
+<P>
+Input to <b>pcre2test</b> is processed line by line, either by calling the C
+library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
+Windows environments character 26 (hex 1A) causes an immediate end of file, and
+no further data is read, so this character should be avoided unless you really
+want that action.
+</P>
+<P>
+The input is processed using using C's string functions, so must not
+contain binary zeros, even though in Unix-like environments, <b>fgets()</b>
+treats any bytes other than newline as data characters. An error is generated
+if a binary zero is encountered. By default subject lines are processed for
+backslash escapes, which makes it possible to include any data value in strings
+that are passed to the library for matching. For patterns, there is a facility
+for specifying some or all of the 8-bit input characters as hexadecimal pairs,
+which makes it possible to include binary zeros.
+</P>
+<br><b>
+Input for the 16-bit and 32-bit libraries
+</b><br>
+<P>
+When testing the 16-bit or 32-bit libraries, there is a need to be able to
+generate character code points greater than 255 in the strings that are passed
+to the library. For subject lines, backslash escapes can be used. In addition,
+when the <b>utf</b> modifier (see
+<a href="#optionmodifiers">"Setting compilation options"</a>
+below) is set, the pattern and any following subject lines are interpreted as
+UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
+</P>
+<P>
+For non-UTF testing of wide characters, the <b>utf8_input</b> modifier can be
+used. This is mutually exclusive with <b>utf</b>, and is allowed only in 16-bit
+or 32-bit mode. It causes the pattern and following subject lines to be treated
+as UTF-8 according to the original definition (RFC 2279), which allows for
+character values up to 0x7fffffff. Each character is placed in one 16-bit or
+32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
+to occur).
+</P>
+<P>
+UTF-8 (in its original definition) is not capable of encoding values greater
+than 0x7fffffff, but such values can be handled by the 32-bit library. When
+testing this library in non-UTF mode with <b>utf8_input</b> set, if any
+character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
+0x80000000 is added to the character's value. This is the only way of passing
+such code points in a pattern string. For subject strings, using an escape
+sequence is preferable.
+</P>
+<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
+<P>
+<b>-8</b>
+If the 8-bit library has been built, this option causes it to be used (this is
+the default). If the 8-bit library has not been built, this option causes an
+error.
+</P>
+<P>
+<b>-16</b>
+If the 16-bit library has been built, this option causes it to be used. If only
+the 16-bit library has been built, this is the default. If the 16-bit library
+has not been built, this option causes an error.
+</P>
+<P>
+<b>-32</b>
+If the 32-bit library has been built, this option causes it to be used. If only
+the 32-bit library has been built, this is the default. If the 32-bit library
+has not been built, this option causes an error.
+</P>
+<P>
+<b>-ac</b>
+Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
+automatic callouts into every pattern that is compiled.
+</P>
+<P>
+<b>-AC</b>
+As for <b>-ac</b>, but in addition behave as if each subject line has the
+<b>callout_extra</b> modifier, that is, show additional information from
+callouts.
+</P>
+<P>
+<b>-b</b>
+Behave as if each pattern has the <b>fullbincode</b> modifier; the full
+internal binary form of the pattern is output after compilation.
+</P>
+<P>
+<b>-C</b>
+Output the version number of the PCRE2 library, and all available information
+about the optional features that are included, and then exit with zero exit
+code. All other options are ignored. If both -C and -LM are present, whichever
+is first is recognized.
+</P>
+<P>
+<b>-C</b> <i>option</i>
+Output information about a specific build-time option, then exit. This
+functionality is intended for use in scripts such as <b>RunTest</b>. The
+following options output the value and set the exit code as indicated:
+<pre>
+  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
+               0x15 or 0x25
+               0 if used in an ASCII environment
+               exit code is always 0
+  linksize   the configured internal link size (2, 3, or 4)
+               exit code is set to the link size
+  newline    the default newline setting:
+               CR, LF, CRLF, ANYCRLF, ANY, or NUL
+               exit code is always 0
+  bsr        the default setting for what \R matches:
+               ANYCRLF or ANY
+               exit code is always 0
+</pre>
+The following options output 1 for true or 0 for false, and set the exit code
+to the same value:
+<pre>
+  backslash-C  \C is supported (not locked out)
+  ebcdic       compiled for an EBCDIC environment
+  jit          just-in-time support is available
+  pcre2-16     the 16-bit library was built
+  pcre2-32     the 32-bit library was built
+  pcre2-8      the 8-bit library was built
+  unicode      Unicode support is available
+</pre>
+If an unknown option is given, an error message is output; the exit code is 0.
+</P>
+<P>
+<b>-d</b>
+Behave as if each pattern has the <b>debug</b> modifier; the internal
+form and information about the compiled pattern is output after compilation;
+<b>-d</b> is equivalent to <b>-b -i</b>.
+</P>
+<P>
+<b>-dfa</b>
+Behave as if each subject line has the <b>dfa</b> modifier; matching is done
+using the <b>pcre2_dfa_match()</b> function instead of the default
+<b>pcre2_match()</b>.
+</P>
+<P>
+<b>-error</b> <i>number[,number,...]</i>
+Call <b>pcre2_get_error_message()</b> for each of the error numbers in the
+comma-separated list, display the resulting messages on the standard output,
+then exit with zero exit code. The numbers may be positive or negative. This is
+a convenience facility for PCRE2 maintainers.
+</P>
+<P>
+<b>-help</b>
+Output a brief summary these options and then exit.
+</P>
+<P>
+<b>-i</b>
+Behave as if each pattern has the <b>info</b> modifier; information about the
+compiled pattern is given after compilation.
+</P>
+<P>
+<b>-jit</b>
+Behave as if each pattern line has the <b>jit</b> modifier; after successful
+compilation, each pattern is passed to the just-in-time compiler, if available.
+</P>
+<P>
+<b>-jitverify</b>
+Behave as if each pattern line has the <b>jitverify</b> modifier; after
+successful compilation, each pattern is passed to the just-in-time compiler, if
+available, and the use of JIT is verified.
+</P>
+<P>
+<b>-LM</b>
+List modifiers: write a list of available pattern and subject modifiers to the
+standard output, then exit with zero exit code. All other options are ignored.
+If both -C and -LM are present, whichever is first is recognized.
+</P>
+<P>
+\fB-pattern\fB <i>modifier-list</i>
+Behave as if each pattern line contains the given modifiers.
+</P>
+<P>
+<b>-q</b>
+Do not output the version number of <b>pcre2test</b> at the start of execution.
+</P>
+<P>
+<b>-S</b> <i>size</i>
+On Unix-like systems, set the size of the run-time stack to <i>size</i>
+mebibytes (units of 1024*1024 bytes).
+</P>
+<P>
+<b>-subject</b> <i>modifier-list</i>
+Behave as if each subject line contains the given modifiers.
+</P>
+<P>
+<b>-t</b>
+Run each compile and match many times with a timer, and output the resulting
+times per compile or match. When JIT is used, separate times are given for the
+initial compile and the JIT compile. You can control the number of iterations
+that are used for timing by following <b>-t</b> with a number (as a separate
+item on the command line). For example, "-t 1000" iterates 1000 times. The
+default is to iterate 500,000 times.
+</P>
+<P>
+<b>-tm</b>
+This is like <b>-t</b> except that it times only the matching phase, not the
+compile phase.
+</P>
+<P>
+<b>-T</b> <b>-TM</b>
+These behave like <b>-t</b> and <b>-tm</b>, but in addition, at the end of a run,
+the total times for all compiles and matches are output.
+</P>
+<P>
+<b>-version</b>
+Output the PCRE2 version number and then exit.
+</P>
+<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
+<P>
+If <b>pcre2test</b> is given two filename arguments, it reads from the first and
+writes to the second. If the first name is "-", input is taken from the
+standard input. If <b>pcre2test</b> is given only one argument, it reads from
+that file and writes to stdout. Otherwise, it reads from stdin and writes to
+stdout.
+</P>
+<P>
+When <b>pcre2test</b> is built, a configuration option can specify that it
+should be linked with the <b>libreadline</b> or <b>libedit</b> library. When this
+is done, if the input is from a terminal, it is read using the <b>readline()</b>
+function. This provides line-editing and history facilities. The output from
+the <b>-help</b> option states whether or not <b>readline()</b> will be used.
+</P>
+<P>
+The program handles any number of tests, each of which consists of a set of
+input lines. Each set starts with a regular expression pattern, followed by any
+number of subject lines to be matched against that pattern. In between sets of
+test data, command lines that begin with # may appear. This file format, with
+some restrictions, can also be processed by the <b>perltest.sh</b> script that
+is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
+and Perl is the same. For a specification of <b>perltest.sh</b>, see the
+comments near its beginning.
+</P>
+<P>
+When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
+using "re&#62;" to prompt for regular expression patterns, and "data&#62;" to prompt
+for subject lines. Command lines starting with # can be entered only in
+response to the "re&#62;" prompt.
+</P>
+<P>
+Each subject line is matched separately and independently. If you want to do
+multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
+etc., depending on the newline setting) in a single line of input to encode the
+newline sequences. There is no limit on the length of subject lines; the input
+buffer is automatically extended if it is too small. There are replication
+features that makes it possible to generate long repetitive pattern or subject
+lines without having to supply them explicitly.
+</P>
+<P>
+An empty line or the end of the file signals the end of the subject lines for a
+test, at which point a new pattern or command line is expected if there is
+still input to be read.
+</P>
+<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
+<P>
+In between sets of test data, a line that begins with # is interpreted as a
+command line. If the first character is followed by white space or an
+exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
+following commands are recognized:
+<pre>
+  #forbid_utf
+</pre>
+Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
+options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
+the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
+an error if a subsequent pattern contains any occurrences of \P, \p, or \X,
+which are still supported when PCRE2_UTF is not set, but which require Unicode
+property support to be included in the library.
+</P>
+<P>
+This is a trigger guard that is used in test files to ensure that UTF or
+Unicode property tests are not accidentally added to files that are used when
+Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
+PCRE2_NEVER_UCP as a default can also be obtained by the use of <b>#pattern</b>;
+the difference is that <b>#forbid_utf</b> cannot be unset, and the automatic
+options are not displayed in pattern information, to avoid cluttering up test
+output.
+<pre>
+  #load &#60;filename&#62;
+</pre>
+This command is used to load a set of precompiled patterns from a file, as
+described in the section entitled "Saving and restoring compiled patterns"
+<a href="#saverestore">below.</a>
+<pre>
+  #newline_default [&#60;newline-list&#62;]
+</pre>
+When PCRE2 is built, a default newline convention can be specified. This
+determines which characters and/or character pairs are recognized as indicating
+a newline in a pattern or subject string. The default can be overridden when a
+pattern is compiled. The standard test files contain tests of various newline
+conventions, but the majority of the tests expect a single linefeed to be
+recognized as a newline by default. Without special action the tests would fail
+when PCRE2 is compiled with either CR or CRLF as the default newline.
+</P>
+<P>
+The #newline_default command specifies a list of newline types that are
+acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF,
+ANY, or NUL (in upper or lower case), for example:
+<pre>
+  #newline_default LF Any anyCRLF
+</pre>
+If the default newline is in the list, this command has no effect. Otherwise,
+except when testing the POSIX API, a <b>newline</b> modifier that specifies the
+first newline convention in the list (LF in the above example) is added to any
+pattern that does not already have a <b>newline</b> modifier. If the newline
+list is empty, the feature is turned off. This command is present in a number
+of the standard test input files.
+</P>
+<P>
+When the POSIX API is being tested there is no way to override the default
+newline convention, though it is possible to set the newline convention from
+within the pattern. A warning is given if the <b>posix</b> or <b>posix_nosub</b>
+modifier is used when <b>#newline_default</b> would set a default for the
+non-POSIX API.
+<pre>
+  #pattern &#60;modifier-list&#62;
+</pre>
+This command sets a default modifier list that applies to all subsequent
+patterns. Modifiers on a pattern can change these settings.
+<pre>
+  #perltest
+</pre>
+The appearance of this line causes all subsequent modifier settings to be
+checked for compatibility with the <b>perltest.sh</b> script, which is used to
+confirm that Perl gives the same results as PCRE2. Also, apart from comment
+lines, #pattern commands, and #subject commands that set or unset "mark", no
+command lines are permitted, because they and many of the modifiers are
+specific to <b>pcre2test</b>, and should not be used in test files that are also
+processed by <b>perltest.sh</b>. The <b>#perltest</b> command helps detect tests
+that are accidentally put in the wrong file.
+<pre>
+  #pop [&#60;modifiers&#62;]
+  #popcopy [&#60;modifiers&#62;]
+</pre>
+These commands are used to manipulate the stack of compiled patterns, as
+described in the section entitled "Saving and restoring compiled patterns"
+<a href="#saverestore">below.</a>
+<pre>
+  #save &#60;filename&#62;
+</pre>
+This command is used to save a set of compiled patterns to a file, as described
+in the section entitled "Saving and restoring compiled patterns"
+<a href="#saverestore">below.</a>
+<pre>
+  #subject &#60;modifier-list&#62;
+</pre>
+This command sets a default modifier list that applies to all subsequent
+subject lines. Modifiers on a subject line can change these settings.
+</P>
+<br><a name="SEC7" href="#TOC1">MODIFIER SYNTAX</a><br>
+<P>
+Modifier lists are used with both pattern and subject lines. Items in a list
+are separated by commas followed by optional white space. Trailing whitespace
+in a modifier list is ignored. Some modifiers may be given for both patterns
+and subject lines, whereas others are valid only for one or the other. Each
+modifier has a long name, for example "anchored", and some of them must be
+followed by an equals sign and a value, for example, "offset=12". Values cannot
+contain comma characters, but may contain spaces. Modifiers that do not take
+values may be preceded by a minus sign to turn off a previous setting.
+</P>
+<P>
+A few of the more common modifiers can also be specified as single letters, for
+example "i" for "caseless". In documentation, following the Perl convention,
+these are written with a slash ("the /i modifier") for clarity. Abbreviated
+modifiers must all be concatenated in the first item of a modifier list. If the
+first item is not recognized as a long modifier name, it is interpreted as a
+sequence of these abbreviations. For example:
+<pre>
+  /abc/ig,newline=cr,jit=3
+</pre>
+This is a pattern line whose modifier list starts with two one-letter modifiers
+(/i and /g). The lower-case abbreviated modifiers are the same as used in Perl.
+</P>
+<br><a name="SEC8" href="#TOC1">PATTERN SYNTAX</a><br>
+<P>
+A pattern line must start with one of the following characters (common symbols,
+excluding pattern meta-characters):
+<pre>
+  / ! " ' ` - = _ : ; , % & @ ~
+</pre>
+This is interpreted as the pattern's delimiter. A regular expression may be
+continued over several input lines, in which case the newline characters are
+included within it. It is possible to include the delimiter within the pattern
+by escaping it with a backslash, for example
+<pre>
+  /abc\/def/
+</pre>
+If you do this, the escape and the delimiter form part of the pattern, but
+since the delimiters are all non-alphanumeric, this does not affect its
+interpretation. If the terminating delimiter is immediately followed by a
+backslash, for example,
+<pre>
+  /abc/\
+</pre>
+then a backslash is added to the end of the pattern. This is done to provide a
+way of testing the error condition that arises if a pattern finishes with a
+backslash, because
+<pre>
+  /abc\/
+</pre>
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcre2test to read the next line as a continuation of the regular expression.
+</P>
+<P>
+A pattern can be followed by a modifier list (details below).
+</P>
+<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
+<P>
+Before each subject line is passed to <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
+line is scanned for backslash escapes, unless the <b>subject_literal</b>
+modifier was set for the pattern. The following provide a means of encoding
+non-printing characters in a visible way:
+<pre>
+  \a         alarm (BEL, \x07)
+  \b         backspace (\x08)
+  \e         escape (\x27)
+  \f         form feed (\x0c)
+  \n         newline (\x0a)
+  \r         carriage return (\x0d)
+  \t         tab (\x09)
+  \v         vertical tab (\x0b)
+  \nnn       octal character (up to 3 octal digits); always
+               a byte unless &#62; 255 in UTF-8 or 16-bit or 32-bit mode
+  \o{dd...}  octal character (any number of octal digits}
+  \xhh       hexadecimal byte (up to 2 hex digits)
+  \x{hh...}  hexadecimal character (any number of hex digits)
+</pre>
+The use of \x{hh...} is not dependent on the use of the <b>utf</b> modifier on
+the pattern. It is recognized always. There may be any number of hexadecimal
+digits inside the braces; invalid values provoke error messages.
+</P>
+<P>
+Note that \xhh specifies one byte rather than one character in UTF-8 mode;
+this makes it possible to construct invalid UTF-8 sequences for testing
+purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in
+UTF-8 mode, generating more than one byte if the value is greater than 127.
+When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte
+for values less than 256, and causes an error for greater values.
+</P>
+<P>
+In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
+possible to construct invalid UTF-16 sequences for testing purposes.
+</P>
+<P>
+In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
+possible to construct invalid UTF-32 sequences for testing purposes.
+</P>
+<P>
+There is a special backslash sequence that specifies replication of one or more
+characters:
+<pre>
+  \[&#60;characters&#62;]{&#60;count&#62;}
+</pre>
+This makes it possible to test long strings without having to provide them as
+part of the file. For example:
+<pre>
+  \[abc]{4}
+</pre>
+is converted to "abcabcabcabc". This feature does not support nesting. To
+include a closing square bracket in the characters, code it as \x5D.
+</P>
+<P>
+A backslash followed by an equals sign marks the end of the subject string and
+the start of a modifier list. For example:
+<pre>
+  abc\=notbol,notempty
+</pre>
+If the subject string is empty and \= is followed by whitespace, the line is
+treated as a comment line, and is not used for matching. For example:
+<pre>
+  \= This is a comment.
+  abc\= This is an invalid modifier list.
+</pre>
+A backslash followed by any other non-alphanumeric character just escapes that
+character. A backslash followed by anything else causes an error. However, if
+the very last character in the line is a backslash (and there is no modifier
+list), it is ignored. This gives a way of passing an empty line as data, since
+a real empty line terminates the data input.
+</P>
+<P>
+If the <b>subject_literal</b> modifier is set for a pattern, all subject lines
+that follow are treated as literals, with no special treatment of backslashes.
+No replication is possible, and any subject modifiers must be set as defaults
+by a <b>#subject</b> command.
+</P>
+<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
+<P>
+There are several types of modifier that can appear in pattern lines. Except
+where noted below, they may also be used in <b>#pattern</b> commands. A
+pattern's modifier list can add to or override default modifiers that were set
+by a previous <b>#pattern</b> command.
+<a name="optionmodifiers"></a></P>
+<br><b>
+Setting compilation options
+</b><br>
+<P>
+The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
+bits in the options argument of that function, but those whose names start with
+PCRE2_EXTRA are additional options that are set in the compile context. For the
+main options, there are some single-letter abbreviations that are the same as
+Perl options. There is special handling for /x: if a second x is present,
+PCRE2_EXTENDED is converted into PCRE2_EXTENDED_MORE as in Perl. A third
+appearance adds PCRE2_EXTENDED as well, though this makes no difference to the
+way <b>pcre2_compile()</b> behaves. See
+<a href="pcre2api.html"><b>pcre2api</b></a>
+for a description of the effects of these options.
+<pre>
+      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
+      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
+      alt_bsux                  set PCRE2_ALT_BSUX
+      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
+      alt_verbnames             set PCRE2_ALT_VERBNAMES
+      anchored                  set PCRE2_ANCHORED
+      auto_callout              set PCRE2_AUTO_CALLOUT
+      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
+  /i  caseless                  set PCRE2_CASELESS
+      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
+  /s  dotall                    set PCRE2_DOTALL
+      dupnames                  set PCRE2_DUPNAMES
+      endanchored               set PCRE2_ENDANCHORED
+  /x  extended                  set PCRE2_EXTENDED
+  /xx extended_more             set PCRE2_EXTENDED_MORE
+      firstline                 set PCRE2_FIRSTLINE
+      literal                   set PCRE2_LITERAL
+      match_line                set PCRE2_EXTRA_MATCH_LINE
+      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
+      match_word                set PCRE2_EXTRA_MATCH_WORD
+  /m  multiline                 set PCRE2_MULTILINE
+      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
+      never_ucp                 set PCRE2_NEVER_UCP
+      never_utf                 set PCRE2_NEVER_UTF
+  /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
+      no_auto_possess           set PCRE2_NO_AUTO_POSSESS
+      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
+      no_start_optimize         set PCRE2_NO_START_OPTIMIZE
+      no_utf_check              set PCRE2_NO_UTF_CHECK
+      ucp                       set PCRE2_UCP
+      ungreedy                  set PCRE2_UNGREEDY
+      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
+      utf                       set PCRE2_UTF
+</pre>
+As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
+non-printing characters in output strings to be printed using the \x{hh...}
+notation. Otherwise, those less than 0x100 are output in hex without the curly
+brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
+subject strings to be translated to UTF-16 or UTF-32, respectively, before
+being passed to library functions.
+<a name="controlmodifiers"></a></P>
+<br><b>
+Setting compilation controls
+</b><br>
+<P>
+The following modifiers affect the compilation process or request information
+about the pattern. There are single-letter abbreviations for some that are
+heavily used in the test files.
+<pre>
+      bsr=[anycrlf|unicode]     specify \R handling
+  /B  bincode                   show binary code without lengths
+      callout_info              show callout information
+      convert=&#60;options&#62;         request foreign pattern conversion
+      convert_glob_escape=c     set glob escape character
+      convert_glob_separator=c  set glob separator character
+      convert_length            set convert buffer length
+      debug                     same as info,fullbincode
+      framesize                 show matching frame size
+      fullbincode               show binary code with lengths
+  /I  info                      show info about compiled pattern
+      hex                       unquoted characters are hexadecimal
+      jit[=&#60;number&#62;]            use JIT
+      jitfast                   use JIT fast path
+      jitverify                 verify JIT use
+      locale=&#60;name&#62;             use this locale
+      max_pattern_length=&#60;n&#62;    set the maximum pattern length
+      memory                    show memory used
+      newline=&#60;type&#62;            set newline type
+      null_context              compile with a NULL context
+      parens_nest_limit=&#60;n&#62;     set maximum parentheses depth
+      posix                     use the POSIX API
+      posix_nosub               use the POSIX API with REG_NOSUB
+      push                      push compiled pattern onto the stack
+      pushcopy                  push a copy onto the stack
+      stackguard=&#60;number&#62;       test the stackguard feature
+      subject_literal           treat all subject lines as literal
+      tables=[0|1|2]            select internal tables
+      use_length                do not zero-terminate the pattern
+      utf8_input                treat input as UTF-8
+</pre>
+The effects of these modifiers are described in the following sections.
+</P>
+<br><b>
+Newline and \R handling
+</b><br>
+<P>
+The <b>bsr</b> modifier specifies what \R in a pattern should match. If it is
+set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode",
+\R matches any Unicode newline sequence. The default can be specified when
+PCRE2 is built; if it is not, the default is set to Unicode.
+</P>
+<P>
+The <b>newline</b> modifier specifies which characters are to be interpreted as
+newlines, both in the pattern and in subject lines. The type must be one of CR,
+LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
+</P>
+<br><b>
+Information about a pattern
+</b><br>
+<P>
+The <b>debug</b> modifier is a shorthand for <b>info,fullbincode</b>, requesting
+all available information.
+</P>
+<P>
+The <b>bincode</b> modifier causes a representation of the compiled code to be
+output after compilation. This information does not contain length and offset
+values, which ensures that the same output is generated for different internal
+link sizes and different code unit widths. By using <b>bincode</b>, the same
+regression tests can be used in different environments.
+</P>
+<P>
+The <b>fullbincode</b> modifier, by contrast, <i>does</i> include length and
+offset values. This is used in a few special tests that run only for specific
+code unit widths and link sizes, and is also useful for one-off tests.
+</P>
+<P>
+The <b>info</b> modifier requests information about the compiled pattern
+(whether it is anchored, has a fixed first character, and so on). The
+information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
+some typical examples:
+<pre>
+    re&#62; /(?i)(^a|^b)/m,info
+  Capturing subpattern count = 1
+  Compile options: multiline
+  Overall options: caseless multiline
+  First code unit at start or follows newline
+  Subject length lower bound = 1
+
+    re&#62; /(?i)abc/info
+  Capturing subpattern count = 0
+  Compile options: &#60;none&#62;
+  Overall options: caseless
+  First code unit = 'a' (caseless)
+  Last code unit = 'c' (caseless)
+  Subject length lower bound = 3
+</pre>
+"Compile options" are those specified by modifiers; "overall options" have
+added options that are taken or deduced from the pattern. If both sets of
+options are the same, just a single "options" line is output; if there are no
+options, the line is omitted. "First code unit" is where any match must start;
+if there is more than one they are listed as "starting code units". "Last code
+unit" is the last literal code unit that must be present in any match. This is
+not necessarily the last character. These lines are omitted if no starting or
+ending code units are recorded.
+</P>
+<P>
+The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
+used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
+number of capturing parentheses in the pattern.
+</P>
+<P>
+The <b>callout_info</b> modifier requests information about all the callouts in
+the pattern. A list of them is output at the end of any other information that
+is requested. For each callout, either its number or string is given, followed
+by the item that follows it in the pattern.
+</P>
+<br><b>
+Passing a NULL context
+</b><br>
+<P>
+Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
+the <b>null_context</b> modifier is set, however, NULL is passed. This is for
+testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
+default values).
+</P>
+<br><b>
+Specifying pattern characters in hexadecimal
+</b><br>
+<P>
+The <b>hex</b> modifier specifies that the characters of the pattern, except for
+substrings enclosed in single or double quotes, are to be interpreted as pairs
+of hexadecimal digits. This feature is provided as a way of creating patterns
+that contain binary zeros and other non-printing characters. White space is
+permitted between pairs of digits. For example, this pattern contains three
+characters:
+<pre>
+  /ab 32 59/hex
+</pre>
+Parts of such a pattern are taken literally if quoted. This pattern contains
+nine characters, only two of which are specified in hexadecimal:
+<pre>
+  /ab "literal" 32/hex
+</pre>
+Either single or double quotes may be used. There is no way of including
+the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
+mutually exclusive.
+</P>
+<br><b>
+Specifying the pattern's length
+</b><br>
+<P>
+By default, patterns are passed to the compiling functions as zero-terminated
+strings but can be passed by length instead of being zero-terminated. The
+<b>use_length</b> modifier causes this to happen. Using a length happens
+automatically (whether or not <b>use_length</b> is set) when <b>hex</b> is set,
+because patterns specified in hexadecimal may contain binary zeros.
+</P>
+<P>
+If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
+<a href="#posixwrapper">"Using the POSIX wrapper API"</a>
+below), the REG_PEND extension is used to pass the pattern's length.
+</P>
+<br><b>
+Specifying wide characters in 16-bit and 32-bit modes
+</b><br>
+<P>
+In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
+translated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
+the 16-bit and 32-bit libraries in non-UTF mode, the <b>utf8_input</b> modifier
+can be used. It is mutually exclusive with <b>utf</b>. Input lines are
+interpreted as UTF-8 as a means of specifying wide characters. More details are
+given in
+<a href="#inputencoding">"Input encoding"</a>
+above.
+</P>
+<br><b>
+Generating long repetitive patterns
+</b><br>
+<P>
+Some tests use long patterns that are very repetitive. Instead of creating a
+very long input line for such a pattern, you can use a special repetition
+feature, similar to the one described for subject lines above. If the
+<b>expand</b> modifier is present on a pattern, parts of the pattern that have
+the form
+<pre>
+  \[&#60;characters&#62;]{&#60;count&#62;}
+</pre>
+are expanded before the pattern is passed to <b>pcre2_compile()</b>. For
+example, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
+cannot be nested. An initial "\[" sequence is recognized only if "]{" followed
+by decimal digits and "}" is found later in the pattern. If not, the characters
+remain in the pattern unaltered. The <b>expand</b> and <b>hex</b> modifiers are
+mutually exclusive.
+</P>
+<P>
+If part of an expanded pattern looks like an expansion, but is really part of
+the actual pattern, unwanted expansion can be avoided by giving two values in
+the quantifier. For example, \[AB]{6000,6000} is not recognized as an
+expansion item.
+</P>
+<P>
+If the <b>info</b> modifier is set on an expanded pattern, the result of the
+expansion is included in the information that is output.
+</P>
+<br><b>
+JIT compilation
+</b><br>
+<P>
+Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
+speed up pattern matching. See the
+<a href="pcre2jit.html"><b>pcre2jit</b></a>
+documentation for details. JIT compiling happens, optionally, after a pattern
+has been successfully compiled into an internal form. The JIT compiler converts
+this to optimized machine code. It needs to know whether the match-time options
+PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
+different code is generated for the different cases. See the <b>partial</b>
+modifier in "Subject Modifiers"
+<a href="#subjectmodifiers">below</a>
+for details of how these options are specified for each match attempt.
+</P>
+<P>
+JIT compilation is requested by the <b>jit</b> pattern modifier, which may
+optionally be followed by an equals sign and a number in the range 0 to 7.
+The three bits that make up the number specify which of the three JIT operating
+modes are to be compiled:
+<pre>
+  1  compile JIT code for non-partial matching
+  2  compile JIT code for soft partial matching
+  4  compile JIT code for hard partial matching
+</pre>
+The possible values for the <b>jit</b> modifier are therefore:
+<pre>
+  0  disable JIT
+  1  normal matching only
+  2  soft partial matching only
+  3  normal and soft partial matching
+  4  hard partial matching only
+  6  soft and hard partial matching only
+  7  all three modes
+</pre>
+If no number is given, 7 is assumed. The phrase "partial matching" means a call
+to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
+PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
+match; the options enable the possibility of a partial match, but do not
+require it. Note also that if you request JIT compilation only for partial
+matching (for example, jit=2) but do not set the <b>partial</b> modifier on a
+subject line, that match will not use JIT code because none was compiled for
+non-partial matching.
+</P>
+<P>
+If JIT compilation is successful, the compiled JIT code will automatically be
+used when an appropriate type of match is run, except when incompatible
+run-time options are specified. For more details, see the
+<a href="pcre2jit.html"><b>pcre2jit</b></a>
+documentation. See also the <b>jitstack</b> modifier below for a way of
+setting the size of the JIT stack.
+</P>
+<P>
+If the <b>jitfast</b> modifier is specified, matching is done using the JIT
+"fast path" interface, <b>pcre2_jit_match()</b>, which skips some of the sanity
+checks that are done by <b>pcre2_match()</b>, and of course does not work when
+JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
+assumed.
+</P>
+<P>
+If the <b>jitverify</b> modifier is specified, information about the compiled
+pattern shows whether JIT compilation was or was not successful. If
+<b>jitverify</b> is specified without <b>jit</b>, jit=7 is assumed. If JIT
+compilation is successful when <b>jitverify</b> is set, the text "(JIT)" is
+added to the first output line after a match or non match when JIT-compiled
+code was actually used in the match.
+</P>
+<br><b>
+Setting a locale
+</b><br>
+<P>
+The <b>locale</b> modifier must specify the name of a locale, for example:
+<pre>
+  /pattern/locale=fr_FR
+</pre>
+The given locale is set, <b>pcre2_maketables()</b> is called to build a set of
+character tables for the locale, and this is then passed to
+<b>pcre2_compile()</b> when compiling the regular expression. The same tables
+are used when matching the following subject lines. The <b>locale</b> modifier
+applies only to the pattern on which it appears, but can be given in a
+<b>#pattern</b> command if a default is needed. Setting a locale and alternate
+character tables are mutually exclusive.
+</P>
+<br><b>
+Showing pattern memory
+</b><br>
+<P>
+The <b>memory</b> modifier causes the size in bytes of the memory used to hold
+the compiled pattern to be output. This does not include the size of the
+<b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
+subsequently passed to the JIT compiler, the size of the JIT compiled code is
+also output. Here is an example:
+<pre>
+    re&#62; /a(b)c/jit,memory
+  Memory allocation (code space): 21
+  Memory allocation (JIT code): 1910
+
+</PRE>
+</P>
+<br><b>
+Limiting nested parentheses
+</b><br>
+<P>
+The <b>parens_nest_limit</b> modifier sets a limit on the depth of nested
+parentheses in a pattern. Breaching the limit causes a compilation error.
+The default for the library is set when PCRE2 is built, but <b>pcre2test</b>
+sets its own default of 220, which is required for running the standard test
+suite.
+</P>
+<br><b>
+Limiting the pattern length
+</b><br>
+<P>
+The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
+length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
+causes a compilation error. The default is the largest number a PCRE2_SIZE
+variable can hold (essentially unlimited).
+<a name="posixwrapper"></a></P>
+<br><b>
+Using the POSIX wrapper API
+</b><br>
+<P>
+The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
+PCRE2 via the POSIX wrapper API rather than its native API. When
+<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
+<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
+it does not imply POSIX matching semantics; for more detail see the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+documentation. The following pattern modifiers set options for the
+<b>regcomp()</b> function:
+<pre>
+  caseless           REG_ICASE
+  multiline          REG_NEWLINE
+  dotall             REG_DOTALL     )
+  ungreedy           REG_UNGREEDY   ) These options are not part of
+  ucp                REG_UCP        )   the POSIX standard
+  utf                REG_UTF8       )
+</pre>
+The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
+is passed to <b>regerror()</b> in the event of a compilation error. For example:
+<pre>
+  /abc/posix,regerror_buffsize=20
+</pre>
+This provides a means of testing the behaviour of <b>regerror()</b> when the
+buffer is too small for the error message. If this modifier has not been set, a
+large buffer is used.
+</P>
+<P>
+The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
+below. All other modifiers are either ignored, with a warning message, or cause
+an error.
+</P>
+<P>
+The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
+default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
+REG_PEND extension is used to pass it by length.
+</P>
+<br><b>
+Testing the stack guard feature
+</b><br>
+<P>
+The <b>stackguard</b> modifier is used to test the use of
+<b>pcre2_set_compile_recursion_guard()</b>, a function that is provided to
+enable stack availability to be checked during compilation (see the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation for details). If the number specified by the modifier is greater
+than zero, <b>pcre2_set_compile_recursion_guard()</b> is called to set up
+callback from <b>pcre2_compile()</b> to a local function. The argument it
+receives is the current nesting parenthesis depth; if this is greater than the
+value given by the modifier, non-zero is returned, causing the compilation to
+be aborted.
+</P>
+<br><b>
+Using alternative character tables
+</b><br>
+<P>
+The value specified for the <b>tables</b> modifier must be one of the digits 0,
+1, or 2. It causes a specific set of built-in character tables to be passed to
+<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with
+different character tables. The digit specifies the tables as follows:
+<pre>
+  0   do not pass any special character tables
+  1   the default ASCII tables, as distributed in
+        pcre2_chartables.c.dist
+  2   a set of tables defining ISO 8859 characters
+</pre>
+In table 2, some characters whose codes are greater than 128 are identified as
+letters, digits, spaces, etc. Setting alternate character tables and a locale
+are mutually exclusive.
+</P>
+<br><b>
+Setting certain match controls
+</b><br>
+<P>
+The following modifiers are really subject modifiers, and are described under
+"Subject Modifiers" below. However, they may be included in a pattern's
+modifier list, in which case they are applied to every subject line that is
+processed with that pattern. These modifiers do not affect the compilation
+process.
+<pre>
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text
+      altglobal                  alternative global matching
+  /g  global                     global matching
+      jitstack=&#60;n&#62;               set size of JIT stack
+      mark                       show mark values
+      replace=&#60;string&#62;           specify a replacement string
+      startchar                  show starting character when relevant
+      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+</pre>
+These modifiers may not appear in a <b>#pattern</b> command. If you want them as
+defaults, set them in a <b>#subject</b> command.
+</P>
+<br><b>
+Specifying literal subject lines
+</b><br>
+<P>
+If the <b>subject_literal</b> modifier is present on a pattern, all the subject
+lines that it matches are taken as literal strings, with no interpretation of
+backslashes. It is not possible to set subject modifiers on such lines, but any
+that are set as defaults by a <b>#subject</b> command are recognized.
+</P>
+<br><b>
+Saving a compiled pattern
+</b><br>
+<P>
+When a pattern with the <b>push</b> modifier is successfully compiled, it is
+pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
+line to contain a new pattern (or a command) instead of a subject line. This
+facility is used when saving compiled patterns to a file, as described in the
+section entitled "Saving and restoring compiled patterns"
+<a href="#saverestore">below.</a>
+If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
+pattern is stacked, leaving the original as current, ready to match the
+following input lines. This provides a way of testing the
+<b>pcre2_code_copy()</b> function.
+The <b>push</b> and <b>pushcopy </b> modifiers are incompatible with compilation
+modifiers such as <b>global</b> that act at match time. Any that are specified
+are ignored (for the stacked copy), with a warning message, except for
+<b>replace</b>, which causes an error. Note that <b>jitverify</b>, which is
+allowed, does not carry through to any subsequent matching that uses a stacked
+pattern.
+</P>
+<br><b>
+Testing foreign pattern conversion
+</b><br>
+<P>
+The experimental foreign pattern conversion functions in PCRE2 can be tested by
+setting the <b>convert</b> modifier. Its argument is a colon-separated list of
+options, which set the equivalent option for the <b>pcre2_pattern_convert()</b>
+function:
+<pre>
+  glob                    PCRE2_CONVERT_GLOB
+  glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
+  glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
+  posix_basic             PCRE2_CONVERT_POSIX_BASIC
+  posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
+  unset                   Unset all options
+</pre>
+The "unset" value is useful for turning off a default that has been set by a
+<b>#pattern</b> command. When one of these options is set, the input pattern is
+passed to <b>pcre2_pattern_convert()</b>. If the conversion is successful, the
+result is reflected in the output and then passed to <b>pcre2_compile()</b>. The
+normal <b>utf</b> and <b>no_utf_check</b> options, if set, cause the
+PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to
+<b>pcre2_pattern_convert()</b>.
+</P>
+<P>
+By default, the conversion function is allowed to allocate a buffer for its
+output. However, if the <b>convert_length</b> modifier is set to a value greater
+than zero, <b>pcre2test</b> passes a buffer of the given length. This makes it
+possible to test the length check.
+</P>
+<P>
+The <b>convert_glob_escape</b> and <b>convert_glob_separator</b> modifiers can be
+used to specify the escape and separator characters for glob processing,
+overriding the defaults, which are operating-system dependent.
+<a name="subjectmodifiers"></a></P>
+<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
+<P>
+The modifiers that can appear in subject lines and the <b>#subject</b>
+command are of two types.
+</P>
+<br><b>
+Setting match options
+</b><br>
+<P>
+The following modifiers set options for <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b>. See
+<a href="pcreapi.html"><b>pcreapi</b></a>
+for a description of their effects.
+<pre>
+      anchored                  set PCRE2_ANCHORED
+      endanchored               set PCRE2_ENDANCHORED
+      dfa_restart               set PCRE2_DFA_RESTART
+      dfa_shortest              set PCRE2_DFA_SHORTEST
+      no_jit                    set PCRE2_NO_JIT
+      no_utf_check              set PCRE2_NO_UTF_CHECK
+      notbol                    set PCRE2_NOTBOL
+      notempty                  set PCRE2_NOTEMPTY
+      notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
+      noteol                    set PCRE2_NOTEOL
+      partial_hard (or ph)      set PCRE2_PARTIAL_HARD
+      partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
+</pre>
+The partial matching modifiers are provided with abbreviations because they
+appear frequently in tests.
+</P>
+<P>
+If the <b>posix</b> or <b>posix_nosub</b> modifier was present on the pattern,
+causing the POSIX wrapper API to be used, the only option-setting modifiers
+that have any effect are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>,
+causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
+<b>regexec()</b>. The other modifiers are ignored, with a warning message.
+</P>
+<P>
+There is one additional modifier that can be used with the POSIX wrapper. It is
+ignored (with a warning) if used for non-POSIX matching.
+<pre>
+      posix_startend=&#60;n&#62;[:&#60;m&#62;]
+</pre>
+This causes the subject string to be passed to <b>regexec()</b> using the
+REG_STARTEND option, which uses offsets to specify which part of the string is
+searched. If only one number is given, the end offset is passed as the end of
+the subject string. For more detail of REG_STARTEND, see the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+documentation. If the subject string contains binary zeros (coded as escapes
+such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
+its input), you must use <b>posix_startend</b> to specify its length.
+</P>
+<br><b>
+Setting match controls
+</b><br>
+<P>
+The following modifiers affect the matching process or request additional
+information. Some of them may also be specified on a pattern line (see above),
+in which case they apply to every subject line that is matched against that
+pattern.
+<pre>
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text (non-JIT only)
+      altglobal                  alternative global matching
+      callout_capture            show captures at callout time
+      callout_data=&#60;n&#62;           set a value to pass via callouts
+      callout_error=&#60;n&#62;[:&#60;m&#62;]    control callout error
+      callout_extra              show extra callout information
+      callout_fail=&#60;n&#62;[:&#60;m&#62;]     control callout failure
+      callout_no_where           do not show position of a callout
+      callout_none               do not supply a callout function
+      copy=&#60;number or name&#62;      copy captured substring
+      depth_limit=&#60;n&#62;            set a depth limit
+      dfa                        use <b>pcre2_dfa_match()</b>
+      find_limits                find match and depth limits
+      get=&#60;number or name&#62;       extract captured substring
+      getall                     extract all captured substrings
+  /g  global                     global matching
+      heap_limit=&#60;n&#62;             set a limit on heap memory (Kbytes)
+      jitstack=&#60;n&#62;               set size of JIT stack
+      mark                       show mark values
+      match_limit=&#60;n&#62;            set a match limit
+      memory                     show heap memory usage
+      null_context               match with a NULL context
+      offset=&#60;n&#62;                 set starting offset
+      offset_limit=&#60;n&#62;           set offset limit
+      ovector=&#60;n&#62;                set size of output vector
+      recursion_limit=&#60;n&#62;        obsolete synonym for depth_limit
+      replace=&#60;string&#62;           specify a replacement string
+      startchar                  show startchar when relevant
+      startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
+      substitute_extedded        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      zero_terminate             pass the subject as zero-terminated
+</pre>
+The effects of these modifiers are described in the following sections. When
+matching via the POSIX wrapper API, the <b>aftertext</b>, <b>allaftertext</b>,
+and <b>ovector</b> subject modifiers work as described below. All other
+modifiers are either ignored, with a warning message, or cause an error.
+</P>
+<br><b>
+Showing more text
+</b><br>
+<P>
+The <b>aftertext</b> modifier requests that as well as outputting the part of
+the subject string that matched the entire pattern, <b>pcre2test</b> should in
+addition output the remainder of the subject string. This is useful for tests
+where the subject contains multiple copies of the same substring. The
+<b>allaftertext</b> modifier requests the same action for captured substrings as
+well as the main matched substring. In each case the remainder is output on the
+following line with a plus character following the capture number.
+</P>
+<P>
+The <b>allusedtext</b> modifier requests that all the text that was consulted
+during a successful pattern match by the interpreter should be shown. This
+feature is not supported for JIT matching, and if requested with JIT it is
+ignored (with a warning message). Setting this modifier affects the output if
+there is a lookbehind at the start of a match, or a lookahead at the end, or if
+\K is used in the pattern. Characters that precede or follow the start and end
+of the actual match are indicated in the output by '&#60;' or '&#62;' characters
+underneath them. Here is an example:
+<pre>
+    re&#62; /(?&#60;=pqr)abc(?=xyz)/
+  data&#62; 123pqrabcxyz456\=allusedtext
+   0: pqrabcxyz
+      &#60;&#60;&#60;   &#62;&#62;&#62;
+</pre>
+This shows that the matched string is "abc", with the preceding and following
+strings "pqr" and "xyz" having been consulted during the match (when processing
+the assertions).
+</P>
+<P>
+The <b>startchar</b> modifier requests that the starting character for the match
+be indicated, if it is different to the start of the matched string. The only
+time when this occurs is when \K has been processed as part of the match. In
+this situation, the output for the matched string is displayed from the
+starting character instead of from the match point, with circumflex characters
+under the earlier characters. For example:
+<pre>
+    re&#62; /abc\Kxyz/
+  data&#62; abcxyz\=startchar
+   0: abcxyz
+      ^^^
+</pre>
+Unlike <b>allusedtext</b>, the <b>startchar</b> modifier can be used with JIT.
+However, these two modifiers are mutually exclusive.
+</P>
+<br><b>
+Showing the value of all capture groups
+</b><br>
+<P>
+The <b>allcaptures</b> modifier requests that the values of all potential
+captured parentheses be output after a match. By default, only those up to the
+highest one actually used in the match are output (corresponding to the return
+code from <b>pcre2_match()</b>). Groups that did not take part in the match
+are output as "&#60;unset&#62;". This modifier is not relevant for DFA matching (which
+does no capturing); it is ignored, with a warning message, if present.
+</P>
+<br><b>
+Testing callouts
+</b><br>
+<P>
+A callout function is supplied when <b>pcre2test</b> calls the library matching
+functions, unless <b>callout_none</b> is specified. Its behaviour can be
+controlled by various modifiers listed above whose names begin with
+<b>callout_</b>. Details are given in the section entitled "Callouts"
+<a href="#callouts">below.</a>
+</P>
+<br><b>
+Finding all matches in a string
+</b><br>
+<P>
+Searching for all possible matches within a subject can be requested by the
+<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
+function is called again to search the remainder of the subject. The difference
+between <b>global</b> and <b>altglobal</b> is that the former uses the
+<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
+to start searching at a new point within the entire string (which is what Perl
+does), whereas the latter passes over a shortened subject. This makes a
+difference to the matching process if the pattern begins with a lookbehind
+assertion (including \b or \B).
+</P>
+<P>
+If an empty string is matched, the next match is done with the
+PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
+another, non-empty, match at the same point in the subject. If this match
+fails, the start offset is advanced, and the normal match is retried. This
+imitates the way Perl handles such cases when using the <b>/g</b> modifier or
+the <b>split()</b> function. Normally, the start offset is advanced by one
+character, but if the newline convention recognizes CRLF as a newline, and the
+current character is CR followed by LF, an advance of two characters occurs.
+</P>
+<br><b>
+Testing substring extraction functions
+</b><br>
+<P>
+The <b>copy</b> and <b>get</b> modifiers can be used to test the
+<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
+They can be given more than once, and each can specify a group name or number,
+for example:
+<pre>
+   abcd\=copy=1,copy=3,get=G1
+</pre>
+If the <b>#subject</b> command is used to set default copy and/or get lists,
+these can be unset by specifying a negative number to cancel all numbered
+groups and an empty name to cancel all named groups.
+</P>
+<P>
+The <b>getall</b> modifier tests <b>pcre2_substring_list_get()</b>, which
+extracts all captured substrings.
+</P>
+<P>
+If the subject line is successfully matched, the substrings extracted by the
+convenience functions are output with C, G, or L after the string number
+instead of a colon. This is in addition to the normal full list. The string
+length (that is, the return from the extraction function) is given in
+parentheses after each substring, followed by the name when the extraction was
+by name.
+</P>
+<br><b>
+Testing the substitution function
+</b><br>
+<P>
+If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
+called instead of one of the matching functions. Note that replacement strings
+cannot contain commas, because a comma signifies the end of a modifier. This is
+not thought to be an issue in a test program.
+</P>
+<P>
+Unlike subject strings, <b>pcre2test</b> does not process replacement strings
+for escape sequences. In UTF mode, a replacement string is checked to see if it
+is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
+the appropriate code unit width. If it is not a valid UTF-8 string, the
+individual code units are copied directly. This provides a means of passing an
+invalid UTF-8 string for testing purposes.
+</P>
+<P>
+The following modifiers set options (in additional to the normal match options)
+for <b>pcre2_substitute()</b>:
+<pre>
+  global                      PCRE2_SUBSTITUTE_GLOBAL
+  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
+  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
+
+</PRE>
+</P>
+<P>
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
+<pre>
+  /abc/replace=xxx
+      =abc=abc=
+   1: =xxx=abc=
+      =abc=abc=\=global
+   2: =xxx=xxx=
+</pre>
+Subject and replacement strings should be kept relatively short (fewer than 256
+characters) for substitution tests, as fixed-size buffers are used. To make it
+easy to test for buffer overflow, if the replacement string starts with a
+number in square brackets, that number is passed to <b>pcre2_substitute()</b> as
+the size of the output buffer, with the replacement string starting at the next
+character. Here is an example that tests the edge case:
+<pre>
+  /abc/
+      123abc123\=replace=[10]XYZ
+   1: 123XYZ123
+      123abc123\=replace=[9]XYZ
+  Failed: error -47: no more memory
+</pre>
+The default action of <b>pcre2_substitute()</b> is to return
+PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
+<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
+to go through the motions of matching and substituting, in order to compute the
+size of buffer that is required. When this happens, <b>pcre2test</b> shows the
+required buffer length (which includes space for the trailing zero) as part of
+the error message. For example:
+<pre>
+  /abc/substitute_overflow_length
+      123abc123\=replace=[9]XYZ
+  Failed: error -47: no more memory: 10 code units are needed
+</pre>
+A replacement string is ignored with POSIX and DFA matching. Specifying partial
+matching provokes an error return ("bad option value") from
+<b>pcre2_substitute()</b>.
+</P>
+<br><b>
+Setting the JIT stack size
+</b><br>
+<P>
+The <b>jitstack</b> modifier provides a way of setting the maximum stack size
+that is used by the just-in-time optimization code. It is ignored if JIT
+optimization is not being used. The value is a number of kibibytes (units of
+1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack
+that is larger than the default is necessary only for very complicated
+patterns. If <b>jitstack</b> is set non-zero on a subject line it overrides any
+value that was set on the pattern.
+</P>
+<br><b>
+Setting heap, match, and depth limits
+</b><br>
+<P>
+The <b>heap_limit</b>, <b>match_limit</b>, and <b>depth_limit</b> modifiers set
+the appropriate limits in the match context. These values are ignored when the
+<b>find_limits</b> modifier is specified.
+</P>
+<br><b>
+Finding minimum limits
+</b><br>
+<P>
+If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
+calls the relevant matching function several times, setting different values in
+the match context via <b>pcre2_set_heap_limit()</b>,
+<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
+the minimum values for each parameter that allows the match to complete without
+error. If JIT is being used, only the match limit is relevant.
+</P>
+<P>
+When using this modifier, the pattern should not contain any limit settings
+such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
+lower than the minimum matching value, the minimum value cannot be found
+because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
+an in-pattern limit; they cannot increase it.
+</P>
+<P>
+For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
+much nested backtracking happens (that is, how deeply the pattern's tree is
+searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
+recursive calls of the internal function that is used for handling pattern
+recursion, lookaround assertions, and atomic groups.
+</P>
+<P>
+For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
+of backtracking that takes place, and learning the minimum value can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string. In the case of DFA
+matching, <i>match_limit</i> controls the total number of calls, both recursive
+and non-recursive, to the internal matching function, thus controlling the
+overall amount of computing resource that is used.
+</P>
+<P>
+For both kinds of matching, the <i>heap_limit</i> number, which is in kibibytes
+(units of 1024 bytes), limits the amount of heap memory used for matching. A
+value of zero disables the use of any heap memory; many simple pattern matches
+can be done without using the heap, so zero is not an unreasonable setting.
+</P>
+<br><b>
+Showing MARK names
+</b><br>
+<P>
+The <b>mark</b> modifier causes the names from backtracking control verbs that
+are returned from calls to <b>pcre2_match()</b> to be displayed. If a mark is
+returned for a match, non-match, or partial match, <b>pcre2test</b> shows it.
+For a match, it is on a line by itself, tagged with "MK:". Otherwise, it
+is added to the non-match message.
+</P>
+<br><b>
+Showing memory usage
+</b><br>
+<P>
+The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
+memory allocation and freeing calls that occur during a call to
+<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
+requires a bigger vector than the default for remembering backtracking points
+(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
+many cases there will be no heap memory used and therefore no additional
+output. No heap memory is allocated during matching with JIT, so in that case
+the <b>memory</b> modifier never has any effect. For this modifier to work, the
+<b>null_context</b> modifier must not be set on both the pattern and the
+subject, though it can be set on one or the other.
+</P>
+<br><b>
+Setting a starting offset
+</b><br>
+<P>
+The <b>offset</b> modifier sets an offset in the subject string at which
+matching starts. Its value is a number of code units, not characters.
+</P>
+<br><b>
+Setting an offset limit
+</b><br>
+<P>
+The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
+cannot be found starting at or before this offset in the subject, a "no match"
+return is given. The data value is a number of code units, not characters. When
+this modifier is used, the <b>use_offset_limit</b> modifier must have been set
+for the pattern; if not, an error is generated.
+</P>
+<br><b>
+Setting the size of the output vector
+</b><br>
+<P>
+The <b>ovector</b> modifier applies only to the subject line in which it
+appears, though of course it can also be used to set a default in a
+<b>#subject</b> command. It specifies the number of pairs of offsets that are
+available for storing matching information. The default is 15.
+</P>
+<P>
+A value of zero is useful when testing the POSIX API because it causes
+<b>regexec()</b> to be called with a NULL capture vector. When not testing the
+POSIX API, a value of zero is used to cause
+<b>pcre2_match_data_create_from_pattern()</b> to be called, in order to create a
+match block of exactly the right size for the pattern. (It is not possible to
+create a match block with a zero-length ovector; there is always at least one
+pair of offsets.)
+</P>
+<br><b>
+Passing the subject as zero-terminated
+</b><br>
+<P>
+By default, the subject string is passed to a native API matching function with
+its correct length. In order to test the facility for passing a zero-terminated
+string, the <b>zero_terminate</b> modifier is provided. It causes the length to
+be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface,
+this modifier is ignored, with a warning.
+</P>
+<P>
+When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
+passing the replacement string as zero-terminated.
+</P>
+<br><b>
+Passing a NULL context
+</b><br>
+<P>
+Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
+<b>pcre2_dfa_match()</b> or <b>pcre2_jit_match()</b>. If the <b>null_context</b>
+modifier is set, however, NULL is passed. This is for testing that the matching
+functions behave correctly in this case (they use default values). This
+modifier cannot be used with the <b>find_limits</b> modifier or when testing the
+substitution function.
+</P>
+<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+By default, <b>pcre2test</b> uses the standard PCRE2 matching function,
+<b>pcre2_match()</b> to match each subject line. PCRE2 also supports an
+alternative matching function, <b>pcre2_dfa_match()</b>, which operates in a
+different way, and has some restrictions. The differences between the two
+functions are described in the
+<a href="pcre2matching.html"><b>pcre2matching</b></a>
+documentation.
+</P>
+<P>
+If the <b>dfa</b> modifier is set, the alternative matching function is used.
+This function finds all possible matches at a given point in the subject. If,
+however, the <b>dfa_shortest</b> modifier is set, processing stops after the
+first match is found. This is always the shortest possible match.
+</P>
+<br><a name="SEC13" href="#TOC1">DEFAULT OUTPUT FROM pcre2test</a><br>
+<P>
+This section describes the output when the normal matching function,
+<b>pcre2_match()</b>, is being used.
+</P>
+<P>
+When a match succeeds, <b>pcre2test</b> outputs the list of captured substrings,
+starting with number 0 for the string that matched the whole pattern.
+Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or
+"Partial match:" followed by the partially matching substring when the
+return is PCRE2_ERROR_PARTIAL. (Note that this is the
+entire substring that was inspected during the partial match; it may include
+characters before the actual match start if a lookbehind assertion, \K, \b,
+or \B was involved.)
+</P>
+<P>
+For any other return, <b>pcre2test</b> outputs the PCRE2 negative error number
+and a short descriptive phrase. If the error is a failed UTF string check, the
+code unit offset of the start of the failing character is also output. Here is
+an example of an interactive <b>pcre2test</b> run.
+<pre>
+  $ pcre2test
+  PCRE2 version 10.22 2016-07-29
+
+    re&#62; /^abc(\d+)/
+  data&#62; abc123
+   0: abc123
+   1: 123
+  data&#62; xyz
+  No match
+</pre>
+Unset capturing substrings that are not followed by one that is set are not
+shown by <b>pcre2test</b> unless the <b>allcaptures</b> modifier is specified. In
+the following example, there are two capturing substrings, but when the first
+data line is matched, the second, unset substring is not shown. An "internal"
+unset substring is shown as "&#60;unset&#62;", as for the second data line.
+<pre>
+    re&#62; /(a)|(b)/
+  data&#62; a
+   0: a
+   1: a
+  data&#62; b
+   0: b
+   1: &#60;unset&#62;
+   2: b
+</pre>
+If the strings contain any non-printing characters, they are output as \xhh
+escapes if the value is less than 256 and UTF mode is not set. Otherwise they
+are output as \x{hh...} escapes. See below for the definition of non-printing
+characters. If the <b>aftertext</b> modifier is set, the output for substring
+0 is followed by the the rest of the subject string, identified by "0+" like
+this:
+<pre>
+    re&#62; /cat/aftertext
+  data&#62; cataract
+   0: cat
+   0+ aract
+</pre>
+If global matching is requested, the results of successive matching attempts
+are output in sequence, like this:
+<pre>
+    re&#62; /\Bi(\w\w)/g
+  data&#62; Mississippi
+   0: iss
+   1: ss
+   0: iss
+   1: ss
+   0: ipp
+   1: pp
+</pre>
+"No match" is output only if the first match attempt fails. Here is an example
+of a failure message (the offset 4 that is specified by the <b>offset</b>
+modifier is past the end of the subject string):
+<pre>
+    re&#62; /xyz/
+  data&#62; xyz\=offset=4
+  Error -24 (bad offset value)
+</PRE>
+</P>
+<P>
+Note that whereas patterns can be continued over several lines (a plain "&#62;"
+prompt is used for continuations), subject lines may not. However newlines can
+be included in a subject by means of the \n escape (or \r, \r\n, etc.,
+depending on the newline sequence setting).
+</P>
+<br><a name="SEC14" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+When the alternative matching function, <b>pcre2_dfa_match()</b>, is used, the
+output consists of a list of all the matches that start at the first point in
+the subject where there is at least one match. For example:
+<pre>
+    re&#62; /(tang|tangerine|tan)/
+  data&#62; yellow tangerine\=dfa
+   0: tangerine
+   1: tang
+   2: tan
+</pre>
+Using the normal matching function on this data finds only "tang". The
+longest matching string is always given first (and numbered zero). After a
+PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
+partially matching substring. Note that this is the entire substring that was
+inspected during the partial match; it may include characters before the actual
+match start if a lookbehind assertion, \b, or \B was involved. (\K is not
+supported for DFA matching.)
+</P>
+<P>
+If global matching is requested, the search for further matches resumes
+at the end of the longest match. For example:
+<pre>
+    re&#62; /(tang|tangerine|tan)/g
+  data&#62; yellow tangerine and tangy sultana\=dfa
+   0: tangerine
+   1: tang
+   2: tan
+   0: tang
+   1: tan
+   0: tan
+</pre>
+The alternative matching function does not support substring capture, so the
+modifiers that are concerned with captured substrings are not relevant.
+</P>
+<br><a name="SEC15" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
+<P>
+When the alternative matching function has given the PCRE2_ERROR_PARTIAL
+return, indicating that the subject partially matched the pattern, you can
+restart the match with additional subject data by means of the
+<b>dfa_restart</b> modifier. For example:
+<pre>
+    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+  data&#62; 23ja\=P,dfa
+  Partial match: 23ja
+  data&#62; n05\=dfa,dfa_restart
+   0: n05
+</pre>
+For further information about partial matching, see the
+<a href="pcre2partial.html"><b>pcre2partial</b></a>
+documentation.
+<a name="callouts"></a></P>
+<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
+<P>
+If the pattern contains any callout requests, <b>pcre2test</b>'s callout
+function is called during matching unless <b>callout_none</b> is specified. This
+works with both matching functions, and with JIT, though there are some
+differences in behaviour. The output for callouts with numerical arguments and
+those with string arguments is slightly different.
+</P>
+<br><b>
+Callouts with numerical arguments
+</b><br>
+<P>
+By default, the callout function displays the callout number, the start and
+current positions in the subject text at the callout time, and the next pattern
+item to be tested. For example:
+<pre>
+  ---&#62;pqrabcdef
+    0    ^  ^     \d
+</pre>
+This output indicates that callout number 0 occurred for a match attempt
+starting at the fourth character of the subject string, when the pointer was at
+the seventh character, and when the next pattern item was \d. Just
+one circumflex is output if the start and current positions are the same, or if
+the current position precedes the start position, which can happen if the
+callout is in a lookbehind assertion.
+</P>
+<P>
+Callouts numbered 255 are assumed to be automatic callouts, inserted as a
+result of the <b>auto_callout</b> pattern modifier. In this case, instead of
+showing the callout number, the offset in the pattern, preceded by a plus, is
+output. For example:
+<pre>
+    re&#62; /\d?[A-E]\*/auto_callout
+  data&#62; E*
+  ---&#62;E*
+   +0 ^      \d?
+   +3 ^      [A-E]
+   +8 ^^     \*
+  +10 ^ ^
+   0: E*
+</pre>
+If a pattern contains (*MARK) items, an additional line is output whenever
+a change of latest mark is passed to the callout function. For example:
+<pre>
+    re&#62; /a(*MARK:X)bc/auto_callout
+  data&#62; abc
+  ---&#62;abc
+   +0 ^       a
+   +1 ^^      (*MARK:X)
+  +10 ^^      b
+  Latest Mark: X
+  +11 ^ ^     c
+  +12 ^  ^
+   0: abc
+</pre>
+The mark changes between matching "a" and "b", but stays the same for the rest
+of the match, so nothing more is output. If, as a result of backtracking, the
+mark reverts to being unset, the text "&#60;unset&#62;" is output.
+</P>
+<br><b>
+Callouts with string arguments
+</b><br>
+<P>
+The output for a callout with a string argument is similar, except that instead
+of outputting a callout number before the position indicators, the callout
+string and its offset in the pattern string are output before the reflection of
+the subject string, and the subject string is reflected for each callout. For
+example:
+<pre>
+    re&#62; /^ab(?C'first')cd(?C"second")ef/
+  data&#62; abcdefg
+  Callout (7): 'first'
+  ---&#62;abcdefg
+      ^ ^         c
+  Callout (20): "second"
+  ---&#62;abcdefg
+      ^   ^       e
+   0: abcdef
+
+</PRE>
+</P>
+<br><b>
+Callout modifiers
+</b><br>
+<P>
+The callout function in <b>pcre2test</b> returns zero (carry on matching) by
+default, but you can use a <b>callout_fail</b> modifier in a subject line to
+change this and other parameters of the callout (see below).
+</P>
+<P>
+If the <b>callout_capture</b> modifier is set, the current captured groups are
+output when a callout occurs. This is useful only for non-DFA matching, as
+<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
+shown.
+</P>
+<P>
+The normal callout output, showing the callout number or pattern offset (as
+described above) is suppressed if the <b>callout_no_where</b> modifier is set.
+</P>
+<P>
+When using the interpretive matching function <b>pcre2_match()</b> without JIT,
+setting the <b>callout_extra</b> modifier causes additional output from
+<b>pcre2test</b>'s callout function to be generated. For the first callout in a
+match attempt at a new starting position in the subject, "New match attempt" is
+output. If there has been a backtrack since the last callout (or start of
+matching if this is the first callout), "Backtrack" is output, followed by "No
+other matching paths" if the backtrack ended the previous match attempt. For
+example:
+<pre>
+   re&#62; /(a+)b/auto_callout,no_start_optimize,no_auto_possess
+  data&#62; aac\=callout_extra
+  New match attempt
+  ---&#62;aac
+   +0 ^       (
+   +1 ^       a+
+   +3 ^ ^     )
+   +4 ^ ^     b
+  Backtrack
+  ---&#62;aac
+   +3 ^^      )
+   +4 ^^      b
+  Backtrack
+  No other matching paths
+  New match attempt
+  ---&#62;aac
+   +0  ^      (
+   +1  ^      a+
+   +3  ^^     )
+   +4  ^^     b
+  Backtrack
+  No other matching paths
+  New match attempt
+  ---&#62;aac
+   +0   ^     (
+   +1   ^     a+
+  Backtrack
+  No other matching paths
+  New match attempt
+  ---&#62;aac
+   +0    ^    (
+   +1    ^    a+
+  No match
+</pre>
+Notice that various optimizations must be turned off if you want all possible
+matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
+an immediate "no match", without any callouts, because the starting
+optimization fails to find "b" in the subject, which it knows must be present
+for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
+into "a++", which reduces the number of backtracks.
+</P>
+<P>
+The <b>callout_extra</b> modifier has no effect if used with the DFA matching
+function, or with JIT.
+</P>
+<br><b>
+Return values from callouts
+</b><br>
+<P>
+The default return from the callout function is zero, which allows matching to
+continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
+there is only one number, 1 is returned instead of 0 (causing matching to
+backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
+are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
+least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
+PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
+aborted. If both these modifiers are set for the same callout number,
+<b>callout_error</b> takes precedence. Note that callouts with string arguments
+are always given the number zero.
+</P>
+<P>
+The <b>callout_data</b> modifier can be given an unsigned or a negative number.
+This is set as the "user data" that is passed to the matching function, and
+passed back when the callout function is invoked. Any value other than zero is
+used as a return from <b>pcre2test</b>'s callout function.
+</P>
+<P>
+Inserting callouts can be helpful when using <b>pcre2test</b> to check
+complicated regular expressions. For further information about callouts, see
+the
+<a href="pcre2callout.html"><b>pcre2callout</b></a>
+documentation.
+</P>
+<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
+<P>
+When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
+bytes other than 32-126 are always treated as non-printing characters and are
+therefore shown as hex escapes.
+</P>
+<P>
+When <b>pcre2test</b> is outputting text that is a matched part of a subject
+string, it behaves in the same way, unless a different locale has been set for
+the pattern (using the <b>locale</b> modifier). In this case, the
+<b>isprint()</b> function is used to distinguish printing and non-printing
+characters.
+<a name="saverestore"></a></P>
+<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
+<P>
+It is possible to save compiled patterns on disc or elsewhere, and reload them
+later, subject to a number of restrictions. JIT data cannot be saved. The host
+on which the patterns are reloaded must be running the same version of PCRE2,
+with the same code unit width, and must also have the same endianness, pointer
+width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
+serialized, that is, converted to a stream of bytes. A single byte stream may
+contain any number of compiled patterns, but they must all use the same
+character tables. A single copy of the tables is included in the byte stream
+(its size is 1088 bytes).
+</P>
+<P>
+The functions whose names begin with <b>pcre2_serialize_</b> are used
+for serializing and de-serializing. They are described in the
+<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
+documentation. In this section we describe the features of <b>pcre2test</b> that
+can be used to test these functions.
+</P>
+<P>
+Note that "serialization" in PCRE2 does not convert compiled patterns to an
+abstract format like Java or .NET. It just makes a reloadable byte code stream.
+Hence the restrictions on reloading mentioned above.
+</P>
+<P>
+In <b>pcre2test</b>, when a pattern with <b>push</b> modifier is successfully
+compiled, it is pushed onto a stack of compiled patterns, and <b>pcre2test</b>
+expects the next line to contain a new pattern (or command) instead of a
+subject line. By contrast, the <b>pushcopy</b> modifier causes a copy of the
+compiled pattern to be stacked, leaving the original available for immediate
+matching. By using <b>push</b> and/or <b>pushcopy</b>, a number of patterns can
+be compiled and retained. These modifiers are incompatible with <b>posix</b>,
+and control modifiers that act at match time are ignored (with a message) for
+the stacked patterns. The <b>jitverify</b> modifier applies only at compile
+time.
+</P>
+<P>
+The command
+<pre>
+  #save &#60;filename&#62;
+</pre>
+causes all the stacked patterns to be serialized and the result written to the
+named file. Afterwards, all the stacked patterns are freed. The command
+<pre>
+  #load &#60;filename&#62;
+</pre>
+reads the data in the file, and then arranges for it to be de-serialized, with
+the resulting compiled patterns added to the pattern stack. The pattern on the
+top of the stack can be retrieved by the #pop command, which must be followed
+by lines of subjects that are to be matched with the pattern, terminated as
+usual by an empty line or end of file. This command may be followed by a
+modifier list containing only
+<a href="#controlmodifiers">control modifiers</a>
+that act after a pattern has been compiled. In particular, <b>hex</b>,
+<b>posix</b>, <b>posix_nosub</b>, <b>push</b>, and <b>pushcopy</b> are not allowed,
+nor are any
+<a href="#optionmodifiers">option-setting modifiers.</a>
+The JIT modifiers are, however permitted. Here is an example that saves and
+reloads two patterns.
+<pre>
+  /abc/push
+  /xyz/push
+  #save tempfile
+  #load tempfile
+  #pop info
+  xyz
+
+  #pop jit,bincode
+  abc
+</pre>
+If <b>jitverify</b> is used with #pop, it does not automatically imply
+<b>jit</b>, which is different behaviour from when it is used on a pattern.
+</P>
+<P>
+The #popcopy command is analagous to the <b>pushcopy</b> modifier in that it
+makes current a copy of the topmost stack pattern, leaving the original still
+on the stack.
+</P>
+<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
+<b>pcre2jit</b>, <b>pcre2matching</b>(3), <b>pcre2partial</b>(d),
+<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
+</P>
+<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge, England.
+<br>
+</P>
+<br><a name="SEC21" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 21 July 2018
+<br>
+Copyright &copy; 1997-2018 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>