new upstream release (3.3.0); modify package compatibility for Stretch

diff --git a/src/external/pcre2-10.32/doc/pcre2.txt b/src/external/pcre2-10.32/doc/pcre2.txt
new file mode 100644
index 0000000..30ba2f9
--- /dev/null
+++ b/src/external/pcre2-10.32/doc/pcre2.txt
@@ -0,0 +1,10671 @@
+-----------------------------------------------------------------------------
+This file contains a concatenation of the PCRE2 man pages, converted to plain
+text format for ease of searching with a text editor, or for use on systems
+that do not have a man page processor. The small individual files that give
+synopses of each function in the library have not been included. Neither has
+the pcre2demo program. There are separate text files for the pcre2grep and
+pcre2test commands.
+-----------------------------------------------------------------------------
+
+
+PCRE2(3)                   Library Functions Manual                   PCRE2(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+INTRODUCTION
+
+       PCRE2 is the name used for a revised API for the PCRE library, which is
+       a set of functions, written in C,  that  implement  regular  expression
+       pattern matching using the same syntax and semantics as Perl, with just
+       a few differences. After nearly two decades,  the  limitations  of  the
+       original  API  were  making development increasingly difficult. The new
+       API is more extensible, and it was simplified by abolishing  the  sepa-
+       rate  "study" optimizing function; in PCRE2, patterns are automatically
+       optimized where possible. Since forking from PCRE1, the code  has  been
+       extensively refactored and new features introduced.
+
+       As  well  as Perl-style regular expression patterns, some features that
+       appeared in Python and the original PCRE before they appeared  in  Perl
+       are  available  using the Python syntax. There is also some support for
+       one or two .NET and Oniguruma syntax items, and there are  options  for
+       requesting   some  minor  changes  that  give  better  ECMAScript  (aka
+       JavaScript) compatibility.
+
+       The source code for PCRE2 can be compiled to support 8-bit, 16-bit,  or
+       32-bit  code units, which means that up to three separate libraries may
+       be installed.  The original work to extend PCRE to  16-bit  and  32-bit
+       code  units  was  done  by Zoltan Herczeg and Christian Persch, respec-
+       tively. In all three cases, strings can be interpreted  either  as  one
+       character  per  code  unit, or as UTF-encoded Unicode, with support for
+       Unicode general category properties. Unicode  support  is  optional  at
+       build  time  (but  is  the default). However, processing strings as UTF
+       code units must be enabled explicitly at run time. The version of  Uni-
+       code in use can be discovered by running
+
+         pcre2test -C
+
+       The  three  libraries  contain  identical sets of functions, with names
+       ending in _8,  _16,  or  _32,  respectively  (for  example,  pcre2_com-
+       pile_8()).  However,  by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or
+       32, a program that uses just one code unit width can be  written  using
+       generic names such as pcre2_compile(), and the documentation is written
+       assuming that this is the case.
+
+       In addition to the Perl-compatible matching function, PCRE2 contains an
+       alternative  function that matches the same compiled patterns in a dif-
+       ferent way. In certain circumstances, the alternative function has some
+       advantages.   For  a discussion of the two matching algorithms, see the
+       pcre2matching page.
+
+       Details of exactly which Perl regular expression features are  and  are
+       not  supported  by  PCRE2  are  given  in  separate  documents. See the
+       pcre2pattern and pcre2compat pages. There is a syntax  summary  in  the
+       pcre2syntax page.
+
+       Some  features  of PCRE2 can be included, excluded, or changed when the
+       library is built. The pcre2_config() function makes it possible  for  a
+       client  to  discover  which  features are available. The features them-
+       selves are described in the pcre2build page. Documentation about build-
+       ing  PCRE2 for various operating systems can be found in the README and
+       NON-AUTOTOOLS_BUILD files in the source distribution.
+
+       The  libraries contain a number of undocumented internal functions  and
+       data  tables  that  are  used by more than one of the exported external
+       functions, but which are not intended  for  use  by  external  callers.
+       Their  names  all begin with "_pcre2", which hopefully will not provoke
+       any name clashes. In some environments, it is possible to control which
+       external  symbols  are  exported when a shared library is built, and in
+       these cases the undocumented symbols are not exported.
+
+
+SECURITY CONSIDERATIONS
+
+       If you are using PCRE2 in a non-UTF application that permits  users  to
+       supply  arbitrary  patterns  for  compilation, you should be aware of a
+       feature that allows users to turn on UTF support from within a pattern.
+       For  example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8
+       mode, which interprets patterns and subjects as strings of  UTF-8  code
+       units instead of individual 8-bit characters. This causes both the pat-
+       tern and any data against which it is matched to be checked  for  UTF-8
+       validity.  If the data string is very long, such a check might consume
+       enough resources to noticeably degrade the performance of your  appli-
+       cation.
+
+       One  way  of guarding against this possibility is to use the pcre2_pat-
+       tern_info() function  to  check  the  compiled  pattern's  options  for
+       PCRE2_UTF.  Alternatively,  you can set the PCRE2_NEVER_UTF option when
+       calling pcre2_compile(). This causes a compile time error if  the  pat-
+       tern contains a UTF-setting sequence.
+
+       The  use  of Unicode properties for character types such as \d can also
+       be enabled from within the pattern, by specifying "(*UCP)".  This  fea-
+       ture can be disallowed by setting the PCRE2_NEVER_UCP option.
+
+       If  your  application  is one that supports UTF, be aware that validity
+       checking can take time. If the same data string is to be  matched  many
+       times,  you  can  use  the PCRE2_NO_UTF_CHECK option for the second and
+       subsequent matches to avoid running redundant checks.
+
+       The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
+       to  problems,  because  it  may leave the current matching point in the
+       middle of  a  multi-code-unit  character.  The  PCRE2_NEVER_BACKSLASH_C
+       option can be used by an application to lock out the use of \C, causing
+       a compile-time error if it is encountered. It is also possible to build
+       PCRE2 with the use of \C permanently disabled.
+
+       Another  way  that  performance can be hit is by running a pattern that
+       has a very large search tree against a string that  will  never  match.
+       Nested  unlimited repeats in a pattern are a common example. PCRE2 pro-
+       vides some protection against  this:  see  the  pcre2_set_match_limit()
+       function  in  the  pcre2api  page.  There  is a similar function called
+       pcre2_set_depth_limit() that can be used to restrict the amount of mem-
+       ory that is used.
+
+
+USER DOCUMENTATION
+
+       The  user  documentation for PCRE2 comprises a number of different sec-
+       tions. In the "man" format, each of these is a separate "man page".  In
+       the  HTML  format, each is a separate page, linked from the index page.
+       In the plain  text  format,  the  descriptions  of  the  pcre2grep  and
+       pcre2test programs are in files called pcre2grep.txt and pcre2test.txt,
+       respectively. The remaining sections, except for the pcre2demo  section
+       (which  is a program listing), and the short pages for individual func-
+       tions, are concatenated in pcre2.txt, for ease of searching.  The  sec-
+       tions are as follows:
+
+         pcre2              this document
+         pcre2-config       show PCRE2 installation configuration information
+         pcre2api           details of PCRE2's native C API
+         pcre2build         building PCRE2
+         pcre2callout       details of the callout feature
+         pcre2compat        discussion of Perl compatibility
+         pcre2convert       details of pattern conversion functions
+         pcre2demo          a demonstration C program that uses PCRE2
+         pcre2grep          description of the pcre2grep command (8-bit only)
+         pcre2jit           discussion of just-in-time optimization support
+         pcre2limits        details of size and other limits
+         pcre2matching      discussion of the two matching algorithms
+         pcre2partial       details of the partial matching facility
+         pcre2pattern       syntax and semantics of supported regular
+                              expression patterns
+         pcre2perform       discussion of performance issues
+         pcre2posix         the POSIX-compatible C API for the 8-bit library
+         pcre2sample        discussion of the pcre2demo program
+         pcre2serialize     details of pattern serialization
+         pcre2syntax        quick syntax reference
+         pcre2test          description of the pcre2test command
+         pcre2unicode       discussion of Unicode and UTF support
+
+       In  the  "man"  and HTML formats, there is also a short page for each C
+       library function, listing its arguments and results.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+       Putting an actual email address here is a spam magnet. If you  want  to
+       email  me,  use  my two initials, followed by the two digits 10, at the
+       domain cam.ac.uk.
+
+
+REVISION
+
+       Last updated: 11 July 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2API(3)                Library Functions Manual                PCRE2API(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+       #include <pcre2.h>
+
+       PCRE2  is  a  new API for PCRE, starting at release 10.0. This document
+       contains a description of all its native functions. See the pcre2 docu-
+       ment for an overview of all the PCRE2 documentation.
+
+
+PCRE2 NATIVE API BASIC FUNCTIONS
+
+       pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length,
+         uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset,
+         pcre2_compile_context *ccontext);
+
+       void pcre2_code_free(pcre2_code *code);
+
+       pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize,
+         pcre2_general_context *gcontext);
+
+       pcre2_match_data *pcre2_match_data_create_from_pattern(
+         const pcre2_code *code, pcre2_general_context *gcontext);
+
+       int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext);
+
+       int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext,
+         int *workspace, PCRE2_SIZE wscount);
+
+       void pcre2_match_data_free(pcre2_match_data *match_data);
+
+
+PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS
+
+       PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data);
+
+       uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data);
+
+       PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data);
+
+       PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data);
+
+
+PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS
+
+       pcre2_general_context *pcre2_general_context_create(
+         void *(*private_malloc)(PCRE2_SIZE, void *),
+         void (*private_free)(void *, void *), void *memory_data);
+
+       pcre2_general_context *pcre2_general_context_copy(
+         pcre2_general_context *gcontext);
+
+       void pcre2_general_context_free(pcre2_general_context *gcontext);
+
+
+PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS
+
+       pcre2_compile_context *pcre2_compile_context_create(
+         pcre2_general_context *gcontext);
+
+       pcre2_compile_context *pcre2_compile_context_copy(
+         pcre2_compile_context *ccontext);
+
+       void pcre2_compile_context_free(pcre2_compile_context *ccontext);
+
+       int pcre2_set_bsr(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       int pcre2_set_character_tables(pcre2_compile_context *ccontext,
+         const unsigned char *tables);
+
+       int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext,
+         uint32_t extra_options);
+
+       int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
+         PCRE2_SIZE value);
+
+       int pcre2_set_newline(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext,
+         int (*guard_function)(uint32_t, void *), void *user_data);
+
+
+PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
+
+       pcre2_match_context *pcre2_match_context_create(
+         pcre2_general_context *gcontext);
+
+       pcre2_match_context *pcre2_match_context_copy(
+         pcre2_match_context *mcontext);
+
+       void pcre2_match_context_free(pcre2_match_context *mcontext);
+
+       int pcre2_set_callout(pcre2_match_context *mcontext,
+         int (*callout_function)(pcre2_callout_block *, void *),
+         void *callout_data);
+
+       int pcre2_set_offset_limit(pcre2_match_context *mcontext,
+         PCRE2_SIZE value);
+
+       int pcre2_set_heap_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       int pcre2_set_match_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       int pcre2_set_depth_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+
+PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
+
+       int pcre2_substring_copy_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen);
+
+       int pcre2_substring_copy_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_UCHAR *buffer,
+         PCRE2_SIZE *bufflen);
+
+       void pcre2_substring_free(PCRE2_UCHAR *buffer);
+
+       int pcre2_substring_get_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen);
+
+       int pcre2_substring_get_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_UCHAR **bufferptr,
+         PCRE2_SIZE *bufflen);
+
+       int pcre2_substring_length_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_SIZE *length);
+
+       int pcre2_substring_length_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_SIZE *length);
+
+       int pcre2_substring_nametable_scan(const pcre2_code *code,
+         PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
+
+       int pcre2_substring_number_from_name(const pcre2_code *code,
+         PCRE2_SPTR name);
+
+       void pcre2_substring_list_free(PCRE2_SPTR *list);
+
+       int pcre2_substring_list_get(pcre2_match_data *match_data,
+         PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr);
+
+
+PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION
+
+       int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext, PCRE2_SPTR replacement,
+         PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
+         PCRE2_SIZE *outlengthptr);
+
+
+PCRE2 NATIVE API JIT FUNCTIONS
+
+       int pcre2_jit_compile(pcre2_code *code, uint32_t options);
+
+       int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext);
+
+       void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
+
+       pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize,
+         PCRE2_SIZE maxsize, pcre2_general_context *gcontext);
+
+       void pcre2_jit_stack_assign(pcre2_match_context *mcontext,
+         pcre2_jit_callback callback_function, void *callback_data);
+
+       void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack);
+
+
+PCRE2 NATIVE API SERIALIZATION FUNCTIONS
+
+       int32_t pcre2_serialize_decode(pcre2_code **codes,
+         int32_t number_of_codes, const uint8_t *bytes,
+         pcre2_general_context *gcontext);
+
+       int32_t pcre2_serialize_encode(const pcre2_code **codes,
+         int32_t number_of_codes, uint8_t **serialized_bytes,
+         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);
+
+       void pcre2_serialize_free(uint8_t *bytes);
+
+       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);
+
+
+PCRE2 NATIVE API AUXILIARY FUNCTIONS
+
+       pcre2_code *pcre2_code_copy(const pcre2_code *code);
+
+       pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code);
+
+       int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
+         PCRE2_SIZE bufflen);
+
+       const unsigned char *pcre2_maketables(pcre2_general_context *gcontext);
+
+       int pcre2_pattern_info(const pcre2_code *code, uint32_t what, void *where);
+
+       int pcre2_callout_enumerate(const pcre2_code *code,
+         int (*callback)(pcre2_callout_enumerate_block *, void *),
+         void *user_data);
+
+       int pcre2_config(uint32_t what, void *where);
+
+
+PCRE2 NATIVE API OBSOLETE FUNCTIONS
+
+       int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       int pcre2_set_recursion_memory_management(
+         pcre2_match_context *mcontext,
+         void *(*private_malloc)(PCRE2_SIZE, void *),
+         void (*private_free)(void *, void *), void *memory_data);
+
+       These  functions became obsolete at release 10.30 and are retained only
+       for backward compatibility. They should not be used in  new  code.  The
+       first  is  replaced by pcre2_set_depth_limit(); the second is no longer
+       needed and has no effect (it always returns zero).
+
+
+PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS
+
+       pcre2_convert_context *pcre2_convert_context_create(
+         pcre2_general_context *gcontext);
+
+       pcre2_convert_context *pcre2_convert_context_copy(
+         pcre2_convert_context *cvcontext);
+
+       void pcre2_convert_context_free(pcre2_convert_context *cvcontext);
+
+       int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
+         uint32_t escape_char);
+
+       int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
+         uint32_t separator_char);
+
+       int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
+         uint32_t options, PCRE2_UCHAR **buffer,
+         PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);
+
+       void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);
+
+       These functions provide a way of  converting  non-PCRE2  patterns  into
+       patterns  that  can  be  processed by pcre2_compile(). This facility is
+       experimental and may be changed in future releases. At present, "globs"
+       and  POSIX  basic  and  extended patterns can be converted. Details are
+       given in the pcre2convert documentation.
+
+
+PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
+
+       There are three PCRE2 libraries, supporting 8-bit, 16-bit,  and  32-bit
+       code  units,  respectively.  However,  there  is  just one header file,
+       pcre2.h.  This contains the function prototypes and  other  definitions
+       for all three libraries. One, two, or all three can be installed simul-
+       taneously. On Unix-like systems the libraries  are  called  libpcre2-8,
+       libpcre2-16, and libpcre2-32, and they can also co-exist with the orig-
+       inal PCRE libraries.
+
+       Character strings are passed to and from a PCRE2 library as a  sequence
+       of  unsigned  integers  in  code  units of the appropriate width. Every
+       PCRE2 function comes in three different forms, one  for  each  library,
+       for example:
+
+         pcre2_compile_8()
+         pcre2_compile_16()
+         pcre2_compile_32()
+
+       There are also three different sets of data types:
+
+         PCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32
+         PCRE2_SPTR8,  PCRE2_SPTR16,  PCRE2_SPTR32
+
+       The  UCHAR  types define unsigned code units of the appropriate widths.
+       For example, PCRE2_UCHAR16 is usually defined as `uint16_t'.  The  SPTR
+       types  are  constant  pointers  to the equivalent UCHAR types, that is,
+       they are pointers to vectors of unsigned code units.
+
+       Many applications use only one code unit width. For their  convenience,
+       macros are defined whose names are the generic forms such as pcre2_com-
+       pile() and  PCRE2_SPTR.  These  macros  use  the  value  of  the  macro
+       PCRE2_CODE_UNIT_WIDTH  to generate the appropriate width-specific func-
+       tion and macro names.  PCRE2_CODE_UNIT_WIDTH is not defined by default.
+       An  application  must  define  it  to  be 8, 16, or 32 before including
+       pcre2.h in order to make use of the generic names.
+
+       Applications that use more than one code unit width can be linked  with
+       more  than  one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to
+       be 0 before including pcre2.h, and then use the  real  function  names.
+       Any  code  that  is to be included in an environment where the value of
+       PCRE2_CODE_UNIT_WIDTH is unknown should  also  use  the  real  function
+       names. (Unfortunately, it is not possible in C code to save and restore
+       the value of a macro.)
+
+       If PCRE2_CODE_UNIT_WIDTH is not defined  before  including  pcre2.h,  a
+       compiler error occurs.
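
The generic-name mechanism described above can be pictured with a small
stand-alone sketch. It does not include pcre2.h; every identifier that begins
with SKETCH_ or sketch_ is a hypothetical stand-in chosen for illustration.
The sketch only shows how a width macro can be pasted onto a name so that one
generic spelling resolves to a width-specific function. The real header may
differ in detail.

```c
#include <stddef.h>

/* Hypothetical stand-in for PCRE2_CODE_UNIT_WIDTH. A real program would
   define that macro before including pcre2.h instead. */
#define SKETCH_CODE_UNIT_WIDTH 16

/* Two levels of macro are needed so that SKETCH_CODE_UNIT_WIDTH is
   expanded to its value before it is pasted onto the name. */
#define SKETCH_JOIN(a, b)  a ## b
#define SKETCH_GLUE(a, b)  SKETCH_JOIN(a, b)
#define SKETCH_SUFFIX(a)   SKETCH_GLUE(a, SKETCH_CODE_UNIT_WIDTH)

/* Width-specific "real" functions, one per imaginary library. Each one
   reports the size in bytes of its code unit. */
size_t sketch_unit_bytes_8(void)  { return 1; }
size_t sketch_unit_bytes_16(void) { return 2; }
size_t sketch_unit_bytes_32(void) { return 4; }

/* Generic name. With the width set to 16 above, a call written as
   sketch_unit_bytes() is compiled as sketch_unit_bytes_16(). */
#define sketch_unit_bytes SKETCH_SUFFIX(sketch_unit_bytes_)
```

Because the selection happens in the preprocessor, a translation unit that
needs more than one width cannot rely on the generic spelling and has to call
the suffixed names directly, which matches the advice in the text.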
+
+       When  using  multiple  libraries  in an application, you must take care
+       when processing any particular pattern to use  only  functions  from  a
+       single  library.   For example, if you want to run a match using a pat-
+       tern that was compiled with pcre2_compile_16(), you  must  do  so  with
+       pcre2_match_16(), not pcre2_match_8() or pcre2_match_32().
+
+       In  the  function summaries above, and in the rest of this document and
+       other PCRE2 documents, functions and data  types  are  described  using
+       their generic names, without the _8, _16, or _32 suffix.
+
+
+PCRE2 API OVERVIEW
+
+       PCRE2  has  its  own  native  API, which is described in this document.
+       There are also some wrapper functions for the 8-bit library that corre-
+       spond  to the POSIX regular expression API, but they do not give access
+       to all the functionality of PCRE2. They are described in the pcre2posix
+       documentation. Both these APIs define a set of C function calls.
+
+       The  native  API  C data types, function prototypes, option values, and
+       error codes are defined in the header file pcre2.h, which also contains
+       definitions of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release
+       numbers for the library. Applications can use these to include  support
+       for different releases of PCRE2.
+
+       In a Windows environment, if you want to statically link an application
+       program against a non-dll PCRE2 library, you must  define  PCRE2_STATIC
+       before including pcre2.h.
+
+       The  functions pcre2_compile() and pcre2_match() are used for compiling
+       and matching regular expressions in a Perl-compatible manner. A  sample
+       program that demonstrates the simplest way of using them is provided in
+       the file called pcre2demo.c in the PCRE2 source distribution. A listing
+       of  this  program  is  given  in  the  pcre2demo documentation, and the
+       pcre2sample documentation describes how to compile and run it.
+
+       The compiling and matching functions recognize various options that are
+       passed as bits in an options argument. There are also some more compli-
+       cated  parameters  such  as  custom  memory  management  functions  and
+       resource  limits  that  are passed in "contexts" (which are just memory
+       blocks, described below). Simple applications do not need to  make  use
+       of contexts.
+
+       Just-in-time  (JIT)  compiler  support  is an optional feature of PCRE2
+       that can be built in  appropriate  hardware  environments.  It  greatly
+       speeds  up  the  matching  performance  of  many patterns. Programs can
+       request that it be used if  available  by  calling  pcre2_jit_compile()
+       after a pattern has been successfully compiled by pcre2_compile(). This
+       does nothing if JIT support is not available.
+
+       More complicated programs might need to  make  use  of  the  specialist
+       functions    pcre2_jit_stack_create(),    pcre2_jit_stack_free(),   and
+       pcre2_jit_stack_assign() in order to  control  the  JIT  code's  memory
+       usage.
+
+       JIT matching is automatically used by pcre2_match() if it is available,
+       unless the PCRE2_NO_JIT option is set. There is also a direct interface
+       for  JIT  matching,  which gives improved performance at the expense of
+       less sanity checking. The JIT-specific functions are discussed  in  the
+       pcre2jit documentation.
+
+       A  second  matching function, pcre2_dfa_match(), which is not Perl-com-
+       patible, is also provided. This uses  a  different  algorithm  for  the
+       matching.  The  alternative  algorithm finds all possible matches (at a
+       given point in the subject), and scans the subject  just  once  (unless
+       there  are  lookaround  assertions).  However,  this algorithm does not
+       return captured substrings. A description of  the  two  matching  algo-
+       rithms   and  their  advantages  and  disadvantages  is  given  in  the
+       pcre2matching   documentation.   There   is   no   JIT   support    for
+       pcre2_dfa_match().
+
+       In  addition  to  the  main compiling and matching functions, there are
+       convenience functions for extracting captured substrings from a subject
+       string that has been matched by pcre2_match(). They are:
+
+         pcre2_substring_copy_byname()
+         pcre2_substring_copy_bynumber()
+         pcre2_substring_get_byname()
+         pcre2_substring_get_bynumber()
+         pcre2_substring_list_get()
+         pcre2_substring_length_byname()
+         pcre2_substring_length_bynumber()
+         pcre2_substring_nametable_scan()
+         pcre2_substring_number_from_name()
+
+       pcre2_substring_free()  and  pcre2_substring_list_free()  are also pro-
+       vided, to free memory used for extracted strings. If  either  of  these
+       functions  is called with a NULL argument, the function returns immedi-
+       ately without doing anything.
+
+       The function pcre2_substitute() can be called to match  a  pattern  and
+       return  a  copy of the subject string with substitutions for parts that
+       were matched.
+
+       Functions whose names begin with pcre2_serialize_ are used  for  saving
+       compiled patterns on disc or elsewhere, and reloading them later.
+
+       Finally,  there  are functions for finding out information about a com-
+       piled pattern (pcre2_pattern_info()) and about the  configuration  with
+       which PCRE2 was built (pcre2_config()).
+
+       Functions  with  names  ending with _free() are used for freeing memory
+       blocks of various sorts. In all cases, if one  of  these  functions  is
+       called with a NULL argument, it does nothing.
+
+
+STRING LENGTHS AND OFFSETS
+
+       The  PCRE2  API  uses  string  lengths and offsets into strings of code
+       units in several places. These values are always  of  type  PCRE2_SIZE,
+       which  is an unsigned integer type, currently always defined as size_t.
+       The largest  value  that  can  be  stored  in  such  a  type  (that  is
+       ~(PCRE2_SIZE)0)  is reserved as a special indicator for zero-terminated
+       strings and unset offsets.  Therefore, the longest string that  can  be
+       handled is one less than this maximum.
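
A minimal sketch of the reserved-value convention follows. It uses stand-in
names (sketch_size, SKETCH_ZERO_TERMINATED, sketch_resolve_length) that are
assumptions made for this illustration; pcre2.h supplies its own definitions,
and the library's internal handling is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PCRE2_SIZE and for the reserved maximum
   value described in the text. */
typedef size_t sketch_size;
#define SKETCH_ZERO_TERMINATED (~(sketch_size)0)

/* Resolve a caller-supplied length. The reserved value means "measure the
   string up to its terminating zero"; any other value is used as given. */
sketch_size sketch_resolve_length(const char *subject, sketch_size length)
{
    if (length == SKETCH_ZERO_TERMINATED)
        return strlen(subject);
    return length;
}
```

Since the all-ones value is taken by the indicator, it can never be a genuine
length, which is why the longest usable string is one code unit shorter than
the type's maximum.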
+
+
+NEWLINES
+
+       PCRE2 supports five different conventions for indicating line breaks in
+       strings: a single CR (carriage return) character, a  single  LF  (line-
+       feed) character, the two-character sequence CRLF, any of the three pre-
+       ceding, or any Unicode newline sequence. The Unicode newline  sequences
+       are  the  three just mentioned, plus the single characters VT (vertical
+       tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
+       separator, U+2028), and PS (paragraph separator, U+2029).
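
As an illustration of the fifth convention only, the following sketch
classifies a position in a buffer of 32-bit code points. The function name
and signature are hypothetical and are not part of the PCRE2 API; the set of
characters is taken directly from the list above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2. Returns how many code points
   (0, 1, or 2) form a newline at position i when any Unicode newline
   sequence is accepted. */
size_t sketch_unicode_newline_length(const uint32_t *text, size_t length,
                                     size_t i)
{
    if (i >= length)
        return 0;
    switch (text[i]) {
    case 0x000D:                /* CR, possibly the first half of CRLF */
        return (i + 1 < length && text[i + 1] == 0x000A) ? 2 : 1;
    case 0x000A:                /* LF  (linefeed) */
    case 0x000B:                /* VT  (vertical tab) */
    case 0x000C:                /* FF  (form feed) */
    case 0x0085:                /* NEL (next line) */
    case 0x2028:                /* LS  (line separator) */
    case 0x2029:                /* PS  (paragraph separator) */
        return 1;
    default:
        return 0;               /* not a newline */
    }
}
```

The CR case looks one code point ahead so that CRLF is consumed as a single
two-character line break rather than as two separate ones.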
+
+       Each  of  the first three conventions is used by at least one operating
+       system as its standard newline sequence. When PCRE2 is built, a default
+       can be specified.  If it is not, the default is set to LF, which is the
+       Unix standard. However, the newline convention can  be  changed  by  an
+       application  when  calling  pcre2_compile(),  or it can be specified by
+       special text at the start of the pattern  itself;  this  overrides  any
+       other  settings.  See  the pcre2pattern page for details of the special
+       character sequences.
+
+       In the PCRE2 documentation the word "newline"  is  used  to  mean  "the
+       character or pair of characters that indicate a line break". The choice
+       of newline convention affects the handling of the dot, circumflex,  and
+       dollar metacharacters, the handling of #-comments in /x mode, and, when
+       CRLF is a recognized line ending sequence, the match position  advance-
+       ment for a non-anchored pattern. There is more detail about this in the
+       section on pcre2_match() options below.
+
+       The choice of newline convention does not affect the interpretation  of
+       the \n or \r escape sequences, nor does it affect what \R matches; this
+       has its own separate convention.
+
+
+MULTITHREADING
+
+       In a multithreaded application it is important to keep  thread-specific
+       data  separate  from data that can be shared between threads. The PCRE2
+       library code itself is thread-safe: it contains  no  static  or  global
+       variables.  The  API  is  designed to be fairly simple for non-threaded
+       applications while at the same time ensuring that multithreaded  appli-
+       cations can use it.
+
+       There are several different blocks of data that are used to pass infor-
+       mation between the application and the PCRE2 libraries.
+
+   The compiled pattern
+
+       A pointer to the compiled form of a pattern is  returned  to  the  user
+       when pcre2_compile() is successful. The data in the compiled pattern is
+       fixed, and does not change when the pattern is matched.  Therefore,  it
+       is  thread-safe, that is, the same compiled pattern can be used by more
+       than one thread simultaneously. For example, an application can compile
+       all its patterns at the start, before forking off multiple threads that
+       use them. However, if the just-in-time (JIT)  optimization  feature  is
+       being  used,  it needs separate memory stack areas for each thread. See
+       the pcre2jit documentation for more details.
+
+       In a more complicated situation, where patterns are compiled only  when
+       they  are  first needed, but are still shared between threads, pointers
+       to compiled patterns must be protected  from  simultaneous  writing  by
+       multiple threads, at least until a pattern has been compiled. The logic
+       can be something like this:
+
+         Get a read-only (shared) lock (mutex) for pointer
+         if (pointer == NULL)
+           {
+           Get a write (unique) lock for pointer
+           if (pointer == NULL) pointer = pcre2_compile(...
+           }
+         Release the lock
+         Use pointer in pcre2_match()
+
+       Of course, testing for compilation errors should also  be  included  in
+       the code.
+
+       If JIT is being used, but the JIT compilation is not being done immedi-
+       ately (perhaps waiting to see if the pattern is used often enough),
+       similar logic is required. JIT compilation updates a pointer within the
+       compiled code block, so a thread must gain unique write access  to  the
+       pointer     before    calling    pcre2_jit_compile().    Alternatively,
+       pcre2_code_copy()  or  pcre2_code_copy_with_tables()  can  be  used  to
+       obtain  a private copy of the compiled code before calling the JIT com-
+       piler.
+
+   Context blocks
+
+       The next main section below introduces the idea of "contexts" in  which
+       PCRE2 functions are called. A context is nothing more than a collection
+       of parameters that control the way PCRE2 operates. Grouping a number of
+       parameters together in a context is a convenient way of passing them to
+       a PCRE2 function without using lots of arguments. The  parameters  that
+       are  stored  in  contexts  are in some sense "advanced features" of the
+       API. Many straightforward applications will not need to use contexts.
+
+       In a multithreaded application, if the parameters in a context are val-
+       ues  that  are  never  changed, the same context can be used by all the
+       threads. However, if any thread needs to change any value in a context,
+       it must make its own thread-specific copy.
+
+   Match blocks
+
+       The  matching  functions need a block of memory for storing the results
+       of a match. This includes details of what was matched, as well as addi-
+       tional  information  such as the name of a (*MARK) setting. Each thread
+       must provide its own copy of this memory.
+
+
+PCRE2 CONTEXTS
+
+       Some PCRE2 functions have a lot of parameters, many of which  are  used
+       only  by  specialist  applications,  for example, those that use custom
+       memory management or non-standard character tables.  To  keep  function
+       argument  lists  at a reasonable size, and at the same time to keep the
+       API extensible, "uncommon" parameters are passed to  certain  functions
+       in  a  context instead of directly. A context is just a block of memory
+       that holds the parameter values.  Applications  that  do  not  need  to
+       adjust  any  of  the  context  parameters  can pass NULL when a context
+       pointer is required.
+
+       There are three different types of context: a general context  that  is
+       relevant  for  several  PCRE2 operations, a compile-time context, and a
+       match-time context.
+
+   The general context
+
+       At present, this context just  contains  pointers  to  (and  data  for)
+       external  memory  management  functions  that  are  called from several
+       places in the PCRE2 library. The context is named `general' rather than
+       specifically  `memory'  because in future other fields may be added. If
+       you do not want to supply your own custom memory management  functions,
+       you  do not need to bother with a general context. A general context is
+       created by:
+
+       pcre2_general_context *pcre2_general_context_create(
+         void *(*private_malloc)(PCRE2_SIZE, void *),
+         void (*private_free)(void *, void *), void *memory_data);
+
+       The two function pointers specify custom memory  management  functions,
+       whose prototypes are:
+
+         void *private_malloc(PCRE2_SIZE, void *);
+         void  private_free(void *, void *);
+
+       Whenever code in PCRE2 calls these functions, the final argument is the
+       value of memory_data. Either of the first two arguments of the creation
+       function  may be NULL, in which case the system memory management func-
+       tions malloc() and free() are used. (This is not currently  useful,  as
+       there  are  no  other  fields in a general context, but in future there
+       might be.)  The private_malloc() function  is  used  (if  supplied)  to
+       obtain  memory  for storing the context, and all three values are saved
+       as part of the context.
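+
+       As an illustration of the two prototypes above, here is a minimal
+       sketch of a pair of memory functions that count allocations. The names
+       alloc_stats, counting_malloc() and counting_free() are hypothetical and
+       are not part of PCRE2. The sketch writes size_t where the prototype has
+       PCRE2_SIZE so that it compiles without the PCRE2 header.
+

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping block, passed to PCRE2 as memory_data. */
typedef struct alloc_stats {
  size_t allocations;   /* successful calls to counting_malloc() */
  size_t frees;         /* calls to counting_free() with a non-NULL block */
} alloc_stats;

/* Has the shape of private_malloc: the final argument is memory_data. */
static void *counting_malloc(size_t size, void *memory_data)
{
  alloc_stats *stats = memory_data;
  void *block = malloc(size);
  if (block != NULL && stats != NULL) stats->allocations++;
  return block;
}

/* Has the shape of private_free: the final argument is memory_data. */
static void counting_free(void *block, void *memory_data)
{
  alloc_stats *stats = memory_data;
  if (block != NULL && stats != NULL) stats->frees++;
  free(block);
}
```

+
+       Such a pair would be passed as the first two arguments of
+       pcre2_general_context_create(), with the address of an alloc_stats
+       variable that outlives the context as memory_data.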
+
+       Whenever PCRE2 creates a data block of any kind, the block  contains  a
+       pointer  to the free() function that matches the malloc() function that
+       was used. When the time comes to  free  the  block,  this  function  is
+       called.
+
+       A general context can be copied by calling:
+
+       pcre2_general_context *pcre2_general_context_copy(
+         pcre2_general_context *gcontext);
+
+       The memory used for a general context should be freed by calling:
+
+       void pcre2_general_context_free(pcre2_general_context *gcontext);
+
+       If  this  function  is  passed  a NULL argument, it returns immediately
+       without doing anything.
+
+   The compile context
+
+       A compile context is required if you want to provide an external  func-
+       tion  for  stack  checking  during compilation or to change the default
+       values of any of the following compile-time parameters:
+
+         What \R matches (Unicode newlines or CR, LF, CRLF only)
+         PCRE2's character tables
+         The newline character sequence
+         The compile time nested parentheses limit
+         The maximum length of the pattern string
+         The extra options bits (none set by default)
+
+       A compile context is also required if you are using custom memory  man-
+       agement.   If  none of these apply, just pass NULL as the context argu-
+       ment of pcre2_compile().
+
+       A compile context is created, copied, and freed by the following  func-
+       tions:
+
+       pcre2_compile_context *pcre2_compile_context_create(
+         pcre2_general_context *gcontext);
+
+       pcre2_compile_context *pcre2_compile_context_copy(
+         pcre2_compile_context *ccontext);
+
+       void pcre2_compile_context_free(pcre2_compile_context *ccontext);
+
+       A  compile  context  is created with default values for its parameters.
+       These can be changed by calling the following functions, which return 0
+       on success, or PCRE2_ERROR_BADDATA if invalid data is detected.
+
+       int pcre2_set_bsr(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       The  value  must  be PCRE2_BSR_ANYCRLF, to specify that \R matches only
+       CR, LF, or CRLF, or PCRE2_BSR_UNICODE, to specify that \R  matches  any
+       Unicode line ending sequence. The value is used by the JIT compiler and
+       by  the  two  interpreted   matching   functions,   pcre2_match()   and
+       pcre2_dfa_match().
+
+       int pcre2_set_character_tables(pcre2_compile_context *ccontext,
+         const unsigned char *tables);
+
+       The  value  must  be  the result of a call to pcre2_maketables(), whose
+       only argument is a general context. This function builds a set of char-
+       acter tables in the current locale.
+
+       int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext,
+         uint32_t extra_options);
+
+       As  PCRE2  has developed, almost all the 32 option bits that are avail-
+       able in the options argument of pcre2_compile() have been used  up.  To
+       avoid  running  out, the compile context contains a set of extra option
+       bits which are used for some newer, assumed rarer, options. This
+       function sets those bits. It always sets all the bits (either on or
+       off), replacing the previous value as a whole; it cannot be used to
+       change individual bits while leaving the others as they were. The
+       available options are defined in the section entitled "Extra compile
+       options" below.
+
+       int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
+         PCRE2_SIZE value);
+
+       This  sets a maximum length, in code units, for any pattern string that
+       is compiled with this context. If the pattern is longer,  an  error  is
+       generated.   This facility is provided so that applications that accept
+       patterns from external sources can limit their size. The default is the
+       largest  number  that  a  PCRE2_SIZE variable can hold, which is effec-
+       tively unlimited.
+
+       int pcre2_set_newline(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       This specifies which characters or character sequences are to be recog-
+       nized  as newlines. The value must be one of PCRE2_NEWLINE_CR (carriage
+       return only), PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the
+       two-character  sequence  CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
+       of the above), PCRE2_NEWLINE_ANY (any  Unicode  newline  sequence),  or
+       PCRE2_NEWLINE_NUL (the NUL character, that is a binary zero).
+
+       A pattern can override the value set in the compile context by starting
+       with a sequence such as (*CRLF). See the pcre2pattern page for details.
+
+       When   a   pattern   is   compiled   with   the    PCRE2_EXTENDED    or
+       PCRE2_EXTENDED_MORE option, the newline convention affects the recogni-
+       tion of the end of internal comments starting  with  #.  The  value  is
+       saved  with the compiled pattern for subsequent use by the JIT compiler
+       and by  the  two  interpreted  matching  functions,  pcre2_match()  and
+       pcre2_dfa_match().
+
+       int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
+         uint32_t value);
+
+       This parameter adjusts the limit, set when PCRE2 is built (default 250),
+       on the depth of parenthesis nesting in  a  pattern.  This  limit  stops
+       rogue  patterns using up too much system stack when being compiled. The
+       limit applies to parentheses of all kinds, not just capturing parenthe-
+       ses.
+
+       int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext,
+         int (*guard_function)(uint32_t, void *), void *user_data);
+
+       There  is at least one application that runs PCRE2 in threads with very
+       limited system stack, where running out of stack is to  be  avoided  at
+       all  costs. The parenthesis limit above cannot take account of how much
+       stack is actually available during compilation. For  a  finer  control,
+       you  can  supply  a  function  that  is called whenever pcre2_compile()
+       starts to compile a parenthesized part of a pattern. This function  can
+       check  the  actual  stack  size  (or anything else that it wants to, of
+       course).
+
+       The first argument to the guard function gives the current depth of
+       nesting, and the second is user data that is set up by the last argu-
+       ment of pcre2_set_compile_recursion_guard(). The guard function should
+       return zero if all is well, or non-zero to force an error.
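+
+       A function of this kind might be sketched as follows. This is only an
+       illustration: stack_budget and stack_guard() are hypothetical names,
+       and estimating stack use from the distance between the addresses of
+       two local variables is an assumption that holds on common platforms,
+       not something that PCRE2 specifies.
+

```c
#include <stdint.h>

/* Hypothetical user data: where the stack was when compilation began,
   and how many bytes the application is willing to let it grow. */
typedef struct stack_budget {
  uintptr_t base;       /* address of a local in the calling function */
  uintptr_t max_bytes;  /* permitted distance from base */
} stack_budget;

/* Has the shape required by pcre2_set_compile_recursion_guard():
   returns zero if all is well, non-zero to force an error. */
static int stack_guard(uint32_t depth, void *user_data)
{
  const stack_budget *budget = user_data;
  char marker;          /* a local whose address approximates the stack top */
  uintptr_t here = (uintptr_t)&marker;
  /* Take the absolute distance so the stack growth direction is irrelevant. */
  uintptr_t used = (here > budget->base) ? here - budget->base
                                         : budget->base - here;
  (void)depth;          /* the nesting depth is not needed for this check */
  return used > budget->max_bytes;
}
```

+
+       The function that calls pcre2_compile() would set base from the
+       address of one of its own local variables and pass the address of the
+       stack_budget as the user_data argument.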
+
+   The match context
+
+       A match context is required if you want to:
+
+         Set up a callout function
+         Set an offset limit for matching an unanchored pattern
+         Change the limit on the amount of heap used when matching
+         Change the backtracking match limit
+         Change the backtracking depth limit
+         Set custom memory management specifically for the match
+
+       If  none  of  these  apply,  just  pass NULL as the context argument of
+       pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
+
+       A match context is created, copied, and freed by  the  following  func-
+       tions:
+
+       pcre2_match_context *pcre2_match_context_create(
+         pcre2_general_context *gcontext);
+
+       pcre2_match_context *pcre2_match_context_copy(
+         pcre2_match_context *mcontext);
+
+       void pcre2_match_context_free(pcre2_match_context *mcontext);
+
+       A  match  context  is  created  with default values for its parameters.
+       These can be changed by calling the following functions, which return 0
+       on success, or PCRE2_ERROR_BADDATA if invalid data is detected.
+
+       int pcre2_set_callout(pcre2_match_context *mcontext,
+         int (*callout_function)(pcre2_callout_block *, void *),
+         void *callout_data);
+
+       This sets up a "callout" function for PCRE2 to call at specified points
+       during a matching operation. Details are given in the pcre2callout doc-
+       umentation.
+
+       int pcre2_set_offset_limit(pcre2_match_context *mcontext,
+         PCRE2_SIZE value);
+
+       The  offset_limit  parameter  limits  how  far an unanchored search can
+       advance in the subject string. The default value  is  PCRE2_UNSET.  The
+       pcre2_match()      and      pcre2_dfa_match()      functions     return
+       PCRE2_ERROR_NOMATCH if a match with a starting point before or  at  the
+       given  offset  is  not  found. The pcre2_substitute() function makes no
+       more substitutions.
+
+       For example, if the pattern /abc/ is matched against "123abc" with an
+       offset limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match
+       can never be found if the startoffset argument of pcre2_match(),
+       pcre2_dfa_match(), or pcre2_substitute() is greater than the offset
+       limit set in the match context.
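+
+       The rule can be modelled without PCRE2 for the simple case of a
+       literal pattern. The sketch below is not how PCRE2 searches;
+       find_literal() is a hypothetical helper that uses strstr() purely to
+       show how the start offset and the offset limit interact, with -1
+       standing in for PCRE2_ERROR_NOMATCH.
+

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an unanchored search for a literal string.
   Returns the offset of the first match that starts at or after
   startoffset and at or before offset_limit, or -1 if there is none. */
static long find_literal(const char *subject, const char *literal,
                         size_t startoffset, size_t offset_limit)
{
  size_t length = strlen(subject);
  const char *hit;

  /* A start beyond the limit (or beyond the subject) can never match. */
  if (startoffset > offset_limit || startoffset > length) return -1;

  hit = strstr(subject + startoffset, literal);
  if (hit == NULL) return -1;

  /* The match must begin before or at the offset limit. */
  if ((size_t)(hit - subject) > offset_limit) return -1;
  return (long)(hit - subject);
}
```

+
+       With the subject "123abc" and the literal "abc", a limit of 2 gives -1
+       and a limit of 3 gives 3. A start offset of 4 with a limit of 3 gives
+       -1 without searching at all.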
+
+       When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
+       option when calling pcre2_compile() so that when JIT is in use, differ-
+       ent code can be compiled. If a match is started with a non-default
+       offset limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is gener-
+       ated.
+
+       The offset limit facility can be used to track progress when  searching
+       large  subject  strings or to limit the extent of global substitutions.
+       See also the PCRE2_FIRSTLINE option, which requires a  match  to  start
+       before  or  at  the first newline that follows the start of matching in
+       the subject. If this is set with an offset limit, a match must occur in
+       the first line and also within the offset limit. In other words, which-
+       ever limit comes first is used.
+
+       int pcre2_set_heap_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       The heap_limit parameter specifies, in units of kibibytes (1024 bytes),
+       the  maximum  amount  of heap memory that pcre2_match() may use to hold
+       backtracking information when running an interpretive match. This limit
+       also applies to pcre2_dfa_match(), which may use the heap when process-
+       ing patterns with a lot of nested pattern recursion or  lookarounds  or
+       atomic groups. This limit does not apply to matching with the JIT opti-
+       mization, which has  its  own  memory  control  arrangements  (see  the
+       pcre2jit  documentation for more details). If the limit is reached, the
+       negative error code  PCRE2_ERROR_HEAPLIMIT  is  returned.  The  default
+       limit  can be set when PCRE2 is built; if it is not, the default is set
+       very large and is essentially "unlimited".
+
+       A value for the heap limit may also be supplied by an item at the start
+       of a pattern of the form
+
+         (*LIMIT_HEAP=ddd)
+
+       where  ddd  is  a  decimal  number.  However, such a setting is ignored
+       unless ddd is less than the limit set by the  caller  of  pcre2_match()
+       or, if no such limit is set, less than the default.
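+
+       In other words, a pattern can lower a limit but cannot raise it. The
+       same rule is stated for (*LIMIT_MATCH=ddd) and (*LIMIT_DEPTH=ddd). As a
+       sketch, with effective_limit() being a hypothetical helper rather than
+       a PCRE2 function, and api_limit standing for the caller's limit or, if
+       none was set, the default:
+

```c
#include <stdint.h>

/* Hypothetical helper: the limit that applies when the caller (or the
   build-time default) allows api_limit and the pattern starts with an
   item such as (*LIMIT_HEAP=ddd), given here as pattern_limit. */
static uint32_t effective_limit(uint32_t api_limit, uint32_t pattern_limit)
{
  /* The pattern's value is used only if it is the smaller of the two. */
  return (pattern_limit < api_limit) ? pattern_limit : api_limit;
}
```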
+
+       The  pcre2_match() function starts out using a 20KiB vector on the sys-
+       tem stack for recording backtracking points. The more nested backtrack-
+       ing  points  there  are (that is, the deeper the search tree), the more
+       memory is needed.  Heap memory is used only if the  initial  vector  is
+       too small. If the heap limit is set to a value less than 21 (in partic-
+       ular, zero) no heap memory will be used. In this  case,  only  patterns
+       that  do not have a lot of nested backtracking can be successfully pro-
+       cessed.
+
+       Similarly, for pcre2_dfa_match(), a vector on the system stack is  used
+       when  processing pattern recursions, lookarounds, or atomic groups, and
+       only if this is not big enough is heap memory used. In this case,  too,
+       setting a value of zero disables the use of the heap.
+
+       int pcre2_set_match_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       The  match_limit  parameter  provides  a means of preventing PCRE2 from
+       using up too many computing resources when processing patterns that are
+       not going to match, but which have a very large number of possibilities
+       in their search trees. The classic  example  is  a  pattern  that  uses
+       nested unlimited repeats.
+
+       There  is an internal counter in pcre2_match() that is incremented each
+       time round its main matching loop. If  this  value  reaches  the  match
+       limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
+       This has the effect of limiting the amount  of  backtracking  that  can
+       take place. For patterns that are not anchored, the count restarts from
+       zero for each position in the subject string. This limit  also  applies
+       to pcre2_dfa_match(), though the counting is done in a different way.
+
+       When  pcre2_match() is called with a pattern that was successfully pro-
+       cessed by pcre2_jit_compile(), the way in which matching is executed is
+       entirely  different. However, there is still the possibility of runaway
+       matching that goes on for a very long  time,  and  so  the  match_limit
+       value  is  also used in this case (but in a different way) to limit how
+       long the matching can continue.
+
+       The default value for the limit can be set when PCRE2 is built; if it
+       is not, the default is 10 million, which handles all but the most
+       extreme cases. A value for the match limit may also be supplied by an
+       item at the start of a pattern of the form
+
+         (*LIMIT_MATCH=ddd)
+
+       where  ddd  is  a  decimal  number.  However, such a setting is ignored
+       unless ddd is less than the limit set by the caller of pcre2_match() or
+       pcre2_dfa_match() or, if no such limit is set, less than the default.
+
+       int pcre2_set_depth_limit(pcre2_match_context *mcontext,
+         uint32_t value);
+
+       This   parameter   limits   the   depth   of   nested  backtracking  in
+       pcre2_match().  Each time a nested backtracking point is passed, a  new
+       memory "frame" is used to remember the state of matching at that point.
+       Thus, this parameter indirectly limits the amount  of  memory  that  is
+       used  in  a  match.  However,  because  the size of each memory "frame"
+       depends on the number of capturing parentheses, the actual memory limit
+       varies  from pattern to pattern. This limit was more useful in versions
+       before 10.30, where function recursion was used for backtracking.
+
+       The depth limit is not relevant, and is ignored, when matching is  done
+       using JIT compiled code. However, it is supported by pcre2_dfa_match(),
+       which uses it to limit the depth of nested internal recursive  function
+       calls  that implement atomic groups, lookaround assertions, and pattern
+       recursions. This limits, indirectly, the amount of system stack that is
+       used.  It  was  more useful in versions before 10.32, when stack memory
+       was used for local workspace vectors for recursive function calls. From
+       version  10.32,  only local variables are allocated on the stack and as
+       each call uses only a few hundred bytes, even a small stack can support
+       quite a lot of recursion.
+
+       If  the  depth  of  internal  recursive function calls is great enough,
+       local workspace vectors are allocated on the heap  from  version  10.32
+       onwards,  so  the depth limit also indirectly limits the amount of heap
+       memory that is used. A recursive pattern such as /(.(?2))((?1)|)/, when
+       matched  to a very long string using pcre2_dfa_match(), can use a great
+       deal of memory. However, it is probably  better  to  limit  heap  usage
+       directly by calling pcre2_set_heap_limit().
+
+       The  default  value for the depth limit can be set when PCRE2 is built;
+       if it is not, the default is set to the same value as the  default  for
+       the   match   limit.   If  the  limit  is  exceeded,  pcre2_match()  or
+       pcre2_dfa_match() returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth
+       limit  may also be supplied by an item at the start of a pattern of the
+       form
+
+         (*LIMIT_DEPTH=ddd)
+
+       where ddd is a decimal number.  However,  such  a  setting  is  ignored
+       unless ddd is less than the limit set by the caller of pcre2_match() or
+       pcre2_dfa_match() or, if no such limit is set, less than the default.
+
+
+CHECKING BUILD-TIME OPTIONS
+
+       int pcre2_config(uint32_t what, void *where);
+
+       The function pcre2_config() makes it possible for  a  PCRE2  client  to
+       discover  which  optional  features  have  been compiled into the PCRE2
+       library. The pcre2build documentation  has  more  details  about  these
+       optional features.
+
+       The  first  argument  for pcre2_config() specifies which information is
+       required. The second argument is a pointer to  memory  into  which  the
+       information  is  placed.  If  NULL  is passed, the function returns the
+       amount of memory that is needed  for  the  requested  information.  For
+       calls  that  return  numerical  values,  the  value  is  in bytes; when
+       requesting these values, where should point  to  appropriately  aligned
+       memory.  For calls that return strings, the required length is given in
+       code units, not counting the terminating zero.
+
+       When requesting information, the returned value from pcre2_config()  is
+       non-negative  on success, or the negative error code PCRE2_ERROR_BADOP-
+       TION if the value in the first argument is not recognized. The  follow-
+       ing information is available:
+
+         PCRE2_CONFIG_BSR
+
+       The  output  is a uint32_t integer whose value indicates what character
+       sequences the \R  escape  sequence  matches  by  default.  A  value  of
+       PCRE2_BSR_UNICODE  means  that  \R  matches  any  Unicode  line  ending
+       sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches  only  CR,
+       LF, or CRLF. The default can be overridden when a pattern is compiled.
+
+         PCRE2_CONFIG_COMPILED_WIDTHS
+
+       The  output  is a uint32_t integer whose lower bits indicate which code
+       unit widths were selected when PCRE2 was  built.  The  1-bit  indicates
+       8-bit  support, and the 2-bit and 4-bit indicate 16-bit and 32-bit sup-
+       port, respectively.
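+
+       Decoding the returned value is a matter of testing individual bits.
+       The sketch below assumes that the mask has already been obtained; the
+       macro names and width_is_built() are hypothetical and are not defined
+       by PCRE2.
+

```c
#include <stdint.h>

/* Bit values as described for PCRE2_CONFIG_COMPILED_WIDTHS.
   These macro names are hypothetical. */
#define WIDTH_BIT_8   1u
#define WIDTH_BIT_16  2u
#define WIDTH_BIT_32  4u

/* Returns non-zero if the mask indicates support for the given
   code unit width (8, 16, or 32), and zero otherwise. */
static int width_is_built(uint32_t mask, unsigned int width)
{
  switch (width)
    {
    case 8:  return (mask & WIDTH_BIT_8) != 0;
    case 16: return (mask & WIDTH_BIT_16) != 0;
    case 32: return (mask & WIDTH_BIT_32) != 0;
    default: return 0;   /* not a width that PCRE2 supports */
    }
}
```

+
+       A mask of 5, for example, indicates that the 8-bit and 32-bit
+       libraries were built but the 16-bit library was not.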
+
+         PCRE2_CONFIG_DEPTHLIMIT
+
+       The output is a uint32_t integer that gives the default limit  for  the
+       depth  of  nested  backtracking in pcre2_match() or the depth of nested
+       recursions, lookarounds, and atomic groups in  pcre2_dfa_match().  Fur-
+       ther details are given with pcre2_set_depth_limit() above.
+
+         PCRE2_CONFIG_HEAPLIMIT
+
+       The  output is a uint32_t integer that gives, in kibibytes, the default
+       limit  for  the  amount  of  heap  memory  used  by  pcre2_match()   or
+       pcre2_dfa_match().      Further      details     are     given     with
+       pcre2_set_heap_limit() above.
+
+         PCRE2_CONFIG_JIT
+
+       The output is a uint32_t integer that is set  to  one  if  support  for
+       just-in-time compiling is available; otherwise it is set to zero.
+
+         PCRE2_CONFIG_JITTARGET
+
+       The  where  argument  should point to a buffer that is at least 48 code
+       units long.  (The  exact  length  required  can  be  found  by  calling
+       pcre2_config()  with  where  set  to NULL.) The buffer is filled with a
+       string that contains the name of the architecture  for  which  the  JIT
+       compiler  is  configured,  for  example  "x86  32bit  (little  endian +
+       unaligned)". If JIT support is not available, PCRE2_ERROR_BADOPTION  is
+       returned,  otherwise the number of code units used is returned. This is
+       the length of the string, plus one unit for the terminating zero.
+
+         PCRE2_CONFIG_LINKSIZE
+
+       The output is a uint32_t integer that contains the number of bytes used
+       for  internal  linkage  in  compiled regular expressions. When PCRE2 is
+       configured, the value can be set to 2, 3, or 4, with the default  being
+       2.  This is the value that is returned by pcre2_config(). However, when
+       the 16-bit library is compiled, a value of 3 is rounded up  to  4,  and
+       when  the  32-bit  library  is compiled, internal linkages always use 4
+       bytes, so the configured value is not relevant.
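+
+       The adjustment described above can be written out as a small function.
+       This is a sketch of the stated rule only; effective_link_size() is a
+       hypothetical helper and not part of the PCRE2 API.
+

```c
/* Hypothetical helper: bytes actually used for internal links, given the
   configured link size (2, 3, or 4) and the code unit width in bits. */
static int effective_link_size(int configured, int code_unit_width)
{
  /* The 32-bit library always uses 4 bytes. */
  if (code_unit_width == 32) return 4;
  /* In the 16-bit library a value of 3 is rounded up to 4. */
  if (code_unit_width == 16 && configured == 3) return 4;
  /* Otherwise the configured value is used as it stands. */
  return configured;
}
```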
+
+       The default value of 2 for the 8-bit and 16-bit libraries is sufficient
+       for  all but the most massive patterns, since it allows the size of the
+       compiled pattern to be up to 65535  code  units.  Larger  values  allow
+       larger  regular  expressions to be compiled by those two libraries, but
+       at the expense of slower matching.
+
+         PCRE2_CONFIG_MATCHLIMIT
+
+       The output is a uint32_t integer that gives the default match limit for
+       pcre2_match().  Further  details are given with pcre2_set_match_limit()
+       above.
+
+         PCRE2_CONFIG_NEWLINE
+
+       The output is a uint32_t integer  whose  value  specifies  the  default
+       character  sequence that is recognized as meaning "newline". The values
+       are:
+
+         PCRE2_NEWLINE_CR       Carriage return (CR)
+         PCRE2_NEWLINE_LF       Linefeed (LF)
+         PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
+         PCRE2_NEWLINE_ANY      Any Unicode line ending
+         PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
+         PCRE2_NEWLINE_NUL      The NUL character (binary zero)
+
+       The default should normally correspond to  the  standard  sequence  for
+       your operating system.
+
+         PCRE2_CONFIG_NEVER_BACKSLASH_C
+
+       The  output  is  a uint32_t integer that is set to one if the use of \C
+       was permanently disabled when PCRE2 was built; otherwise it is  set  to
+       zero.
+
+         PCRE2_CONFIG_PARENSLIMIT
+
+       The  output is a uint32_t integer that gives the maximum depth of nest-
+       ing of parentheses (of any kind) in a pattern. This limit is imposed to
+       cap  the  amount of system stack used when a pattern is compiled. It is
+       specified when PCRE2 is built; the default is 250. This limit does  not
+       take  into  account  the  stack that may already be used by the calling
+       application. For  finer  control  over  compilation  stack  usage,  see
+       pcre2_set_compile_recursion_guard().
+
+         PCRE2_CONFIG_STACKRECURSE
+
+       This parameter is obsolete and should not be used in new code. The out-
+       put is a uint32_t integer that is always set to zero.
+
+         PCRE2_CONFIG_UNICODE_VERSION
+
+       The where argument should point to a buffer that is at  least  24  code
+       units  long.  (The  exact  length  required  can  be  found  by calling
+       pcre2_config() with where set to NULL.)  If  PCRE2  has  been  compiled
+       without  Unicode  support,  the buffer is filled with the text "Unicode
+       not supported". Otherwise, the Unicode  version  string  (for  example,
+       "8.0.0")  is  inserted. The number of code units used is returned. This
+       is the length of the string plus one unit for the terminating zero.
+
+         PCRE2_CONFIG_UNICODE
+
+       The output is a uint32_t integer that is set to one if Unicode  support
+       is  available; otherwise it is set to zero. Unicode support implies UTF
+       support.
+
+         PCRE2_CONFIG_VERSION
+
+       The where argument should point to a buffer that is at  least  24  code
+       units  long.  (The  exact  length  required  can  be  found  by calling
+       pcre2_config() with where set to NULL.) The buffer is filled  with  the
+       PCRE2 version string, zero-terminated. The number of code units used is
+       returned. This is the length of the string plus one unit for the termi-
+       nating zero.
+
+
+COMPILING A PATTERN
+
+       pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length,
+         uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset,
+         pcre2_compile_context *ccontext);
+
+       void pcre2_code_free(pcre2_code *code);
+
+       pcre2_code *pcre2_code_copy(const pcre2_code *code);
+
+       pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code);
+
+       The  pcre2_compile() function compiles a pattern into an internal form.
+       The pattern is defined by a pointer to a string of  code  units  and  a
+       length  (in  code units). If the pattern is zero-terminated, the length
+       can be specified  as  PCRE2_ZERO_TERMINATED.  The  function  returns  a
+       pointer  to  a  block  of memory that contains the compiled pattern and
+       related data, or NULL if an error occurred.
+
+       If the compile context argument ccontext is NULL, memory for  the  com-
+       piled  pattern  is  obtained  by  calling  malloc().  Otherwise,  it is
+       obtained from the same memory function that was used  for  the  compile
+       context.  The  caller must free the memory by calling pcre2_code_free()
+       when it is no longer needed.  If pcre2_code_free()  is  called  with  a
+       NULL argument, it returns immediately, without doing anything.
+
+       The function pcre2_code_copy() makes a copy of the compiled code in new
+       memory, using the same memory allocator as was used for  the  original.
+       However,  if  the  code  has  been  processed  by the JIT compiler (see
+       below), the JIT information cannot be copied (because it  is  position-
+       dependent).  The new copy can initially be used only for non-JIT match-
+       ing, though it can be passed to  pcre2_jit_compile()  if  required.  If
+       pcre2_code_copy() is called with a NULL argument, it returns NULL.
+
+       The pcre2_code_copy() function provides a way for individual threads in
+       a multithreaded application to acquire a private copy  of  shared  com-
+       piled  code.   However, it does not make a copy of the character tables
+       used by the compiled pattern; the new pattern code points to  the  same
+       tables  as  the original code.  (See "Locale Support" below for details
+       of these character tables.) In many applications the  same  tables  are
+       used  throughout, so this behaviour is appropriate. Nevertheless, there
+       are occasions when copies of both a compiled pattern and  the  relevant
+       tables  are needed. The pcre2_code_copy_with_tables() function provides
+       this facility. Copies of both the code and the tables  are  made,  with
+       the  new code pointing to the new tables. The memory for the new tables
+       is automatically freed when pcre2_code_free() is  called  for  the  new
+       copy  of  the  compiled  code.  If pcre2_code_copy_with_tables() is called
+       with a NULL argument, it returns NULL.
+
+       NOTE: When one of the matching functions is  called,  pointers  to  the
+       compiled pattern and the subject string are set in the match data block
+       so that they can be referenced by the substring  extraction  functions.
+       After  running a match, you must not free a compiled pattern (or a sub-
+       ject string) until after all operations on the match  data  block  have
+       taken place.
+
+       The  options argument for pcre2_compile() contains various bit settings
+       that affect the compilation. It  should  be  zero  if  no  options  are
+       required.  The  available options are described below. Some of them (in
+       particular, those that are compatible with Perl,  but  some  others  as
+       well)  can  also  be  set  and  unset  from within the pattern (see the
+       detailed description in the pcre2pattern documentation).
+
+       For those options that can be different in different parts of the  pat-
+       tern,  the contents of the options argument specifies their settings at
+       the start of compilation. The  PCRE2_ANCHORED,  PCRE2_ENDANCHORED,  and
+       PCRE2_NO_UTF_CHECK  options  can be set at the time of matching as well
+       as at compile time.
+
+       Other, less frequently required compile-time parameters  (for  example,
+       the newline setting) can be provided in a compile context (as described
+       above).
+
+       If errorcode or erroroffset is NULL, pcre2_compile() returns NULL imme-
+       diately.  Otherwise,  the  variables to which these point are set to an
+       error code and an offset (number of code  units)  within  the  pattern,
+       respectively,  when  pcre2_compile() returns NULL because a compilation
+       error has occurred. The values are not defined when compilation is suc-
+       cessful and pcre2_compile() returns a non-NULL value.
+
+       There  are  nearly  100  positive  error codes that pcre2_compile() may
+       return if it finds an error in the pattern. There are also  some  nega-
+       tive  error  codes that are used for invalid UTF strings. These are the
+       same as given by pcre2_match() and pcre2_dfa_match(), and are described
+       in  the  pcre2unicode  page. There is no separate documentation for the
+       positive error codes, because  the  textual  error  messages  that  are
+       obtained   by   calling  the  pcre2_get_error_message()  function  (see
+       "Obtaining a textual error message" below) should be  self-explanatory.
+       Macro  names  starting  with PCRE2_ERROR_ are defined for both positive
+       and negative error codes in pcre2.h.
+
+       The value returned in erroroffset is an indication of where in the pat-
+       tern  the  error  occurred. It is not necessarily the furthest point in
+       the pattern that was read. For example,  after  the  error  "lookbehind
+       assertion is not fixed length", the error offset points to the start of
+       the failing assertion. For an invalid UTF-8 or UTF-16 string, the  off-
+       set is that of the first code unit of the failing character.
+
+       Some  errors are not detected until the whole pattern has been scanned;
+       in these cases, the offset passed back is the length  of  the  pattern.
+       Note  that  the  offset is in code units, not characters, even in a UTF
+       mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char-
+       acter.
+
+       This  code  fragment shows a typical straightforward call to pcre2_com-
+       pile():
+
+         pcre2_code *re;
+         PCRE2_SIZE erroffset;
+         int errorcode;
+         re = pcre2_compile(
+           (PCRE2_SPTR)"^A.*Z",    /* the pattern */
+           PCRE2_ZERO_TERMINATED,  /* the pattern is zero-terminated */
+           0,                      /* default options */
+           &errorcode,             /* for error code */
+           &erroffset,             /* for error offset */
+           NULL);                  /* no compile context */
+
+       The following names for option bits are defined in the  pcre2.h  header
+       file:
+
+         PCRE2_ANCHORED
+
+       If this bit is set, the pattern is forced to be "anchored", that is, it
+       is constrained to match only at the first matching point in the  string
+       that  is being searched (the "subject string"). This effect can also be
+       achieved by appropriate constructs in the pattern itself, which is  the
+       only way to do it in Perl.
+
+         PCRE2_ALLOW_EMPTY_CLASS
+
+       By  default, for compatibility with Perl, a closing square bracket that
+       immediately follows an opening one is treated as a data  character  for
+       the  class.  When  PCRE2_ALLOW_EMPTY_CLASS  is  set,  it terminates the
+       class, which therefore contains no characters and so can never match.
+
+         PCRE2_ALT_BSUX
+
+       This option requests alternative handling of  three  escape  sequences,
+       which  makes  PCRE2's  behaviour more like ECMAscript (aka JavaScript).
+       When it is set:
+
+       (1) \U matches an upper case "U" character; by default \U causes a com-
+       pile time error (Perl uses \U to upper case subsequent characters).
+
+       (2) \u matches a lower case "u" character unless it is followed by four
+       hexadecimal digits, in which case the hexadecimal  number  defines  the
+       code  point  to match. By default, \u causes a compile time error (Perl
+       uses it to upper case the following character).
+
+       (3) \x matches a lower case "x" character unless it is followed by  two
+       hexadecimal  digits,  in  which case the hexadecimal number defines the
+       code point to match. By default, as in Perl, a  hexadecimal  number  is
+       always expected after \x, but it may have zero, one, or two digits (so,
+       for example, \xz matches a binary zero character followed by z).
+
+         PCRE2_ALT_CIRCUMFLEX
+
+       In  multiline  mode  (when  PCRE2_MULTILINE  is  set),  the  circumflex
+       metacharacter  matches at the start of the subject (unless PCRE2_NOTBOL
+       is set), and also after any internal  newline.  However,  it  does  not
+       match after a newline at the end of the subject, for compatibility with
+       Perl. If you want a multiline circumflex also to match after  a  termi-
+       nating newline, you must set PCRE2_ALT_CIRCUMFLEX.
+
+         PCRE2_ALT_VERBNAMES
+
+       By  default, for compatibility with Perl, the name in any verb sequence
+       such as (*MARK:NAME) is  any  sequence  of  characters  that  does  not
+       include  a  closing  parenthesis. The name is not processed in any way,
+       and it is not possible to include a closing parenthesis  in  the  name.
+       However,  if  the  PCRE2_ALT_VERBNAMES  option is set, normal backslash
+       processing is applied to verb  names  and  only  an  unescaped  closing
+       parenthesis  terminates the name. A closing parenthesis can be included
+       in a name either as \) or between \Q and \E. If the  PCRE2_EXTENDED  or
+       PCRE2_EXTENDED_MORE  option  is set with PCRE2_ALT_VERBNAMES, unescaped
+       whitespace in verb names is  skipped  and  #-comments  are  recognized,
+       exactly as in the rest of the pattern.
+
+         PCRE2_AUTO_CALLOUT
+
+       If  this  bit  is  set,  pcre2_compile()  automatically inserts callout
+       items, all with number 255, before each pattern  item,  except  immedi-
+       ately  before  or after an explicit callout in the pattern. For discus-
+       sion of the callout facility, see the pcre2callout documentation.
+
+         PCRE2_CASELESS
+
+       If this bit is set, letters in the pattern match both upper  and  lower
+       case  letters in the subject. It is equivalent to Perl's /i option, and
+       it can be changed within  a  pattern  by  a  (?i)  option  setting.  If
+       PCRE2_UTF  is  set, Unicode properties are used for all characters with
+       more than one other case, and for all characters whose code points  are
+       greater  than  U+007F.  For lower valued characters with only one other
+       case, a lookup table is used for speed. When PCRE2_UTF is  not  set,  a
+       lookup table is used for all code points less than 256, and higher code
+       points (available only in 16-bit or 32-bit mode)  are  treated  as  not
+       having another case.
+
+         PCRE2_DOLLAR_ENDONLY
+
+       If  this bit is set, a dollar metacharacter in the pattern matches only
+       at the end of the subject string. Without this option,  a  dollar  also
+       matches  immediately before a newline at the end of the string (but not
+       before any other newlines). The PCRE2_DOLLAR_ENDONLY option is  ignored
+       if  PCRE2_MULTILINE  is  set.  There is no equivalent to this option in
+       Perl, and no way to set it within a pattern.
+
+         PCRE2_DOTALL
+
+       If this bit is set, a dot metacharacter  in  the  pattern  matches  any
+       character,  including  one  that  indicates a newline. However, it only
+       ever matches one character, even if newlines are coded as CRLF. Without
+       this option, a dot does not match when the current position in the sub-
+       ject is at a newline. This option is equivalent to  Perl's  /s  option,
+       and it can be changed within a pattern by a (?s) option setting. A neg-
+       ative class such as [^a] always matches newline characters, and the  \N
+       escape  sequence always matches a non-newline character, independent of
+       the setting of PCRE2_DOTALL.
+
+         PCRE2_DUPNAMES
+
+       If this bit is set, names used to identify capturing  subpatterns  need
+       not be unique. This can be helpful for certain types of pattern when it
+       is known that only one instance of the named  subpattern  can  ever  be
+       matched.  There  are  more details of named subpatterns below; see also
+       the pcre2pattern documentation.
+
+         PCRE2_ENDANCHORED
+
+       If this bit is set, the end of any pattern match must be right  at  the
+       end of the string being searched (the "subject string"). If the pattern
+       match succeeds by reaching (*ACCEPT), but does not reach the end of the
+       subject,  the match fails at the current starting point. For unanchored
+       patterns, a new match is then tried at the next  starting  point.  How-
+       ever, if the match succeeds by reaching the end of the pattern, but not
+       the end of the subject, backtracking occurs and  an  alternative  match
+       may be found. Consider these two patterns:
+
+         .(*ACCEPT)|..
+         .|..
+
+       If  matched against "abc" with PCRE2_ENDANCHORED set, the first matches
+       "c" whereas the second matches "bc". The  effect  of  PCRE2_ENDANCHORED
+       can  also  be achieved by appropriate constructs in the pattern itself,
+       which is the only way to do it in Perl.
+
+       For DFA matching with pcre2_dfa_match(), PCRE2_ENDANCHORED applies only
+       to  the  first  (that  is,  the longest) matched string. Other parallel
+       matches, which are necessarily substrings of the first one, must  obvi-
+       ously end before the end of the subject.
+
+         PCRE2_EXTENDED
+
+       If  this  bit  is  set,  most white space characters in the pattern are
+       totally ignored except when escaped or inside a character  class.  How-
+       ever,  white  space  is  not  allowed within sequences such as (?> that
+       introduce various parenthesized subpatterns, nor within numerical quan-
+       tifiers  such  as {1,3}.  Ignorable white space is permitted between an
+       item and a following quantifier and between a quantifier and a  follow-
+       ing  +  that indicates possessiveness.  PCRE2_EXTENDED is equivalent to
+       Perl's /x option, and it can be changed within  a  pattern  by  a  (?x)
+       option setting.
+
+       When  PCRE2  is compiled without Unicode support, PCRE2_EXTENDED recog-
+       nizes as white space only those characters with code points  less  than
+       256 that are flagged as white space in its low-character table. The ta-
+       ble is normally created by pcre2_maketables(), which uses the isspace()
+       function  to identify space characters. In most ASCII environments, the
+       relevant characters are those with code  points  0x0009  (tab),  0x000A
+       (linefeed),  0x000B (vertical tab), 0x000C (formfeed), 0x000D (carriage
+       return), and 0x0020 (space).
+
+       When PCRE2 is compiled with Unicode support, in addition to these char-
+       acters,  five  more Unicode "Pattern White Space" characters are recog-
+       nized by PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to-
+       right  mark), U+200F (right-to-left mark), U+2028 (line separator), and
+       U+2029 (paragraph separator). This set of characters  is  the  same  as
+       recognized  by  Perl's /x option. Note that the horizontal and vertical
+       space characters that are matched by the \h and \v escapes in  patterns
+       are a much bigger set.
+
+       As  well as ignoring most white space, PCRE2_EXTENDED also causes char-
+       acters between an unescaped # outside a character class  and  the  next
+       newline,  inclusive,  to be ignored, which makes it possible to include
+       comments inside complicated patterns. Note that the end of this type of
+       comment  is a literal newline sequence in the pattern; escape sequences
+       that happen to represent a newline do not count.
+
+       Which characters are interpreted as newlines can be specified by a set-
+       ting  in  the compile context that is passed to pcre2_compile() or by a
+       special sequence at the start of the pattern, as described in the  sec-
+       tion  entitled "Newline conventions" in the pcre2pattern documentation.
+       A default is defined when PCRE2 is built.
+
+         PCRE2_EXTENDED_MORE
+
+       This option  has  the  effect  of  PCRE2_EXTENDED,  but,  in  addition,
+       unescaped  space  and  horizontal  tab  characters are ignored inside a
+       character class. Note: only these two characters are ignored,  not  the
+       full  set  of pattern white space characters that are ignored outside a
+       character  class.  PCRE2_EXTENDED_MORE  is  equivalent  to  Perl's  /xx
+       option,  and  it can be changed within a pattern by a (?xx) option set-
+       ting.
+
+         PCRE2_FIRSTLINE
+
+       If this option is set, the start of an unanchored pattern match must be
+       before  or  at  the  first  newline in the subject string following the
+       start of matching, though the matched text may continue over  the  new-
+       line. If startoffset is non-zero, the limiting newline is not necessar-
+       ily the first newline in the  subject.  For  example,  if  the  subject
+       string is "abc\nxyz" (where \n represents a single-character newline) a
+       pattern match for "yz" succeeds with PCRE2_FIRSTLINE if startoffset  is
+       greater  than 3. See also PCRE2_USE_OFFSET_LIMIT, which provides a more
+       general limiting facility. If PCRE2_FIRSTLINE is  set  with  an  offset
+       limit,  a match must occur in the first line and also within the offset
+       limit. In other words, whichever limit comes first is used.
+
+         PCRE2_LITERAL
+
+       If this option is set, all meta-characters in the pattern are disabled,
+       and  it is treated as a literal string. Matching literal strings with a
+       regular expression engine is not the most efficient way of doing it. If
+       you  are  doing  a  lot of literal matching and are worried about effi-
+       ciency, you should consider using other approaches. The only other main
+       options  that  are  allowed  with  PCRE2_LITERAL  are:  PCRE2_ANCHORED,
+       PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, PCRE2_CASELESS, PCRE2_FIRSTLINE,
+       PCRE2_NO_START_OPTIMIZE,     PCRE2_NO_UTF_CHECK,     PCRE2_UTF,     and
+       PCRE2_USE_OFFSET_LIMIT. The extra  options  PCRE2_EXTRA_MATCH_LINE  and
+       PCRE2_EXTRA_MATCH_WORD  are  also supported. Any other options cause an
+       error.
+
+         PCRE2_MATCH_UNSET_BACKREF
+
+       If this option is set, a backreference to  an  unset  subpattern  group
+       matches  an  empty  string (by default this causes the current matching
+       alternative to fail).  A pattern such as  (\1)(a)  succeeds  when  this
+       option  is set (assuming it can find an "a" in the subject), whereas it
+       fails by default, for Perl compatibility.  Setting  this  option  makes
+       PCRE2 behave more like ECMAscript (aka JavaScript).
+
+         PCRE2_MULTILINE
+
+       By  default,  for  the purposes of matching "start of line" and "end of
+       line", PCRE2 treats the subject string as consisting of a  single  line
+       of  characters,  even  if  it actually contains newlines. The "start of
+       line" metacharacter (^) matches only at the start of  the  string,  and
+       the  "end  of  line"  metacharacter  ($) matches only at the end of the
+       string,  or  before  a  terminating  newline  (except  when  PCRE2_DOL-
+       LAR_ENDONLY  is  set).  Note, however, that unless PCRE2_DOTALL is set,
+       the "any character" metacharacter (.) does not match at a newline. This
+       behaviour (for ^, $, and dot) is the same as Perl.
+
+       When PCRE2_MULTILINE is set, the "start of line"  and  "end  of  line"
+       constructs match immediately following or immediately  before  internal
+       newlines  in  the  subject string, respectively, as well as at the very
+       start and end. This is equivalent to Perl's /m option, and  it  can  be
+       changed within a pattern by a (?m) option setting. Note that the "start
+       of line" metacharacter does not match after a newline at the end of the
+       subject,  for compatibility with Perl.  However, you can change this by
+       setting the PCRE2_ALT_CIRCUMFLEX option. If there are no newlines in  a
+       subject  string,  or  no  occurrences  of  ^ or $ in a pattern, setting
+       PCRE2_MULTILINE has no effect.
+
+         PCRE2_NEVER_BACKSLASH_C
+
+       This option locks out the use of \C in the pattern that is  being  com-
+       piled.   This  escape  can  cause  unpredictable  behaviour in UTF-8 or
+       UTF-16 modes, because it may leave the current matching  point  in  the
+       middle  of  a  multi-code-unit  character. This option may be useful in
+       applications that process patterns from  external  sources.  Note  that
+       there is also a build-time option that permanently locks out the use of
+       \C.
+
+         PCRE2_NEVER_UCP
+
+       This option locks out the use of Unicode properties  for  handling  \B,
+       \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes, as
+       described for the PCRE2_UCP option below. In  particular,  it  prevents
+       the  creator of the pattern from enabling this facility by starting the
+       pattern with (*UCP). This option may be  useful  in  applications  that
+       process patterns from external sources. The combination  of  PCRE2_UCP
+       and PCRE2_NEVER_UCP causes an error.
+
+         PCRE2_NEVER_UTF
+
+       This option locks out interpretation of the pattern as  UTF-8,  UTF-16,
+       or UTF-32, depending on which library is in use. In particular, it pre-
+       vents the creator of the pattern from switching to  UTF  interpretation
+       by  starting  the  pattern  with  (*UTF).  This option may be useful in
+       applications that process patterns from external sources. The  combina-
+       tion of PCRE2_UTF and PCRE2_NEVER_UTF causes an error.
+
+         PCRE2_NO_AUTO_CAPTURE
+
+       If this option is set, it disables the use of numbered capturing paren-
+       theses in the pattern. Any opening parenthesis that is not followed  by
+       ?  behaves as if it were followed by ?: but named parentheses can still
+       be used for capturing (and they acquire numbers in the usual way). This
+       is  the  same as Perl's /n option.  Note that, when this option is set,
+       references to capturing groups (backreferences or  recursion/subroutine
+       calls)  may  only refer to named groups, though the reference can be by
+       name or by number.
+
+         PCRE2_NO_AUTO_POSSESS
+
+       If this option is set, it disables "auto-possessification", which is an
+       optimization  that,  for example, turns a+b into a++b in order to avoid
+       backtracks into a+ that can never be successful. However,  if  callouts
+       are  in  use,  auto-possessification means that some callouts are never
+       taken. You can set this option if you want the matching functions to do
+       a  full  unoptimized  search and run all the callouts, but it is mainly
+       provided for testing purposes.
+
+         PCRE2_NO_DOTSTAR_ANCHOR
+
+       If this option is set, it disables an optimization that is applied when
+       .*  is  the  first significant item in a top-level branch of a pattern,
+       and all the other branches also start with .* or with \A or  \G  or  ^.
+       The  optimization  is  automatically disabled for .* if it is inside an
+       atomic group or a capturing group that is the subject of  a  backrefer-
+       ence,  or  if  the pattern contains (*PRUNE) or (*SKIP). When the opti-
+       mization is not disabled, such a pattern is automatically  anchored  if
+       PCRE2_DOTALL is set for all the .* items and PCRE2_MULTILINE is not set
+       for any ^ items. Otherwise, the fact that any match must  start  either
+       at  the start of the subject or following a newline is remembered. Like
+       other optimizations, this can cause callouts to be skipped.
+
+         PCRE2_NO_START_OPTIMIZE
+
+       This is an option whose main effect is at matching time.  It  does  not
+       change what pcre2_compile() generates, but it does affect the output of
+       the JIT compiler.
+
+       There are a number of optimizations that may occur at the  start  of  a
+       match,  in  order  to speed up the process. For example, if it is known
+       that an unanchored match must start with a specific  code  unit  value,
+       the  matching code searches the subject for that value, and fails imme-
+       diately if it cannot find it, without actually running the main  match-
+       ing  function.  This means that a special item such as (*COMMIT) at the
+       start of a pattern is not considered until after  a  suitable  starting
+       point  for  the  match  has  been found. Also, when callouts or (*MARK)
+       items are in use, these "start-up" optimizations can cause them  to  be
+       skipped  if  the pattern is never actually used. The start-up optimiza-
+       tions are in effect a pre-scan of the subject that takes  place  before
+       the pattern is run.
+
+       The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
+       possibly causing performance to suffer,  but  ensuring  that  in  cases
+       where  the  result is "no match", the callouts do occur, and that items
+       such as (*COMMIT) and (*MARK) are considered at every possible starting
+       position in the subject string.
+
+       Setting  PCRE2_NO_START_OPTIMIZE  may  change the outcome of a matching
+       operation.  Consider the pattern
+
+         (*COMMIT)ABC
+
+       When this is compiled, PCRE2 records the fact that a match  must  start
+       with  the  character  "A".  Suppose the subject string is "DEFABC". The
+       start-up optimization scans along the subject, finds "A" and  runs  the
+       first  match attempt from there. The (*COMMIT) item means that the pat-
+       tern must match the current starting position, which in this  case,  it
+       does.  However,  if  the same match is run with PCRE2_NO_START_OPTIMIZE
+       set, the initial scan along the subject string  does  not  happen.  The
+       first  match  attempt  is  run  starting  from "D" and when this fails,
+       (*COMMIT) prevents any further matches  being  tried,  so  the  overall
+       result is "no match".
+
+       There  are  also  other  start-up optimizations. For example, a minimum
+       length for the subject may be recorded. Consider the pattern
+
+         (*MARK:A)(X|Y)
+
+       The minimum length for a match is one  character.  If  the  subject  is
+       "ABC", there will be attempts to match "ABC", "BC", and "C". An attempt
+       to match an empty string at the end of the subject does not take place,
+       because  PCRE2  knows  that  the  subject  is now too short, and so the
+       (*MARK) is never encountered. In this case, the optimization  does  not
+       affect the overall match result, which is still "no match", but it does
+       affect the auxiliary information that is returned.
+
+         PCRE2_NO_UTF_CHECK
+
+       When PCRE2_UTF is set, the validity of the pattern as a UTF  string  is
+       automatically  checked.  There  are  discussions  about the validity of
+       UTF-8 strings, UTF-16 strings, and UTF-32 strings in  the  pcre2unicode
+       document.  If an invalid UTF sequence is found, pcre2_compile() returns
+       a negative error code.
+
+       If you know that your pattern is a valid UTF string, and  you  want  to
+       skip   this   check   for   performance   reasons,   you  can  set  the
+       PCRE2_NO_UTF_CHECK option. When it is set, the  effect  of  passing  an
+       invalid UTF string as a pattern is undefined. It may cause your program
+       to crash or loop.
+
+       Note  that  this  option  can  also  be  passed  to  pcre2_match()  and
+       pcre2_dfa_match(), to suppress UTF  validity  checking  of  the  subject
+       string.
+
+       Note also that setting PCRE2_NO_UTF_CHECK at compile time does not dis-
+       able  the error that is given if an escape sequence for an invalid Uni-
+       code code point is encountered in the pattern. In particular,  the  so-
+       called  "surrogate"  code points (0xd800 to 0xdfff) are invalid. If you
+       want to allow escape  sequences  such  as  \x{d800}  you  can  set  the
+       PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  extra  option, as described in the
+       section entitled "Extra compile options" below.  However, this is  pos-
+       sible only in UTF-8 and UTF-32 modes, because these values are not rep-
+       resentable in UTF-16.
+
+         PCRE2_UCP
+
+       This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
+       \w,  and  some  of  the POSIX character classes. By default, only ASCII
+       characters are recognized, but if PCRE2_UCP is set, Unicode  properties
+       are  used instead to classify characters. More details are given in the
+       section on generic character types in the pcre2pattern page. If you set
+       PCRE2_UCP,  matching one of the items it affects takes much longer. The
+       option is available only if PCRE2 has been compiled with  Unicode  sup-
+       port (which is the default).
+
+         PCRE2_UNGREEDY
+
+       This  option  inverts  the "greediness" of the quantifiers so that they
+       are not greedy by default, but become greedy if followed by "?". It  is
+       not  compatible  with Perl. It can also be set by a (?U) option setting
+       within the pattern.
+
+         PCRE2_USE_OFFSET_LIMIT
+
+       This option must be set for pcre2_compile() if pcre2_set_offset_limit()
+       is  going  to be used to set a non-default offset limit in a match con-
+       text for matches that use this pattern. An error  is  generated  if  an
+       offset  limit  is  set  without  this option. For more details, see the
+       description of pcre2_set_offset_limit() in the section  that  describes
+       match contexts. See also the PCRE2_FIRSTLINE option above.
+
+         PCRE2_UTF
+
+       This  option  causes  PCRE2  to regard both the pattern and the subject
+       strings that are subsequently processed as strings  of  UTF  characters
+       instead  of  single-code-unit  strings.  It  is available when PCRE2 is
+       built to include Unicode support (which is  the  default).  If  Unicode
+       support  is  not  available,  the use of this option provokes an error.
+       Details of how PCRE2_UTF changes the behaviour of PCRE2  are  given  in
+       the  pcre2unicode  page.  In  particular,  note that it changes the way
+       PCRE2_CASELESS handles characters with code points greater than 127.
+
+   Extra compile options
+
+       Unlike the main compile-time options, the extra options are  not  saved
+       with the compiled pattern. The option bits that can be set in a compile
+       context by calling the pcre2_set_compile_extra_options()  function  are
+       as follows:
+
+         PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
+
+       This  option  applies when compiling a pattern in UTF-8 or UTF-32 mode.
+       It is forbidden in UTF-16 mode, and ignored in non-UTF  modes.  Unicode
+       "surrogate" code points in the range 0xd800 to 0xdfff are used in pairs
+       in UTF-16 to encode code points with values in  the  range  0x10000  to
+       0x10ffff.  The  surrogates  cannot  therefore be represented in UTF-16.
+       They can be represented in UTF-8 and UTF-32, but are defined as invalid
+       code  points,  and  cause  errors  if  encountered in a UTF-8 or UTF-32
+       string that is being checked for validity by PCRE2.
+
+       These values also cause errors if encountered in escape sequences  such
+       as \x{d912} within a pattern. However, it seems that some applications,
+       when using PCRE2 to check for unwanted  characters  in  UTF-8  strings,
+       explicitly   test  for  the  surrogates  using  escape  sequences.  The
+       PCRE2_NO_UTF_CHECK option does  not  disable  the  error  that  occurs,
+       because  it applies only to the testing of input strings for UTF valid-
+       ity.
+
+       If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set,  surro-
+       gate  code  point values in UTF-8 and UTF-32 patterns no longer provoke
+       errors and are incorporated in the compiled pattern. However, they  can
+       only  match  subject characters if the matching function is called with
+       PCRE2_NO_UTF_CHECK set.
+
+         PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
+
+       This is a dangerous option. Use with care. By default, an  unrecognized
+       escape  such  as \j or a malformed one such as \x{2z} causes a compile-
+       time error when detected by pcre2_compile(). Perl is somewhat inconsis-
+       tent  in  handling  such items: for example, \j is treated as a literal
+       "j", and non-hexadecimal digits in \x{} are just ignored, though  warn-
+       ings  are given in both cases if Perl's warning switch is enabled. How-
+       ever, a malformed octal number after \o{  always  causes  an  error  in
+       Perl.
+
+       If  the  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL  extra  option  is passed to
+       pcre2_compile(), all unrecognized or  erroneous  escape  sequences  are
+       treated  as  single-character escapes. For example, \j is a literal "j"
+       and \x{2z} is treated as  the  literal  string  "x{2z}".  Setting  this
+       option  means  that  typos in patterns may go undetected and have unex-
+       pected results. This is a dangerous option. Use with care.
+
+         PCRE2_EXTRA_MATCH_LINE
+
+       This option is provided for use by  the  -x  option  of  pcre2grep.  It
+       causes  the  pattern  only to match complete lines. This is achieved by
+       automatically inserting the code for "^(?:" at the start  of  the  com-
+       piled  pattern  and ")$" at the end. Thus, when PCRE2_MULTILINE is set,
+       the matched line may be in the  middle  of  the  subject  string.  This
+       option can be used with PCRE2_LITERAL.
+
+         PCRE2_EXTRA_MATCH_WORD
+
+       This  option  is  provided  for  use  by the -w option of pcre2grep. It
+       causes the pattern only to match strings that have a word  boundary  at
+       the  start and the end. This is achieved by automatically inserting the
+       code for "\b(?:" at the start of the compiled pattern and ")\b" at  the
+       end.  The option may be used with PCRE2_LITERAL. However, it is ignored
+       if PCRE2_EXTRA_MATCH_LINE is also set.
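+
+       The effect of the two options can be approximated in pattern text.
+       The sketch below is an illustration only: PCRE2 inserts compiled
+       code rather than text, so the textual form is not equivalent when
+       PCRE2_LITERAL is set. The function name wrap_pattern() and its
+       flag arguments are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write into out the pattern text that roughly
   corresponds to compiling pattern with PCRE2_EXTRA_MATCH_LINE or
   PCRE2_EXTRA_MATCH_WORD. Returns what snprintf() returns. */
int wrap_pattern(char *out, size_t outsize, const char *pattern,
                 int match_line, int match_word)
{
    if (match_line)   /* MATCH_WORD is ignored when MATCH_LINE is set */
        return snprintf(out, outsize, "^(?:%s)$", pattern);
    if (match_word)
        return snprintf(out, outsize, "\\b(?:%s)\\b", pattern);
    return snprintf(out, outsize, "%s", pattern);
}
```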
+
+
+JUST-IN-TIME (JIT) COMPILATION
+
+       int pcre2_jit_compile(pcre2_code *code, uint32_t options);
+
+       int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext);
+
+       void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
+
+       pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize,
+         PCRE2_SIZE maxsize, pcre2_general_context *gcontext);
+
+       void pcre2_jit_stack_assign(pcre2_match_context *mcontext,
+         pcre2_jit_callback callback_function, void *callback_data);
+
+       void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack);
+
+       These functions provide support for  JIT  compilation,  which,  if  the
+       just-in-time  compiler  is available, further processes a compiled pat-
+       tern into machine code that executes much faster than the pcre2_match()
+       interpretive  matching function. Full details are given in the pcre2jit
+       documentation.
+
+       JIT compilation is a heavyweight optimization. It can  take  some  time
+       for  patterns  to  be analyzed, and for one-off matches and simple pat-
+       terns the benefit of faster execution might be offset by a much  slower
+       compilation  time.  Most (but not all) patterns can be optimized by the
+       JIT compiler.
+
+
+LOCALE SUPPORT
+
+       PCRE2 handles caseless matching, and determines whether characters  are
+       letters,  digits, or whatever, by reference to a set of tables, indexed
+       by character code point. This applies only  to  characters  whose  code
+       points  are  less than 256. By default, higher-valued code points never
+       match escapes such as \w or \d.  However, if PCRE2 is built  with  Uni-
+       code support, all characters can be tested with \p and \P, or, alterna-
+       tively, the PCRE2_UCP option can be set when  a  pattern  is  compiled;
+       this  causes  \w and friends to use Unicode property support instead of
+       the built-in tables.
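+
+       The table-driven classification can be modelled as follows. This
+       is a simplified sketch and not PCRE2's actual table layout: it
+       builds one hypothetical 256-entry table from the current C locale
+       and shows why, without PCRE2_UCP, code points of 256 or more never
+       match an escape such as \w:

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical model: entry c is 1 if code point c counts as a "word"
   character in the current C locale, otherwise 0. */
void build_word_table(uint8_t table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = (uint8_t)(isalnum(c) || c == '_');
}

/* Code points outside the table never match. */
int is_word_char(const uint8_t table[256], uint32_t cp)
{
    return cp < 256 ? table[cp] : 0;
}
```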
+
+       The use of locales with Unicode is discouraged.  If  you  are  handling
+       characters with code points greater than 127, you should either use
+       Unicode support, or use locales, but not try to mix the two.
+
+       PCRE2 contains an internal set of character tables  that  are  used  by
+       default.   These  are  sufficient  for many applications. Normally, the
+       internal tables recognize only ASCII characters. However, when PCRE2 is
+       built, it is possible to cause the internal tables to be rebuilt in the
+       default "C" locale of the local system, which may cause them to be dif-
+       ferent.
+
+       The  internal tables can be overridden by tables supplied by the appli-
+       cation that calls PCRE2. These may be created  in  a  different  locale
+       from  the  default.  As more and more applications change to using Uni-
+       code, the need for this locale support is expected to die away.
+
+       External tables are built by calling the  pcre2_maketables()  function,
+       in  the relevant locale. The result can be passed to pcre2_compile() as
+       often  as  necessary,  by  creating  a  compile  context  and   calling
+       pcre2_set_character_tables()  to  set  the  tables pointer therein. For
+       example, to build and use tables that are appropriate  for  the  French
+       locale (where accented characters with values greater than 127 are
+       treated as letters), the following code could be used:
+
+         setlocale(LC_CTYPE, "fr_FR");
+         tables = pcre2_maketables(NULL);
+         ccontext = pcre2_compile_context_create(NULL);
+         pcre2_set_character_tables(ccontext, tables);
+         re = pcre2_compile(..., ccontext);
+
+       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
+       if  you  are using Windows, the name for the French locale is "french".
+       It is the caller's responsibility to ensure that the memory  containing
+       the tables remains available for as long as it is needed.
+
+       The pointer that is passed (via the compile context) to pcre2_compile()
+       is saved with the compiled pattern, and the same  tables  are  used  by
+       pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
+       pilation and matching both happen in the  same  locale,  but  different
+       patterns can be processed in different locales.
+
+
+INFORMATION ABOUT A COMPILED PATTERN
+
+       int pcre2_pattern_info(const pcre2_code *code, uint32_t what,
+         void *where);
+
+       The  pcre2_pattern_info()  function returns general information about a
+       compiled pattern. For information about callouts, see the next section.
+       The  first  argument  for pcre2_pattern_info() is a pointer to the com-
+       piled pattern. The second argument specifies which piece of information
+       is  required,  and  the  third  argument  is a pointer to a variable to
+       receive the data. If the third argument is NULL, the first argument  is
+       ignored,  and  the  function  returns the size in bytes of the variable
+       that is required for the information requested. Otherwise, the yield of
+       the function is zero for success, or one of the following negative num-
+       bers:
+
+         PCRE2_ERROR_NULL           the argument code was NULL
+         PCRE2_ERROR_BADMAGIC       the "magic number" was not found
+         PCRE2_ERROR_BADOPTION      the value of what was invalid
+         PCRE2_ERROR_UNSET          the requested field is not set
+
+       The "magic number" is placed at the start of each compiled  pattern  as
+       a simple check against passing an arbitrary memory pointer. Here is a
+       typical call of pcre2_pattern_info(), to obtain the length of the  com-
+       piled pattern:
+
+         int rc;
+         size_t length;
+         rc = pcre2_pattern_info(
+           re,               /* result of pcre2_compile() */
+           PCRE2_INFO_SIZE,  /* what is required */
+           &length);         /* where to put the data */
+
+       The possible values for the second argument are defined in pcre2.h, and
+       are as follows:
+
+         PCRE2_INFO_ALLOPTIONS
+         PCRE2_INFO_ARGOPTIONS
+         PCRE2_INFO_EXTRAOPTIONS
+
+       Return copies of the pattern's options. The third argument should point
+       to  a  uint32_t  variable.  PCRE2_INFO_ARGOPTIONS  returns  exactly the
+       options that were passed to pcre2_compile(), whereas  PCRE2_INFO_ALLOP-
+       TIONS  returns  the compile options as modified by any top-level (*XXX)
+       option settings such as (*UTF) at the  start  of  the  pattern  itself.
+       PCRE2_INFO_EXTRAOPTIONS  returns the extra options that were set in the
+       compile context by calling the pcre2_set_compile_extra_options()  func-
+       tion.
+
+       For   example,   if  the  pattern  /(*UTF)abc/  is  compiled  with  the
+       PCRE2_EXTENDED  option,  the  result   for   PCRE2_INFO_ALLOPTIONS   is
+       PCRE2_EXTENDED  and  PCRE2_UTF.   Option settings such as (?i) that can
+       change within a pattern do not affect the result  of  PCRE2_INFO_ALLOP-
+       TIONS, even if they appear right at the start of the pattern. (This was
+       different in some earlier releases.)
+
+       A pattern compiled without PCRE2_ANCHORED is automatically anchored  by
+       PCRE2 if the first significant item in every top-level branch is one of
+       the following:
+
+         ^     unless PCRE2_MULTILINE is set
+         \A    always
+         \G    always
+         .*    sometimes - see below
+
+       When .* is the first significant item, anchoring is possible only  when
+       all the following are true:
+
+         .* is not in an atomic group
+         .* is not in a capturing group that is the subject
+              of a backreference
+         PCRE2_DOTALL is in force for .*
+         Neither (*PRUNE) nor (*SKIP) appears in the pattern
+         PCRE2_NO_DOTSTAR_ANCHOR is not set
+
+       For  patterns  that are auto-anchored, the PCRE2_ANCHORED bit is set in
+       the options returned for PCRE2_INFO_ALLOPTIONS.
+
+         PCRE2_INFO_BACKREFMAX
+
+       Return the number of the highest  backreference  in  the  pattern.  The
+       third  argument should point to an uint32_t variable. Named subpatterns
+       acquire numbers as well as names, and these count towards  the  highest
+       backreference.   Backreferences such as \4 or \g{12} match the captured
+       characters of the given group, but in addition, the check that  a  cap-
+       turing  group  is  set in a conditional subpattern such as (?(3)a|b) is
+       also a backreference. Zero is returned if there are no backreferences.
+
+         PCRE2_INFO_BSR
+
+       The output is a uint32_t integer whose value indicates  what  character
+       sequences  the \R escape sequence matches. A value of PCRE2_BSR_UNICODE
+       means that \R matches any Unicode line  ending  sequence;  a  value  of
+       PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF.
+
+         PCRE2_INFO_CAPTURECOUNT
+
+       Return  the highest capturing subpattern number in the pattern. In pat-
+       terns where (?| is not used, this is also the total number of capturing
+       subpatterns.  The third argument should point to an uint32_t variable.
+
+         PCRE2_INFO_DEPTHLIMIT
+
+       If  the  pattern set a backtracking depth limit by including an item of
+       the form (*LIMIT_DEPTH=nnnn) at the start, the value is  returned.  The
+       third argument should point to a uint32_t integer. If no such value has
+       been  set,  the  call  to  pcre2_pattern_info()   returns   the   error
+       PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
+       ing if it is less than the limit set or defaulted by the caller of  the
+       match function.
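+
+       The rule in the last sentence amounts to taking the smaller of the
+       two limits. A minimal sketch, with a hypothetical function name and
+       a flag standing in for the PCRE2_ERROR_UNSET case; the same rule is
+       stated for PCRE2_INFO_HEAPLIMIT and PCRE2_INFO_MATCHLIMIT:

```c
#include <stdint.h>

/* Hypothetical helper: pattern_limit_set is zero when the pattern
   contains no (*LIMIT_...=nnnn) item. A pattern can lower the limit
   chosen by the caller but can never raise it. */
uint32_t effective_limit(uint32_t caller_limit, int pattern_limit_set,
                         uint32_t pattern_limit)
{
    if (pattern_limit_set && pattern_limit < caller_limit)
        return pattern_limit;
    return caller_limit;
}
```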
+
+         PCRE2_INFO_FIRSTBITMAP
+
+       In  the absence of a single first code unit for a non-anchored pattern,
+       pcre2_compile() may construct a 256-bit table that defines a fixed  set
+       of  values for the first code unit in any match. For example, a pattern
+       that starts with [abc] results in a table with  three  bits  set.  When
+       code  unit  values greater than 255 are supported, the flag bit for 255
+       means "any code unit of value 255 or above". If such a table  was  con-
+       structed,  a pointer to it is returned. Otherwise NULL is returned. The
+       third argument should point to a const uint8_t * variable.
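+
+       A sketch of how such a table might be consulted. The bit layout
+       used here (byte c/8, bit c%8) is an assumption, since the text
+       above does not specify it, and both function names are
+       hypothetical. Code unit values above 255 are folded onto bit 255,
+       as described above:

```c
#include <stdint.h>

/* Hypothetical helper: mark code_unit as a possible first code unit.
   Assumes bit c%8 of byte c/8 represents code unit c. */
void set_start_bit(uint8_t *bitmap, uint32_t code_unit)
{
    uint32_t c = code_unit > 255 ? 255 : code_unit;
    bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Hypothetical helper: non-zero if a match may start with code_unit. */
int may_start_match(const uint8_t *bitmap, uint32_t code_unit)
{
    uint32_t c = code_unit > 255 ? 255 : code_unit;
    return (bitmap[c / 8] >> (c % 8)) & 1;
}
```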
+
+         PCRE2_INFO_FIRSTCODETYPE
+
+       Return information about the first code unit of any matched string, for
+       a  non-anchored pattern. The third argument should point to an uint32_t
+       variable. If there is a fixed first value, for example, the letter  "c"
+       from  a  pattern such as (cat|cow|coyote), 1 is returned, and the value
+       can be retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is  no  fixed
+       first  value,  but it is known that a match can occur only at the start
+       of the subject or following a newline in the subject,  2  is  returned.
+       Otherwise, and for anchored patterns, 0 is returned.
+
+         PCRE2_INFO_FIRSTCODEUNIT
+
+       Return  the  value  of  the first code unit of any matched string for a
+       pattern where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise  return  0.
+       The  third  argument should point to an uint32_t variable. In the 8-bit
+       library, the value is always less than 256. In the 16-bit  library  the
+       value  can  be  up  to 0xffff. In the 32-bit library in UTF-32 mode the
+       value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
+       mode.
+
+         PCRE2_INFO_FRAMESIZE
+
+       Return the size (in bytes) of the data frames that are used to remember
+       backtracking positions when the pattern is processed  by  pcre2_match()
+       without  the  use  of  JIT. The third argument should point to a size_t
+       variable. The frame size depends on the number of capturing parentheses
+       in  the  pattern.  Each  additional capturing group adds two PCRE2_SIZE
+       variables.
+
+         PCRE2_INFO_HASBACKSLASHC
+
+       Return 1 if the pattern contains any instances of \C, otherwise 0.  The
+       third argument should point to an uint32_t variable.
+
+         PCRE2_INFO_HASCRORLF
+
+       Return  1  if  the  pattern  contains any explicit matches for CR or LF
+       characters, otherwise 0. The third argument should point to an uint32_t
+       variable.  An explicit match is either a literal CR or LF character, or
+       \r or  \n  or  one  of  the  equivalent  hexadecimal  or  octal  escape
+       sequences.
+
+         PCRE2_INFO_HEAPLIMIT
+
+       If the pattern set a heap memory limit by including an item of the form
+       (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu-
+       ment should point to a uint32_t integer. If no such value has been set,
+       the call to pcre2_pattern_info() returns the  error  PCRE2_ERROR_UNSET.
+       Note  that  this  limit will only be used during matching if it is less
+       than the limit set or defaulted by the caller of the match function.
+
+         PCRE2_INFO_JCHANGED
+
+       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
+       otherwise  0.  The third argument should point to an uint32_t variable.
+       (?J) and (?-J) set and unset the local PCRE2_DUPNAMES  option,  respec-
+       tively.
+
+         PCRE2_INFO_JITSIZE
+
+       If  the  compiled  pattern was successfully processed by pcre2_jit_com-
+       pile(), return the size of the  JIT  compiled  code,  otherwise  return
+       zero. The third argument should point to a size_t variable.
+
+         PCRE2_INFO_LASTCODETYPE
+
+       Return 1 if there is a rightmost literal code unit that must exist in
+       any matched string, other than at its start. The third argument  should
+       point  to  an  uint32_t  variable.  If  there  is  no  such value, 0 is
+       returned. When 1 is  returned,  the  code  unit  value  itself  can  be
+       retrieved  using PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last
+       literal value is recorded only if  it  follows  something  of  variable
+       length.  For example, for the pattern /^a\d+z\d+/ the returned value is
+       1 (with "z" returned from PCRE2_INFO_LASTCODEUNIT), but  for  /^a\dz\d/
+       the returned value is 0.
+
+         PCRE2_INFO_LASTCODEUNIT
+
+       Return  the value of the rightmost literal code unit that must exist in
+       any matched string, other than  at  its  start,  for  a  pattern  where
+       PCRE2_INFO_LASTCODETYPE returns 1. Otherwise, return 0. The third argu-
+       ment should point to an uint32_t variable.
+
+         PCRE2_INFO_MATCHEMPTY
+
+       Return 1 if the pattern might match an empty string, otherwise  0.  The
+       third  argument  should  point  to an uint32_t variable. When a pattern
+       contains recursive subroutine calls it is not always possible to deter-
+       mine  whether  or  not it can match an empty string. PCRE2 takes a cau-
+       tious approach and returns 1 in such cases.
+
+         PCRE2_INFO_MATCHLIMIT
+
+       If the pattern set a match limit by  including  an  item  of  the  form
+       (*LIMIT_MATCH=nnnn)  at  the  start,  the  value is returned. The third
+       argument should point to a uint32_t integer. If no such value has  been
+       set,    the    call   to   pcre2_pattern_info()   returns   the   error
+       PCRE2_ERROR_UNSET. Note that this limit will only be used during match-
+       ing  if it is less than the limit set or defaulted by the caller of the
+       match function.
+
+         PCRE2_INFO_MAXLOOKBEHIND
+
+       Return the number of characters (not code units) in the longest lookbe-
+       hind  assertion  in  the  pattern. The third argument should point to a
+       uint32_t integer. This information is useful when  doing  multi-segment
+       matching  using  the  partial matching facilities. Note that the simple
+       assertions \b and \B require a one-character lookbehind. \A also regis-
+       ters  a  one-character  lookbehind, though it does not actually inspect
+       the previous character. This is to ensure that at least  one  character
+       from  the old segment is retained when a new segment is processed. Oth-
+       erwise, if there are no lookbehinds in  the  pattern,  \A  might  match
+       incorrectly at the start of a second or subsequent segment.
+
+         PCRE2_INFO_MINLENGTH
+
+       If  a  minimum  length  for  matching subject strings was computed, its
+       value is returned. Otherwise the returned value is 0. The  value  is  a
+       number  of characters, which in UTF mode may be different from the num-
+       ber of code units.  The third argument  should  point  to  an  uint32_t
+       variable.  The  value  is  a  lower bound to the length of any matching
+       string. There may not be any strings of that length  that  do  actually
+       match, but every string that does match is at least that long.
+
+         PCRE2_INFO_NAMECOUNT
+         PCRE2_INFO_NAMEENTRYSIZE
+         PCRE2_INFO_NAMETABLE
+
+       PCRE2 supports the use of named as well as numbered capturing parenthe-
+       ses. The names are just an additional way of identifying the  parenthe-
+       ses, which still acquire numbers. Several convenience functions such as
+       pcre2_substring_get_byname() are provided for extracting captured  sub-
+       strings  by  name. It is also possible to extract the data directly, by
+       first converting the name to a number in order to  access  the  correct
+       pointers  in the output vector (described with pcre2_match() below). To
+       do the conversion, you need to use the  name-to-number  map,  which  is
+       described by these three values.
+
+       The  map  consists  of a number of fixed-size entries. PCRE2_INFO_NAME-
+       COUNT gives the number of entries, and  PCRE2_INFO_NAMEENTRYSIZE  gives
+       the  size  of each entry in code units; both of these return a uint32_t
+       value. The entry size depends on the length of the longest name.
+
+       PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table.
+       This  is  a  PCRE2_SPTR  pointer to a block of code units. In the 8-bit
+       library, the first two bytes of each entry are the number of  the  cap-
+       turing parenthesis, most significant byte first. In the 16-bit library,
+       the pointer points to 16-bit code units, the first  of  which  contains
+       the  parenthesis  number.  In the 32-bit library, the pointer points to
+       32-bit code units, the first of which contains the parenthesis  number.
+       The rest of the entry is the corresponding name, zero terminated.
+
+       The  names are in alphabetical order. If (?| is used to create multiple
+       groups with the same number, as described in the section  on  duplicate
+       subpattern  numbers  in  the pcre2pattern page, the groups may be given
+       the same name, but there is only one  entry  in  the  table.  Different
+       names for groups of the same number are not permitted.
+
+       Duplicate  names  for subpatterns with different numbers are permitted,
+       but only if PCRE2_DUPNAMES is set. They appear  in  the  table  in  the
+       order  in  which  they were found in the pattern. In the absence of (?|
+       this is the order of increasing number; when (?| is used  this  is  not
+       necessarily the case because later subpatterns may have lower numbers.
+
+       As  a  simple  example of the name/number table, consider the following
+       pattern after compilation by the 8-bit library  (assume  PCRE2_EXTENDED
+       is set, so white space - including newlines - is ignored):
+
+         (?<date> (?<year>(\d\d)?\d\d) -
+         (?<month>\d\d) - (?<day>\d\d) )
+
+       There  are  four  named subpatterns, so the table has four entries, and
+       each entry in the table is eight bytes long. The table is  as  follows,
+       with non-printing bytes shown in hexadecimal, and undefined bytes shown
+       as ??:
+
+         00 01 d  a  t  e  00 ??
+         00 05 d  a  y  00 ?? ??
+         00 04 m  o  n  t  h  00
+         00 02 y  e  a  r  00 ??
+
+       When writing code to extract data  from  named  subpatterns  using  the
+       name-to-number  map,  remember that the length of the entries is likely
+       to be different for each compiled pattern.
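+
+       The following sketch walks an 8-bit name table of the form shown
+       above. It is an illustration, not library code: the function name
+       is hypothetical, the table is a hand-written copy of the example
+       with the undefined bytes set to zero, and in a real program the
+       pointer, count, and entry size would come from
+       pcre2_pattern_info(). A linear scan is used for simplicity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the group number for name, or -1 if the
   name is not in the table. Each entry holds a two-byte group number
   (most significant byte first) followed by a zero-terminated name. */
int find_group_number(const uint8_t *table, uint32_t count,
                      uint32_t entry_size, const char *name)
{
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The example table: four entries of eight bytes each. */
const uint8_t example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

+       With duplicate names (PCRE2_DUPNAMES), this sketch returns only
+       the first entry found.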
+
+         PCRE2_INFO_NEWLINE
+
+       The output is one of the following uint32_t values:
+
+         PCRE2_NEWLINE_CR       Carriage return (CR)
+         PCRE2_NEWLINE_LF       Linefeed (LF)
+         PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
+         PCRE2_NEWLINE_ANY      Any Unicode line ending
+         PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
+         PCRE2_NEWLINE_NUL      The NUL character (binary zero)
+
+       This identifies the character sequence that will be recognized as mean-
+       ing "newline" while matching.
+
+         PCRE2_INFO_SIZE
+
+       Return  the  size  of  the  compiled  pattern  in  bytes (for all three
+       libraries). The third argument should point to a size_t variable.  This
+       value  includes  the  size  of the general data block that precedes the
+       code units of the compiled pattern itself. The value that is used  when
+       pcre2_compile()  is  getting memory in which to place the compiled pat-
+       tern may be slightly larger than the value  returned  by  this  option,
+       because  there are cases where the code that calculates the size has to
+       over-estimate. Processing a pattern with  the  JIT  compiler  does  not
+       alter the value returned by this option.
+
+
+INFORMATION ABOUT A PATTERN'S CALLOUTS
+
+       int pcre2_callout_enumerate(const pcre2_code *code,
+         int (*callback)(pcre2_callout_enumerate_block *, void *),
+         void *user_data);
+
+       A script language that supports the use of string arguments in callouts
+       might like to scan all the callouts in a  pattern  before  running  the
+       match. This can be done by calling pcre2_callout_enumerate(). The first
+       argument is a pointer to a compiled pattern, the  second  points  to  a
+       callback  function,  and the third is arbitrary user data. The callback
+       function is called for every callout in the pattern  in  the  order  in
+       which they appear. Its first argument is a pointer to a callout enumer-
+       ation block, and its second argument is the user_data  value  that  was
+       passed  to  pcre2_callout_enumerate(). The contents of the callout enu-
+       meration block are described in the pcre2callout  documentation,  which
+       also gives further details about callouts.
+
+
+SERIALIZATION AND PRECOMPILING
+
+       It  is  possible  to  save  compiled patterns on disc or elsewhere, and
+       reload them later, subject to a number of  restrictions.  The  host  on
+       which  the  patterns  are  reloaded must be running the same version of
+       PCRE2, with the same code unit width, and must also have the same endi-
+       anness,  pointer  width,  and PCRE2_SIZE type. Before compiled patterns
+       can be saved, they must be converted to a "serialized" form,  which  in
+       the  case of PCRE2 is really just a bytecode dump.  The functions whose
+       names begin with pcre2_serialize_ are used for converting to  and  from
+       the  serialized form. They are described in the pcre2serialize documen-
+       tation. Note that PCRE2 serialization does not  convert  compiled  pat-
+       terns to an abstract format like Java or .NET serialization.
+
+
+THE MATCH DATA BLOCK
+
+       pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize,
+         pcre2_general_context *gcontext);
+
+       pcre2_match_data *pcre2_match_data_create_from_pattern(
+         const pcre2_code *code, pcre2_general_context *gcontext);
+
+       void pcre2_match_data_free(pcre2_match_data *match_data);
+
+       Information  about  a  successful  or unsuccessful match is placed in a
+       match data block, which is an opaque  structure  that  is  accessed  by
+       function  calls.  In particular, the match data block contains a vector
+       of offsets into the subject string that define the matched part of  the
+       subject  and  any  substrings  that were captured. This is known as the
+       ovector.
+
+       Before calling pcre2_match(), pcre2_dfa_match(),  or  pcre2_jit_match()
+       you must create a match data block by calling one of the creation func-
+       tions above. For pcre2_match_data_create(), the first argument  is  the
+       number  of  pairs  of  offsets  in  the ovector. One pair of offsets is
+       required to identify the string that matched the whole pattern, with an
+       additional  pair for each captured substring. For example, a value of 4
+       creates enough space to record the matched portion of the subject  plus
+       three captured substrings. A minimum of 1 pair is imposed by
+       pcre2_match_data_create(), so it is always possible to return the over-
+       all matched string.
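+
+       To illustrate the pairs of offsets, the sketch below copies one
+       substring out of a subject given an ovector. It makes several
+       assumptions: size_t stands in for PCRE2_SIZE, pair n is taken to
+       occupy elements 2n (start) and 2n+1 (end), an unset group is taken
+       to have an all-ones start offset, and the function name is
+       hypothetical. In a real program the convenience functions of the
+       library would normally be used instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy substring n into buf, zero-terminated.
   Returns its length, or -1 if n is out of range, the group is unset,
   or buf is too small. */
long copy_group(const char *subject, const size_t *ovector, size_t pairs,
                size_t n, char *buf, size_t bufsize)
{
    if (n >= pairs)
        return -1;
    size_t start = ovector[2 * n];
    size_t end = ovector[2 * n + 1];
    if (start == (size_t)-1 || start > end)
        return -1;
    size_t len = end - start;
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (long)len;
}
```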
+
+       The second argument of pcre2_match_data_create() is a pointer to a gen-
+       eral context, which can specify custom memory management for  obtaining
+       the memory for the match data block. If you are not using custom memory
+       management, pass NULL, which causes malloc() to be used.
+
+       For pcre2_match_data_create_from_pattern(), the  first  argument  is  a
+       pointer to a compiled pattern. The ovector is created to be exactly the
+       right size to hold all the substrings a pattern might capture. The sec-
+       ond  argument is again a pointer to a general context, but in this case
+       if NULL is passed, the memory is obtained using the same allocator that
+       was used for the compiled pattern (custom or default).
+
+       A  match  data block can be used many times, with the same or different
+       compiled patterns. You can extract information from a match data  block
+       after  a  match  operation  has  finished,  using  functions  that  are
+       described in the sections on  matched  strings  and  other  match  data
+       below.
+
+       When  a  call  of  pcre2_match()  fails, valid data is available in the
+       match   block   only   when   the   error    is    PCRE2_ERROR_NOMATCH,
+       PCRE2_ERROR_PARTIAL,  or  one  of  the  error  codes for an invalid UTF
+       string. Exactly what is available depends on the error, and is detailed
+       below.
+
+       When  one of the matching functions is called, pointers to the compiled
+       pattern and the subject string are set in the match data block so  that
+       they  can  be  referenced  by the extraction functions. After running a
+       match, you must not free a compiled pattern or a subject  string  until
+       after  all  operations  on  the  match data block (for that match) have
+       taken place.
+
+       When a match data block itself is no longer needed, it should be  freed
+       by  calling  pcre2_match_data_free(). If this function is called with a
+       NULL argument, it returns immediately, without doing anything.
+
+
+MATCHING A PATTERN: THE TRADITIONAL FUNCTION
+
+       int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext);
+
+       The function pcre2_match() is called to match a subject string  against
+       a  compiled pattern, which is passed in the code argument. You can call
+       pcre2_match() with the same code argument as many times as you like, in
+       order  to  find multiple matches in the subject string or to match dif-
+       ferent subject strings with the same pattern.
+
+       This function is the main matching facility  of  the  library,  and  it
+       operates  in  a  Perl-like  manner. For specialist use there is also an
+       alternative matching function, which is described below in the  section
+       about the pcre2_dfa_match() function.
+
+       Here is an example of a simple call to pcre2_match():
+
+         pcre2_match_data *md = pcre2_match_data_create(4, NULL);
+         int rc = pcre2_match(
+           re,             /* result of pcre2_compile() */
+           "some string",  /* the subject string */
+           11,             /* the length of the subject string */
+           0,              /* start at offset 0 in the subject */
+           0,              /* default options */
+           md,             /* the match data block */
+           NULL);          /* a match context; NULL means use defaults */
+
+       If  the  subject  string is zero-terminated, the length can be given as
+       PCRE2_ZERO_TERMINATED. A match context must be provided if certain less
+       common matching parameters are to be changed. For details, see the sec-
+       tion on the match context above.
+
+   The string to be matched by pcre2_match()
+
+       The subject string is passed to pcre2_match() as a pointer in  subject,
+       a  length  in  length, and a starting offset in startoffset. The length
+       and offset are in code units, not characters.  That  is,  they  are  in
+       bytes  for the 8-bit library, 16-bit code units for the 16-bit library,
+       and 32-bit code units for the 32-bit library, whether or not  UTF  pro-
+       cessing is enabled.
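+
+       The difference between code units and characters matters for UTF-8
+       subjects. As a small sketch (the function name is hypothetical and
+       the input is assumed to be valid UTF-8), the number of characters
+       can be found by counting the bytes that are not continuation
+       bytes, whereas the length passed to pcre2_match() is the byte
+       count:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the characters in a valid UTF-8 string of
   length bytes. Continuation bytes have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const uint8_t *s, size_t length)
{
    size_t chars = 0;
    for (size_t i = 0; i < length; i++)
        if ((s[i] & 0xc0) != 0x80)
            chars++;
    return chars;
}
```

+       For the five-byte UTF-8 encoding of "caf" followed by U+00E9, this
+       yields four characters, but the length argument would be 5.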
+
+       If startoffset is greater than the length of the subject, pcre2_match()
+       returns PCRE2_ERROR_BADOFFSET. When the starting offset  is  zero,  the
+       search  for a match starts at the beginning of the subject, and this is
+       by far the most common case. In UTF-8 or UTF-16 mode, the starting off-
+       set  must  point to the start of a character, or to the end of the sub-
+       ject (in UTF-32 mode, one code unit equals one character, so  all  off-
+       sets  are  valid).  Like  the  pattern  string, the subject may contain
+       binary zeros.
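+
+       A caller working in UTF-8 mode might check a starting offset
+       before passing it on. The sketch below applies the two rules from
+       the paragraph above; the function name is hypothetical and the
+       subject is assumed to be valid UTF-8:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: non-zero if offset is acceptable as startoffset
   for a UTF-8 subject of the given length in bytes. */
int is_valid_start_offset(const uint8_t *subject, size_t length,
                          size_t offset)
{
    if (offset > length)    /* would give PCRE2_ERROR_BADOFFSET */
        return 0;
    if (offset == length)   /* the end of the subject is allowed */
        return 1;
    /* Must not point into the middle of a multi-byte character. */
    return (subject[offset] & 0xc0) != 0x80;
}
```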
+
+       A non-zero starting offset is useful when searching for  another  match
+       in  the  same  subject  by calling pcre2_match() again after a previous
+       success.  Setting startoffset differs from  passing  over  a  shortened
+       string  and  setting  PCRE2_NOTBOL in the case of a pattern that begins
+       with any kind of lookbehind. For example, consider the pattern
+
+         \Biss\B
+
+       which finds occurrences of "iss" in the middle of  words.  (\B  matches
+       only  if  the  current position in the subject is not a word boundary.)
+       When applied to the string "Mississippi", the first call to
+       pcre2_match() finds the first occurrence. If pcre2_match() is called
+       again with just the remainder of the subject, namely "issippi", it does
+       not match, because \B is always false at the start of the subject,
+       which is deemed to be a word boundary. However, if pcre2_match() is
+       passed the entire string again, but with startoffset set to 4, it finds
+       the second occurrence of "iss" because it is able to look behind the
+       starting point to discover that it is preceded by a letter.
+
+       Finding  all  the  matches  in a subject is tricky when the pattern can
+       match an empty string. It is possible to emulate Perl's /g behaviour by
+       first   trying   the   match   again  at  the  same  offset,  with  the
+       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED options,  and  then  if  that
+       fails,  advancing  the  starting  offset  and  trying an ordinary match
+       again. There is some code that demonstrates  how  to  do  this  in  the
+       pcre2demo  sample  program. In the most general case, you have to check
+       to see if the newline convention recognizes CRLF as a newline,  and  if
+       so,  and the current character is CR followed by LF, advance the start-
+       ing offset by two characters instead of one.
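+
+       The offset-advancing step described above can be sketched as a small
+       helper for the 8-bit library. This is an illustration only:
+       next_start_offset() and its crlf_is_newline and utf8 parameters are
+       hypothetical names, not part of the PCRE2 API, and the caller is
+       assumed to have already worked out the newline convention and whether
+       UTF-8 mode is in use.
+
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PCRE2 function): given that the anchored,
 * non-empty retry at `offset` has failed, return the offset at which the
 * next ordinary match attempt should start.
 *
 * crlf_is_newline: non-zero if the newline convention recognizes CRLF.
 * utf8:            non-zero if the pattern was compiled in UTF-8 mode.
 */
size_t next_start_offset(const unsigned char *subject, size_t length,
                         size_t offset, int crlf_is_newline, int utf8)
{
    assert(subject != NULL && offset <= length);

    if (offset == length)
        return length;              /* nothing left; the caller should stop */

    /* Treat CR followed by LF as one unit when CRLF is a valid newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    offset++;

    /* In UTF-8 mode, skip continuation bytes (10xxxxxx) so that the new
       offset points to the start of a character. */
    if (utf8) {
        while (offset < length && (subject[offset] & 0xC0) == 0x80)
            offset++;
    }
    return offset;
}
```
+
+       A caller would use the returned value as startoffset for the next
+       ordinary pcre2_match() call, and stop looping once it equals the
+       subject length and no further match is found.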
+
+       If a non-zero starting offset is passed when the pattern is anchored, a
+       single attempt to match at the given offset is made. This can only suc-
+       ceed if the pattern does not require the match to be at  the  start  of
+       the  subject.  In other words, the anchoring must be the result of set-
+       ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL,  not
+       by starting the pattern with ^ or \A.
+
+   Option bits for pcre2_match()
+
+       The unused bits of the options argument for pcre2_match() must be zero.
+       The only bits that may be set  are  PCRE2_ANCHORED,  PCRE2_ENDANCHORED,
+       PCRE2_NOTBOL,   PCRE2_NOTEOL,  PCRE2_NOTEMPTY,  PCRE2_NOTEMPTY_ATSTART,
+       PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,  and  PCRE2_PAR-
+       TIAL_SOFT.  Their action is described below.
+
+       Setting  PCRE2_ANCHORED  or PCRE2_ENDANCHORED at match time is not sup-
+       ported by the just-in-time (JIT) compiler. If it is set,  JIT  matching
+       is  disabled  and  the interpretive code in pcre2_match() is run. Apart
+       from PCRE2_NO_JIT (obviously), the remaining options are supported  for
+       JIT matching.
+
+         PCRE2_ANCHORED
+
+       The PCRE2_ANCHORED option limits pcre2_match() to matching at the first
+       matching position. If a pattern was compiled  with  PCRE2_ANCHORED,  or
+       turned  out to be anchored by virtue of its contents, it cannot be made
+       unanchored at matching time. Note that setting the option at match time
+       disables JIT matching.
+
+         PCRE2_ENDANCHORED
+
+       If  the  PCRE2_ENDANCHORED option is set, any string that pcre2_match()
+       matches must be right at the end of the subject string. Note that  set-
+       ting the option at match time disables JIT matching.
+
+         PCRE2_NOTBOL
+
+       This option specifies that the first character of the subject string is
+       not the beginning of a line, so the circumflex metacharacter should not
+       match before it. Setting this without having set PCRE2_MULTILINE at
+       compile time causes circumflex never to match. This option affects only
+       the behaviour of the circumflex metacharacter. It does not affect \A.
+
+         PCRE2_NOTEOL
+
+       This option specifies that the end of the subject string is not the end
+       of a line, so the dollar metacharacter should not match it nor  (except
+       in  multiline mode) a newline immediately before it. Setting this with-
+       out having set PCRE2_MULTILINE at compile time causes dollar  never  to
+       match. This option affects only the behaviour of the dollar metacharac-
+       ter. It does not affect \Z or \z.
+
+         PCRE2_NOTEMPTY
+
+       An empty string is not considered to be a valid match if this option is
+       set.  If  there are alternatives in the pattern, they are tried. If all
+       the alternatives match the empty string, the entire  match  fails.  For
+       example, if the pattern
+
+         a?b?
+
+       is  applied  to  a  string not beginning with "a" or "b", it matches an
+       empty string at the start of the subject. With PCRE2_NOTEMPTY set, this
+       match  is  not valid, so pcre2_match() searches further into the string
+       for occurrences of "a" or "b".
+
+         PCRE2_NOTEMPTY_ATSTART
+
+       This is like PCRE2_NOTEMPTY, except that it locks out an  empty  string
+       match only at the first matching position, that is, at the start of the
+       subject plus the starting offset. An empty string match  later  in  the
+       subject  is  permitted.   If  the pattern is anchored, such a match can
+       occur only if the pattern contains \K.
+
+         PCRE2_NO_JIT
+
+       By  default,  if  a  pattern  has  been   successfully   processed   by
+       pcre2_jit_compile(),  JIT  is  automatically used when pcre2_match() is
+       called with options that JIT supports.  Setting  PCRE2_NO_JIT  disables
+       the use of JIT; it forces matching to be done by the interpreter.
+
+         PCRE2_NO_UTF_CHECK
+
+       When PCRE2_UTF is set at compile time, the validity of the subject as a
+       UTF string is checked by default  when  pcre2_match()  is  subsequently
+       called.   If  a non-zero starting offset is given, the check is applied
+       only to that part of the subject that could be inspected during  match-
+       ing,  and there is a check that the starting offset points to the first
+       code unit of a character or to the end of the subject. If there are  no
+       lookbehind  assertions in the pattern, the check starts at the starting
+       offset. Otherwise, it starts at the length of  the  longest  lookbehind
+       before the starting offset, or at the start of the subject if there are
+       not that many characters before the  starting  offset.  Note  that  the
+       sequences \b and \B are one-character lookbehinds.
+
+       The check is carried out before any other processing takes place, and a
+       negative error code is returned if the check fails. There  are  several
+       UTF  error  codes  for each code unit width, corresponding to different
+       problems with the code unit sequence. There are discussions  about  the
+       validity  of  UTF-8  strings, UTF-16 strings, and UTF-32 strings in the
+       pcre2unicode page.
+
+       If you know that your subject is valid, and  you  want  to  skip  these
+       checks  for  performance  reasons,  you  can set the PCRE2_NO_UTF_CHECK
+       option when calling pcre2_match(). You might want to do  this  for  the
+       second and subsequent calls to pcre2_match() if you are making repeated
+       calls to find other matches in the same subject string.
+
+       Warning: When PCRE2_NO_UTF_CHECK is  set,  the  effect  of  passing  an
+       invalid  string  as  a  subject, or an invalid value of startoffset, is
+       undefined.  Your program may crash or loop indefinitely.
+
+         PCRE2_PARTIAL_HARD
+         PCRE2_PARTIAL_SOFT
+
+       These options turn on the partial matching  feature.  A  partial  match
+       occurs  if  the  end of the subject string is reached successfully, but
+       there are not enough subject characters to complete the match. If  this
+       happens  when  PCRE2_PARTIAL_SOFT  (but not PCRE2_PARTIAL_HARD) is set,
+       matching continues by testing any remaining alternatives.  Only  if  no
+       complete  match can be found is PCRE2_ERROR_PARTIAL returned instead of
+       PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies  that
+       the  caller  is prepared to handle a partial match, but only if no com-
+       plete match can be found.
+
+       If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In  this
+       case,  if  a  partial match is found, pcre2_match() immediately returns
+       PCRE2_ERROR_PARTIAL, without considering  any  other  alternatives.  In
+       other words, when PCRE2_PARTIAL_HARD is set, a partial match is consid-
+       ered to be more important than an alternative complete match.
+
+       There is a more detailed discussion of partial and multi-segment match-
+       ing, with examples, in the pcre2partial documentation.
+
+
+NEWLINE HANDLING WHEN MATCHING
+
+       When  PCRE2 is built, a default newline convention is set; this is usu-
+       ally the standard convention for the operating system. The default  can
+       be  overridden  in a compile context by calling pcre2_set_newline(). It
+       can also be overridden by starting a pattern string with, for  example,
+       (*CRLF),  as  described  in  the  section on newline conventions in the
+       pcre2pattern page. During matching, the newline choice affects the  be-
+       haviour  of the dot, circumflex, and dollar metacharacters. It may also
+       alter the way the match starting position is  advanced  after  a  match
+       failure for an unanchored pattern.
+
+       When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is
+       set as the newline convention, and a match attempt  for  an  unanchored
+       pattern fails when the current starting position is at a CRLF sequence,
+       and the pattern contains no explicit matches for CR or  LF  characters,
+       the  match  position  is  advanced by two characters instead of one, in
+       other words, to after the CRLF.
+
+       The above rule is a compromise that makes the most common cases work as
+       expected.  For  example,  if  the  pattern is .+A (and the PCRE2_DOTALL
+       option is not set), it does not match the string "\r\nA" because, after
+       failing  at the start, it skips both the CR and the LF before retrying.
+       However, the pattern [\r\n]A does match that string,  because  it  con-
+       tains an explicit CR or LF reference, and so advances only by one char-
+       acter after the first failure.
+
+       An explicit match for CR or LF is either a literal appearance of one of
+       those  characters  in the pattern, or one of the \r or \n or equivalent
+       octal or hexadecimal escape sequences. Implicit matches such as [^X] do
+       not  count, nor does \s, even though it includes CR and LF in the char-
+       acters that it matches.
+
+       Notwithstanding the above, anomalous effects may still occur when  CRLF
+       is a valid newline sequence and explicit \r or \n escapes appear in the
+       pattern.
+
+
+HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
+
+       uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data);
+
+       PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data);
+
+       In general, a pattern matches a certain portion of the subject, and  in
+       addition,  further  substrings  from  the  subject may be picked out by
+       parenthesized parts of the pattern.  Following  the  usage  in  Jeffrey
+       Friedl's  book,  this  is  called  "capturing" in what follows, and the
+       phrase "capturing subpattern" or "capturing group" is used for a  frag-
+       ment  of  a  pattern that picks out a substring. PCRE2 supports several
+       other kinds of parenthesized subpattern that do not cause substrings to
+       be  captured. The pcre2_pattern_info() function can be used to find out
+       how many capturing subpatterns there are in a compiled pattern.
+
+       You can use auxiliary functions for accessing  captured  substrings  by
+       number or by name, as described in sections below.
+
+       Alternatively, you can make direct use of the vector of PCRE2_SIZE val-
+       ues, called  the  ovector,  which  contains  the  offsets  of  captured
+       strings.   It   is   part  of  the  match  data  block.   The  function
+       pcre2_get_ovector_pointer() returns the address  of  the  ovector,  and
+       pcre2_get_ovector_count() returns the number of pairs of values it con-
+       tains.
+
+       Within the ovector, the first in each pair of values is set to the off-
+       set of the first code unit of a substring, and the second is set to the
+       offset of the first code unit after the end of a substring. These  val-
+       ues  are always code unit offsets, not character offsets. That is, they
+       are byte offsets in the 8-bit library, 16-bit  offsets  in  the  16-bit
+       library, and 32-bit offsets in the 32-bit library.
+
+       After  a  partial  match  (error  return PCRE2_ERROR_PARTIAL), only the
+       first pair of offsets (that is, ovector[0]  and  ovector[1])  are  set.
+       They  identify  the part of the subject that was partially matched. See
+       the pcre2partial documentation for details of partial matching.
+
+       After a fully successful match, the first pair  of  offsets  identifies
+       the  portion  of the subject string that was matched by the entire pat-
+       tern. The next pair is used for the first captured  substring,  and  so
+       on.  The  value  returned by pcre2_match() is one more than the highest
+       numbered pair that has been set. For example, if  two  substrings  have
+       been  captured,  the returned value is 3. If there are no captured sub-
+       strings, the return value from a successful match is 1, indicating that
+       just the first pair of offsets has been set.
+
+       If  a  pattern uses the \K escape sequence within a positive assertion,
+       the reported start of a successful match can be greater than the end of
+       the  match.   For  example,  if the pattern (?=ab\K) is matched against
+       "ab", the start and end offset values for the match are 2 and 0.
+
+       If  a  capturing  subpattern  is  matched  repeatedly  within a  single
+       match  operation, it is the last portion of the subject that it matched
+       that is returned.
+
+       If the ovector is too small to hold all the captured substring offsets,
+       as  much  as possible is filled in, and the function returns a value of
+       zero. If captured substrings are not of interest, pcre2_match() may  be
+       called with a match data block whose ovector is of minimum length (that
+       is, one pair).
+
+       It is possible for capturing subpattern number n+1 to match  some  part
+       of the subject when subpattern n has not been used at all. For example,
+       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
+       return from the function is 4, and subpatterns 1 and 3 are matched, but
+       2 is not. When this happens, both values in  the  offset  pairs  corre-
+       sponding to unused subpatterns are set to PCRE2_UNSET.
+
+       Offset  values  that correspond to unused subpatterns at the end of the
+       expression are also set to PCRE2_UNSET.  For  example,  if  the  string
+       "abc" is matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3
+       are not matched. The return from the function is 2, because the highest
+       used capturing subpattern number is 1. The offsets for the second and
+       third capturing subpatterns (assuming the vector is large enough, of
+       course) are set to PCRE2_UNSET.
+
+       Elements in the ovector that do not correspond to capturing parentheses
+       in the pattern are never changed. That is, if a pattern contains n cap-
+       turing parentheses, no more than ovector[0] to ovector[2n+1] are set by
+       pcre2_match(). The other elements retain whatever  values  they  previ-
+       ously  had.  After  a failed match attempt, the contents of the ovector
+       are unchanged.
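+
+       The rules above for reading the ovector can be sketched as follows.
+       This is a minimal illustration rather than library code: group_span()
+       and OFFSET_UNSET are hypothetical names, with OFFSET_UNSET standing in
+       for PCRE2_UNSET and size_t standing in for PCRE2_SIZE so that the
+       fragment compiles without the PCRE2 header. The rc argument is the
+       value returned by pcre2_match(), and pairs is the value returned by
+       pcre2_get_ovector_count().
+
```c
#include <assert.h>
#include <stddef.h>

#define OFFSET_UNSET (~(size_t)0)   /* stand-in for PCRE2_UNSET */

/*
 * Hypothetical helper: report whether capture group n was set by a match.
 * Returns 1 and stores the start offset and length (in code units) if the
 * group is set, and 0 otherwise.
 */
int group_span(const size_t *ovector, size_t pairs, int rc, int n,
               size_t *start, size_t *length)
{
    assert(ovector != NULL && start != NULL && length != NULL && n >= 0);

    if (rc < 0)
        return 0;                   /* no match, partial match, or error */
    if (rc == 0)
        rc = (int)pairs;            /* ovector too small: all pairs filled */
    if (n >= rc)
        return 0;                   /* beyond the highest pair that was set */
    if (ovector[2 * n] == OFFSET_UNSET)
        return 0;                   /* group did not participate */

    *start = ovector[2 * n];
    /* With \K inside a positive assertion the start can exceed the end;
       treat that case as an empty string. */
    *length = (ovector[2 * n + 1] >= *start) ?
              ovector[2 * n + 1] - *start : 0;
    return 1;
}
```
+
+       For the (a|(z))(bc) example above, group 2 is reported as unset while
+       groups 1 and 3 yield the spans of "a" and "bc".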
+
+
+OTHER INFORMATION ABOUT A MATCH
+
+       PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data);
+
+       PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data);
+
+       As well as the offsets in the ovector, other information about a  match
+       is  retained  in the match data block and can be retrieved by the above
+       functions in appropriate circumstances. If they  are  called  at  other
+       times, the result is undefined.
+
+       After  a  successful match, a partial match (PCRE2_ERROR_PARTIAL), or a
+       failure to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN)
+       name  may  be available. The function pcre2_get_mark() can be called to
+       access this name. The same function applies  to  all  three  verbs.  It
+       returns a pointer to the zero-terminated name, which is within the com-
+       piled pattern. If no name is available, NULL is returned. The length of
+       the  name  (excluding  the terminating zero) is stored in the code unit
+       that precedes the name. You should use this length instead  of  relying
+       on the terminating zero if the name might contain a binary zero.
+
+       After  a  successful  match,  the  name  that  is  returned is the last
+       (*MARK), (*PRUNE), or (*THEN) name encountered  on  the  matching  path
+       through  the  pattern.  Instances of (*PRUNE) and (*THEN) without names
+       are  ignored.  Thus,  for  example,  if  the  matching  path   contains
+       (*MARK:A)(*PRUNE),  the  name "A" is returned.  After a "no match" or a
+       partial match, the last encountered name  is  returned.   For  example,
+       consider this pattern:
+
+         ^(*MARK:A)((*MARK:B)a|b)c
+
+       When  it  matches "bc", the returned name is A. The B mark is "seen" in
+       the first branch of the group, but it is not on the matching  path.  On
+       the  other  hand,  when  this pattern fails to match "bx", the returned
+       name is B.
+
+       Warning: By default, certain start-of-match optimizations are  used  to
+       give  a  fast "no match" result in some situations. For example, if the
+       anchoring is removed from the pattern above, there is an initial  check
+       for  the  presence  of  "c"  in the subject before running the matching
+       engine. This check fails for "bx", causing a match failure without see-
+       ing any marks. You can disable the start-of-match optimizations by set-
+       ting the PCRE2_NO_START_OPTIMIZE option for pcre2_compile() or starting
+       the pattern with (*NO_START_OPT).
+
+       After  a  successful  match, a partial match, or one of the invalid UTF
+       errors (for example, PCRE2_ERROR_UTF8_ERR5), pcre2_get_startchar()  can
+       be called. After a successful or partial match it returns the code unit
+       offset of the character at which the match started. For  a  non-partial
+       match,  this can be different to the value of ovector[0] if the pattern
+       contains the \K escape sequence. After a partial match,  however,  this
+       value  is  always the same as ovector[0] because \K does not affect the
+       result of a partial match.
+
+       After a UTF check failure, pcre2_get_startchar() can be used to  obtain
+       the code unit offset of the invalid UTF character. Details are given in
+       the pcre2unicode page.
+
+
+ERROR RETURNS FROM pcre2_match()
+
+       If pcre2_match() fails, it returns a negative number. This can be  con-
+       verted  to a text string by calling the pcre2_get_error_message() func-
+       tion (see "Obtaining a textual error message" below).   Negative  error
+       codes  are  also  returned  by other functions, and are documented with
+       them. The codes are given names in the header file. If UTF checking  is
+       in force and an invalid UTF subject string is detected, one of a number
+       of UTF-specific negative error codes is returned. Details are given  in
+       the  pcre2unicode  page. The following are the other errors that may be
+       returned by pcre2_match():
+
+         PCRE2_ERROR_NOMATCH
+
+       The subject string did not match the pattern.
+
+         PCRE2_ERROR_PARTIAL
+
+       The subject string did not match, but it did match partially.  See  the
+       pcre2partial documentation for details of partial matching.
+
+         PCRE2_ERROR_BADMAGIC
+
+       PCRE2 stores a 4-byte "magic number" at the start of the compiled code,
+       to catch the case when it is passed a junk pointer. This is  the  error
+       that is returned when the magic number is not present.
+
+         PCRE2_ERROR_BADMODE
+
+       This  error is given when a compiled pattern is passed to a function in
+       a library of a different code unit width, for example, a  pattern  com-
+       piled  by  the  8-bit  library  is passed to a 16-bit or 32-bit library
+       function.
+
+         PCRE2_ERROR_BADOFFSET
+
+       The value of startoffset was greater than the length of the subject.
+
+         PCRE2_ERROR_BADOPTION
+
+       An unrecognized bit was set in the options argument.
+
+         PCRE2_ERROR_BADUTFOFFSET
+
+       The UTF code unit sequence that was passed as a subject was checked and
+       found  to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the
+       value of startoffset did not point to the beginning of a UTF  character
+       or the end of the subject.
+
+         PCRE2_ERROR_CALLOUT
+
+       This  error  is never generated by pcre2_match() itself. It is provided
+       for use by callout  functions  that  want  to  cause  pcre2_match()  or
+       pcre2_callout_enumerate()  to  return a distinctive error code. See the
+       pcre2callout documentation for details.
+
+         PCRE2_ERROR_DEPTHLIMIT
+
+       The nested backtracking depth limit was reached.
+
+         PCRE2_ERROR_HEAPLIMIT
+
+       The heap limit was reached.
+
+         PCRE2_ERROR_INTERNAL
+
+       An unexpected internal error has occurred. This error could  be  caused
+       by a bug in PCRE2 or by overwriting of the compiled pattern.
+
+         PCRE2_ERROR_JIT_STACKLIMIT
+
+       This error is returned when a pattern that  was  successfully  compiled
+       using JIT is being matched, but the memory available for  the  just-in-
+       time  processing stack is not large enough. See the pcre2jit documenta-
+       tion for more details.
+
+         PCRE2_ERROR_MATCHLIMIT
+
+       The backtracking match limit was reached.
+
+         PCRE2_ERROR_NOMEMORY
+
+       If a pattern contains many nested backtracking points, heap  memory  is
+       used  to  remember them. This error is given when the memory allocation
+       function (default or  custom)  fails.  Note  that  a  different  error,
+       PCRE2_ERROR_HEAPLIMIT,  is given if the amount of memory needed exceeds
+       the heap limit.
+
+         PCRE2_ERROR_NULL
+
+       Either the code, subject, or match_data argument was passed as NULL.
+
+         PCRE2_ERROR_RECURSELOOP
+
+       This error is returned when  pcre2_match()  detects  a  recursion  loop
+       within  the  pattern. Specifically, it means that either the whole pat-
+       tern or a subpattern has been called recursively for the second time at
+       the  same  position  in  the  subject string. Some simple patterns that
+       might do this are detected and faulted at compile time, but  more  com-
+       plicated  cases,  in particular mutual recursions between two different
+       subpatterns, cannot be detected until matching is attempted.
+
+
+OBTAINING A TEXTUAL ERROR MESSAGE
+
+       int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
+         PCRE2_SIZE bufflen);
+
+       A text message for an error code  from  any  PCRE2  function  (compile,
+       match,  or  auxiliary)  can be obtained by calling pcre2_get_error_mes-
+       sage(). The code is passed as the first argument,  with  the  remaining
+       two  arguments  specifying  a  code  unit buffer and its length in code
+       units, into which the text message is placed. The message  is  returned
+       in  code  units  of the appropriate width for the library that is being
+       used.
+
+       The returned message is terminated with a trailing zero, and the  func-
+       tion  returns  the  number  of  code units used, excluding the trailing
+       zero.  If  the  error  number  is  unknown,  the  negative  error  code
+       PCRE2_ERROR_BADDATA  is  returned. If the buffer is too small, the mes-
+       sage is truncated (but still with a trailing zero),  and  the  negative
+       error  code PCRE2_ERROR_NOMEMORY is returned.  None of the messages are
+       very long; a buffer size of 120 code units is ample.
+
+
+EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
+
+       int pcre2_substring_length_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_SIZE *length);
+
+       int pcre2_substring_copy_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_UCHAR *buffer,
+         PCRE2_SIZE *bufflen);
+
+       int pcre2_substring_get_bynumber(pcre2_match_data *match_data,
+         uint32_t number, PCRE2_UCHAR **bufferptr,
+         PCRE2_SIZE *bufflen);
+
+       void pcre2_substring_free(PCRE2_UCHAR *buffer);
+
+       Captured substrings can be accessed directly by using  the  ovector  as
+       described above.  For convenience, auxiliary functions are provided for
+       extracting  captured  substrings  as  new,  separate,   zero-terminated
+       strings. A substring that contains a binary zero is correctly extracted
+       and has a further zero added on the end, but  the  result  is  not,  of
+       course, a C string.
+
+       The functions in this section identify substrings by number. The number
+       zero refers to the entire matched substring, with higher numbers refer-
+       ring  to  substrings  captured by parenthesized groups. After a partial
+       match, only substring zero is available.  An  attempt  to  extract  any
+       other  substring  gives the error PCRE2_ERROR_PARTIAL. The next section
+       describes similar functions for extracting captured substrings by name.
+
+       If a pattern uses the \K escape sequence within a  positive  assertion,
+       the reported start of a successful match can be greater than the end of
+       the match.  For example, if the pattern  (?=ab\K)  is  matched  against
+       "ab",  the  start  and  end offset values for the match are 2 and 0. In
+       this situation, calling these functions with a  zero  substring  number
+       extracts a zero-length empty string.
+
+       You  can  find the length in code units of a captured substring without
+       extracting it by calling pcre2_substring_length_bynumber().  The  first
+       argument  is a pointer to the match data block, the second is the group
+       number, and the third is a pointer to a variable into which the  length
+       is  placed.  If  you just want to know whether or not the substring has
+       been captured, you can pass the third argument as NULL.
+
+       The pcre2_substring_copy_bynumber() function  copies  a  captured  sub-
+       string  into  a supplied buffer, whereas pcre2_substring_get_bynumber()
+       copies it into new memory, obtained using the  same  memory  allocation
+       function  that  was  used for the match data block. The first two argu-
+       ments of these functions are a pointer to the match data  block  and  a
+       capturing group number.
+
+       The final arguments of pcre2_substring_copy_bynumber() are a pointer to
+       the buffer and a pointer to a variable that contains its length in code
+       units.  This is updated to contain the actual number of code units used
+       for the extracted substring, excluding the terminating zero.
+
+       For pcre2_substring_get_bynumber() the third and fourth arguments point
+       to  variables that are updated with a pointer to the new memory and the
+       number of code units that comprise the substring, again  excluding  the
+       terminating  zero.  When  the substring is no longer needed, the memory
+       should be freed by calling pcre2_substring_free().
+
+       The return value from all these functions is zero  for  success,  or  a
+       negative  error  code.  If  the pattern match failed, the match failure
+       code is returned.  If a substring number  greater  than  zero  is  used
+       after  a partial match, PCRE2_ERROR_PARTIAL is returned. Other possible
+       error codes are:
+
+         PCRE2_ERROR_NOMEMORY
+
+       The buffer was too small for  pcre2_substring_copy_bynumber(),  or  the
+       attempt to get memory failed for pcre2_substring_get_bynumber().
+
+         PCRE2_ERROR_NOSUBSTRING
+
+       There  is  no  substring  with that number in the pattern, that is, the
+       number is greater than the number of capturing parentheses.
+
+         PCRE2_ERROR_UNAVAILABLE
+
+       The substring number, though not greater than the number of captures in
+       the pattern, is greater than the number of slots in the ovector, so the
+       substring could not be captured.
+
+         PCRE2_ERROR_UNSET
+
+       The substring did not participate in the match.  For  example,  if  the
+       pattern  is  (abc)|(def) and the subject is "def", and the ovector con-
+       tains at least two capturing slots, substring number 1 is unset.
+
+
+EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS
+
+       int pcre2_substring_list_get(pcre2_match_data *match_data,
+         PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr);
+
+       void pcre2_substring_list_free(PCRE2_SPTR *list);
+
+       The pcre2_substring_list_get() function  extracts  all  available  sub-
+       strings  and  builds  a  list of pointers to them. It also (optionally)
+       builds a second list that  contains  their  lengths  (in  code  units),
+       excluding a terminating zero that is added to each of them. All this is
+       done in a single block of memory that is obtained using the same memory
+       allocation function that was used to get the match data block.
+
+       This  function  must be called only after a successful match. If called
+       after a partial match, the error code PCRE2_ERROR_PARTIAL is returned.
+
+       The address of the memory block is returned via listptr, which is  also
+       the start of the list of string pointers. The end of the list is marked
+       by a NULL pointer. The address of the list of lengths is  returned  via
+       lengthsptr.  If your strings do not contain binary zeros and you do not
+       therefore need the lengths, you may supply NULL as the lengthsptr argu-
+       ment  to  disable  the  creation of a list of lengths. The yield of the
+       function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the  mem-
+       ory  block could not be obtained. When the list is no longer needed, it
+       should be freed by calling pcre2_substring_list_free().
+
+       If this function encounters a substring that is unset, which can happen
+       when  capturing subpattern number n+1 matches some part of the subject,
+       but subpattern n has not been used at all, it returns an empty  string.
+       This  can  be  distinguished  from  a  genuine zero-length substring by
+       inspecting  the  appropriate  offset  in  the  ovector, which  contains
+       PCRE2_UNSET   for   unset   substrings,   or   by   calling  pcre2_sub-
+       string_length_bynumber().
+
+
+EXTRACTING CAPTURED SUBSTRINGS BY NAME
+
+       int pcre2_substring_number_from_name(const pcre2_code *code,
+         PCRE2_SPTR name);
+
+       int pcre2_substring_length_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_SIZE *length);
+
+       int pcre2_substring_copy_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen);
+
+       int pcre2_substring_get_byname(pcre2_match_data *match_data,
+         PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen);
+
+       void pcre2_substring_free(PCRE2_UCHAR *buffer);
+
+       To extract a substring by name, you first have to find the  associated
+       number. For example, for this pattern:
+
+         (a+)b(?<xxx>\d+)...
+
+       the number of the subpattern called "xxx" is 2. If the name is known to
+       be unique (PCRE2_DUPNAMES was not set), you can find  the  number  from
+       the name by calling pcre2_substring_number_from_name(). The first argu-
+       ment is the compiled pattern, and the second is the name. The yield  of
+       the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
+       is no subpattern of  that  name,  or  PCRE2_ERROR_NOUNIQUESUBSTRING  if
+       there  is  more than one subpattern of that name. Given the number, you
+       can extract the substring directly from the ovector, or use one of  the
+       "bynumber" functions described above.
+
+       For  convenience,  there are also "byname" functions that correspond to
+       the "bynumber" functions, the only difference  being  that  the  second
+       argument  is  a  name instead of a number. If PCRE2_DUPNAMES is set and
+       there are duplicate names, these functions scan all the groups with the
+       given name, and return the first named string that is set.
+
+       If  there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
+       returned. If all groups with the name have  numbers  that  are  greater
+       than  the  number  of  slots in the ovector, PCRE2_ERROR_UNAVAILABLE is
+       returned. If there is at least one group with a slot  in  the  ovector,
+       but no group is found to be set, PCRE2_ERROR_UNSET is returned.
+
+       Warning: If the pattern uses the (?| feature to set up multiple subpat-
+       terns with the same number, as described in the  section  on  duplicate
+       subpattern  numbers  in  the pcre2pattern page, you cannot use names to
+       distinguish the different subpatterns, because names are  not  included
+       in  the compiled code. The matching process uses only numbers. For this
+       reason, the use of different names for subpatterns of the  same  number
+       causes an error at compile time.
+
+
+CREATING A NEW STRING WITH SUBSTITUTIONS
+
+       int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext, PCRE2_SPTR replacement,
+         PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
+         PCRE2_SIZE *outlengthptr);
+
+       This  function calls pcre2_match() and then makes a copy of the subject
+       string in outputbuffer, replacing the part that was  matched  with  the
+       replacement  string,  whose  length is supplied in rlength. This can be
+       given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
+       which  a  \K item in a lookahead in the pattern causes the match to end
+       before it starts are not supported, and give rise to an  error  return.
+       For global replacements, matches in which \K in a lookbehind causes the
+       match to start earlier than the point that was reached in the  previous
+       iteration are also not supported.
+
+       The  first  seven  arguments  of pcre2_substitute() are the same as for
+       pcre2_match(), except that the partial matching options are not permit-
+       ted,  and  match_data may be passed as NULL, in which case a match data
+       block is obtained and freed within this function, using memory  manage-
+       ment  functions from the match context, if provided, or else those that
+       were used to allocate memory for the compiled code.
+
+       If an external match_data block is provided,  its  contents  afterwards
+       are those set by the final call to pcre2_match(), which will have ended
+       in a matching error. The contents of the ovector within the match  data
+       block may or may not have been changed.
+
+       The  outlengthptr  argument  must point to a variable that contains the
+       length, in code units, of the output buffer. If the  function  is  suc-
+       cessful,  the value is updated to contain the length of the new string,
+       excluding the trailing zero that is automatically added.
+
+       If the function is not  successful,  the  value  set  via  outlengthptr
+       depends  on  the  type  of  error. For syntax errors in the replacement
+       string, the value is the offset in the  replacement  string  where  the
+       error  was  detected.  For  other  errors,  the value is PCRE2_UNSET by
+       default. This includes the case of the output buffer being  too  small,
+       unless  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  is  set (see below), in which
+       case the value is the minimum length needed, including  space  for  the
+       trailing  zero.  Note  that  in  order  to compute the required length,
+       pcre2_substitute() has  to  simulate  all  the  matching  and  copying,
+       instead of giving an error return as soon as the buffer overflows. Note
+       also that the length is in code units, not bytes.
+
+       In the replacement string, which is interpreted as a UTF string in  UTF
+       mode,  and  is  checked  for UTF validity unless the PCRE2_NO_UTF_CHECK
+       option is set, a dollar character is an escape character that can spec-
+       ify  the  insertion  of  characters  from  capturing groups or (*MARK),
+       (*PRUNE), or (*THEN) items in the  pattern.  The  following  forms  are
+       always recognized:
+
+         $$                  insert a dollar character
+         $<n> or ${<n>}      insert the contents of group <n>
+         $*MARK or ${*MARK}  insert a (*MARK), (*PRUNE), or (*THEN) name
+
+       Either  a  group  number  or  a  group name can be given for <n>. Curly
+       brackets are required only if the following character would  be  inter-
+       preted as part of the number or name. The number may be zero to include
+       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
+       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
+       is "=+babcb+=".
+
+       $*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or
+       (*THEN)  on  the  matching  path  that  has a name. (*MARK) must always
+       include a name, but (*PRUNE) and (*THEN) need not. For example, in  the
+       case   of   (*MARK:A)(*PRUNE)   the  name  inserted  is  "A",  but  for
+       (*MARK:A)(*PRUNE:B) the relevant name is "B".   This  facility  can  be
+       used  to  perform  simple simultaneous substitutions, as this pcre2test
+       example shows:
+
+         /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
+             apple lemon
+          2: pear orange
+
+       As well as the usual options for pcre2_match(), a number of  additional
+       options can be set in the options argument of pcre2_substitute().
+
+       PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
+       string, replacing every matching substring. If this option is not  set,
+       only  the  first matching substring is replaced. The search for matches
+       takes place in the original subject string (that is, previous  replace-
+       ments  do  not  affect  it).  Iteration is implemented by advancing the
+       startoffset value for each search, which is always  passed  the  entire
+       subject string. If an offset limit is set in the match context, search-
+       ing stops when that limit is reached.
+
+       You can restrict the effect of a global substitution to  a  portion  of
+       the subject string by setting either or both of startoffset and an off-
+       set limit. Here is a pcre2test example:
+
+         /B/g,replace=!,use_offset_limit
+         ABC ABC ABC ABC\=offset=3,offset_limit=12
+          2: ABC A!C A!C ABC
+
+       When continuing with global substitutions after  matching  a  substring
+       with zero length, an attempt to find a non-empty match at the same off-
+       set is performed.  If this is not successful, the offset is advanced by
+       one character except when CRLF is a valid newline sequence and the next
+       two characters are CR, LF. In this case, the offset is advanced by  two
+       characters.
+
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
+       buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
+       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
+       continues to go through the motions of matching and substituting (with-
+       out,  of course, writing anything) in order to compute the size of buf-
+       fer that is needed. This value is  passed  back  via  the  outlengthptr
+       variable,    with    the   result   of   the   function   still   being
+       PCRE2_ERROR_NOMEMORY.
+
+       Passing a buffer size of zero is a permitted way  of  finding  out  how
+       much memory is needed for a given substitution. However, this does mean
+       that the entire operation is carried out twice. Depending on the appli-
+       cation,  it  may  be more efficient to allocate a large buffer and free
+       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
+       FLOW_LENGTH.
+
+       PCRE2_SUBSTITUTE_UNKNOWN_UNSET  causes  references  to capturing groups
+       that do not appear in the pattern to be treated as unset  groups.  This
+       option  should  be  used  with  care, because it means that a typo in a
+       group name or  number  no  longer  causes  the  PCRE2_ERROR_NOSUBSTRING
+       error.
+
+       PCRE2_SUBSTITUTE_UNSET_EMPTY  causes  unset capturing groups (including
+       unknown  groups  when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)  to  be
+       treated  as  empty  strings  when  inserted as described above. If this
+       option is not set, an attempt to  insert  an  unset  group  causes  the
+       PCRE2_ERROR_UNSET  error.  This  option does not influence the extended
+       substitution syntax described below.
+
+       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
+       replacement  string.  Without this option, only the dollar character is
+       special, and only the group insertion forms  listed  above  are  valid.
+       When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+
+       Firstly,  backslash in a replacement string is interpreted as an escape
+       character. The usual forms such as \n or \x{ddd} can be used to specify
+       particular  character codes, and backslash followed by any non-alphanu-
+       meric character quotes that character. Extended quoting  can  be  coded
+       using \Q...\E, exactly as in pattern strings.
+
+       There  are  also four escape sequences for forcing the case of inserted
+       letters.  The insertion mechanism has three states:  no  case  forcing,
+       force upper case, and force lower case. The escape sequences change the
+       current state: \U and \L change to upper or lower case forcing, respec-
+       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
+       no case forcing. The sequences \u and \l force the next  character  (if
+       it  is  a  letter)  to  upper or lower case, respectively, and then the
+       state automatically reverts to no case forcing. Case forcing applies to
+       all inserted  characters, including those from captured groups and let-
+       ters within \Q...\E quoted sequences.
+
+       Note that case forcing sequences such as \U...\E do not nest. For exam-
+       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
+       \E has no effect.
+
+       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
+       flexibility  to  group substitution. The syntax is similar to that used
+       by Bash:
+
+         ${<n>:-<string>}
+         ${<n>:+<string1>:<string2>}
+
+       As before, <n> may be a group number or a name. The first  form  speci-
+       fies  a  default  value. If group <n> is set, its value is inserted; if
+       not, <string> is expanded and the  result  inserted.  The  second  form
+       specifies  strings that are expanded and inserted when group <n> is set
+       or unset, respectively. The first form is just a  convenient  shorthand
+       for
+
+         ${<n>:+${<n>}:<string>}
+
+       Backslash  can  be  used to escape colons and closing curly brackets in
+       the replacement strings. A change of the case forcing  state  within  a
+       replacement  string  remains  in  force  afterwards,  as  shown in this
+       pcre2test example:
+
+         /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
+             body
+          1: hello
+             somebody
+          1: HELLO
+
+       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
+       substitutions.   However,   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  does  cause
+       unknown groups in the extended syntax forms to be treated as unset.
+
+       If successful, pcre2_substitute() returns the  number  of  replacements
+       that were made. This may be zero if no matches were found, and is never
+       greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
+
+       In the event of an error, a negative error code is returned. Except for
+       PCRE2_ERROR_NOMATCH    (which   is   never   returned),   errors   from
+       pcre2_match() are passed straight back.
+
+       PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
+       tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
+
+       PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
+       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
+       when  the  simple  (non-extended)  syntax  is  used  and  PCRE2_SUBSTI-
+       TUTE_UNSET_EMPTY is not set.
+
+       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
+       enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
+       of buffer that is needed is returned via outlengthptr. Note  that  this
+       does not happen by default.
+
+       PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
+       the   replacement   string,   with   more   particular   errors   being
+       PCRE2_ERROR_BADREPESCAPE  (invalid  escape  sequence), PCRE2_ERROR_REP-
+       MISSINGBRACE (closing curly bracket not found),  PCRE2_ERROR_BADSUBSTI-
+       TUTION   (syntax   error   in   extended   group   substitution),   and
+       PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before  it  started
+       or  the match started earlier than the current position in the subject,
+       which can happen if \K is used in an assertion).
+
+       As for all PCRE2 errors, a text message that describes the error can be
+       obtained   by   calling  the  pcre2_get_error_message()  function  (see
+       "Obtaining a textual error message" above).
+
+
+DUPLICATE SUBPATTERN NAMES
+
+       int pcre2_substring_nametable_scan(const pcre2_code *code,
+         PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
+
+       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
+       subpatterns  are  not required to be unique. Duplicate names are always
+       allowed for subpatterns with the same number, created by using the  (?|
+       feature.  Indeed,  if  such subpatterns are named, they are required to
+       use the same names.
+
+       Normally, patterns with duplicate names are such that in any one match,
+       only  one of the named subpatterns participates. An example is shown in
+       the pcre2pattern documentation.
+
+       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
+       pcre2_substring_get_byname()  return  the first substring corresponding
+       to  the  given  name  that  is  set.  Only   if   none   are   set   is
+       PCRE2_ERROR_UNSET  returned.  The  pcre2_substring_number_from_name()
+       function returns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are
+       duplicate names.
+
+       If  you want to get full details of all captured substrings for a given
+       name, you must use the pcre2_substring_nametable_scan()  function.  The
+       first  argument is the compiled pattern, and the second is the name. If
+       the third and fourth arguments are NULL, the function returns  a  group
+       number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
+
+       When the third and fourth arguments are not NULL, they must be pointers
+       to variables that are updated by the function. After it has  run,  they
+       point to the first and last entries in the name-to-number table for the
+       given name, and the function returns the length of each entry  in  code
+       units.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
+       no entries for the given name.
+
+       The format of the name table is described above in the section entitled
+       Information  about  a  pattern.  Given all the relevant entries for the
+       name, you can extract each of their numbers,  and  hence  the  captured
+       data.
+
+
+FINDING ALL POSSIBLE MATCHES AT ONE POSITION
+
+       The  traditional  matching  function  uses a similar algorithm to Perl,
+       which stops when it finds the first match at a given point in the  sub-
+       ject. If you want to find all possible matches, or the longest possible
+       match at a given position,  consider  using  the  alternative  matching
+       function  (see  below) instead. If you cannot use the alternative func-
+       tion, you can kludge it up by making use of the callout facility, which
+       is described in the pcre2callout documentation.
+
+       What you have to do is to insert a callout right at the end of the pat-
+       tern.  When your callout function is called, extract and save the  cur-
+       rent  matched  substring.  Then return 1, which forces pcre2_match() to
+       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
+
+
+MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
+
+       int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject,
+         PCRE2_SIZE length, PCRE2_SIZE startoffset,
+         uint32_t options, pcre2_match_data *match_data,
+         pcre2_match_context *mcontext,
+         int *workspace, PCRE2_SIZE wscount);
+
+       The  function  pcre2_dfa_match()  is  called  to match a subject string
+       against a compiled pattern, using a matching algorithm that  scans  the
+       subject string just once (not counting lookaround assertions), and does
+       not backtrack.  This has different characteristics to the normal  algo-
+       rithm,  and  is not compatible with Perl. Some of the features of PCRE2
+       patterns are not supported.  Nevertheless, there are  times  when  this
+       kind  of  matching  can be useful. For a discussion of the two matching
+       algorithms, and a list of features that pcre2_dfa_match() does not sup-
+       port, see the pcre2matching documentation.
+
+       The  arguments  for  the pcre2_dfa_match() function are the same as for
+       pcre2_match(), plus two extras. The ovector within the match data block
+       is used in a different way, and this is described below. The other com-
+       mon arguments are used in the same way as for pcre2_match(),  so  their
+       description is not repeated here.
+
+       The  two  additional  arguments provide workspace for the function. The
+       workspace vector should contain at least 20 elements. It  is  used  for
+       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
+       workspace is needed for patterns and subjects where there are a lot  of
+       potential matches.
+
+       Here is an example of a simple call to pcre2_dfa_match():
+
+         int wspace[20];
+         pcre2_match_data *md = pcre2_match_data_create(4, NULL);
+         int rc = pcre2_dfa_match(
+           re,             /* result of pcre2_compile() */
+           "some string",  /* the subject string */
+           11,             /* the length of the subject string */
+           0,              /* start at offset 0 in the subject */
+           0,              /* default options */
+           md,             /* the match data block */
+           NULL,           /* a match context; NULL means use defaults */
+           wspace,         /* working space vector */
+           20);            /* number of elements (NOT size in bytes) */
+
+   Option bits for pcre2_dfa_match()
+
+       The  unused  bits of the options argument for pcre2_dfa_match() must be
+       zero. The only bits that may be set  are  PCRE2_ANCHORED,  PCRE2_ENDAN-
+       CHORED,        PCRE2_NOTBOL,        PCRE2_NOTEOL,       PCRE2_NOTEMPTY,
+       PCRE2_NOTEMPTY_ATSTART,     PCRE2_NO_UTF_CHECK,     PCRE2_PARTIAL_HARD,
+       PCRE2_PARTIAL_SOFT,  PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
+       the last four of these are exactly the same as  for  pcre2_match(),  so
+       their description is not repeated here.
+
+         PCRE2_PARTIAL_HARD
+         PCRE2_PARTIAL_SOFT
+
+       These  have  the  same general effect as they do for pcre2_match(), but
+       the details are slightly different. When PCRE2_PARTIAL_HARD is set  for
+       pcre2_dfa_match(),  it  returns  PCRE2_ERROR_PARTIAL  if the end of the
+       subject is reached and there is still at least one matching possibility
+       that requires additional characters. This happens even if some complete
+       matches have already been found. When PCRE2_PARTIAL_SOFT  is  set,  the
+       return  code  PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
+       if the end of the subject is  reached,  there  have  been  no  complete
+       matches, but there is still at least one matching possibility. The por-
+       tion of the string that was inspected when the  longest  partial  match
+       was found is set as the first matching string in both cases. There is a
+       more detailed discussion of partial and  multi-segment  matching,  with
+       examples, in the pcre2partial documentation.
+
+         PCRE2_DFA_SHORTEST
+
+       Setting  the PCRE2_DFA_SHORTEST option causes the matching algorithm to
+       stop as soon as it has found one match. Because of the way the alterna-
+       tive  algorithm  works, this is necessarily the shortest possible match
+       at the first possible matching point in the subject string.
+
+         PCRE2_DFA_RESTART
+
+       When pcre2_dfa_match() returns a partial match, it is possible to  call
+       it again, with additional subject characters, and have it continue with
+       the same match. The PCRE2_DFA_RESTART option requests this action; when
+       it is set, the workspace and wscount arguments must reference the  same
+       vector as before because data about the match so far is  left  in  them
+       after a partial match. There is more discussion of this facility in the
+       pcre2partial documentation.
+
+   Successful returns from pcre2_dfa_match()
+
+       When pcre2_dfa_match() succeeds, it may have matched more than one sub-
+       string in the subject. Note, however, that all the matches from one run
+       of the function start at the same point in  the  subject.  The  shorter
+       matches  are all initial substrings of the longer matches. For example,
+       if the pattern
+
+         <.*>
+
+       is matched against the string
+
+         This is <something> <something else> <something further> no more
+
+       the three matched strings are
+
+         <something> <something else> <something further>
+         <something> <something else>
+         <something>
+
+       On success, the yield of the function is a number  greater  than  zero,
+       which  is  the  number  of  matched substrings. The offsets of the sub-
+       strings are returned in the ovector, and can be extracted by number  in
+       the  same way as for pcre2_match(), but the numbers bear no relation to
+       any capturing groups that may exist in the pattern, because DFA  match-
+       ing does not support group capture.
+
+       Calls  to  the  convenience  functions  that extract substrings by name
+       return the error PCRE2_ERROR_DFA_UFUNC (unsupported function)  if  used
+       after a DFA match. The convenience functions that extract substrings by
+       number never return PCRE2_ERROR_NOSUBSTRING.
+
+       The matched strings are stored in  the  ovector  in  reverse  order  of
+       length;  that  is,  the longest matching string is first. If there were
+       too many matches to fit into the ovector, the yield of the function  is
+       zero, and the vector is filled with the longest matches.
+
+       NOTE:  PCRE2's  "auto-possessification" optimization usually applies to
+       character repeats at the end of a pattern (as well as internally).  For
+       example,  the pattern "a\d+" is compiled as if it were "a\d++". For DFA
+       matching, this means that only one possible  match  is  found.  If  you
+       really  do  want multiple matches in such cases, either use an ungreedy
+       repeat such as "a\d+?" or set  the  PCRE2_NO_AUTO_POSSESS  option  when
+       compiling.
+
+   Error returns from pcre2_dfa_match()
+
+       The pcre2_dfa_match() function returns a negative number when it fails.
+       Many of the errors are the same  as  for  pcre2_match(),  as  described
+       above.  There are in addition the following errors that are specific to
+       pcre2_dfa_match():
+
+         PCRE2_ERROR_DFA_UITEM
+
+       This return is given if pcre2_dfa_match() encounters  an  item  in  the
+       pattern  that it does not support, for instance, the use of \C in a UTF
+       mode or a backreference.
+
+         PCRE2_ERROR_DFA_UCOND
+
+       This return is given if pcre2_dfa_match() encounters a  condition  item
+       that uses a backreference for the condition, or a test for recursion in
+       a specific group. These are not supported.
+
+         PCRE2_ERROR_DFA_WSSIZE
+
+       This return is given if pcre2_dfa_match() runs  out  of  space  in  the
+       workspace vector.
+
+         PCRE2_ERROR_DFA_RECURSE
+
+       When  a  recursive subpattern is processed, the matching function calls
+       itself recursively, using private memory for the ovector and workspace.
+       This  error  is given if the internal ovector is not large enough. This
+       should be extremely rare, as a vector of size 1000 is used.
+
+         PCRE2_ERROR_DFA_BADRESTART
+
+       When pcre2_dfa_match() is called  with  the  PCRE2_DFA_RESTART  option,
+       some  plausibility  checks  are  made on the contents of the workspace,
+       which should contain data about the previous partial match. If  any  of
+       these checks fail, this error is given.
+
+
+SEE ALSO
+
+       pcre2build(3),    pcre2callout(3),    pcre2demo(3),   pcre2matching(3),
+       pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 07 September 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2BUILD(3)              Library Functions Manual              PCRE2BUILD(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+BUILDING PCRE2
+
+       PCRE2  is distributed with a configure script that can be used to build
+       the library in Unix-like environments using the applications  known  as
+       Autotools. Also in the distribution are files to support building using
+       CMake instead of configure.  The  text  file  README  contains  general
+       information  about  building  with Autotools (some of which is repeated
+       below), and also has some comments about building on various  operating
+       systems.  There  is a lot more information about building PCRE2 without
+       using Autotools (including information about using CMake  and  building
+       "by  hand")  in  the  text file called NON-AUTOTOOLS-BUILD.  You should
+       consult this file as well as the README file if you are building  in  a
+       non-Unix-like environment.
+
+
+PCRE2 BUILD-TIME OPTIONS
+
+       The rest of this document describes the optional features of PCRE2 that
+       can be selected when the library is compiled. It  assumes  use  of  the
+       configure  script,  where  the  optional features are selected or dese-
+       lected by providing options to configure before running the  make  com-
+       mand.  However,  the same options can be selected in both Unix-like and
+       non-Unix-like environments if you are using CMake instead of  configure
+       to build PCRE2.
+
+       If  you  are not using Autotools or CMake, option selection can be done
+       by editing the config.h file, or by passing parameter settings  to  the
+       compiler, as described in NON-AUTOTOOLS-BUILD.
+
+       The complete list of options for configure (which includes the standard
+       ones such as the  selection  of  the  installation  directory)  can  be
+       obtained by running
+
+         ./configure --help
+
+       The  following  sections include descriptions of "on/off" options whose
+       names begin with --enable or --disable. Because of the way that config-
+       ure  works, --enable and --disable always come in pairs, so the comple-
+       mentary option always exists as well, but as it specifies the  default,
+       it is not described.  Options that specify values have names that start
+       with --with. At the end of a configure run, a summary of the configura-
+       tion is output.
+
+
+BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
+
+       By  default, a library called libpcre2-8 is built, containing functions
+       that take string arguments contained in arrays  of  bytes,  interpreted
+       either  as single-byte characters, or UTF-8 strings. You can also build
+       two other libraries, called libpcre2-16 and libpcre2-32, which  process
+       strings  that  are contained in arrays of 16-bit and 32-bit code units,
+       respectively. These can be interpreted either as single-unit characters
+       or  UTF-16/UTF-32 strings. To build these additional libraries, add one
+       or both of the following to the configure command:
+
+         --enable-pcre2-16
+         --enable-pcre2-32
+
+       If you do not want the 8-bit library, add
+
+         --disable-pcre2-8
+
+       as well. At least one of the three libraries must be built.  Note  that
+       the  POSIX wrapper is for the 8-bit library only, and that pcre2grep is
+       an 8-bit program. Neither of these is built if  you  select  only  the
+       16-bit or 32-bit libraries.
+
+
+BUILDING SHARED AND STATIC LIBRARIES
+
+       The  Autotools PCRE2 building process uses libtool to build both shared
+       and static libraries by default. You can suppress an  unwanted  library
+       by adding one of
+
+         --disable-shared
+         --disable-static
+
+       to the configure command.
+
+
+UNICODE AND UTF SUPPORT
+
+       By  default,  PCRE2 is built with support for Unicode and UTF character
+       strings.  To build it without Unicode support, add
+
+         --disable-unicode
+
+       to the configure command. This setting applies to all three  libraries.
+       It  is  not  possible  to  build  one library with Unicode support, and
+       another without, in the same configuration.
+
+       Of itself, Unicode support does not make PCRE2 treat strings as  UTF-8,
+       UTF-16 or UTF-32. To do that, applications that use the library can set
+       the PCRE2_UTF option when they call pcre2_compile() to compile  a  pat-
+       tern.   Alternatively,  patterns  may be started with (*UTF) unless the
+       application has locked this out by setting PCRE2_NEVER_UTF.
+
+       UTF support allows the libraries to process character code points up to
+       0x10ffff  in  the  strings that they handle. Unicode support also gives
+       access to the Unicode properties of characters, using  pattern  escapes
+       such as \P, \p, and \X. Only the general category properties such as Lu
+       and Nd are supported. Details are given in the pcre2pattern  documenta-
+       tion.
+
+       Pattern escapes such as \d and \w do not by default make use of Unicode
+       properties. The application can request that they  do  by  setting  the
+       PCRE2_UCP  option.  Unless  the  application has set PCRE2_NEVER_UCP, a
+       pattern may also request this by starting with (*UCP).
+
+
+DISABLING THE USE OF \C
+
+       The \C escape sequence, which matches a single code unit, even in a UTF
+       mode,  can  cause unpredictable behaviour because it may leave the cur-
+       rent matching point in the middle of a multi-code-unit  character.  The
+       application  can  lock  it  out  by setting the PCRE2_NEVER_BACKSLASH_C
+       option when calling pcre2_compile(). There is also a build-time option
+
+         --enable-never-backslash-C
+
+       (note the upper case C) which locks out the use of \C entirely.
+
+
+JUST-IN-TIME COMPILER SUPPORT
+
+       Just-in-time (JIT) compiler support is included in the build by  speci-
+       fying
+
+         --enable-jit
+
+       This  support  is available only for certain hardware architectures. If
+       this option is set for an unsupported architecture,  a  building  error
+       occurs.  If in doubt, use
+
+         --enable-jit=auto
+
+       which  enables  JIT  only if the current hardware is supported. You can
+       check if JIT is enabled in the configuration summary that is output  at
+       the  end  of a configure run. If you are enabling JIT under SELinux you
+       may also want to add
+
+         --enable-jit-sealloc
+
+       which enables the use of an execmem allocator in JIT that is compatible
+       with  SELinux.  This  has  no  effect  if  JIT  is not enabled. See the
+       pcre2jit documentation for a discussion of JIT usage. When JIT  support
+       is enabled, pcre2grep automatically makes use of it, unless you add
+
+         --disable-pcre2grep-jit
+
+       to the "configure" command.
+
+
+NEWLINE RECOGNITION
+
+       By  default, PCRE2 interprets the linefeed (LF) character as indicating
+       the end of a line. This is the normal newline  character  on  Unix-like
+       systems.  You can compile PCRE2 to use carriage return (CR) instead, by
+       adding
+
+         --enable-newline-is-cr
+
+       to the configure  command.  There  is  also  an  --enable-newline-is-lf
+       option, which explicitly specifies linefeed as the newline character.
+
+       Alternatively, you can specify that line endings are to be indicated by
+       the two-character sequence CRLF (CR immediately followed by LF). If you
+       want this, add
+
+         --enable-newline-is-crlf
+
+       to the configure command. There is a fourth option, specified by
+
+         --enable-newline-is-anycrlf
+
+       which  causes  PCRE2 to recognize any of the three sequences CR, LF, or
+       CRLF as indicating a line ending. A fifth option, specified by
+
+         --enable-newline-is-any
+
+       causes PCRE2 to recognize any Unicode  newline  sequence.  The  Unicode
+       newline sequences are the three just mentioned, plus the single charac-
+       ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
+       U+0085),  LS  (line  separator,  U+2028),  and PS (paragraph separator,
+       U+2029). The final option is
+
+         --enable-newline-is-nul
+
+       which causes NUL (binary zero) to be set  as  the  default  line-ending
+       character.
+
+       Whatever default line ending convention is selected when PCRE2 is built
+       can be overridden by applications that use the library. At  build  time
+       it is recommended to use the standard for your operating system.
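+
+       As a rough illustration of the conventions described above, the follow-
+       ing C sketch reports how many bytes form a line ending at a given posi-
+       tion in a byte string. The names nl_kind and newline_length() are hypo-
+       thetical; this is not PCRE2's own code, and the  Unicode  "any"  conven-
+       tion is left out to keep the sketch short.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for five of the conventions described above. */
enum nl_kind { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_NUL };

/* Returns the length in bytes of the line ending that starts at s[pos],
   or 0 if no line ending of the selected kind starts there. */
size_t newline_length(const char *s, size_t len, size_t pos, enum nl_kind kind)
{
    assert(s != NULL && pos < len);
    char c = s[pos];
    int crlf_here = (c == '\r' && pos + 1 < len && s[pos + 1] == '\n');

    switch (kind) {
    case NL_CR:      return c == '\r' ? 1 : 0;
    case NL_LF:      return c == '\n' ? 1 : 0;
    case NL_CRLF:    return crlf_here ? 2 : 0;
    case NL_ANYCRLF: if (crlf_here) return 2;           /* prefer the pair */
                     return (c == '\r' || c == '\n') ? 1 : 0;
    case NL_NUL:     return c == '\0' ? 1 : 0;
    }
    return 0;
}
```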
+
+
+WHAT \R MATCHES
+
+       By  default,  the  sequence \R in a pattern matches any Unicode newline
+       sequence, independently of what has been selected as  the  line  ending
+       sequence. If you specify
+
+         --enable-bsr-anycrlf
+
+       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+       ever is selected when PCRE2 is built can be overridden by  applications
+       that use the library.
+
+
+HANDLING VERY LARGE PATTERNS
+
+       Within  a  compiled  pattern,  offset values are used to point from one
+       part to another (for example, from an opening parenthesis to an  alter-
+       nation  metacharacter).  By default, in the 8-bit and 16-bit libraries,
+       two-byte values are used for these offsets, leading to a  maximum  size
+       for a compiled pattern of around 64 thousand code units. This is suffi-
+       cient to handle all but the most gigantic patterns. Nevertheless,  some
+       people do want to process truly enormous patterns, so it is possible to
+       compile PCRE2 to use three-byte or four-byte offsets by adding  a  set-
+       ting such as
+
+         --with-link-size=3
+
+       to  the  configure command. The value given must be 2, 3, or 4. For the
+       16-bit library, a value of 3 is rounded up to 4.  In  these  libraries,
+       using  longer  offsets slows down the operation of PCRE2 because it has
+       to load additional data when handling them. For the 32-bit library  the
+       value  is  always 4 and cannot be overridden; the value of --with-link-
+       size is ignored.
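+
+       The rules above can be restated as a short C sketch that maps  the  re-
+       quested  --with-link-size  value to the size that is used by each li-
+       brary. The function name effective_link_size() is hypothetical  and  is
+       not part of PCRE2.
+
```c
#include <assert.h>

/* Hypothetical helper: code_unit_bits is 8, 16, or 32, and requested is the
   value given to --with-link-size. Returns the link size that is used. */
int effective_link_size(int code_unit_bits, int requested)
{
    assert(requested >= 2 && requested <= 4);   /* configure accepts 2, 3, 4 */
    if (code_unit_bits == 32)
        return 4;                               /* fixed; option is ignored */
    if (code_unit_bits == 16 && requested == 3)
        return 4;                               /* 3 is rounded up to 4 */
    return requested;                           /* otherwise used as given */
}
```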
+
+
+LIMITING PCRE2 RESOURCE USAGE
+
+       The pcre2_match() function increments a counter each time it goes round
+       its  main  loop. Putting a limit on this counter controls the amount of
+       computing resource used by a single call to  pcre2_match().  The  limit
+       can be changed at run time, as described in the pcre2api documentation.
+       The default is 10 million, but this can be changed by adding a  setting
+       such as
+
+         --with-match-limit=500000
+
+       to   the   configure   command.   This  setting  also  applies  to  the
+       pcre2_dfa_match() matching function, and to JIT  matching  (though  the
+       counting is done differently).
+
+       The  pcre2_match() function starts out using a 20KiB vector on the sys-
+       tem stack to record backtracking points. The more  nested  backtracking
+       points there are (that is, the deeper the search tree), the more memory
+       is needed. If the initial vector is not large enough,  heap  memory  is
+       used,  up to a certain limit, which is specified in kibibytes (units of
+       1024 bytes). The limit can be changed at run time, as described in  the
+       pcre2api  documentation.  The default limit (in effect unlimited) is 20
+       million. You can change this by a setting such as
+
+         --with-heap-limit=500
+
+       which limits the amount of heap to 500 KiB. This limit applies only  to
+       interpretive matching in pcre2_match() and pcre2_dfa_match(), which may
+       also use the heap for internal workspace  when  processing  complicated
+       patterns.  This limit does not apply when JIT (which has its own memory
+       arrangements) is used.
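+
+       Because the heap limit is expressed in kibibytes, it can help to see it
+       as  a  byte  count.  The following C sketch does the conversion; the name
+       heap_limit_bytes() is hypothetical and is not part of PCRE2.
+
```c
#include <stdint.h>

/* Hypothetical helper: converts a heap limit given in kibibytes (units of
   1024 bytes) to a byte count. */
uint64_t heap_limit_bytes(uint64_t limit_kib)
{
    return limit_kib * UINT64_C(1024);
}
```
+
+       With  --with-heap-limit=500  this  gives 512000 bytes, and the default of
+       20 million kibibytes gives 20480000000 bytes.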
+
+       You can also explicitly limit the depth of nested backtracking  in  the
+       pcre2_match() interpreter. This limit defaults to the value that is set
+       for --with-match-limit. You can set a lower default  limit  by  adding,
+       for example,
+
+         --with-match-limit-depth=10000
+
+       to  the  configure  command.  This value can be overridden at run time.
+       This depth limit indirectly limits the amount of heap  memory  that  is
+       used,  but because the size of each backtracking "frame" depends on the
+       number of capturing parentheses in a pattern, the amount of  heap  that
+       is  used  before  the  limit is reached varies from pattern to pattern.
+       This limit was more useful in versions  before  10.30,  where  function
+       recursion was used for backtracking.
+
+       As well as applying to pcre2_match(), the depth limit also controls the
+       depth of recursive function calls in pcre2_dfa_match(). These are  used
+       for  lookaround  assertions,  atomic  groups, and recursion within pat-
+       terns.  The limit does not apply to JIT matching.
+
+
+CREATING CHARACTER TABLES AT BUILD TIME
+
+       PCRE2 uses fixed tables for processing characters whose code points are
+       less than 256. By default, PCRE2 is built with a set of tables that are
+       distributed in the file src/pcre2_chartables.c.dist. These  tables  are
+       for ASCII codes only. If you add
+
+         --enable-rebuild-chartables
+
+       to  the  configure  command, the distributed tables are no longer used.
+       Instead, a program called dftables is compiled and  run.  This  outputs
+       the source for a new set of tables, created in the default locale of your
+       C run-time system. This method of replacing the tables does not work if
+       you  are cross compiling, because dftables is run on the local host. If
+       you need to create alternative tables when cross  compiling,  you  will
+       have to do so "by hand".
+
+
+USING EBCDIC CODE
+
+       PCRE2  assumes  by default that it will run in an environment where the
+       character code is ASCII or Unicode, which is a superset of ASCII.  This
+       is the case for most computer operating systems. PCRE2 can, however, be
+       compiled to run in an 8-bit EBCDIC environment by adding
+
+         --enable-ebcdic --disable-unicode
+
+       to the configure command. This setting implies --enable-rebuild-charta-
+       bles.  You  should  only  use  it if you know that you are in an EBCDIC
+       environment (for example, an IBM mainframe operating system).
+
+       It is not possible to support both EBCDIC and UTF-8 codes in  the  same
+       version  of  the  library. Consequently, --enable-unicode and --enable-
+       ebcdic are mutually exclusive.
+
+       The EBCDIC character that corresponds to an ASCII LF is assumed to have
+       the  value  0x15 by default. However, in some EBCDIC environments, 0x25
+       is used. In such an environment you should use
+
+         --enable-ebcdic-nl25
+
+       as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
+       has  the  same  value  as in ASCII, namely, 0x0d. Whichever of 0x15 and
+       0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
+       acter (which, in Unicode, is 0x85).
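+
+       The code values described above can be summarized in a small C  sketch.
+       The  structure  and  function  names are hypothetical; the sketch merely
+       tabulates which EBCDIC value stands for CR, LF, and  NEL  with  and  with-
+       out --enable-ebcdic-nl25.
+
```c
#include <stdbool.h>

/* Hypothetical container for the three EBCDIC code values discussed above. */
struct ebcdic_newlines {
    unsigned char cr;
    unsigned char lf;
    unsigned char nel;
};

/* nl25 is true when --enable-ebcdic-nl25 is assumed to have been given. */
struct ebcdic_newlines ebcdic_newline_codes(bool nl25)
{
    struct ebcdic_newlines n;
    n.cr  = 0x0d;                  /* same value as in ASCII */
    n.lf  = nl25 ? 0x25 : 0x15;    /* the value chosen as LF */
    n.nel = nl25 ? 0x15 : 0x25;    /* the other value stands for NEL */
    return n;
}
```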
+
+       The options that select newline behaviour, such as --enable-newline-is-
+       cr, and equivalent run-time options, refer to these character values in
+       an EBCDIC environment.
+
+
+PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
+
+       By default, on non-Windows systems, pcre2grep supports the use of call-
+       outs with string arguments within the patterns it is matching, in order
+       to  run external scripts. For details, see the pcre2grep documentation.
+       This support can be disabled by adding  --disable-pcre2grep-callout  to
+       the configure command.
+
+
+PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
+
+       By  default,  pcre2grep reads all files as plain text. You can build it
+       so that it recognizes files whose names end in .gz or .bz2,  and  reads
+       them with libz or libbz2, respectively, by adding one or both of
+
+         --enable-pcre2grep-libz
+         --enable-pcre2grep-libbz2
+
+       to the configure command. These options naturally require that the rel-
+       evant libraries are installed on your system. Configuration  will  fail
+       if they are not.
+
+
+PCRE2GREP BUFFER SIZE
+
+       pcre2grep  uses an internal buffer to hold a "window" on the file it is
+       scanning, in order to be able to output "before" and "after" lines when
+       it finds a match. The default starting size of the buffer is 20KiB. The
+       buffer itself is three times this size, but because of the  way  it  is
+       used for holding "before" lines, the longest line that is guaranteed to
+       be processable is the notional buffer size. If a longer line is encoun-
+       tered,  pcre2grep  automatically  expands the buffer, up to a specified
+       maximum size, whose default is 1MiB or the starting size, whichever  is
+       the  larger. You can change the default parameter values by adding, for
+       example,
+
+         --with-pcre2grep-bufsize=51200
+         --with-pcre2grep-max-bufsize=2097152
+
+       to the configure command. The caller of pcre2grep  can  override  these
+       values  by  using  --buffer-size  and  --max-buffer-size on the command
+       line.
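+
+       The  buffer  arithmetic  described  above can be sketched in C as follows.
+       The function names are hypothetical and the code is not taken from  the
+       pcre2grep source; it only restates the sizes given in this section.
+
```c
#include <stddef.h>

#define ONE_MIB ((size_t)1024 * 1024)

/* Bytes actually allocated for a given notional buffer size: the buffer
   itself is three times the notional size. */
size_t grep_allocated_bytes(size_t notional_size)
{
    return 3 * notional_size;
}

/* Default maximum size when none is configured: 1MiB or the starting
   size, whichever is the larger. */
size_t grep_default_max(size_t starting_size)
{
    return starting_size > ONE_MIB ? starting_size : ONE_MIB;
}
```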
+
+
+PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
+
+       If you add one of
+
+         --enable-pcre2test-libreadline
+         --enable-pcre2test-libedit
+
+       to the configure command, pcre2test  is  linked  with  the  libreadline
+       or libedit library, respectively, and when its input is from a terminal,
+       it reads it using the readline() function. This  provides  line-editing
+       and  history  facilities.  Note that libreadline is GPL-licensed, so if
+       you distribute a binary of pcre2test linked in this way, there  may  be
+       licensing issues. These can be avoided by linking instead with libedit,
+       which has a BSD licence.
+
+       Setting --enable-pcre2test-libreadline causes the -lreadline option  to
+       be  added to the pcre2test build. In many operating environments with a
+       system-installed readline library this is sufficient. However, in  some
+       environments (e.g. if an unmodified distribution version of readline is
+       in use), some extra configuration may be necessary.  The  INSTALL  file
+       for libreadline says this:
+
+         "Readline uses the termcap functions, but does not link with
+         the termcap or curses library itself, allowing applications
+         which link with readline the to choose an appropriate library."
+
+       If  your environment has not been set up so that an appropriate library
+       is automatically included, you may need to add something like
+
+         LIBS="-lncurses"
+
+       immediately before the configure command.
+
+
+INCLUDING DEBUGGING CODE
+
+       If you add
+
+         --enable-debug
+
+       to the configure command, additional debugging code is included in  the
+       build. This feature is intended for use by the PCRE2 maintainers.
+
+
+DEBUGGING WITH VALGRIND SUPPORT
+
+       If you add
+
+         --enable-valgrind
+
+       to  the  configure command, PCRE2 will use valgrind annotations to mark
+       certain memory regions as  unaddressable.  This  allows  it  to  detect
+       invalid  memory  accesses,  and  is  mostly  useful for debugging PCRE2
+       itself.
+
+
+CODE COVERAGE REPORTING
+
+       If your C compiler is gcc, you can build a version of  PCRE2  that  can
+       generate a code coverage report for its test suite. To enable this, you
+       must install lcov version 1.6 or above. Then specify
+
+         --enable-coverage
+
+       to the configure command and build PCRE2 in the usual way.
+
+       Note that using ccache (a caching C compiler) is incompatible with code
+       coverage  reporting. If you have configured ccache to run automatically
+       on your system, you must set the environment variable
+
+         CCACHE_DISABLE=1
+
+       before running make to build PCRE2, so that ccache is not used.
+
+       When --enable-coverage is used, the  following  additional  targets  are
+       added to the Makefile:
+
+         make coverage
+
+       This  creates  a  fresh coverage report for the PCRE2 test suite. It is
+       equivalent to running "make coverage-reset", "make  coverage-baseline",
+       "make check", and then "make coverage-report".
+
+         make coverage-reset
+
+       This zeroes the coverage counters, but does nothing else.
+
+         make coverage-baseline
+
+       This captures baseline coverage information.
+
+         make coverage-report
+
+       This creates the coverage report.
+
+         make coverage-clean-report
+
+       This  removes the generated coverage report without cleaning the cover-
+       age data itself.
+
+         make coverage-clean-data
+
+       This removes the captured coverage data without removing  the  coverage
+       files created at compile time (*.gcno).
+
+         make coverage-clean
+
+       This  cleans all coverage data including the generated coverage report.
+       For more information about code coverage, see the gcov and  lcov  docu-
+       mentation.
+
+
+SUPPORT FOR FUZZERS
+
+       There  is  a  special  option for use by people who want to run fuzzing
+       tests on PCRE2:
+
+         --enable-fuzz-support
+
+       At present this applies only to the 8-bit library. If set, it causes an
+       extra  library  called  libpcre2-fuzzsupport.a  to  be  built,  but not
+       installed. This contains a single function called  LLVMFuzzerTestOneIn-
+       put()  whose  arguments are a pointer to a string and the length of the
+       string. When called, this function tries to compile  the  string  as  a
+       pattern,  and if that succeeds, to match it.  This is done both with no
+       options and with some random option bits that are generated  from  the
+       string.
+
+       Setting  --enable-fuzz-support  also  causes  a binary called pcre2fuz-
+       zcheck to be created. This is normally run under valgrind or used  when
+       PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
+       function and outputs information about what  it  is  doing.  The  input
+       strings  are specified by arguments: if an argument starts with "=" the
+       rest of it is a literal input string. Otherwise, it is assumed to be  a
+       file name, and the contents of the file are the test string.
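+
+       The  argument  convention  just  described  can  be sketched in C. The name
+       fuzz_literal_input() is hypothetical, and this is not the actual source
+       of pcre2fuzzcheck.
+
```c
#include <stddef.h>

/* Hypothetical helper: if arg starts with "=", returns a pointer to the
   literal input string that follows it. Otherwise returns NULL, meaning
   that arg should be treated as a file name. */
const char *fuzz_literal_input(const char *arg)
{
    if (arg != NULL && arg[0] == '=')
        return arg + 1;
    return NULL;
}
```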
+
+
+OBSOLETE OPTION
+
+       In  versions  of  PCRE2 prior to 10.30, there were two ways of handling
+       backtracking in the pcre2_match() function. The default was to use  the
+       system stack, but if
+
+         --disable-stack-for-recursion
+
+       was  set,  memory on the heap was used. From release 10.30 onwards this
+       has changed (the stack is no longer used)  and  this  option  now  does
+       nothing except give a warning.
+
+
+SEE ALSO
+
+       pcre2api(3), pcre2-config(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 26 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2CALLOUT(3)            Library Functions Manual            PCRE2CALLOUT(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+SYNOPSIS
+
+       #include <pcre2.h>
+
+       int (*pcre2_callout)(pcre2_callout_block *, void *);
+
+       int pcre2_callout_enumerate(const pcre2_code *code,
+         int (*callback)(pcre2_callout_enumerate_block *, void *),
+         void *user_data);
+
+
+DESCRIPTION
+
+       PCRE2  provides  a feature called "callout", which is a means of tempo-
+       rarily passing control to the caller of PCRE2 in the middle of  pattern
+       matching.  The caller of PCRE2 provides an external function by putting
+       its entry point in a match  context  (see  pcre2_set_callout()  in  the
+       pcre2api documentation).
+
+       Within  a  regular expression, (?C<arg>) indicates a point at which the
+       external function is to be called.  Different  callout  points  can  be
+       identified  by  putting  a number less than 256 after the letter C. The
+       default value is zero.  Alternatively, the argument may be a  delimited
+       string.  The  starting delimiter must be one of ` ' " ^ % # $ { and the
+       ending delimiter is the same as the start, except for {, where the end-
+       ing  delimiter  is  }.  If  the  ending  delimiter is needed within the
+       string, it must be doubled. For example, this pattern has  two  callout
+       points:
+
+         (?C1)abc(?C"some ""arbitrary"" text")def
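+
+       The  delimiter  rules  above  can  be illustrated with a small C sketch
+       that extracts a callout string argument, turning doubled ending  delim-
+       iters into single characters. The function decode_callout_string() is
+       hypothetical; it is not PCRE2's own parser.
+
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: src points at the starting delimiter of a callout
   string argument. The decoded string is written to out, followed by a
   zero byte. Returns the decoded length, or -1 if the delimiter is not
   valid, the string is unterminated, or out is too small. */
long decode_callout_string(const char *src, char *out, size_t out_size)
{
    static const char starters[] = "`'\"^%#${";

    if (src == NULL || out == NULL || out_size == 0)
        return -1;
    if (src[0] == '\0' || strchr(starters, src[0]) == NULL)
        return -1;

    char end = (src[0] == '{') ? '}' : src[0];
    size_t n = 0;

    for (const char *p = src + 1; *p != '\0'; p++) {
        if (*p == end) {
            if (p[1] != end) {          /* single ending delimiter: done */
                out[n] = '\0';
                return (long)n;
            }
            p++;                        /* doubled: keep one copy */
        }
        if (n + 1 >= out_size)
            return -1;                  /* no room for char plus zero */
        out[n++] = *p;
    }
    return -1;                          /* ending delimiter never found */
}
```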
+
+       If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled,
+       PCRE2 automatically inserts callouts, all with number 255, before  each
+       item  in the pattern except for immediately before or after an explicit
+       callout. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+
+         A(?C3)B
+
+       it is processed as if it were
+
+         (?C255)A(?C3)B(?C255)
+
+       Here is a more complicated example:
+
+         A(\d{2}|--)
+
+       With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
+
+         (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+
+       Notice that there is a callout before and after  each  parenthesis  and
+       alternation bar. If the pattern contains a conditional group whose con-
+       dition is an assertion, an automatic callout  is  inserted  immediately
+       before  the  condition. Such a callout may also be inserted explicitly,
+       for example:
+
+         (?(?C9)(?=a)ab|de)  (?(?C%text%)(?!=d)ab|de)
+
+       This applies only to assertion conditions (because they are  themselves
+       independent groups).
+
+       Callouts  can  be useful for tracking the progress of pattern matching.
+       The pcre2test program has a pattern qualifier (/auto_callout) that sets
+       automatic  callouts.   When  any  callouts are present, the output from
+       pcre2test indicates how the pattern is being matched.  This  is  useful
+       information  when  you are trying to optimize the performance of a par-
+       ticular pattern.
+
+
+MISSING CALLOUTS
+
+       You should be aware that, because of optimizations  in  the  way  PCRE2
+       compiles and matches patterns, callouts sometimes do not happen exactly
+       as you might expect.
+
+   Auto-possessification
+
+       At compile time, PCRE2 "auto-possessifies" repeated items when it knows
+       that  what follows cannot be part of the repeat. For example, a+[bc] is
+       compiled as if it were a++[bc]. The pcre2test output when this  pattern
+       is compiled with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied
+       to the string "aaaa" is:
+
+         --->aaaa
+          +0 ^        a+
+          +2 ^   ^    [bc]
+         No match
+
+       This indicates that when matching [bc] fails, there is no  backtracking
+       into a+ (because it is being treated as a++) and therefore the callouts
+       that would be taken for the backtracks do not occur.  You  can  disable
+       the   auto-possessify   feature  by  passing  PCRE2_NO_AUTO_POSSESS  to
+       pcre2_compile(), or starting the pattern  with  (*NO_AUTO_POSSESS).  In
+       this case, the output changes to this:
+
+         --->aaaa
+          +0 ^        a+
+          +2 ^   ^    [bc]
+          +2 ^  ^     [bc]
+          +2 ^ ^      [bc]
+          +2 ^^       [bc]
+         No match
+
+       This time, when matching [bc] fails, the matcher backtracks into a+ and
+       tries again, repeatedly, until a+ itself fails.
+
+   Automatic .* anchoring
+
+       By default, an optimization is applied when .* is the first significant
+       item  in  a  pattern. If PCRE2_DOTALL is set, so that the dot can match
+       any character, the pattern is automatically anchored.  If  PCRE2_DOTALL
+       is  not set, a match can start only after an internal newline or at the
+       beginning of the subject, and pcre2_compile() remembers this. If a pat-
+       tern  has more than one top-level branch, automatic anchoring occurs if
+       all branches are anchorable.
+
+       This optimization is disabled, however, if .* is in an atomic group  or
+       if there is a backreference to the capturing group in which it appears.
+       It is also disabled if the pattern contains (*PRUNE) or  (*SKIP).  How-
+       ever, the presence of callouts does not affect it.
+
+       For  example,  if  the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT
+       and applied to the string "aa", the pcre2test output is:
+
+         --->aa
+          +0 ^      .*
+          +2 ^ ^    \d
+          +2 ^^     \d
+          +2 ^      \d
+         No match
+
+       This shows that all match attempts start at the beginning of  the  sub-
+       ject.  In  other  words,  the pattern is anchored. You can disable this
+       optimization by passing PCRE2_NO_DOTSTAR_ANCHOR to pcre2_compile(),  or
+       starting  the pattern with (*NO_DOTSTAR_ANCHOR). In this case, the out-
+       put changes to:
+
+         --->aa
+          +0 ^      .*
+          +2 ^ ^    \d
+          +2 ^^     \d
+          +2 ^      \d
+          +0  ^     .*
+          +2  ^^    \d
+          +2  ^     \d
+         No match
+
+       This shows more match attempts, starting at the second subject  charac-
+       ter.   Another  optimization, described in the next section, means that
+       there is no subsequent attempt to match with an empty subject.
+
+   Other optimizations
+
+       Other optimizations that provide fast "no match"  results  also  affect
+       callouts.  For example, if the pattern is
+
+         ab(?C4)cd
+
+       PCRE2  knows  that  any matching string must contain the letter "d". If
+       the subject string is "abyz", the  lack  of  "d"  means  that  matching
+       doesn't  ever  start,  and  the callout is never reached. However, with
+       "abyd", though the result is still no match, the callout is obeyed.
+
+       For most patterns PCRE2 also knows the minimum  length  of  a  matching
+       string,  and will immediately give a "no match" return without actually
+       running a match if the subject is not long enough, or,  for  unanchored
+       patterns, if it has been scanned far enough.
+
+       You can disable these optimizations by passing the PCRE2_NO_START_OPTI-
+       MIZE option  to  pcre2_compile(),  or  by  starting  the  pattern  with
+       (*NO_START_OPT).  This slows down the matching process, but does ensure
+       that callouts such as the example above are obeyed.
+
+
+THE CALLOUT INTERFACE
+
+       During matching, when PCRE2 reaches a callout  point,  if  an  external
+       function  is  provided in the match context, it is called. This applies
+       to normal, DFA, and JIT matching alike. The first argument to the call-
+       out function is a pointer to a pcre2_callout_block. The second argument
+       is the void * callout data that was supplied when the callout  was  set
+       up by calling pcre2_set_callout() (see the pcre2api documentation). The
+       callout block structure contains the following fields, not  necessarily
+       in this order:
+
+         uint32_t      version;
+         uint32_t      callout_number;
+         uint32_t      capture_top;
+         uint32_t      capture_last;
+         uint32_t      callout_flags;
+         PCRE2_SIZE   *offset_vector;
+         PCRE2_SPTR    mark;
+         PCRE2_SPTR    subject;
+         PCRE2_SIZE    subject_length;
+         PCRE2_SIZE    start_match;
+         PCRE2_SIZE    current_position;
+         PCRE2_SIZE    pattern_position;
+         PCRE2_SIZE    next_item_length;
+         PCRE2_SIZE    callout_string_offset;
+         PCRE2_SIZE    callout_string_length;
+         PCRE2_SPTR    callout_string;
+
+       The  version field contains the version number of the block format. The
+       current version is 2; the three callout string fields  were  added  for
+       version  1, and the callout_flags field for version 2. If you are writ-
+       ing an application that might use an  earlier  release  of  PCRE2,  you
+       should  check  the version number before accessing any of these fields.
+       The version number will increase in future if more  fields  are  added,
+       but the intention is never to remove any of the existing fields.
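+
+       The version checks suggested above could be written as follows in C.
+       The function names are hypothetical; the argument is the value  of  the
+       version field in the callout block.
+
```c
#include <stdbool.h>
#include <stdint.h>

/* The three callout string fields were added for version 1. */
bool callout_has_string_fields(uint32_t version)
{
    return version >= 1;
}

/* The callout_flags field was added for version 2. */
bool callout_has_flags_field(uint32_t version)
{
    return version >= 2;
}
```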
+
+   Fields for numerical callouts
+
+       For  a  numerical  callout,  callout_string is NULL, and callout_number
+       contains the number of the callout, in the range  0-255.  This  is  the
+       number that follows (?C for callouts that are part of the pattern; it is
+       255 for automatically generated callouts.
+
+   Fields for string callouts
+
+       For callouts with string arguments, callout_number is always zero,  and
+       callout_string  points  to the string that is contained within the com-
+       piled pattern. Its length is given by callout_string_length. Duplicated
+       ending delimiters that were present in the original pattern string have
+       been turned into single characters, but there is no other processing of
+       the  callout string argument. An additional code unit containing binary
+       zero is present after the string, but is not included  in  the  length.
+       The  delimiter  that was used to start the string is also stored within
+       the pattern, immediately before the string itself. You can access  this
+       delimiter as callout_string[-1] if you need it.
+
+       The callout_string_offset field is the code unit offset to the start of
+       the callout argument string within the original pattern string. This is
+       provided  for the benefit of applications such as script languages that
+       might need to report errors in the callout string within the pattern.
+
+   Fields for all callouts
+
+       The remaining fields in the callout block are the same for  both  kinds
+       of callout.
+
+       The  offset_vector  field is a pointer to a vector of capturing offsets
+       (the "ovector"). You may read the elements in this vector, but you must
+       not change any of them.
+
+       For  calls  to  pcre2_match(),  the  offset_vector  field is not (since
+       release 10.30) a pointer to the actual ovector that was passed  to  the
+       matching  function  in  the  match  data block. Instead it points to an
+       internal ovector of a size large enough to hold all  possible  captured
+       substrings in the pattern. Note that whenever a recursion or subroutine
+       call within a pattern completes, the capturing state is reset  to  what
+       it was before.
+
+       The  capture_last  field  contains the number of the most recently cap-
+       tured substring, and the capture_top field contains one more  than  the
+       number  of  the  highest numbered captured substring so far. If no sub-
+       strings have yet been captured, the value of capture_last is 0 and  the
+       value  of  capture_top  is  1. The values of these fields do not always
+       differ  by  one;  for  example,  when  the  callout  in   the   pattern
+       ((a)(b))(?C2) is taken, capture_last is 1 but capture_top is 4.
+
+       The   contents  of  ovector[2]  to  ovector[<capture_top>*2-1]  can  be
+       inspected in order to extract substrings that have been matched so far,
+       in  the  same way as extracting substrings after a match has completed.
+       The values in ovector[0] and ovector[1] are always PCRE2_UNSET  because
+       the  match is by definition not complete. Substrings that have not been
+       captured but whose numbers are less than capture_top also have both  of
+       their ovector slots set to PCRE2_UNSET.
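+
+       As a minimal sketch of the inspection described above, the following
+       helper walks the pairs from ovector[2] to ovector[capture_top*2-1] and
+       visits each group that is set. It uses plain size_t values and a local
+       CAPTURE_UNSET constant as stand-ins for PCRE2_SIZE and PCRE2_UNSET, so
+       that it compiles without pcre2.h. The function and visitor names are
+       illustrative assumptions and are not part of the PCRE2 API.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PCRE2_UNSET (assumed here to be the all-ones size_t value). */
#define CAPTURE_UNSET (~(size_t)0)

/* Hypothetical visitor type: receives a group number and its offsets. */
typedef void (*capture_visitor)(uint32_t group, size_t start, size_t end,
                                void *user_data);

/* Visit every group numbered below capture_top that is currently set.
   Pair 0 is skipped because the overall match is not yet complete.
   Returns the number of groups that were set. */
uint32_t visit_set_captures(const size_t *ovector, uint32_t capture_top,
                            capture_visitor visit, void *user_data)
{
    uint32_t count = 0;
    for (uint32_t group = 1; group < capture_top; group++) {
        size_t start = ovector[2 * group];
        size_t end = ovector[2 * group + 1];
        if (start == CAPTURE_UNSET || end == CAPTURE_UNSET)
            continue;               /* numbered below capture_top, but unset */
        if (visit != NULL)
            visit(group, start, end, user_data);
        count++;
    }
    return count;
}
```
+
+       Inside a callout function, a helper of this shape would typically be
+       given the offset_vector and capture_top fields of the callout block.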
+
+       For  DFA  matching,  the offset_vector field points to the ovector that
+       was passed to the matching function in the match data block  for  call-
+       outs at the top level, but to an internal ovector during the processing
+       of pattern recursions, lookarounds, and atomic groups.  However,  these
+       ovectors  hold no useful information because pcre2_dfa_match() does not
+       support substring capturing. The value of capture_top is always  1  and
+       the value of capture_last is always 0 for DFA matching.
+
+       The subject and subject_length fields contain copies of the values that
+       were passed to the matching function.
+
+       The start_match field normally contains the offset within  the  subject
+       at  which  the  current  match  attempt started. However, if the escape
+       sequence \K has been encountered, this value is changed to reflect  the
+       modified  starting  point.  If the pattern is not anchored, the callout
+       function may be called several times from the same point in the pattern
+       for different starting points in the subject.
+
+       The  current_position  field  contains the offset within the subject of
+       the current match pointer.
+
+       The pattern_position field contains the offset in the pattern string to
+       the next item to be matched.
+
+       The  next_item_length  field contains the length of the next item to be
+       processed in the pattern string. When the callout is at the end of  the
+       pattern,  the  length  is  zero.  When  the callout precedes an opening
+       parenthesis, the length includes meta characters that follow the paren-
+       thesis.  For  example,  in a callout before an assertion such as (?=ab)
+       the length is 3. For an alternation bar or a closing parenthesis,
+       the  length is one, unless a closing parenthesis is followed by a quan-
+       tifier, in which case its length is included.  (This changed in release
+       10.23.  In  earlier  releases, before an opening parenthesis the length
+       was that of the entire subpattern, and before an alternation bar  or  a
+       closing parenthesis the length was zero.)
+
+       The  pattern_position  and next_item_length fields are intended to help
+       in distinguishing between different automatic callouts, which all  have
+       the  same  callout  number. However, they are set for all callouts, and
+       are used by pcre2test to show the next item to be matched when display-
+       ing callout information.
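+
+       One way to use these two fields, sketched here for the 8-bit library
+       where the pattern is a byte string, is to copy the next item out of the
+       pattern for display. The helper name and its buffer handling are
+       assumptions made for illustration; the offsets in any call are simply
+       the pattern_position and next_item_length values from the callout
+       block.
+
```c
#include <stddef.h>
#include <string.h>

/* Copy the next pattern item into buf as a zero-terminated string.
   Returns the number of code units copied (truncated to fit bufsize). */
size_t copy_next_item(const char *pattern, size_t pattern_position,
                      size_t next_item_length, char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return 0;
    size_t n = next_item_length;
    if (n > bufsize - 1)
        n = bufsize - 1;            /* leave room for the terminator */
    memcpy(buf, pattern + pattern_position, n);
    buf[n] = '\0';
    return n;
}
```
+
+       For the pattern x(?C1)(?=ab)c, a callout reported with a position of 6
+       and a length of 3 would yield the text "(?=", consistent with the
+       assertion example given above.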
+
+       In callouts from pcre2_match() the mark field contains a pointer to the
+       zero-terminated name of the most recently passed (*MARK), (*PRUNE),  or
+       (*THEN)  item  in the match, or NULL if no such items have been passed.
+       Instances of (*PRUNE) or (*THEN) without a name  do  not  obliterate  a
+       previous (*MARK). In callouts from the DFA matching function this field
+       always contains NULL.
+
+       The   callout_flags   field   is   always   zero   in   callouts   from
+       pcre2_dfa_match() or when JIT is being used. When pcre2_match() without
+       JIT is used, the following bits may be set:
+
+         PCRE2_CALLOUT_STARTMATCH
+
+       This is set for the first callout after the start of matching for  each
+       new starting position in the subject.
+
+         PCRE2_CALLOUT_BACKTRACK
+
+       This  is  set if there has been a matching backtrack since the previous
+       callout, or since the start of matching if this is  the  first  callout
+       from a pcre2_match() run.
+
+       Both  bits  are  set when a backtrack has caused a "bumpalong" to a new
+       starting position in the subject. Output from pcre2test does not  indi-
+       cate  the  presence  of these bits unless the callout_extra modifier is
+       set.
+
+       The information in the callout_flags field is provided so that applica-
+       tions  can track and tell their users how matching with backtracking is
+       done. This can be useful when trying to optimize patterns, or  just  to
+       understand  how  PCRE2  works. There is no support in pcre2_dfa_match()
+       because there is no backtracking in DFA matching, and there is no
+       support in JIT because JIT is all about maximizing matching performance.
+       In both these cases the callout_flags field is always zero.
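+
+       A tracing application might decode the two bits along the following
+       lines. This is only a sketch: FLAG_STARTMATCH and FLAG_BACKTRACK are
+       local stand-ins for PCRE2_CALLOUT_STARTMATCH and
+       PCRE2_CALLOUT_BACKTRACK with placeholder values, and the function name
+       is hypothetical.
+
```c
#include <stdint.h>

/* Stand-ins for PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK.
   The numeric values are placeholders; real code should include pcre2.h
   and use the library's own macros. */
#define FLAG_STARTMATCH 0x1u
#define FLAG_BACKTRACK  0x2u

/* Return a short tag describing what has happened since the last callout. */
const char *describe_callout_flags(uint32_t callout_flags)
{
    int start = (callout_flags & FLAG_STARTMATCH) != 0;
    int back = (callout_flags & FLAG_BACKTRACK) != 0;

    if (start && back)
        return "start+backtrack";   /* e.g. a bumpalong after a backtrack */
    if (start)
        return "start";             /* first callout for this start point */
    if (back)
        return "backtrack";         /* backtracked since previous callout */
    return "none";                  /* also the only result under JIT/DFA */
}
```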
+
+
+RETURN VALUES FROM CALLOUTS
+
+       The external callout function returns an integer to PCRE2. If the value
+       is  zero,  matching  proceeds  as  normal. If the value is greater than
+       zero, matching fails at the current point, but  the  testing  of  other
+       matching possibilities goes ahead, just as if a lookahead assertion had
+       failed. If the value is less than zero, the match is abandoned, and the
+       matching function returns the negative value.
+
+       Negative   values   should   normally   be   chosen  from  the  set  of
+       PCRE2_ERROR_xxx values. In  particular,  PCRE2_ERROR_NOMATCH  forces  a
+       standard  "no  match"  failure. The error number PCRE2_ERROR_CALLOUT is
+       reserved for use by callout functions; it will never be used  by  PCRE2
+       itself.
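+
+       The three kinds of return value can be illustrated with a small
+       decision function of the sort a callout might delegate to. Everything
+       named here is hypothetical: ABORT_FROM_CALLOUT is a placeholder that
+       stands in for PCRE2_ERROR_CALLOUT, and the veto and cancel parameters
+       are invented for the example.
+
```c
#include <stddef.h>

/* Stand-in for PCRE2_ERROR_CALLOUT; the value is a placeholder. Real code
   should return the macro from pcre2.h. */
#define ABORT_FROM_CALLOUT (-1000)

/* Hypothetical policy for a callout function.
     0   carry on matching as normal
     >0  fail at this point, but let other possibilities be tried
     <0  abandon the match; the matching function returns this value */
int callout_decision(size_t current_position, size_t veto_position,
                     int cancel_requested)
{
    if (cancel_requested)
        return ABORT_FROM_CALLOUT;
    if (current_position == veto_position)
        return 1;
    return 0;
}
```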
+
+
+CALLOUT ENUMERATION
+
+       int pcre2_callout_enumerate(const pcre2_code *code,
+         int (*callback)(pcre2_callout_enumerate_block *, void *),
+         void *user_data);
+
+       A script language that supports the use of string arguments in callouts
+       might like to scan all the callouts in a  pattern  before  running  the
+       match. This can be done by calling pcre2_callout_enumerate(). The first
+       argument is a pointer to a compiled pattern, the  second  points  to  a
+       callback  function,  and the third is arbitrary user data. The callback
+       function is called for every callout in the pattern  in  the  order  in
+       which they appear. Its first argument is a pointer to a callout enumer-
+       ation block, and its second argument is the user_data  value  that  was
+       passed  to  pcre2_callout_enumerate(). The data block contains the fol-
+       lowing fields:
+
+         version                Block version number
+         pattern_position       Offset to next item in pattern
+         next_item_length       Length of next item in pattern
+         callout_number         Number for numbered callouts
+         callout_string_offset  Offset to string within pattern
+         callout_string_length  Length of callout string
+         callout_string         Points to callout string or is NULL
+
+       The version number is currently 0. It will increase if new  fields  are
+       ever  added  to  the  block. The remaining fields are the same as their
+       namesakes in the pcre2_callout block that is used for  callouts  during
+       matching, as described above.
+
+       Note  that  the  value  of pattern_position is unique for each callout.
+       However, if a callout occurs inside a group that is quantified  with  a
+       non-zero minimum or a fixed maximum, the group is replicated inside the
+       compiled pattern. For example, a pattern such as /(a){2}/  is  compiled
+       as  if it were /(a)(a)/. This means that the callout will be enumerated
+       more than once, but with the same value for  pattern_position  in  each
+       case.
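+
+       An application that wants to count each callout once therefore has to
+       ignore repeated positions. The following sketch assumes that an
+       enumeration callback has already stored each pattern_position value in
+       an array, in the order reported; the function name is illustrative.
+
```c
#include <stddef.h>

/* Count distinct callouts given the pattern_position values seen during
   enumeration. Replicated groups repeat a position, so each position is
   counted once. Quadratic, which is adequate for the small number of
   callouts in a typical pattern. */
size_t count_distinct_callouts(const size_t *positions, size_t count)
{
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (positions[j] == positions[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```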
+
+       The callback function should normally return zero. If it returns a non-
+       zero value, scanning the pattern stops, and that value is returned from
+       pcre2_callout_enumerate().
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 26 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2COMPAT(3)             Library Functions Manual             PCRE2COMPAT(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+DIFFERENCES BETWEEN PCRE2 AND PERL
+
+       This document describes the differences in the ways that PCRE2 and Perl
+       handle regular expressions. The differences  described  here  are  with
+       respect  to Perl versions 5.26, but as both Perl and PCRE2 are continu-
+       ally changing, the information may sometimes be out of date.
+
+       1. PCRE2 has only a subset of Perl's Unicode support. Details  of  what
+       it does have are given in the pcre2unicode page.
+
+       2.  Like  Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
+       tions, but they do not mean what you might think. For example, (?!a){3}
+       does  not  assert  that  the next three characters are not "a". It just
+       asserts that the next character is not "a" three times  (in  principle;
+       PCRE2  optimizes this to run the assertion just once). Perl allows some
+       repeat quantifiers on other  assertions,  for  example,  \b*  (but  not
+       \b{3}), but these do not seem to have any use.
+
+       3.  Capturing  subpatterns that occur inside negative lookaround asser-
+       tions are counted, but their entries in the offsets vector are set only
+       when  a  negative  assertion  is a condition that has a matching branch
+       (that is, the condition is false).
+
+       4. The following Perl escape sequences are not supported: \F,  \l,  \L,
+       \u, \U, and \N when followed by a character name. \N on its own, match-
+       ing a non-newline character, and \N{U+dd..}, matching  a  Unicode  code
+       point,  are  supported.  The  escapes that modify the case of following
+       letters are implemented by Perl's general string-handling and  are  not
+       part of its pattern matching engine. If any of these are encountered by
+       PCRE2, an error is generated by default. However, if the PCRE2_ALT_BSUX
+       option is set, \U and \u are interpreted as ECMAScript interprets them.
+
+       5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
+       is built with Unicode support (the default). The properties that can be
+       tested  with  \p  and \P are limited to the general category properties
+       such as Lu and Nd, script names such as Greek or Han, and  the  derived
+       properties Any and L&.  PCRE2 does support the Cs (surrogate) property,
+       which Perl does not; the Perl documentation says  "Because  Perl  hides
+       the need for the user to understand the internal representation of Uni-
+       code characters, there is no need to implement the somewhat messy  con-
+       cept of surrogates."
+
+       6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
+       in between are treated as literals. However, this is slightly different
+       from  Perl  in  that  $  and  @ are also handled as literals inside the
+       quotes. In Perl, they cause variable interpolation (but of course PCRE2
+       does  not  have  variables).  Also, Perl does "double-quotish backslash
+       interpolation" on any backslashes between \Q and \E which, its documen-
+       tation  says, "may lead to confusing results". PCRE2 treats a backslash
+       between \Q and \E just like any other  character.  Note  the  following
+       examples:
+
+           Pattern            PCRE2 matches     Perl matches
+
+           \Qabc$xyz\E        abc$xyz           abc followed by the
+                                                  contents of $xyz
+           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
+           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+           \QA\B\E            A\B               A\B
+           \Q\\E              \                 \\E
+
+       The  \Q...\E  sequence  is recognized both inside and outside character
+       classes.
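+
+       The PCRE2 column of the table can be reproduced with a short scanner.
+       This is a sketch of the quoting rule only, not of PCRE2's parser: it
+       handles \Q...\E regions and simple backslash-escaped characters, treats
+       everything else as literal, and uses a hypothetical function name.
+
```c
#include <stddef.h>

/* Write into out the literal text that a pattern made only of \Q...\E
   regions and backslash-escaped characters would match under the PCRE2
   rule. out must have room for strlen(pattern) + 1 bytes.
   Returns the length of the literal text. */
size_t qe_literal_text(const char *pattern, char *out)
{
    size_t n = 0;
    int quoting = 0;

    for (const char *p = pattern; *p != '\0'; p++) {
        if (quoting) {
            if (p[0] == '\\' && p[1] == 'E') {
                quoting = 0;        /* \E ends the quoted region */
                p++;
            } else {
                out[n++] = *p;      /* backslash, $ and @ are all literal */
            }
        } else if (p[0] == '\\' && p[1] == 'Q') {
            quoting = 1;
            p++;
        } else if (p[0] == '\\' && p[1] == 'E') {
            p++;                    /* an isolated \E is ignored */
        } else if (p[0] == '\\' && p[1] != '\0') {
            out[n++] = *++p;        /* escaped character stands for itself */
        } else {
            out[n++] = *p;          /* metacharacters are not interpreted */
        }
    }
    out[n] = '\0';
    return n;
}
```
+
+       Note that in C source each backslash in a pattern must itself be
+       doubled, so the last row of the table is written as "\\Q\\\\E".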
+
+       7.  Fairly  obviously,  PCRE2  does  not  support  the  (?{code})   and
+       (??{code}) constructions. However, PCRE2 does have a "callout" feature,
+       which allows an external function to be called during pattern matching.
+       See the pcre2callout documentation for details.
+
+       8.  Subroutine  calls (whether recursive or not) were treated as atomic
+       groups up to PCRE2 release 10.23, but from release 10.30 this  changed,
+       and backtracking into subroutine calls is now supported, as in Perl.
+
+       9.  If  any  of the backtracking control verbs are used in a subpattern
+       that is called as a subroutine  (whether  or  not  recursively),  their
+       effect  is  confined to that subpattern; it does not extend to the sur-
+       rounding pattern. This is not always the case in Perl.  In  particular,
+       if  (*THEN)  is  present in a group that is called as a subroutine, its
+       action is limited to that group, even if the group does not contain any
+       |  characters.  Note that such subpatterns are processed as anchored at
+       the point where they are tested.
+
+       10. If a pattern contains more than one backtracking control verb,  the
+       first  one  that  is backtracked onto acts. For example, in the pattern
+       A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but  a  failure
+       in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
+       it is the same as PCRE2, but there are cases where it differs.
+
+       11. Most backtracking verbs in assertions have  their  normal  actions.
+       They are not confined to the assertion.
+
+       12.  There are some differences that are concerned with the settings of
+       captured strings when part of  a  pattern  is  repeated.  For  example,
+       matching  "aba"  against  the  pattern  /^(a(b)?)+$/  in Perl leaves $2
+       unset, but in PCRE2 it is set to "b".
+
+       13. PCRE2's handling of duplicate subpattern numbers and duplicate sub-
+       pattern names is not as general as Perl's. This is a consequence of the
+       fact that PCRE2 works internally just with numbers, using  an  external
+       table  to translate between numbers and names. In particular, a pattern
+       such as (?|(?<a>A)|(?<b>B)), where the two capturing  parentheses  have
+       the  same  number  but different names, is not supported, and causes an
+       error at compile time. If it were allowed, it would not be possible  to
+       distinguish  which  parentheses matched, because both names map to cap-
+       turing subpattern number 1. To avoid this confusing situation, an error
+       is given at compile time.
+
+       14. Perl used to recognize comments in some places that PCRE2 does not,
+       for example, between the ( and ? at the start of a subpattern.  If  the
+       /x modifier is set, Perl allowed white space between ( and ? though the
+       latest Perls give an error (for a while it was just deprecated).  There
+       may still be some cases where Perl behaves differently.
+
+       15.  Perl,  when  in warning mode, gives warnings for character classes
+       such as [A-\d] or [a-[:digit:]]. It then treats the hyphens  as  liter-
+       als. PCRE2 has no warning features, so it gives an error in these cases
+       because they are almost certainly user mistakes.
+
+       16. In PCRE2, the upper/lower case character properties Lu and  Ll  are
+       not  affected when case-independent matching is specified. For example,
+       \p{Lu} always matches an upper case letter. I think Perl has changed in
+       this  respect; in the release at the time of writing (5.24), \p{Lu} and
+       \p{Ll} match all letters, regardless of case, when case independence is
+       specified.
+
+       17.  PCRE2  provides  some  extensions  to  the Perl regular expression
+       facilities.  Perl 5.10 includes new features that are  not  in  earlier
+       versions  of  Perl,  some  of which (such as named parentheses) were in
+       PCRE2 for some time before. This list is with respect to Perl 5.26:
+
+       (a) Although lookbehind assertions in PCRE2  must  match  fixed  length
+       strings,  each alternative branch of a lookbehind assertion can match a
+       different length of string. Perl requires them all  to  have  the  same
+       length.
+
+       (b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
+       ported in lookbehinds, provided that there is no possibility of  refer-
+       encing  a  non-unique  number or name. Perl does not support backrefer-
+       ences in lookbehinds.
+
+       (c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set,  the
+       $ meta-character matches only at the very end of the string.
+
+       (d)  A  backslash  followed  by  a  letter  with  no special meaning is
+       faulted. (Perl can be made to issue a warning.)
+
+       (e) If PCRE2_UNGREEDY is set, the greediness of the repetition  quanti-
+       fiers is inverted, that is, by default they are not greedy, but if fol-
+       lowed by a question mark they are.
+
+       (f) PCRE2_ANCHORED can be used at matching time to force a  pattern  to
+       be tried only at the first matching position in the subject string.
+
+       (g)     The     PCRE2_NOTBOL,    PCRE2_NOTEOL,    PCRE2_NOTEMPTY    and
+       PCRE2_NOTEMPTY_ATSTART options have no Perl equivalents.
+
+       (h) The \R escape sequence can be restricted to match only CR,  LF,  or
+       CRLF by the PCRE2_BSR_ANYCRLF option.
+
+       (i)  The  callout  facility is PCRE2-specific. Perl supports codeblocks
+       and variable interpolation, but not general hooks on every match.
+
+       (j) The partial matching facility is PCRE2-specific.
+
+       (k) The alternative matching function (pcre2_dfa_match()) matches in  a
+       different way and is not Perl-compatible.
+
+       (l)  PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT)
+       at the start of a pattern that  set  overall  options  that  cannot  be
+       changed within the pattern.
+
+       18.  The  Perl  /a modifier restricts \d numbers to pure ASCII, and the
+       /aa modifier restricts /i  case-insensitive  matching  to  pure  ASCII,
+       ignoring  Unicode  rules.  This  separation  cannot be represented with
+       PCRE2_UCP.
+
+       19. Perl has different limits than PCRE2. See the pcre2limits
+       documentation for details. In release 5.10, Perl switched from
+       recursion to iteration, keeping the intermediate matches on the heap,
+       which is about 10% slower but avoids any stack-overflow limit. PCRE2
+       made a similar change at release 10.30, and also has many build-time
+       and run-time customizable limits.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 28 July 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2JIT(3)                Library Functions Manual                PCRE2JIT(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 JUST-IN-TIME COMPILER SUPPORT
+
+       Just-in-time  compiling  is a heavyweight optimization that can greatly
+       speed up pattern matching. However, it comes at the cost of extra  pro-
+       cessing  before  the  match is performed, so it is of most benefit when
+       the same pattern is going to be matched many times. This does not  nec-
+       essarily  mean many calls of a matching function; if the pattern is not
+       anchored, matching attempts may take place many times at various  posi-
+       tions in the subject, even for a single call. Therefore, if the subject
+       string is very long, it may still pay  to  use  JIT  even  for  one-off
+       matches.  JIT  support  is  available  for all of the 8-bit, 16-bit and
+       32-bit PCRE2 libraries.
+
+       JIT support applies only to the  traditional  Perl-compatible  matching
+       function.   It  does  not apply when the DFA matching function is being
+       used. The code for this support was written by Zoltan Herczeg.
+
+
+AVAILABILITY OF JIT SUPPORT
+
+       JIT support is an optional feature of  PCRE2.  The  "configure"  option
+       --enable-jit  (or  equivalent  CMake  option) must be set when PCRE2 is
+       built if you want to use JIT. The support is limited to  the  following
+       hardware platforms:
+
+         ARM 32-bit (v5, v7, and Thumb2)
+         ARM 64-bit
+         Intel x86 32-bit and 64-bit
+         MIPS 32-bit and 64-bit
+         Power PC 32-bit and 64-bit
+         SPARC 32-bit
+
+       If --enable-jit is set on an unsupported platform, compilation fails.
+
+       A  program  can  tell if JIT support is available by calling pcre2_con-
+       fig() with the PCRE2_CONFIG_JIT option. The result is  1  when  JIT  is
+       available,  and 0 otherwise. However, a simple program does not need to
+       check this in order to use JIT. The API is implemented in  a  way  that
+       falls  back  to the interpretive code if JIT is not available. For pro-
+       grams that need the best possible performance, there is  also  a  "fast
+       path" API that is JIT-specific.
+
+
+SIMPLE USE OF JIT
+
+       To  make use of the JIT support in the simplest way, all you have to do
+       is to call pcre2_jit_compile() after successfully compiling  a  pattern
+       with pcre2_compile(). This function has two arguments: the first is the
+       compiled pattern pointer that was returned by pcre2_compile(), and  the
+       second  is  zero  or  more of the following option bits: PCRE2_JIT_COM-
+       PLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
+
+       If JIT support is not available, a  call  to  pcre2_jit_compile()  does
+       nothing  and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled
+       pattern is passed to the JIT compiler, which turns it into machine code
+       that executes much faster than the normal interpretive code, but yields
+       exactly the same results. The returned value  from  pcre2_jit_compile()
+       is zero on success, or a negative error code.
+
+       There  is  a limit to the size of pattern that JIT supports, imposed by
+       the size of machine stack that it uses. The exact rules are  not  docu-
+       mented  because  they  may  change at any time, in particular, when new
+       optimizations are introduced.  If a pattern  is  too  big,  a  call  to
+       pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY.
+
+       PCRE2_JIT_COMPLETE  requests the JIT compiler to generate code for com-
+       plete matches. If you want to run partial matches using the  PCRE2_PAR-
+       TIAL_HARD  or  PCRE2_PARTIAL_SOFT  options of pcre2_match(), you should
+       set one or both of  the  other  options  as  well  as,  or  instead  of
+       PCRE2_JIT_COMPLETE. The JIT compiler generates different optimized code
+       for each of the three modes (normal, soft partial, hard partial).  When
+       pcre2_match()  is  called,  the appropriate code is run if it is avail-
+       able. Otherwise, the pattern is matched using interpretive code.
+
+       You can call pcre2_jit_compile() multiple times for the  same  compiled
+       pattern.  It does nothing if it has previously compiled code for any of
+       the option bits. For example, you can call it once with  PCRE2_JIT_COM-
+       PLETE  and  (perhaps  later,  when  you find you need partial matching)
+       again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time  it
+       will ignore PCRE2_JIT_COMPLETE and just compile code for partial match-
+       ing. If pcre2_jit_compile() is called with no option bits set, it imme-
+       diately returns zero. This is an alternative way of testing whether JIT
+       is available.
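+
+       The effect of repeated calls can be modelled as simple bit arithmetic.
+       The sketch below is a model of the documented behaviour, not library
+       code: the MODE_ constants are local stand-ins for PCRE2_JIT_COMPLETE,
+       PCRE2_JIT_PARTIAL_SOFT and PCRE2_JIT_PARTIAL_HARD with placeholder
+       values, and the function name is hypothetical.
+
```c
#include <stdint.h>

/* Stand-ins for PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_SOFT and
   PCRE2_JIT_PARTIAL_HARD; the bit values are placeholders. */
#define MODE_COMPLETE     0x1u
#define MODE_PARTIAL_SOFT 0x2u
#define MODE_PARTIAL_HARD 0x4u

/* Model of which modes a repeated call still has to compile:
   the requested modes minus those that were compiled earlier. */
uint32_t modes_still_needed(uint32_t already_compiled, uint32_t requested)
{
    return requested & ~already_compiled;
}
```
+
+       With MODE_COMPLETE already compiled, a request for MODE_COMPLETE and
+       MODE_PARTIAL_HARD leaves only MODE_PARTIAL_HARD to be compiled, which
+       corresponds to the example in the preceding paragraph.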
+
+       At present, it is not possible to free JIT compiled  code  except  when
+       the entire compiled pattern is freed by calling pcre2_code_free().
+
+       In  some circumstances you may need to call additional functions. These
+       are described in the  section  entitled  "Controlling  the  JIT  stack"
+       below.
+
+       There are some pcre2_match() options that are not supported by JIT, and
+       there are also some pattern items that JIT cannot handle.  Details  are
+       given  below.  In  both cases, matching automatically falls back to the
+       interpretive code. If you want to know whether JIT  was  actually  used
+       for  a particular match, you should arrange for a JIT callback function
+       to be set up as described in the section entitled "Controlling the  JIT
+       stack"  below,  even  if  you  do  not need to supply a non-default JIT
+       stack. Such a callback function is called whenever JIT code is about to
+       be  obeyed.  If the match-time options are not right for JIT execution,
+       the callback function is not obeyed.
+
+       If the JIT compiler finds an unsupported item, no JIT  data  is  gener-
+       ated.  You  can find out if JIT matching is available after compiling a
+       pattern by calling  pcre2_pattern_info()  with  the  PCRE2_INFO_JITSIZE
+       option.  A non-zero result means that JIT compilation was successful. A
+       result of 0 means that JIT support is not available, or the pattern was
+       not  processed by pcre2_jit_compile(), or the JIT compiler was not able
+       to handle the pattern.
+
+
+UNSUPPORTED OPTIONS AND PATTERN ITEMS
+
+       The pcre2_match() options that  are  supported  for  JIT  matching  are
+       PCRE2_NOTBOL,   PCRE2_NOTEOL,  PCRE2_NOTEMPTY,  PCRE2_NOTEMPTY_ATSTART,
+       PCRE2_NO_UTF_CHECK,  PCRE2_PARTIAL_HARD,  and  PCRE2_PARTIAL_SOFT.  The
+       PCRE2_ANCHORED option is not supported at match time.
+
+       If  the  PCRE2_NO_JIT option is passed to pcre2_match() it disables the
+       use of JIT, forcing matching by the interpreter code.
+
+       The only unsupported pattern items are \C (match a  single  data  unit)
+       when  running in a UTF mode, and a callout immediately before an asser-
+       tion condition in a conditional group.
+
+
+RETURN VALUES FROM JIT MATCHING
+
+       When a pattern is matched using JIT matching, the return values are the
+       same  as  those  given by the interpretive pcre2_match() code, with the
+       addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This  means
+       that  the memory used for the JIT stack was insufficient. See "Control-
+       ling the JIT stack" below for a discussion of JIT stack usage.
+
+       The error code PCRE2_ERROR_MATCHLIMIT is returned by the  JIT  code  if
+       searching  a  very large pattern tree goes on for too long, as it is in
+       the same circumstance when JIT is not used, but the details of  exactly
+       what is counted are not the same. The PCRE2_ERROR_DEPTHLIMIT error code
+       is never returned when JIT matching is used.
+
+
+CONTROLLING THE JIT STACK
+
+       When the compiled JIT code runs, it needs a block of memory to use as a
+       stack.   By  default, it uses 32KiB on the machine stack. However, some
+       large  or  complicated  patterns  need  more  than  this.   The   error
+       PCRE2_ERROR_JIT_STACKLIMIT  is  given  when  there is not enough stack.
+       Three functions are provided for managing blocks of memory for  use  as
+       JIT  stacks. There is further discussion about the use of JIT stacks in
+       the section entitled "JIT stack FAQ" below.
+
+       The pcre2_jit_stack_create() function creates a JIT  stack.  Its  argu-
+       ments  are  a starting size, a maximum size, and a general context (for
+       memory allocation functions, or NULL for standard  memory  allocation).
+       It returns a pointer to an opaque structure of type pcre2_jit_stack, or
+       NULL if there is an error. The pcre2_jit_stack_free() function is  used
+       to free a stack that is no longer needed. If its argument is NULL, this
+       function returns immediately, without doing anything. (For the  techni-
+       cally  minded: the address space is allocated by mmap or VirtualAlloc.)
+       A maximum stack size of 512KiB to 1MiB should be more than  enough  for
+       any pattern.
+
+       The  pcre2_jit_stack_assign()  function  specifies which stack JIT code
+       should use. Its arguments are as follows:
+
+         pcre2_match_context  *mcontext
+         pcre2_jit_callback    callback
+         void                 *data
+
+       The first argument is a pointer to a match context. When this is subse-
+       quently passed to a matching function, its information determines which
+       JIT stack is used. If this argument is NULL, the function returns imme-
+       diately,  without  doing anything. There are three cases for the values
+       of the other two options:
+
+         (1) If callback is NULL and data is NULL, an internal 32KiB block
+             on the machine stack is used. This is the default when a match
+             context is created.
+
+         (2) If callback is NULL and data is not NULL, data must be
+             a pointer to a valid JIT stack, the result of calling
+             pcre2_jit_stack_create().
+
+         (3) If callback is not NULL, it must point to a function that is
+             called with data as an argument at the start of matching, in
+             order to set up a JIT stack. If the return from the callback
+             function is NULL, the internal 32KiB stack is used; otherwise the
+             return value must be a valid JIT stack, the result of calling
+             pcre2_jit_stack_create().
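+
+       The three cases can be summarized as a small decision function. This
+       is a model of the rules just listed, written with stand-in types so
+       that it compiles without pcre2.h; jit_stack_model and the function
+       names are assumptions made for illustration only.
+
```c
#include <stddef.h>

/* Stand-in for the opaque pcre2_jit_stack type. */
typedef struct jit_stack_model { int placeholder; } jit_stack_model;

/* Stand-in for the callback type: takes the data pointer and returns a
   stack, or NULL to request the internal block. */
typedef jit_stack_model *(*jit_callback_model)(void *data);

/* Return the stack selected by the three cases, or NULL when the internal
   32KiB block on the machine stack is to be used. */
jit_stack_model *select_jit_stack(jit_callback_model callback, void *data)
{
    if (callback != NULL)
        return callback(data);          /* case (3); NULL means internal */
    return (jit_stack_model *)data;     /* case (1) if NULL, else case (2) */
}

/* Example callback: treats data itself as the stack to use. */
jit_stack_model *use_data_as_stack(void *data)
{
    return (jit_stack_model *)data;
}
```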
+
+       A callback function is obeyed whenever JIT code is about to be run;  it
+       is not obeyed when pcre2_match() is called with options that are incom-
+       patible for JIT matching. A callback function can therefore be used  to
+       determine  whether  a  match  operation  was  executed by JIT or by the
+       interpreter.
+
+       You may safely use the same JIT stack for more than one pattern (either
+       by  assigning  directly  or  by  callback), as long as the patterns are
+       matched sequentially in the same thread. Currently, the only way to set
+       up  non-sequential matches in one thread is to use callouts: if a call-
+       out function starts another match, that match must use a different  JIT
+       stack to the one used for currently suspended match(es).
+
+       In  a multithread application, if you do not specify a JIT stack, or if
+       you assign or pass back NULL from  a  callback,  that  is  thread-safe,
+       because  each  thread has its own machine stack. However, if you assign
+       or pass back a non-NULL JIT stack, this must be a different  stack  for
+       each thread so that the application is thread-safe.
+
+       Strictly  speaking,  even more is allowed. You can assign the same non-
+       NULL stack to a match context that is used by any number  of  patterns,
+       as  long  as  they are not used for matching by multiple threads at the
+       same time. For example, you could use the same stack  in  all  compiled
+       patterns,  with  a global mutex in the callback to wait until the stack
+       is available for use. However, this is an inefficient solution, and not
+       recommended.
+
+       This  is a suggestion for how a multithreaded program that needs to set
+       up non-default JIT stacks might operate:
+
+         During thread initialization
+           thread_local_var = pcre2_jit_stack_create(...)
+
+         During thread exit
+           pcre2_jit_stack_free(thread_local_var)
+
+         Use a one-line callback function
+           return thread_local_var
+
+       All the functions described in this section do nothing if  JIT  is  not
+       available.
+
+
+JIT STACK FAQ
+
+       (1) Why do we need JIT stacks?
+
+       PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack
+       where the local data of the current node is pushed before checking  its
+       child nodes. Allocating real machine stack on some platforms is  diffi-
+       cult. For example, on PowerPC the stack chain needs to be updated every
+       time the stack is extended. Although this is possible, the overhead  of
+       the updates decreases performance. So we do the recursion in memory.
+
+       (2) Why don't we simply allocate blocks of memory with malloc()?
+
+       Modern operating systems have a  nice  feature:  they  can  reserve  an
+       address space instead of allocating memory. We can safely allocate mem-
+       ory pages inside this address space, so the stack  could  grow  without
+       moving memory data (this is important because of pointers). Thus we can
+       allocate 1MiB address space, and use only a single memory page (usually
+       4KiB)  if that is enough. However, we can still grow up to 1MiB anytime
+       if needed.
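+
+       The reserve-then-commit idea described above can  be  sketched  on  a
+       POSIX system as follows. This is only an illustration: it calls mmap()
+       and mprotect() directly, the names grow_stack,  grow_stack_reserve(),
+       grow_stack_commit(), and grow_stack_release() are invented here, and it
+       is not taken from the PCRE2 or JIT sources.
+

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type: a region whose address range is fixed at creation. */
typedef struct {
    unsigned char *base;   /* start of the reserved address range */
    size_t reserved;       /* size of the reserved range in bytes */
    size_t committed;      /* bytes at the start that are usable  */
} grow_stack;

/* Reserve address space only; no page is readable or writable yet. */
int grow_stack_reserve(grow_stack *s, size_t reserve_bytes)
{
    assert(s != NULL);
    void *p = mmap(NULL, reserve_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    s->base = p;
    s->reserved = reserve_bytes;
    s->committed = 0;
    return 0;
}

/* Make at least want_bytes usable. The base address never changes, so
   pointers into the already committed part stay valid. */
int grow_stack_commit(grow_stack *s, size_t want_bytes)
{
    assert(s != NULL && s->base != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (want_bytes + page - 1) / page * page;
    if (rounded > s->reserved) return -1;      /* beyond the reservation */
    if (rounded <= s->committed) return 0;     /* already large enough */
    if (mprotect(s->base + s->committed, rounded - s->committed,
                 PROT_READ | PROT_WRITE) != 0) return -1;
    s->committed = rounded;
    return 0;
}

/* Give the whole reserved range back to the system. */
void grow_stack_release(grow_stack *s)
{
    assert(s != NULL);
    if (s->base != NULL) munmap(s->base, s->reserved);
    s->base = NULL;
    s->reserved = s->committed = 0;
}
```

+       A caller would reserve 1MiB once, commit a single page, and then call
+       grow_stack_commit() with a larger size only when more room is needed.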
+
+       (3) Who "owns" a JIT stack?
+
+       The owner of the stack is the user program, not the JIT-compiled pattern
+       or anything else. The user program must ensure that while  a  stack  is
+       being used by pcre2_match() (that is, it is assigned to a match context
+       that  is  passed  to  the pattern currently running), that stack is not
+       used by any other thread (to avoid overwriting the same  memory  area).
+       The best practice for multithreaded programs is to allocate a stack for
+       each thread, and return this stack through the JIT callback function.
+
+       (4) When should a JIT stack be freed?
+
+       You can free a JIT stack at any time, as long as it will not be used by
+       pcre2_match() again. When you assign the stack to a match context, only
+       a pointer is set. There is no reference counting or  any  other  magic.
+       You can free compiled patterns, contexts, and stacks in any order, any-
+       time. Just do not call pcre2_match() with a match context  pointing  to
+       an already freed stack, as that will cause a segfault. (Also, do not free
+       a stack currently used by pcre2_match() in  another  thread).  You  can
+       also  replace the stack in a context at any time when it is not in use.
+       You should free the previous stack before assigning a replacement.
+
+       (5) Should I allocate/free a  stack  every  time  before/after  calling
+       pcre2_match()?
+
+       No,  because  this  is  too  costly in terms of resources. However, you
+       could implement a scheme that releases the stack if  it  has  not  been
+       used  for,  say, two minutes. The JIT callback can help to achieve this
+       without keeping a list of patterns.
+
+       (6) OK, the stack is for long term memory allocation. But what  happens
+       if  a  pattern causes stack overflow with a stack of 1MiB? Is that 1MiB
+       kept until the stack is freed?
+
+       Especially on embedded systems, it might be a good idea to release memory
+       sometimes without freeing the stack. There is no API for  this  at  the
+       moment.  A function that returns the currently allocated memory for a
+       stack, and another that allows releasing memory (shrinking the  stack),
+       would probably be a good idea if someone needs this.
+
+       (7) This is too much of a headache. Isn't there any better solution for
+       JIT stack handling?
+
+       No,  thanks to Windows. If POSIX threads were used everywhere, we could
+       throw out this complicated API.
+
+
+FREEING JIT SPECULATIVE MEMORY
+
+       void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
+
+       The JIT executable allocator does not free all memory as soon as it can.
+       It  expects  new  allocations,  and  keeps  some  free memory around to
+       improve allocation speed. However, in low memory conditions,  it  might
+       be  better to free all possible memory. You can cause this to happen by
+       calling pcre2_jit_free_unused_memory(). Its argument is a general  con-
+       text, for custom memory management, or NULL for standard memory manage-
+       ment.
+
+
+EXAMPLE CODE
+
+       This is a single-threaded example that specifies a  JIT  stack  without
+       using  a  callback.  A real program should include error checking after
+       all the function calls.
+
+         int rc;
+         pcre2_code *re;
+         pcre2_match_data *match_data;
+         pcre2_match_context *mcontext;
+         pcre2_jit_stack *jit_stack;
+
+         re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0,
+           &errornumber, &erroffset, NULL);
+         rc = pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);
+         mcontext = pcre2_match_context_create(NULL);
+         jit_stack = pcre2_jit_stack_create(32*1024, 512*1024, NULL);
+         pcre2_jit_stack_assign(mcontext, NULL, jit_stack);
+         match_data = pcre2_match_data_create(re, 10);
+         rc = pcre2_match(re, subject, length, 0, 0, match_data, mcontext);
+         /* Process result */
+
+         pcre2_code_free(re);
+         pcre2_match_data_free(match_data);
+         pcre2_match_context_free(mcontext);
+         pcre2_jit_stack_free(jit_stack);
+
+
+JIT FAST PATH API
+
+       Because the API described above falls back to interpreted matching when
+       JIT  is  not  available, it is convenient for programs that are written
+       for  general  use  in  many  environments.  However,  calling  JIT  via
+       pcre2_match() does have a performance impact. Programs that are written
+       for use where JIT is known to be available, and  which  need  the  best
+       possible  performance,  can  instead  use a "fast path" API to call JIT
+       matching directly instead of calling pcre2_match() (obviously only  for
+       patterns that have been successfully processed by pcre2_jit_compile()).
+
+       The  fast  path  function  is  called  pcre2_jit_match(),  and it takes
+       exactly the same arguments as pcre2_match(). The return values are also
+       the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
+       complete) is requested that was not compiled. Unsupported  option  bits
+       (for  example,  PCRE2_ANCHORED)  are  ignored,  as  is the PCRE2_NO_JIT
+       option.
+
+       When you call pcre2_match(), as well as testing for invalid options,  a
+       number of other sanity checks are performed on the arguments. For exam-
+       ple, if the subject pointer is NULL, an immediate error is given. Also,
+       unless  PCRE2_NO_UTF_CHECK  is  set, a UTF subject string is tested for
+       validity. In the interests of speed, these checks do not happen on  the
+       JIT fast path, and if invalid data is passed, the result is undefined.
+
+       Bypassing  the  sanity  checks  and the pcre2_match() wrapping can give
+       speedups of more than 10%.
+
+
+SEE ALSO
+
+       pcre2api(3)
+
+
+AUTHOR
+
+       Philip Hazel (FAQ by Zoltan Herczeg)
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 28 June 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2LIMITS(3)             Library Functions Manual             PCRE2LIMITS(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+SIZE AND OTHER LIMITATIONS
+
+       There are some size limitations in PCRE2 but it is hoped that they will
+       never in practice be relevant.
+
+       The maximum size of a compiled pattern  is  approximately  64  thousand
+       code units for the 8-bit and 16-bit libraries if PCRE2 is compiled with
+       the  default  internal  linkage  size,  which  is  2  bytes  for  these
+       libraries.  If  you  want to process regular expressions that are truly
+       enormous, you can compile PCRE2 with an internal linkage size of 3 or 4
+       (when  building  the  16-bit  library,  3  is rounded up to 4). See the
+       README file in the source distribution and the pcre2build documentation
+       for  details.  In  these cases the limit is substantially larger.  How-
+       ever, the speed of execution is slower.  In  the  32-bit  library,  the
+       internal linkage size is always 4.
+
+       The maximum length of a source pattern string is essentially unlimited;
+       it is the largest number a PCRE2_SIZE variable can hold.  However,  the
+       program that calls pcre2_compile() can specify a smaller limit.
+
+       The maximum length (in code units) of a subject string is one less than
+       the largest number a PCRE2_SIZE variable can  hold.  PCRE2_SIZE  is  an
+       unsigned  integer  type,  usually  defined as size_t. Its maximum value
+       (that is ~(PCRE2_SIZE)0) is reserved as a special indicator  for  zero-
+       terminated strings and unset offsets.
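+
+       The following sketch illustrates why the largest  value  cannot  be  a
+       real  length: the all-ones value is taken as the indicator, so a length
+       argument equal to it means "measure the string yourself".  The  names
+       sketch_size, SKETCH_ZERO_TERMINATED, and effective_length() are invented
+       stand-ins for this illustration and are not part of the library.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for PCRE2_SIZE and its reserved maximum value. */
typedef size_t sketch_size;
#define SKETCH_ZERO_TERMINATED (~(sketch_size)0)

/* Returns the number of code units to scan in an 8-bit subject. */
sketch_size effective_length(const char *subject, sketch_size length)
{
    assert(subject != NULL);
    if (length == SKETCH_ZERO_TERMINATED)
        return strlen(subject);     /* sentinel: find the terminator */
    return length;                  /* explicit length, used as given */
}
```

+       Because the sentinel is the maximum of an unsigned type, every smaller
+       value remains available as a genuine length or offset.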
+
+       All values in repeating quantifiers must be less than 65536.
+
+       The maximum length of a lookbehind assertion is 65535 characters.
+
+       There is no limit to the number of parenthesized subpatterns, but there
+       can be no more than 65535 capturing subpatterns. There is,  however,  a
+       limit  to  the  depth  of  nesting  of parenthesized subpatterns of all
+       kinds. This is imposed in order to limit the  amount  of  system  stack
+       used  at compile time. The default limit can be specified when PCRE2 is
+       built; if not, the default is set to 250.  An  application  can  change
+       this limit by calling pcre2_set_parens_nest_limit() to set the limit in
+       a compile context.
+
+       The maximum length of a name for a named subpattern is 32  code  units,
+       and the maximum number of named subpatterns is 10000.
+
+       The  maximum  length  of  a  name  in  a (*MARK), (*PRUNE), (*SKIP), or
+       (*THEN) verb is 255 code units for the 8-bit  library  and  65535  code
+       units for the 16-bit and 32-bit libraries.
+
+       The  maximum  length  of  a string argument to a callout is the largest
+       number a 32-bit unsigned integer can hold.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 30 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2MATCHING(3)           Library Functions Manual           PCRE2MATCHING(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 MATCHING ALGORITHMS
+
+       This document describes the two different algorithms that are available
+       in PCRE2 for matching a compiled regular  expression  against  a  given
+       subject  string.  The  "standard"  algorithm is the one provided by the
+       pcre2_match() function. This works in the same way as  Perl's  matching
+       function, and provides a Perl-compatible matching operation. The just-
+       in-time (JIT) optimization that is described in the pcre2jit documenta-
+       tion is compatible with this function.
+
+       An alternative algorithm is provided by the pcre2_dfa_match() function;
+       it operates in a different way, and is not Perl-compatible. This alter-
+       native  has  advantages  and  disadvantages  compared with the standard
+       algorithm, and these are described below.
+
+       When there is only one possible way in which a given subject string can
+       match  a pattern, the two algorithms give the same answer. A difference
+       arises, however, when there are multiple possibilities. For example, if
+       the pattern
+
+         ^<.*>
+
+       is matched against the string
+
+         <something> <something else> <something further>
+
+       there are three possible answers. The standard algorithm finds only one
+       of them, whereas the alternative algorithm finds all three.
+
+
+REGULAR EXPRESSIONS AS TREES
+
+       The set of strings that are matched by a regular expression can be rep-
+       resented  as  a  tree structure. An unlimited repetition in the pattern
+       makes the tree of infinite size, but it is still a tree.  Matching  the
+       pattern  to a given subject string (from a given starting point) can be
+       thought of as a search of the tree.  There are two  ways  to  search  a
+       tree:  depth-first  and  breadth-first, and these correspond to the two
+       matching algorithms provided by PCRE2.
+
+
+THE STANDARD MATCHING ALGORITHM
+
+       In the terminology of Jeffrey Friedl's book "Mastering Regular  Expres-
+       sions",  the  standard  algorithm  is an "NFA algorithm". It conducts a
+       depth-first search of the pattern tree. That is, it  proceeds  along  a
+       single path through the tree, checking that the subject matches what is
+       required. When there is a mismatch, the algorithm  tries  any  alterna-
+       tives  at  the  current point, and if they all fail, it backs up to the
+       previous branch point in the  tree,  and  tries  the  next  alternative
+       branch  at  that  level.  This often involves backing up (moving to the
+       left) in the subject string as well.  The  order  in  which  repetition
+       branches  are  tried  is controlled by the greedy or ungreedy nature of
+       the quantifier.
+
+       If a leaf node is reached, a matching string has  been  found,  and  at
+       that  point the algorithm stops. Thus, if there is more than one possi-
+       ble match, this algorithm returns the first one that it finds.  Whether
+       this  is the shortest, the longest, or some intermediate length depends
+       on the way the greedy and ungreedy repetition quantifiers are specified
+       in the pattern.
+
+       Because  it  ends  up  with a single path through the tree, it is rela-
+       tively straightforward for this algorithm to keep  track  of  the  sub-
+       strings  that  are  matched  by portions of the pattern in parentheses.
+       This provides support for capturing parentheses and backreferences.
+
+
+THE ALTERNATIVE MATCHING ALGORITHM
+
+       This algorithm conducts a breadth-first search of  the  tree.  Starting
+       from  the  first  matching  point  in the subject, it scans the subject
+       string from left to right, once, character by character, and as it does
+       this,  it remembers all the paths through the tree that represent valid
+       matches. In Friedl's terminology, this is a kind  of  "DFA  algorithm",
+       though  it is not implemented as a traditional finite state machine (it
+       keeps multiple states active simultaneously).
+
+       Although the general principle of this matching algorithm  is  that  it
+       scans  the subject string only once, without backtracking, there is one
+       exception: when a lookaround assertion is encountered,  the  characters
+       following  or  preceding  the  current  point  have to be independently
+       inspected.
+
+       The scan continues until either the end of the subject is  reached,  or
+       there  are  no more unterminated paths. At this point, terminated paths
+       represent the different matching possibilities (if there are none,  the
+       match  has  failed).   Thus,  if there is more than one possible match,
+       this algorithm finds all of them, and in particular, it finds the long-
+       est.  The  matches are returned in decreasing order of length. There is
+       an option to stop the algorithm after the first match (which is  neces-
+       sarily the shortest) is found.
+
+       Note that all the matches that are found start at the same point in the
+       subject. If the pattern
+
+         cat(er(pillar)?)?
+
+       is matched against the string "the caterpillar catchment",  the  result
+       is  the  three  strings "caterpillar", "cater", and "cat" that start at
+       the fifth character of the subject. The algorithm  does  not  automati-
+       cally move on to find matches that start at later positions.
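+
+       The behaviour in the caterpillar example can be  imitated  without  the
+       library.  The  sketch  below  is hand-written for this one pattern and
+       simply tests the three strings that cat(er(pillar)?)? can match; it is
+       not a general DFA, and the name all_matches_at() is hypothetical.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Records, in decreasing order of length, every match of cat(er(pillar)?)?
   that starts at subject[start]. Returns the number of matches found. */
size_t all_matches_at(const char *subject, size_t start, size_t lengths[3])
{
    /* The only strings the pattern can match, longest first. */
    static const char *const candidates[] = { "caterpillar", "cater", "cat" };
    size_t available, found = 0;

    assert(subject != NULL && lengths != NULL);
    available = strlen(subject);
    if (start > available) return 0;
    available -= start;

    for (size_t i = 0; i < 3; i++) {
        size_t n = strlen(candidates[i]);
        if (n <= available && strncmp(subject + start, candidates[i], n) == 0)
            lengths[found++] = n;
    }
    return found;
}
```

+       For  the subject "the caterpillar catchment" and a start offset of 4,
+       the function reports lengths 11, 5, and 3. A depth-first matcher with
+       greedy  quantifiers  would stop at the first of these. Finding "cat" in
+       "catchment" needs a separate call with a later start offset.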
+
+       PCRE2's "auto-possessification" optimization usually applies to charac-
+       ter repeats at the end of a pattern (as well as internally). For  exam-
+       ple, the pattern "a\d+" is compiled as if it were "a\d++" because there
+       is no point even considering the possibility of backtracking  into  the
+       repeated  digits.  For  DFA matching, this means that only one possible
+       match is found. If you really do want multiple matches in  such  cases,
+       either  use  an ungreedy repeat ("a\d+?") or set the PCRE2_NO_AUTO_POS-
+       SESS option when compiling.
+
+       There are a number of features of PCRE2 regular  expressions  that  are
+       not  supported  by the alternative matching algorithm. They are as fol-
+       lows:
+
+       1. Because the algorithm finds all  possible  matches,  the  greedy  or
+       ungreedy  nature  of  repetition quantifiers is not relevant (though it
+       may affect auto-possessification, as just described). During  matching,
+       greedy  and  ungreedy  quantifiers are treated in exactly the same way.
+       However, possessive quantifiers can make a difference when what follows
+       could  also  match  what  is  quantified, for example in a pattern like
+       this:
+
+         ^a++\w!
+
+       This pattern matches "aaab!" but not "aaa!", which would be matched  by
+       a  non-possessive quantifier. Similarly, if an atomic group is present,
+       it is matched as if it were a standalone pattern at the current  point,
+       and  the  longest match is then "locked in" for the rest of the overall
+       pattern.
+
+       2. When dealing with multiple paths through the tree simultaneously, it
+       is  not  straightforward  to  keep track of captured substrings for the
+       different matching possibilities, and PCRE2's  implementation  of  this
+       algorithm does not attempt to do this. This means that no captured sub-
+       strings are available.
+
+       3. Because no substrings are captured, backreferences within  the  pat-
+       tern are not supported, and cause errors if encountered.
+
+       4.  For  the same reason, conditional expressions that use a backrefer-
+       ence as the condition or test for a specific group  recursion  are  not
+       supported.
+
+       5.  Because  many  paths  through the tree may be active, the \K escape
+       sequence, which resets the start of the match when encountered (but may
+       be  on  some  paths  and not on others), is not supported. It causes an
+       error if encountered.
+
+       6. Callouts are supported, but the value of the  capture_top  field  is
+       always 1, and the value of the capture_last field is always 0.
+
+       7.  The  \C  escape  sequence, which (in the standard algorithm) always
+       matches a single code unit, even in a UTF mode,  is  not  supported  in
+       these  modes,  because the alternative algorithm moves through the sub-
+       ject string one character (not code unit) at a  time,  for  all  active
+       paths through the tree.
+
+       8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)
+       are not supported. (*FAIL) is supported, and  behaves  like  a  failing
+       negative assertion.
+
+
+ADVANTAGES OF THE ALTERNATIVE ALGORITHM
+
+       Using  the alternative matching algorithm provides the following advan-
+       tages:
+
+       1. All possible matches (at a single point in the subject) are automat-
+       ically  found,  and  in particular, the longest match is found. To find
+       more than one match using the standard algorithm, you have to do kludgy
+       things with callouts.
+
+       2.  Because  the  alternative  algorithm  scans the subject string just
+       once, and never needs to backtrack (except for lookbehinds), it is pos-
+       sible  to  pass  very  long subject strings to the matching function in
+       several pieces, checking for partial matching each time. Although it is
+       also  possible  to  do  multi-segment matching using the standard algo-
+       rithm, by retaining partially matched substrings, it  is  more  compli-
+       cated. The pcre2partial documentation gives details of partial matching
+       and discusses multi-segment matching.
+
+
+DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
+
+       The alternative algorithm suffers from a number of disadvantages:
+
+       1. It is substantially slower than  the  standard  algorithm.  This  is
+       partly  because  it has to search for all possible matches, but is also
+       because it is less susceptible to optimization.
+
+       2. Capturing parentheses and backreferences are not supported.
+
+       3. Although atomic groups are supported, their use does not provide the
+       performance advantage that it does for the standard algorithm.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 29 September 2014
+       Copyright (c) 1997-2014 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2PARTIAL(3)            Library Functions Manual            PCRE2PARTIAL(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions
+
+PARTIAL MATCHING IN PCRE2
+
+       In  normal  use  of  PCRE2,  if  the subject string that is passed to a
+       matching function matches as far as it goes, but is too short to  match
+       the  entire pattern, PCRE2_ERROR_NOMATCH is returned. There are circum-
+       stances where it might be helpful to distinguish this case  from  other
+       cases in which there is no match.
+
+       Consider, for example, an application where a human is required to type
+       in data for a field with specific formatting requirements.  An  example
+       might be a date in the form ddmmmyy, defined by this pattern:
+
+         ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
+
+       If the application sees the user's keystrokes one by one, and can check
+       that what has been typed so far is potentially valid,  it  is  able  to
+       raise  an  error  as  soon  as  a  mistake  is made, by beeping and not
+       reflecting the character that has been typed, for example. This immedi-
+       ate  feedback is likely to be a better user interface than a check that
+       is delayed until the entire string has been entered.  Partial  matching
+       can  also be useful when the subject string is very long and is not all
+       available at once.
+
+       PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT  and
+       PCRE2_PARTIAL_HARD  options,  which  can be set when calling a matching
+       function.  The difference between the two options is whether or  not  a
+       partial match is preferred to an alternative complete match, though the
+       details differ between the two types  of  matching  function.  If  both
+       options are set, PCRE2_PARTIAL_HARD takes precedence.
+
+       If  you  want to use partial matching with just-in-time optimized code,
+       you must call pcre2_jit_compile() with one or both of these options:
+
+         PCRE2_JIT_PARTIAL_SOFT
+         PCRE2_JIT_PARTIAL_HARD
+
+       PCRE2_JIT_COMPLETE should also be set if you are going to run  non-par-
+       tial  matches  on the same pattern. If the appropriate JIT mode has not
+       been compiled, interpretive matching code is used.
+
+       Setting a partial matching option  disables  two  of  PCRE2's  standard
+       optimizations. PCRE2 remembers the last literal code unit in a pattern,
+       and abandons matching immediately if it is not present in  the  subject
+       string.  This  optimization  cannot  be  used for a subject string that
+       might match only partially. PCRE2 also knows the minimum  length  of  a
+       matching  string,  and  does not bother to run the matching function on
+       shorter strings. This optimization is also disabled for partial  match-
+       ing.
+
+
+PARTIAL MATCHING USING pcre2_match()
+
+       A  partial  match occurs during a call to pcre2_match() when the end of
+       the subject string is reached successfully, but  matching  cannot  con-
+       tinue because more characters are needed. However, at least one charac-
+       ter in the subject must have been inspected. This  character  need  not
+       form part of the final matched string; lookbehind assertions and the \K
+       escape sequence provide ways of inspecting characters before the  start
+       of  a matched string. The requirement for inspecting at least one char-
+       acter exists because an empty string can  always  be  matched;  without
+       such  a  restriction  there would always be a partial match of an empty
+       string at the end of the subject.
+
+       When a partial match is returned, the first two elements in the ovector
+       point to the portion of the subject that was matched, but the values in
+       the rest of the ovector are undefined. The appearance of \K in the pat-
+       tern has no effect for a partial match. Consider this pattern:
+
+         /abc\K123/
+
+       If it is matched against "456abc123xyz" the result is a complete match,
+       and the ovector defines the matched string as "123", because \K  resets
+       the  "start  of  match" point. However, if a partial match is requested
+       and the subject string is "456abc12", a partial match is found for  the
+       string  "abc12",  because  all these characters are needed for a subse-
+       quent re-match with additional characters.
+
+       What happens when a partial match is identified depends on which of the
+       two partial matching options is set.
+
+   PCRE2_PARTIAL_SOFT WITH pcre2_match()
+
+       If  PCRE2_PARTIAL_SOFT  is  set when pcre2_match() identifies a partial
+       match, the partial match is remembered, but matching continues as  nor-
+       mal,  and  other  alternatives in the pattern are tried. If no complete
+       match  can  be  found,  PCRE2_ERROR_PARTIAL  is  returned  instead   of
+       PCRE2_ERROR_NOMATCH.
+
+       This  option  is "soft" because it prefers a complete match over a par-
+       tial match.  All the various matching items in a pattern behave  as  if
+       the  subject string is potentially complete. For example, \z, \Z, and $
+       match at the end of the subject, as normal, and for \b and \B  the  end
+       of the subject is treated as a non-alphanumeric.
+
+       If  there  is more than one partial match, the first one that was found
+       provides the data that is returned. Consider this pattern:
+
+         /123\w+X|dogY/
+
+       If this is matched against the subject string "abc123dog", both  alter-
+       natives  fail  to  match,  but the end of the subject is reached during
+       matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to  3
+       and  9, identifying "123dog" as the first partial match that was found.
+       (In this example, there are two partial matches, because "dog"  on  its
+       own partially matches the second alternative.)
+
+   PCRE2_PARTIAL_HARD WITH pcre2_match()
+
+       If  PCRE2_PARTIAL_HARD is set for pcre2_match(), PCRE2_ERROR_PARTIAL is
+       returned as soon as a partial match is  found,  without  continuing  to
+       search  for possible complete matches. This option is "hard" because it
+       prefers an earlier partial match over a later complete match. For  this
+       reason,  the  assumption  is  made that the end of the supplied subject
+       string may not be the true end of the available data, and  so,  if  \z,
+       \Z,  \b, \B, or $ are encountered at the end of the subject, the result
+       is PCRE2_ERROR_PARTIAL, provided that at least  one  character  in  the
+       subject has been inspected.
+
+   Comparing hard and soft partial matching
+
+       The  difference  between the two partial matching options can be illus-
+       trated by a pattern such as:
+
+         /dog(sbody)?/
+
+       This matches either "dog" or "dogsbody", greedily (that is, it  prefers
+       the  longer  string  if  possible). If it is matched against the string
+       "dog" with PCRE2_PARTIAL_SOFT, it yields a complete  match  for  "dog".
+       However,  if  PCRE2_PARTIAL_HARD is set, the result is PCRE2_ERROR_PAR-
+       TIAL. On the other hand, if the pattern is made ungreedy the result  is
+       different:
+
+         /dog(sbody)??/
+
+       In  this  case  the  result  is always a complete match because that is
+       found first, and matching never  continues  after  finding  a  complete
+       match. It might be easier to follow this explanation by thinking of the
+       two patterns like this:
+
+         /dog(sbody)?/    is the same as  /dogsbody|dog/
+         /dog(sbody)??/   is the same as  /dog|dogsbody/
+
+       The second pattern will never match "dogsbody", because it will  always
+       find the shorter match first.
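+
+       The  rewritten  forms  /dogsbody|dog/  and  /dog|dogsbody/  make  the
+       difference easy to simulate. The sketch below models only an ordered
+       list of literal alternatives tried at the start of the subject; it is
+       an illustration of the soft and hard rules for pcre2_match(), not the
+       library's implementation, and all of its names are hypothetical.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SKETCH_NOMATCH, SKETCH_PARTIAL, SKETCH_COMPLETE } sketch_result;

/* Tries each literal alternative in order at the start of the subject,
   the way a depth-first matcher would, and applies the soft (hard == 0)
   or hard (hard != 0) partial matching rule. */
sketch_result match_alternatives(const char *const *alts, size_t count,
                                 const char *subject, int hard)
{
    size_t slen;
    int partial_seen = 0;

    assert(alts != NULL && subject != NULL);
    slen = strlen(subject);

    for (size_t i = 0; i < count; i++) {
        size_t alen = strlen(alts[i]);
        if (alen <= slen && strncmp(subject, alts[i], alen) == 0)
            return SKETCH_COMPLETE;            /* first complete match wins */
        if (slen > 0 && slen < alen && strncmp(subject, alts[i], slen) == 0) {
            if (hard) return SKETCH_PARTIAL;   /* hard: stop at once */
            partial_seen = 1;                  /* soft: remember, carry on */
        }
    }
    return partial_seen ? SKETCH_PARTIAL : SKETCH_NOMATCH;
}
```

+       With the subject "dog", the order { "dogsbody", "dog" } gives a complete
+       match  under  the soft rule and a partial match under the hard rule,
+       whereas the order { "dog", "dogsbody" } gives a complete match under
+       both.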
+
+
+PARTIAL MATCHING USING pcre2_dfa_match()
+
+       The DFA functions move along the subject string character by character,
+       without backtracking, searching for  all  possible  matches  simultane-
+       ously.  If the end of the subject is reached before the end of the pat-
+       tern, there is the possibility of a partial match, again provided  that
+       at least one character has been inspected.
+
+       When PCRE2_PARTIAL_SOFT is set, PCRE2_ERROR_PARTIAL is returned only if
+       there have been no complete matches. Otherwise,  the  complete  matches
+       are  returned.   However, if PCRE2_PARTIAL_HARD is set, a partial match
+       takes precedence over any complete matches. The portion of  the  string
+       that was matched when the longest partial match was found is set as the
+       first matching string.
+
+       Because the DFA functions always search for all possible  matches,  and
+       there  is  no  difference between greedy and ungreedy repetition, their
+       behaviour is different from  the  standard  functions  when  PCRE2_PAR-
+       TIAL_HARD  is  set.  Consider  the  string  "dog"  matched  against the
+       ungreedy pattern shown above:
+
+         /dog(sbody)??/
+
+       Whereas the standard function stops as soon as it  finds  the  complete
+       match  for  "dog",  the  DFA  function also finds the partial match for
+       "dogsbody", and so returns that when PCRE2_PARTIAL_HARD is set.
+
+
+PARTIAL MATCHING AND WORD BOUNDARIES
+
+       If a pattern ends with one of the sequences \b or \B, which  test  for
+       word  boundaries,  partial  matching  with PCRE2_PARTIAL_SOFT can give
+       counterintuitive results. Consider this pattern:
+
+         /\bcat\b/
+
+       This matches "cat", provided there is a word boundary at either end. If
+       the subject string is "the cat", the comparison of the final "t" with a
+       following character cannot take place, so a  partial  match  is  found.
+       However,  normal  matching carries on, and \b matches at the end of the
+       subject when the last character is a letter, so  a  complete  match  is
+       found.   The  result,  therefore,  is  not  PCRE2_ERROR_PARTIAL.  Using
+       PCRE2_PARTIAL_HARD in this case does yield PCRE2_ERROR_PARTIAL, because
+       then the partial match takes precedence.
+
+
+EXAMPLE OF PARTIAL MATCHING USING PCRE2TEST
+
+       If  the  partial_soft  (or  ps) modifier is present on a pcre2test data
+       line, the PCRE2_PARTIAL_SOFT option is used for the match.  Here  is  a
+       run of pcre2test that uses the date example quoted above:
+
+           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+         data> 25jun04\=ps
+          0: 25jun04
+          1: jun
+         data> 25dec3\=ps
+         Partial match: 25dec3
+         data> 3ju\=ps
+         Partial match: 3ju
+         data> 3juj\=ps
+         No match
+         data> j\=ps
+         No match
+
+       The  first  data  string  is matched completely, so pcre2test shows the
+       matched substrings. The remaining four strings do not  match  the  com-
+       plete pattern, but the first two are partial matches. Similar output is
+       obtained if DFA matching is used.
+
+       If the partial_hard (or ph) modifier is present  on  a  pcre2test  data
+       line, the PCRE2_PARTIAL_HARD option is set for the match.
+
+
+MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()
+
+       When  a  partial match has been found using a DFA matching function, it
+       is possible to continue the match by providing additional subject  data
+       and  calling  the function again with the same compiled regular expres-
+       sion, this time setting the PCRE2_DFA_RESTART option. You must pass the
+       same working space as before, because this is where details of the pre-
+       vious partial match are stored. Here is an example using pcre2test:
+
+           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+         data> 23ja\=dfa,ps
+         Partial match: 23ja
+         data> n05\=dfa,dfa_restart
+          0: n05
+
+       The first call has "23ja" as the subject, and requests  partial  match-
+       ing;  the  second  call  has  "n05"  as  the  subject for the continued
+       (restarted) match.  Notice that when the match is  complete,  only  the
+       last  part  is  shown;  PCRE2 does not retain the previously partially-
+       matched string. It is up to the calling program to do that if it  needs
+       to.
+
+       That means that, for an unanchored pattern, if a continued match fails,
+       it is not possible to try again at  a  new  starting  point.  All  this
+       facility  is  capable  of  doing  is continuing with the previous match
+       attempt. In the previous example, if the second set of data  is  "ug23"
+       the  result is no match, even though there would be a match for "aug23"
+       if the entire string were given at once. Depending on the  application,
+       this may or may not be what you want.  The only way to allow for start-
+       ing again at the next character is to retain the matched  part  of  the
+       subject and try a new complete match.
+
+       You  can  set the PCRE2_PARTIAL_SOFT or PCRE2_PARTIAL_HARD options with
+       PCRE2_DFA_RESTART to continue partial matching over multiple  segments.
+       This  facility can be used to pass very long subject strings to the DFA
+       matching functions.
+
+
+MULTI-SEGMENT MATCHING WITH pcre2_match()
+
+       Unlike the DFA function, it is not possible  to  restart  the  previous
+       match with a new segment of data when using pcre2_match(). Instead, new
+       data must be added to the previous subject string, and the entire match
+       re-run,  starting from the point where the partial match occurred. Ear-
+       lier data can be discarded.
+
+       It is best to use PCRE2_PARTIAL_HARD in this situation, because it does
+       not  treat the end of a segment as the end of the subject when matching
+       \z, \Z, \b, \B, and $. Consider  an  unanchored  pattern  that  matches
+       dates:
+
+           re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
+         data> The date is 23ja\=ph
+         Partial match: 23ja
+
+       At  this stage, an application could discard the text preceding "23ja",
+       add on text from the next  segment,  and  call  the  matching  function
+       again.  Unlike  the  DFA  matching function, the entire matching string
+       must always be available, and the complete matching process occurs  for
+       each call, so more memory and more processing time are needed.
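+
+       The buffer handling described above might look like the following. This
+       is a sketch under stated assumptions: retain_and_append() is a hypo-
+       thetical helper, keep_from is taken to be the start of the partial
+       match, and patterns with lookbehinds need an earlier keep_from.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop everything before keep_from, then append the
   next segment. buf holds len bytes and has room for cap bytes in total.
   Returns the new length, or (size_t)-1 if the result would not fit. */
size_t retain_and_append(char *buf, size_t len, size_t cap, size_t keep_from,
                         const char *next, size_t next_len)
{
    size_t kept;

    assert(buf != NULL && next != NULL);
    if (len > cap || keep_from > len)
        return (size_t)-1;
    kept = len - keep_from;
    if (next_len > cap - kept)
        return (size_t)-1;
    memmove(buf, buf + keep_from, kept);   /* slide retained text down */
    memcpy(buf + kept, next, next_len);    /* add the new segment */
    return kept + next_len;
}
```
+
+       With the date example, keeping "The date is 23ja" from offset 12 and
+       appending "n05" leaves "23jan05" as the subject for the next call.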
+
+
+ISSUES WITH MULTI-SEGMENT MATCHING
+
+       Certain types of pattern may give problems with multi-segment matching,
+       whichever matching function is used.
+
+       1. If the pattern contains a test for the beginning of a line, you need
+       to  pass  the  PCRE2_NOTBOL option when the subject string for any call
+       does not start at the beginning of a line. There is also a PCRE2_NOTEOL
+       option, but in practice when doing multi-segment matching you should be
+       using PCRE2_PARTIAL_HARD, which includes the effect of PCRE2_NOTEOL.
+
+       2. If a pattern contains a lookbehind assertion, characters  that  pre-
+       cede  the start of the partial match may have been inspected during the
+       matching process.  When using pcre2_match(), sufficient characters must
+       be  retained  for  the  next  match attempt. You can ensure that enough
+       characters are retained by doing the following:
+
+       Before doing any matching, find the length of the longest lookbehind in
+       the     pattern    by    calling    pcre2_pattern_info()    with    the
+       PCRE2_INFO_MAXLOOKBEHIND option. Note that the resulting  count  is  in
+       characters, not code units. After a partial match, moving back from the
+       ovector[0] offset in the subject by the number of characters given  for
+       the  maximum lookbehind gets you to the earliest character that must be
+       retained. In a non-UTF or a 32-bit situation, moving  back  is  just  a
+       subtraction,  but in UTF-8 or UTF-16 you have to count characters while
+       moving back through the code units.
+
+       Characters before the point you have now reached can be discarded,  and
+       after  the  next segment has been added to what is retained, you should
+       run the next match with the startoffset argument set so that the  match
+       begins at the same point as before.
+
+       For  example, if the pattern "(?<=123)abc" is partially matched against
+       the string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maxi-
+       mum  lookbehind  count  is  3, so all characters before offset 2 can be
+       discarded. The value of startoffset for the next  match  should  be  3.
+       When  pcre2test  displays  a partial match, it indicates the lookbehind
+       characters with '<' characters:
+
+           re> "(?<=123)abc"
+         data> xx123ab\=ph
+         Partial match: 123ab
+                        <<<
+
+       3. Because a partial match must always contain at least one  character,
+       what  might  be  considered a partial match of an empty string actually
+       gives a "no match" result. For example:
+
+           re> /c(?<=abc)x/
+         data> ab\=ps
+         No match
+
+       If the next segment begins "cx", a match should be found, but this will
+       only  happen  if characters from the previous segment are retained. For
+       this reason, a "no match" result  should  be  interpreted  as  "partial
+       match of an empty string" when the pattern contains lookbehinds.
+
+       4.  Matching  a subject string that is split into multiple segments may
+       not always produce exactly the same result as matching over one  single
+       long  string,  especially  when PCRE2_PARTIAL_SOFT is used. The section
+       "Partial Matching and Word Boundaries" above describes  an  issue  that
+       arises  if  the  pattern ends with \b or \B. Another kind of difference
+       may occur when there are multiple matching possibilities, because  (for
+       PCRE2_PARTIAL_SOFT) a partial match result is given only when there are
+       no completed matches. This means that as soon as the shortest match has
+       been  found,  continuation to a new subject segment is no longer possi-
+       ble. Consider this pcre2test example:
+
+           re> /dog(sbody)?/
+         data> dogsb\=ps
+          0: dog
+         data> do\=ps,dfa
+         Partial match: do
+         data> gsb\=ps,dfa,dfa_restart
+          0: g
+         data> dogsbody\=dfa
+          0: dogsbody
+          1: dog
+
+       The first data line passes the string "dogsb" to  a  standard  matching
+       function, setting the PCRE2_PARTIAL_SOFT option. Although the string is
+       a partial match for "dogsbody", the result is not  PCRE2_ERROR_PARTIAL,
+       because  the  shorter string "dog" is a complete match. Similarly, when
+       the subject is presented to a DFA matching function  in  several  parts
+       ("do"  and  "gsb"  being  the first two) the match stops when "dog" has
+       been found, and it is not possible to continue.  On the other hand,  if
+       "dogsbody"  is  presented  as  a single string, a DFA matching function
+       finds both matches.
+
+       Because of these problems, it is best to  use  PCRE2_PARTIAL_HARD  when
+       matching  multi-segment  data.  The  example above then behaves differ-
+       ently:
+
+           re> /dog(sbody)?/
+         data> dogsb\=ph
+         Partial match: dogsb
+         data> do\=ps,dfa
+         Partial match: do
+         data> gsb\=ph,dfa,dfa_restart
+         Partial match: gsb
+
+       5. Patterns that contain alternatives at the top level which do not all
+       start  with  the  same  pattern  item  may  not  work  as expected when
+       PCRE2_DFA_RESTART is used. For example, consider this pattern:
+
+         1234|3789
+
+       If the first part of the subject is "ABC123", a partial  match  of  the
+       first  alternative  is found at offset 3. There is no partial match for
+       the second alternative, because such a match does not start at the same
+       point  in  the  subject  string. Attempting to continue with the string
+       "7890" does not yield a match  because  only  those  alternatives  that
+       match  at  one  point in the subject are remembered. The problem arises
+       because the start of the second alternative matches  within  the  first
+       alternative.  There  is  no  problem with anchored patterns or patterns
+       such as:
+
+         1234|ABCD
+
+       where no string can be a partial match for both alternatives.  This  is
+       not  a  problem  if  a  standard matching function is used, because the
+       entire match has to be rerun each time:
+
+           re> /1234|3789/
+         data> ABC123\=ph
+         Partial match: 123
+         data> 1237890
+          0: 3789
+
+       Of course, instead of using PCRE2_DFA_RESTART, the  same  technique  of
+       re-running  the  entire  match  can  also be used with the DFA matching
+       function. Another possibility is to work with two buffers. If a partial
+       match  at  offset  n in the first buffer is followed by "no match" when
+       PCRE2_DFA_RESTART is used on the second buffer, you can then try a  new
+       match starting at offset n+1 in the first buffer.
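+
+       The lookbehind bookkeeping described in item 2 above can be sketched in
+       C for the UTF-8 case. The helper name utf8_move_back() is hypothetical,
+       and the sketch assumes that the subject is valid UTF-8 and that pos is
+       the ovector[0] offset of the partial match.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: move back up to nchars characters from byte offset
   pos in a valid UTF-8 buffer. Returns the byte offset that is reached,
   which is the earliest code unit that must be retained. */
size_t utf8_move_back(const uint8_t *subject, size_t pos, uint32_t nchars)
{
    assert(subject != NULL);
    while (nchars > 0 && pos > 0) {
        pos--;
        /* Skip continuation bytes (10xxxxxx) to reach the lead byte. */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;
        nchars--;
    }
    return pos;
}
```
+
+       For "xx123ab" with pos 5 and a maximum lookbehind of 3, the result is
+       2; after discarding the first two code units, the next startoffset is
+       5 - 2 = 3, which agrees with the worked example in item 2.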
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 22 December 2014
+       Copyright (c) 1997-2014 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2PATTERN(3)            Library Functions Manual            PCRE2PATTERN(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 REGULAR EXPRESSION DETAILS
+
+       The  syntax and semantics of the regular expressions that are supported
+       by PCRE2 are described in detail below. There is a quick-reference syn-
+       tax  summary  in the pcre2syntax page. PCRE2 tries to match Perl syntax
+       and semantics as closely as it can.  PCRE2 also supports some  alterna-
+       tive  regular  expression syntax (which does not conflict with the Perl
+       syntax) in order to provide some compatibility with regular expressions
+       in Python, .NET, and Oniguruma.
+
+       Perl's  regular expressions are described in its own documentation, and
+       regular expressions in general are covered in a number of  books,  some
+       of  which  have  copious  examples. Jeffrey Friedl's "Mastering Regular
+       Expressions", published by  O'Reilly,  covers  regular  expressions  in
+       great  detail.  This  description  of  PCRE2's  regular  expressions is
+       intended as reference material.
+
+       This document discusses the patterns that are supported by  PCRE2  when
+       its  main  matching function, pcre2_match(), is used. PCRE2 also has an
+       alternative matching function, pcre2_dfa_match(), which matches using a
+       different  algorithm  that is not Perl-compatible. Some of the features
+       discussed below are not available when DFA matching is used. The advan-
+       tages and disadvantages of the alternative function, and how it differs
+       from the normal function, are discussed in the pcre2matching page.
+
+
+SPECIAL START-OF-PATTERN ITEMS
+
+       A number of options that can be passed to pcre2_compile() can  also  be
+       set by special items at the start of a pattern. These are not Perl-com-
+       patible, but are provided to make these options accessible  to  pattern
+       writers  who are not able to change the program that processes the pat-
+       tern. Any number of these items  may  appear,  but  they  must  all  be
+       together right at the start of the pattern string, and the letters must
+       be in upper case.
+
+   UTF support
+
+       In the 8-bit and 16-bit PCRE2 libraries, characters may be coded either
+       as single code units, or as multiple UTF-8 or UTF-16 code units. UTF-32
+       can be specified for the 32-bit library, in which  case  it  constrains
+       the  character  values  to  valid  Unicode  code points. To process UTF
+       strings, PCRE2 must be built to include Unicode support (which  is  the
+       default).  When  using  UTF  strings you must either call the compiling
+       function with the PCRE2_UTF option, or the pattern must start with  the
+       special  sequence  (*UTF),  which is equivalent to setting the relevant
+       option. How setting a UTF mode affects pattern matching is mentioned in
+       several  places  below.  There  is  also  a  summary of features in the
+       pcre2unicode page.
+
+       Some applications that allow their users to supply patterns may wish to
+       restrict   them   to   non-UTF   data  for  security  reasons.  If  the
+       PCRE2_NEVER_UTF option is passed  to  pcre2_compile(),  (*UTF)  is  not
+       allowed, and its appearance in a pattern causes an error.
+
+   Unicode property support
+
+       Another  special  sequence that may appear at the start of a pattern is
+       (*UCP).  This has the same effect as setting the PCRE2_UCP  option:  it
+       causes  sequences such as \d and \w to use Unicode properties to deter-
+       mine character types, instead of recognizing only characters with codes
+       less than 256 via a lookup table.
+
+       Some applications that allow their users to supply patterns may wish to
+       restrict them for security reasons. If the  PCRE2_NEVER_UCP  option  is
+       passed to pcre2_compile(), (*UCP) is not allowed, and its appearance in
+       a pattern causes an error.
+
+   Locking out empty string matching
+
+       Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same
+       effect  as  passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option
+       to whichever matching function is subsequently called to match the pat-
+       tern.  These  options  lock  out  the matching of empty strings, either
+       entirely, or only at the start of the subject.
+
+   Disabling auto-possessification
+
+       If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect  as
+       setting  the PCRE2_NO_AUTO_POSSESS option. This stops PCRE2 from making
+       quantifiers possessive when what  follows  cannot  match  the  repeated
+       item. For example, by default a+b is treated as a++b. For more details,
+       see the pcre2api documentation.
+
+   Disabling start-up optimizations
+
+       If a pattern starts with (*NO_START_OPT), it has  the  same  effect  as
+       setting the PCRE2_NO_START_OPTIMIZE option. This disables several opti-
+       mizations for quickly reaching "no match" results.  For  more  details,
+       see the pcre2api documentation.
+
+   Disabling automatic anchoring
+
+       If  a  pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect
+       as setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables  optimiza-
+       tions that apply to patterns whose top-level branches all start with .*
+       (match any number of arbitrary characters). For more details,  see  the
+       pcre2api documentation.
+
+   Disabling JIT compilation
+
+       If  a  pattern  that starts with (*NO_JIT) is successfully compiled, an
+       attempt by the application to apply the  JIT  optimization  by  calling
+       pcre2_jit_compile() is ignored.
+
+   Setting match resource limits
+
+       The pcre2_match() function contains a counter that is incremented every
+       time it goes round its main loop. The caller of pcre2_match() can set a
+       limit  on  this counter, which therefore limits the amount of computing
+       resource used for a match. The maximum depth of nested backtracking can
+       also  be  limited;  this indirectly restricts the amount of heap memory
+       that is used, but there is also an explicit memory limit  that  can  be
+       set.
+
+       These  facilities  are  provided to catch runaway matches that are pro-
+       voked by patterns with huge matching trees (a typical example is a pat-
+       tern  with  nested unlimited repeats applied to a long string that does
+       not match). When one of these limits is reached, pcre2_match() gives an
+       error  return.  The limits can also be set by items at the start of the
+       pattern of the form
+
+         (*LIMIT_HEAP=d)
+         (*LIMIT_MATCH=d)
+         (*LIMIT_DEPTH=d)
+
+       where d is any number of decimal digits. However, the value of the set-
+       ting  must  be  less than the value set (or defaulted) by the caller of
+       pcre2_match() for it to have any effect. In other  words,  the  pattern
+       writer  can lower the limits set by the programmer, but not raise them.
+       If there is more than one setting of one of  these  limits,  the  lower
+       value  is used. The heap limit is specified in kibibytes (units of 1024
+       bytes).
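+
+       The "lower, but never raise" rule can be expressed as taking a minimum.
+       The following is a sketch only; effective_limit() is a hypothetical
+       name, and PCRE2 applies the rule internally rather than exposing such
+       a function.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: caller_limit is the value set (or defaulted) by the
   program; pattern_limits holds the values of every (*LIMIT_...=d) item of
   one kind found at the start of the pattern. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_limits, size_t count)
{
    uint32_t limit = caller_limit;
    size_t i;

    assert(count == 0 || pattern_limits != NULL);
    for (i = 0; i < count; i++)
        if (pattern_limits[i] < limit)
            limit = pattern_limits[i];   /* a pattern may only lower it */
    return limit;
}
```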
+
+       Prior to release 10.30, LIMIT_DEPTH was  called  LIMIT_RECURSION.  This
+       name is still recognized for backwards compatibility.
+
+       The heap limit applies only when the pcre2_match() or pcre2_dfa_match()
+       interpreters are used for matching. It does not apply to JIT. The match
+       limit  is used (but in a different way) when JIT is being used, or when
+       pcre2_dfa_match() is called, to limit computing resource usage by those
+       matching  functions.  The depth limit is ignored by JIT but is relevant
+       for DFA matching, which uses function recursion for  recursions  within
+       the  pattern  and  for lookaround assertions and atomic groups. In this
+       case, the depth limit controls the depth of such recursion.
+
+   Newline conventions
+
+       PCRE2 supports six different conventions for indicating line breaks  in
+       strings:  a  single  CR (carriage return) character, a single LF (line-
+       feed) character, the two-character sequence CRLF, any of the three pre-
+       ceding,  any  Unicode  newline  sequence,  or the NUL character (binary
+       zero). The pcre2api page has further  discussion  about  newlines,  and
+       shows how to set the newline convention when calling pcre2_compile().
+
+       It  is also possible to specify a newline convention by starting a pat-
+       tern string with one of the following sequences:
+
+         (*CR)        carriage return
+         (*LF)        linefeed
+         (*CRLF)      carriage return, followed by linefeed
+         (*ANYCRLF)   any of the three above
+         (*ANY)       all Unicode newline sequences
+         (*NUL)       the NUL character (binary zero)
+
+       These override the default and the options given to the compiling func-
+       tion.  For  example,  on  a Unix system where LF is the default newline
+       sequence, the pattern
+
+         (*CR)a.b
+
+       changes the convention to CR. That pattern matches "a\nb" because LF is
+       no longer a newline. If more than one of these settings is present, the
+       last one is used.
+
+       The newline convention affects where the circumflex and  dollar  asser-
+       tions are true. It also affects the interpretation of the dot metachar-
+       acter when PCRE2_DOTALL is not set, and the behaviour of  \N  when  not
+       followed  by  an opening brace. However, it does not affect what the \R
+       escape sequence matches.  By  default,  this  is  any  Unicode  newline
+       sequence, for Perl compatibility. However, this can be changed; see the
+       next section and the description of \R in the section entitled "Newline
+       sequences"  below. A change of \R setting can be combined with a change
+       of newline convention.
+
+   Specifying what \R matches
+
+       It is possible to restrict \R to match only CR, LF, or CRLF (instead of
+       the  complete  set  of  Unicode  line  endings)  by  setting the option
+       PCRE2_BSR_ANYCRLF at compile time. This effect can also be achieved  by
+       starting  a  pattern  with (*BSR_ANYCRLF). For completeness, (*BSR_UNI-
+       CODE) is also recognized, corresponding to PCRE2_BSR_UNICODE.
+
+
+EBCDIC CHARACTER CODES
+
+       PCRE2 can be compiled to run in an environment that uses EBCDIC as  its
+       character  code instead of ASCII or Unicode (typically a mainframe sys-
+       tem). In the sections below, character code values are  ASCII  or  Uni-
+       code; in an EBCDIC environment these characters may have different code
+       values, and there are no code points greater than 255.
+
+
+CHARACTERS AND METACHARACTERS
+
+       A regular expression is a pattern that is  matched  against  a  subject
+       string  from  left  to right. Most characters stand for themselves in a
+       pattern, and match the corresponding characters in the  subject.  As  a
+       trivial example, the pattern
+
+         The quick brown fox
+
+       matches a portion of a subject string that is identical to itself. When
+       caseless matching is specified (the PCRE2_CASELESS option), letters are
+       matched independently of case.
+
+       The  power  of  regular  expressions  comes from the ability to include
+       alternatives and repetitions in the pattern. These are encoded  in  the
+       pattern by the use of metacharacters, which do not stand for themselves
+       but instead are interpreted in some special way.
+
+       There are two different sets of metacharacters: those that  are  recog-
+       nized  anywhere in the pattern except within square brackets, and those
+       that are recognized within square brackets.  Outside  square  brackets,
+       the metacharacters are as follows:
+
+         \      general escape character with several uses
+         ^      assert start of string (or line, in multiline mode)
+         $      assert end of string (or line, in multiline mode)
+         .      match any character except newline (by default)
+         [      start character class definition
+         |      start of alternative branch
+         (      start subpattern
+         )      end subpattern
+         ?      extends the meaning of (
+                also 0 or 1 quantifier
+                also quantifier minimizer
+         *      0 or more quantifier
+         +      1 or more quantifier
+                also "possessive quantifier"
+         {      start min/max quantifier
+
+       Part  of  a  pattern  that is in square brackets is called a "character
+       class". In a character class the only metacharacters are:
+
+         \      general escape character
+         ^      negate the class, but only if the first character
+         -      indicates character range
+         [      POSIX character class (only if followed by POSIX
+                  syntax)
+         ]      terminates the character class
+
+       The following sections describe the use of each of the metacharacters.
+
+
+BACKSLASH
+
+       The backslash character has several uses. Firstly, if it is followed by
+       a character that is not a number or a letter, it takes away any special
+       meaning that character may have. This use of  backslash  as  an  escape
+       character applies both inside and outside character classes.
+
+       For  example,  if you want to match a * character, you must write \* in
+       the pattern. This escaping action applies whether or not the  following
+       character  would  otherwise be interpreted as a metacharacter, so it is
+       always safe to precede a non-alphanumeric  with  backslash  to  specify
+       that it stands for itself.  In particular, if you want to match a back-
+       slash, you write \\.
+
+       In a UTF mode, only ASCII numbers and letters have any special  meaning
+       after  a  backslash.  All  other characters (in particular, those whose
+       code points are greater than 127) are treated as literals.
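+
+       Because a backslash before any non-alphanumeric is always safe, a pro-
+       gram can quote arbitrary literal text mechanically. The sketch below
+       assumes an ASCII-based environment; escape_literal() is a hypothetical
+       helper, not a PCRE2 function.
+
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy src to dst, putting a backslash before every
   ASCII character that is not a letter or a digit. Bytes above 127 are
   copied unchanged, because they have no special meaning in a pattern.
   Returns the length written, or (size_t)-1 if dst is too small. */
size_t escape_literal(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int needs_escape = c < 128 && !isalnum(c);

        /* Leave room for this character and the terminating NUL. */
        if (out + (needs_escape ? 2 : 1) >= dstsize)
            return (size_t)-1;
        if (needs_escape)
            dst[out++] = '\\';
        dst[out++] = (char)c;
    }
    if (out >= dstsize)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```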
+
+       If a pattern is compiled with the  PCRE2_EXTENDED  option,  most  white
+       space  in the pattern (other than in a character class), and characters
+       between a # outside a character class and the next newline,  inclusive,
+       are ignored. An escaping backslash can be used to include a white space
+       or # character as part of the pattern.
+
+       If you want to remove the special meaning from a  sequence  of  charac-
+       ters,  you can do so by putting them between \Q and \E. This is differ-
+       ent from Perl in that $ and  @  are  handled  as  literals  in  \Q...\E
+       sequences  in PCRE2, whereas in Perl, $ and @ cause variable interpola-
+       tion. Also, Perl does "double-quotish backslash interpolation"  on  any
+       backslashes  between \Q and \E which, its documentation says, "may lead
+       to confusing results". PCRE2 treats a backslash between \Q and \E  just
+       like any other character. Note the following examples:
+
+         Pattern            PCRE2 matches   Perl matches
+
+         \Qabc$xyz\E        abc$xyz        abc followed by the
+                                             contents of $xyz
+         \Qabc\$xyz\E       abc\$xyz       abc\$xyz
+         \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
+         \QA\B\E            A\B            A\B
+         \Q\\E              \              \\E
+
+       The  \Q...\E  sequence  is recognized both inside and outside character
+       classes.  An isolated \E that is not preceded by \Q is ignored.  If  \Q
+       is  not followed by \E later in the pattern, the literal interpretation
+       continues to the end of the pattern (that is,  \E  is  assumed  at  the
+       end).  If  the  isolated \Q is inside a character class, this causes an
+       error, because the character class  is  not  terminated  by  a  closing
+       square bracket.
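+
+       PCRE2's treatment of \Q...\E outside a character class can be illus-
+       trated by a scanner that copies everything up to the next \E verbatim.
+       This is a sketch of the documented behaviour, not PCRE2's own code;
+       read_quoted() is a hypothetical name.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: p points just after a \Q. The quoted text is copied
   to out (truncated to outsize-1 characters and NUL-terminated). Returns a
   pointer to the first pattern character after the closing \E, or to the
   end of the pattern if there is no \E. */
const char *read_quoted(const char *p, char *out, size_t outsize)
{
    size_t n = 0;

    assert(p != NULL && out != NULL && outsize > 0);
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == 'E') {
            p += 2;                 /* step over the terminating \E */
            break;
        }
        if (n + 1 < outsize)
            out[n++] = *p;          /* backslashes are ordinary here */
        p++;
    }
    out[n] = '\0';
    return p;
}
```
+
+       Applied to the text after \Q in the last table row (\\E), this yields a
+       single backslash, as shown in the "PCRE2 matches" column.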
+
+   Non-printing characters
+
+       A second use of backslash provides a way of encoding non-printing char-
+       acters in patterns in a visible manner. There is no restriction on  the
+       appearance  of non-printing characters in a pattern, but when a pattern
+       is being prepared by text editing, it is often easier to use one of the
+       following  escape sequences than the binary character it represents. In
+       an ASCII or Unicode environment, these escapes are as follows:
+
+         \a          alarm, that is, the BEL character (hex 07)
+         \cx         "control-x", where x is any printable ASCII character
+         \e          escape (hex 1B)
+         \f          form feed (hex 0C)
+         \n          linefeed (hex 0A)
+         \r          carriage return (hex 0D)
+         \t          tab (hex 09)
+         \0dd        character with octal code 0dd
+         \ddd        character with octal code ddd, or backreference
+         \o{ddd..}   character with octal code ddd..
+         \xhh        character with hex code hh
+         \x{hhh..}   character with hex code hhh..
+         \N{U+hhh..} character with Unicode hex code point hhh..
+         \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
+
+       The \N{U+hhh..} escape sequence is recognized only when  the  PCRE2_UTF
+       option is set, that is, when PCRE2 is operating in a Unicode mode. Perl
+       also uses \N{name} to specify characters by Unicode  name;  PCRE2  does
+       not  support  this.   Note  that  when \N is not followed by an opening
+       brace (curly bracket) it has an entirely  different  meaning,  matching
+       any character that is not a newline.
+
+       The  precise effect of \cx on ASCII characters is as follows: if x is a
+       lower case letter, it is converted to upper case. Then  bit  6  of  the
+       character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
+       (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and  \c;  becomes
+       hex  7B  (; is 3B). If the code unit following \c has a value less than
+       32 or greater than 126, a compile-time error occurs.
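+
+       The arithmetic just described can be written out as follows. This is a
+       minimal sketch for an ASCII environment; ctrl_escape_ascii() is a hypo-
+       thetical name, and -1 stands in for the compile-time error.
+
```c
/* Hypothetical helper: the code generated by \cx in an ASCII environment,
   or -1 where a compile-time error would occur. Numeric constants are
   used so that the arithmetic is explicit. */
int ctrl_escape_ascii(int x)
{
    if (x < 32 || x > 126)
        return -1;
    if (x >= 0x61 && x <= 0x7A)     /* lower case a-z */
        x -= 0x20;                  /* convert to upper case */
    return x ^ 0x40;                /* invert bit 6 */
}
```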
+
+       When PCRE2 is compiled in EBCDIC mode, \N{U+hhh..}  is  not  supported.
+       \a, \e, \f, \n, \r, and \t generate the appropriate EBCDIC code values.
+       The \c escape is processed as specified for Perl in the perlebcdic doc-
+       ument.  The  only characters that are allowed after \c are A-Z, a-z, or
+       one of @, [, \, ], ^, _, or ?. Any other character provokes a  compile-
+       time  error.  The  sequence  \c@ encodes character code 0; after \c the
+       letters (in either case) encode characters 1-26 (hex 01 to hex 1A);  [,
+       \,  ],  ^,  and  _  encode characters 27-31 (hex 1B to hex 1F), and \c?
+       becomes either 255 (hex FF) or 95 (hex 5F).
+
+       Thus, apart from \c?, these escapes generate the  same  character  code
+       values  as  they do in an ASCII environment, though the meanings of the
+       values mostly differ. For example, \cG always generates code  value  7,
+       which is BEL in ASCII but DEL in EBCDIC.
+
+       The  sequence  \c? generates DEL (127, hex 7F) in an ASCII environment,
+       but because 127 is not a control character in  EBCDIC,  Perl  makes  it
+       generate  the  APC character. Unfortunately, there are several variants
+       of EBCDIC. In most of them the APC character has  the  value  255  (hex
+       FF),  but  in  the one Perl calls POSIX-BC its value is 95 (hex 5F). If
+       certain other characters have POSIX-BC values, PCRE2 makes \c? generate
+       95; otherwise it generates 255.
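+
+       The EBCDIC rules for \c amount to a table lookup. The sketch below uses
+       string searches instead of character arithmetic so that it does not
+       rely on letters being contiguous. ctrl_escape_ebcdic() and its posix_bc
+       flag are hypothetical; PCRE2 decides the POSIX-BC case for itself.
+
```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the code generated by \cx in EBCDIC mode, or -1
   where a compile-time error would occur. */
int ctrl_escape_ebcdic(char x, bool posix_bc)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char others[] = "[\\]^_";
    const char *p;

    if (x == '\0')
        return -1;                  /* strchr() would find the terminator */
    if (x == '@')
        return 0;
    if (x == '?')
        return posix_bc ? 95 : 255; /* the APC character */
    if ((p = strchr(letters, toupper((unsigned char)x))) != NULL)
        return (int)(p - letters) + 1;      /* 1 to 26 */
    if ((p = strchr(others, x)) != NULL)
        return (int)(p - others) + 27;      /* 27 to 31 */
    return -1;
}
```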
+
+       After  \0  up  to two further octal digits are read. If there are fewer
+       than two digits, just  those  that  are  present  are  used.  Thus  the
+       sequence \0\x\015 specifies two binary zeros followed by a CR character
+       (code value 13). Make sure you supply two digits after the initial zero
+       if the pattern character that follows is itself an octal digit.
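+
+       The reading rule for \0 can be sketched as a short loop. This is an
+       illustration, not PCRE2's parser; read_backslash_zero() is a hypotheti-
+       cal name, and the input is assumed to be ASCII digits.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: p points at the '0' that follows the backslash.
   Stores the number of characters consumed (1 to 3, counting the 0) in
   *used and returns the resulting character code. */
unsigned int read_backslash_zero(const char *p, size_t *used)
{
    unsigned int value = 0;
    size_t n = 1;

    assert(p != NULL && used != NULL && p[0] == '0');
    while (n < 3 && p[n] >= '0' && p[n] <= '7') {
        value = value * 8 + (unsigned int)(p[n] - '0');
        n++;
    }
    *used = n;
    return value;
}
```
+
+       Given "015" this returns 13 (CR); given "01" it returns 1, which is why
+       a binary zero followed by a literal "1" has to be written as \0001.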
+
+       The  escape \o must be followed by a sequence of octal digits, enclosed
+       in braces. An error occurs if this is not the case. This  escape  is  a
+       recent addition to Perl; it provides a way of specifying character code
+       points as octal numbers greater than 0777, and  it  also  allows  octal
+       numbers and backreferences to be unambiguously specified.
+
+       For greater clarity and unambiguity, it is best to avoid following \ by
+       a digit greater than zero. Instead, use \o{} or \x{} to specify numeri-
+       cal character code points, and \g{} to specify backreferences. The fol-
+       lowing paragraphs describe the old, ambiguous syntax.
+
+       The handling of a backslash followed by a digit other than 0 is compli-
+       cated, and Perl has changed over time, causing PCRE2 also to change.
+
+       Outside a character class, PCRE2 reads the digit and any following dig-
+       its as a decimal number. If the number is less than 10, begins with the
+       digit  8  or  9,  or if there are at least that many previous capturing
+       left parentheses in the expression, the entire sequence is taken  as  a
+       backreference.  A description of how this works is given later, follow-
+       ing the discussion of  parenthesized  subpatterns.   Otherwise,  up  to
+       three octal digits are read to form a character code.
+
+       Inside  a character class, PCRE2 handles \8 and \9 as the literal char-
+       acters "8" and "9", and otherwise reads up to three octal  digits  fol-
+       lowing the backslash, using them to generate a data character. Any sub-
+       sequent digits stand for themselves. For example, outside  a  character
+       class:
+
+         \040   is another way of writing an ASCII space
+         \40    is the same, provided there are fewer than 40
+                   previous capturing subpatterns
+         \7     is always a backreference
+         \11    might be a backreference, or another way of
+                   writing a tab
+         \011   is always a tab
+         \0113  is a tab followed by the character "3"
+         \113   might be a backreference, otherwise the
+                   character with octal code 113
+         \377   might be a backreference, otherwise
+                   the value 255 (decimal)
+         \81    is always a backreference
+
+       Note  that octal values of 100 or greater that are specified using this
+       syntax must not be introduced by a leading zero, because no  more  than
+       three octal digits are ever read.
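+
+       The decision between a backreference and an  octal  character  code,
+       outside  a character class, can be modelled as follows. This is a sketch
+       in Python of the rules stated above only; the function name  and  the
+       tuple it returns are assumptions for illustration, and real PCRE2 per-
+       forms further checks (for example on the resulting character value).

```python
OCTAL_DIGITS = "01234567"

def interpret_backslash_digits(digits, groups):
    """Classify a backslash followed by `digits`, outside a character class.

    `groups` is the number of previous capturing left parentheses.
    Returns (kind, value, rest): kind is "backref" or "char", and rest holds
    any trailing digits that stand for themselves.
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected one or more decimal digits")
    if digits[0] == "0":
        # After \0, up to two further octal digits are read.
        end = 1
    else:
        number = int(digits)
        if number < 10 or digits[0] in "89" or number <= groups:
            return ("backref", number, "")
        # Otherwise up to three octal digits form a character code.
        end = 0
    while end < len(digits) and end < 3 and digits[end] in OCTAL_DIGITS:
        end += 1
    return ("char", int(digits[:end], 8), digits[end:])
```

+       With  no  capturing groups, interpret_backslash_digits("40", 0) yields
+       the character with code 32 (a space), whereas the  same  digits  with
+       40 or more groups yield a backreference.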
+
+       By  default, after \x that is not followed by {, from zero to two hexa-
+       decimal digits are read (letters can be in upper or  lower  case).  Any
+       number of hexadecimal digits may appear between \x{ and }. If a charac-
+       ter other than a hexadecimal digit appears between \x{  and  },  or  if
+       there is no terminating }, an error occurs.
+
+       If  the  PCRE2_ALT_BSUX  option  is set, the interpretation of \x is as
+       just described only when it is followed by two hexadecimal digits. Oth-
+       erwise,  it  matches a literal "x" character. In this mode, support for
+       code points greater than 255 is provided by \u, which must be  followed
+       by  four hexadecimal digits; otherwise it matches a literal "u" charac-
+       ter.
+
+       Characters whose value is less than 256 can be defined by either of the
+       two syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no dif-
+       ference in the way they are handled. For example, \xdc is  exactly  the
+       same as \x{dc} (or \u00dc in PCRE2_ALT_BSUX mode).
+
+   Constraints on character values
+
+       Characters  that  are  specified using octal or hexadecimal numbers are
+       limited to certain values, as follows:
+
+         8-bit non-UTF mode    no greater than 0xff
+         16-bit non-UTF mode   no greater than 0xffff
+         32-bit non-UTF mode   no greater than 0xffffffff
+         All UTF modes         no greater than 0x10ffff and a valid code point
+
+       Invalid Unicode code points are all those in the range 0xd800 to 0xdfff
+       (the  so-called  "surrogate"  code  points). The check for these can be
+       disabled by  the  caller  of  pcre2_compile()  by  setting  the  option
+       PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.  However, this is possible only in
+       UTF-8 and UTF-32 modes, because these values are not  representable  in
+       UTF-16.
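+
+       The limits in the table above can be expressed as a single check. The
+       following Python sketch is illustrative only; the function  name  and
+       its  parameters  are assumptions, with allow_surrogates standing in for
+       PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

```python
NON_UTF_LIMIT = {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}

def code_point_allowed(value, width, utf=False, allow_surrogates=False):
    """Return True if `value` may be written as an octal or hex escape.

    width is the code unit width (8, 16 or 32); utf selects a UTF mode.
    """
    if width not in NON_UTF_LIMIT:
        raise ValueError("width must be 8, 16 or 32")
    if value < 0:
        return False
    if not utf:
        return value <= NON_UTF_LIMIT[width]
    if value > 0x10FFFF:
        return False
    if 0xD800 <= value <= 0xDFFF:
        # Surrogates can be allowed only in UTF-8 and UTF-32 modes.
        return allow_surrogates and width in (8, 32)
    return True
```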
+
+   Escape sequences in character classes
+
+       All the sequences that define a single character value can be used both
+       inside and outside character classes. In addition, inside  a  character
+       class, \b is interpreted as the backspace character (hex 08).
+
+       When not followed by an opening brace, \N is not allowed in a character
+       class.  \B, \R, and \X are not special inside a character  class.  Like
+       other  unrecognized  alphabetic  escape sequences, they cause an error.
+       Outside a character class, these sequences have different meanings.
+
+   Unsupported escape sequences
+
+       In Perl, the sequences \F, \l, \L, \u, and \U  are  recognized  by  its
+       string  handler and used to modify the case of following characters. By
+       default, PCRE2 does not support these escape sequences. However, if the
+       PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
+       used to define a character by code point, as described above.
+
+   Absolute and relative backreferences
+
+       The sequence \g followed by a signed  or  unsigned  number,  optionally
+       enclosed  in  braces, is an absolute or relative backreference. A named
+       backreference can be coded as \g{name}.  Backreferences  are  discussed
+       later, following the discussion of parenthesized subpatterns.
+
+   Absolute and relative subroutine calls
+
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       name or a number enclosed either in angle brackets or single quotes, is
+       an  alternative  syntax for referencing a subpattern as a "subroutine".
+       Details are discussed later.   Note  that  \g{...}  (Perl  syntax)  and
+       \g<...> (Oniguruma syntax) are not synonymous. The former is a backref-
+       erence; the latter is a subroutine call.
+
+   Generic character types
+
+       Another use of backslash is for specifying generic character types:
+
+         \d     any decimal digit
+         \D     any character that is not a decimal digit
+         \h     any horizontal white space character
+         \H     any character that is not a horizontal white space character
+         \N     any character that is not a newline
+         \s     any white space character
+         \S     any character that is not a white space character
+         \v     any vertical white space character
+         \V     any character that is not a vertical white space character
+         \w     any "word" character
+         \W     any "non-word" character
+
+       The \N escape sequence has the same meaning as  the  "."  metacharacter
+       when  PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change
+       the meaning of \N. Note that when \N is followed by an opening brace it
+       has a different meaning. See the section entitled "Non-printing charac-
+       ters" above for details. Perl also uses \N{name} to specify  characters
+       by Unicode name; PCRE2 does not support this.
+
+       Each  pair of lower and upper case escape sequences partitions the com-
+       plete set of characters into two disjoint  sets.  Any  given  character
+       matches  one, and only one, of each pair. The sequences can appear both
+       inside and outside character classes. They each match one character  of
+       the  appropriate  type.  If the current matching point is at the end of
+       the subject string, all of them fail, because there is no character  to
+       match.
+
+       The  default  \s  characters  are HT (9), LF (10), VT (11), FF (12), CR
+       (13), and space (32), which are defined  as  white  space  in  the  "C"
+       locale. This list may vary if locale-specific matching is taking place.
+       For example, in some locales the "non-breaking space" character  (\xA0)
+       is recognized as white space, and in others the VT character is not.
+
+       A  "word"  character is an underscore or any character that is a letter
+       or digit.  By default, the definition of letters  and  digits  is  con-
+       trolled by PCRE2's low-valued character tables, and may vary if locale-
+       specific matching is taking place (see "Locale support" in the pcre2api
+       page).  For  example,  in  a French locale such as "fr_FR" in Unix-like
+       systems, or "french" in Windows, some character codes greater than  127
+       are  used  for  accented letters, and these are then matched by \w. The
+       use of locales with Unicode is discouraged.
+
+       By default, characters whose code points are  greater  than  127  never
+       match \d, \s, or \w, and always match \D, \S, and \W, although this may
+       be different for characters in the range 128-255  when  locale-specific
+       matching  is  happening.   These escape sequences retain their original
+       meanings from before Unicode support was available,  mainly  for  effi-
+       ciency  reasons.  If  the  PCRE2_UCP  option  is  set, the behaviour is
+       changed so that Unicode properties  are  used  to  determine  character
+       types, as follows:
+
+         \d  any character that matches \p{Nd} (decimal digit)
+         \s  any character that matches \p{Z} or \h or \v
+         \w  any character that matches \p{L} or \p{N}, plus underscore
+
+       The  upper case escapes match the inverse sets of characters. Note that
+       \d matches only decimal digits, whereas \w matches any  Unicode  digit,
+       as well as any Unicode letter, and underscore. Note also that PCRE2_UCP
+       affects \b and \B, because they are defined in  terms  of  \w  and  \W.
+       Matching these sequences is noticeably slower when PCRE2_UCP is set.
+
+       The  sequences  \h, \H, \v, and \V, in contrast to the other sequences,
+       which match only ASCII characters by default, always match  a  specific
+       list  of  code  points, whether or not PCRE2_UCP is set. The horizontal
+       space characters are:
+
+         U+0009     Horizontal tab (HT)
+         U+0020     Space
+         U+00A0     Non-break space
+         U+1680     Ogham space mark
+         U+180E     Mongolian vowel separator
+         U+2000     En quad
+         U+2001     Em quad
+         U+2002     En space
+         U+2003     Em space
+         U+2004     Three-per-em space
+         U+2005     Four-per-em space
+         U+2006     Six-per-em space
+         U+2007     Figure space
+         U+2008     Punctuation space
+         U+2009     Thin space
+         U+200A     Hair space
+         U+202F     Narrow no-break space
+         U+205F     Medium mathematical space
+         U+3000     Ideographic space
+
+       The vertical space characters are:
+
+         U+000A     Linefeed (LF)
+         U+000B     Vertical tab (VT)
+         U+000C     Form feed (FF)
+         U+000D     Carriage return (CR)
+         U+0085     Next line (NEL)
+         U+2028     Line separator
+         U+2029     Paragraph separator
+
+       In 8-bit, non-UTF-8 mode, only the characters  with  code  points  less
+       than 256 are relevant.
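+
+       Because \h and \v always match fixed lists of code points,  they  can
+       be  modelled  as  plain  set  membership. The Python sketch below simply
+       transcribes the two lists above; the names are chosen for the  example
+       and are not part of PCRE2.

```python
# Code points matched by \h (horizontal white space), from the list above.
HSPACE = frozenset(
    [0x0009, 0x0020, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))      # U+2000 En quad .. U+200A Hair space
    + [0x202F, 0x205F, 0x3000]
)

# Code points matched by \v (vertical white space), from the list above.
VSPACE = frozenset([0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029])

def is_hspace(ch):
    """True if the single character ch would match \\h."""
    return ord(ch) in HSPACE

def is_vspace(ch):
    """True if the single character ch would match \\v."""
    return ord(ch) in VSPACE

def relevant_in_8bit(code_points):
    """Members that can occur in 8-bit non-UTF-8 mode (less than 256)."""
    return sorted(cp for cp in code_points if cp < 256)
```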
+
+   Newline sequences
+
+       Outside  a  character class, by default, the escape sequence \R matches
+       any Unicode newline sequence. In 8-bit non-UTF-8 mode \R is  equivalent
+       to the following:
+
+         (?>\r\n|\n|\x0b|\f|\r|\x85)
+
+       This  is  an  example  of an "atomic group", details of which are given
+       below.  This particular group matches either the two-character sequence
+       CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
+       U+000A), VT (vertical tab, U+000B), FF (form feed,  U+000C),  CR  (car-
+       riage  return,  U+000D), or NEL (next line, U+0085). Because this is an
+       atomic group, the two-character sequence is treated as  a  single  unit
+       that cannot be split.
+
+       In other modes, two additional characters whose code points are greater
+       than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
+       rator,  U+2029).  Unicode support is not needed for these characters to
+       be recognized.
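+
+       The behaviour of \R at a given position can be sketched without a regex
+       engine. In this Python model the function name is an assumption;  the
+       wide flag stands for "other than 8-bit non-UTF-8 mode", and the anycrlf
+       flag models the restriction to CR, LF, or CRLF that the text goes on to
+       describe.

```python
def match_bsr(subject, pos, anycrlf=False, wide=True):
    """Return the length of the newline sequence matched at pos, or 0.

    CRLF is tried first and taken as a single unit, mirroring the atomic
    group: the match never falls back to CR alone when LF follows.
    """
    if subject.startswith("\r\n", pos):
        return 2
    if pos >= len(subject):
        return 0
    if anycrlf:
        singles = "\n\r"
    else:
        singles = "\n\x0b\x0c\r\x85"
        if wide:
            singles += "\u2028\u2029"   # LS and PS, code points above 255
    return 1 if subject[pos] in singles else 0
```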
+
+       It is possible to restrict \R to match only CR, LF, or CRLF (instead of
+       the  complete  set  of  Unicode  line  endings)  by  setting the option
+       PCRE2_BSR_ANYCRLF at compile time. (BSR is an abbreviation  for  "back-
+       slash R".) This can be made the default when PCRE2 is built; if this is
+       the case, the other behaviour can be requested via  the  PCRE2_BSR_UNI-
+       CODE  option. It is also possible to specify these settings by starting
+       a pattern string with one of the following sequences:
+
+         (*BSR_ANYCRLF)   CR, LF, or CRLF only
+         (*BSR_UNICODE)   any Unicode newline sequence
+
+       These override the default and the options given to the compiling func-
+       tion.  Note that these special settings, which are not Perl-compatible,
+       are recognized only at the very start of a pattern, and that they  must
+       be  in upper case. If more than one of them is present, the last one is
+       used. They can be combined with a change  of  newline  convention;  for
+       example, a pattern can start with:
+
+         (*ANY)(*BSR_ANYCRLF)
+
+       They  can also be combined with the (*UTF) or (*UCP) special sequences.
+       Inside a character class, \R  is  treated  as  an  unrecognized  escape
+       sequence, and causes an error.
+
+   Unicode character properties
+
+       When  PCRE2  is  built  with Unicode support (the default), three addi-
+       tional escape sequences that match characters with specific  properties
+       are  available.  In 8-bit non-UTF-8 mode, these sequences are of course
+       limited to testing characters whose code points are less than 256,  but
+       they do work in this mode.  In 32-bit non-UTF mode, code points greater
+       than 0x10ffff (the Unicode limit) may be  encountered.  These  are  all
+       treated  as being in the Common script and with an unassigned type. The
+       extra escape sequences are:
+
+         \p{xx}   a character with the xx property
+         \P{xx}   a character without the xx property
+         \X       a Unicode extended grapheme cluster
+
+       The property names represented by xx above are limited to  the  Unicode
+       script names, the general category properties, "Any", which matches any
+       character  (including  newline),  and  some  special  PCRE2  properties
+       (described  in the next section).  Other Perl properties such as "InMu-
+       sicalSymbols" are not supported by PCRE2.  Note that \P{Any}  does  not
+       match any characters, so always causes a match failure.
+
+       Sets of Unicode characters are defined as belonging to certain scripts.
+       A character from one of these sets can be matched using a script  name.
+       For example:
+
+         \p{Greek}
+         \P{Han}
+
+       Those  that are not part of an identified script are lumped together as
+       "Common". The current list of scripts is:
+
+       Adlam, Ahom, Anatolian_Hieroglyphs, Arabic,  Armenian,  Avestan,  Bali-
+       nese,  Bamum,  Bassa_Vah,  Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
+       Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Caucasian_Alba-
+       nian,  Chakma,  Cham,  Cherokee,  Common,  Coptic,  Cuneiform, Cypriot,
+       Cyrillic, Deseret, Devanagari, Dogra,  Duployan,  Egyptian_Hieroglyphs,
+       Elbasan,   Ethiopic,  Georgian,  Glagolitic,  Gothic,  Grantha,  Greek,
+       Gujarati,  Gunjala_Gondi,  Gurmukhi,  Han,   Hangul,   Hanifi_Rohingya,
+       Hanunoo,   Hatran,   Hebrew,   Hiragana,  Imperial_Aramaic,  Inherited,
+       Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese,  Kaithi,  Kan-
+       nada,  Katakana,  Kayah_Li,  Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
+       Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian,  Lydian,  Maha-
+       jani,  Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi,
+       Medefaidrin,     Meetei_Mayek,     Mende_Kikakui,     Meroitic_Cursive,
+       Meroitic_Hieroglyphs,  Miao,  Modi,  Mongolian,  Mro, Multani, Myanmar,
+       Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki,  Old_Hungar-
+       ian,  Old_Italic,  Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog-
+       dian,   Old_South_Arabian,   Old_Turkic,   Oriya,    Osage,    Osmanya,
+       Pahawh_Hmong,    Palmyrene,    Pau_Cin_Hau,    Phags_Pa,    Phoenician,
+       Psalter_Pahlavi, Rejang, Runic, Samaritan,  Saurashtra,  Sharada,  Sha-
+       vian,  Siddham,  SignWriting,  Sinhala, Sogdian, Sora_Sompeng, Soyombo,
+       Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,
+       Tai_Viet,  Takri,  Tamil,  Tangut, Telugu, Thaana, Thai, Tibetan, Tifi-
+       nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square.
+
+       Each character has exactly one Unicode general category property, spec-
+       ified  by a two-letter abbreviation. For compatibility with Perl, nega-
+       tion can be specified by including a  circumflex  between  the  opening
+       brace  and  the  property  name.  For  example,  \p{^Lu} is the same as
+       \P{Lu}.
+
+       If only one letter is specified with \p or \P, it includes all the gen-
+       eral  category properties that start with that letter. In this case, in
+       the absence of negation, the curly brackets in the escape sequence  are
+       optional; these two examples have the same effect:
+
+         \p{L}
+         \pL
+
+       The following general category property codes are supported:
+
+         C     Other
+         Cc    Control
+         Cf    Format
+         Cn    Unassigned
+         Co    Private use
+         Cs    Surrogate
+
+         L     Letter
+         Ll    Lower case letter
+         Lm    Modifier letter
+         Lo    Other letter
+         Lt    Title case letter
+         Lu    Upper case letter
+
+         M     Mark
+         Mc    Spacing mark
+         Me    Enclosing mark
+         Mn    Non-spacing mark
+
+         N     Number
+         Nd    Decimal number
+         Nl    Letter number
+         No    Other number
+
+         P     Punctuation
+         Pc    Connector punctuation
+         Pd    Dash punctuation
+         Pe    Close punctuation
+         Pf    Final punctuation
+         Pi    Initial punctuation
+         Po    Other punctuation
+         Ps    Open punctuation
+
+         S     Symbol
+         Sc    Currency symbol
+         Sk    Modifier symbol
+         Sm    Mathematical symbol
+         So    Other symbol
+
+         Z     Separator
+         Zl    Line separator
+         Zp    Paragraph separator
+         Zs    Space separator
+
+       The  special property L& is also supported: it matches a character that
+       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
+       classified as a modifier or "other".
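+
+       The way one-letter codes, two-letter codes, negation, and L&  relate
+       can  be  illustrated with Python's standard unicodedata module. This is
+       a sketch, not PCRE2's implementation: the function name is an  assump-
+       tion,  script  names and "Any" are not handled, and the Unicode version
+       is whatever the Python installation provides.

```python
import unicodedata

def has_general_category(ch, name):
    """Model of \\p{name} for general category codes only.

    A leading "^" negates the test, a one-letter name covers every
    category starting with that letter, and "L&" means Lu, Ll, or Lt.
    """
    negate = name.startswith("^")
    if negate:
        name = name[1:]
    if len(name) not in (1, 2):
        raise ValueError("expected a one- or two-letter category code")
    category = unicodedata.category(ch)      # always two letters, e.g. "Lu"
    if name == "L&":
        result = category in ("Lu", "Ll", "Lt")
    else:
        result = category.startswith(name)
    return result != negate
```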
+
+       The  Cs  (Surrogate)  property  applies only to characters in the range
+       U+D800 to U+DFFF. Such characters are not valid in Unicode strings  and
+       so  cannot  be  tested  by PCRE2, unless UTF validity checking has been
+       turned off (see the discussion of PCRE2_NO_UTF_CHECK  in  the  pcre2api
+       page). Perl does not support the Cs property.
+
+       The  long  synonyms  for  property  names  that  Perl supports (such as
+       \p{Letter}) are not supported by PCRE2, nor is it permitted  to  prefix
+       any of these properties with "Is".
+
+       No character that is in the Unicode table has the Cn (unassigned) prop-
+       erty.  Instead, this property is assumed for any code point that is not
+       in the Unicode table.
+
+       Specifying  caseless  matching  does not affect these escape sequences.
+       For example, \p{Lu} always matches only upper  case  letters.  This  is
+       different from the behaviour of current versions of Perl.
+
+       Matching  characters by Unicode property is not fast, because PCRE2 has
+       to do a multistage table lookup in order to find  a  character's  prop-
+       erty. That is why the traditional escape sequences such as \d and \w do
+       not use Unicode properties in PCRE2 by default,  though  you  can  make
+       them  do  so by setting the PCRE2_UCP option or by starting the pattern
+       with (*UCP).
+
+   Extended grapheme clusters
+
+       The \X escape matches any number of Unicode  characters  that  form  an
+       "extended grapheme cluster", and treats the sequence as an atomic group
+       (see below).  Unicode supports various kinds of composite character  by
+       giving  each  character  a grapheme breaking property, and having rules
+       that use these properties to define the boundaries of extended grapheme
+       clusters.  The rules are defined in Unicode Standard Annex 29, "Unicode
+       Text Segmentation". Unicode 11.0.0 abandoned the use of  some  previous
+       properties  that had been used for emojis.  Instead it introduced vari-
+       ous emoji-specific properties. PCRE2  uses  only  the  Extended  Picto-
+       graphic property.
+
+       \X  always  matches  at least one character. Then it decides whether to
+       add additional characters according to the following rules for ending a
+       cluster:
+
+       1. End at the end of the subject string.
+
+       2.  Do not end between CR and LF; otherwise end after any control char-
+       acter.
+
+       3. Do not break Hangul (a Korean  script)  syllable  sequences.  Hangul
+       characters  are of five types: L, V, T, LV, and LVT. An L character may
+       be followed by an L, V, LV, or LVT character; an LV or V character  may
+       be followed by a V or T character; an LVT or T character may be followed
+       only by a T character.
+
+       4. Do not end before extending  characters  or  spacing  marks  or  the
+       "zero-width  joiner"  character.  Characters  with  the "mark" property
+       always have the "extend" grapheme breaking property.
+
+       5. Do not end after prepend characters.
+
+       6. Do not break within emoji modifier sequences or emoji zwj sequences.
+       That is, do not break between characters with the Extended_Pictographic
+       property.  Extend and ZWJ characters are allowed  between  the  charac-
+       ters.
+
+       7.  Do  not  break  within  emoji flag sequences. That is, do not break
+       between regional indicator (RI) characters if there are an  odd  number
+       of RI characters before the break point.
+
+       8. Otherwise, end the cluster.
+
+   PCRE2's additional properties
+
+       As  well as the standard Unicode properties described above, PCRE2 sup-
+       ports four more that make it possible  to  convert  traditional  escape
+       sequences such as \w and \s to use Unicode properties. PCRE2 uses these
+       non-standard, non-Perl properties internally  when  PCRE2_UCP  is  set.
+       However, they may also be used explicitly. These properties are:
+
+         Xan   Any alphanumeric character
+         Xps   Any POSIX space character
+         Xsp   Any Perl space character
+         Xwd   Any Perl "word" character
+
+       Xan  matches  characters that have either the L (letter) or the N (num-
+       ber) property. Xps matches the characters tab, linefeed, vertical  tab,
+       form  feed,  or carriage return, and any other character that has the Z
+       (separator) property.  Xsp is the same as Xps;  in  PCRE1  it  used  to
+       exclude  vertical  tab,  for  Perl compatibility, but Perl changed. Xwd
+       matches the same characters as Xan, plus underscore.
+
+       There is another non-standard property, Xuc, which matches any  charac-
+       ter  that  can  be represented by a Universal Character Name in C++ and
+       other programming languages. These are the characters $,  @,  `  (grave
+       accent),  and  all  characters with Unicode code points greater than or
+       equal to U+00A0, except for the surrogates U+D800 to U+DFFF. Note  that
+       most  base  (ASCII) characters are excluded. (Universal Character Names
+       are of the form \uHHHH or \UHHHHHHHH where H is  a  hexadecimal  digit.
+       Note that the Xuc property does not match these sequences but the char-
+       acters that they represent.)
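+
+       The definitions of Xan, Xps, Xsp, Xwd, and Xuc given above  translate
+       directly  into  predicates. The following Python sketch uses the stan-
+       dard unicodedata module for the L, N, and Z tests; the function  names
+       are  assumptions for the example, and each function expects a single
+       character.

```python
import unicodedata

def _major(ch):
    """First letter of the general category, e.g. "L" for "Lu"."""
    return unicodedata.category(ch)[0]

def is_xan(ch):
    # Letter or number property.
    return _major(ch) in ("L", "N")

def is_xps(ch):
    # Tab, linefeed, vertical tab, form feed, carriage return, or any
    # character with the Z (separator) property.
    return ch in ("\t", "\n", "\x0b", "\x0c", "\r") or _major(ch) == "Z"

# Xsp is the same as Xps in PCRE2.
is_xsp = is_xps

def is_xwd(ch):
    # Same as Xan, plus underscore.
    return ch == "_" or is_xan(ch)

def is_xuc(ch):
    # $, @, grave accent, and code points from U+00A0 upwards,
    # except the surrogates U+D800 to U+DFFF.
    cp = ord(ch)
    if ch in ("$", "@", "`"):
        return True
    return cp >= 0xA0 and not (0xD800 <= cp <= 0xDFFF)
```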
+
+   Resetting the match start
+
+       In normal use, the escape sequence \K  causes  any  previously  matched
+       characters  not  to  be  included in the final matched sequence that is
+       returned. For example, the pattern:
+
+         foo\Kbar
+
+       matches "foobar", but reports that it has matched "bar".  \K  does  not
+       interact with anchoring in any way. The pattern:
+
+         ^foo\Kbar
+
+       matches  only  when  the  subject  begins with "foobar" (in single line
+       mode), though it again reports the matched string as "bar".  This  fea-
+       ture  is similar to a lookbehind assertion (described below).  However,
+       in this case, the part of the subject before the real  match  does  not
+       have  to be of fixed length, as lookbehind assertions do. The use of \K
+       does not interfere with the setting of captured substrings.  For  exam-
+       ple, when the pattern
+
+         (foo)\Kbar
+
+       matches "foobar", the first substring is still set to "foo".
+
+       Perl  documents  that  the  use  of  \K  within assertions is "not well
+       defined". In PCRE2, \K is acted upon when  it  occurs  inside  positive
+       assertions,  but  is  ignored  in negative assertions. Note that when a
+       pattern such as (?=ab\K) matches, the reported start of the  match  can
+       be  greater  than the end of the match. Using \K in a lookbehind asser-
+       tion at the start of a pattern can also lead to odd effects. For  exam-
+       ple, consider this pattern:
+
+         (?<=\Kfoo)bar
+
+       If  the  subject  is  "foobar", a call to pcre2_match() with a starting
+       offset of 3 succeeds and reports the matching string as "foobar",  that
+       is,  the  start  of  the reported match is earlier than where the match
+       started.
+
+   Simple assertions
+
+       The final use of backslash is for certain simple assertions. An  asser-
+       tion  specifies a condition that has to be met at a particular point in
+       a match, without consuming any characters from the subject string.  The
+       use  of subpatterns for more complicated assertions is described below.
+       The backslashed assertions are:
+
+         \b     matches at a word boundary
+         \B     matches when not at a word boundary
+         \A     matches at the start of the subject
+         \Z     matches at the end of the subject
+                 also matches before a newline at the end of the subject
+         \z     matches only at the end of the subject
+         \G     matches at the first matching position in the subject
+
+       Inside a character class, \b has a different meaning;  it  matches  the
+       backspace  character.  If  any  other  of these assertions appears in a
+       character class, an "invalid escape sequence" error is generated.
+
+       A word boundary is a position in the subject string where  the  current
+       character  and  the previous character do not both match \w or \W (i.e.
+       one matches \w and the other matches \W), or the start or  end  of  the
+       string  if  the  first or last character matches \w, respectively. In a
+       UTF mode, the meanings of \w and \W  can  be  changed  by  setting  the
+       PCRE2_UCP option. When this is done, it also affects \b and \B. Neither
+       PCRE2 nor Perl has a separate "start of word" or "end of word"  metase-
+       quence.  However,  whatever follows \b normally determines which it is.
+       For example, the fragment \ba matches "a" at the start of a word.
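+
+       The  definition of a word boundary can be written as a comparison of
+       the characters on either side of a position. This Python sketch uses
+       the  default  (ASCII, non-UCP) meaning of \w; the function names are
+       assumptions for illustration.

```python
def is_word_char(ch):
    """Default \\w: an ASCII letter, an ASCII digit, or underscore."""
    return ch == "_" or (ch.isascii() and ch.isalnum())

def at_word_boundary(subject, pos):
    """True if \\b would succeed at offset pos (0 <= pos <= len(subject)).

    Positions before the start and after the end count as non-word, so the
    start or end of the string is a boundary only next to a word character.
    """
    before = pos > 0 and is_word_char(subject[pos - 1])
    after = pos < len(subject) and is_word_char(subject[pos])
    return before != after
```

+       For  the subject "a cat", the positions at which \b succeeds are 0, 1,
+       2, and 5; \B succeeds at the remaining positions, 3 and 4.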
+
+       The \A, \Z, and \z assertions differ from  the  traditional  circumflex
+       and dollar (described in the next section) in that they only ever match
+       at the very start and end of the subject string, whatever  options  are
+       set.  Thus,  they are independent of multiline mode. These three asser-
+       tions are not affected by the  PCRE2_NOTBOL  or  PCRE2_NOTEOL  options,
+       which  affect only the behaviour of the circumflex and dollar metachar-
+       acters. However, if the startoffset argument of pcre2_match()  is  non-
+       zero,  indicating  that  matching is to start at a point other than the
+       beginning of the subject, \A can never match.  The  difference  between
+       \Z  and \z is that \Z matches before a newline at the end of the string
+       as well as at the very end, whereas \z matches only at the end.
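+
+       The  difference between \A, \Z, and \z can be stated as conditions on
+       the current offset. The Python sketch below assumes that the newline
+       convention is a single LF; the function names are illustrative only.

```python
def matches_A(subject, pos):
    """\\A: only at the very start of the subject."""
    return pos == 0

def matches_z(subject, pos):
    """\\z: only at the very end of the subject."""
    return pos == len(subject)

def matches_Z(subject, pos):
    """\\Z: at the very end, or just before a newline that ends the subject."""
    if pos == len(subject):
        return True
    return pos == len(subject) - 1 and subject[pos] == "\n"
```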
+
+       The \G assertion is true only when the current matching position is  at
+       the  start point of the matching process, as specified by the startoff-
+       set argument of pcre2_match(). It differs from \A  when  the  value  of
+       startoffset  is  non-zero. By calling pcre2_match() multiple times with
+       appropriate arguments, you can mimic Perl's /g option,  and  it  is  in
+       this kind of implementation that \G can be useful.
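+
+       The  idea of repeated matching from successive start offsets can be
+       sketched with Python's re module, which is not PCRE2 and has  no  \G.
+       In  this  analogy, Pattern.match(subject, pos) plays the part of a call
+       to pcre2_match() with a startoffset and a pattern that begins with \G:
+       the match must begin exactly at pos. The function name is an assump-
+       tion for the example.

```python
import re

def scan_contiguous(pattern, subject, start=0):
    """Collect matches that follow one another with no gaps.

    Each attempt is anchored at the current offset, so scanning stops at
    the first position where the pattern does not match. An empty match
    also ends the scan, to avoid looping at the same offset.
    """
    compiled = re.compile(pattern)
    pos = start
    found = []
    while pos < len(subject):
        m = compiled.match(subject, pos)   # must match at pos itself
        if m is None or m.end() == pos:
            break
        found.append(m.group())
        pos = m.end()                      # next start offset
    return found, pos
```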
+
+       Note,  however,  that  PCRE2's  implementation of \G, being true at the
+       starting character of the matching process, is  subtly  different  from
+       Perl's,  which  defines it as true at the end of the previous match. In
+       Perl, these can be different when the  previously  matched  string  was
+       empty. Because PCRE2 does just one match at a time, it cannot reproduce
+       this behaviour.
+
+       If all the alternatives of a pattern begin with \G, the  expression  is
+       anchored to the starting match position, and the "anchored" flag is set
+       in the compiled regular expression.
+
+
+CIRCUMFLEX AND DOLLAR
+
+       The circumflex and dollar  metacharacters  are  zero-width  assertions.
+       That  is,  they test for a particular condition being true without con-
+       suming any characters from the subject string. These two metacharacters
+       are  concerned  with matching the starts and ends of lines. If the new-
+       line convention is set so that only the two-character sequence CRLF  is
+       recognized  as  a newline, isolated CR and LF characters are treated as
+       ordinary data characters, and are not recognized as newlines.
+
+       Outside a character class, in the default matching mode, the circumflex
+       character  is  an  assertion  that is true only if the current matching
+       point is at the start of the subject string. If the  startoffset  argu-
+       ment  of  pcre2_match() is non-zero, or if PCRE2_NOTBOL is set, circum-
+       flex can never match if the PCRE2_MULTILINE option is unset.  Inside  a
+       character  class,  circumflex  has  an  entirely different meaning (see
+       below).
+
+       Circumflex need not be the first character of the pattern if  a  number
+       of  alternatives are involved, but it should be the first thing in each
+       alternative in which it appears if the pattern is ever  to  match  that
+       branch.  If all possible alternatives start with a circumflex, that is,
+       if the pattern is constrained to match only at the start  of  the  sub-
+       ject,  it  is  said  to be an "anchored" pattern. (There are also other
+       constructs that can cause a pattern to be anchored.)
+
+       The dollar character is an assertion that is true only if  the  current
+       matching  point  is  at  the  end of the subject string, or immediately
+       before a newline  at  the  end  of  the  string  (by  default),  unless
+       PCRE2_NOTEOL is set. Note, however, that it does not actually match the
+       newline. Dollar need not be the last character of the pattern if a num-
+       ber of alternatives are involved, but it should be the last item in any
+       branch in which it appears. Dollar has no special meaning in a  charac-
+       ter class.
+
+       The  meaning  of  dollar  can be changed so that it matches only at the
+       very end of the string, by setting the PCRE2_DOLLAR_ENDONLY  option  at
+       compile time. This does not affect the \Z assertion.
+
+       The meanings of the circumflex and dollar metacharacters are changed if
+       the PCRE2_MULTILINE option is set. When this  is  the  case,  a  dollar
+       character  matches before any newlines in the string, as well as at the
+       very end, and a circumflex matches immediately after internal  newlines
+       as  well as at the start of the subject string. It does not match after
+       a newline that ends the string, for compatibility with  Perl.  However,
+       this can be changed by setting the PCRE2_ALT_CIRCUMFLEX option.
+
+       For  example, the pattern /^abc$/ matches the subject string "def\nabc"
+       (where \n represents a newline) in multiline mode, but  not  otherwise.
+       Consequently,  patterns  that  are anchored in single line mode because
+       all branches start with ^ are not anchored in  multiline  mode,  and  a
+       match  for  circumflex  is  possible  when  the startoffset argument of
+       pcre2_match() is non-zero. The PCRE2_DOLLAR_ENDONLY option  is  ignored
+       if PCRE2_MULTILINE is set.
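+
+       As a rough illustration of the example above, the following sketch
+       uses Python's standard re module rather than PCRE2 itself. It assumes
+       that re treats ^ and $ in the same way as PCRE2 when LF is the
+       newline, with re.MULTILINE standing in for PCRE2_MULTILINE.
+
```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the very start of the subject.
single_line = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches immediately after an internal newline.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

# By default $ also matches just before a newline that ends the string,
# without consuming that newline.
before_final_newline = re.search(r"abc$", "abc\n")

print(single_line)                  # None
print(multi_line.span())            # (4, 7)
print(before_final_newline.span())  # (0, 3)
```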
+
+       When  the  newline  convention (see "Newline conventions" below) recog-
+       nizes the two-character sequence CRLF as a newline, this is  preferred,
+       even  if  the  single  characters CR and LF are also recognized as new-
+       lines. For example, if the newline convention  is  "any",  a  multiline
+       mode  circumflex matches before "xyz" in the string "abc\r\nxyz" rather
+       than after CR, even though CR on its own is a valid newline.  (It  also
+       matches at the very start of the string, of course.)
+
+       Note  that  the sequences \A, \Z, and \z can be used to match the start
+       and end of the subject in both modes, and if all branches of a  pattern
+       start  with \A it is always anchored, whether or not PCRE2_MULTILINE is
+       set.
+
+
+FULL STOP (PERIOD, DOT) AND \N
+
+       Outside a character class, a dot in the pattern matches any one charac-
+       ter  in  the subject string except (by default) a character that signi-
+       fies the end of a line.
+
+       When a line ending is defined as a single character, dot never  matches
+       that  character; when the two-character sequence CRLF is used, dot does
+       not match CR if it is immediately followed  by  LF,  but  otherwise  it
+       matches  all characters (including isolated CRs and LFs). When any Uni-
+       code line endings are being recognized, dot does not match CR or LF  or
+       any of the other line ending characters.
+
+       The  behaviour  of  dot  with regard to newlines can be changed. If the
+       PCRE2_DOTALL option is set, a dot matches any  one  character,  without
+       exception.   If  the two-character sequence CRLF is present in the sub-
+       ject string, it takes two dots to match it.
+
+       The handling of dot is entirely independent of the handling of  circum-
+       flex  and  dollar,  the  only relationship being that they both involve
+       newlines. Dot has no special meaning in a character class.
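+
+       The effect of the "dotall" setting can be sketched with Python's re
+       module, used here only as a stand-in for PCRE2. The sketch assumes
+       that LF is the newline and that re.DOTALL corresponds to
+       PCRE2_DOTALL; it does not model the CRLF or Unicode newline
+       conventions described above.
+
```python
import re

# Default: dot does not match the newline character.
no_dotall = re.search(r"a.c", "a\nc")

# With DOTALL, dot matches any one character, newline included.
with_dotall = re.search(r"a.c", "a\nc", re.DOTALL)

# CRLF is two characters, so it takes two dots to match it.
crlf_one_dot = re.fullmatch(r"a.c", "a\r\nc", re.DOTALL)
crlf_two_dots = re.fullmatch(r"a..c", "a\r\nc", re.DOTALL)

print(no_dotall)                   # None
print(with_dotall is not None)     # True
print(crlf_one_dot)                # None
print(crlf_two_dots is not None)   # True
```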
+
+       The escape sequence \N when not followed by an  opening  brace  behaves
+       like  a dot, except that it is not affected by the PCRE2_DOTALL option.
+       In other words, it matches any character except one that signifies  the
+       end of a line.
+
+       When \N is followed by an opening brace it has a different meaning. See
+       the section entitled "Non-printing characters" above for details.  Perl
+       also  uses  \N{name}  to specify characters by Unicode name; PCRE2 does
+       not support this.
+
+
+MATCHING A SINGLE CODE UNIT
+
+       Outside a character class, the escape sequence \C matches any one  code
+       unit,  whether or not a UTF mode is set. In the 8-bit library, one code
+       unit is one byte; in the 16-bit library it is a  16-bit  unit;  in  the
+       32-bit  library  it  is  a 32-bit unit. Unlike a dot, \C always matches
+       line-ending characters. The feature is provided in  Perl  in  order  to
+       match individual bytes in UTF-8 mode, but it is unclear how it can use-
+       fully be used.
+
+       Because \C breaks up characters into individual  code  units,  matching
+       one  unit  with  \C  in UTF-8 or UTF-16 mode means that the rest of the
+       string may start with a malformed UTF  character.  This  has  undefined
+       results, because PCRE2 assumes that it is matching character by charac-
+       ter in a valid UTF string (by default it checks  the  subject  string's
+       validity  at  the  start  of  processing  unless the PCRE2_NO_UTF_CHECK
+       option is used).
+
+       An  application  can  lock  out  the  use  of   \C   by   setting   the
+       PCRE2_NEVER_BACKSLASH_C  option  when  compiling  a pattern. It is also
+       possible to build PCRE2 with the use of \C permanently disabled.
+
+       PCRE2 does not allow \C to appear in lookbehind  assertions  (described
+       below)  in UTF-8 or UTF-16 modes, because this would make it impossible
+       to calculate the length of  the  lookbehind.  Neither  the  alternative
+       matching function pcre2_dfa_match() nor the JIT optimizer supports \C in
+       these UTF modes.  The former gives a match-time error; the latter fails
+       to optimize and so the match is always run using the interpreter.
+
+       In  the  32-bit  library,  however,  \C  is  always supported (when not
+       explicitly locked out) because it always matches a  single  code  unit,
+       whether or not UTF-32 is specified.
+
+       In general, the \C escape sequence is best avoided. However, one way of
+       using it that avoids the problem of malformed UTF-8 or  UTF-16  charac-
+       ters  is  to use a lookahead to check the length of the next character,
+       as in this pattern, which could be used with  a  UTF-8  string  (ignore
+       white space and line breaks):
+
+         (?| (?=[\x00-\x7f])(\C) |
+             (?=[\x80-\x{7ff}])(\C)(\C) |
+             (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
+             (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
+
+       In  this  example,  a  group  that starts with (?| resets the capturing
+       parentheses numbers in each alternative (see "Duplicate Subpattern Num-
+       bers" below). The assertions at the start of each branch check the next
+       UTF-8 character for values whose encoding uses 1, 2,  3,  or  4  bytes,
+       respectively. The character's individual bytes are then captured by the
+       appropriate number of \C groups.
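+
+       The four code point ranges tested by the lookaheads correspond to the
+       UTF-8 encoded lengths of 1, 2, 3, and 4 bytes. The Python sketch below
+       checks that correspondence against Python's own UTF-8 encoder. It does
+       not use \C, which Python's re module lacks, and the helper name
+       utf8_length is purely illustrative.
+
```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (surrogates excluded)."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4

# One sample character from each of the four ranges.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ord(ch)) for ch in samples]

# Cross-check against the lengths produced by the real encoder.
encoded = [len(ch.encode("utf-8")) for ch in samples]

print(lengths)  # [1, 2, 3, 4]
print(encoded)  # [1, 2, 3, 4]
```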
+
+
+SQUARE BRACKETS AND CHARACTER CLASSES
+
+       An opening square bracket introduces a character class, terminated by a
+       closing square bracket. A closing square bracket on its own is not spe-
+       cial by default.  If a closing square bracket is required as  a  member
+       of the class, it should be the first data character in the class (after
+       an initial circumflex, if present) or escaped with  a  backslash.  This
+       means  that,  by default, an empty class cannot be defined. However, if
+       the PCRE2_ALLOW_EMPTY_CLASS option is set, a closing square bracket  at
+       the start does end the (empty) class.
+
+       A  character class matches a single character in the subject. A matched
+       character must be in the set of characters defined by the class, unless
+       the  first  character in the class definition is a circumflex, in which
+       case the subject character must not be in the set defined by the class.
+       If  a  circumflex is actually required as a member of the class, ensure
+       it is not the first character, or escape it with a backslash.
+
+       For example, the character class [aeiou] matches any lower case  vowel,
+       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       Note that a circumflex is just a convenient notation for specifying the
+       characters  that  are in the class by enumerating those that are not. A
+       class that starts with a circumflex is not an assertion; it still  con-
+       sumes  a  character  from the subject string, and therefore it fails if
+       the current pointer is at the end of the string.
+
+       Characters in a class may be specified by their code points  using  \o,
+       \x,  or \N{U+hh..} in the usual way. When caseless matching is set, any
+       letters in a class represent both their upper case and lower case  ver-
+       sions,  so  for example, a caseless [aeiou] matches "A" as well as "a",
+       and a caseless [^aeiou] does not match "A", whereas a  caseful  version
+       would.
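+
+       The behaviour of positive, negated, and caseless classes described
+       above can be sketched with Python's re module, which is assumed to
+       agree with PCRE2 on these points; re.IGNORECASE stands in for
+       PCRE2_CASELESS.
+
```python
import re

vowels = re.findall(r"[aeiou]", "Audio")       # lower case vowels only
non_vowels = re.findall(r"[^aeiou]", "Audio")  # every other character

# Caseless: letters in the class stand for both cases, so the negated
# class no longer matches "A".
caseless_pos = re.search(r"[aeiou]", "A", re.IGNORECASE)
caseless_neg = re.search(r"[^aeiou]", "A", re.IGNORECASE)
caseful_neg = re.search(r"[^aeiou]", "A")

# A negated class is not an assertion: it must consume a character,
# so it fails when nothing is left in the subject.
at_end = re.search(r"x[^aeiou]", "x")

print(vowels)                    # ['u', 'i', 'o']
print(non_vowels)                # ['A', 'd']
print(caseless_pos is not None)  # True
print(caseless_neg)              # None
print(caseful_neg is not None)   # True
print(at_end)                    # None
```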
+
+       Characters  that  might  indicate  line breaks are never treated in any
+       special way  when  matching  character  classes,  whatever  line-ending
+       sequence  is  in  use,  and  whatever  setting  of the PCRE2_DOTALL and
+       PCRE2_MULTILINE options is used. A class such as  [^a]  always  matches
+       one of these characters.
+
+       The generic character type escape sequences \d, \D, \h, \H, \p, \P, \s,
+       \S, \v, \V, \w, and \W may appear in a character  class,  and  add  the
+       characters  that  they  match  to  the  class.  For example, [\dABCDEF]
+       matches any hexadecimal digit.  In  UTF  modes,  the  PCRE2_UCP  option
+       affects  the meanings of \d, \s, \w and their upper case partners, just
+       as it does when they appear outside a character class, as described  in
+       the  section  entitled  "Generic  character  types"  above.  The escape
+       sequence \b has a  different  meaning  inside  a  character  class;  it
+       matches  the  backspace character. The sequences \B, \R, and \X are not
+       special inside a character class. Like any  other  unrecognized  escape
+       sequences,  they  cause an error. The same is true for \N when not fol-
+       lowed by an opening brace.
+
+       The minus (hyphen) character can be used to specify a range of  charac-
+       ters  in  a  character  class.  For  example,  [d-m] matches any letter
+       between d and m, inclusive. If a  minus  character  is  required  in  a
+       class,  it  must  be  escaped  with a backslash or appear in a position
+       where it cannot be interpreted as indicating a range, typically as  the
+       first or last character in the class, or immediately after a range. For
+       example, [b-d-z] matches letters in the range b to d, a hyphen  charac-
+       ter, or z.
+
+       Perl treats a hyphen as a literal if it appears before or after a POSIX
+       class (see below) or before or after a character type escape such as
+       \d  or  \H.   However,  unless  the hyphen is the last character in the
+       class, Perl outputs a warning in its warning  mode,  as  this  is  most
+       likely  a user error. As PCRE2 has no facility for warning, an error is
+       given in these cases.
+
+       It is not possible to have the literal character "]" as the end charac-
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
+       a range.
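+
+       The range rules above (a hyphen directly after a range, and "]" as a
+       range end point) can be tried out with the following sketch. It uses
+       Python's re module, which is assumed to parse these three classes in
+       the same way as PCRE2; the Perl warning behaviour mentioned earlier is
+       not modelled.
+
```python
import re

# A hyphen immediately after a range is a literal: b..d, "-", or "z".
hyphen_after_range = re.findall(r"[b-d-z]", "abcd-yz")

# Unescaped, "]" cannot end a range: [W-] is a two-character class,
# followed by the literal string "46]".
literal_tail = [s for s in ("W46]", "-46]", "X46]")
                if re.fullmatch(r"[W-]46]", s)]

# Escaped, "]" ends the range W..], and "4" and "6" are extra members.
escaped_range = [s for s in ("W", "Z", "]", "4", "6", "-")
                 if re.fullmatch(r"[W-\]46]", s)]

print(hyphen_after_range)  # ['b', 'c', 'd', '-', 'z']
print(literal_tail)        # ['W46]', '-46]']
print(escaped_range)       # ['W', 'Z', ']', '4', '6']
```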
+
+       Ranges normally include all code points between the start and end char-
+       acters, inclusive. They can also be  used  for  code  points  specified
+       numerically, for example [\000-\037]. Ranges can include any characters
+       that are valid for the current mode. In any  UTF  mode,  the  so-called
+       "surrogate"  characters (those whose code points lie between 0xd800 and
+       0xdfff inclusive) may not  be  specified  explicitly  by  default  (the
+       PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  option  disables this check). How-
+       ever, ranges such as [\x{d7ff}-\x{e000}], which include the surrogates,
+       are always permitted.
+
+       There  is  a  special  case in EBCDIC environments for ranges whose end
+       points are both specified as literal letters in the same case. For com-
+       patibility  with Perl, EBCDIC code points within the range that are not
+       letters are omitted. For example, [h-k] matches only  four  characters,
+       even though the codes for h and k are 0x88 and 0x92, a range of 11 code
+       points. However, if the range is specified  numerically,  for  example,
+       [\x88-\x92] or [h-\x92], all code points are included.
+
+       If a range that includes letters is used when caseless matching is set,
+       it matches the letters in either case. For example, [W-c] is equivalent
+       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in a non-UTF mode, if
+       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
+       accented E characters in both cases.
+
+       A  circumflex  can  conveniently  be used with the upper case character
+       types to specify a more restricted set of characters than the  matching
+       lower  case  type.  For example, the class [^\W_] matches any letter or
+       digit, but not underscore, whereas [\w] includes underscore. A positive
+       character class should be read as "something OR something OR ..." and a
+       negative class as "NOT something AND NOT something AND NOT ...".
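+
+       The "NOT something AND NOT something" reading of a negative class can
+       be seen in this short sketch, written for Python's re module on the
+       assumption that \w and \W behave as in PCRE2 for ASCII input.
+
```python
import re

text = "a_1-Z"

# [^\W_] reads as: NOT a non-word character AND NOT an underscore.
letters_digits = re.findall(r"[^\W_]", text)

# [\w] keeps the underscore.
word_chars = re.findall(r"[\w]", text)

print(letters_digits)  # ['a', '1', 'Z']
print(word_chars)      # ['a', '_', '1', 'Z']
```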
+
+       The only metacharacters that are recognized in  character  classes  are
+       backslash,  hyphen  (only  where  it can be interpreted as specifying a
+       range), circumflex (only at the start), opening  square  bracket  (only
+       when  it can be interpreted as introducing a POSIX class name, or for a
+       special compatibility feature - see the next  two  sections),  and  the
+       terminating  closing  square  bracket.  However,  escaping  other  non-
+       alphanumeric characters does no harm.
+
+
+POSIX CHARACTER CLASSES
+
+       Perl supports the POSIX notation for character classes. This uses names
+       enclosed  by [: and :] within the enclosing square brackets. PCRE2 also
+       supports this notation. For example,
+
+         [01[:alpha:]%]
+
+       matches "0", "1", any alphabetic character, or "%". The supported class
+       names are:
+
+         alnum    letters and digits
+         alpha    letters
+         ascii    character codes 0 - 127
+         blank    space or tab only
+         cntrl    control characters
+         digit    decimal digits (same as \d)
+         graph    printing characters, excluding space
+         lower    lower case letters
+         print    printing characters, including space
+         punct    printing characters, excluding letters and digits and space
+         space    white space (the same as \s from PCRE 8.34)
+         upper    upper case letters
+         word     "word" characters (same as \w)
+         xdigit   hexadecimal digits
+
+       The  default  "space" characters are HT (9), LF (10), VT (11), FF (12),
+       CR (13), and space (32). If locale-specific matching is  taking  place,
+       the  list  of  space characters may be different; there may be fewer or
+       more of them. "Space" and \s match the same set of characters.
+
+       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
+       from  Perl  5.8. Another Perl extension is negation, which is indicated
+       by a ^ character after the colon. For example,
+
+         [12[:^digit:]]
+
+       matches "1", "2", or any non-digit. PCRE2 (and Perl) also recognize the
+       POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
+       these are not supported, and an error is given if they are encountered.
+
+       By default, characters with values greater than 127 do not match any of
+       the POSIX character classes, although this may be different for charac-
+       ters in the range 128-255 when locale-specific matching  is  happening.
+       However,  if the PCRE2_UCP option is passed to pcre2_compile(), some of
+       the classes are changed so that Unicode character properties are  used.
+       This  is  achieved  by  replacing  certain  POSIX  classes  with  other
+       sequences, as follows:
+
+         [:alnum:]  becomes  \p{Xan}
+         [:alpha:]  becomes  \p{L}
+         [:blank:]  becomes  \h
+         [:cntrl:]  becomes  \p{Cc}
+         [:digit:]  becomes  \p{Nd}
+         [:lower:]  becomes  \p{Ll}
+         [:space:]  becomes  \p{Xps}
+         [:upper:]  becomes  \p{Lu}
+         [:word:]   becomes  \p{Xwd}
+
+       Negated versions, such as [:^alpha:], use \P instead of \p. Three other
+       POSIX classes are handled specially in UCP mode:
+
+       [:graph:] This  matches  characters that have glyphs that mark the page
+                 when printed. In Unicode property terms, it matches all char-
+                 acters with the L, M, N, P, S, or Cf properties, except for:
+
+                   U+061C           Arabic Letter Mark
+                   U+180E           Mongolian Vowel Separator
+                   U+2066 - U+2069  Various "isolate"s
+
+
+       [:print:] This  matches  the  same  characters  as [:graph:] plus space
+                 characters that are not controls, that  is,  characters  with
+                 the Zs property.
+
+       [:punct:] This matches all characters that have the Unicode P (punctua-
+                 tion) property, plus those characters with code  points  less
+                 than 256 that have the S (Symbol) property.
+
+       The  other  POSIX classes are unchanged, and match only characters with
+       code points less than 256.
+
+
+COMPATIBILITY FEATURE FOR WORD BOUNDARIES
+
+       In the POSIX.2 compliant library that was included in 4.4BSD Unix,  the
+       ugly  syntax  [[:<:]]  and [[:>:]] is used for matching "start of word"
+       and "end of word". PCRE2 treats these items as follows:
+
+         [[:<:]]  is converted to  \b(?=\w)
+         [[:>:]]  is converted to  \b(?<=\w)
+
+       Only these exact character sequences are recognized. A sequence such as
+       [a[:<:]b] provokes an error for an unrecognized POSIX class name.  This
+       support is not compatible with Perl. It is provided to help  migrations
+       from other environments, and is best not used in any new patterns. Note
+       that \b matches at the start and the end of a word (see "Simple  asser-
+       tions"  above),  and in a Perl-style pattern the preceding or following
+       character normally shows which is wanted,  without  the  need  for  the
+       assertions  that  are used above in order to give exactly the POSIX be-
+       haviour.
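+
+       Python's re module does not accept [[:<:]] or [[:>:]], but the two
+       replacement assertions listed above can be tried directly. The sketch
+       below only illustrates where those assertions match; it assumes that
+       \b, lookahead, and lookbehind behave as in PCRE2 for this input.
+
```python
import re

text = "hi there"

# \b(?=\w): a word boundary followed by a word character (start of word).
starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]

# \b(?<=\w): a word boundary preceded by a word character (end of word).
ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]

print(starts)  # [0, 3]
print(ends)    # [2, 8]
```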
+
+
+VERTICAL BAR
+
+       Vertical bar characters are used to separate alternative patterns.  For
+       example, the pattern
+
+         gilbert|sullivan
+
+       matches  either "gilbert" or "sullivan". Any number of alternatives may
+       appear, and an empty  alternative  is  permitted  (matching  the  empty
+       string). The matching process tries each alternative in turn, from left
+       to right, and the first one that succeeds is used. If the  alternatives
+       are  within a subpattern (defined below), "succeeds" means matching the
+       rest of the main pattern as well as the alternative in the subpattern.
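+
+       The left-to-right rule, and the wider meaning of "succeeds" inside a
+       subpattern, can be sketched as follows. Python's re module is used as
+       a stand-in and is assumed to try alternatives in the same order as
+       PCRE2.
+
```python
import re

names = [s for s in ("gilbert", "sullivan", "gilbertson")
         if re.fullmatch(r"gilbert|sullivan", s)]

# Alternatives are tried left to right; the first that succeeds is used,
# even though a later one would match more of the subject.
first_wins = re.match(r"a|ab", "ab").group()

# Inside a group, "succeeds" includes matching the rest of the pattern,
# so the first alternative is given up in favour of the second.
inside_group = re.fullmatch(r"(a|ab)c", "abc").group(1)

# An empty alternative matches the empty string.
empty_alt = re.fullmatch(r"cat(s|)", "cat").group(1)

print(names)         # ['gilbert', 'sullivan']
print(first_wins)    # a
print(inside_group)  # ab
print(repr(empty_alt))  # ''
```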
+
+
+INTERNAL OPTION SETTING
+
+       The settings  of  the  PCRE2_CASELESS,  PCRE2_MULTILINE,  PCRE2_DOTALL,
+       PCRE2_EXTENDED,  PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE options
+       can be changed from  within  the  pattern  by  a  sequence  of  letters
+       enclosed  between "(?"  and ")". These options are Perl-compatible, and
+       are described in detail in the pcre2api documentation. The option  let-
+       ters are:
+
+         i  for PCRE2_CASELESS
+         m  for PCRE2_MULTILINE
+         n  for PCRE2_NO_AUTO_CAPTURE
+         s  for PCRE2_DOTALL
+         x  for PCRE2_EXTENDED
+         xx for PCRE2_EXTENDED_MORE
+
+       For example, (?im) sets caseless, multiline matching. It is also possi-
+       ble to unset these options by preceding the  relevant  letters  with  a
+       hyphen, for example (?-im). The two "extended" options are not indepen-
+       dent; unsetting either one cancels the effects of both of them.
+
+       A  combined  setting  and  unsetting  such  as  (?im-sx),  which   sets
+       PCRE2_CASELESS  and  PCRE2_MULTILINE  while  unsetting PCRE2_DOTALL and
+       PCRE2_EXTENDED, is also permitted. Only one hyphen may  appear  in  the
+       options  string.  If a letter appears both before and after the hyphen,
+       the option is unset. An empty options setting "(?)" is  allowed.  Need-
+       less to say, it has no effect.
+
+       If  the  first character following (? is a circumflex, it causes all of
+       the above options to be unset. Thus, (?^) is equivalent  to  (?-imnsx).
+       Letters  may  follow  the  circumflex  to  cause some options to be re-
+       instated, but a hyphen may not appear.
+
+       The PCRE2-specific options PCRE2_DUPNAMES  and  PCRE2_UNGREEDY  can  be
+       changed  in  the  same  way as the Perl-compatible options by using the
+       characters J and U respectively. However, these are not unset by (?^).
+
+       When one of these option changes occurs at  top  level  (that  is,  not
+       inside  subpattern parentheses), the change applies to the remainder of
+       the pattern that follows. An option change  within  a  subpattern  (see
+       below  for  a description of subpatterns) affects only that part of the
+       subpattern that follows it, so
+
+         (a(?i)b)c
+
+       matches abc and aBc and no other strings  (assuming  PCRE2_CASELESS  is
+       not  used).   By this means, options can be made to have different set-
+       tings in different parts of the pattern. Any changes made in one alter-
+       native do carry on into subsequent branches within the same subpattern.
+       For example,
+
+         (a(?i)b|c)
+
+       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
+       first  branch  is  abandoned before the option setting. This is because
+       the effects of option settings happen at compile time. There  would  be
+       some very weird behaviour otherwise.
+
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern (see the next section), the  option
+       letters may appear between the "?" and the ":". Thus the two patterns
+
+         (?i:saturday|sunday)
+         (?:(?i)saturday|sunday)
+
+       match exactly the same set of strings.
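+
+       The shorthand form can be sketched with Python's re module, which
+       accepts (?i:...) from Python 3.6 onwards. Only the scoped form is
+       shown, because recent Python versions reject a bare (?i) that is not
+       at the start of the pattern, unlike PCRE2.
+
```python
import re

# The option letters between "?" and ":" apply to the whole group.
weekend = re.compile(r"(?i:saturday|sunday)")
matched = [s for s in ("Saturday", "SUNDAY", "monday")
           if weekend.fullmatch(s)]

# The setting is local to the group: only "b" is matched caselessly.
local = [s for s in ("abc", "aBc", "Abc", "abC")
         if re.fullmatch(r"a(?i:b)c", s)]

print(matched)  # ['Saturday', 'SUNDAY']
print(local)    # ['abc', 'aBc']
```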
+
+       Note:  There  are  other  PCRE2-specific options that can be set by the
+       application when the compiling function is called. The pattern can con-
+       tain  special  leading  sequences  such as (*CRLF) to override what the
+       application has set or what has been defaulted. Details  are  given  in
+       the  section  entitled  "Newline  sequences"  above. There are also the
+       (*UTF) and (*UCP) leading sequences that can be used  to  set  UTF  and
+       Unicode  property  modes;  they are equivalent to setting the PCRE2_UTF
+       and PCRE2_UCP options, respectively. However, the application  can  set
+       the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP options, which lock out the use
+       of the (*UTF) and (*UCP) sequences.
+
+
+SUBPATTERNS
+
+       Subpatterns are delimited by parentheses (round brackets), which can be
+       nested.  Turning part of a pattern into a subpattern does two things:
+
+       1. It localizes a set of alternatives. For example, the pattern
+
+         cat(aract|erpillar|)
+
+       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,
+       it would match "cataract", "erpillar" or an empty string.
+
+       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
+       that, when the whole pattern matches, the portion of the subject string
+       that matched the subpattern is passed back to  the  caller,  separately
+       from  the portion that matched the whole pattern. (This applies only to
+       the traditional matching function; the DFA matching function  does  not
+       support capturing.)
+
+       Opening parentheses are counted from left to right (starting from 1) to
+       obtain numbers for the  capturing  subpatterns.  For  example,  if  the
+       string "the red king" is matched against the pattern
+
+         the ((red|white) (king|queen))
+
+       the captured substrings are "red king", "red", and "king", and are num-
+       bered 1, 2, and 3, respectively.
+
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
+       matched against the pattern
+
+         the ((?:red|white) (king|queen))
+
+       the captured substrings are "white queen" and "queen", and are numbered
+       1 and 2. The maximum number of capturing subpatterns is 65535.
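+
+       The three examples in this section can be reproduced with the sketch
+       below. It uses Python's re module as a stand-in, on the assumption
+       that it numbers capturing parentheses in the same way as PCRE2.
+
```python
import re

# Parentheses localize the alternatives, including the empty one.
localized = [s for s in ("cataract", "caterpillar", "cat", "erpillar")
             if re.fullmatch(r"cat(aract|erpillar|)", s)]

# Opening parentheses are counted from left to right, starting from 1.
capturing = re.fullmatch(
    r"the ((red|white) (king|queen))", "the red king").groups()

# (?: ... ) groups without capturing and is not counted.
non_capturing = re.fullmatch(
    r"the ((?:red|white) (king|queen))", "the white queen").groups()

print(localized)      # ['cataract', 'caterpillar', 'cat']
print(capturing)      # ('red king', 'red', 'king')
print(non_capturing)  # ('white queen', 'queen')
```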
+
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
+       between the "?" and the ":". Thus the two patterns
+
+         (?i:saturday|sunday)
+         (?:(?i)saturday|sunday)
+
+       match exactly the same set of strings. Because alternative branches are
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
+       "Saturday".
+
+
+DUPLICATE SUBPATTERN NUMBERS
+
+       Perl 5.10 introduced a feature whereby each alternative in a subpattern
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
+       consider this pattern:
+
+         (?|(Sat)ur|(Sun))day
+
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
+       not all, of one of a number of alternatives. Inside a (?| group, paren-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing parentheses that  follow  the
+       subpattern  start after the highest number used in any branch. The fol-
+       lowing example is taken from the Perl documentation. The numbers under-
+       neath show in which buffer the captured content will be stored.
+
+         # before  ---------------branch-reset----------- after
+         / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+         # 1            2         2  3        2     3     4
+
+       A  backreference  to  a  numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
+       matches "abcabc" or "defdef":
+
+         /(?|(abc)|(def))\1/
+
+       In  contrast,  a subroutine call to a numbered subpattern always refers
+       to the first one in the pattern with the given  number.  The  following
+       pattern matches "abcabc" or "defabc":
+
+         /(?|(abc)|(def))(?1)/
+
+       A relative reference such as (?-1) is no different: it is just a conve-
+       nient way of computing an absolute group number.
+
+       If a condition test for a subpattern's having matched refers to a  non-
+       unique  number, the test is true if any of the subpatterns of that num-
+       ber have matched.
+
+       An alternative approach to using this "branch reset" feature is to  use
+       duplicate named subpatterns, as described in the next section.
+
+
+NAMED SUBPATTERNS
+
+       Identifying  capturing  parentheses  by number is simple, but it can be
+       very hard to keep track of the numbers in  complicated  patterns.  Fur-
+       thermore, if an expression is modified, the numbers may change. To help
+       with this difficulty, PCRE2 supports the naming  of  capturing  subpat-
+       terns.  This  feature  was not added to Perl until release 5.10. Python
+       had the feature earlier, and PCRE1 introduced it at release 4.0,  using
+       the Python syntax. PCRE2 supports both the Perl and the Python syntax.
+
+       In  PCRE2,  a  capturing  subpattern can be named in one of three ways:
+       (?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python.
+       Names  consist of up to 32 alphanumeric characters and underscores, but
+       must start with a non-digit. References to capturing  parentheses  from
+       other parts of the pattern, such as backreferences, recursion, and con-
+       ditions, can all be made by name as well as by number.
+
+       Named capturing parentheses are allocated numbers  as  well  as  names,
+       exactly  as if the names were not present. In both PCRE2 and Perl, cap-
+       turing subpatterns are primarily identified by numbers; any  names  are
+       just  aliases  for these numbers. The PCRE2 API provides function calls
+       for extracting the complete name-to-number  translation  table  from  a
+       compiled  pattern, as well as convenience functions for extracting cap-
+       tured substrings by name.
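+
+       The idea that names are aliases for group numbers can be sketched
+       with the Python syntax (?P<name>...), which is the only one of the
+       three forms that Python's re module accepts. The groupindex mapping
+       used below is a Python feature, shown as an analogue of the PCRE2
+       name-to-number table rather than as the PCRE2 API; the pattern and
+       the group names are illustrative.
+
```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.fullmatch("2018-09")

# Named groups are numbered exactly as if the names were absent.
name_to_number = dict(date.groupindex)

# A captured substring can be fetched by name or by number.
by_name = m.group("year")
by_number = m.group(1)

print(name_to_number)        # {'year': 1, 'month': 2}
print(by_name, by_number)    # 2018 2018
```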
+
+       Warning: When  more  than  one  subpattern  has  the  same  number,  as
+       described  in the previous section, a name given to one of them applies
+       to all of them.  Perl allows identically numbered subpatterns  to  have
+       different  names.  Consider this pattern, where there are two capturing
+       subpatterns, both numbered 1:
+
+         (?|(?<AA>aa)|(?<BB>bb))
+
+       Perl allows this, with both names AA and BB  as  aliases  of  group  1.
+       Thus, after a successful match, both names yield the same value (either
+       "aa" or "bb").
+
+       In an attempt to reduce confusion, PCRE2 does not allow the same  group
+       number to be associated with more than one name. The example above pro-
+       vokes a compile-time error. However, there is still  scope  for  confu-
+       sion. Consider this pattern:
+
+         (?|(?<AA>aa)|(bb))
+
+       Although  the  second  subpattern number 1 is not explicitly named, the
+       name AA is still an alias for subpattern 1. Whether the pattern matches
+       "aa"  or  "bb",  a  reference  by  name  to group AA yields the matched
+       string.
+
+       By default, a name must be unique within a pattern, except that  dupli-
+       cate  names  are  permitted  for  subpatterns with the same number, for
+       example:
+
+         (?|(?<AA>aa)|(?<AA>bb))
+
+       The duplicate name constraint can be disabled by setting the PCRE2_DUP-
+       NAMES option at compile time, or by the use of (?J) within the pattern.
+       Duplicate names can be useful for patterns where only one  instance  of
+       the  named parentheses can match. Suppose you want to match the name of
+       a weekday, either as a 3-letter abbreviation or as the full  name,  and
+       in  both  cases  you  want  to  extract  the abbreviation. This pattern
+       (ignoring the line breaks) does the job:
+
+         (?<DN>Mon|Fri|Sun)(?:day)?|
+         (?<DN>Tue)(?:sday)?|
+         (?<DN>Wed)(?:nesday)?|
+         (?<DN>Thu)(?:rsday)?|
+         (?<DN>Sat)(?:urday)?
+
+       There are five capturing substrings, but only one is ever set  after  a
+       match. The convenience functions for extracting the data by name
+       return the substring for the first (and in this example, the only)
+       subpattern  of  that  name  that  matched. This saves searching to find
+       which numbered subpattern it was. (An alternative way of  solving  this
+       problem is to use a "branch reset" subpattern, as described in the pre-
+       vious section.)
+
+       If you make a backreference to a non-unique named subpattern from else-
+       where  in  the  pattern,  the  subpatterns to which the name refers are
+       checked in the order in which they appear in the overall  pattern.  The
+       first one that is set is used for the reference. For example, this pat-
+       tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
+
+         (?:(?<n>foo)|(?<n>bar))\k<n>
+
+
+       If you make a subroutine call to a non-unique named subpattern, the one
+       that  corresponds  to  the first occurrence of the name is used. In the
+       absence of duplicate numbers this is the one with the lowest number.
+
+       If you use a named reference in a condition test (see the section about
+       conditions below), either to check whether a subpattern has matched, or
+       to check for recursion, all subpatterns with the same name are  tested.
+       If  the condition is true for any one of them, the overall condition is
+       true. This is the same behaviour as  testing  by  number.  For  further
+       details  of  the  interfaces  for  handling  named subpatterns, see the
+       pcre2api documentation.
+
+
+REPETITION
+
+       Repetition is specified by quantifiers, which can  follow  any  of  the
+       following items:
+
+         a literal data character
+         the dot metacharacter
+         the \C escape sequence
+         the \X escape sequence
+         the \R escape sequence
+         an escape such as \d or \pL that matches a single character
+         a character class
+         a backreference
+         a parenthesized subpattern (including most assertions)
+         a subroutine call to a subpattern (recursive or otherwise)
+
+       The  general repetition quantifier specifies a minimum and maximum num-
+       ber of permitted matches, by giving the two numbers in  curly  brackets
+       (braces),  separated  by  a comma. The numbers must be less than 65536,
+       and the first must be less than or equal to the second. For example:
+
+         z{2,4}
+
+       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
+       special  character.  If  the second number is omitted, but the comma is
+       present, there is no upper limit; if the second number  and  the  comma
+       are  both omitted, the quantifier specifies an exact number of required
+       matches. Thus
+
+         [aeiou]{3,}
+
+       matches at least 3 successive vowels, but may match many more, whereas
+
+         \d{8}
+
+       matches exactly 8 digits. An opening curly bracket that  appears  in  a
+       position  where a quantifier is not allowed, or one that does not match
+       the syntax of a quantifier, is taken as a literal character. For  exam-
+       ple, {,6} is not a quantifier, but a literal string of four characters.
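+
+       The three quantifier forms above can be checked with a short sketch.
+       It assumes Python's re module as a stand-in for PCRE2; the helper name
+       "full" is hypothetical. The {,6} case is left out because Python's re
+       reads it as {0,6}, unlike the behaviour described here.
+

```python
import re

def full(pattern, subject):
    """Return True if the whole subject matches the pattern."""
    return re.fullmatch(pattern, subject) is not None

# z{2,4}: between two and four repetitions.
range_hits = [s for s in ("z", "zz", "zzz", "zzzz", "zzzzz")
              if full(r"z{2,4}", s)]

# [aeiou]{3,}: at least three, with no upper limit.
open_ended = full(r"[aeiou]{3,}", "aeiouaeiou")

# \d{8}: exactly eight.
exact = [full(r"\d{8}", s) for s in ("1234567", "12345678", "123456789")]
```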
+
+       In UTF modes, quantifiers apply to characters rather than to individual
+       code units. Thus, for example, \x{100}{2} matches two characters,  each
+       of which is represented by a two-byte sequence in a UTF-8 string. Simi-
+       larly, \X{3} matches three Unicode extended grapheme clusters, each  of
+       which  may  be  several  code  units long (and they may be of different
+       lengths).
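+
+       A small sketch of the character-versus-code-unit distinction, assuming
+       Python's re module as a stand-in: Python strings are sequences of
+       characters, which is comparable to (but not the same thing as) a PCRE2
+       UTF mode.
+

```python
import re

subject = "\u0100\u0100"              # two characters, U+0100 twice
utf8_bytes = subject.encode("utf-8")  # each character needs two bytes

# The quantifier counts characters, not the bytes that encode them.
match = re.fullmatch(r"\u0100{2}", subject)
```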
+
+       The quantifier {0} is permitted, causing the expression to behave as if
+       the previous item and the quantifier were not present. This may be use-
+       ful for subpatterns that are referenced as subroutines  from  elsewhere
+       in the pattern (but see also the section entitled "Defining subpatterns
+       for use by reference only" below). Items other  than  subpatterns  that
+       have a {0} quantifier are omitted from the compiled pattern.
+
+       For  convenience, the three most common quantifiers have single-charac-
+       ter abbreviations:
+
+         *    is equivalent to {0,}
+         +    is equivalent to {1,}
+         ?    is equivalent to {0,1}
+
+       It is possible to construct infinite loops by  following  a  subpattern
+       that can match no characters with a quantifier that has no upper limit,
+       for example:
+
+         (a?)*
+
+       Earlier versions of Perl and PCRE1 used to give  an  error  at  compile
+       time for such patterns. However, because there are cases where this can
+       be useful, such patterns are now accepted, but if any repetition of the
+       subpattern  does in fact match no characters, the loop is forcibly bro-
+       ken.
+
+       By default, the quantifiers are "greedy", that is, they match  as  much
+       as  possible  (up  to  the  maximum number of permitted times), without
+       causing the rest of the pattern to fail. The classic example  of  where
+       this gives problems is in trying to match comments in C programs. These
+       appear between /* and */ and within the comment,  individual  *  and  /
+       characters  may  appear. An attempt to match C comments by applying the
+       pattern
+
+         /\*.*\*/
+
+       to the string
+
+         /* first comment */  not comment  /* second comment */
+
+       fails, because it matches the entire string owing to the greediness  of
+       the .*  item.
+
+       If a quantifier is followed by a question mark, it ceases to be greedy,
+       and instead matches the minimum number of times possible, so  the  pat-
+       tern
+
+         /\*.*?\*/
+
+       does  the  right  thing with the C comments. The meaning of the various
+       quantifiers is not otherwise changed,  just  the  preferred  number  of
+       matches.   Do  not  confuse this use of question mark with its use as a
+       quantifier in its own right. Because it has two uses, it can  sometimes
+       appear doubled, as in
+
+         \d??\d
+
+       which matches one digit by preference, but can match two if that is the
+       only way the rest of the pattern matches.
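+
+       The greedy and non-greedy behaviour described above can be reproduced
+       with the following sketch. It assumes Python's re module as a stand-in
+       for PCRE2; the two engines are separate, but these quantifiers behave
+       the same way in both.
+

```python
import re

subject = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the last */ in the string.
greedy = re.search(r"/\*.*\*/", subject).group()

# Non-greedy .*? stops at the first */ after each /*.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d: the first \d is optional and lazy, so one digit is preferred.
one_digit = re.match(r"\d??\d", "12").group()

# With a trailing "x", both digits are needed for the match to succeed.
two_digits = re.match(r"\d??\dx", "12x").group()
```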
+
+       If the PCRE2_UNGREEDY option is set (an option that is not available in
+       Perl),  the  quantifiers are not greedy by default, but individual ones
+       can be made greedy by following them with a  question  mark.  In  other
+       words, it inverts the default behaviour.
+
+       When  a  parenthesized  subpattern  is quantified with a minimum repeat
+       count that is greater than 1 or with a limited maximum, more memory  is
+       required  for  the  compiled  pattern, in proportion to the size of the
+       minimum or maximum.
+
+       If a pattern starts with  .*  or  .{0,}  and  the  PCRE2_DOTALL  option
+       (equivalent  to  Perl's /s) is set, thus allowing the dot to match new-
+       lines, the pattern is implicitly  anchored,  because  whatever  follows
+       will  be  tried against every character position in the subject string,
+       so there is no point in retrying the  overall  match  at  any  position
+       after the first. PCRE2 normally treats such a pattern as though it were
+       preceded by \A.
+
+       In cases where it is known that the subject  string  contains  no  new-
+       lines,  it  is worth setting PCRE2_DOTALL in order to obtain this opti-
+       mization, or alternatively, using ^ to indicate anchoring explicitly.
+
+       However, there are some cases where the optimization  cannot  be  used.
+       When  .*   is  inside  capturing  parentheses that are the subject of a
+       backreference elsewhere in the pattern, a match at the start  may  fail
+       where a later one succeeds. Consider, for example:
+
+         (.*)abc\1
+
+       If  the subject is "xyz123abc123" the match point is the fourth charac-
+       ter. For this reason, such a pattern is not implicitly anchored.
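+
+       The match point in this example can be confirmed with a short sketch,
+       assuming Python's re module as a stand-in for PCRE2 (offsets are
+       zero-based, so the fourth character is offset 3).
+

```python
import re

match = re.search(r"(.*)abc\1", "xyz123abc123")

start_offset = match.start()   # where the overall match begins
captured = match.group(1)      # what (.*) had to hold for \1 to succeed
whole = match.group()
```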
+
+       Another case where implicit anchoring is not applied is when the  lead-
+       ing  .* is inside an atomic group. Once again, a match at the start may
+       fail where a later one succeeds. Consider this pattern:
+
+         (?>.*?a)b
+
+       It matches "ab" in the subject "aab". The use of the backtracking
+       control verbs (*PRUNE) and (*SKIP) also disables this optimization, and
+       there is an option, PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly.
+
+       When a capturing subpattern is repeated, the value captured is the sub-
+       string that matched the final iteration. For example, after
+
+         (tweedle[dume]{3}\s*)+
+
+       has matched "tweedledum tweedledee" the value of the captured substring
+       is "tweedledee". However, if there are  nested  capturing  subpatterns,
+       the  corresponding captured values may have been set in previous itera-
+       tions. For example, after
+
+         (a|(b))+
+
+       matches "aba" the value of the second captured substring is "b".
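+
+       Both examples above can be sketched as follows, assuming Python's re
+       module as a stand-in for PCRE2. Engines differ on whether inner
+       captures survive later iterations; Python's re happens to agree with
+       the behaviour described here.
+

```python
import re

# A repeated group reports the text of its final iteration.
outer = re.fullmatch(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
last_iteration = outer.group(1)

# The nested group 2 was set in the second iteration and is not cleared
# when the third iteration matches "a" through the other alternative.
nested = re.fullmatch(r"(a|(b))+", "aba")
groups = nested.groups()
```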
+
+
+ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
+
+       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
+       repetition,  failure  of what follows normally causes the repeated item
+       to be re-evaluated to see if a different number of repeats  allows  the
+       rest  of  the pattern to match. Sometimes it is useful to prevent this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than  it otherwise might, when the author of the pattern knows there is
+       no point in carrying on.
+
+       Consider, for example, the pattern \d+foo when applied to  the  subject
+       line
+
+         123456bar
+
+       After matching all 6 digits and then failing to match "foo", the normal
+       action of the matcher is to try again with only 5 digits  matching  the
+       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
+       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
+       the  means for specifying that once a subpattern has matched, it is not
+       to be re-evaluated in this way.
+
+       If we use atomic grouping for the previous example, the  matcher  gives
+       up  immediately  on failing to match "foo" the first time. The notation
+       is a kind of special parenthesis, starting with (?> as in this example:
+
+         (?>\d+)foo
+
+       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
+       tains  once  it  has matched, and a failure further into the pattern is
+       prevented from backtracking into it. Backtracking past it  to  previous
+       items, however, works as normal.
+
+       An  alternative  description  is that a subpattern of this type matches
+       exactly the string of characters that an identical  standalone  pattern
+       would match, if anchored at the current point in the subject string.
+
+       Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+       such as the above example can be thought of as a maximizing repeat that
+       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
+       pared to adjust the number of digits they match in order  to  make  the
+       rest of the pattern match, (?>\d+) can only match an entire sequence of
+       digits.
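+
+       The "all or nothing" behaviour can be sketched without the (?> syntax
+       itself. The code below is an emulation, not PCRE2: it assumes Python's
+       re module and relies on the fact that a lookahead there is not
+       re-entered on backtracking, so a lookahead capture followed by a
+       backreference to it acts like an atomic group. The group name "run" is
+       hypothetical.
+

```python
import re

subject = "12345"

# Ordinary greedy repeat: \d+ gives back the final "5" so the literal
# "5" at the end of the pattern can match.
backtracking = re.search(r"\d+5", subject)

# Emulated atomic group: the lookahead captures the longest run of
# digits and (?P=run) consumes exactly that text. The run cannot be
# shortened afterwards, so the trailing "5" never finds anything.
atomic_like = re.search(r"(?=(?P<run>\d+))(?P=run)5", subject)

# When the rest of the pattern does follow the digits, it still matches.
atomic_ok = re.search(r"(?=(?P<run>\d+))(?P=run)foo", "123foo")
```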
+
+       Atomic groups in general can of course contain arbitrarily  complicated
+       subpatterns,  and  can  be  nested. However, when the subpattern for an
+       atomic group is just a single repeated item, as in the example above, a
+       simpler  notation,  called  a "possessive quantifier" can be used. This
+       consists of an additional + character  following  a  quantifier.  Using
+       this notation, the previous example can be rewritten as
+
+         \d++foo
+
+       Note that a possessive quantifier can be used with an entire group, for
+       example:
+
+         (abc|xyz){2,3}+
+
+       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
+       PCRE2_UNGREEDY  option  is  ignored. They are a convenient notation for
+       the simpler forms of atomic group. However, there is no  difference  in
+       the meaning of a possessive quantifier and the equivalent atomic group,
+       though there may be a performance  difference;  possessive  quantifiers
+       should be slightly faster.
+
+       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
+       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       edition of his book. Mike McCloskey liked it, so implemented it when he
+       built Sun's Java package, and PCRE1 copied it from there. It ultimately
+       found its way into Perl at release 5.10.
+
+       PCRE2  has  an  optimization  that automatically "possessifies" certain
+       simple pattern constructs. For example, the sequence A+B is treated  as
+       A++B  because  there is no point in backtracking into a sequence of A's
+       when B must follow. This feature can be disabled by the
+       PCRE2_NO_AUTO_POSSESS option, or by starting the pattern with
+       (*NO_AUTO_POSSESS).
+
+       When  a  pattern  contains an unlimited repeat inside a subpattern that
+       can itself be repeated an unlimited number of  times,  the  use  of  an
+       atomic  group  is  the  only way to avoid some failing matches taking a
+       very long time indeed. The pattern
+
+         (\D+|<\d+>)*[!?]
+
+       matches an unlimited number of substrings that either consist  of  non-
+       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
+       matches, it runs quickly. However, if it is applied to
+
+         aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+       it takes a long time before reporting  failure.  This  is  because  the
+       string  can be divided between the internal \D+ repeat and the external
+       * repeat in a large number of ways, and all  have  to  be  tried.  (The
+       example  uses  [!?]  rather than a single character at the end, because
+       both PCRE2 and Perl have an optimization that allows for  fast  failure
+       when  a single character is used. They remember the last single charac-
+       ter that is required for a match, and fail early if it is  not  present
+       in  the  string.)  If  the pattern is changed so that it uses an atomic
+       group, like this:
+
+         ((?>\D+)|<\d+>)*[!?]
+
+       sequences of non-digits cannot be broken, and failure happens quickly.
+
+
+BACKREFERENCES
+
+       Outside a character class, a backslash followed by a digit greater than
+       0  (and possibly further digits) is a backreference to a capturing sub-
+       pattern earlier (that is, to its left) in the pattern,  provided  there
+       have been that many previous capturing left parentheses.
+
+       However,  if the decimal number following the backslash is less than 8,
+       it is always taken as a backreference, and  causes  an  error  only  if
+       there  are  not that many capturing left parentheses in the entire pat-
+       tern. In other words, the parentheses that are referenced need  not  be
+       to  the left of the reference for numbers less than 8. A "forward back-
+       reference" of this type can make sense when a  repetition  is  involved
+       and  the  subpattern to the right has participated in an earlier itera-
+       tion.
+
+       It is not possible to have a numerical  "forward  backreference"  to  a
+       subpattern  whose  number  is  8  or  more  using this syntax because a
+       sequence such as \50 is interpreted as a character  defined  in  octal.
+       See the subsection entitled "Non-printing characters" above for further
+       details of the handling of digits following a backslash.  There  is  no
+       such  problem  when  named parentheses are used. A backreference to any
+       subpattern is possible using named parentheses (see below).
+
+       Another way of avoiding the ambiguity inherent in  the  use  of  digits
+       following  a  backslash  is  to use the \g escape sequence. This escape
+       must be followed by a signed or unsigned number, optionally enclosed in
+       braces. These examples are all identical:
+
+         (ring), \1
+         (ring), \g1
+         (ring), \g{1}
+
+       An  unsigned number specifies an absolute reference without the ambigu-
+       ity that is present in the older syntax. It is also useful when literal
+       digits  follow  the reference. A signed number is a relative reference.
+       Consider this example:
+
+         (abc(def)ghi)\g{-1}
+
+       The sequence \g{-1} is a reference to the most recently started
+       capturing subpattern before \g, that is, it is equivalent to \2 in this
+       example. Similarly, \g{-2} would be equivalent to \1. The use of relative
+       references  can  be helpful in long patterns, and also in patterns that
+       are created by  joining  together  fragments  that  contain  references
+       within themselves.
+
+       The  sequence  \g{+1}  is a reference to the next capturing subpattern.
+       This kind of forward reference can be useful in patterns that repeat.
+       Perl does not support the use of + in this way.
+
+       A backreference matches whatever actually matched the capturing subpat-
+       tern in the current subject string, rather than anything  matching  the
+       subpattern  itself (see "Subpatterns as subroutines" below for a way of
+       doing that). So the pattern
+
+         (sens|respons)e and \1ibility
+
+       matches "sense and sensibility" and "response and responsibility",  but
+       not  "sense and responsibility". If caseful matching is in force at the
+       time of the backreference, the case of letters is relevant.  For  exam-
+       ple,
+
+         ((?i)rah)\s+\1
+
+       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
+       original capturing subpattern is matched caselessly.
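+
+       The first example above can be sketched as follows, assuming Python's
+       re module as a stand-in for PCRE2; numeric backreferences behave the
+       same way in both for this pattern.
+

```python
import re

pattern = re.compile(r"(sens|respons)e and \1ibility")

subjects = (
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
)

# \1 must repeat the text that group 1 actually matched, not merely
# something that group 1 could have matched.
outcomes = [pattern.fullmatch(s) is not None for s in subjects]
```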
+
+       There are several different ways of  writing  backreferences  to  named
+       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
+       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
+       unified  backreference syntax, in which \g can be used for both numeric
+       and named references, is also supported. We  could  rewrite  the  above
+       example in any of the following ways:
+
+         (?<p1>(?i)rah)\s+\k<p1>
+         (?'p1'(?i)rah)\s+\k{p1}
+         (?P<p1>(?i)rah)\s+(?P=p1)
+         (?<p1>(?i)rah)\s+\g{p1}
+
+       A  subpattern  that  is  referenced  by  name may appear in the pattern
+       before or after the reference.
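+
+       The Python-syntax variant from the list above can be tried directly
+       in Python's re module, used here as a stand-in for PCRE2. One change
+       is an assumption forced by that engine: recent Python versions reject
+       (?i) in mid-pattern, so the scoped form (?i:...) is used instead.
+

```python
import re

# (?P<p1>...) names the group and (?P=p1) refers back to it by name.
pattern = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

# The group is matched caselessly, but the backreference is compared
# casefully, so the two words must agree in case.
outcomes = {s: pattern.fullmatch(s) is not None
            for s in ("rah rah", "RAH RAH", "RAH rah")}
```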
+
+       There may be more than one backreference to the same subpattern.  If  a
+       subpattern  has not actually been used in a particular match, any back-
+       references to it always fail by default. For example, the pattern
+
+         (a|(bc))\2
+
+       always fails if it starts to match "a" rather than  "bc".  However,  if
+       the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backref-
+       erence to an unset value matches an empty string.
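+
+       The default behaviour can be sketched as follows, assuming Python's
+       re module as a stand-in for PCRE2. Python has no counterpart of
+       PCRE2_MATCH_UNSET_BACKREF, so only the default case is shown.
+

```python
import re

pattern = re.compile(r"(a|(bc))\2")

# Group 2 is set when the "bc" alternative is taken, so \2 can match.
took_bc = pattern.fullmatch("bcbc")

# When "a" is taken, group 2 is unset and \2 fails; no other starting
# position in these subjects rescues the match.
took_a = pattern.search("a")
took_a_then_bc = pattern.search("abc")
```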
+
+       Because there may be many capturing parentheses in a pattern, all  dig-
+       its  following  a backslash are taken as part of a potential backrefer-
+       ence number.  If the pattern continues with  a  digit  character,  some
+       delimiter   must  be  used  to  terminate  the  backreference.  If  the
+       PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, this can be  white
+       space.  Otherwise,  the  \g{ syntax or an empty comment (see "Comments"
+       below) can be used.
+
+   Recursive backreferences
+
+       A backreference that occurs inside the parentheses to which  it  refers
+       fails  when  the subpattern is first used, so, for example, (a\1) never
+       matches.  However, such references can be useful inside  repeated  sub-
+       patterns. For example, the pattern
+
+         (a|b\1)+
+
+       matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
+       ation of the subpattern, the backreference matches the character string
+       corresponding to the previous iteration. In order for this to work, the
+       pattern must be such that the first iteration does not  need  to  match
+       the  backreference. This can be done using alternation, as in the exam-
+       ple above, or by a quantifier with a minimum of zero.
+
+       Backreferences of this type cause the group that they reference  to  be
+       treated  as  an atomic group.  Once the whole group has been matched, a
+       subsequent matching failure cannot cause backtracking into  the  middle
+       of the group.
+
+
+ASSERTIONS
+
+       An  assertion  is  a  test on the characters following or preceding the
+       current matching point that does not consume any characters. The simple
+       assertions  coded  as  \b,  \B,  \A,  \G, \Z, \z, ^ and $ are described
+       above.
+
+       More complicated assertions are coded as  subpatterns.  There  are  two
+       kinds:  those  that  look  ahead of the current position in the subject
+       string, and those that look behind it, and in each  case  an  assertion
+       may  be  positive  (must  succeed for matching to continue) or negative
+       (must not succeed for matching to continue). An assertion subpattern is
+       matched in the normal way, except that, when matching continues after a
+       successful assertion, the matching position in the subject string is as
+       it was before the assertion was processed.
+
+       Assertion  subpatterns  are  not capturing subpatterns. If an assertion
+       contains capturing subpatterns within it, these  are  counted  for  the
+       purposes  of  numbering the capturing subpatterns in the whole pattern.
+       Within each branch of an assertion, locally captured substrings may  be
+       referenced in the usual way.  For example, a sequence such as (.)\g{-1}
+       can be used to check that two adjacent characters are the same.
+
+       When a branch within an assertion fails to match, any  substrings  that
+       were  captured  are  discarded (as happens with any pattern branch that
+       fails to match). A  negative  assertion  succeeds  only  when  all  its
+       branches fail to match; this means that no captured substrings are ever
+       retained after a successful negative assertion. When an assertion  con-
+       tains a matching branch, what happens depends on the type of assertion.
+
+       For  a  positive  assertion, internally captured substrings in the suc-
+       cessful branch are retained, and matching continues with the next  pat-
+       tern  item  after  the  assertion. For a negative assertion, a matching
+       branch means that the assertion has failed. If the assertion  is  being
+       used  as  a condition in a conditional subpattern (see below), captured
+       substrings are retained,  because  matching  continues  with  the  "no"
+       branch of the condition. For other failing negative assertions, control
+       passes to the previous backtracking point, thus discarding any captured
+       strings within the assertion.
+
+       For   compatibility  with  Perl,  most  assertion  subpatterns  may  be
+       repeated; though it makes no sense to assert  the  same  thing  several
+       times,  the  side  effect  of capturing parentheses may occasionally be
+       useful. However, an assertion that forms the condition for a
+       conditional subpattern may not be quantified. In practice, for other
+       assertions, there are only three cases:
+
+       (1) If the quantifier is {0}, the  assertion  is  never  obeyed  during
+       matching.   However,  it  may  contain internal capturing parenthesized
+       groups that are called from elsewhere via the subroutine mechanism.
+
+       (2) If the quantifier is {0,n} where n is greater than zero, it is
+       treated as if it were {0,1}. At run time, the rest of the pattern match
+       is tried with and without the assertion, the order depending on the
+       greediness of the quantifier.
+
+       (3)  If  the minimum repetition is greater than zero, the quantifier is
+       ignored.  The assertion is obeyed just  once  when  encountered  during
+       matching.
+
+   Lookahead assertions
+
+       Lookahead assertions start with (?= for positive assertions and (?! for
+       negative assertions. For example,
+
+         \w+(?=;)
+
+       matches a word followed by a semicolon, but does not include the  semi-
+       colon in the match, and
+
+         foo(?!bar)
+
+       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
+       that the apparently similar pattern
+
+         (?!foo)bar
+
+       does not find an occurrence of "bar"  that  is  preceded  by  something
+       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
+       the assertion (?!foo) is always true when the next three characters are
+       "bar". A lookbehind assertion is needed to achieve the other effect.
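+
+       The three lookahead patterns above can be sketched as follows,
+       assuming Python's re module as a stand-in for PCRE2. The subject
+       strings are made up for the illustration.
+

```python
import re

# The semicolon is required but is not part of the match.
word_before_semicolon = re.search(r"\w+(?=;)", "int count;").group()

# The first "foo" is followed by "bar", so the second one is found.
foo_not_bar = re.search(r"foo(?!bar)", "foobar foobaz").start()

# (?!foo) is tested at the position where "bar" starts, where it is
# trivially true, so "bar" is found even though "foo" precedes it.
misleading = re.search(r"(?!foo)bar", "foobar").start()
```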
+
+       If you want to force a matching failure at some point in a pattern, the
+       most convenient way to do it is  with  (?!)  because  an  empty  string
+       always  matches, so an assertion that requires there not to be an empty
+       string must always fail.  The backtracking control verb (*FAIL) or (*F)
+       is a synonym for (?!).
+
+   Lookbehind assertions
+
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
+       for negative assertions. For example,
+
+         (?<!foo)bar
+
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
+       strings it matches must have a fixed length. However, if there are sev-
+       eral  top-level  alternatives,  they  do  not all have to have the same
+       fixed length. Thus
+
+         (?<=bullock|donkey)
+
+       is permitted, but
+
+         (?<!dogs?|cats?)
+
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
+       This is an extension compared with Perl, which requires all branches to
+       match the same length of string. An assertion such as
+
+         (?<=ab(c|de))
+
+       is  not  permitted,  because  its single top-level branch can match two
+       different lengths, but it is acceptable to PCRE2 if  rewritten  to  use
+       two top-level branches:
+
+         (?<=abc|abde)
+
+       In  some  cases, the escape sequence \K (see above) can be used instead
+       of a lookbehind assertion to get round the fixed-length restriction.
+
+       The implementation of lookbehind assertions is, for  each  alternative,
+       to  temporarily  move the current position back by the fixed length and
+       then try to match. If there are insufficient characters before the cur-
+       rent position, the assertion fails.
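+
+       The effect of moving back by a fixed length can be sketched as
+       follows, assuming Python's re module as a stand-in for PCRE2. Note
+       that Python is stricter than PCRE2 about alternatives of different
+       lengths, so only single-branch lookbehinds are used here.
+

```python
import re

# Only two characters precede "d", so a three-character lookbehind
# cannot be satisfied and the positive assertion fails.
too_short = re.search(r"(?<=abc)d", "bcd")

# With three characters available, the assertion can be tested.
long_enough = re.search(r"(?<=abc)d", "abcd")

# The first "bar" is preceded by "foo"; the second is not.
negative = re.search(r"(?<!foo)bar", "foobar bazbar")
```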
+
+       In  UTF-8  and  UTF-16 modes, PCRE2 does not allow the \C escape (which
+       matches a single code unit even in a UTF mode) to appear in  lookbehind
+       assertions,  because  it makes it impossible to calculate the length of
+       the lookbehind. The \X and \R escapes, which can match  different  num-
+       bers of code units, are never permitted in lookbehinds.
+
+       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
+       lookbehinds, as long as the subpattern matches a  fixed-length  string.
+       However,  recursion,  that is, a "subroutine" call into a group that is
+       already active, is not supported.
+
+       Perl does not support backreferences in lookbehinds. PCRE2 does support
+       them,    but    only    if    certain    conditions    are   met.   The
+       PCRE2_MATCH_UNSET_BACKREF option must not be set, there must be no  use
+       of (?| in the pattern (it creates duplicate subpattern numbers), and if
+       the backreference is by name, the name must be unique. Of  course,  the
+       referenced  subpattern  must  itself  be of fixed length. The following
+       pattern matches words containing at least two characters that begin and
+       end with the same character:
+
+          \b(\w)\w++(?<=\1)
+
+       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
+       assertions to specify efficient matching of fixed-length strings at the
+       end of subject strings. Consider a simple pattern such as
+
+         abcd$
+
+       when  applied  to  a  long string that does not match. Because matching
+       proceeds from left to right, PCRE2 will look for each "a" in  the  sub-
+       ject  and  then see if what follows matches the rest of the pattern. If
+       the pattern is specified as
+
+         ^.*abcd$
+
+       the initial .* matches the entire string at first, but when this  fails
+       (because there is no following "a"), it backtracks to match all but the
+       last character, then all but the last two characters, and so  on.  Once
+       again  the search for "a" covers the entire string, from right to left,
+       so we are no better off. However, if the pattern is written as
+
+         ^.*+(?<=abcd)
+
+       there can be no backtracking for the .*+ item because of the possessive
+       quantifier; it can match only the entire string. The subsequent lookbe-
+       hind assertion does a single test on the last four  characters.  If  it
+       fails,  the  match  fails  immediately. For long strings, this approach
+       makes a significant difference to the processing time.
+
+   Using multiple assertions
+
+       Several assertions (of any sort) may occur in succession. For example,
+
+         (?<=\d{3})(?<!999)foo
+
+       matches "foo" preceded by three digits that are not "999". Notice  that
+       each  of  the  assertions is applied independently at the same point in
+       the subject string. First there is a  check  that  the  previous  three
+       characters  are  all  digits,  and  then there is a check that the same
+       three characters are not "999". This pattern does not match "foo"
+       preceded by six characters, the first three of which are digits and
+       the last three of which are not "999". For example, it does not match
+       "123abcfoo". A pattern to do that is
+
+         (?<=\d{3}...)(?<!999)foo
+
+       This  time  the  first assertion looks at the preceding six characters,
+       checking that the first three are digits, and then the second assertion
+       checks that the preceding three characters are not "999".
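+
+       The difference between the two patterns can be sketched as follows,
+       assuming Python's re module as a stand-in for PCRE2. The variable
+       names and the extra subject strings are made up for the illustration.
+

```python
import re

# Both assertions inspect the same three characters before "foo".
same_three = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion looks back six characters, the second only three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

subjects = ("123foo", "999foo", "123abcfoo", "123999foo")

same_three_hits = [same_three.search(s) is not None for s in subjects]
six_back_hits = [six_back.search(s) is not None for s in subjects]
```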
+
+       Assertions can be nested in any combination. For example,
+
+         (?<=(?<!foo)bar)baz
+
+       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       is not preceded by "foo", while
+
+         (?<=\d{3}(?!999)...)foo
+
+       is another pattern that matches "foo" preceded by three digits and  any
+       three characters that are not "999".
+
+
+CONDITIONAL SUBPATTERNS
+
+       It  is possible to cause the matching process to obey a subpattern con-
+       ditionally or to choose between two alternative subpatterns,  depending
+       on  the result of an assertion, or whether a specific capturing subpat-
+       tern has already been matched. The two possible  forms  of  conditional
+       subpattern are:
+
+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+       If  the  condition is satisfied, the yes-pattern is used; otherwise the
+       no-pattern (if present) is used. An absent no-pattern is equivalent  to
+       an  empty string (it always matches). If there are more than two alter-
+       natives in the subpattern, a compile-time error occurs. Each of the two
+       alternatives may itself contain nested subpatterns of any form, includ-
+       ing  conditional  subpatterns;  the  restriction  to  two  alternatives
+       applies only at the level of the condition. This pattern fragment is an
+       example where the alternatives are complex:
+
+         (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
+
+
+       There are five kinds of condition: references  to  subpatterns,  refer-
+       ences  to  recursion,  two pseudo-conditions called DEFINE and VERSION,
+       and assertions.
+
+   Checking for a used subpattern by number
+
+       If the text between the parentheses consists of a sequence  of  digits,
+       the condition is true if a capturing subpattern of that number has pre-
+       viously matched. If there is more than one  capturing  subpattern  with
+       the  same  number  (see  the earlier section about duplicate subpattern
+       numbers), the condition is true if any of them have matched. An  alter-
+       native  notation is to precede the digits with a plus or minus sign. In
+       this case, the subpattern number is relative rather than absolute.  The
+       most  recently opened parentheses can be referenced by (?(-1), the next
+       most recent by (?(-2), and so on. Inside loops it can also  make  sense
+       to refer to subsequent groups. The next parentheses to be opened can be
+       referenced as (?(+1), and so on. (The value zero in any of these  forms
+       is not used; it provokes a compile-time error.)
+
+       Consider  the  following  pattern, which contains non-significant white
+       space to make it more readable (assume the PCRE2_EXTENDED  option)  and
+       to divide it into three parts for ease of discussion:
+
+         ( \( )?    [^()]+    (?(1) \) )
+
+       The  first  part  matches  an optional opening parenthesis, and if that
+       character is present, sets it as the first captured substring. The sec-
+       ond  part  matches one or more characters that are not parentheses. The
+       third part is a conditional subpattern that tests whether  or  not  the
+       first set of parentheses matched. If they did, that is, if the  subject
+       started with an opening parenthesis, the condition is true, and so  the
+       yes-pattern  is  executed and a closing parenthesis is required. Other-
+       wise, since no-pattern is not present, the subpattern matches  nothing.
+       In  other  words,  this  pattern matches a sequence of non-parentheses,
+       optionally enclosed in parentheses.
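+
+       The behaviour described above can be checked with a short sketch in
+       Python, whose re module (not PCRE2) supports the same numbered con-
+       dition syntax. re.VERBOSE stands in for PCRE2_EXTENDED here, and
+       whole_match() is a hypothetical helper.
+
```python
import re

# re.VERBOSE plays the role of PCRE2_EXTENDED: white space in the
# pattern is ignored.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)


def whole_match(subject):
    """Return True if the entire subject matches the pattern."""
    return optional_parens.fullmatch(subject) is not None
```
+
+       Under these assumptions "(abc)" and "abc" are accepted, while
+       "(abc" and "abc)" are rejected.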
+
+       If you were embedding this pattern in a larger one,  you  could  use  a
+       relative reference:
+
+         ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
+
+       This  makes  the  fragment independent of the parentheses in the larger
+       pattern.
+
+   Checking for a used subpattern by name
+
+       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
+       used  subpattern  by  name.  For compatibility with earlier versions of
+       PCRE1, which had this facility before Perl, the syntax (?(name)...)  is
+       also  recognized.  Note,  however, that undelimited names consisting of
+       the letter R followed by digits are ambiguous (see the  following  sec-
+       tion).
+
+       Rewriting the above example to use a named subpattern gives this:
+
+         (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
+
+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
+       of them has matched.
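+
+       For comparison only: Python's re module spells a named group as
+       (?P<name>...) and accepts just the undelimited (?(name)...) form of
+       the condition, so the named example looks slightly different there.
+       This sketch is an approximation, not PCRE2 syntax.
+
```python
import re

# Python spelling of the named-group version; (?(OPEN)...) is the
# undelimited condition form that PCRE2 also recognizes.
named = re.compile(r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )", re.VERBOSE)


def whole_match_named(subject):
    """Return True if the entire subject matches the named pattern."""
    return named.fullmatch(subject) is not None
```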
+
+   Checking for pattern recursion
+
+       "Recursion"  in  this sense refers to any subroutine-like call from one
+       part of the pattern to another, whether or not it  is  actually  recur-
+       sive.  See  the sections entitled "Recursive patterns" and "Subpatterns
+       as subroutines" below for details of recursion and subpattern calls.
+
+       If a condition is the string (R), and there is no subpattern  with  the
+       name  R,  the condition is true if matching is currently in a recursion
+       or subroutine call to the whole pattern or any  subpattern.  If  digits
+       follow  the  letter  R,  and there is no subpattern with that name, the
+       condition is true if the most recent call is into a subpattern with the
+       given  number,  which must exist somewhere in the overall pattern. This
+       is a contrived example that is equivalent to a+b:
+
+         ((?(R1)a+|(?1)b))
+
+       However, in both cases, if there is a subpattern with a matching  name,
+       the  condition  tests  for  its  being set, as described in the section
+       above, instead of testing for recursion. For example, creating a  group
+       with  the  name  R1  by  adding (?<R1>) to the above pattern completely
+       changes its meaning.
+
+       If a name preceded by ampersand follows the letter R, for example:
+
+         (?(R&name)...)
+
+       the condition is true if the most recent recursion is into a subpattern
+       of that name (which must exist within the pattern).
+
+       This condition does not check the entire recursion stack. It tests only
+       the current level. If the name used in a condition of this  kind  is  a
+       duplicate, the test is applied to all subpatterns of the same name, and
+       is true if any one of them is the most recent recursion.
+
+       At "top level", all these recursion test conditions are false.
+
+   Defining subpatterns for use by reference only
+
+       If the condition is the string (DEFINE), the condition is always false,
+       even  if there is a group with the name DEFINE. In this case, there may
+       be only one alternative in the subpattern. It is always skipped if con-
+       trol  reaches  this point in the pattern; the idea of DEFINE is that it
+       can be used to define subroutines that can  be  referenced  from  else-
+       where. (The use of subroutines is described below.) For example, a pat-
+       tern to match an IPv4 address such as "192.168.23.245" could be written
+       like this (ignore white space and line breaks):
+
+         (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
+         \b (?&byte) (\.(?&byte)){3} \b
+
+       The first part of the pattern is a DEFINE  group  inside  which  another
+       group named "byte" is defined. This matches an individual component  of
+       an  IPv4  address  (a number less than 256). When matching takes place,
+       this part of the pattern is skipped because DEFINE acts  like  a  false
+       condition.  The  rest of the pattern uses references to the named group
+       to match the four dot-separated components of an IPv4 address,  insist-
+       ing on a word boundary at each end.
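+
+       Python's re module has no DEFINE group or (?&name) call, so the
+       sketch below imitates the idea by pasting the "byte" subpattern in
+       textually. The names BYTE and ipv4 are assumptions made for the
+       example; the sub-expression itself is the one shown above.
+
```python
import re

# One component of an IPv4 address: a decimal number below 256.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# The "subroutine" is interpolated wherever PCRE2 would use (?&byte).
ipv4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")
```
+
+       For instance, ipv4.search("host 10.0.0.1 up") finds "10.0.0.1", and
+       ipv4.search("256.1.1.1") finds nothing.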
+
+   Checking the PCRE2 version
+
+       Programs  that link with a PCRE2 library can check the version by call-
+       ing pcre2_config() with appropriate arguments.  Users  of  applications
+       that  do  not have access to the underlying code cannot do this. A spe-
+       cial "condition" called VERSION exists to allow such users to  discover
+       which version of PCRE2 they are dealing with by using this condition to
+       match a string such as "yesno". VERSION must be followed either by  "="
+       or ">=" and a version number.  For example:
+
+         (?(VERSION>=10.4)yes|no)
+
+       This pattern matches "yes" if the PCRE2 version is greater than or
+       equal to 10.4, or "no" otherwise. The fractional part of the version
+       number may not contain more than two digits.
+
+   Assertion conditions
+
+       If  the  condition  is  not  in any of the above formats, it must be an
+       assertion.  This may be a positive or negative lookahead or  lookbehind
+       assertion.  Consider  this  pattern,  again  containing non-significant
+       white space, and with the two alternatives on the second line:
+
+         (?(?=[^a-z]*[a-z])
+         \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
+
+       The condition  is  a  positive  lookahead  assertion  that  matches  an
+       optional  sequence of non-letters followed by a letter. In other words,
+       it tests for the presence of at least one letter in the subject.  If  a
+       letter  is found, the subject is matched against the first alternative;
+       otherwise it is  matched  against  the  second.  This  pattern  matches
+       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       letters and dd are digits.
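+
+       An engine without assertion conditions can approximate (?(?=A)X|Y)
+       as (?:(?=A)X|(?!A)Y), guarding each alternative with the assertion
+       or its negation. The rewrite below is an assumption of this sketch
+       (it is written for Python's re module, not for PCRE2) and should
+       accept the same strings as the pattern above.
+
```python
import re

# Each alternative is guarded by the lookahead or by its negation,
# which selects the same branch as the conditional would.
date_like = re.compile(
    r"""(?: (?=[^a-z]*[a-z])  \d{2}-[a-z]{3}-\d{2}
          | (?![^a-z]*[a-z])  \d{2}-\d{2}-\d{2} )""",
    re.VERBOSE,
)
```
+
+       Note that a letter anywhere later in the subject selects the first
+       alternative, so "12-34-56 x" does not match at its start.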
+
+       When an assertion that is a condition contains  capturing  subpatterns,
+       any  capturing that occurs in a matching branch is retained afterwards,
+       for both positive and negative assertions, because matching always con-
+       tinues after the assertion, whether it succeeds or fails. (Compare non-
+       conditional assertions, when captures are retained  only  for  positive
+       assertions that succeed.)
+
+
+COMMENTS
+
+       There are two ways of including comments in patterns that are processed
+       by PCRE2. In both cases, the start of the comment  must  not  be  in  a
+       character  class,  nor  in  the middle of any other sequence of related
+       characters such as (?: or a subpattern name or number.  The  characters
+       that make up a comment play no part in the pattern matching.
+
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses are not permitted. If  the
+       PCRE2_EXTENDED  or  PCRE2_EXTENDED_MORE  option  is set, an unescaped #
+       character also introduces a comment, which in this  case  continues  to
+       immediately  after  the next newline character or character sequence in
+       the pattern. Which characters are interpreted as newlines is controlled
+       by  an option passed to the compiling function or by a special sequence
+       at the start of the pattern, as described in the section entitled "New-
+       line conventions" above. Note that the end of this type of comment is a
+       literal newline sequence in the pattern; escape sequences  that  happen
+       to represent a newline do not count. For example, consider this pattern
+       when PCRE2_EXTENDED is set, and the default newline convention (a  sin-
+       gle linefeed character) is in force:
+
+         abc #comment \n still comment
+
+       On  encountering  the # character, pcre2_compile() skips along, looking
+       for a newline in the pattern. The sequence \n is still literal at  this
+       stage,  so  it does not terminate the comment. Only an actual character
+       with the code value 0x0a (the default newline) does so.
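+
+       Python's re.VERBOSE mode treats #-comments in the same way, which
+       makes the distinction easy to observe. This is a sketch using a
+       different engine, offered only as an illustration of the rule.
+
```python
import re

# Raw string: the pattern contains a backslash and an "n", not a newline,
# so the #-comment runs to the end of the pattern.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Ordinary string: "\n" is a real linefeed here, so the comment ends
# there and "def" is part of the pattern again.
literal = re.compile("abc #comment \n def", re.VERBOSE)

# The (?#...) form needs no option and ends at the next ")".
inline = re.compile(r"ab(?#a comment)c")
```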
+
+
+RECURSIVE PATTERNS
+
+       Consider the problem of matching a string in parentheses, allowing  for
+       unlimited  nested  parentheses.  Without the use of recursion, the best
+       that can be done is to use a pattern that  matches  up  to  some  fixed
+       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+       depth.
+
+       For some time, Perl has provided a facility that allows regular expres-
+       sions  to recurse (amongst other things). It does this by interpolating
+       Perl code in the expression at run time, and the code can refer to  the
+       expression itself. A Perl pattern using code interpolation to solve the
+       parentheses problem can be created like this:
+
+         $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
+
+       The (?p{...}) item interpolates Perl code at run time, and in this case
+       refers recursively to the pattern in which it appears.
+
+       Obviously,  PCRE2  cannot  support  the  interpolation  of  Perl  code.
+       Instead, it supports special syntax for recursion of  the  entire  pat-
+       tern, and also for individual subpattern recursion. After its introduc-
+       tion in PCRE1 and Python,  this  kind  of  recursion  was  subsequently
+       introduced into Perl at release 5.10.
+
+       A  special  item  that consists of (? followed by a number greater than
+       zero and a closing parenthesis is a recursive subroutine  call  of  the
+       subpattern  of  the  given  number, provided that it occurs inside that
+       subpattern. (If not, it is a non-recursive subroutine  call,  which  is
+       described  in  the  next  section.)  The special item (?R) or (?0) is a
+       recursive call of the entire regular expression.
+
+       This PCRE2 pattern solves the nested parentheses  problem  (assume  the
+       PCRE2_EXTENDED option is set so that white space is ignored):
+
+         \( ( [^()]++ | (?R) )* \)
+
+       First  it matches an opening parenthesis. Then it matches any number of
+       substrings which can either be a  sequence  of  non-parentheses,  or  a
+       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       sized substring).  Finally there is a closing parenthesis. Note the use
+       of a possessive quantifier to avoid backtracking into sequences of non-
+       parentheses.
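+
+       To make the recursion concrete, here is a hand-written Python func-
+       tion that mirrors the structure of the pattern. It is only a sketch
+       of what the pattern recognizes, not of how PCRE2 is implemented,
+       and match_parens() is a hypothetical name.
+
```python
def match_parens(subject, pos=0):
    # Mirror of  \( ( [^()]++ | (?R) )* \)  anchored at pos.
    # Returns the index just past the matching ")" or -1 on failure.
    if pos >= len(subject) or subject[pos] != "(":
        return -1
    pos += 1
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":                 # the final \)
            return pos + 1
        if ch == "(":                 # (?R): recurse for a nested group
            pos = match_parens(subject, pos)
            if pos == -1:
                return -1
        else:                         # [^()]++ : swallow a run, no give-back
            while pos < len(subject) and subject[pos] not in "()":
                pos += 1
    return -1                         # ran out of subject before ")"
```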
+
+       If this were part of a larger pattern, you would not  want  to  recurse
+       the entire pattern, so instead you could use this:
+
+         ( \( ( [^()]++ | (?1) )* \) )
+
+       We  have  put the pattern into parentheses, and caused the recursion to
+       refer to them instead of the whole pattern.
+
+       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
+       tricky.  This is made easier by the use of relative references. Instead
+       of (?1) in the pattern above you can write (?-2) to refer to the second
+       most  recently  opened  parentheses  preceding  the recursion. In other
+       words, a negative number counts capturing  parentheses  leftwards  from
+       the point at which it is encountered.
+
+       Be aware, however, that if duplicate subpattern numbers are in use,
+       relative references refer to the earliest subpattern with the
+       appropriate number. Consider, for example:
+
+         (?|(a)|(b)) (c) (?-2)
+
+       The  first  two  capturing  groups (a) and (b) are both numbered 1, and
+       group (c) is number 2. When the reference  (?-2)  is  encountered,  the
+       second most recently opened parentheses has the number 1, but it is the
+       first such group (the (a) group) to which the  recursion  refers.  This
+       would  be  the  same  if  an absolute reference (?1) was used. In other
+       words, relative references are just a shorthand for computing  a  group
+       number.
+
+       It  is  also  possible  to refer to subsequently opened parentheses, by
+       writing references such as (?+2). However, these  cannot  be  recursive
+       because  the  reference  is  not inside the parentheses that are refer-
+       enced. They are always non-recursive subroutine calls, as described  in
+       the next section.
+
+       An  alternative  approach  is to use named parentheses. The Perl syntax
+       for this is (?&name); PCRE1's earlier syntax  (?P>name)  is  also  sup-
+       ported. We could rewrite the above example as follows:
+
+         (?<pn> \( ( [^()]++ | (?&pn) )* \) )
+
+       If  there  is more than one subpattern with the same name, the earliest
+       one is used.
+
+       The example pattern that we have been looking at contains nested unlim-
+       ited  repeats,  and  so the use of a possessive quantifier for matching
+       strings of non-parentheses is important when applying  the  pattern  to
+       strings that do not match. For example, when this pattern is applied to
+
+         (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+
+       it  yields  "no  match" quickly. However, if a possessive quantifier is
+       not used, the match runs for a very long time indeed because there  are
+       so  many  different  ways the + and * repeats can carve up the subject,
+       and all have to be tested before failure can be reported.
+
+       At the end of a match, the values of capturing  parentheses  are  those
+       from  the outermost level. If you want to obtain intermediate values, a
+       callout function can be used (see below and the pcre2callout documenta-
+       tion). If the pattern above is matched against
+
+         (ab(cd)ef)
+
+       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
+       which is the last value taken on at the top level. If a capturing  sub-
+       pattern  is  not  matched at the top level, its final captured value is
+       unset, even if it was (temporarily) set at a deeper  level  during  the
+       matching process.
+
+       Do  not  confuse  the (?R) item with the condition (R), which tests for
+       recursion.  Consider this pattern, which matches text in  angle  brack-
+       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
+       brackets (that is, when recursing), whereas any characters are  permit-
+       ted at the outer level.
+
+         < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
+
+       In  this  pattern, (?(R) is the start of a conditional subpattern, with
+       two different alternatives for the recursive and  non-recursive  cases.
+       The (?R) item is the actual recursive call.
+
+   Differences in recursion processing between PCRE2 and Perl
+
+       Some former differences between PCRE2 and Perl no longer exist.
+
+       Before  release 10.30, recursion processing in PCRE2 differed from Perl
+       in that a recursive subpattern call was always  treated  as  an  atomic
+       group.  That is, once it had matched some of the subject string, it was
+       never re-entered, even if it contained untried alternatives  and  there
+       was  a  subsequent matching failure. (Historical note: PCRE implemented
+       recursion before Perl did.)
+
+       Starting with release 10.30, recursive subroutine calls are  no  longer
+       treated as atomic. That is, they can be re-entered to try unused alter-
+       natives if there is a matching failure later in the  pattern.  This  is
+       now  compatible  with the way Perl works. If you want a subroutine call
+       to be atomic, you must explicitly enclose it in an atomic group.
+
+       Supporting backtracking into recursions  simplifies  certain  types  of
+       recursive  pattern.  For  example,  this  pattern  matches  palindromic
+       strings:
+
+         ^((.)(?1)\2|.?)$
+
+       The second branch in the group matches a single  central  character  in
+       the  palindrome  when there are an odd number of characters, or nothing
+       when there are an even number of characters, but in order  to  work  it
+       has  to  be  able  to  try the second case when the rest of the pattern
+       match fails. If you want to match typical palindromic phrases, the pat-
+       tern  has  to  ignore  all  non-word characters, which can be done like
+       this:
+
+         ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$
+
+       If run with the PCRE2_CASELESS option,  this  pattern  matches  phrases
+       such  as "A man, a plan, a canal: Panama!". Note the use of the posses-
+       sive quantifier *+ to avoid backtracking  into  sequences  of  non-word
+       characters. Without this, PCRE2 takes a great deal longer (ten times or
+       more) to match typical phrases, and Perl takes so long that  you  think
+       it has gone into a loop.
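+
+       The need to backtrack into the recursion can be seen in a small
+       Python model of the simple pattern ^((.)(?1)\2|.?)$ shown earlier.
+       The generator below is an illustrative assumption, not PCRE2's al-
+       gorithm: it yields every place where group 1 could end, so that a
+       caller can ask for another ending when a later item fails.
+
```python
def group_ends(subject, i):
    # Yield each index at which group 1 could end when it starts at i,
    # in the order a backtracking matcher would try them.
    n = len(subject)
    if i < n:
        first = subject[i]                      # (.)
        for j in group_ends(subject, i + 1):    # (?1), re-entered on failure
            if j < n and subject[j] == first:   # \2
                yield j + 1
        yield i + 1                             # second branch: . present
    yield i                                     # second branch: . absent


def is_palindrome(subject):
    # ^ ... $ : some way of matching group 1 must consume everything.
    return any(end == len(subject) for end in group_ends(subject, 0))
```
+
+       If the inner call could return only its first ending, as an atomic
+       recursion does, the loop over group_ends() would stop after one
+       candidate and "abcba" would be rejected.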
+
+       Another  way  in which PCRE2 and Perl used to differ in their recursion
+       processing is in the handling of captured  values.  Formerly  in  Perl,
+       when a subpattern was called recursively or  as  a  subroutine  (see  the
+       next section), it had no access to any values that were  captured  out-
+       side  the  recursion,  whereas in PCRE2 these values can be referenced.
+       Consider this pattern:
+
+         ^(.)(\1|a(?2))
+
+       This pattern matches "bab". The first capturing parentheses match  "b",
+       then in the second group, when the backreference \1 fails to match "b",
+       the second alternative matches "a" and then recurses. In the recursion,
+       \1  does now match "b" and so the whole match succeeds. This match used
+       to fail in Perl, but in later versions (I tried 5.024) it now works.
+
+
+SUBPATTERNS AS SUBROUTINES
+
+       If the syntax for a recursive subpattern call (either by number  or  by
+       name) is used outside the parentheses to which it refers, it operates a
+       bit like a subroutine in a programming language. More accurately, PCRE2
+       treats  the referenced subpattern as an independent subpattern which it
+       tries to match at the current matching position. The called  subpattern
+       may  be defined before or after the reference. A numbered reference can
+       be absolute or relative, as in these examples:
+
+         (...(absolute)...)...(?2)...
+         (...(relative)...)...(?-1)...
+         (...(?+1)...(relative)...
+
+       An earlier example pointed out that the pattern
+
+         (sens|respons)e and \1ibility
+
+       matches "sense and sensibility" and "response and responsibility",  but
+       not "sense and responsibility". If instead the pattern
+
+         (sens|respons)e and (?1)ibility
+
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       above.
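+
+       The difference between the two patterns can be demonstrated in Py-
+       thon. Its re module has backreferences but no (?1), so in this
+       sketch the subroutine call is imitated by repeating the subpattern
+       text; STEM, backref and subroutine are names invented here.
+
```python
import re

STEM = r"(?:sens|respons)"

# Backreference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Subroutine call imitated by pasting the subpattern in a second time:
# the pattern is applied again, rather than the matched text.
subroutine = re.compile(rf"{STEM}e and {STEM}ibility")
```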
+
+       Like  recursions,  subroutine  calls  used to be treated as atomic, but
+       this changed at PCRE2 release 10.30, so  backtracking  into  subroutine
+       calls  can  now  occur. However, any capturing parentheses that are set
+       during the subroutine call revert to their previous values afterwards.
+
+       Processing options such as case-independence are fixed when  a  subpat-
+       tern  is defined, so if it is used as a subroutine, such options cannot
+       be changed for different calls. For example, consider this pattern:
+
+         (abc)(?i:(?-1))
+
+       It matches "abcabc". It does not match "abcABC" because the  change  of
+       processing option does not affect the called subpattern.
+
+       The  behaviour of backtracking control verbs in subpatterns when called
+       as subroutines is described in the section entitled "Backtracking verbs
+       in subroutines" below.
+
+
+ONIGURUMA SUBROUTINE SYNTAX
+
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       name or a number enclosed either in angle brackets or single quotes, is
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
+       ten using this syntax:
+
+         (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
+         (sens|respons)e and \g'1'ibility
+
+       PCRE2  supports an extension to Oniguruma: if a number is preceded by a
+       plus or a minus sign it is taken as a relative reference. For example:
+
+         (abc)(?i:\g<-1>)
+
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The  former is a backreference; the latter is a subroutine
+       call.
+
+
+CALLOUTS
+
+       Perl has a feature whereby using the sequence (?{...}) causes arbitrary
+       Perl  code to be obeyed in the middle of matching a regular expression.
+       This makes it possible, amongst other things, to extract different sub-
+       strings that match the same pair of parentheses when there is a repeti-
+       tion.
+
+       PCRE2 provides a similar feature, but of course it  cannot  obey  arbi-
+       trary  Perl  code. The feature is called "callout". The caller of PCRE2
+       provides an external function by putting its entry  point  in  a  match
+       context  using  the function pcre2_set_callout(), and then passing that
+       context to pcre2_match() or pcre2_dfa_match(). If no match  context  is
+       passed, or if the callout entry point is set to NULL, callouts are dis-
+       abled.
+
+       Within a regular expression, (?C<arg>) indicates a point at  which  the
+       external  function  is  to  be  called. There are two kinds of callout:
+       those with a numerical argument and those with a string argument.  (?C)
+       on  its  own with no argument is treated as (?C0). A numerical argument
+       allows the  application  to  distinguish  between  different  callouts.
+       String  arguments  were added for release 10.20 to make it possible for
+       script languages that use PCRE2 to embed short scripts within  patterns
+       in a similar way to Perl.
+
+       During matching, when PCRE2 reaches a callout point, the external func-
+       tion is called. It is provided with the number or  string  argument  of
+       the  callout, the position in the pattern, and one item of data that is
+       also set in the match block. The callout function may cause matching to
+       proceed, to backtrack, or to fail.
+
+       By  default,  PCRE2  implements  a  number of optimizations at matching
+       time, and one side-effect is that sometimes callouts  are  skipped.  If
+       you  need all possible callouts to happen, you need to set options that
+       disable the relevant optimizations. More details, including a  complete
+       description  of  the programming interface to the callout function, are
+       given in the pcre2callout documentation.
+
+   Callouts with numerical arguments
+
+       If you just want to have  a  means  of  identifying  different  callout
+       points,  put  a  number  less than 256 after the letter C. For example,
+       this pattern has two callout points:
+
+         (?C1)abc(?C2)def
+
+       If the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(),  numerical
+       callouts  are  automatically installed before each item in the pattern.
+       They are all numbered 255. If there is a conditional group in the  pat-
+       tern whose condition is an assertion, an additional callout is inserted
+       just before the condition. An explicit callout may also be set at  this
+       position, as in this example:
+
+         (?(?C9)(?=a)abc|def)
+
+       Note that this applies only to assertion conditions, not to other types
+       of condition.
+
+   Callouts with string arguments
+
+       A delimited string may be used instead of a number as a  callout  argu-
+       ment.  The  starting  delimiter  must be one of ` ' " ^ % # $ { and the
+       ending delimiter is the same as the start, except for {, where the end-
+       ing  delimiter  is  }.  If  the  ending  delimiter is needed within the
+       string, it must be doubled. For example:
+
+         (?C'ab ''c'' d')xyz(?C{any text})pqr
+
+       The doubling is removed before the string  is  passed  to  the  callout
+       function.
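+
+       The delimiter rule can be modelled by a few lines of Python. The
+       function callout_string() is hypothetical and is not part of the
+       PCRE2 API; it takes the text between "(?C" and the final ")" and
+       returns the string with the doubling removed.
+
```python
CLOSERS = {"`": "`", "'": "'", '"': '"', "^": "^",
           "%": "%", "#": "#", "$": "$", "{": "}"}


def callout_string(text):
    """Decode a delimited callout argument such as 'ab ''c'' d'."""
    if not text or text[0] not in CLOSERS:
        raise ValueError("not a delimited callout string")
    end = CLOSERS[text[0]]
    out = []
    i = 1
    while i < len(text):
        if text[i] == end:
            if text[i + 1:i + 2] == end:   # doubled delimiter: keep one
                out.append(end)
                i += 2
                continue
            if i == len(text) - 1:         # single delimiter closes it
                return "".join(out)
            raise ValueError("text after closing delimiter")
        out.append(text[i])
        i += 1
    raise ValueError("missing closing delimiter")
```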
+
+
+BACKTRACKING CONTROL
+
+       There  are  a  number  of  special "Backtracking Control Verbs" (to use
+       Perl's terminology) that modify the behaviour  of  backtracking  during
+       matching.  They are generally of the form (*VERB) or (*VERB:NAME). Some
+       verbs take either form,  possibly  behaving  differently  depending  on
+       whether or not a name is present.
+
+       By  default,  for  compatibility  with  Perl, a name is any sequence of
+       characters that does not include a closing parenthesis. The name is not
+       processed  in  any  way,  and  it  is not possible to include a closing
+       parenthesis  in  the  name.   This  can  be  changed  by  setting   the
+       PCRE2_ALT_VERBNAMES  option,  but the result is no longer Perl-compati-
+       ble.
+
+       When PCRE2_ALT_VERBNAMES is set, backslash  processing  is  applied  to
+       verb  names  and  only  an unescaped closing parenthesis terminates the
+       name. However, the only backslash items that are permitted are \Q,  \E,
+       and  sequences such as \x{100} that define character code points. Char-
+       acter type escapes such as \d are faulted.
+
+       A closing parenthesis can be included in a name either as \) or between
+       \Q  and  \E. In addition to backslash processing, if the PCRE2_EXTENDED
+       or PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb
+       names is skipped, and #-comments are recognized, exactly as in the rest
+       of the pattern.  PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do  not  affect
+       verb names unless PCRE2_ALT_VERBNAMES is also set.
+
+       The  maximum  length of a name is 255 in the 8-bit library and 65535 in
+       the 16-bit and 32-bit libraries. If the name is empty, that is, if  the
+       closing  parenthesis immediately follows the colon, the effect is as if
+       the colon were not there. Any number of these verbs may occur in a pat-
+       tern.
+
+       Since  these  verbs  are  specifically related to backtracking, most of
+       them can be used only when the pattern is to be matched using the  tra-
+       ditional matching function, because that uses a backtracking algorithm.
+       With the exception of (*FAIL), which behaves like  a  failing  negative
+       assertion, the backtracking control verbs cause an error if encountered
+       by the DFA matching function.
+
+       The behaviour of these verbs in repeated  groups,  assertions,  and  in
+       subpatterns called as subroutines (whether or not recursively) is docu-
+       mented below.
+
+   Optimizations that affect backtracking verbs
+
+       PCRE2 contains some optimizations that are used to speed up matching by
+       running some checks at the start of each match attempt. For example, it
+       may know the minimum length of matching subject, or that  a  particular
+       character must be present. When one of these optimizations bypasses the
+       running of a match,  any  included  backtracking  verbs  will  not,  of
+       course, be processed. You can suppress the start-of-match optimizations
+       by setting the PCRE2_NO_START_OPTIMIZE option when  calling  pcre2_com-
+       pile(),  or by starting the pattern with (*NO_START_OPT). There is more
+       discussion of this option in the section entitled "Compiling a pattern"
+       in the pcre2api documentation.
+
+       Experiments  with  Perl  suggest that it too has similar optimizations,
+       and like PCRE2, turning them off can change the result of a match.
+
+   Verbs that act immediately
+
+       The following verbs act as soon as they are encountered.
+
+          (*ACCEPT) or (*ACCEPT:NAME)
+
+       This verb causes the match to end successfully, skipping the  remainder
+       of  the pattern. However, when it is inside a subpattern that is called
+       as a subroutine, only that subpattern is ended  successfully.  Matching
+       then continues at the outer level. If (*ACCEPT) is triggered in a
+       positive assertion, the assertion succeeds; in a negative assertion,
+       the assertion fails.
+
+       If  (*ACCEPT)  is inside capturing parentheses, the data so far is cap-
+       tured. For example:
+
+         A((?:A|B(*ACCEPT)|C)D)
+
+       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
+       tured by the outer parentheses.
+
+         (*FAIL) or (*FAIL:NAME)
+
+       This  verb causes a matching failure, forcing backtracking to occur. It
+       may be abbreviated to (*F). It is equivalent  to  (?!)  but  easier  to
+       read. The Perl documentation notes that it is probably useful only when
+       combined with (?{}) or (??{}). Those are, of course, Perl features that
+       are  not  present  in PCRE2. The nearest equivalent is the callout fea-
+       ture, as for example in this pattern:
+
+         a+(?C)(*FAIL)
+
+       A match with the string "aaaa" always fails, but the callout  is  taken
+       before each backtrack happens (in this example, 10 times).
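+
+       The figure of 10 can be checked with simple arithmetic. The sketch
+       below is a plain Python model, not PCRE2 code, and the function name
+       callout_count is made up for illustration. It assumes an unanchored
+       match with no start-of-match optimizations, so that every starting
+       position is tried and a+ gives back one character at a time.
+

```python
def callout_count(subject: str) -> int:
    """Count callouts for a+(?C)(*FAIL) under the assumptions stated above."""
    total = 0
    for start in range(len(subject)):
        # Length of the run of "a" that begins at this starting position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ first takes the whole run, then gives back one "a" at a time.
        # The callout is reached once for each of those lengths.
        total += run
    return total


print(callout_count("aaaa"))  # 4 + 3 + 2 + 1 = 10
```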
+
+       (*ACCEPT:NAME)   and   (*FAIL:NAME)   behave   exactly   the   same  as
+       (*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+
+   Recording which path was taken
+
+       There is one verb whose main purpose  is  to  track  how  a  match  was
+       arrived  at,  though  it  also  has a secondary use in conjunction with
+       advancing the match starting point (see (*SKIP) below).
+
+         (*MARK:NAME) or (*:NAME)
+
+       A name is always  required  with  this  verb.  There  may  be  as  many
+       instances  of  (*MARK) as you like in a pattern, and their names do not
+       have to be unique.
+
+       When a match succeeds, the name of the last-encountered (*MARK:NAME) on
+       the matching path is passed back to the caller as described in the sec-
+       tion entitled "Other information about the match" in the pcre2api docu-
+       mentation.  This  applies  to all instances of (*MARK), including those
+       inside assertions and atomic groups. (There are  differences  in  those
+       cases  when  (*MARK)  is  used in conjunction with (*SKIP) as described
+       below.)
+
+       As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may  have
+       associated  NAME  arguments.  Whichever is last on the matching path is
+       passed back. See below for more details of these other verbs.
+
+       Here is an example of  pcre2test  output,  where  the  "mark"  modifier
+       requests the retrieval and outputting of (*MARK) data:
+
+           re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
+         data> XY
+          0: XY
+         MK: A
+         data> XZ
+          0: XZ
+         MK: B
+
+       The (*MARK) name is tagged with "MK:" in this output, and in this exam-
+       ple it indicates which of the two alternatives matched. This is a  more
+       efficient  way of obtaining this information than putting each alterna-
+       tive in its own capturing parentheses.
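+
+       For comparison, the capturing-parentheses approach mentioned above
+       looks roughly like this. The sketch uses Python's standard re module,
+       which has no (*MARK) verb, so it is only an illustration of the
+       alternative technique; the group names A and B and the helper
+       which_alternative are assumptions chosen to mirror the example.
+

```python
import re

# One named capturing group per alternative, instead of (*MARK:A) / (*MARK:B).
PATTERN = re.compile(r"(?P<A>XY)|(?P<B>XZ)")


def which_alternative(subject: str):
    """Return the name of the alternative that matched, or None."""
    match = PATTERN.search(subject)
    # lastgroup is the name of the last capturing group that participated.
    return match.lastgroup if match else None
```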
+
+       If a verb with a name is encountered in a positive  assertion  that  is
+       true,  the  name  is recorded and passed back if it is the last-encoun-
+       tered. This does not happen for negative assertions or failing positive
+       assertions.
+
+       After  a  partial match or a failed match, the last encountered name in
+       the entire match process is returned. For example:
+
+           re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
+         data> XP
+         No match, mark = B
+
+       Note that in this unanchored example the  mark  is  retained  from  the
+       match attempt that started at the letter "X" in the subject. Subsequent
+       match attempts starting at "P" and then with an empty string do not get
+       as far as the (*MARK) item, but nevertheless do not reset it.
+
+       If  you  are  interested  in  (*MARK)  values after failed matches, you
+       should probably set the PCRE2_NO_START_OPTIMIZE option (see  above)  to
+       ensure that the match is always attempted.
+
+   Verbs that act after backtracking
+
+       The following verbs do nothing when they are encountered. Matching con-
+       tinues with what follows, but if there is a subsequent  match  failure,
+       causing  a  backtrack  to the verb, a failure is forced. That is, back-
+       tracking cannot pass to the left of the  verb.  However,  when  one  of
+       these verbs appears inside an atomic group or in a lookaround assertion
+       that is true, its effect is confined to that group,  because  once  the
+       group  has been matched, there is never any backtracking into it. Back-
+       tracking from beyond an assertion or an atomic group ignores the entire
+       group, and seeks a preceding backtracking point.
+
+       These  verbs  differ  in exactly what kind of failure occurs when back-
+       tracking reaches them. The behaviour described below  is  what  happens
+       when  the  verb is not in a subroutine or an assertion. Subsequent sec-
+       tions cover these special cases.
+
+         (*COMMIT) or (*COMMIT:NAME)
+
+       This verb causes the whole match to fail outright if there is  a  later
+       matching failure that causes backtracking to reach it. Even if the pat-
+       tern is unanchored, no further attempts to find a  match  by  advancing
+       the  starting  point  take place. If (*COMMIT) is the only backtracking
+       verb that is encountered, once it has been passed pcre2_match() is com-
+       mitted to finding a match at the current starting point, or not at all.
+       For example:
+
+         a+(*COMMIT)b
+
+       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
+       of dynamic anchor, or "I've started, so I must finish."
+
+       The  behaviour  of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM-
+       MIT). It is like (*MARK:NAME) in that the name is remembered for  pass-
+       ing  back  to the caller. However, (*SKIP:NAME) searches only for names
+       set with  (*MARK),  ignoring  those  set  by  (*COMMIT),  (*PRUNE)  and
+       (*THEN).
+
+       If  there  is more than one backtracking verb in a pattern, a different
+       one that follows (*COMMIT) may be triggered first,  so  merely  passing
+       (*COMMIT) during a match does not always guarantee that a match must be
+       at this starting point.
+
+       Note that (*COMMIT) at the start of a pattern is not  the  same  as  an
+       anchor,  unless PCRE2's start-of-match optimizations are turned off, as
+       shown in this output from pcre2test:
+
+           re> /(*COMMIT)abc/
+         data> xyzabc
+          0: abc
+         data>
+           re> /(*COMMIT)abc/no_start_optimize
+         data> xyzabc
+         No match
+
+       For the first pattern, PCRE2 knows that any match must start with  "a",
+       so  the optimization skips along the subject to "a" before applying the
+       pattern to the first set of data. The match attempt then succeeds.  The
+       second  pattern disables the optimization that skips along to the first
+       character. The pattern is now applied  starting  at  "x",  and  so  the
+       (*COMMIT)  causes  the  match to fail without trying any other starting
+       points.
+
+         (*PRUNE) or (*PRUNE:NAME)
+
+       This verb causes the match to fail at the current starting position  in
+       the subject if there is a later matching failure that causes backtrack-
+       ing to reach it. If the pattern is unanchored, the  normal  "bumpalong"
+       advance  to  the next starting character then happens. Backtracking can
+       occur as usual to the left of (*PRUNE), before it is reached,  or  when
+       matching  to  the  right  of  (*PRUNE), but if there is no match to the
+       right, backtracking cannot cross (*PRUNE). In simple cases, the use  of
+       (*PRUNE)  is just an alternative to an atomic group or possessive quan-
+       tifier, but there are some uses of (*PRUNE) that cannot be expressed in
+       any  other  way. In an anchored pattern (*PRUNE) has the same effect as
+       (*COMMIT).
+
+       The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
+       It is like (*MARK:NAME) in that the name is remembered for passing back
+       to the caller. However, (*SKIP:NAME) searches only for names  set  with
+       (*MARK), ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
+
+         (*SKIP)
+
+       This  verb, when given without a name, is like (*PRUNE), except that if
+       the pattern is unanchored, the "bumpalong" advance is not to  the  next
+       character, but to the position in the subject where (*SKIP) was encoun-
+       tered. (*SKIP) signifies that whatever text was matched leading  up  to
+       it  cannot  be part of a successful match if there is a later mismatch.
+       Consider:
+
+         a+(*SKIP)b
+
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
+       skips on to start the next attempt at "c". Note that a possessive
+       quantifier does not have the same effect as this example; although it
+       would suppress backtracking during the first match attempt, the second
+       attempt would start at the second character instead of skipping on to
+       "c".
+
+         (*SKIP:NAME)
+
+       When (*SKIP) has an associated name, its behaviour  is  modified.  When
+       such  a  (*SKIP) is triggered, the previous path through the pattern is
+       searched for the most recent (*MARK) that has the same name. If one  is
+       found,  the  "bumpalong" advance is to the subject position that corre-
+       sponds to that (*MARK) instead of to where (*SKIP) was encountered.  If
+       no (*MARK) with a matching name is found, the (*SKIP) is ignored.
+
+       The  search  for a (*MARK) name uses the normal backtracking mechanism,
+       which means that it does not  see  (*MARK)  settings  that  are  inside
+       atomic groups or assertions, because they are never re-entered by back-
+       tracking. Compare the following pcre2test examples:
+
+           re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
+         data> abc
+          0: a
+          1: a
+         data>
+           re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
+         data> abc
+          0: b
+          1: b
+
+       In the first example, the (*MARK) setting is in an atomic group, so  it
+       is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
+       This allows the second branch of the pattern to be tried at  the  first
+       character  position.  In the second example, the (*MARK) setting is not
+       in an atomic group. This allows (*SKIP:X) to find the (*MARK)  when  it
+       backtracks, and this causes a new matching attempt to start at the sec-
+       ond character. This time, the (*MARK) is never seen  because  "a"  does
+       not match "b", so the matcher immediately jumps to the second branch of
+       the pattern.
+
+       Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME).  It
+       ignores   names  that  are  set  by  (*COMMIT:NAME),  (*PRUNE:NAME)  or
+       (*THEN:NAME).
+
+         (*THEN) or (*THEN:NAME)
+
+       This verb causes a skip to the next innermost  alternative  when  back-
+       tracking  reaches  it.  That  is,  it  cancels any further backtracking
+       within the current alternative. Its name  comes  from  the  observation
+       that it can be used for a pattern-based if-then-else block:
+
+         ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
+
+       If  the COND1 pattern matches, FOO is tried (and possibly further items
+       after the end of the group if FOO succeeds); on  failure,  the  matcher
+       skips  to  the second alternative and tries COND2, without backtracking
+       into COND1. If that succeeds and BAR fails, COND3 is tried.  If  subse-
+       quently  BAZ fails, there are no more alternatives, so there is a back-
+       track to whatever came before the  entire  group.  If  (*THEN)  is  not
+       inside an alternation, it acts like (*PRUNE).
+
+       The  behaviour  of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN).
+       It is like (*MARK:NAME) in that the name is remembered for passing back
+       to  the  caller. However, (*SKIP:NAME) searches only for names set with
+       (*MARK), ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+
+       A subpattern that does not contain a | character is just a part of  the
+       enclosing  alternative;  it  is  not a nested alternation with only one
+       alternative. The effect of (*THEN) extends beyond such a subpattern  to
+       the  enclosing alternative. Consider this pattern, where A, B, etc. are
+       complex pattern fragments that do not contain any | characters at  this
+       level:
+
+         A (B(*THEN)C) | D
+
+       If  A and B are matched, but there is a failure in C, matching does not
+       backtrack into A; instead it moves to the next alternative, that is, D.
+       However,  if the subpattern containing (*THEN) is given an alternative,
+       it behaves differently:
+
+         A (B(*THEN)C | (*FAIL)) | D
+
+       The effect of (*THEN) is now confined to the inner subpattern. After  a
+       failure in C, matching moves to (*FAIL), which causes the whole subpat-
+       tern to fail because there are no more alternatives  to  try.  In  this
+       case, matching does now backtrack into A.
+
+       Note  that  a  conditional  subpattern  is not considered as having two
+       alternatives, because only one is ever used.  In  other  words,  the  |
+       character in a conditional subpattern has a different meaning. Ignoring
+       white space, consider:
+
+         ^.*? (?(?=a) a | b(*THEN)c )
+
+       If the subject is "ba", this pattern does not  match.  Because  .*?  is
+       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
+       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
+       point,  matching does not backtrack to .*? as might perhaps be expected
+       from the presence of the | character.  The  conditional  subpattern  is
+       part of the single alternative that comprises the whole pattern, and so
+       the match fails. (If there were a backtrack into .*?, allowing it to
+       match "b", the match would succeed.)
+
+       The  verbs just described provide four different "strengths" of control
+       when subsequent matching fails. (*THEN) is the weakest, carrying on the
+       match  at  the next alternative. (*PRUNE) comes next, failing the match
+       at the current starting position, but allowing an advance to  the  next
+       character  (for an unanchored pattern). (*SKIP) is similar, except that
+       the advance may be more than one character. (*COMMIT) is the strongest,
+       causing the entire match to fail.
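+
+       The difference between three of these strengths can be seen by
+       tracing which starting positions are tried. The sketch below is a
+       hand-written Python model of the single pattern a+(*VERB)b, not a
+       general matcher and not PCRE2 code. The function attempt_starts is
+       hypothetical, and it assumes that start-of-match optimizations are
+       off and that no attempt is made at the very end of the subject.
+

```python
def attempt_starts(subject, verb=None):
    """Model the start positions tried for the pattern a+(*VERB)b.

    verb is "COMMIT", "PRUNE", "SKIP", or None for plain a+b.
    Returns (starts_tried, match_span_or_None).
    """
    tried = []
    start = 0
    while start < len(subject):
        tried.append(start)
        end = start
        while end < len(subject) and subject[end] == "a":
            end += 1
        if end == start:
            # a+ failed before the verb was reached: normal bumpalong.
            start += 1
            continue
        if end < len(subject) and subject[end] == "b":
            return tried, (start, end + 1)
        # "b" did not match. Giving back "a"s cannot help in this pattern,
        # so the failure backtracks onto the verb (if there is one).
        if verb == "COMMIT":
            return tried, None      # no further starting points at all
        if verb == "SKIP":
            start = end             # advance to where (*SKIP) was passed
        else:
            start += 1              # (*PRUNE) or no verb: next character
    return tried, None


print(attempt_starts("aaaac", "PRUNE"))   # ([0, 1, 2, 3, 4], None)
print(attempt_starts("aaaac", "SKIP"))    # ([0, 4], None)
print(attempt_starts("aacaab", "COMMIT")) # ([0], None)
```
+
+       In this model (*THEN) is omitted because the pattern has no
+       alternation, in which case the text above says it acts like (*PRUNE).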
+
+   More than one backtracking verb
+
+       If  more  than  one  backtracking verb is present in a pattern, the one
+       that is backtracked onto first acts. For example,  consider  this  pat-
+       tern, where A, B, etc. are complex pattern fragments:
+
+         (A(*COMMIT)B(*THEN)C|ABD)
+
+       If  A matches but B fails, the backtrack to (*COMMIT) causes the entire
+       match to fail. However, if A and B match, but C fails, the backtrack to
+       (*THEN)  causes  the next alternative (ABD) to be tried. This behaviour
+       is consistent, but is not always the same as Perl's. It means  that  if
+       two or more backtracking verbs appear in succession, all but  the  last
+       of them have no effect. Consider this example:
+
+         ...(*COMMIT)(*PRUNE)...
+
+       If there is a matching failure to the right, backtracking onto (*PRUNE)
+       causes  it to be triggered, and its action is taken. There can never be
+       a backtrack onto (*COMMIT).
+
+   Backtracking verbs in repeated groups
+
+       PCRE2 sometimes differs from Perl in its handling of backtracking verbs
+       in repeated groups. For example, consider:
+
+         /(a(*COMMIT)b)+ac/
+
+       If  the  subject  is  "abac", Perl matches unless its optimizations are
+       disabled, but PCRE2 always fails because the (*COMMIT)  in  the  second
+       repeat of the group acts.
+
+   Backtracking verbs in assertions
+
+       (*FAIL)  in any assertion has its normal effect: it forces an immediate
+       backtrack. The behaviour of the other  backtracking  verbs  depends  on
+       whether the assertion is standalone or is acting as the condition in  a
+       conditional subpattern.
+
+       (*ACCEPT) in a standalone positive assertion causes  the  assertion  to
+       succeed  without any further processing; captured strings and a (*MARK)
+       name (if  set)  are  retained.  In  a  standalone  negative  assertion,
+       (*ACCEPT)  causes the assertion to fail without any further processing;
+       captured substrings and any (*MARK) name are discarded.
+
+       If the assertion is a condition, (*ACCEPT) causes the condition  to  be
+       true  for  a  positive assertion and false for a negative one; captured
+       substrings are retained in both cases.
+
+       The remaining verbs act only when a later failure causes a backtrack to
+       reach  them. This means that their effect is confined to the assertion,
+       because lookaround assertions are atomic. A backtrack that occurs after
+       an assertion is complete does not jump back into the assertion. Note in
+       particular that a (*MARK) name that is  set  in  an  assertion  is  not
+       "seen" by an instance of (*SKIP:NAME) later in the pattern.
+
+       The  effect of (*THEN) is not allowed to escape beyond an assertion. If
+       there are no more branches to try, (*THEN) causes a positive  assertion
+       to be false, and a negative assertion to be true.
+
+       The  other  backtracking verbs are not treated specially if they appear
+       in a standalone positive assertion. In a  conditional  positive  asser-
+       tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
+       or (*PRUNE) causes the condition to be false. However, for both  stand-
+       alone and conditional negative assertions, backtracking into (*COMMIT),
+       (*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
+       ing any further alternative branches.
+
+   Backtracking verbs in subroutines
+
+       These  behaviours  occur whether or not the subpattern is called recur-
+       sively.
+
+       (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
+       match  to succeed without any further processing. Matching then contin-
+       ues after the subroutine call. Perl documents  this  behaviour.  Perl's
+       treatment of the other verbs in subroutines is different in some cases.
+
+       (*FAIL)  in  a subpattern called as a subroutine has its normal effect:
+       it forces an immediate backtrack.
+
+       (*COMMIT), (*SKIP), and (*PRUNE) cause the  subroutine  match  to  fail
+       when triggered by being backtracked to in a subpattern called as a sub-
+       routine. There is then a backtrack at the outer level.
+
+       (*THEN), when triggered, skips to the next alternative in the innermost
+       enclosing group within the subpattern that has alternatives (its normal
+       behaviour). However, if there is no such group  within  the  subroutine
+       subpattern,  the subroutine match fails and there is a backtrack at the
+       outer level.
+
+
+SEE ALSO
+
+       pcre2api(3),   pcre2callout(3),    pcre2matching(3),    pcre2syntax(3),
+       pcre2(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 04 September 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2PERFORM(3)            Library Functions Manual            PCRE2PERFORM(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 PERFORMANCE
+
+       Two  aspects  of performance are discussed below: memory usage and pro-
+       cessing time. The way you express your pattern as a regular  expression
+       can affect both of them.
+
+
+COMPILED PATTERN MEMORY USAGE
+
+       Patterns are compiled by PCRE2 into a reasonably efficient interpretive
+       code, so that most simple patterns do not use much memory  for  storing
+       the compiled version. However, there is one case where the memory usage
+       of a compiled pattern can be unexpectedly  large.  If  a  parenthesized
+       subpattern has a quantifier with a minimum greater than 1 and/or a lim-
+       ited maximum, the whole subpattern is repeated in  the  compiled  code.
+       For example, the pattern
+
+         (abc|def){2,4}
+
+       is compiled as if it were
+
+         (abc|def)(abc|def)((abc|def)(abc|def)?)?
+
+       (Technical  aside:  It is done this way so that backtrack points within
+       each of the repetitions can be independently maintained.)
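+
+       The replication can be modelled as a text rewrite, which also makes
+       it easy to see how quickly the size grows. The Python sketch below
+       works on pattern text only; it is an assumption-laden illustration
+       (the helper name expand_repeat is invented) and says nothing about
+       the actual layout of PCRE2's compiled code.
+

```python
def expand_repeat(group: str, minimum: int, maximum: int) -> str:
    """Return the replicated textual form of group{minimum,maximum}."""
    if not 0 <= minimum <= maximum:
        raise ValueError("need 0 <= minimum <= maximum")
    # The mandatory copies are simply written out one after another.
    required = group * minimum
    # The optional copies nest, so that each one can be tried only if the
    # one before it matched.
    optional = ""
    for i in range(maximum - minimum):
        if i == 0:
            optional = group + "?"
        else:
            optional = "(" + group + optional + ")?"
    return required + optional


print(expand_repeat("(abc|def)", 2, 4))
# (abc|def)(abc|def)((abc|def)(abc|def)?)?
```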
+
+       For regular expressions whose quantifiers use only small numbers,  this
+       is  not  usually a problem. However, if the numbers are large, and par-
+       ticularly if such repetitions are nested, the memory usage  can  become
+       an embarrassment. For example, the very simple pattern
+
+         ((ab){1,1000}c){1,3}
+
+       uses  over  50KiB  when compiled using the 8-bit library. When PCRE2 is
+       compiled with its default internal pointer size of two bytes, the  size
+       limit on a compiled pattern is 65535 code units in the 8-bit and 16-bit
+       libraries, and this is reached with the above pattern if the outer rep-
+       etition  is  increased from 3 to 4. PCRE2 can be compiled to use larger
+       internal pointers and thus handle larger compiled patterns, but  it  is
+       better to try to rewrite your pattern to use less memory if you can.
+
+       One  way  of reducing the memory usage for such patterns is to make use
+       of PCRE2's "subroutine" facility. Re-writing the above pattern as
+
+         ((ab)(?2){0,999}c)(?1){0,2}
+
+       reduces the memory requirements to around 16KiB, and indeed it  remains
+       under  20KiB  even with the outer repetition increased to 100. However,
+       this kind of pattern is not always exactly equivalent, because any cap-
+       tures  within  subroutine calls are lost when the subroutine completes.
+       If this is not a problem, this kind of  rewriting  will  allow  you  to
+       process patterns that PCRE2 cannot otherwise  handle.  The  matching
+       performance of the two different versions of the pattern is roughly the
+       same. (This applies from release 10.30; things  were  different  in
+       earlier releases.)
+
+
+STACK AND HEAP USAGE AT RUN TIME
+
+       From release 10.30, the interpretive (non-JIT) version of pcre2_match()
+       uses  very  little system stack at run time. In earlier releases recur-
+       sive function calls could use a great deal of  stack,  and  this  could
+       cause  problems, but this usage has been eliminated. Backtracking posi-
+       tions are now explicitly remembered in memory frames controlled by  the
+       code.  An  initial  20KiB  vector  of frames is allocated on the system
+       stack (enough for about 100 frames for small patterns), but if this  is
+       insufficient,  heap  memory  is  used. The amount of heap memory can be
+       limited; if the limit is set to zero, only the initial stack vector  is
+       used.  Rewriting patterns to be time-efficient, as described below, may
+       also reduce the memory requirements.
+
+       In contrast to  pcre2_match(),  pcre2_dfa_match()  does  use  recursive
+       function  calls,  but  only  for  processing  atomic groups, lookaround
+       assertions, and recursion within the pattern. The original  version  of
+       the code used to allocate quite large internal workspace vectors on the
+       stack, which caused some problems for  some  patterns  in  environments
+       with  small  stacks.  From release 10.32 the code for pcre2_dfa_match()
+       has been re-factored to use heap memory  when  necessary  for  internal
+       workspace  when  recursing,  though  recursive function calls are still
+       used.
+
+       The "match depth" parameter can be used to limit the depth of  function
+       recursion,  and  the  "match  heap"  parameter  to limit heap memory in
+       pcre2_dfa_match().
+
+
+PROCESSING TIME
+
+       Certain items in regular expression patterns are processed  more  effi-
+       ciently than others. It is more efficient to use a character class like
+       [aeiou]  than  a  set  of   single-character   alternatives   such   as
+       (a|e|i|o|u).  In  general,  the simplest construction that provides the
+       required behaviour is usually the most efficient. Jeffrey Friedl's book
+       contains  a  lot  of useful general discussion about optimizing regular
+       expressions for efficient performance. This  document  contains  a  few
+       observations about PCRE2.
+
+       Using  Unicode  character  properties  (the  \p, \P, and \X escapes) is
+       slow, because PCRE2 has to use a multi-stage table lookup  whenever  it
+       needs  a  character's  property. If you can find an alternative pattern
+       that does not use character properties, it will probably be faster.
+
+       By default, the escape sequences \b, \d, \s,  and  \w,  and  the  POSIX
+       character  classes  such  as  [:alpha:]  do not use Unicode properties,
+       partly for backwards compatibility, and partly for performance reasons.
+       However,  you  can  set  the PCRE2_UCP option or start the pattern with
+       (*UCP) if you want Unicode character properties to be  used.  This  can
+       double  the  matching  time  for  items  such  as \d, when matched with
+       pcre2_match(); the performance loss is less with a DFA  matching  func-
+       tion, and in both cases there is not much difference for \b.
+
+       When  a pattern begins with .* not in atomic parentheses, nor in paren-
+       theses that are the subject of a backreference,  and  the  PCRE2_DOTALL
+       option  is  set,  the pattern is implicitly anchored by PCRE2, since it
+       can match only at the start of a subject string.  If  the  pattern  has
+       multiple top-level branches, they must all be anchorable. The optimiza-
+       tion can be disabled by  the  PCRE2_NO_DOTSTAR_ANCHOR  option,  and  is
+       automatically disabled if the pattern contains (*PRUNE) or (*SKIP).
+
+       If  PCRE2_DOTALL  is  not  set,  PCRE2  cannot  make this optimization,
+       because the dot metacharacter does not then match a newline, and if the
+       subject  string contains newlines, the pattern may match from the char-
+       acter immediately following one of them instead of from the very start.
+       For example, the pattern
+
+         .*second
+
+       matches  the subject "first\nand second" (where \n stands for a newline
+       character), with the match starting at the seventh character. In  order
+       to  do  this, PCRE2 has to retry the match starting after every newline
+       in the subject.
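+
+       The same effect can be observed with Python's standard re module,
+       whose dot also excludes newline unless a "dot matches all" flag is
+       set. This is only an analogy: it is not PCRE2, and re.DOTALL merely
+       plays the role that PCRE2_DOTALL plays in the text. Offsets are
+       zero-based, so 6 is the seventh character.
+

```python
import re

SUBJECT = "first\nand second"

# Without DOTALL the dot cannot cross the newline, so the match has to
# begin just after it.
default_start = re.search(r".*second", SUBJECT).start()

# With DOTALL the dot matches the newline too, so the match begins at the
# very start of the subject.
dotall_start = re.search(r".*second", SUBJECT, re.DOTALL).start()

print(default_start, dotall_start)  # 6 0
```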
+
+       If you are using such a pattern with subject strings that do  not  con-
+       tain   newlines,   the   best   performance   is  obtained  by  setting
+       PCRE2_DOTALL, or starting the pattern with  ^.*  or  ^.*?  to  indicate
+       explicit anchoring. That saves PCRE2 from having to scan along the sub-
+       ject looking for a newline to restart at.
+
+       Beware of patterns that contain nested indefinite  repeats.  These  can
+       take  a  long time to run when applied to a string that does not match.
+       Consider the pattern fragment
+
+         ^(a+)*
+
+       This can match "aaaa" in 16 different ways, and this  number  increases
+       very  rapidly  as the string gets longer. (The * repeat can match 0, 1,
+       2, 3, or 4 times, and for each of those cases other than 0 or 4, the  +
+       repeats  can  match  different numbers of times.) When the remainder of
+       the pattern is such that the entire match is going to fail,  PCRE2  has
+       in  principle  to  try  every  possible variation, and this can take an
+       extremely long time, even for relatively short strings.
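+
+       The figure of 16 can be reproduced by counting. The Python sketch
+       below counts every way of splitting a prefix of the subject into
+       non-empty runs, one run per iteration of the group; count_ways is a
+       name invented here, and the code models the counting argument only,
+       not PCRE2's matcher.
+

```python
from functools import lru_cache


def count_ways(n: int) -> int:
    """Number of ways ^(a+)* can match some prefix of n "a" characters."""

    @lru_cache(maxsize=None)
    def splits(length: int) -> int:
        # Ways to cut exactly `length` characters into non-empty runs.
        if length == 0:
            return 1  # zero iterations of the group
        # The last run can begin at any earlier position.
        return sum(splits(prev) for prev in range(length))

    # The group may stop after any prefix, including the empty one.
    return sum(splits(k) for k in range(n + 1))


print(count_ways(4))   # 16
print(count_ways(20))  # 1048576
```
+
+       Under this model the count doubles with each extra character, which
+       is why even short non-matching strings can be expensive.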
+
+       An optimization catches some of the simpler cases such as
+
+         (a+)*b
+
+       where a literal character follows. Before  embarking  on  the  standard
+       matching  procedure, PCRE2 checks that there is a "b" later in the sub-
+       ject string, and if there is not, it fails the match immediately.  How-
+       ever,  when  there  is no following literal this optimization cannot be
+       used. You can see the difference by comparing the behaviour of
+
+         (a+)*\d
+
+       with the pattern above. The pattern that ends with the literal "b"
+       fails almost instantly when applied to a whole line of "a" characters,
+       whereas the \d version takes an appreciable time with strings longer
+       than about 20 characters.
+
+       In many cases, the solution to this kind of performance issue is to use
+       an  atomic group or a possessive quantifier. This can often reduce mem-
+       ory requirements as well. As another example, consider this pattern:
+
+         ([^<]|<(?!inet))+
+
+       It matches from wherever it starts until it encounters "<inet"  or  the
+       end  of  the  data,  and is the kind of pattern that might be used when
+       processing an XML file. Each iteration of the outer parentheses matches
+       either  one  character that is not "<" or a "<" that is not followed by
+       "inet". However, each time a parenthesis is processed,  a  backtracking
+       position  is  passed,  so this formulation uses a memory frame for each
+       matched character. For a long string, a lot of memory is required. Con-
+       sider  now  this  rewritten  pattern,  which  matches  exactly the same
+       strings:
+
+         ([^<]++|<(?!inet))+
+
+       This runs much faster, because sequences of characters that do not con-
+       tain "<" are "swallowed" in one item inside the parentheses, and a pos-
+       sessive quantifier is used to stop any backtracking into  the  runs  of
+       non-"<"  characters.  This  version also uses a lot less memory because
+       entry to a new set of parentheses happens only  when  a  "<"  character
+       that  is  not  followed by "inet" is encountered (and we assume this is
+       relatively rare).
+
+       This example shows that one way of optimizing performance when matching
+       long  subject strings is to write repeated parenthesized subpatterns to
+       match more than one character whenever possible.
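+
+       A rough way to see the saving is to count how many times the outer
+       group is entered for each formulation. The Python sketch below is a
+       hand-written scanner, not a regular expression engine; the function
+       group_iterations and the sample subject are assumptions made for
+       illustration, with one iteration standing in for one memory frame.
+

```python
def group_iterations(subject: str, swallow_runs: bool) -> int:
    """Count iterations of the outer group up to "<inet" or the end.

    swallow_runs=False models ([^<]|<(?!inet))+
    swallow_runs=True  models ([^<]++|<(?!inet))+
    """
    pos = 0
    iterations = 0
    while pos < len(subject):
        if subject[pos] == "<":
            if subject.startswith("inet", pos + 1):
                break  # the negative lookahead fails, so the repeat ends
            pos += 1
        elif swallow_runs:
            # One iteration consumes the whole run of non-"<" characters.
            while pos < len(subject) and subject[pos] != "<":
                pos += 1
        else:
            pos += 1  # one iteration per character
        iterations += 1
    return iterations


sample = "abc<b>def<inet>"
print(group_iterations(sample, swallow_runs=False))  # 9
print(group_iterations(sample, swallow_runs=True))   # 3
```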
+
+   SETTING RESOURCE LIMITS
+
+       You can set limits on the amount of processing that  takes  place  when
+       matching,  and  on  the amount of heap memory that is used. The default
+       values of the limits are very large, and unlikely ever to operate. They
+       can  be  changed  when  PCRE2  is  built, and they can also be set when
+       pcre2_match() or pcre2_dfa_match() is  called.  For  details  of  these
+       interfaces,  see  the pcre2build documentation and the section entitled
+       "The match context" in the pcre2api documentation.
+
+       The pcre2test test program has a modifier called  "find_limits"  which,
+       if  applied  to  a  subject line, causes it to find the smallest limits
+       that allow a pattern to match. This is done by repeatedly matching with
+       different limits.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 25 April 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2POSIX(3)              Library Functions Manual              PCRE2POSIX(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+SYNOPSIS
+
+       #include <pcre2posix.h>
+
+       int regcomp(regex_t *preg, const char *pattern,
+            int cflags);
+
+       int regexec(const regex_t *preg, const char *string,
+            size_t nmatch, regmatch_t pmatch[], int eflags);
+
+       size_t regerror(int errcode, const regex_t *preg,
+            char *errbuf, size_t errbuf_size);
+
+       void regfree(regex_t *preg);
+
+
+DESCRIPTION
+
+       This  set of functions provides a POSIX-style API for the PCRE2 regular
+       expression 8-bit library. See the pcre2api documentation for a descrip-
+       tion  of PCRE2's native API, which contains much additional functional-
+       ity. There are no POSIX-style wrappers for PCRE2's  16-bit  and  32-bit
+       libraries.
+
+       The functions described here are just wrapper functions that ultimately
+       call the  PCRE2  native  API.  Their  prototypes  are  defined  in  the
+       pcre2posix.h  header  file,  and  on Unix systems the library itself is
+       called libpcre2-posix.a, so can be accessed by adding -lpcre2-posix  to
+       the  command  for  linking  an  application that uses them. Because the
+       POSIX functions call the native ones,  it  is  also  necessary  to  add
+       -lpcre2-8.
+
+       Those  POSIX  option bits that can reasonably be mapped to PCRE2 native
+       options have been implemented. In addition, the option REG_EXTENDED  is
+       defined  with  the  value  zero. This has no effect, but since programs
+       that are written to the POSIX interface often use  it,  this  makes  it
+       easier  to  slot in PCRE2 as a replacement library. Other POSIX options
+       are not even defined.
+
+       There are also some options that are not defined by POSIX.  These  have
+       been  added  at  the  request  of users who want to make use of certain
+       PCRE2-specific features via the POSIX calling interface or to  add  BSD
+       or GNU functionality.
+
+       When  PCRE2  is  called via these functions, it is only the API that is
+       POSIX-like in style. The syntax and semantics of  the  regular  expres-
+       sions  themselves  are  still  those of Perl, subject to the setting of
+       various PCRE2 options, as described below. "POSIX-like in style"  means
+       that  the  API  approximates  to  the POSIX definition; it is not fully
+       POSIX-compatible, and in multi-unit encoding  domains  it  is  probably
+       even less compatible.
+
+       The header for these functions is supplied as pcre2posix.h to avoid any
+       potential clash with other POSIX  libraries.  It  can,  of  course,  be
+       renamed or aliased as regex.h, which is the "correct" name. It provides
+       two structure types, regex_t for  compiled  internal  forms,  and  reg-
+       match_t  for  returning  captured substrings. It also defines some con-
+       stants whose names start  with  "REG_";  these  are  used  for  setting
+       options and identifying error codes.
+
+
+COMPILING A PATTERN
+
+       The  function regcomp() is called to compile a pattern into an internal
+       form. By default, the pattern is a C string terminated by a binary zero
+       (but  see  REG_PEND below). The preg argument is a pointer to a regex_t
+       structure that is used as a base for storing information about the com-
+       piled  regular  expression. (It is also used for input when REG_PEND is
+       set.)
+
+       The argument cflags is either zero, or contains one or more of the bits
+       defined by the following macros:
+
+         REG_DOTALL
+
+       The  PCRE2_DOTALL  option  is set when the regular expression is passed
+       for compilation to the native function. Note  that  REG_DOTALL  is  not
+       part of the POSIX standard.
+
+         REG_ICASE
+
+       The  PCRE2_CASELESS option is set when the regular expression is passed
+       for compilation to the native function.
+
+         REG_NEWLINE
+
+       The PCRE2_MULTILINE option is set when the regular expression is passed
+       for  compilation  to the native function. Note that this does not mimic
+       the defined POSIX behaviour for REG_NEWLINE  (see  the  following  sec-
+       tion).
+
+         REG_NOSPEC
+
+       The  PCRE2_LITERAL  option is set when the regular expression is passed
+       for compilation to the native function. This disables all meta  charac-
+       ters  in the pattern, causing it to be treated as a literal string. The
+       only other options that are  allowed  with  REG_NOSPEC  are  REG_ICASE,
+       REG_NOSUB,  REG_PEND,  and REG_UTF. Note that REG_NOSPEC is not part of
+       the POSIX standard.
+
+         REG_NOSUB
+
+       When a pattern that is compiled with this flag is passed  to  regexec()
+       for  matching, the nmatch and pmatch arguments are ignored, and no cap-
+       tured strings are returned. Versions of the PCRE2 library prior to 10.22
+       used  to  set  the  PCRE2_NO_AUTO_CAPTURE  compile  option, but this no
+       longer happens because it disables the use of backreferences.
+
+         REG_PEND
+
+       If this option is set, the re_endp field in the preg structure  (which
+       has the type const char *) must be set to point to the character beyond
+       the end of the pattern before calling regcomp(). The pattern itself may
+       now contain binary zeros, which are treated as data characters. Without
+       REG_PEND, a binary zero terminates the pattern and the re_endp field is
+       ignored.  This  is  a GNU extension to the POSIX standard and should be
+       used with caution in software intended to be portable to other systems.
+
+         REG_UCP
+
+       The PCRE2_UCP option is set when the regular expression is  passed  for
+       compilation  to  the  native function. This causes PCRE2 to use Unicode
+       properties when matching \d, \w,  etc.,  instead  of  just  recognizing
+       ASCII values. Note that REG_UCP is not part of the POSIX standard.
+
+         REG_UNGREEDY
+
+       The  PCRE2_UNGREEDY option is set when the regular expression is passed
+       for compilation to the native function. Note that REG_UNGREEDY  is  not
+       part of the POSIX standard.
+
+         REG_UTF
+
+       The  PCRE2_UTF  option is set when the regular expression is passed for
+       compilation to the native function. This causes the pattern itself  and
+       all  data  strings used for matching it to be treated as UTF-8 strings.
+       Note that REG_UTF is not part of the POSIX standard.
+
+       In the absence of these flags, no options  are  passed  to  the  native
+       function.  This  means  that  the  regex is compiled with PCRE2 default
+       semantics. In particular, the way it handles newline characters in  the
+       subject  string  is  the Perl way, not the POSIX way. Note that setting
+       PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE.
+       It  does not affect the way newlines are matched by the dot metacharac-
+       ter (they are not) or by a negative class such as [^a] (they are).
+
+       The yield of regcomp() is zero on success, and non-zero otherwise.  The
+       preg  structure  is  filled  in on success, and one other member of the
+       structure (as well as re_endp) is public: re_nsub contains  the  number
+       of capturing subpatterns in the regular expression. Various error codes
+       are defined in the header file.
+
+       NOTE: If the yield of regcomp() is non-zero, you must  not  attempt  to
+       use the contents of the preg structure. If, for example, you pass it to
+       regexec(), the result is undefined and your program is likely to crash.
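+
+       As a minimal sketch of the calling sequence described in this section,
+       the function below compiles a pattern, checks the yield of regcomp()
+       before touching preg, reads re_nsub, and releases the memory with
+       regfree(). The name count_groups() is hypothetical. The code is
+       written against the standard <regex.h> names; the assumption is that,
+       for PCRE2, pcre2posix.h is included instead and the program is linked
+       with -lpcre2-posix -lpcre2-8.
+
```c
#include <regex.h>   /* assumption: with PCRE2, include <pcre2posix.h> instead */

/* Return the number of capturing subpatterns in pattern, or -1 if the
   pattern does not compile. count_groups() is a hypothetical helper. */
int count_groups(const char *pattern)
{
  regex_t preg;
  int groups;

  /* REG_EXTENDED is defined as zero by pcre2posix.h, so it has no effect
     there; it is kept because programs written to POSIX usually pass it. */
  if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
    return -1;                 /* the contents of preg must not be used */

  groups = (int)preg.re_nsub;  /* public member, filled in on success */
  regfree(&preg);              /* release the memory tied to preg */
  return groups;
}
```
+
+       For example, count_groups("(a)(b(c))") would be expected to yield 3,
+       and a pattern with an unmatched parenthesis such as "(" to yield -1.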
+
+
+MATCHING NEWLINE CHARACTERS
+
+       This area is not simple, because POSIX and Perl take different views of
+       things.   It  is not possible to get PCRE2 to obey POSIX semantics, but
+       then PCRE2 was never intended to be a POSIX engine. The following table
+       lists  the  different  possibilities for matching newline characters in
+       Perl and PCRE2:
+
+                                 Default   Change with
+
+         . matches newline          no     PCRE2_DOTALL
+         newline matches [^a]       yes    not changeable
+         $ matches \n at end        yes    PCRE2_DOLLAR_ENDONLY
+         $ matches \n in middle     no     PCRE2_MULTILINE
+         ^ matches \n in middle     no     PCRE2_MULTILINE
+
+       This is the equivalent table for a POSIX-compatible pattern matcher:
+
+                                 Default   Change with
+
+         . matches newline          yes    REG_NEWLINE
+         newline matches [^a]       yes    REG_NEWLINE
+         $ matches \n at end        no     REG_NEWLINE
+         $ matches \n in middle     no     REG_NEWLINE
+         ^ matches \n in middle     no     REG_NEWLINE
+
+       This behaviour is not what happens when PCRE2 is called via  its  POSIX
+       API.  By  default, PCRE2's behaviour is the same as Perl's, except that
+       there is no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both  PCRE2
+       and Perl, there is no way to stop newline from matching [^a].
+
+       Default  POSIX newline handling can be obtained by setting PCRE2_DOTALL
+       and PCRE2_DOLLAR_ENDONLY when  calling  pcre2_compile()  directly,  but
+       there  is  no  way  to make PCRE2 behave exactly as for the REG_NEWLINE
+       action. When using the POSIX API, passing REG_NEWLINE to  PCRE2's  reg-
+       comp() function causes PCRE2_MULTILINE to be passed to pcre2_compile(),
+       and REG_DOTALL passes PCRE2_DOTALL. There is no way to pass  PCRE2_DOL-
+       LAR_ENDONLY.
+
+
+MATCHING A PATTERN
+
+       The  function  regexec()  is  called  to  match a compiled pattern preg
+       against a given string, which is by default terminated by a  zero  byte
+       (but  see  REG_STARTEND below), subject to the options in eflags. These
+       can be:
+
+         REG_NOTBOL
+
+       The PCRE2_NOTBOL option is set when calling the underlying PCRE2 match-
+       ing function.
+
+         REG_NOTEMPTY
+
+       The  PCRE2_NOTEMPTY  option  is  set  when calling the underlying PCRE2
+       matching function. Note that REG_NOTEMPTY is  not  part  of  the  POSIX
+       standard.  However, setting this option can give more POSIX-like behav-
+       iour in some situations.
+
+         REG_NOTEOL
+
+       The PCRE2_NOTEOL option is set when calling the underlying PCRE2 match-
+       ing function.
+
+         REG_STARTEND
+
+       When  this  option  is  set,  the  subject  string  starts  at string +
+       pmatch[0].rm_so and ends at  string  +  pmatch[0].rm_eo,  which  should
+       point  to  the  first  character beyond the string. There may be binary
+       zeros within the subject string, and indeed, using REG_STARTEND is  the
+       only way to pass a subject string that contains a binary zero.
+
+       Whatever  the  value  of  pmatch[0].rm_so,  the  offsets of the matched
+       string and any captured substrings are  still  given  relative  to  the
+       start  of  string  itself. (Before PCRE2 release 10.30 these were given
+       relative to string +  pmatch[0].rm_so,  but  this  differs  from  other
+       implementations.)
+
+       This  is  a  BSD  extension,  compatible with but not specified by IEEE
+       Standard 1003.2 (POSIX.2), and should be used with caution in  software
+       intended  to  be  portable to other systems. Note that a non-zero rm_so
+       does not imply REG_NOTBOL; REG_STARTEND affects only the  location  and
+       length  of  the string, not how it is matched. Setting REG_STARTEND and
+       passing pmatch as NULL are mutually exclusive; the error REG_INVARG  is
+       returned.
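+
+       The following sketch illustrates REG_STARTEND under stated
+       assumptions: the helper name search_range() is hypothetical, the
+       headers are the standard ones (for PCRE2, pcre2posix.h would replace
+       <regex.h>), and the library reports offsets relative to the start of
+       string, as PCRE2 does from release 10.30. Because REG_STARTEND is an
+       extension, not every system's <regex.h> defines it.
+
```c
#include <regex.h>   /* assumption: with PCRE2, include <pcre2posix.h> instead */
#include <stddef.h>

/* Search subject[from..to) for pattern. On success, store the offsets of
   the match (measured from the start of subject) in *so and *eo and
   return 0; otherwise return the non-zero regcomp() or regexec() code.
   search_range() is a hypothetical helper. */
int search_range(const char *pattern, const char *subject,
                 size_t from, size_t to, size_t *so, size_t *eo)
{
  regex_t preg;
  regmatch_t pmatch[1];
  int rc = regcomp(&preg, pattern, REG_EXTENDED);

  if (rc != 0)
    return rc;

  /* With REG_STARTEND, pmatch[0] is used for input as well as output. */
  pmatch[0].rm_so = (regoff_t)from;
  pmatch[0].rm_eo = (regoff_t)to;   /* first character beyond the subject */

  rc = regexec(&preg, subject, 1, pmatch, REG_STARTEND);
  if (rc == 0)
    {
    *so = (size_t)pmatch[0].rm_so;
    *eo = (size_t)pmatch[0].rm_eo;
    }
  regfree(&preg);
  return rc;
}
```
+
+       Because the end of the subject is given explicitly, the same call
+       pattern can be used for data that contains binary zeros.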
+
+       If  the pattern was compiled with the REG_NOSUB flag, no data about any
+       matched strings  is  returned.  The  nmatch  and  pmatch  arguments  of
+       regexec() are ignored (except possibly as input for REG_STARTEND).
+
+       The value of nmatch may be zero, and the value  of  pmatch may be  NULL
+       (unless REG_STARTEND is set); in both these cases  no  data  about  any
+       matched strings is returned.
+
+       Otherwise,  the  portion  of  the string that was matched, and also any
+       captured substrings, are returned via the pmatch argument, which points
+       to  an  array  of  nmatch structures of type regmatch_t, containing the
+       members rm_so and rm_eo. These contain the byte  offset  to  the  first
+       character of each substring and the offset to the first character after
+       the end of each substring, respectively. The 0th element of the  vector
+       relates  to  the  entire portion of string that was matched; subsequent
+       elements relate to the capturing subpatterns of the regular expression.
+       Unused entries in the array have both structure members set to -1.
+
+       A  successful  match  yields  a  zero  return;  various error codes are
+       defined in the header file, of  which  REG_NOMATCH  is  the  "expected"
+       failure code.
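+
+       As an illustration of how the pmatch vector is read, the sketch below
+       copies the first captured substring into a caller-supplied buffer and
+       treats an entry of -1 as "unused". The helper name first_capture() is
+       hypothetical, and the standard <regex.h> header is assumed (for PCRE2,
+       pcre2posix.h would be included instead).
+
```c
#include <regex.h>   /* assumption: with PCRE2, include <pcre2posix.h> instead */
#include <stdio.h>

/* Match pattern against subject and copy the text of the first capturing
   subpattern into out (an empty string if that subpattern was unused).
   Returns 0 on a match, otherwise the regcomp() or regexec() error code.
   first_capture() is a hypothetical helper. */
int first_capture(const char *pattern, const char *subject,
                  char *out, size_t outsize)
{
  regex_t preg;
  regmatch_t pmatch[2];          /* [0] whole match, [1] first subpattern */
  int rc = regcomp(&preg, pattern, REG_EXTENDED);

  if (rc != 0)
    return rc;

  rc = regexec(&preg, subject, 2, pmatch, 0);
  if (rc == 0)
    {
    if (pmatch[1].rm_so == -1)   /* unused entry: both members are -1 */
      snprintf(out, outsize, "%s", "");
    else                         /* rm_eo is the offset just past the end */
      snprintf(out, outsize, "%.*s",
               (int)(pmatch[1].rm_eo - pmatch[1].rm_so),
               subject + pmatch[1].rm_so);
    }
  regfree(&preg);
  return rc;
}
```
+
+       A failed match is reported through the return value (REG_NOMATCH in
+       the expected case), in which event out is left untouched.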
+
+
+ERROR MESSAGES
+
+       The regerror() function maps a non-zero errorcode from either regcomp()
+       or regexec() to a printable message. If preg is  not  NULL,  the  error
+       should have arisen from the use of that structure. A message terminated
+       by a binary zero is placed in errbuf. If the buffer is too short,  only
+       the first errbuf_size - 1 characters of the error message are used. The
+       yield of the function is the size of buffer needed to  hold  the  whole
+       message,  including  the  terminating  zero. This value is greater than
+       errbuf_size if the message was truncated.
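+
+       Since the yield of regerror() is the buffer size needed for the whole
+       message, one way to avoid truncation is to call it twice: once with a
+       small buffer to learn the size, and again with a buffer of that size.
+       The sketch below does this; regex_message() is a hypothetical helper
+       name, and the standard <regex.h> header is assumed (for PCRE2,
+       pcre2posix.h would be included instead).
+
```c
#include <regex.h>   /* assumption: with PCRE2, include <pcre2posix.h> instead */
#include <stdlib.h>

/* Return a malloc()ed copy of the complete message for errcode, or NULL if
   memory cannot be obtained. The caller frees the result with free().
   regex_message() is a hypothetical helper. */
char *regex_message(int errcode, const regex_t *preg)
{
  char probe[8];   /* deliberately small: the first call only asks the size */
  size_t needed = regerror(errcode, preg, probe, sizeof probe);
  char *msg = malloc(needed);   /* needed includes the terminating zero */

  if (msg != NULL)
    regerror(errcode, preg, msg, needed);   /* large enough: no truncation */
  return msg;
}
```
+
+       The preg argument may be NULL, or the structure that was passed to the
+       regcomp() or regexec() call that produced the error code.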
+
+
+MEMORY USAGE
+
+       Compiling a regular expression causes memory to be allocated and  asso-
+       ciated  with  the preg structure. The function regfree() frees all such
+       memory, after which preg may no longer be used as  a  compiled  expres-
+       sion.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 15 June 2017
+       Copyright (c) 1997-2017 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2SAMPLE(3)             Library Functions Manual             PCRE2SAMPLE(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 SAMPLE PROGRAM
+
+       A  simple, complete demonstration program to get you started with using
+       PCRE2 is supplied in the file pcre2demo.c in the src directory  in  the
+       PCRE2 distribution. A listing of this program is given in the pcre2demo
+       documentation. If you do not have a copy of the PCRE2 distribution, you
+       can save this listing to re-create the contents of pcre2demo.c.
+
+       The  demonstration  program compiles the regular expression that is its
+       first argument, and matches it against the subject string in its second
+       argument.  No  PCRE2  options are set, and default character tables are
+       used. If matching succeeds, the program outputs the portion of the sub-
+       ject  that  matched,  together  with  the contents of any captured sub-
+       strings.
+
+       If the -g option is given on the command line, the program then goes on
+       to check for further matches of the same regular expression in the same
+       subject string. The logic is a little bit tricky because of the  possi-
+       bility  of  matching an empty string. Comments in the code explain what
+       is going on.
+
+       The code in pcre2demo.c is an 8-bit program that uses the  PCRE2  8-bit
+       library.  It  handles  strings  and characters that are stored in 8-bit
+       code units.  By default, one character corresponds to  one  code  unit,
+       but  if  the  pattern starts with "(*UTF)", both it and the subject are
+       treated as UTF-8 strings, where characters  may  occupy  multiple  code
+       units.
+
+       If  PCRE2  is installed in the standard include and library directories
+       for your operating system, you should be able to compile the demonstra-
+       tion program using a command like this:
+
+         cc -o pcre2demo pcre2demo.c -lpcre2-8
+
+       If PCRE2 is installed elsewhere, you may need to add additional options
+       to the command line. For example, on a Unix-like system that has  PCRE2
+       installed  in  /usr/local,  you  can  compile the demonstration program
+       using a command like this:
+
+         cc -o pcre2demo -I/usr/local/include pcre2demo.c \
+            -L/usr/local/lib -lpcre2-8
+
+       Once you have built the demonstration program, you can run simple tests
+       like this:
+
+         ./pcre2demo 'cat|dog' 'the cat sat on the mat'
+         ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
+
+       Note  that  there  is  a  much  more comprehensive test program, called
+       pcre2test, which supports many  more  facilities  for  testing  regular
+       expressions using all three PCRE2 libraries (8-bit, 16-bit, and 32-bit,
+       though not all three need be installed). The pcre2demo program is  pro-
+       vided as a relatively simple coding example.
+
+       If you try to run pcre2demo when PCRE2 is not installed in the standard
+       library directory, you may get an error like  this  on  some  operating
+       systems (e.g. Solaris):
+
+         ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file
+         or directory
+
+       This is caused by the way shared library support works  on  those  sys-
+       tems. You need to add
+
+         -R/usr/local/lib
+
+       (for example) to the compile command to get round this problem.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 02 February 2016
+       Copyright (c) 1997-2016 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2SERIALIZE(3)          Library Functions Manual          PCRE2SERIALIZE(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS
+
+       int32_t pcre2_serialize_decode(pcre2_code **codes,
+         int32_t number_of_codes, const uint8_t *bytes,
+         pcre2_general_context *gcontext);
+
+       int32_t pcre2_serialize_encode(pcre2_code **codes,
+         int32_t number_of_codes, uint8_t **serialized_bytes,
+         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);
+
+       void pcre2_serialize_free(uint8_t *bytes);
+
+       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);
+
+       If  you  are running an application that uses a large number of regular
+       expression patterns, it may be useful to store them  in  a  precompiled
+       form  instead  of  having to compile them every time the application is
+       run. However, if you are using the just-in-time  optimization  feature,
+       it is not possible to save and reload the JIT data, because it is posi-
+       tion-dependent. The host on which the patterns  are  reloaded  must  be
+       running  the  same version of PCRE2, with the same code unit width, and
+       must also have the same endianness, pointer width and PCRE2_SIZE  type.
+       For  example, patterns compiled on a 32-bit system using PCRE2's 16-bit
+       library cannot be reloaded on a 64-bit system, nor can they be reloaded
+       using the 8-bit library.
+
+       Note  that  "serialization" in PCRE2 does not convert compiled patterns
+       to an abstract format like Java or .NET serialization.  The  serialized
+       output  is  really  just  a  bytecode dump, which is why it can only be
+       reloaded in the same environment as the one that created it. Hence  the
+       restrictions  mentioned  above.   Applications  that are not statically
+       linked with a fixed version of PCRE2 must be prepared to recompile pat-
+       terns from their sources, in order to be immune to PCRE2 upgrades.
+
+
+SECURITY CONCERNS
+
+       The facility for saving and restoring compiled patterns is intended for
+       use within individual applications.  As  such,  the  data  supplied  to
+       pcre2_serialize_decode()  is expected to be trusted data, not data from
+       arbitrary external sources.  There  is  only  some  simple  consistency
+       checking, not complete validation of what is being re-loaded. Corrupted
+       data may cause undefined results. For example, if the length field of a
+       pattern in the serialized data is corrupted, the deserializing code may
+       read beyond the end of the byte stream that is passed to it.
+
+
+SAVING COMPILED PATTERNS
+
+       Before compiled patterns can be saved they must be serialized, which in
+       PCRE2  means converting the pattern to a stream of bytes. A single byte
+       stream may contain any number of compiled patterns, but they  must  all
+       use  the same character tables. A single copy of the tables is included
+       in the byte stream (its size is 1088 bytes). For more details of  char-
+       acter  tables,  see the section on locale support in the pcre2api docu-
+       mentation.
+
+       The function pcre2_serialize_encode() creates a serialized byte  stream
+       from  a  list of compiled patterns. Its first two arguments specify the
+       list, being a pointer to a vector of pointers to compiled patterns, and
+       the length of the vector. The third and fourth arguments point to vari-
+       ables which are set to point to the created byte stream and its length,
+       respectively.  The  final  argument  is a pointer to a general context,
+       which can be used to specify custom  memory  management  functions.  If
+       this  argument  is NULL, malloc() is used to obtain memory for the byte
+       stream. The yield of the function is the number of serialized patterns,
+       or one of the following negative error codes:
+
+         PCRE2_ERROR_BADDATA      the number of patterns is zero or less
+         PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
+         PCRE2_ERROR_MEMORY       memory allocation failed
+         PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
+         PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
+
+       PCRE2_ERROR_BADMAGIC  means  either that a pattern's code has been cor-
+       rupted, or that a slot in the vector does not point to a compiled  pat-
+       tern.
+
+       Once a set of patterns has been serialized you can save the data in any
+       appropriate manner. Here is sample code that compiles two patterns  and
+       writes them to a file. It assumes that the variable fd refers to a file
+       that is open for output. The error checking that should be present in a
+       real application has been omitted for simplicity.
+
+         int errorcode;
+         uint8_t *bytes;
+         PCRE2_SIZE erroroffset;
+         PCRE2_SIZE bytescount;
+         pcre2_code *list_of_codes[2];
+         list_of_codes[0] = pcre2_compile("first pattern",
+           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
+         list_of_codes[1] = pcre2_compile("second pattern",
+           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
+         errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
+           &bytescount, NULL);
+         errorcode = fwrite(bytes, 1, bytescount, fd);
+
+       Note  that  the  serialized data is binary data that may contain any of
+       the 256 possible byte  values.  On  systems  that  make  a  distinction
+       between binary and non-binary data, be sure that the file is opened for
+       binary output.
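+
+       A minimal sketch of writing the byte stream in binary mode, with the
+       error checks that the sample above omits, might look as follows. The
+       helper name save_stream() and its path argument are hypothetical; the
+       bytes and count arguments stand for the values that
+       pcre2_serialize_encode() sets, treated here as an opaque buffer.
+
```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write count bytes to the file named path, opened for binary output.
   Returns 0 on success and -1 on any failure.
   save_stream() is a hypothetical helper. */
int save_stream(const char *path, const uint8_t *bytes, size_t count)
{
  FILE *f = fopen(path, "wb");   /* "b" requests binary mode where it matters */
  size_t written;

  if (f == NULL)
    return -1;

  written = fwrite(bytes, 1, count, f);

  /* fclose() can also fail, for example when buffered data cannot be
     flushed, so its result is checked as well. */
  if (fclose(f) != 0 || written != count)
    return -1;
  return 0;
}
```
+
+       On systems that make no distinction between binary and non-binary
+       files, the "b" in the mode string has no effect.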
+
+       Serializing a set of patterns leaves the original  data  untouched,  so
+       they  can  still  be used for matching. Their memory must eventually be
+       freed in the usual way by calling pcre2_code_free(). When you have fin-
+       ished with the byte stream, it too must be freed by calling pcre2_seri-
+       alize_free(). If this function is  called  with  a  NULL  argument,  it
+       returns immediately without doing anything.
+
+
+RE-USING PRECOMPILED PATTERNS
+
+       In  order  to  re-use  a  set of saved patterns you must first make the
+       serialized byte stream available in main memory (for example, by  read-
+       ing  from  a  file).  The  management of this memory block is up to the
+       application.  You  can  use  the  pcre2_serialize_get_number_of_codes()
+       function  to  find out how many compiled patterns are in the serialized
+       data without actually decoding the patterns:
+
+         uint8_t *bytes = <serialized data>;
+         int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
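+
+       One way of making a saved byte stream available in main memory is to
+       read the whole file into a block obtained from malloc(), as in the
+       sketch below. The helper name load_stream() and its path argument are
+       hypothetical. The block it returns is the kind of pointer that the
+       sample above shows as <serialized data>; because the application owns
+       this block, it is released with free() once it is no longer needed.
+
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the whole of the file named path into a malloc()ed block. On success
   return the block and store its length in *count; return NULL if the file
   cannot be read or is empty. load_stream() is a hypothetical helper. */
uint8_t *load_stream(const char *path, size_t *count)
{
  FILE *f = fopen(path, "rb");   /* binary input, matching binary output */
  uint8_t *bytes = NULL;
  long size;

  if (f == NULL)
    return NULL;

  /* Find the file size, rewind, and allocate a block of that size. */
  if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
      fseek(f, 0, SEEK_SET) == 0 && (bytes = malloc((size_t)size)) != NULL)
    {
    if (fread(bytes, 1, (size_t)size, f) == (size_t)size)
      *count = (size_t)size;
    else
      {
      free(bytes);               /* short read: give up cleanly */
      bytes = NULL;
      }
    }
  fclose(f);
  return bytes;
}
```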
+
+       The pcre2_serialize_decode() function reads a byte stream and recreates
+       the compiled patterns in new memory blocks, setting pointers to them in
+       a vector. The first two arguments are a pointer to  a  suitable  vector
+       and  its  length,  and  the third argument points to a byte stream. The
+       final argument is a pointer to a general context, which can be used  to
+       specify  custom  memory  management functions for the decoded patterns.
+       If this argument is NULL, malloc() and free() are used. After deserial-
+       ization, the byte stream is no longer needed and can be discarded.
+
+         int32_t number_of_codes;
+         pcre2_code *list_of_codes[2];
+         uint8_t *bytes = <serialized data>;
+         number_of_codes =
+           pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
+
+       If  the  vector  is  not  large enough for all the patterns in the byte
+       stream, it is filled  with  those  that  fit,  and  the  remainder  are
+       ignored.  The  yield of the function is the number of decoded patterns,
+       or one of the following negative error codes:
+
+         PCRE2_ERROR_BADDATA    second argument is zero or less
+         PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
+         PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
+         PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
+         PCRE2_ERROR_MEMORY     memory allocation failed
+         PCRE2_ERROR_NULL       first or third argument is NULL
+
+       PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it  was
+       compiled on a system with different endianness.
+
+       Decoded patterns can be used for matching in the usual way, and must be
+       freed by calling pcre2_code_free(). However, be aware that there  is  a
+       potential  race  issue  if  you  are  using multiple patterns that were
+       decoded from a single byte stream in  a  multithreaded  application.  A
+       single copy of the character tables is used by all the decoded patterns
+       and a reference count is used to arrange for its memory to be automati-
+       cally  freed when the last pattern is freed, but there is no locking on
+       this reference count. Therefore, if you want to call  pcre2_code_free()
+       for  these  patterns  in  different  threads, you must arrange your own
+       locking, and ensure that pcre2_code_free()  cannot  be  called  by  two
+       threads at the same time.
+
+       If  a pattern was processed by pcre2_jit_compile() before being serial-
+       ized, the JIT data is discarded and so is no longer available  after  a
+       save/restore  cycle.  You can, however, process a restored pattern with
+       pcre2_jit_compile() if you wish.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 27 June 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2SYNTAX(3)             Library Functions Manual             PCRE2SYNTAX(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY
+
+       The  full syntax and semantics of the regular expressions that are sup-
+       ported by PCRE2 are described in the pcre2pattern  documentation.  This
+       document contains a quick-reference summary of the syntax.
+
+
+QUOTING
+
+         \x         where x is non-alphanumeric is a literal x
+         \Q...\E    treat enclosed characters as literal
+
+
+ESCAPED CHARACTERS
+
+       This table applies to ASCII and Unicode environments.
+
+         \a         alarm, that is, the BEL character (hex 07)
+         \cx        "control-x", where x is any ASCII printing character
+         \e         escape (hex 1B)
+         \f         form feed (hex 0C)
+         \n         newline (hex 0A)
+         \r         carriage return (hex 0D)
+         \t         tab (hex 09)
+         \0dd       character with octal code 0dd
+         \ddd       character with octal code ddd, or backreference
+         \o{ddd..}  character with octal code ddd..
+         \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
+         \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
+         \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
+         \xhh       character with hex code hh
+         \x{hh..}   character with hex code hh..
+
+       Note that \0dd is always an octal code. The treatment of backslash fol-
+       lowed by a non-zero digit is complicated; for details see  the  section
+       "Non-printing  characters"  in  the  pcre2pattern  documentation, where
+       details of escape processing in EBCDIC  environments  are  also  given.
+       \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not supported in
+       EBCDIC environments. Note that \N not  followed  by  an  opening  curly
+       bracket has a different meaning (see below).
+
+       When  \x  is not followed by {, from zero to two hexadecimal digits are
+       read, but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadec-
+       imal  digits  to  be  recognized  as a hexadecimal escape; otherwise it
+       matches a literal "x".  Likewise, if \u (in ALT_BSUX mode) is not  fol-
+       lowed by four hexadecimal digits, it matches a literal "u".
+
+
+CHARACTER TYPES
+
+         .          any character except newline;
+                      in dotall mode, any character whatsoever
+         \C         one code unit, even in UTF mode (best avoided)
+         \d         a decimal digit
+         \D         a character that is not a decimal digit
+         \h         a horizontal white space character
+         \H         a character that is not a horizontal white space character
+         \N         a character that is not a newline
+         \p{xx}     a character with the xx property
+         \P{xx}     a character without the xx property
+         \R         a newline sequence
+         \s         a white space character
+         \S         a character that is not a white space character
+         \v         a vertical white space character
+         \V         a character that is not a vertical white space character
+         \w         a "word" character
+         \W         a "non-word" character
+         \X         a Unicode extended grapheme cluster
+
+       \C  is dangerous because it may leave the current matching point in the
+       middle of a UTF-8 or UTF-16 character. The application can lock out the
+       use  of  \C  by  setting the PCRE2_NEVER_BACKSLASH_C option. It is also
+       possible to build PCRE2 with the use of \C permanently disabled.
+
+       By default, \d, \s, and \w match only ASCII characters, even  in  UTF-8
+       mode or in the 16-bit and 32-bit libraries. However, if locale-specific
+       matching is happening, \s and \w may also match  characters  with  code
+       points in the range 128-255. If the PCRE2_UCP option is set, the behav-
+       iour of these escape sequences is changed to use Unicode properties and
+       they match many more characters.
+
+
+GENERAL CATEGORY PROPERTIES FOR \p and \P
+
+         C          Other
+         Cc         Control
+         Cf         Format
+         Cn         Unassigned
+         Co         Private use
+         Cs         Surrogate
+
+         L          Letter
+         Ll         Lower case letter
+         Lm         Modifier letter
+         Lo         Other letter
+         Lt         Title case letter
+         Lu         Upper case letter
+         L&         Ll, Lu, or Lt
+
+         M          Mark
+         Mc         Spacing mark
+         Me         Enclosing mark
+         Mn         Non-spacing mark
+
+         N          Number
+         Nd         Decimal number
+         Nl         Letter number
+         No         Other number
+
+         P          Punctuation
+         Pc         Connector punctuation
+         Pd         Dash punctuation
+         Pe         Close punctuation
+         Pf         Final punctuation
+         Pi         Initial punctuation
+         Po         Other punctuation
+         Ps         Open punctuation
+
+         S          Symbol
+         Sc         Currency symbol
+         Sk         Modifier symbol
+         Sm         Mathematical symbol
+         So         Other symbol
+
+         Z          Separator
+         Zl         Line separator
+         Zp         Paragraph separator
+         Zs         Space separator
+
+
+PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P
+
+         Xan        Alphanumeric: union of properties L and N
+         Xps        POSIX space: property Z or tab, NL, VT, FF, CR
+         Xsp        Perl space: property Z or tab, NL, VT, FF, CR
+         Xuc        Universally-named character: one that can be
+                      represented by a Universal Character Name
+         Xwd        Perl word: property Xan or underscore
+
+       Perl and POSIX space are now the same. Perl added VT to its space char-
+       acter set at release 5.18.
+
+
+SCRIPT NAMES FOR \p AND \P
+
+       Adlam, Ahom, Anatolian_Hieroglyphs, Arabic,  Armenian,  Avestan,  Bali-
+       nese,  Bamum,  Bassa_Vah,  Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
+       Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Caucasian_Alba-
+       nian,  Chakma,  Cham,  Cherokee,  Common,  Coptic,  Cuneiform, Cypriot,
+       Cyrillic, Deseret, Devanagari, Dogra,  Duployan,  Egyptian_Hieroglyphs,
+       Elbasan,   Ethiopic,  Georgian,  Glagolitic,  Gothic,  Grantha,  Greek,
+       Gujarati,  Gunjala_Gondi,  Gurmukhi,  Han,   Hangul,   Hanifi_Rohingya,
+       Hanunoo,   Hatran,   Hebrew,   Hiragana,  Imperial_Aramaic,  Inherited,
+       Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese,  Kaithi,  Kan-
+       nada,  Katakana,  Kayah_Li,  Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
+       Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian,  Lydian,  Maha-
+       jani,  Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi,
+       Medefaidrin,     Meetei_Mayek,     Mende_Kikakui,     Meroitic_Cursive,
+       Meroitic_Hieroglyphs,  Miao,  Modi,  Mongolian,  Mro, Multani, Myanmar,
+       Nabataean, New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki,  Old_Hungar-
+       ian,  Old_Italic,  Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog-
+       dian,   Old_South_Arabian,   Old_Turkic,   Oriya,    Osage,    Osmanya,
+       Pahawh_Hmong,    Palmyrene,    Pau_Cin_Hau,    Phags_Pa,    Phoenician,
+       Psalter_Pahlavi, Rejang, Runic, Samaritan,  Saurashtra,  Sharada,  Sha-
+       vian,  Siddham,  SignWriting,  Sinhala, Sogdian, Sora_Sompeng, Soyombo,
+       Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,
+       Tai_Viet,  Takri,  Tamil,  Tangut, Telugu, Thaana, Thai, Tibetan, Tifi-
+       nagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi, Zanabazar_Square.
+
+
+CHARACTER CLASSES
+
+         [...]       positive character class
+         [^...]      negative character class
+         [x-y]       range (can be used for hex characters)
+         [[:xxx:]]   positive POSIX named set
+         [[:^xxx:]]  negative POSIX named set
+
+         alnum       alphanumeric
+         alpha       alphabetic
+         ascii       0-127
+         blank       space or tab
+         cntrl       control character
+         digit       decimal digit
+         graph       printing, excluding space
+         lower       lower case letter
+         print       printing, including space
+         punct       printing, excluding alphanumeric
+         space       white space
+         upper       upper case letter
+         word        same as \w
+         xdigit      hexadecimal digit
+
+       In PCRE2, POSIX character set names recognize only ASCII characters  by
+       default,  but  some of them use Unicode properties if PCRE2_UCP is set.
+       You can use \Q...\E inside a character class.
+
+
+QUANTIFIERS
+
+         ?           0 or 1, greedy
+         ?+          0 or 1, possessive
+         ??          0 or 1, lazy
+         *           0 or more, greedy
+         *+          0 or more, possessive
+         *?          0 or more, lazy
+         +           1 or more, greedy
+         ++          1 or more, possessive
+         +?          1 or more, lazy
+         {n}         exactly n
+         {n,m}       at least n, no more than m, greedy
+         {n,m}+      at least n, no more than m, possessive
+         {n,m}?      at least n, no more than m, lazy
+         {n,}        n or more, greedy
+         {n,}+       n or more, possessive
+         {n,}?       n or more, lazy
+
+
+ANCHORS AND SIMPLE ASSERTIONS
+
+         \b          word boundary
+         \B          not a word boundary
+         ^           start of subject
+                       also after an internal newline in multiline mode
+                       (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
+         \A          start of subject
+         $           end of subject
+                       also before newline at end of subject
+                       also before internal newline in multiline mode
+         \Z          end of subject
+                       also before newline at end of subject
+         \z          end of subject
+         \G          first matching position in subject
+
+
+REPORTED MATCH POINT SETTING
+
+         \K          set reported start of match
+
+       \K is honoured in positive assertions, but ignored in negative ones.
+
+
+ALTERNATION
+
+         expr|expr|expr...
+
+
+CAPTURING
+
+         (...)           capturing group
+         (?<name>...)    named capturing group (Perl)
+         (?'name'...)    named capturing group (Perl)
+         (?P<name>...)   named capturing group (Python)
+         (?:...)         non-capturing group
+         (?|...)         non-capturing group; reset group numbers for
+                          capturing groups in each alternative
+
+
+ATOMIC GROUPS
+
+         (?>...)         atomic, non-capturing group
+
+
+COMMENT
+
+         (?#....)        comment (not nestable)
+
+
+OPTION SETTING
+       Changes of these options within a group are automatically cancelled  at
+       the end of the group.
+
+         (?i)            caseless
+         (?J)            allow duplicate names
+         (?m)            multiline
+         (?n)            no auto capture
+         (?s)            single line (dotall)
+         (?U)            default ungreedy (lazy)
+         (?x)            extended: ignore white space except in classes
+         (?xx)           as (?x) but also ignore space and tab in classes
+         (?-...)         unset option(s)
+         (?^)            unset imnsx options
+
+       Unsetting  x or xx unsets both. Several options may be set at once, and
+       a mixture of setting and unsetting such as (?i-x) is allowed, but there
+       may be only one hyphen. Setting (but not unsetting) is allowed after (?^,
+       for example (?^in). An option setting may appear at the start of a
+       non-capturing group, for example (?i:...).
+
+       The  following  are  recognized  only at the very start of a pattern or
+       after one of the newline or \R options with similar syntax.  More  than
+       one of them may appear. For the first three, d is a decimal number.
+
+         (*LIMIT_DEPTH=d) set the backtracking limit to d
+         (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
+         (*LIMIT_MATCH=d) set the match limit to d
+         (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
+         (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
+         (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
+         (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
+         (*NO_JIT)       disable JIT optimization
+         (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
+         (*UTF)          set appropriate UTF mode for the library in use
+         (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
+
+       Note  that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the
+       value  of  the  limits  set  by  the   caller   of   pcre2_match()   or
+       pcre2_dfa_match(),  not  increase  them. LIMIT_RECURSION is an obsolete
+       synonym for LIMIT_DEPTH. The application can lock out the use of (*UTF)
+       and  (*UCP)  by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
+       respectively, at compile time.
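+
+       The "reduce only" rule amounts to taking the smaller of  the  caller's
+       limit  and  the  pattern's  limit.  The Python sketch below illustrates
+       this; it is not how PCRE2 parses patterns. The function  name  and  the
+       dictionary  of  caller limits are hypothetical, and for brevity it only
+       understands limit items at the very start of the pattern.

```python
import re

# Matches one leading limit item such as (*LIMIT_MATCH=1000).
_LIMIT_ITEM = re.compile(
    r"\(\*(LIMIT_DEPTH|LIMIT_HEAP|LIMIT_MATCH|LIMIT_RECURSION)=(\d+)\)"
)


def effective_limits(pattern: str, caller_limits: dict) -> dict:
    """Combine the caller's limits with leading limit items in a pattern."""
    limits = dict(caller_limits)
    pos = 0
    while True:
        item = _LIMIT_ITEM.match(pattern, pos)
        if item is None:
            break
        name = item.group(1)
        if name == "LIMIT_RECURSION":
            name = "LIMIT_DEPTH"  # obsolete synonym
        # A pattern can lower a limit but never raise it.
        limits[name] = min(limits[name], int(item.group(2)))
        pos = item.end()
    return limits
```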
+
+
+NEWLINE CONVENTION
+
+       These are recognized only at the very start of  the  pattern  or  after
+       option settings with a similar syntax.
+
+         (*CR)           carriage return only
+         (*LF)           linefeed only
+         (*CRLF)         carriage return followed by linefeed
+         (*ANYCRLF)      all three of the above
+         (*ANY)          any Unicode newline sequence
+         (*NUL)          the NUL character (binary zero)
+
+
+WHAT \R MATCHES
+
+       These  are  recognized  only  at the very start of the pattern or after
+       option settings with a similar syntax.
+
+         (*BSR_ANYCRLF)  CR, LF, or CRLF
+         (*BSR_UNICODE)  any Unicode newline sequence
+
+
+LOOKAHEAD AND LOOKBEHIND ASSERTIONS
+
+         (?=...)         positive look ahead
+         (?!...)         negative look ahead
+         (?<=...)        positive look behind
+         (?<!...)        negative look behind
+
+       Each top-level branch of a look behind must be of a fixed length.
+
+
+BACKREFERENCES
+
+         \n              reference by number (can be ambiguous)
+         \gn             reference by number
+         \g{n}           reference by number
+         \g+n            relative reference by number (PCRE2 extension)
+         \g-n            relative reference by number
+         \g{+n}          relative reference by number (PCRE2 extension)
+         \g{-n}          relative reference by number
+         \k<name>        reference by name (Perl)
+         \k'name'        reference by name (Perl)
+         \g{name}        reference by name (Perl)
+         \k{name}        reference by name (.NET)
+         (?P=name)       reference by name (Python)
+
+
+SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
+
+         (?R)            recurse whole pattern
+         (?n)            call subpattern by absolute number
+         (?+n)           call subpattern by relative number
+         (?-n)           call subpattern by relative number
+         (?&name)        call subpattern by name (Perl)
+         (?P>name)       call subpattern by name (Python)
+         \g<name>        call subpattern by name (Oniguruma)
+         \g'name'        call subpattern by name (Oniguruma)
+         \g<n>           call subpattern by absolute number (Oniguruma)
+         \g'n'           call subpattern by absolute number (Oniguruma)
+         \g<+n>          call subpattern by relative number (PCRE2 extension)
+         \g'+n'          call subpattern by relative number (PCRE2 extension)
+         \g<-n>          call subpattern by relative number (PCRE2 extension)
+         \g'-n'          call subpattern by relative number (PCRE2 extension)
+
+
+CONDITIONAL PATTERNS
+
+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+         (?(n)               absolute reference condition
+         (?(+n)              relative reference condition
+         (?(-n)              relative reference condition
+         (?(<name>)          named reference condition (Perl)
+         (?('name')          named reference condition (Perl)
+         (?(name)            named reference condition (PCRE2, deprecated)
+         (?(R)               overall recursion condition
+         (?(Rn)              specific numbered group recursion condition
+         (?(R&name)          specific named group recursion condition
+         (?(DEFINE)          define subpattern for reference
+         (?(VERSION[>]=n.m)  test PCRE2 version
+         (?(assert)          assertion condition
+
+       Note the ambiguity of (?(R) and (?(Rn) which might be  named  reference
+       conditions  or  recursion  tests.  Such a condition is interpreted as a
+       reference condition if the relevant named group exists.
+
+
+BACKTRACKING CONTROL
+
+       All backtracking control verbs may be in  the  form  (*VERB:NAME).  For
+       (*MARK)  the  name is mandatory, for the others it is optional. (*SKIP)
+       changes its behaviour if :NAME is present. The others just set  a  name
+       for passing back to the caller, but this is not a name that (*SKIP) can
+       see. The following act as soon as they are reached:
+
+         (*ACCEPT)       force successful match
+         (*FAIL)         force backtrack; synonym (*F)
+         (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
+
+       The following act only when a subsequent match failure causes  a  back-
+       track to reach them. They all force a match failure, but they differ in
+       what happens afterwards. Those that advance the start-of-match point do
+       so only if the pattern is not anchored.
+
+         (*COMMIT)       overall failure, no advance of starting point
+         (*PRUNE)        advance to next starting character
+         (*SKIP)         advance to current matching position
+         (*SKIP:NAME)    advance to position corresponding to an earlier
+                         (*MARK:NAME); if not found, the (*SKIP) is ignored
+         (*THEN)         local failure, backtrack to next alternation
+
+       The  effect  of one of these verbs in a group called as a subroutine is
+       confined to the subroutine call.
+
+
+CALLOUTS
+
+         (?C)            callout (assumed number 0)
+         (?Cn)           callout with numerical data n
+         (?C"text")      callout with string data
+
+       The allowed string delimiters are ` ' " ^ % # $ (which are the same for
+       the  start  and the end), and the starting delimiter { matched with the
+       ending delimiter }. To encode the ending delimiter within  the  string,
+       double it.
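+
+       As an illustration of the delimiter rule, the Python sketch below builds
+       a  string  callout item and doubles any ending delimiter that occurs in
+       the text. The helper name is hypothetical; only the  delimiter  set  and
+       the doubling rule are taken from the description above.

```python
# Starting delimiter -> ending delimiter, as listed above.
_CLOSERS = {
    "`": "`", "'": "'", '"': '"', "^": "^",
    "%": "%", "#": "#", "$": "$", "{": "}",
}


def callout_with_string(text: str, opener: str = '"') -> str:
    """Return a (?C...) item carrying `text` as its string data."""
    closer = _CLOSERS[opener]  # KeyError for an unsupported delimiter
    # Double the ending delimiter wherever it appears inside the string.
    escaped = text.replace(closer, closer * 2)
    return "(?C" + opener + escaped + closer + ")"
```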
+
+
+SEE ALSO
+
+       pcre2pattern(3),    pcre2api(3),   pcre2callout(3),   pcre2matching(3),
+       pcre2(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 02 September 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRE2UNICODE(3)            Library Functions Manual            PCRE2UNICODE(3)
+
+
+
+NAME
+       PCRE2 - Perl-compatible regular expressions (revised API)
+
+UNICODE AND UTF SUPPORT
+
+       When PCRE2 is built with Unicode support (which is the default), it has
+       knowledge of Unicode character properties and can process text  strings
+       in  UTF-8, UTF-16, or UTF-32 format (depending on the code unit width).
+       However, by default, PCRE2 assumes that one code unit is one character.
+       To  process  a  pattern  as a UTF string, where a character may require
+       more than one  code  unit,  you  must  call  pcre2_compile()  with  the
+       PCRE2_UTF  option  flag,  or  the  pattern must start with the sequence
+       (*UTF). When either of these is the case, both the pattern and any sub-
+       ject  strings  that  are  matched against it are treated as UTF strings
+       instead of strings of individual one-code-unit  characters.  There  are
+       also  some  other  changes  to the way characters are handled, as docu-
+       mented below.
+
+       If you do not need Unicode support you can build PCRE2 without  it,  in
+       which case the library will be smaller.
+
+
+UNICODE PROPERTY SUPPORT
+
+       When  PCRE2 is built with Unicode support, the escape sequences \p{..},
+       \P{..}, and \X can be used. The Unicode properties that can  be  tested
+       are  limited to the general category properties such as Lu for an upper
+       case letter or Nd for a decimal number, the Unicode script  names  such
+       as Arabic or Han, and the derived properties Any and L&. Full lists are
+       given in the pcre2pattern and pcre2syntax documentation. Only the short
+       names  for  properties are supported. For example, \p{L} matches a let-
+       ter. Its Perl synonym, \p{Letter}, is not supported.   Furthermore,  in
+       Perl,  many properties may optionally be prefixed by "Is", for compati-
+       bility with Perl 5.6. PCRE2 does not support this.
+
+
+WIDE CHARACTERS AND UTF MODES
+
+       Code points less than 256 can be specified in patterns by either braced
+       or unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3).
+       Larger values have to use braced sequences. Unbraced octal code  points
+       up to \777 are also recognized; larger ones can be coded using \o{...}.
+
+       The  escape sequence \N{U+<hex digits>} is recognized as another way of
+       specifying a Unicode character by code point in a UTF mode. It  is  not
+       allowed in non-UTF modes.
+
+       In  UTF modes, repeat quantifiers apply to complete UTF characters, not
+       to individual code units.
+
+       In UTF modes, the dot metacharacter matches one UTF  character  instead
+       of a single code unit.
+
+       The escape sequence \C can be used to match a single code unit in a UTF
+       mode, but its use can lead to some strange effects because it breaks up
+       multi-unit  characters  (see  the description of \C in the pcre2pattern
+       documentation).
+
+       The use of \C is not supported by  the  alternative  matching  function
+       pcre2_dfa_match() when in UTF-8 or UTF-16 mode, that is, when a charac-
+       ter may consist of more than one code unit. The  use  of  \C  in  these
+       modes  provokes a match-time error. Also, the JIT optimization does not
+       support \C in these modes. If JIT optimization is requested for a UTF-8
+       or  UTF-16  pattern  that contains \C, it will not succeed, and so when
+       pcre2_match() is called, the matching will be carried out by the normal
+       interpretive function.
+
+       The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test
+       characters of any code value, but,  by  default,  the  characters  that
+       PCRE2  recognizes as digits, spaces, or word characters remain the same
+       set as in non-UTF mode, all  with  code  points  less  than  256.  This
+       remains  true  even  when  PCRE2  is  built to include Unicode support,
+       because to do otherwise would slow down matching in many common  cases.
+       Note  that  this also applies to \b and \B, because they are defined in
+       terms of \w and \W. If you want to test for  a  wider  sense  of,  say,
+       "digit",  you  can  use explicit Unicode property tests such as \p{Nd}.
+       Alternatively, if you set the PCRE2_UCP option, the way that the  char-
+       acter  escapes  work  is changed so that Unicode properties are used to
+       determine which characters match. There are more details in the section
+       on generic character types in the pcre2pattern documentation.
+
+       Similarly,  characters that match the POSIX named character classes are
+       all low-valued characters, unless the PCRE2_UCP option is set.
+
+       However, the special  horizontal  and  vertical  white  space  matching
+       escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
+       acters, whether or not PCRE2_UCP is set.
+
+
+CASE-EQUIVALENCE IN UTF MODES
+
+       Case-insensitive matching in a UTF mode makes use of Unicode properties
+       except for characters whose code points are less than 128 and that have
+       at most two case-equivalent values. For these, a direct table lookup is
+       used  for speed. A few Unicode characters such as Greek sigma have more
+       than two code points that are case-equivalent, and these are treated as
+       such.
+
+
+VALIDITY OF UTF STRINGS
+
+       When  the  PCRE2_UTF  option is set, the strings passed as patterns and
+       subjects are (by default) checked for validity on entry to the relevant
+       functions.  If an invalid UTF string is passed, a negative  error  code
+       is returned. The code unit offset to the  offending  character  can  be
+       extracted  from  the match data block by calling pcre2_get_startchar(),
+       which is used for this purpose after a UTF error.
+
+       UTF-16  and  UTF-32  strings  can indicate their endianness by a special
+       code known as a byte-order mark (BOM). The PCRE2 functions do not handle
+       this, expecting strings to be in host byte order.
+
+       A UTF string is checked before any other processing takes place. In the
+       case  of  pcre2_match()  and  pcre2_dfa_match()  calls  with a non-zero
+       starting offset, the check is applied only to that part of the  subject
+       that  could be inspected during matching, and there is a check that the
+       starting offset points to the first code unit of a character or to  the
+       end  of  the subject. If there are no lookbehind assertions in the pat-
+       tern, the check starts at the starting offset. Otherwise, it starts  at
+       the  length of the longest lookbehind before the starting offset, or at
+       the start of the subject if there are not that many  characters  before
+       the  starting offset. Note that the sequences \b and \B are one-charac-
+       ter lookbehinds.
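+
+       The position at which the check starts can be pictured with the  Python
+       sketch  below,  which  steps  back  a  given number of characters in a
+       UTF-8 subject. It is an illustration only: the function  name  and  its
+       arguments  are  hypothetical, and it assumes the bytes before the start-
+       ing offset are well formed.

```python
def utf_check_start(subject: bytes, start_offset: int, max_lookbehind: int) -> int:
    """Return the byte offset at which a validity check would begin.

    `max_lookbehind` is the longest lookbehind in the pattern, counted in
    characters (0 if the pattern has no lookbehind assertions).
    """
    pos = start_offset
    for _ in range(max_lookbehind):
        if pos == 0:
            break  # fewer characters available than the lookbehind length
        pos -= 1
        # Skip UTF-8 continuation bytes (binary 10xxxxxx) to reach the
        # first byte of the previous character.
        while pos > 0 and (subject[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos
```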
+
+       In addition to checking the format of the string, there is a  check  to
+       ensure that all code points lie in the range U+0 to U+10FFFF, excluding
+       the surrogate area. The so-called "non-character" code points  are  not
+       excluded because Unicode corrigendum #9 makes it clear that they should
+       not be.
+
+       Characters in the "Surrogate Area" of Unicode are reserved for  use  by
+       UTF-16,  where they are used in pairs to encode code points with values
+       greater than 0xFFFF. The code points that are encoded by  UTF-16  pairs
+       are  available  independently  in  the  UTF-8 and UTF-32 encodings. (In
+       other words, the whole surrogate thing is  a  fudge  for  UTF-16  which
+       unfortunately messes up UTF-8 and UTF-32.)
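+
+       For reference, the standard UTF-16 pairing arithmetic is shown below as
+       a  short  Python sketch (the function name is hypothetical). It maps a
+       code point greater than 0xFFFF to its high and low surrogates.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    value = code_point - 0x10000         # 20 significant bits remain
    high = 0xD800 + (value >> 10)        # top 10 bits
    low = 0xDC00 + (value & 0x3FF)       # bottom 10 bits
    return high, low
```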
+
+       In  some  situations, you may already know that your strings are valid,
+       and therefore want to skip these checks in  order  to  improve  perfor-
+       mance,  for  example in the case of a long subject string that is being
+       scanned repeatedly.  If you set the PCRE2_NO_UTF_CHECK option  at  com-
+       pile  time  or at match time, PCRE2 assumes that the pattern or subject
+       it is given (respectively) contains only valid UTF code unit sequences.
+
+       Passing PCRE2_NO_UTF_CHECK to pcre2_compile() just disables  the  check
+       for the pattern; it does not also apply to subject strings. If you want
+       to disable the check for a subject string you must pass this option  to
+       pcre2_match() or pcre2_dfa_match().
+
+       If  you  pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the
+       result is undefined and your program may crash or loop indefinitely.
+
+       Note that setting PCRE2_NO_UTF_CHECK at compile time does  not  disable
+       the  error  that  is given if an escape sequence for an invalid Unicode
+       code point is encountered in the pattern. If you want to  allow  escape
+       sequences  such  as  \x{d800}  (a surrogate code point) you can set the
+       PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra option. However, this is pos-
+       sible only in UTF-8 and UTF-32 modes, because these values are not rep-
+       resentable in UTF-16.
+
+   Errors in UTF-8 strings
+
+       The following negative error codes are given for invalid UTF-8 strings:
+
+         PCRE2_ERROR_UTF8_ERR1
+         PCRE2_ERROR_UTF8_ERR2
+         PCRE2_ERROR_UTF8_ERR3
+         PCRE2_ERROR_UTF8_ERR4
+         PCRE2_ERROR_UTF8_ERR5
+
+       The string ends with a truncated UTF-8 character;  the  code  specifies
+       how  many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8
+       characters to be no longer than 4 bytes, the  encoding  scheme  (origi-
+       nally  defined  by  RFC  2279)  allows  for  up to 6 bytes, and this is
+       checked first; hence the possibility of 4 or 5 missing bytes.
+
+         PCRE2_ERROR_UTF8_ERR6
+         PCRE2_ERROR_UTF8_ERR7
+         PCRE2_ERROR_UTF8_ERR8
+         PCRE2_ERROR_UTF8_ERR9
+         PCRE2_ERROR_UTF8_ERR10
+
+       The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of
+       the  character  do  not have the binary value 0b10 (that is, either the
+       most significant bit is 0, or the next bit is 1).
+
+         PCRE2_ERROR_UTF8_ERR11
+         PCRE2_ERROR_UTF8_ERR12
+
+       A character that is valid by the RFC 2279 rules is either 5 or 6  bytes
+       long; these code points are excluded by RFC 3629.
+
+         PCRE2_ERROR_UTF8_ERR13
+
+       A 4-byte character has a value greater than 0x10ffff; these code points
+       are excluded by RFC 3629.
+
+         PCRE2_ERROR_UTF8_ERR14
+
+       A 3-byte character has a value in the  range  0xd800  to  0xdfff;  this
+       range  of  code points is reserved by RFC 3629 for use with UTF-16, and
+       so is excluded from UTF-8.
+
+         PCRE2_ERROR_UTF8_ERR15
+         PCRE2_ERROR_UTF8_ERR16
+         PCRE2_ERROR_UTF8_ERR17
+         PCRE2_ERROR_UTF8_ERR18
+         PCRE2_ERROR_UTF8_ERR19
+
+       A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it  codes
+       for  a  value that can be represented by fewer bytes, which is invalid.
+       For example, the two bytes 0xc0, 0xae give the value 0x2e,  whose  cor-
+       rect coding uses just one byte.
+
+         PCRE2_ERROR_UTF8_ERR20
+
+       The two most significant bits of the first byte of a character have the
+       binary value 0b10 (that is, the most significant bit is 1 and the  sec-
+       ond  is  0). Such a byte can only validly occur as the second or subse-
+       quent byte of a multi-byte character.
+
+         PCRE2_ERROR_UTF8_ERR21
+
+       The first byte of a character has the value 0xfe or 0xff. These  values
+       can never occur in a valid UTF-8 string.
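+
+       The classes above can be summarized by the following Python sketch.  It
+       returns  0  for  a  string that passes, or the number n of the matching
+       PCRE2_ERROR_UTF8_ERRn description. This is an illustration written from
+       the  descriptions  above,  not  the  library's  code: the function name is
+       hypothetical, and the order in which the checks are applied is  an  as-
+       sumption.

```python
def utf8_error_number(data: bytes) -> int:
    """Return 0 if `data` passes, else n for the ERRn class described above."""
    i, length = 0, len(data)
    while i < length:
        c = data[i]
        if c < 0x80:            # single-byte character
            i += 1
            continue
        if c < 0xC0:            # 10xxxxxx cannot start a character
            return 20
        if c >= 0xFE:           # 0xfe and 0xff never occur
            return 21
        # Number of additional bytes announced by the first byte.
        if c < 0xE0:
            extra = 1
        elif c < 0xF0:
            extra = 2
        elif c < 0xF8:
            extra = 3
        elif c < 0xFC:
            extra = 4
        else:
            extra = 5
        missing = extra - (length - i - 1)
        if missing > 0:         # truncated character: ERR1..ERR5
            return missing
        for k in range(1, extra + 1):
            if data[i + k] & 0xC0 != 0x80:
                return 5 + k    # bad 2nd..6th byte: ERR6..ERR10
        d = data[i + 1]
        if extra == 1 and (c & 0x3E) == 0:
            return 15           # overlong 2-byte form
        if extra == 2:
            if c == 0xE0 and (d & 0x20) == 0:
                return 16       # overlong 3-byte form
            if c == 0xED and d >= 0xA0:
                return 14       # surrogate range 0xd800..0xdfff
        if extra == 3:
            if c == 0xF0 and (d & 0x30) == 0:
                return 17       # overlong 4-byte form
            if c > 0xF4 or (c == 0xF4 and d > 0x8F):
                return 13       # value greater than 0x10ffff
        if extra == 4 and c == 0xF8 and (d & 0x38) == 0:
            return 18           # overlong 5-byte form
        if extra == 5 and c == 0xFC and (d & 0x3C) == 0:
            return 19           # overlong 6-byte form
        if extra > 3:
            return 11 if extra == 4 else 12   # 5- or 6-byte character
        i += extra + 1
    return 0
```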
+
+   Errors in UTF-16 strings
+
+       The  following  negative  error  codes  are  given  for  invalid UTF-16
+       strings:
+
+         PCRE2_ERROR_UTF16_ERR1  Missing low surrogate at end of string
+         PCRE2_ERROR_UTF16_ERR2  Invalid low surrogate follows high surrogate
+         PCRE2_ERROR_UTF16_ERR3  Isolated low surrogate
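+
+       A  minimal  Python  sketch  of  these three cases follows. It works on a
+       sequence of 16-bit code units and returns 0 for a  valid  sequence,  or
+       the  number  of  the  matching PCRE2_ERROR_UTF16_ERRn description. The
+       function name is hypothetical and this is not the library's code.

```python
def utf16_error_number(units) -> int:
    """Return 0 if `units` is valid UTF-16, else 1, 2, or 3 as listed above."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:              # high surrogate
            if i + 1 == len(units):
                return 1                       # nothing follows it
            if not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return 2                       # follower is not a low surrogate
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            return 3                           # low surrogate on its own
        else:
            i += 1
    return 0
```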
+
+
+   Errors in UTF-32 strings
+
+       The following  negative  error  codes  are  given  for  invalid  UTF-32
+       strings:
+
+         PCRE2_ERROR_UTF32_ERR1  Surrogate character (0xd800 to 0xdfff)
+         PCRE2_ERROR_UTF32_ERR2  Code point is greater than 0x10ffff
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge, England.
+
+
+REVISION
+
+       Last updated: 02 September 2018
+       Copyright (c) 1997-2018 University of Cambridge.
+------------------------------------------------------------------------------
+
+