#!/usr/bin/perl

=head1 NAME

pristine-tar - regenerate pristine tarballs

=head1 SYNOPSIS

B<pristine-tar> [OPTIONS] gendelta I<tarball> I<delta>

B<pristine-tar> [OPTIONS] gentar I<delta> I<tarball>

B<pristine-tar> [OPTIONS] commit I<tarball> [I<upstream>]

B<pristine-tar> [OPTIONS] checkout I<tarball>

B<pristine-tar> [OPTIONS] list

B<pristine-tar> [OPTIONS] verify I<tarball>

=head1 DESCRIPTION

pristine-tar can regenerate an exact copy of a pristine upstream tarball
using only a small binary I<delta> file and the contents of the tarball,
which are typically kept in an I<upstream> branch in version control.

The I<delta> file is designed to be checked into version control along-side
the I<upstream> branch, thus allowing Debian packages to be built entirely
using sources in version control, without the need to keep copies of
upstream tarballs.

pristine-tar supports compressed tarballs, calling out to pristine-gz(1),
pristine-bz2(1), and pristine-xz(1) to produce the pristine gzip, bzip2,
and xz files.

=head1 COMMANDS

=head2 pristine-tar gendelta I<tarball> I<delta>

This takes the specified upstream I<tarball>, and generates a small binary
delta file that can later be used by pristine-tar gentar to recreate the
tarball.

If the delta filename is "-", it is written to standard output.

=head2 pristine-tar gentar I<delta> I<tarball>

This takes the specified I<delta> file, and the files in the current
directory, which must have identical content to those in the upstream
tarball, and uses these to regenerate the pristine upstream I<tarball>.

If the delta filename is "-", it is read from standard input.

=head2 pristine-tar commit I<tarball> [I<upstream>]

B<pristine-tar commit> generates a pristine-tar delta file for the specified
I<tarball>, and commits it to version control. The B<pristine-tar checkout>
command can later be used to recreate the original tarball based only
on the information stored in version control.

The I<upstream> parameter specifies the tag or branch that contains the
same content that is present in the tarball. This defaults to
"refs/heads/upstream", or if there's no such branch, any
branch matching "upstream". The name of the tree it points to will be
recorded for later use by B<pristine-tar checkout>. Note that the content
does not need to be 100% identical to the content of the tarball, but
if it is not, additional space will be used in the delta file.

The delta files are stored in a branch named "pristine-tar", with filenames
corresponding to the input tarball, with ".delta" appended. This
branch is created or updated as needed to add each new delta.

If I<tarball> already exists previously, it will only be overwritten if it does
not match a hash of the tarball that has been committed to version control.

=head2 pristine-tar checkout I<tarball>

This regenerates a copy of the specified I<tarball> using information
previously saved in version control by B<pristine-tar commit>.

=head2 pristine-tar list

This lists tarballs that pristine-tar is able to checkout from version
control.

=head2 pristine-tar verify I<tarball>

Verifies whether an existing I<tarball> matches the one that has been committed
to version control.

=head1 OPTIONS

=over

=item -v

=item --verbose

Verbose mode, show each command that is run.

=item -d

=item --debug

Debug mode.

=item -k

=item --keep

Don't clean up the temporary directory on exit.

=item -m message

=item --message=message

Use this option to specify a custom commit message to pristine-tar commit.

Applies to the B<commit> command.

=item -s signaturefile

=item --signature-file=signaturefile

Use this option to optionally commit or checkout an upstream signature
file for the tarball. Note that extraction of signatures is not
performed by default.

Applies to the B<commit> and B<checkout> commands.

=item -r

=item --recompress

Use this option to tell pristine-tar that it is OK to recompress the tarball if
1) the tarball can't be regenerated or 2) the delta file produced from the
original tarball is larger than a given threshold (see also
I<-B>/I<--recompress-threshold-bytes> and
I<-T>/I<--recompress-threshold-percent>).

B<Note that this modifies the original tarball on disk>, and you probably
shouldn't use it if you are also storing a upstream GPG signature of the
original tarball (see I<--signature-file>).

On the other hand, the actual contents of the tarball stored by pristine-tar
B<will> be identical to the content of the upstream original tarbal even if
the compressed tarball itself won't be bit-by-bit identical to the one released
by upstream.

A copy of the original tarball will be saved with ".backup" added to its
filename.

Applies to the B<commit> and B<gendelta> commands.

=item -B N

=item --recompress-threshold-bytes=N

Recompress original tarball if the generated delta is larger then I<N> bytes.
Default: 524288000 (500 kB).

If this option is passed explicitly in the command line, it implies
I<--recompress>.

Applies to the B<commit> and B<gendelta> commands.

=item -P N

=item --recompress-threshold-percent=N

Recompress original tarball if the generated delta is larger than I<N>% of the
size of the original tarball. Default: 30.

If this option is passed explicitly in the command line, it implies
I<--recompress>.

Applies to the B<commit> and B<gendelta> commands.

=back

=head1 EXAMPLES

Suppose you maintain the hello package, in a git repository. You have
just created a tarball of the release, I<hello-1.0.tar.gz>, which you
will upload to a "forge" site.

You want to ensure that, if the "forge" loses the tarball, you can always
recreate exactly that same tarball. And you'd prefer not to keep copies
of tarballs for every release, as that could use a lot of disk space
when hello gets the background mp3s and user-contributed levels you
are planning for version 2.0.

The solution is to use pristine-tar to commit a delta file that efficiently
stores enough information to reproduce the tarball later.

    cd hello
    git tag -s 1.0
    pristine-tar commit ../hello-1.0.tar.gz 1.0

Remember to tell git to push both the pristine-tar branch, and your tag:

    git push --all --tags

Now it is a year later. The worst has come to pass; the "forge" lost
all its data, you deleted the tarballs to make room for bug report emails,
and you want to regenerate them. Happily, the git repository is still
available.

    git clone git://github.com/joeyh/hello.git
    cd hello
    pristine-tar checkout ../hello-1.0.tar.gz

=head1 LIMITATIONS

Only tarballs, gzipped tarballs, bzip2ed tarballs, and xzed tarballs
are currently supported.

Currently only the git revision control system is supported by the
"checkout" and "commit" commands. It's ok if the working copy
is not clean or has uncommitted changes, or has changes staged in the
index; none of that will be touched by "checkout" or "commit".

=head1 ENVIRONMENT

=over

=item B<TMPDIR>

Specifies a location to place temporary files, other than the default.

=item B<PRISTINE_TAR>

Defines command line options to be assumed by pristine-tar. Any options passed
explicitly on the command line will override those.

Options will be split on whitespaces, so if you want to pass an option that
needs an argument, use the I<--opt=arg> syntax instead of I<--opt arg>.

=item B<PRISTINE_ALL_XDELTA>

Defines the underlying binary delta tool to be used for new deltas. Supported
values are "xdelta3" (default) and "xdelta".

Existing deltas will be handled with the original tool that was used to
create them, regardless of the value of B<$PRISTINE_ALL_XDELTA>.

=back

=head1 AUTHOR

Joey Hess <joeyh@debian.org>

Licensed under the GPL, version 2 or above.

=cut

use warnings;
use strict;
use Digest::SHA;
use File::Temp;
use Pristine::Tar;
use Pristine::Tar::Delta;
use Pristine::Tar::Formats;
use Pristine::Tar::DeltaTools;
use File::Copy;
use File::Path;
use File::Basename;
use Cwd qw{getcwd abs_path};

# Force locale to C since tar may output utf-8 filenames differently
# depending on the locale.
$ENV{LANG} = 'C';

# Don't let environment change tar's behavior.
delete $ENV{TAR_OPTIONS};
delete $ENV{TAPE};

# Ask tar to please be compatible with version 1.26.  In version 1.27, it
# changed some fields used in longlink entries, and in tar 1.30, it changed the
# output to fix a reproducibility issue.
$ENV{PRISTINE_TAR_COMPAT} = 1;

# The following two assignments are potentially munged during the
# build process to hold the values of TAR_PROGRAM, ZCAT_PROGRAM parameters as
# given to Makefile.PL.

my $tar_program = "tar";
my $zcat_program = "zcat";


# valid versions
my $XDELTA      = "2";
my $XDELTA3     = "3";
my $XDELTA_LONG = "2.0";

my $MIN_VERSION = $XDELTA;
my $MAX_VERSION = $XDELTA3;

my %TOOLS_TO_VERSION = (
  "xdelta"  => $XDELTA,
  "xdelta3" => $XDELTA3,
);

my %VERSION_TO_TOOL = reverse(%TOOLS_TO_VERSION);
$VERSION_TO_TOOL{$XDELTA_LONG} = "xdelta";

my $recompress = 0;
my $recompress_theshold_bytes = 524288000;
my $recompress_theshold_percent = 30;
my ($message, $signature_file);
my $genversion = get_version_from_tool(undef, %TOOLS_TO_VERSION);


unshift(@ARGV, split(/\s+/, $ENV{PRISTINE_TAR} || ''));

dispatch(
  commands => {
    usage    => [ \&usage ],
    gentar   => [ \&gentar, 2 ],
    gendelta => [ \&gendelta, 2 ],
    commit   => [ \&commit ],
    ci       => [ \&commit, 1 ],
    checkout => [ \&checkout, 1 ],
    co       => [ \&checkout, 1 ],
    list     => [ \&list, 0 ],
    verify   => [ \&verify, 1 ],
  },
  options => {
    "m|message=s"        => \$message,
    "s|signature-file=s" => \$signature_file,
    "r|recompress!"      => \$recompress,
    "B|recompress-threshold-bytes=i" => \$recompress_theshold_bytes,
    "P|recompress-threshold-percent=i" => \$recompress_theshold_percent,
  },
);

sub usage {
  print STDERR "Usage: pristine-tar [OPTIONS] gendelta tarball delta
       pristine-tar [OPTIONS] gentar delta tarball\
       pristine-tar [OPTIONS] commit tarball [upstream]
       pristine-tar [OPTIONS] checkout tarball
       pristine-tar [OPTIONS] verify tarball
       pristine-tar [OPTIONS] list
Options:
       -d, --debug               Turn on debugging output
       -v, --verbose             Turn on verbose output
       -k, --keep                Don't delete temporary files
       -h, --help                Display usage information
       -m MSG, --message=MSG     Set commit message
       -s SIG, --signature-file  Set signature file to be stored
                                 together with the tarball
       -r, --recompress          Recompress original tarball if needed
       -B N, --recompress-threshold-bytes=N
                                 Recompress tarball if the delta file is
                                 larger than N bytes.
       -B N, --recompress-threshold-percent=N
                                 Recompress tarball if the delta file is
                                 larger than N% of the tarball size.
";
}

sub unquote_filename {
  my $filename = shift;

  $filename =~ s/\\a/\a/g;
  $filename =~ s/\\b/\b/g;
  $filename =~ s/\\f/\f/g;
  $filename =~ s/\\n/\n/g;
  $filename =~ s/\\r/\r/g;
  $filename =~ s/\\t/\t/g;
  $filename =~ s/\\v/\x11/g;
  $filename =~ s/\\\\/\\/g;

  return $filename;
}

my $recreatetarball_tempdir;
sub recreatetarball {
  my $manifestfile = shift;
  my $source       = shift;
  my %options      = @_;

  my $tempdir = tempdir();

  my @manifest;
  open(IN, "<", $manifestfile) || die "$manifestfile: $!";
  while (<IN>) {
    chomp;
    push @manifest, $_;
  }
  close IN;
  link($manifestfile, "$tempdir/manifest") || die "link $tempdir/manifest: $!";

  # The manifest and source should have the same filenames,
  # but the manifest probably has all the files under a common
  # subdirectory. Check if it does.
  my $subdir = "";
  foreach my $file (@manifest) {
    #debug("file: $file");
    if ($file =~ m!^(/?[^/]+)(/|$)!) {
      if (length $subdir && $subdir ne $1) {
        debug("found file not in subdir $subdir: $file");
        $subdir = "";
        last;
      } elsif (!length $subdir) {
        $subdir = $1;
        debug("set subdir to $subdir");
      }
    } else {
      debug("found file not in subdir: $file");
      $subdir = "";
      last;
    }
  }

  if (length $subdir) {
    debug("subdir is $subdir");
    doit("mkdir", "$tempdir/workdir");
    $subdir = "/$subdir";
  }

  if (!$options{clobber_source}) {
    doit("cp", "-a", $source, "$tempdir/workdir$subdir");
  } else {
    doit("mv", $source, "$tempdir/workdir$subdir");
  }

  # It's important that this create an identical tarball each time
  # for a given set of input files. So don't include file metadata
  # in the tarball, since it can easily vary.
  my $full_sweep = 0;
  foreach my $file (@manifest) {
    my $unquoted_file = unquote_filename($file);

    if (-l "$tempdir/workdir/$unquoted_file") {
      # Can't set timestamp of a symlink, so
      # replace the symlink with an empty file.
      unlink("$tempdir/workdir/$unquoted_file") || die "unlink: $!";
      open(OUT, ">", "$tempdir/workdir/$unquoted_file") || die "open: $!";
      close OUT;
    } elsif (!-e "$tempdir/workdir/$unquoted_file") {
      debug(
"$file is listed in the manifest but may not be present in the source directory"
      );
      $full_sweep = 1;

      if ($options{create_missing}) {
        # Avoid tar failing on the nonexistent item by
        # creating a dummy directory.
        debug("creating missing $unquoted_file");
        mkpath "$tempdir/workdir/$unquoted_file";
      }
    }

    if (-d "$tempdir/workdir/$unquoted_file" && (-u _ || -g _ || -k _)) {
      # tar behaves weirdly for some special modes
      # and ignores --mode, so clear them.
      debug("chmod $file");
      chmod(0755, "$tempdir/workdir/$unquoted_file")
        || die "chmod: $!";
    }
  }

  # Set file times only after modifying of the directory content is
  # done.
  foreach my $file (@manifest) {
    my $unquoted_file = unquote_filename($file);
    if (-e "$tempdir/workdir/$unquoted_file") {
      utime(0, 0, "$tempdir/workdir/$unquoted_file") || die "utime: $file: $!";
    }
  }

  # If some files couldn't be matched up with the manifest,
  # it's possible they do exist, but just with names that make sense
  # to tar, but not to this program. Work around this and make sure
  # such files have their metadata tweaked, by doing a full sweep of
  # the tree.
  if ($full_sweep) {
    debug("doing full tree sweep to catch missing files");
    use File::Find;
    find(
      sub {
        if (-l $_) {
          unlink($_) || die "unlink: $!";
          open(OUT, ">", $_) || die "open: $!";
          close OUT;
        }
        if (-d $_ && (-u _ || -g _ || -k _)) {
          chmod(0755, $_)
            || die "chmod: $!";
        }
      },
      "$tempdir/workdir"
    );
    find(
      sub {
        utime(0, 0, $_) || die "utime: $_: $!";
      },
      "$tempdir/workdir"
    );
  }

  $recreatetarball_tempdir = $tempdir;
  return recreatetarball_helper(%options);
}

sub recreatetarball_helper {
  my %options = @_;
  my $tempdir = $recreatetarball_tempdir;

  my $ret = "$tempdir/recreatetarball";
  my @cmd = (
    $tar_program,     "cf",
    $ret,             "--owner",
    0,                "--group",
    0,                "--numeric-owner",
    "-C",             "$tempdir/workdir",
    "--no-recursion", "--mode",
    "0644",           "--verbatim-files-from",
    "--files-from",   "$tempdir/manifest"
  );
  if (exists $options{tar_format}) {
    push @cmd, ("-H", $options{tar_format});
  }

  doit(@cmd);

  return $ret;
}

sub recreatetarball_longlink_100 {
  # For a long time, Debian's tar had a patch that made it output
  # larger tar files if a filename was exactly 100 bytes. Now that
  # Debian's tar has been fixed, in order to recreate the tarball
  # created by that version of tar, we reply on on an environment
  # variable to turn back on the old behavior.
  #
  # This variable is currently only available in Debian's tar,
  # so users of non-debian tar who want to recreate tarballs from
  # deltas created using the old version of Debian's tar are SOL.

  $ENV{TAR_LONGLINK_100} = 1;
  my $ret = recreatetarball_helper();
  delete $ENV{TAR_LONGLINK_100};
  return $ret;
}

sub recreatetarball_broken_numeric_owner {
  # tar 1.29 and earlier produced incorrect tarballs, including user and
  # group names (which are potentially different across different systems)
  # even when --numeric-owner was passed. This caused tarballs to not be
  # reproducible, and was fixed in tar 1.30. The fix changes the output in a
  # way that some tarballs produced by tar < 1.30 cannot be recreated with
  # tar 1.30+
  #
  # This is a Debian-specific workaround to allow pristine-tar to reproduce
  # tarballs created by tar < 1.30. setting TAR_BROKEN_NUMERIC_OWNER will
  # cause tar to execute the old, broken behavior.
  local $ENV{TAR_BROKEN_NUMERIC_OWNER} = 1;
  return recreatetarball_helper();
}

sub recreatetarball_broken_verbatim {
  # To fix #851286, the option --verbatim-files-from was added by
  # default. But now some older older stored tarballs won't reproduce
  # (#933031). Try again *without* that option to tar.
  my %options = @_;
  my $tempdir = $recreatetarball_tempdir;

  my $ret = "$tempdir/recreatetarball";
  my @cmd = (
    $tar_program,     "cf",
    $ret,             "--owner",
    0,                "--group",
    0,                "--numeric-owner",
    "-C",             "$tempdir/workdir",
    "--no-recursion", "--mode",
    "0644",
    "--files-from",   "$tempdir/manifest"
  );
  if (exists $options{tar_format}) {
    push @cmd, ("-H", $options{tar_format});
  }

  doit(@cmd);

  return $ret;
}

sub gentar {
  my $deltafile = shift;
  my $tarball   = shift;
  my %opts      = @_;

  check_file_exists($deltafile);
  check_directory_exists(dirname($tarball));

  my $delta = Pristine::Tar::Delta::read(Tarball => $deltafile);

  if (-e $tarball && verify_delta($delta, $tarball, fast => 1) == 0) {
    message("$tarball already exists and is valid");
    return;
  }

  Pristine::Tar::Delta::assert(
    $delta,
    type       => "tar",
    maxversion => $MAX_VERSION,
    minversion => $MIN_VERSION,
    fields     => [qw{manifest delta}]
  );

  my $out = (
    defined $delta->{wrapper}
    ? tempdir() . "/" . basename($tarball) . ".tmp"
    : $tarball
  );

  my @try;
  push @try, sub {
    recreatetarball(
      $delta->{manifest}, getcwd,
      clobber_source => 0,
      %opts
    );
  };
  push @try, \&recreatetarball_longlink_100;
  push @try, sub {
    recreatetarball(
      $delta->{manifest}, getcwd,
      clobber_source => 0,
      tar_format     => "gnu",
      %opts
    );
  };
  push @try, sub {
    recreatetarball(
      $delta->{manifest}, getcwd,
      clobber_source => 0,
      tar_format     => "posix",
      %opts
    );
  };
  push @try, \&recreatetarball_broken_verbatim;
  push @try, \&recreatetarball_broken_numeric_owner;

  my $ok;
  my $version = $delta->{version};
  foreach my $variant (@try) {
    my $recreatetarball = $variant->();
    my $ret;
    my $tool;

    my $deltatool = get_delta_tool_by_version($version, %VERSION_TO_TOOL);
    $ret = $deltatool->try_patch($recreatetarball, $delta->{delta}, $out);
    if ($ret == 0) {
      $ok = 1;
      last;
    }
  }
  if (!$ok) {
    error "Failed to reproduce original tarball. Please file a bug report.";
  }

  if (defined $delta->{wrapper}) {
    my $delta_wrapper =
      Pristine::Tar::Delta::read(Tarball => $delta->{wrapper});
    if (grep { $_ eq $delta_wrapper->{type} } qw{gz bz2 xz}) {
      doit(
        "pristine-" . $delta_wrapper->{type},
        ($verbose ? "-v" : "--no-verbose"),
        ($debug   ? "-d" : "--no-debug"),
        ($keep    ? "-k" : "--no-keep"),
        "gen" . $delta_wrapper->{type},
        $delta->{wrapper},
        $out
      );
      doit("mv", "-f", $out . "." . $delta_wrapper->{type}, $tarball);
    } else {
      error "unknown wrapper file type: " . $delta_wrapper->{type};
    }
  }
}

sub genmanifest {
  my $tarball  = shift;
  my $manifest = shift;

  open(IN, "$tar_program --quoting-style=escape -tf $tarball |") || die "tar tf: $!";
  open(OUT, ">", $manifest) || die "$!";
  while (<IN>) {
    chomp;
    # ./ or / in the manifest just confuses tar
    s/^\.?\/+//;
    print OUT unquote_filename($_) . "\n" if length $_;
  }
  close IN;
  close OUT;
}

my $recompressed = 0;

sub gendelta {
  my $tarball   = shift;
  my $deltafile = shift;
  my %opts      = @_;

  check_file_exists($tarball);
  check_directory_exists(dirname($deltafile));

  my $tempdir = tempdir();
  my %delta;

  # Check to see if it's compressed, and get uncompressed tarball.
  my $compression = undef;
  if (is_gz($tarball)) {
    $compression = 'gz';
    open(IN, "-|", "$zcat_program", $tarball) || die "zcat: $!";
    open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
    print OUT $_ while <IN>;
    close IN  || die "zcat: $!";
    close OUT || die "$tempdir/origtarball: $!";
  } elsif (is_bz2($tarball)) {
    $compression = 'bz2';
    open(IN, "-|", "bzcat", $tarball) || die "bzcat: $!";
    open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
    print OUT $_ while <IN>;
    close IN  || die "bzcat: $!";
    close OUT || die "$tempdir/origtarball: $!";
  } elsif (is_xz($tarball)) {
    $compression = 'xz';
    open(IN, "-|", "xzcat", $tarball) || die "xzcat: $!";
    open(OUT, ">", "$tempdir/origtarball") || die "$tempdir/origtarball: $!";
    print OUT $_ while <IN>;
    close IN  || die "xzcat: $!";
    close OUT || die "$tempdir/origtarball: $!";
  }
  close IN;

  # store a hash of the original tarball
  $delta{sha256sum} = get_sha256sum($tarball);

  my $origtarball = $tarball;

  # Generate a wrapper file to recreate the compressed file.
  if (defined $compression) {
    $delta{wrapper} = "$tempdir/wrapper";
    my @gen_wrapper = (
      "pristine-$compression", ($verbose ? "-v" : "--no-verbose"),
      ($debug ? "-d" : "--no-debug"), ($keep ? "-k" : "--no-keep"),
      "gendelta", $tarball,
      $delta{wrapper}
    );

    if (try_doit(@gen_wrapper) != 0) {
      if ($recompress && !$recompressed) {
        recompress($tarball, 'Could not reproduce tarball as-is');
        doit(@gen_wrapper);
        $recompressed = 1;
      } else {
        exit(1);
      }
    }
    $tarball = "$tempdir/origtarball";
  }

  $delta{manifest} = "$tempdir/manifest";
  genmanifest($tarball, $delta{manifest});

  my $recreatetarball;
  if (!exists $opts{recreatetarball}) {
    my $sourcedir = "$tempdir/tmp";
    doit("mkdir", $sourcedir);
    doit($tar_program, "xf", File::Spec->rel2abs($tarball), "-C", $sourcedir);
    # if all files were in a subdir, use the subdir as the sourcedir
    my @out = grep { $_ ne "$sourcedir/.." && $_ ne "$sourcedir/." }
      (glob("$sourcedir/*"), glob("$sourcedir/.*"));
    if ($#out == 0 && -d $out[0]) {
      $sourcedir = $out[0];
    }
    $recreatetarball =
      recreatetarball("$tempdir/manifest", $sourcedir, clobber_source => 1);
  } else {
    $recreatetarball = $opts{recreatetarball};
  }

  $delta{delta} = "$tempdir/delta";
  my $ret;
  my $deltatool = get_delta_tool();
  $ret = $deltatool->try_diff($recreatetarball, $tarball, $delta{delta});

  if ($ret != 0) {
    error "delta program failed with return code $ret";
  }

  if (-s $delta{delta} >= -s $tarball) {
    print STDERR "error: excessively large binary delta for $tarball\n";
    if (!defined $compression) {
      print STDERR
"(Probably the tarball is compressed with an unsupported form of compression.)\n";
    } else {
      print STDERR "(Please consider filing a bug report.)\n";
    }
    exit 1;
  }

  Pristine::Tar::Delta::write(
    Tarball => $deltafile,
    {
      version => $genversion,
      type    => 'tar',
      %delta,
    }
  );

  if ($recompress && !$recompressed && defined($compression)) {
    my $tarballsize = -s $origtarball;
    my $deltasize= -s $deltafile;
    if ($deltasize > $recompress_theshold_bytes || 100 * $deltasize / $tarballsize > $recompress_theshold_percent) {
      # try again
      recompress($origtarball, "Generated delta is too large");
      $recompressed = 1;
      my $newdeltafile = $deltafile . '.new';
      gendelta($origtarball, $newdeltafile, %opts);
      my $newdeltasize = -s $newdeltafile;
      if ($newdeltasize < $deltasize) {
        unlink($deltafile) or die("unlink $deltafile: $!");
        rename($newdeltafile, $deltafile) or die("rename $newdeltafile $deltafile: $!");
      } else {
        unlink($newdeltafile) or die("unlink $newdeltafile: $!");
        unlink($origtarball) or die("unlink $origtarball: $!");
        rename($origtarball . ".backup", $origtarball) or die("rename $origtarball $origtarball.backup: $!");
        print STDERR "warning: recompressing didn't help, delta did not get smaller\n";
        print STDERR "         restoring original tarball\n";
      }
    }
  }
}

sub recompress {
  my $tarball = shift;
  my $reason = shift;

  my $backup = $tarball . '.backup';
  print STDERR "warning: $reason.\n";
  print STDERR "         Let's recompress the tarball, and try again.\n";
  print STDERR "         The original file will be renamed to $backup.\n";

  my ($uncompress, $compress);
  if (is_gz($tarball)) {
    $uncompress = 'gunzip';
    $compress = 'gzip';
  } elsif (is_bz2($tarball)) {
    $uncompress = 'bunzip2';
    $compress = 'bzip2';
  } elsif (is_xz($tarball)) {
    $uncompress = 'unxz';
    $compress = 'xz';
  } else {
    error("unsupported compression: $tarball");
  }

  my $tempdir = tempdir();
  my $tempfile = "$tempdir/uncompressed";
  doit('mv', '-f', $tarball, $backup);
  doit_redir($backup, $tempfile, $uncompress, '-');
  doit_redir($tempfile, $tarball, $compress, '-');
}

sub get_sha256sum {
  my $tarball   = shift;
  my $sha256sum = Digest::SHA->new(256);
  $sha256sum->addfile($tarball);
  return $sha256sum->hexdigest;
}

sub vcstype {
  if (-e ".git" || (exists $ENV{GIT_DIR} && length $ENV{GIT_DIR})) {
    return 'git';
  } else {
    error("cannot determine type of vcs used for the current directory");
  }
}

sub export {
  my $upstream = shift;

  my $dest = tempdir();
  my $id;

  my $vcs = vcstype();
  if ($vcs eq "git") {
    if (defined $upstream && $upstream =~ /[A-Za-z0-9]{40}/) {
      $id = $upstream;
    } else {
      if (!defined $upstream) {
        $upstream = 'upstream';
      }

      my @reflines = map { chomp; $_ } `git show-ref \Q$upstream\E`;
      if (!@reflines) {
        error "failed to find ref using: git show-ref $upstream";
      }

      # if one line's ref matches exactly, use it
      foreach my $line (@reflines) {
        my ($b) = $line =~ /^[A-Za-z0-9]+\s(.*)/;
        if ($b eq $upstream || $b eq "refs/heads/$upstream") {
          ($id) = $line =~ /^([A-Za-z0-9]+)\s/;
          last;
        }
      }

      if (!defined $id) {
        if (@reflines == 1) {
          ($id) = $reflines[0] =~ /^([A-Za-z0-9]+)\s/;
        } else {
          error "more than one ref matches \"$upstream\":\n"
            . join("\n", @reflines);
        }
      }
    }

    # We have an id that is probably a commit. Let's get to the
    # id of the actual tree instead. This makes us more robust
    # against any later changes to the commit.
    my $treeid = `git rev-parse '$id^{tree}'`;
    chomp $treeid;
    $id = $treeid if length $treeid;

    doit("git archive --format=tar \Q$id\E | (cd '$dest' && tar x)");
  } else {
    die "unsupported vcs $vcs";
  }

  return ($dest, $id);
}

sub git_findbranch {
  # Looks for a branch with the given name. If a local branch exists,
  # returns it. Otherwise, looks for a remote branch, and if exactly
  # one exists, returns that. If there's no such branch at all, returns
  # undef. Finally, if there are multiple remote branches and no
  # local branch, fails with an error.
  # If remove_dups is true, then remote branches pointing to the same
  # hash are squashed. This allows to list/checkout in this special case.

  my $branch      = shift;
  my $remove_dups = shift;

  my @reflines = split(/\n/, `git show-ref \Q$branch\E`);
  my @remotes = grep { !m/ refs\/heads\/\Q$branch\E$/ } @reflines;
  if ($#reflines != $#remotes) {
    return $branch;
  } else {
    if ($remove_dups) {
      # squash remotes with the same git hash (first field)
      my (%ids, @new_remotes);
      foreach my $line (@remotes) {
        $line =~ /^(\w+)/;
        if (!$ids{$1}) {
          push @new_remotes, $line;
          $ids{$1} = 1;
        }
      }
      @remotes = @new_remotes;
    }
    if (@reflines == 0) {
      return undef;
    } elsif (@remotes == 1) {
      my ($remote_branch) = $remotes[0] =~ /^[A-Za-z0-9]+\s(.*)/;
      return $remote_branch;
    } else {
      error
"There's no local $branch branch. Several remote $branch branches exist.\n"
        . "Run \"git branch --track $branch <remote>\" to create a local $branch branch\n"
        . join("\n", @remotes);
    }
  }
}

sub checkoutdelta {
  my $tarball = shift;

  my $branch    = "pristine-tar";
  my $deltafile = basename($tarball) . ".delta";
  my $idfile    = basename($tarball) . ".id";
  my $sigfile   = basename($tarball) . ".asc";

  my ($delta, $id, $signature);

  my $vcs = vcstype();
  if ($vcs eq "git") {
    my $b = git_findbranch($branch, 1);
    if (!defined $b) {
      error "no $branch branch found, use \"pristine-tar commit\" first";
    } elsif ($b eq $branch) {
      $branch = "refs/heads/$branch";
    } else {
      # use remote branch
      $branch = $b;
    }

    $delta = `git show $branch:\Q$deltafile\E`;
    if ($?) {
      error "git show $branch:$deltafile failed";
    }
    if (!length $delta) {
      error "git show $branch:$deltafile returned no content";
    }
    $id = `git show $branch:\Q$idfile\E`;
    if ($?) {
      error "git show $branch:$idfile failed";
    }
    chomp $id;
    if (!length $id) {
      error "git show $branch:$idfile returned no id";
    }
    if (defined $signature_file) {
      # We only extract the signature if the user specifically requested
      # it and we assume the data will fit comfortably into memory.
      $signature = `git show $branch:\Q$sigfile\E`;
      if ($?) {
        error "git show $branch:$sigfile failed";
      }
    }
  } else {
    die "unsupported vcs $vcs";
  }

  return ($delta, $id, $signature);
}

sub commitdelta {
  my $delta   = shift;
  my $id      = shift;
  my $tarball = shift;

  my $branch    = "pristine-tar";
  my $deltafile = basename($tarball) . ".delta";
  my $idfile    = basename($tarball) . ".id";
  my $sigfile   = basename($tarball) . ".asc";
  my $commit_message =
    defined $message
    ? $message
    : "pristine-tar data for " . basename($tarball);

  my $vcs = vcstype();
  if ($vcs eq "git") {
    my $tempdir = tempdir();
    open(OUT, ">$tempdir/$deltafile") || die "$tempdir/$deltafile: $!";
    print OUT $delta;
    close OUT;
    open(OUT, ">$tempdir/$idfile") || die "$tempdir/$idfile: $!";
    print OUT "$id\n";
    close OUT;
    if (defined $signature_file) {
      copy($signature_file, "$tempdir/$sigfile") || die "$tempdir/$sigfile: $!";
    }

    # Commit the delta to a branch in git without affecting the
    # index, and without touching the working tree. Aka deep
    # git magick.
    $ENV{GIT_INDEX_FILE} = "$tempdir/index";
    $ENV{GIT_WORK_TREE}  = "$tempdir";
    if (!exists $ENV{GIT_DIR} || !length $ENV{GIT_DIR}) {
      $ENV{GIT_DIR} = getcwd . "/.git";
    } else {
      $ENV{GIT_DIR} = abs_path($ENV{GIT_DIR});
    }
    chdir($tempdir) || die "chdir: $!";

    # If there's no local branch, branch from a remote branch
    # if one exists. If there's no remote branch either, the
    # code below will create the local branch.
    my $b = git_findbranch($branch, 0);
    if (defined $b && $b ne $branch) {
      doit("git branch --track \Q$branch\E \Q$b\E");
    }

    my $branch_exists =
      (system("git show-ref --quiet --verify refs/heads/$branch") == 0);
    if ($branch_exists) {
      doit(
        "git ls-tree -r --full-name $branch | git update-index --index-info");
    }
    doit("git", "update-index", "--add", $deltafile, $idfile);
    if (defined $signature_file) {
      doit("git", "update-index", "--add", $sigfile);
    }
    my $sha = `git write-tree`;
    if ($?) {
      error("git write-tree failed");
    }
    $sha =~ s/\n//sg;
    if (!length $sha) {
      error("git write-tree did not return a sha");
    }
    my $pid = open(COMMIT, "|-");
    if (!$pid) {
      # child
      my $commitopts = $branch_exists ? "-p $branch" : "";
      my $commitsha = `git commit-tree $sha $commitopts`;
      if ($?) {
        error("git commit-tree failed");
      }
      $commitsha =~ s/\n//sg;
      if (!length $commitsha) {
        error("git commit-tree did not return a sha");
      }
      doit("git", "update-ref", "refs/heads/$branch", $commitsha);
      exit 0;
    } else {
      # parent
      print COMMIT $commit_message . "\n";
      close COMMIT || error("git commit-tree failed");
    }

    message("committed $deltafile to branch $branch");
    if (defined $signature_file) {
      message("committed $sigfile to branch $branch");
    }
  } else {
    die "unsupported vcs $vcs";
  }
}

sub commit {
  my $tarball  = shift;
  my $upstream = shift;    # optional

  if (!defined $tarball || @_) {
    usage();
  }

  check_file_exists($tarball);
  if (defined $signature_file) {
    check_file_exists($signature_file);
  }

  my $tempdir = tempdir();
  my ($sourcedir, $id) = export($upstream);
  genmanifest($tarball, "$tempdir/manifest");
  my $recreatetarball = recreatetarball(
    "$tempdir/manifest", $sourcedir,
    clobber_source => 1,
    create_missing => 1
  );
  my $pid = open(GENDELTA, "-|");
  if (!$pid) {
    # child
    gendelta($tarball, "-", recreatetarball => $recreatetarball);
    exit 0;
  }
  local $/ = undef;
  my $delta = <GENDELTA>;
  close GENDELTA || error "failed to generate delta";
  commitdelta($delta, $id, $tarball);
}

sub checkout {
  my $tarball = shift;
  my %opts    = @_;

  check_directory_exists(dirname($tarball));

  my ($delta, $id, $signature) = checkoutdelta($tarball);
  my ($sourcedir, undef) = export($id);
  my $pid = open(GENTAR, "|-");
  if (!$pid) {
    # child
    $tarball = abs_path($tarball);
    chdir($sourcedir) || die "chdir $sourcedir: $!";
    gentar("-", $tarball, clobber_source => 1, create_missing => 1);
    exit 0;
  }
  print GENTAR $delta;
  close GENTAR || error "failed to generate tarball";
  message("successfully generated $tarball") unless $opts{quiet};

  if (defined $signature_file) {
    open(OUT, ">$signature_file") || die "$signature_file: $!";
    print OUT $signature;
    close OUT;
    message("successfully generated $signature_file") unless $opts{quiet};
  }
}

sub list {
  my $branch = "pristine-tar";
  my $vcs    = vcstype();
  if ($vcs eq "git") {
    my $b = git_findbranch($branch, 1);
    if (defined $b) {
      open(LIST, "git ls-tree $b --name-only | sort -Vr |");
      while (<LIST>) {
        chomp;
        next unless s/\.delta$//;
        print $_. "\n";
      }
    }
  } else {
    die "unsupported vcs $vcs";
  }
}

sub verify {
  my $tarball = shift;

  check_file_exists($tarball);

  my ($deltadata, $id, undef) = checkoutdelta($tarball);

  my $deltafile = File::Temp->new();
  print $deltafile $deltadata;

  my $delta = Pristine::Tar::Delta::read(Tarball => $deltafile->filename);

  exit(verify_delta($delta, $tarball));
}

sub check_directory_exists {
  my $dirname = shift;
  if (!-d $dirname) {
    error "The directory $dirname does not exist.";
  }
}

sub check_file_exists {
  my $filename = shift;
  if ($filename ne "-" and !-f $filename) {
     error "The file $filename does not exist.";
  }
}

sub verify_delta {
  my $delta   = shift;
  my $tarball = shift;
  my %opts    = @_;

  my $tarball_hash = get_sha256sum($tarball);
  my $stored_hash;

  if (defined $delta->{sha256sum}) {
    $stored_hash = $delta->{sha256sum};
  } elsif ($opts{fast}) {
    return 1;
  } else {
    my $tmpdir       = File::Temp->newdir();
    my $tmpdirpath   = $tmpdir->{DIRNAME};
    my $orig_tarball = $tmpdirpath . '/' . basename($tarball);
    checkout($orig_tarball, quiet => 1);
    $stored_hash = get_sha256sum($orig_tarball);
  }

  if ($tarball_hash eq $stored_hash) {
    return 0;
  } else {
    message(
"${tarball} does not match stored hash (expected ${stored_hash}, got ${tarball_hash})"
    );
    return 1;
  }
}
