recode.pl

NAME
SYNOPSIS
OPTIONS

NAME

recode.pl - Converts a database from one encoding (or multiple encodings) to UTF-8.

SYNOPSIS

 contrib/recode.pl [--dry-run] [--guess [--show-failures]] [--charset=iso-8859-2]
                   [--overrides=file_name]

  --dry-run        Don't modify the database.

  --charset        Primary charset your data is currently in. This can be
                   omitted if you use --guess.

  --guess          Try to guess the charset of the data.

  --show-failures  If we fail to guess, show where we failed.

  --overrides      Specify a file containing overrides. See --help
                   for more info.

  --help           Display detailed help.

 If you aren't sure what to do, try:

   contrib/recode.pl --guess --charset=cp1252

OPTIONS

--dry-run

Don't modify the database; just print out what the conversions would be.

recode.pl will print out a Key for each item. You can use this in the overrides file, described below.

--guess

If your database contains data in several different encodings, specify this switch and recode.pl will do its best to determine the original charset of each piece of data. The detection is usually very reliable.

If recode.pl cannot guess the charset, it will leave the data alone, unless you've specified --charset.

--charset=charset-name

If you do not specify --guess, then your database is converted from this character set into UTF-8.

If you have specified --guess, recode.pl will use this charset as a fallback: when it cannot guess the charset of a particular piece of data, it will assume that the data is in this charset and convert it from this charset to UTF-8.
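The interaction between --guess and --charset can be summarized with a small sketch. This is an illustration in Python of the rules described above, not recode.pl's actual code; the function names pick_source_charset and to_utf8 are hypothetical, and the guessed charset is passed in as a plain value rather than detected.

```python
from typing import Optional


def pick_source_charset(guessed: Optional[str],
                        fallback: Optional[str],
                        guess_enabled: bool) -> Optional[str]:
    """Choose the charset to convert from, following the documented rules.

    guessed       -- result of charset detection, or None if detection failed
    fallback      -- value of --charset, or None if it was not given
    guess_enabled -- True when --guess was specified
    """
    if guess_enabled and guessed is not None:
        # A successful guess is used as the source charset.
        return guessed
    # Without --guess, or when guessing failed, use --charset.
    # If that is None too, the caller leaves the data alone.
    return fallback


def to_utf8(data: bytes, charset: Optional[str]) -> bytes:
    """Convert data from charset to UTF-8; leave it untouched if charset is None."""
    if charset is None:
        return data
    return data.decode(charset).encode("utf-8")
```

For example, with --guess --charset=cp1252, a value whose charset cannot be guessed is treated as cp1252, while with --guess alone the same value is left unchanged.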

charset-name must be a charset that is known to Perl's Encode module. To see a list of available charsets, run:

perl -MEncode -e 'print join("\n", Encode->encodings(":all"))'

--show-failures

If --guess fails to guess a charset, print out the data it failed on.

--overrides=file_name

This is a way of specifying encodings that override whatever --guess would choose. The file is a series of lines. Each line should start with the Key from --dry-run, followed by a space, followed by the encoding you'd like to use for that item.
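To make the file format concrete, here is a minimal Python sketch that reads such a file into a mapping from Key to encoding. It is not how recode.pl itself parses the file. It assumes that a Key contains no spaces and that blank lines may be skipped, neither of which is stated above, and the keys used in the usage note are made-up placeholders rather than real --dry-run output.

```python
from typing import Dict


def parse_overrides(text: str) -> Dict[str, str]:
    """Parse overrides text: one 'Key encoding' pair per line."""
    overrides: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Assumption: blank lines are ignored.
            continue
        # Assumption: the Key has no spaces, so the first space
        # separates it from the encoding name.
        key, sep, encoding = line.partition(" ")
        encoding = encoding.strip()
        if not sep or not encoding:
            raise ValueError(f"malformed override line: {line!r}")
        overrides[key] = encoding
    return overrides
```

Given the two lines "key-one iso-8859-2" and "key-two cp1252", this returns a dictionary mapping key-one to iso-8859-2 and key-two to cp1252.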