recode.pl - Converts a database from one encoding (or multiple encodings) to UTF-8.


 contrib/recode.pl [--guess [--show-failures]] [--charset=iso-8859-2]

  --dry-run        Don't modify the database.

  --charset        Primary charset your data is currently in. Can be
                   omitted if you use --guess.

  --guess          Try to guess the charset of the data.

  --show-failures  If we fail to guess, show where we failed.

  --overrides      Specify a file containing overrides. See --help
                   for more info.

  --help           Display detailed help.

 If you aren't sure what to do, try:

   contrib/recode.pl --guess --charset=cp1252



With --dry-run, recode.pl doesn't modify the database; it just prints out what the conversions will be.

recode.pl prints out a Key for each item. You can use these Keys in the overrides file, described below.


If your database is in several different encodings, specify --guess and recode.pl will do its best to determine the original charset of each piece of data. The detection is usually very reliable.

If recode.pl cannot guess the charset, it will leave the data alone, unless you've specified --charset.


If you do not specify --guess, your database is converted from the character set given by --charset=charset-name into UTF-8.

If you have specified --guess, recode.pl uses the --charset value as a fallback: when it cannot guess the charset of a particular piece of data, it assumes the data is in that charset and converts it from that charset to UTF-8.

The value given to --charset must be a charset name known to Perl's Encode module. To see a list of available charsets, run:

   perl -MEncode -e 'print join("\n", Encode->encodings(":all")), "\n"'


With --show-failures, if --guess fails to guess a charset, recode.pl prints out the data it failed on.


--overrides names a file that specifies encodings for particular items, overriding whatever --guess would choose. The file is a series of lines. Each line starts with the Key from --dry-run, followed by a space and then the encoding you'd like to use.