C#-compatible Unicode strings in PHP
I’m in the process of converting a .NET web site to PHP and I want to reuse the same database schema and existing data. One of the issues is that user passwords are stored as hashes of the byte representation of the original Unicode password string. As an example, the hash may be generated using a routine like this:
public static string GetMD5(string text)
{
    // Encoding.Unicode is UTF-16 little-endian; GetBytes emits no byte-order mark
    byte[] inputBytes = System.Text.Encoding.Unicode.GetBytes(text);
    using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
    {
        byte[] hash = md5.ComputeHash(inputBytes);
        return Convert.ToBase64String(hash);
    }
}
It’s worth noting that "Unicode" strings in .NET (Encoding.Unicode) are actually UTF-16 little-endian (UTF-16LE), so each character in the Basic Multilingual Plane becomes two bytes, low byte first. PHP 5 has utf8_encode and utf8_decode, but those only convert between ISO-8859-1 and UTF-8; there is nothing comparable for UTF-16. Luckily, PHP contains a Swiss Army knife of a function for multi-byte string conversions, mb_convert_encoding (from the mbstring extension). Passing UTF-16LE as the target encoding converts a string from its source encoding (the third argument, or PHP's internal encoding if that is omitted) into the same bytes .NET produces. The PHP documentation has a full list of supported encodings. Rewriting the above C# in PHP, we get the following:
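public static function GetMD5($password)
{
    // Re-encode the password (assumed to arrive as UTF-8) as UTF-16LE, matching .NET's Encoding.Unicode
    $pwdUtf16 = mb_convert_encoding($password, 'UTF-16LE', 'UTF-8');
    $hash = md5($pwdUtf16, true); // 'true' returns the raw 16-byte digest instead of a 32-character hex string
    return base64_encode($hash);
}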
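To check that the conversion really produces .NET-style bytes, and to show how the hash might be used at login, here is a minimal sketch. It assumes the mbstring extension is loaded, that submitted passwords arrive as UTF-8, and PHP 5.6 or later for hash_equals. The class name LegacyPasswordHash and the verify() method are hypothetical, chosen for illustration only.

```php
<?php
// Hypothetical helper class for illustration; not part of the original site.
class LegacyPasswordHash
{
    // Same scheme as the .NET GetMD5: MD5 over the UTF-16LE bytes, Base64-encoded.
    // Assumes $password is UTF-8.
    public static function GetMD5($password)
    {
        $pwdUtf16 = mb_convert_encoding($password, 'UTF-16LE', 'UTF-8');
        return base64_encode(md5($pwdUtf16, true));
    }

    // Timing-safe comparison of a submitted password against the stored Base64 hash.
    public static function verify($password, $storedHash)
    {
        return hash_equals($storedHash, self::GetMD5($password));
    }
}

// "Aé€" written as explicit UTF-8 bytes so the result does not depend on how
// this file is saved. In UTF-16LE each of these characters takes two bytes,
// low byte first, with no byte-order mark.
$utf16le = mb_convert_encoding("A\xC3\xA9\xE2\x82\xAC", 'UTF-16LE', 'UTF-8');
echo bin2hex($utf16le), "\n"; // 4100e900ac20

$stored = LegacyPasswordHash::GetMD5('s3cret'); // stand-in for a value read from the database
var_dump(LegacyPasswordHash::verify('s3cret', $stored)); // bool(true)
var_dump(LegacyPasswordHash::verify('S3cret', $stored)); // bool(false)
```

Comparing that hex dump with the bytes returned by Encoding.Unicode.GetBytes on the .NET side is a quick way to confirm both implementations agree before moving any data.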