Updating to 1.1.7.3 stopped supporting non-ASCII fields in brands/campaigns

edited July 2013 in Troubleshooting

I have just upgraded Sendy to 1.1.7.3, and as a result all the Russian names in brands/campaigns data are now displayed garbled (every character is displayed as if ASCII were in effect, not Unicode).

When I tried to replace them with Russian strings once again, I saw only question marks in the text fields.

Looks like the upgrade has destroyed Unicode support for certain fields.

Updated: better subject set.

What am I expected to do now?

Comments

  • Sorry for typos. I only tried to re-assign one affected brand name/description.

    Note that config.php contains a parameter ($charset) that was missing in previous versions.

    My ability to send any campaigns in Russian is completely paralyzed now, and I would appreciate your assistance in fixing the charset representation (or whatever the problem is).

  • Also: re-uploading the database dump I made prior to applying the upgrade didn't help.

    The problem is definitely in software (Sendy), which stopped supporting Unicode in certain fields.

  • Hi @temmokan,

    1.1.7.3 does not have any changes to do with the charset. Since 1.1.6.3, Sendy sets the character set to UTF8 for each database connection (which you can change in config.php to something else); there have been no changes to character sets in versions after that.

    Send me some Russian characters and I can try on my side.

    Thanks.

    Best regards,
    Ben

  • That doesn't solve my problem. All Unicode strings taken from database are not treated as Unicode, even though $charset = 'utf8' is set in config.php.

    How shall I send you some Russian characters? Will this:

    "Каким образом переслать вам русский текст?"

    do?

  • Note: all default character sets on my side are set to utf8. Issuing the '\s' command after connecting to the DB gives:

    ...
    Server characterset: utf8
    Db characterset: utf8
    Client characterset: utf8
    Conn. characterset: utf8
    ...

  • OK, I start to see the problem.

    In the tables, all text fields have this collation assigned:

    latin1_swedish_ci

    That's not Unicode-friendly. Something like utf8_unicode_ci should be used.

    I managed to change the brand name (apps.app_name) only after I changed the collation to the Unicode-supporting one mentioned above.

    Looks like I now have to walk through the whole DB and manually change all collations.

    Have you actually tested the above Unicode support with real-life non-English entries?

  • OK, the summary of the problem.

    Tables in the Sendy DB were originally created either without an explicit charset declaration per table, or with the default Latin1 charset assigned.

    So Unicode data were written to Latin1 tables. That explains why certain character strings were either corrupted or missing after being written to the database. I proved that simply: I set the default charset of the server to latin1, set $charset in config.php to latin1, and Sendy displayed brands/whatever correctly.

    So the task is to convert the database's text columns to the proper charset without actually converting the stored data.

    If someone knows of an existing tool to do that, I would greatly appreciate it.
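The two symptoms described in this thread (garbled characters and question marks) can be reproduced outside the database. The following is only a minimal sketch: it assumes Python's "latin-1" codec is a close enough stand-in for MySQL's latin1 charset, which is not exactly the same thing, and the variable names are made up for illustration.

```python
# Sketch: what happens when UTF-8 text passes through a Latin-1 layer.
# Assumption: Python's "latin-1" codec approximates MySQL's latin1 charset.

text = "Каким образом"  # sample Russian text quoted earlier in the thread

utf8_bytes = text.encode("utf-8")

# Case 1: the UTF-8 bytes are stored untouched but read back as Latin-1.
# Each Cyrillic letter (2 bytes in UTF-8) turns into two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

# Case 2: the text is really converted to Latin-1, which has no Cyrillic
# letters, so every unrepresentable character is replaced with "?".
question_marks = text.encode("latin-1", errors="replace").decode("latin-1")

# The garbled form still carries the original bytes, so it is recoverable
# by relabelling the bytes instead of converting them.
recovered = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(question_marks)
print(recovered == text)
```

The last step is the important one: as long as the bytes were only mislabelled (case 1), changing the label restores the text, whereas data that went through case 2 is lost.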

  • Hi, earlier versions of Sendy create Latin1 tables during installation (it's a long story); later versions of Sendy create UTF8 tables during installation.

    Don't do anything with your database, just set $charset = 'latin1' in /includes/config.php and you should see everything go back to normal.

  • edited July 2013

    Not everything.

    I have reported a problem: certain character sequences in messages (especially in subjects), such as quotes followed by a Unicode character, result in the string being cut and/or corrupted.

    I can't fathom advising brand owners "If your test message has its subject cut/damaged, try adding spaces after special characters until the problem is gone". As I see it, the system should work fine without glitches (which are inevitable when Latin1 stores Unicode).

    The correct procedure is to convert the database text fields (text fields to blobs and back, specifying the target charset when converting back), so that the charset changes without the stored data actually being converted.

    And, of course, to explicitly specify character sets and collations when creating/altering tables.
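The text-to-blob-and-back idea above can be sketched as a small statement generator. This is not the script from this thread: the function name and the column type VARCHAR(100) are assumptions made for illustration, and only apps.app_name is a column actually named here. Note also that MODIFY redefines the whole column, so attributes such as NOT NULL or defaults would have to be repeated in the type string.

```python
# Sketch: generate ALTER TABLE statements for the text -> blob -> text round trip.
# Hypothetical helper; the column list and the VARCHAR(100) type are assumptions.

# (table, column, original text type)
COLUMNS = [("apps", "app_name", "VARCHAR(100)")]


def blob_round_trip(columns, charset="utf8", collation="utf8_unicode_ci"):
    """Return two lists of statements: to_blob first, to_text second."""
    to_blob, to_text = [], []
    for table, column, col_type in columns:
        # A blob has no charset, so the stored bytes are left untouched.
        to_blob.append(f"ALTER TABLE `{table}` MODIFY `{column}` BLOB;")
        # Going back to a text type only attaches the new charset label.
        to_text.append(
            f"ALTER TABLE `{table}` MODIFY `{column}` {col_type} "
            f"CHARACTER SET {charset} COLLATE {collation};"
        )
    return to_blob, to_text


to_blob, to_text = blob_round_trip(COLUMNS)
for statement in to_blob + to_text:
    print(statement)
```

The statements are only printed, not executed, so they can be reviewed and run by hand against a backup first.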

  • edited July 2013

    OK, since I thought no one would ever bother about converting DB, I did everything myself. Sequence of actions:

    1. Make sure the SQL backend works in Latin1. Set $charset in config.php to latin1 and make sure the Sendy interface displays data correctly.
    2. Convert text fields to blobs (I can send an SQL file with the list of statements to those who could use it).
    3. Dump the database using a command like "mysqldump --hex-blob --default-character-set=latin1 -usendy -p sendy"
    4. In the dumped SQL file, search and replace all occurrences of 'latin1' with 'utf8'
    5. Restart SQL backend, setting utf8 as default charset for server and clients
    6. Drop and re-create sendy DB, explicitly specifying its default collation as utf8_general_ci
    7. Load the modified dump
    8. Convert blob fields back to text fields, specifying utf8 as default charset (once again, I'll send the script to those interested)
    9. Set in config.php $charset to utf8
    10. Open Sendy Web interface and see correct display of Unicode strings

    Have a nice day!
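Step 4 above can be sketched as a short script. This is a slightly more cautious variant than a blanket search and replace, not the procedure exactly as described: it assumes the dump puts each INSERT statement on a line starting with "INSERT INTO" and leaves those lines alone. The function names and file paths are hypothetical.

```python
# Sketch: replace 'latin1' with 'utf8' in a mysqldump file, skipping data rows.
# Assumption: row data lives on lines starting with "INSERT INTO".


def rewrite_dump_line(line: str) -> str:
    """Rewrite charset names in DDL/SET lines; leave INSERT lines untouched."""
    if line.startswith("INSERT INTO"):
        return line
    return line.replace("latin1", "utf8")


def rewrite_dump(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path line by line, applying rewrite_dump_line."""
    # latin-1 maps every byte to one character, so the file content
    # survives the read/write round trip unchanged apart from the edits.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(rewrite_dump_line(line))


print(rewrite_dump_line(") ENGINE=MyISAM DEFAULT CHARSET=latin1;"))
```

A call such as rewrite_dump("sendy-latin1.sql", "sendy-utf8.sql") would then produce the file to load in step 7; both file names are placeholders.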

  • Thanks for listing the steps to help others in future who may need it!

  • edited July 2013

    Here's a .zip with all files in case someone needs them:

    http://files.vpseer.com/patches/sendy.co-transform-DB-to-Unicode.zip

    Thanks!

This discussion has been closed.