Skip to content

MySQL Indexes on subscribers.email and subscribers.list

edited May 2013 in Suggestions

I know you said no multiple times on threads that are now closed, but we can't find no reason why this wouldn't be an improvement worth of including in the next version.

We did this in our install after seeing imports hang and never finish, and it doesn't seem to slow down subscriber inserts and updates. To the contrary, a simple index on the email field significantly speeds up CSV imports and subscriber additions (like 20-50 times the speed, for a subscribers table of 200k subscribers on an EC2 micro instance, from 3-5 subscribers every second to around 100 subscribers per second), and the list index significantly speeds up the Lists page (again, like 20-50 times or more).

It should also make subscriber imports and updates faster, since you locate the subscriber by e-mail, and without an index that will take much longer to locate.

(If I'm not mistaken, the only way indexes would slow down the app is if you indexed a field that needs to be editable. Subscribers' e-mails and list IDs never get updated, thus the index is not changed except for new records).

Comments

  • Hi Guido,

    I know you said no multiple times on threads that are now closed, but we can't find no reason why this wouldn't be an improvement worth of including in the next version.

    Yes I have said no before. But on later versions of Sendy, I have added indexes to the database.

    But I have not added an index in the email field as it does not seem to make any difference when I tested.

    I'll test it again since you said CSV import speed became 20 - 50 times faster. How many records was in your CSV when you tested?

    Thanks.

    Ben

  • CSVs had an average of 10k records, although it made imports faster on CSVs of 500-1000 records (much less) as well.

    The lists screen got a great speedup when adding an index to the "list" field as well, all this on an EC2 Micro Instance which hanged up consistently before adding the indexes.

  • Do you mean the "list" column in "subscribers" table? It already have an index though.

  • Yes, that's what I mean. Unless I'm very mistaken, I had to add it manually: Subscribers table came with a primary key (and unique index) on "id", but no indexes. I added an index on subscribers.list and on subscribers.email and it greatly sped up the lists screen and the campaign view.

  • It's the subscribers.list column that needs an index and it already has an index. subscribers.email does not require one.

    Thanks.

    Ben

  • Just to clarify:
    a) The index for subscribers.list speeds up the list view.
    b) The index for subscribers.email speeds up the CSV import process.

    Both things were dead slow when installing Sendy the first time, and are at least ~20 times faster since adding both indexes.

    If you see the subscribers.list index already there on the dump for the Sendy install, that's great -- it just didn't get added for me on install.

    As for the subscribers.email index, I would recommend you add it, since it makes the CSV import feature much faster and usable for much larger lists in much smaller servers (thus, it might make your business larger, sell more licenses, etc. because people won't be restricted to small lists or huge, expensive servers). but it's your product and you're obviously free not to add the index if you want.

  • BenBen
    edited June 2013

    Hi Guido,

    Thanks for your suggestion. I want you to know I appreciate it!

    I've added an index for subscribers.email during my tests but only to realize later it's the index on subscribers.list that matters. However, I'll test it again.

    Thanks once again.

    Best regards,
    Ben

  • Ok thanks!
    The reason the email index matters is that the import-csv.php process queries the subscribers database for each CSV line to see if the subscriber already exists (checking by e-mail). This is extremely slow on a large subscribers database without an email index.

  • I want to 2nd the notion that adding indexes for the import speeds it up MASSIVELY -- we have about 150k total emails but we have to segment out that into different lists which means many more. Before the index importing records would peg our RDS Large database to almost 90% for a good couple of hours - and that was for an import of about 50k emails. After the index was added the import process took about 12ish minutes but our database cpu barely jump up a few percent since it wasn't doing massive "full table scans" non-stop.

    Glad to see Ben and crew officially added indexes.

    Regards,
    Jeremy

  • BenBen
    edited October 2013

    Thanks @guidod and @tennispoint for your input. An index for email is now available. I did a 100,000 records import on a Linode server and it's jaw-droppingly fast.

This discussion has been closed.