All my emails show up as unopened and links don't work

LawrenceBernard · October 2012

Hi

I have recently started using Sendy. I followed all the instructions and have correctly setup Sendy (I have double checked that all the settings are correct as far as I can tell).

When I send an email campaign and view the reports it shows that no emails have been opened, even though I have opened numerous emails and clicked Show Pictures on all of them. Additionally when I include a link in the campaign and I click the link I get to a page with an error. The URL I end up on is http://mydomain:myport/sendy/l/d/Resource id #3.

I am not using the Send Test Mail option - I am sending the campaign properly.

Would it make a difference that my sendy is setup using a port like 8087? Or what else could cause these issues?

Thanks

DrCocktor · October 2012

I'm also having the same problem with a brand new install of 1.0.9.

DrCocktor · October 2012

After some further troubleshooting, I've determined that this seems to be the case if the .htaccess file is not uploaded to sendy. I'm not sure exactly how the redirect was ending up where it was, but uploading the .htaccess file solved the problem.

However, in debugging the problem, I noticed something else that troubles me quite a lot: link access is saved by saving a list of user ids in the links table. This is horrendously slow. Every single time a link is clicked it requires fetching and pushing an increasingly large data structure from the database to the php runtime and then back to the database.

The snippet from the links table looks like this in SQL:


+----+-------------+----------------+-----------------------------------+---------------+
| id | campaign_id | ares_emails_id | link                              | clicks        |
+----+-------------+----------------+-----------------------------------+---------------+
|  2 |           1 |           NULL | http://sendy.co/                  | 1             |
|  3 |           1 |           NULL | http://newsletter.mysite.com/     | 10,10,10,10,1 |
+----+-------------+----------------+-----------------------------------+---------------+

As the number of clicks on a link grows, the size of data moving to and from the database becomes absurd.

Assuming 100,000 user in the system, the median user id will be 5 characters long, plus one character for each comma, means that the size of the links column will grow by 6 bytes per click. Worse, however, each time a link is click all previous data is loaded from the database and then re-persisted to the database. This means that over if a link has 1000 unique clicks, the 1000th click will load about 6k data from the database and then save about 6KB data to the database, just for that one click. Furthermore, over the course of that 1,000 clicks, more that 6MB of data will have gone in and out of the database in recording the 1,000 clicks on that link. This puts a tremendous strain on the database, even for a relatively modest link count. Doubling the link click count to just 2,000 puts the total data at a staggering 24MB in and out of SQL just to record that one link.

Furthermore, this is not an atomic or durable access pattern, assuming that PHP is running in a multi-request environment (which it usually is), multiple incoming requests will result in some, but not all, of the link requests being non-deterministicly saved. Consider:

User 1 hits the link page and the clicks field is returned as A
User 2 hits the link page and the clicks field is returned as A
User 2 saves the link field as A,2
User 1 saves the link field as A,1

The final state of the link field is now A,1, not A,2,1 as expected. Furthermore, there is no deterministic way to know which one will be saved, and no way of knowing if such data loss occurred.

In addition, this data is nearly un-queryable. To determine the links that a specific user has clicked on, you have to load the entire click history of every campaign that user has ever been part of, perform explode (making another copy of all that data) and then walk through it counting link occurrence. This results in a ghastly runtime. Again, using my example of average 1000 clicks per link and assuming 2 links per campaign and that a given user has received 12 campaigns, then the system will load 144MB of data (and then double it in live memory with the explode) and perform over 20,000 comparison operations (in_array is O(n)) even if that user has never clicked a single link (for a specific occurrence of such a loop, see line 157 of /includes/subscribers/subscriber-info.php).

This is an extremely anti-performant patttern, and I strongly suggest that you move clicks into a separate table. With the proper use of indices, the time it takes to record each click should fall significantly.

Setting aside the messiness in the database, the redirection code only succeeds if the insert of all that data into the database succeeds. Unfortunately, this means that in a worst case, not only does the user not get recorded, but they don't get re-directed either, and instead are simply dumped on a blank page.

Finally, you use the bare mysql_* functions for DB access. This isn't the end of the world, but they are in the process of getting deprecated, and now might be a good time to switch (before you have too much more code in).

derivativemedia · October 2012

Any response on this Ben?

Ben · October 2012

@DrCocktor 1 click = 6b (depending on id length), 1000 clicks = 6kb, I'm not sure how you derive at 6MB and 25MBs.

Initially clicks was intended to be recorded per row in a table, but it will quickly balloon to thousands and thousands of rows. The maximum no. of rows a mySQL allows will quickly be reached. That is the reason why clicks are stored in a LONGTEXT column instead.

DrCocktor · October 2012

@Ben

I agree, one click generates 6 Bytes (big "B"), and the actual size of the longtext column is 6 KB, however, the total data transferred in and out of the database is cumulative. That is to say, the nth click will have resulted in the partial sum from 1 to n of 6i+6(i-1).

I'm quite confused by your assertion that there is an upper limit on the number of rows in a MySQL table. It is true that the (now largely deprecated) MyISAM engine has a 4GB limit on table size (without some DB engine changes), but the more common InnoDB engine has only a 64TB (or in some cases larger limit) on size. I concede that even for some massive power users this may not be enough, however, a database written using a clean structuring of the data could have the keys sharded across multiple nodes, if it ever comes to that. Furthermore, 64TB can hold just shy of 9 trillion clicks, assuming that your table definition uses just two integers (campaign_id and subscriber_id). Of course this would only allow you to store unique clicks, but with the simple addition of a datetime field so that the table is (campaign_id, subscriber_id, click_time), then you can still store a perfectly respectable 4 and change trillion clicks, and, moreover, can answer some very interesting questions like:

How many clicks do I get each hour for the first n hours of the campaign
Is a particular user more or less active during a certain hour of the day
What is the average (mean, median, mode) time after a campaign is sent that an open occurrs
What is a users average clickrate over multiple campaigns (technically, you can calculate this one now, but as stated earlier, it requires loading the entire click history of every campaign into memory)

You may read this and say "Ok, academically speaking he's right, but the performance seems OK at the moment, so I'm not going to worry about it". I strongly encourage you not to select this option. While the performance seems acceptable on small datasets it will quickly erode. Furthermore, by munging strings you have taken all relational strength away from the data. In a day and age with ultra fast key-value stores (such as redis, etc), the only compelling reason to use a strong relational DB such as MySQL is so that you can perform ad-hoc sql queries, except by effectively obfuscating the data, that no longer works.

derivativemedia · October 2012

you and @Ben should team up to solve this structural issue for ver 1.1 or perhaps 2.0 in this case!

Ben · October 2012

The reason for our implementation is very simple - to lengthen the lifespan of your database. Hence we keep click data in one column. Just imagine blog posts with thousands of words. These words are kept in one column as well.

Can you imagine your database filling up quickly with millions of rows as a result of clicks? Now, querying through these enormous amount of data does not help either. If click data are stored in one column associated with its link, there's less to query and would result in faster performance. Not everything needs to be coded like its set in stone. Sometimes a little creativity is needed to solve a much bigger problem. And the simplest solution wins. Take Reddit for example, Reddit's database only has 2 tables - http://kev.inburke.com/kevin/reddits-database-has-two-tables/

DrCocktor · October 2012

@Ben

Your reason is worthy but your implementation is flawed. In general, a blog post is one unit of data, and as such is stored as a single longtext field. However, this has many significant disadvantages. For instance, since longtext is stored outside of index-able space, performing a search on the text of that blog post is exceedingly expensive. In fact, many heavily used blogs use an external search server like lucene (or solr) to perform searching. A typical implementation of this is to store all of the word prefixes in a tree with each leaf of the tree containing just the id to go back and retrieve the entire block of text that generated that leaf. So in fact, each word is stored "separately", in a sense.

Redis may well only have two tables, but redis shards their data across multiple datastores, and uses a key/value organizations. That is, (based on my understanding of the article), they store many many rows where the row has only four columns, type, id, key and value. Thus, for instance, a row entry might look like (user_click, 1, 1, null), representing that a user_click event occurred on campaign 1 by user 1. Similarly, user metadata (custom fields) would be stored in the same table as something like (user_meta, 1, name, DrCocktor), indicating that user id 1 has a key of name with the associated value attached.

There is a great deal of debate as to the speed associated with using a key/value table, but it seems to work well for many projects (see for example, the post_meta table in Wordpress which exactly implements this pattern). Notice, however, that each column never contains more than one atomic value. I would recommend for you, however, that you stick to the somewhat simpler "normal" form of simply having a "user_clicks" table to join the subscribers table to the campaigns table.

As to your second answer that click data stored in one column is faster to query, I disagree. As we've already established, in order to answer simple questions like "show me all the clicks for this user", you have to load all of the click data for every campaign that user has ever been a part of. That doesn't seem fast at all, and in fact it's not -- it's really, really slow. In addition, by using a longtext column you significantly slow down the speed of each query, as the longtext column lives on disk and is not available directly from the index. See http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html, for instance.

Furthermore, you seem confused by the nature of databases. There is not (as a general statement) a finite number of rows in a database table. With InnoDB you can store 64TB of data, whether it's in one very, very long row or millions of very, very short rows. In fact, a simple joining table consumes just 8 Bytes per click recorded (16 if you include time). Similarly, having a lot of rows does not slow down the query, as MySQL does not do a complete table scan on every query, instead, it uses a flavor of B-Trees to perform indexed queries that allow it to complete complex lookups in milliseconds. In comparison, your current approach does not take advantage of any indexing strategy -- you really have to load all the data and manually scan over it in cubic (or worse) time.

Finally, and most importantly, this implementation does not record all hits. Plain and simple. Setting aside whether you think that my architectural critiques are correct, the fact of the matter is that race conditions exists where some hits are ignored because of the order data is saved in. To solve this, you could a) wrap everything in transactions, making the already pitiful performance worse, or b) just fix it.

As a parting note, you say that "not everything needs to be set in stone," and I suspect that this is a dig at me. After all, the people who are usually pushing for more normalized SQL fields are old fuddy-duddies working at legacy enterprises who want to normalize the data because that's how they've always done things. I assure you, that's not the case with me. I'll denormalize data when appropriate, but in this case there is no conceivable benefit from leaving the data in it's current form and significant performance downsides.

I know -- nobody ever wants to change SQL syntax that seems to be working, but really, it's not so bad.

Afterburner · November 2012

The issue above was related to the .htaccess file but the fix didn't work since the .htaccess file was already there. The issue was that apache2 has .htaccess files disabled by default for sites in /var/www in ubuntu.

The fix for this can be found here
https://help.ubuntu.com/community/EnablingUseOfApacheHtaccessFiles

All my emails show up as unopened and links don't work

Comments