Hey
I have over 300k leads a month and nearly 100k page hits a day. I'm using the JS API to track page visits.
I noticed that the initial loading of the dashboard with the default widgets after login is very slow. Sometimes it takes more than 15 seconds; sometimes it takes even longer and the page fails to load. Caching helps when I reload the dashboard, but viewing statistics over a longer period is still nearly impossible.
I set up the maintenance cron job to clear page hits and unidentified leads older than 90 days. This did not make the loading of the dashboard any faster.
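For reference, the cron entry looks roughly like this (the path and exact flags vary by Mautic version, so treat it as an illustration):

0 3 * * * php /path/to/mautic/app/console mautic:maintenance:cleanup --days-old=90 --no-interaction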
I tried to figure out what could be the cause of the slowness.
The slow queries:
SELECT DATE_FORMAT(t.date_hit, '%Y %U') AS date, COUNT(distinct(t.lead_id)) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59' GROUP BY DATE_FORMAT(t.date_hit, '%Y-%m-%d') ORDER BY DATE_FORMAT(t.date_hit, '%Y %U') ASC LIMIT 7
SELECT COUNT(t.lead_id) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59'
SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t WHERE (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) >= '0') AND (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) < '60') AND (t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59')
SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t WHERE (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) >= '60') AND (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) < '300') AND (t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59')
SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t WHERE (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) >= '300') AND (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) < '600') AND (t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59')
The ma_page_hits table has 10 million records in it; it is a huge 13 GB table.
In these queries on my dataset, MySQL refuses to use the index, even though there is one on the date_hit field. If I remove the other filters and query only on the date_hit field, the index is used; with the other filters present, it is not.
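For example, stripping the TIMESTAMPDIFF conditions from the first chart query leaves only the date range, and for that reduced statement EXPLAIN shows a range scan on the index:

EXPLAIN SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59';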
I tried to tune MySQL as much as I could to make it use the index. No luck.
I tried increasing the disk bandwidth and memory of the VM. Giving Mautic more resources helped the speed somewhat, but the queries above still take about a minute to run. That is still not comfortable to use, and on top of it I have to pay more.
I dove into the MySQL query optimizer's cost model. (I couldn't tune it, because that is only possible on MySQL 5.7, and I'm on 5.6.) Somehow MySQL considers using the index, and filtering out two thirds of the dataset with it, more costly than a full table scan.
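For reference, on 5.7 the cost constants live in the mysql.server_cost and mysql.engine_cost tables, so the per-page I/O cost could at least be raised there, for example:

UPDATE mysql.engine_cost SET cost_value = 2.0 WHERE cost_name = 'io_block_read_cost';
FLUSH OPTIMIZER_COSTS;

I could not verify on 5.6 whether that would actually flip this particular plan.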
I mentioned query times of around a minute. For example:
SELECT DATE_FORMAT(t.date_hit, '%Y %U') AS date, COUNT(distinct(t.lead_id)) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59' GROUP BY DATE_FORMAT(t.date_hit, '%Y-%m-%d') ORDER BY DATE_FORMAT(t.date_hit, '%Y %U') ASC LIMIT 7
7 rows in set (58.02 sec)
EXPLAIN EXTENDED SELECT DATE_FORMAT(t.date_hit, '%Y %U') AS date, COUNT(distinct(t.lead_id)) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59' GROUP BY DATE_FORMAT(t.date_hit, '%Y-%m-%d') ORDER BY DATE_FORMAT(t.date_hit, '%Y %U') ASC LIMIT 7;
+----+-------------+-------+------+------------------+------+---------+------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys    | key  | key_len | ref  | rows    | filtered | Extra                                        |
+----+-------------+-------+------+------------------+------+---------+------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | t     | ALL  | ma_page_date_hit | NULL | NULL    | NULL | 9330849 |    50.00 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+------------------+------+---------+------+---------+----------+----------------------------------------------+
1 row in set, 1 warning (0.02 sec)
However if I force the index:
EXPLAIN EXTENDED SELECT DATE_FORMAT(t.date_hit, '%Y %U') AS date, COUNT(distinct(t.lead_id)) AS count FROM ma_page_hits t force index(ma_page_date_hit) WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59' GROUP BY DATE_FORMAT(t.date_hit, '%Y-%m-%d') ORDER BY DATE_FORMAT(t.date_hit, '%Y %U') ASC LIMIT 7;
+----+-------------+-------+-------+------------------+------------------+---------+------+---------+----------+--------------------------------------------------------+
| id | select_type | table | type  | possible_keys    | key              | key_len | ref  | rows    | filtered | Extra                                                  |
+----+-------------+-------+-------+------------------+------------------+---------+------+---------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | t     | range | ma_page_date_hit | ma_page_date_hit | 5       | NULL | 4665459 |   100.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+-------+-------+------------------+------------------+---------+------+---------+----------+--------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
7 rows in set (33.29 sec)
The query finishes noticeably faster.
Likewise the chart query:
SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t WHERE (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) >= '0') AND (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) < '60') AND (t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59')
1 row in set (49.29 sec)
Forcing the index:
SELECT COUNT(t.date_hit) AS count FROM ma_page_hits t force index(ma_page_date_hit) WHERE (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) >= '0') AND (TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) < '60') AND (t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59')
1 row in set (23.23 sec)
Suggestion 1
For the dashboard chart queries, force the date index on the queries that read from the page_hits table (as in the FORCE INDEX examples above).
Suggestion 2
Use COUNT(*) instead of COUNT(t.date_hit).
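A minimal example, rewriting the total-hits query above (assuming date_hit is never NULL in this table, so the result is identical):

SELECT COUNT(*) AS count FROM ma_page_hits t WHERE t.date_hit BETWEEN '2017-04-01 00:00:00' AND '2017-05-12 23:59:59';

With COUNT(*) the server only counts matching rows instead of fetching and null-checking the column value for each one.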
Suggestion 3
Add redundant fields: a date field containing the day the hit was recorded, and a time_on_page field storing the difference between date_hit and date_left in seconds. Add indexes, possibly composite ones, on these fields and use them in the dashboard statistics queries, as sketched below.
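A minimal sketch of what that could look like (hit_date, time_on_page and the index name are illustrative, not Mautic's actual schema):

ALTER TABLE ma_page_hits
  ADD COLUMN hit_date DATE NULL,
  ADD COLUMN time_on_page INT NULL,
  ADD INDEX ma_page_hit_date_top (hit_date, time_on_page);

-- one-time backfill; new rows would get these values when the hit is recorded
UPDATE ma_page_hits
   SET hit_date = DATE(date_hit),
       time_on_page = TIMESTAMPDIFF(SECOND, date_hit, date_left);

-- the time-on-page chart buckets can then be answered from the index alone
SELECT COUNT(*) AS count
  FROM ma_page_hits t
 WHERE t.hit_date BETWEEN '2017-04-01' AND '2017-05-12'
   AND t.time_on_page >= 0 AND t.time_on_page < 60;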
Suggestion 4
Create a summary table that contains the daily statistics data.
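An illustrative shape for such a table, filled once per day from a cron job (column names are mine; the bucket boundaries come from the chart queries above):

CREATE TABLE ma_page_hits_daily (
  stat_date DATE NOT NULL PRIMARY KEY,
  hits INT UNSIGNED NOT NULL,
  unique_leads INT UNSIGNED NOT NULL,
  hits_0_to_60s INT UNSIGNED NOT NULL,
  hits_60_to_300s INT UNSIGNED NOT NULL,
  hits_300_to_600s INT UNSIGNED NOT NULL
);

-- nightly rollup of yesterday's hits; MySQL evaluates the boolean
-- expressions to 0/1, so SUM() counts the rows in each bucket
INSERT INTO ma_page_hits_daily
SELECT DATE(t.date_hit),
       COUNT(*),
       COUNT(DISTINCT t.lead_id),
       SUM(TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) BETWEEN 0 AND 59),
       SUM(TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) BETWEEN 60 AND 299),
       SUM(TIMESTAMPDIFF(SECOND, t.date_hit, t.date_left) BETWEEN 300 AND 599)
  FROM ma_page_hits t
 WHERE t.date_hit >= CURDATE() - INTERVAL 1 DAY
   AND t.date_hit < CURDATE()
 GROUP BY DATE(t.date_hit);

One caveat: daily unique_leads values cannot simply be summed into weekly uniques, so widgets that count distinct leads over longer ranges would still need their own rollup.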