I’ve been running a patch to maintain the email_stats growth a bit better and wonder if this is something that might help others so wanted to get some feedback and any further idea.
Previously, email_stats contained all the following data for each email send to each user:
The statistics for the send, so recording it was sent, and recording number of opens
The tokens for viewing in the browser, which is very large amount of data
The user agents for all opens
This meant you could send up 100s of GBs after many many sends and before you get anywhere near the retention period of the emails you are sending (could be years).
The idea in the PR is to split this out to independent tables with their own cleanup:
email_stats - Keeps only the statistics elements - in testing this reduces the table to single figure GB if the patch is run, compared to without it could be 10s/100s
email_stats_data - Keeps the view in web browser tokens which are large. Has a separate configuration defaulting to 180 days for storage, so cleans up much faster than previously without impact
email_stats_open_details - Keeps the user agents on opens, again with a much shorter retention as it only seems to be used in contact view logs. This data tends to be minuscule in comparison.
Another benefit is hopefully reducing free space issues in email_stats. Because open details are append only to another table, it means email_stats should end up fairly static row lengths and less updates, meaning less fragmentation.
I think these groups of people would benefit from this idea:
System admins running Mautic
Why I think they would benefit from this idea:
Better control over database usage with additional retention for the big data such as view in browser tokens. Less fragmentation of tables.
Any code or resources to support this idea:
Are you willing to work on this idea?:
Yes
What skills and resources do you need to explore this further?
Just some feedback and discussion to hear if this is useful to others.
I do plan to look further at the View in browser tokens in future though to further reduce tokens - as the vast majority of these tokens can be regenerated on-demand and do not need to be stored. I had a basic POC but this bit can get pretty large change as the way tokens are generated needs to be more controlled rather than the current mechanism where discovery and pattern replace for thing like tracking tokens (which are the biggest contributor if you have lots of links in emails) happens all in one sweep.