Optimised storage for email statistics / view in browser tokens / open details

My idea is:

I’ve been running a patch to maintain the email_stats growth a bit better and wonder if this is something that might help others so wanted to get some feedback and any further idea.

Previously, email_stats contained all the following data for each email send to each user:

  • The statistics for the send, so recording it was sent, and recording number of opens
  • The tokens for viewing in the browser, which is very large amount of data
  • The user agents for all opens

This meant you could send up 100s of GBs after many many sends and before you get anywhere near the retention period of the emails you are sending (could be years).

The idea in the PR is to split this out to independent tables with their own cleanup:

  • email_stats - Keeps only the statistics elements - in testing this reduces the table to single figure GB if the patch is run, compared to without it could be 10s/100s
  • email_stats_data - Keeps the view in web browser tokens which are large. Has a separate configuration defaulting to 180 days for storage, so cleans up much faster than previously without impact
  • email_stats_open_details - Keeps the user agents on opens, again with a much shorter retention as it only seems to be used in contact view logs. This data tends to be minuscule in comparison.

Another benefit is hopefully reducing free space issues in email_stats. Because open details are append only to another table, it means email_stats should end up fairly static row lengths and less updates, meaning less fragmentation.

I think these groups of people would benefit from this idea:

System admins running Mautic

Why I think they would benefit from this idea:

Better control over database usage with additional retention for the big data such as view in browser tokens. Less fragmentation of tables.

Any code or resources to support this idea:

Are you willing to work on this idea?:

Yes

What skills and resources do you need to explore this further?

Just some feedback and discussion to hear if this is useful to others.

I do plan to look further at the View in browser tokens in future though to further reduce tokens - as the vast majority of these tokens can be regenerated on-demand and do not need to be stored. I had a basic POC but this bit can get pretty large change as the way tokens are generated needs to be more controlled rather than the current mechanism where discovery and pattern replace for thing like tracking tokens (which are the biggest contributor if you have lots of links in emails) happens all in one sweep.