7,000,000 Spam Traps

I have a list of 7 million+ known spam traps, and I'm scratching my head about how to put it to use in Mautic.
I'd like to discuss the best way to implement an email quality-control mechanism within Mautic, or as part of data hygiene before the data is uploaded.

One of my business interests is buying distressed companies, or the intellectual property of distressed companies, and as part of due diligence there is currently no way to vet their email lists and practices. I want to protect the integrity of our mission to return these companies to profitability, and so far the expense of correcting poor list hygiene and list-building practices has been astronomical.

Any thoughts?


Your objectives aren't clear. Are you sure all 7 million addresses are spam traps? Then why would you want to email them? Or do you want to clean a list of spam traps? I assume you are aware that there are services that do that. So what exactly is your question?

Good afternoon. I am positive that all the addresses are confirmed spam traps. My objective is to set up a process that uses my own data from these lists to clean existing lists.

I don't know why anyone would intentionally email spam traps, as you mentioned, so I won't comment on that.
I am aware that services already do this, but I have two goals: eliminate the expense of third-party services, and integrate this as a function of Mautic.

OK, I understand now. You want to offer spam trap checking of email addresses as a service. I don't know if it would be feasible to ship a list of 7 million email addresses with a Mautic installation. Also, the list of spam traps may change over time; I'm not sure whether you plan to keep it updated.

Speaking of list hygiene, there are probably additional tests that should be run to make sure an email address is really valid.

I'm not quite sure yet where you see your part in the picture. Maybe you could set up an email validation service with an API that lets sites check whether an email is valid. Then a Mautic plugin would be needed that connects to the API and queries the validity of emails on the list, or even checks email validity right on the signup form.

It would surely be possible to write a plugin that has a local list as well. The question is just how the list would be maintained. And proper email validation surely involves much more than just spam traps. I'm not sure what would be involved in doing all of this within a plugin.

I was already thinking it would be nice to have a plugin that connects to an existing email verification service, for example https://www.ipqualityscore.com/ . They give 5000 API requests per month for free, which could be good enough for most sites. If you have more than 5000 signups per month, you can probably afford to pay for their services.

The simplest way to do this is:

  • create a small PHP script that checks an email address, e.g.
    https://spamtrapdefuser.com?email=emailaddress&id=1234
  • create a campaign that pings this address via webhook with the proper info
  • once you have the result, the script can simply tap into your Mautic via the API and set Do Not Contact or a tag of your choice.

I have something similar set up with one of the verification services' APIs, and it works pretty well. (Not cheap :smiley: )
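A rough sketch of the checking side of that script follows. It is written in Python rather than PHP, and it assumes the spam trap list is a plain-text file with one address per line and that the webhook sends email and id as query parameters, as in the example URL above. The function names are hypothetical, and the actual Mautic API call (Do Not Contact or tag) is deliberately left out; check the Mautic REST API documentation for your version.

```python
from urllib.parse import parse_qs


def load_traps(lines):
    """Build a lookup set from an iterable of addresses, one per line.

    Addresses are stripped and lowercased so lookups are case-insensitive.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def check_webhook_query(query_string, traps):
    """Handle a query string such as 'email=someone%40example.com&id=1234'.

    Note: parse_qs turns '+' into a space, so the webhook must URL-encode
    plus-addressed emails ('+' as %2B).
    Returns a dict the caller can use to decide what to do in Mautic.
    """
    params = parse_qs(query_string)
    email = params.get("email", [""])[0].strip().lower()
    contact_id = params.get("id", [""])[0]
    is_trap = email in traps
    return {
        "id": contact_id,
        "email": email,
        "is_spam_trap": is_trap,
        # Hypothetical action label; map it to a DNC or tag call yourself.
        "action": "set_do_not_contact" if is_trap else "none",
    }
```

Usage would be along the lines of traps = load_traps(open("spam_traps.txt")) at startup, then one check_webhook_query call per webhook hit. A Python set of 7 million addresses needs a lot of memory (very roughly a gigabyte), so a database table with an index on the email column may be the more practical store.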

@joeyk nice thoughts there.

We have a small spinoff business we created specifically because of the high cost of email hygiene. I am toying with the idea of integrating this into Mautic and offering the community some extra special rates. Will keep you posted.

Off-topic: back in my mailing days I used to pay $3.5/month to use the https://haveibeenpwned.com/ API. If an email had shown up in a bunch of breaches, I just removed it from the list automatically.
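A minimal sketch of that idea, assuming the Have I Been Pwned v3 "breachedaccount" endpoint, which needs an hibp-api-key header and a user-agent header and answers 404 when an address is in no breach. The threshold of 5 breaches is a hypothetical choice, not a recommendation, and the API is rate-limited, so a large list needs pauses between calls.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def breach_count(email, api_key, user_agent="list-hygiene-script"):
    """Return how many breaches HIBP lists for email (0 if none found)."""
    url = API_URL + urllib.parse.quote(email)
    req = urllib.request.Request(url, headers={
        "hibp-api-key": api_key,
        "user-agent": user_agent,
    })
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            # The response body is a JSON array with one entry per breach.
            return len(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the address is not in any known breach
            return 0
        raise  # e.g. 401 bad key, 429 rate limited


def should_remove(count, threshold=5):
    """Hypothetical rule: drop addresses seen in at least threshold breaches."""
    return count >= threshold
```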

Here’s an idea; this one is for manually cleaning large lists.

Step 1: Start with a clean instance of Mautic with no contacts in it. Then upload all of your client’s emails to it; be sure to set the lead source to the client name.

Step 2: Now take your 7+ million list of emails. Create a custom field in Mautic called “Spam Trap”. In your Excel file, add a “Spam Trap” column and set it to YES for every row.

Step 3: Upload your spam trap list to Mautic. Each row will either MERGE with an existing email from Step 1 or create a new contact. Be sure to allocate plenty of memory to your Mautic instance and run the import in the background with a cron job. Be patient: sometimes it will time out, but don’t worry, because when the cron runs again it will pick up where it left off.

Step 4: Create a segment using the filter lead source = client name from step 1.

Step 5: Download the segment with all of your client’s leads. The leads that now have Spam Trap set to YES are the ones that matched your spam trap list.
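The same one-off merge can also be done outside Mautic, before anything is uploaded, which avoids loading 7 million contacts into the instance. This is a sketch, assuming the client list is a CSV with a column named email (a hypothetical name; adjust it) and that the trap addresses are already in a lowercased Python set. It adds the same “Spam Trap” YES/NO column the steps above produce.

```python
import csv


def flag_spam_traps(client_csv, traps, out_csv, email_column="email"):
    """Copy client rows to out_csv, adding a 'Spam Trap' column (YES/NO).

    client_csv and out_csv are open text file objects; traps is a set of
    lowercased addresses. Returns the number of rows flagged YES.
    """
    reader = csv.DictReader(client_csv)
    fieldnames = list(reader.fieldnames or []) + ["Spam Trap"]
    writer = csv.DictWriter(out_csv, fieldnames=fieldnames)
    writer.writeheader()
    flagged = 0
    for row in reader:
        email = (row.get(email_column) or "").strip().lower()
        hit = email in traps
        if hit:
            flagged += 1
        row["Spam Trap"] = "YES" if hit else "NO"
        writer.writerow(row)
    return flagged
```

Open both files with newline="" as the csv module recommends. The client file is streamed row by row, so only the trap set has to fit in memory.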

Uploading 7 million emails will slow down segment creation BIG time.
But I would be interested to see how it goes.

Haha, agreed. I have never tried an instance of Mautic with 7 million emails… However, if your server is set up properly it could probably handle it. I was under the impression you needed this solution just for “one-off” cleans, hence my suggestion. If it needs to be dynamic, another option would be needed.

I agree this seems like the simplest way.
I also agree with you about the cost. Based on a relatively low-cost service mentioned here, my costs could run to almost 1,000,000.

There is a way you can do this with a script. It isn’t a 100% bulletproof way, but it would save you a ton of bucks.

Create a script that checks for incorrect email addresses, e.g. anything written as @gimail or @ggmail. Those could be deleted automatically; if you don’t want to delete them, you can loop through the list and write them to a separate file, such as incorrect_email.csv. Or you can just check whether the MX records exist instead (see below).
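A sketch of the typo check, assuming a small hand-maintained map of misspelled domains. The entries in TYPO_DOMAINS are illustrative only; extend the map with the misspellings you actually find. Flagged addresses go to incorrect_email.csv instead of being deleted.

```python
import csv

# Hypothetical typo map (misspelled domain -> probable intended domain).
# Extend it with the misspellings you actually find in your lists.
TYPO_DOMAINS = {
    "gimail.com": "gmail.com",
    "ggmail.com": "gmail.com",
    "gmial.com": "gmail.com",
    "hotmial.com": "hotmail.com",
}


def split_typos(emails):
    """Split addresses into (good, typos); typos holds (email, suggestion)."""
    good, typos = [], []
    for email in emails:
        domain = email.rsplit("@", 1)[-1].strip().lower()
        if domain in TYPO_DOMAINS:
            typos.append((email, TYPO_DOMAINS[domain]))
        else:
            good.append(email)
    return good, typos


def write_typos(typos, path="incorrect_email.csv"):
    """Write the flagged addresses to a CSV instead of deleting them."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["email", "suggested_domain"])
        writer.writerows(typos)
```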

Another super useful step is to cross-check your mailing list against a list of disposable email providers. If anything on your list matches, delete it right away; it’s disposable, and you don’t want to deal with it. Here is a list that is always kept up to date: https://gist.github.com/michenriksen/8710649
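A sketch of that cross-check, assuming the downloaded list is plain text with one domain per line (check the format of the file you actually download). It also matches subdomains of a listed domain, which is a choice made here rather than something the list itself specifies.

```python
def load_domains(lines):
    """Parse a domain list (one per line); ignores blanks and '#' comments."""
    domains = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.add(line)
    return domains


def is_disposable(email, disposable_domains):
    """True if the address's domain, or any parent domain, is on the list."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    parts = domain.split(".")
    # Tries 'a.b.mailinator.com', 'b.mailinator.com', 'mailinator.com', 'com'.
    return any(".".join(parts[i:]) in disposable_domains
               for i in range(len(parts)))
```

For example, with domains = load_domains(open("disposable_domains.txt")) (a hypothetical file name), keep only the addresses for which is_disposable(email, domains) is False.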

Before checking whether the actual email address exists at all, you can also create a script that checks whether the domain still has a mail server online. When you send mail to a user, the sending server first looks up the MX records of the recipient’s domain to find out where to deliver the message. For example, dig +short gmail.com mx returns:

10 alt1.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.
20 alt2.gmail-smtp-in.l.google.com.
5 gmail-smtp-in.l.google.com.

If it doesn’t return anything, you can reasonably assume the domain is no longer receiving mail, so you can write those addresses to a file or just delete them right away.
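A sketch of the MX check, assuming the dig command-line tool is installed; it simply runs the same dig +short ... mx command shown above and parses its output. Lines that are not "preference host" pairs are skipped. An empty result can also come from a DNS timeout, so it may be worth retrying before deleting anything.

```python
import subprocess


def parse_mx(dig_output):
    """Parse `dig +short <domain> mx` output into (preference, host) pairs.

    Sorted by preference: the lowest number is the most preferred server.
    """
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].isdigit():
            records.append((int(fields[0]), fields[1].rstrip(".")))
    return sorted(records)


def lookup_mx(domain, timeout=10):
    """Run dig and return the parsed MX records (empty list if none).

    Raises subprocess.TimeoutExpired if dig takes longer than timeout.
    """
    result = subprocess.run(
        ["dig", "+short", domain, "mx"],
        capture_output=True, text=True, timeout=timeout, check=False,
    )
    return parse_mx(result.stdout)
```

Applied to the gmail.com output above, parse_mx puts (5, "gmail-smtp-in.l.google.com") first.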

Even if a mail server exists, that doesn’t mean the email address exists on it. It’s even worse if the server uses a catch-all alias, in which case any address will appear valid. Still, it’s better than nothing.

The last step would be to telnet to the SMTP server returned by the dig command (just pick one, or use the one with the lowest preference number, which is the most preferred), simulate sending, and then monitor the response:

250 2.1.0 OK
550-5.1.1 The email account that you tried to reach does not exist. - Gmail
452-4.2.2 The email account that you tried to reach is over quota. - Gmail
554 delivery error: dd Not a valid recipient - Yahoo
etc
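The same conversation can be scripted with Python’s smtplib instead of telnet. This sketch sends EHLO, MAIL FROM and RCPT TO, then QUIT (via the with block), and never sends DATA, so no message is delivered. The sender address is a placeholder, the reply-code buckets are a simplification, and many VPS providers block outbound port 25; as noted later in this thread, Gmail may also block you after a few dozen probes.

```python
import smtplib


def classify_rcpt(code):
    """Map an SMTP RCPT TO reply code to a rough verdict."""
    if 200 <= code < 300:
        return "accepted"   # e.g. 250 2.1.0 OK (could still be a catch-all)
    if 400 <= code < 500:
        return "temporary"  # e.g. 452 over quota: retry later, don't delete
    if 500 <= code < 600:
        # e.g. 550 5.1.1 no such account, 554 not a valid recipient.
        # A 5xx can also mean your own IP was blocked, so check the message.
        return "rejected"
    return "unknown"


def probe(email, mx_host, sender="check@example.com", timeout=20):
    """Ask mx_host whether it would accept mail for email, without sending."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.mail(sender)  # placeholder sender; use a domain you control
        code, _message = smtp.rcpt(email)
    return classify_rcpt(code)
```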

You can create a script that does these 4 things in one pass, but don’t bombard the servers, since your list is huge; you’ll want to throttle it (maybe with cron) or rotate IPs. Like I said above, this isn’t a 100% bulletproof way, but it would save you a couple of bucks :slight_smile:
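One way to tie the checks together is sketched below. It runs them one after another per address rather than simultaneously, stops at the first failed check, and pauses before every network check after the first one. The check functions are plugged in by the caller (for example, wrappers around the typo, disposable, MX and SMTP checks described above), and the fixed delay is the simplest throttle; a per-domain delay would be gentler on big providers.

```python
import time


def clean_list(emails, checks, delay_seconds=2.0, sleep=time.sleep):
    """Run each check in order; an address is kept only if all pass.

    checks is a list of (name, func, uses_network) where func(email)
    returns True if the address passes. Put cheap offline checks first so
    the network checks (MX, SMTP) run on as few addresses as possible.
    Returns (kept, removed); removed holds (email, failed_check_name).
    """
    kept, removed = [], []
    network_calls = 0
    for email in emails:
        for name, check, uses_network in checks:
            if uses_network:
                if network_calls:
                    sleep(delay_seconds)  # throttle all network calls but the first
                network_calls += 1
            if not check(email):
                removed.append((email, name))
                break
        else:
            kept.append(email)
    return kept, removed
```

The sleep parameter exists so the function can be tested without actually waiting.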

Edit: If you want to embark on this journey, you can rent a disposable VPS (5 bucks or so) for it. You don’t want to do this on your actual mail server; just a heads-up.

WOW, that is certainly a lot to understand. I will try to do this one step at a time. I appreciate your expertise in this matter, as I have none. Just learning! I hope I can get it going before Cyber Monday.
On another note, I have added the new cron jobs to the server. I have 1800 names in the list that went through OK. I will check for errors, then try a trial run. Can I delete the list of 96K addresses from the server, and where? Will this remove the 1800 that are OK now?
Thank you,
Ron

Hmm, this is off-topic, but are you on shared hosting or using a VPS?

I am on shared hosting. I was told it was OK, but only for 2, not 3.

Mautic can check the MX server with the built-in ‘if email valid’ campaign step, so No. 2 is an existing feature.

No. 3 will let you check 50-100 Gmail addresses at most before you are blocked. It’s not scalable.

I have tried this, but unfortunately it doesn’t work. I am on v3.0.1; does it work with the current version?

I use “Has Valid Email” in a standalone campaign and set tags depending on the result: Valid Email and Email Did Not Validate. Then I create a segment where Tag includes Email Did Not Validate and delete those contacts via a campaign.
I’m running 3.1.1.

Oh okay. I’m still on 2.16.3 :slight_smile:
It does work for me, but I didn’t find this feature very useful.

I think the best is:

  • don’t buy/scrape
  • if you do, use a professional cleaning service. They do much more than just remove MX-less domains

Of course! It is not the best way to build relationships with your customers/clients.