<!-- ## Which branch should I use for my PR?
Assuming that:
a = current major release
b = current minor release
c = future major release
* a.x for any features and enhancements (e.g. 4.x)
* a.b for any bug fixes (e.g. 4.0, 4.1, 4.2)
* c.x for any features, enhancements or bug fixes with backward compatibility breaking changes (e.g. 5.x) -->
| Q | A
| -------------------------------------- | ---
| Bug fix? (use the a.b branch) | [n]
| New feature/enhancement? (use the a.x branch) | [enhancement]
| Deprecations? | [y]
| BC breaks? (use the c.x branch) | [ ]
| Automated tests included? | [y]
| Issue(s) addressed | /
<!--
Additionally (see https://contribute.mautic.org/contributing-to-mautic/developer/code/pull-requests#work-on-your-pull-request):
- Always add tests and ensure they pass.
- Bug fixes must be submitted against the lowest maintained branch where they apply
(lowest branches are regularly merged to upper ones so they get the fixes too.)
- Features and deprecations must be submitted against the "4.x" branch.
-->
#### Description:
The deduplicate command used to iterate through all contacts and check whether a duplicate exists for each one. This was very inefficient: on a database with several million contacts it could take weeks to finish.
This PR optimizes the command as follows:
- finds which fields are unique identifiers
- uses those identifiers to search only for contacts that have duplicates
- deduplicates those contacts using the same logic as before
- splits `ContactDeduper` into multiple methods so it can be reused outside the command
- adds new `--batch` and `--processes` options to support parallel execution
- adds a new `mautic:contacts:deduplicate:ids` command, used internally to run the parallel processes for specific contact IDs
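The lookup step above can be sketched as a grouped query: rather than checking each contact individually, ask the database for identifier values that occur more than once. This is a minimal Python/SQLite sketch, not the PR's actual implementation; `find_duplicated_contact_ids` is a hypothetical name, and the `mautic_leads` table and `email` column are borrowed from the test query in the testing steps.

```python
import sqlite3


def find_duplicated_contact_ids(conn, unique_fields):
    """Hypothetical sketch: return {(field, value): [contact ids]} for every
    unique-identifier value shared by two or more contacts."""
    groups = {}
    for field in unique_fields:
        # Field names are assumed to come from trusted field configuration,
        # never from user input, since they are interpolated into the SQL.
        rows = conn.execute(
            f"SELECT {field}, GROUP_CONCAT(id) FROM mautic_leads "
            f"WHERE {field} IS NOT NULL GROUP BY {field} HAVING COUNT(*) > 1"
        ).fetchall()
        for value, ids in rows:
            # GROUP_CONCAT order is not guaranteed, so sort the IDs.
            groups[(field, value)] = sorted(int(i) for i in ids.split(","))
    return groups


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mautic_leads (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO mautic_leads (email) VALUES (?)",
        [("a@sdf.dd",), ("b@sdf.dd",), ("a@sdf.dd",), (None,), ("c@sdf.dd",)],
    )
    print(find_duplicated_contact_ids(conn, ["email"]))  # {('email', 'a@sdf.dd'): [1, 3]}
```

Only the duplicated groups are returned, so the expensive merge logic runs on a handful of contacts instead of on every row.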
Example usage:
```
bin/console mautic:contacts:deduplicate --batch=100 --processes=10
```
This splits the deduplication into multiple processes. Each process handles at most 100 duplicated contacts, and at most 10 processes run concurrently.
The command shows a progress bar of finished processes and, once all of them have finished, displays the output of each one.
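The `--batch`/`--processes` behaviour can be pictured as "chunk the duplicated contact IDs, then run the chunks through a pool capped at N workers". Below is a minimal Python sketch of that scheduling idea only; `chunk`, `run_in_parallel` and the `worker` callback are hypothetical. The real command launches separate `mautic:contacts:deduplicate:ids` processes; here each worker is just a function (which in practice could wrap a subprocess call), and how IDs are passed to that command is not shown.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def chunk(ids, batch):
    """Split ids into consecutive batches of at most `batch` items."""
    return [ids[i:i + batch] for i in range(0, len(ids), batch)]


def run_in_parallel(ids, batch, processes, worker):
    """Run `worker` on each batch with at most `processes` running at once.

    Reports progress as batches finish, then returns all outputs in batch
    order, mirroring "progress bar first, outputs at the end".
    """
    batches = chunk(ids, batch)
    outputs = [None] * len(batches)
    with ThreadPoolExecutor(max_workers=processes) as pool:
        futures = {pool.submit(worker, b): n for n, b in enumerate(batches)}
        for done, future in enumerate(as_completed(futures), start=1):
            outputs[futures[future]] = future.result()
            print(f"{done}/{len(batches)} processes finished")
    return outputs


if __name__ == "__main__":
    def fake_worker(batch_ids):
        # Hypothetical stand-in for one `mautic:contacts:deduplicate:ids` process.
        return f"deduplicated {len(batch_ids)} contacts"

    # 250 duplicated contacts with --batch=100 --processes=10 -> 3 batches.
    for line in run_in_parallel(list(range(1, 251)), batch=100, processes=10, worker=fake_worker):
        print(line)
```

Collecting outputs by batch index keeps the final report stable even though batches may finish in any order.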
#### Steps to test this PR:
<!--
This part is really important. If you want your PR to be merged, take the time to write very clear, annotated and step by step test instructions. Do not assume any previous knowledge - testers may not be developers.
-->
1. Open this PR on Gitpod or pull down for testing locally (see docs on testing PRs [here](https://contribute.mautic.org/contributing-to-mautic/tester))
2. Create multiple contacts with duplicated email addresses. The more contacts you generate, the better, as this is mainly a speed optimization. The expected scenario is a few duplicates among many unique contacts.
3. Run `bin/console mautic:contacts:deduplicate`. Feel free to play with the command parameters.
4. Check that the command output makes sense (for example, that it reports the duplicates you created).
5. Check that the duplicates were merged together the way you expect.
Here is a handy SQL query that creates 12 pairs of duplicated contacts (24 rows):
```sql
INSERT INTO `mautic_leads` (`email`, is_published, points, date_identified) VALUES
('asfd1@sdf.dd', 1, 0, NOW()),
('asfd1@sdf.dd', 1, 0, NOW()),
('asfd2@sdf.dd', 1, 0, NOW()),
('asfd2@sdf.dd', 1, 0, NOW()),
('asfd3@sdf.dd', 1, 0, NOW()),
('asfd3@sdf.dd', 1, 0, NOW()),
('asfd4@sdf.dd', 1, 0, NOW()),
('asfd4@sdf.dd', 1, 0, NOW()),
('asfd5@sdf.dd', 1, 0, NOW()),
('asfd5@sdf.dd', 1, 0, NOW()),
('asfd6@sdf.dd', 1, 0, NOW()),
('asfd6@sdf.dd', 1, 0, NOW()),
('asfd7@sdf.dd', 1, 0, NOW()),
('asfd7@sdf.dd', 1, 0, NOW()),
('asfd8@sdf.dd', 1, 0, NOW()),
('asfd8@sdf.dd', 1, 0, NOW()),
('asfd9@sdf.dd', 1, 0, NOW()),
('asfd9@sdf.dd', 1, 0, NOW()),
('asfd10@sdf.dd', 1, 0, NOW()),
('asfd10@sdf.dd', 1, 0, NOW()),
('asfd11@sdf.dd', 1, 0, NOW()),
('asfd11@sdf.dd', 1, 0, NOW()),
('asfd12@sdf.dd', 1, 0, NOW()),
('asfd12@sdf.dd', 1, 0, NOW());
```
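Step 2 suggests many unique contacts with only a few duplicates. A small, hypothetical Python helper can generate such a dataset in the same shape as the query above (same `mautic_leads` table and columns), split into several INSERT statements so that no single statement grows too large. The row counts and the `rows_per_statement` default are arbitrary choices.

```python
HEADER = "INSERT INTO `mautic_leads` (`email`, is_published, points, date_identified) VALUES\n"


def build_inserts(unique_count, duplicate_pairs, rows_per_statement=1000, domain="sdf.dd"):
    """Return INSERT statements for `unique_count` unique emails plus
    `duplicate_pairs` emails that each appear exactly twice."""
    rows = [f"('unique{i}@{domain}', 1, 0, NOW())" for i in range(1, unique_count + 1)]
    for i in range(1, duplicate_pairs + 1):
        # Two identical rows per duplicated email address.
        rows += [f"('dup{i}@{domain}', 1, 0, NOW())"] * 2
    return [
        HEADER + ",\n".join(rows[start:start + rows_per_statement]) + ";"
        for start in range(0, len(rows), rows_per_statement)
    ]


if __name__ == "__main__":
    # 100,000 unique contacts plus 12 duplicated pairs.
    for statement in build_inserts(100_000, 12):
        print(statement)
```

Redirect the output to a file (e.g. `python generate_contacts.py > contacts.sql`, file name arbitrary) and import it with your MySQL client, e.g. `mysql -u USER -p DATABASE < contacts.sql`.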
#### Other areas of Mautic that may be affected by the change:
1. Only the deduplicate command.
2. The `dedup` command was removed, as it appears to have been created by accident during a merge. Amusingly, the deduplicate command had a duplicate. The community version only has the deduplicate command.
#### List deprecations along with the new alternative:
1. `ContactDeduper::deduplicate()`: use the other methods in this service to compose what you need. See `DeduplicateCommand` for an example.
[//]: # ( As applicable: )
#### List of areas covered by the unit and/or functional tests:
1. The command test verifies that all duplicated contacts are deduplicated.