There are no specific recommendations in this post; you will need to choose parameters for your own situation. However, I hope it adds some deeper understanding of the recommendations that others make. With that said…
The primary parameter for Argon2 is memory. Increasing memory also increases processing time.
The time cost parameter is intended to make the running time longer when memory usage can’t be increased further.
The threads (or parallelism, or “lanes” in the RFC) parameter subdivides the memory. When the memory is specified as 64 MiB, that is the total amount used, whether threads is 1 or 32. More threads fill that memory in parallel, so the hash finishes sooner, but synchronization overhead makes the speedup sub-linear, and this is more pronounced with smaller memory sizes. As expected, SMT threads offer even less speedup than the same number of separate physical cores.
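To make the subdivision concrete, here is a small sketch of how a memory setting is split across lanes, based on my reading of RFC 9106: memory is counted in 1 KiB blocks, the requested amount is rounded down to a multiple of 4 × lanes, and each lane gets an equal share. The function name `lane_layout` is made up for illustration, and it does no hashing at all.

```python
def lane_layout(memory_kib: int, lanes: int) -> tuple[int, int]:
    """Return (total_blocks, blocks_per_lane) for an Argon2 memory setting.

    Assumes the RFC 9106 layout: 1 KiB blocks, the total rounded down to a
    multiple of 4 * lanes, then divided evenly between the lanes.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    if memory_kib < 8 * lanes:
        raise ValueError("Argon2 needs at least 8 KiB of memory per lane")
    # Round down so every lane holds a whole number of 4-block slices.
    total_blocks = 4 * lanes * (memory_kib // (4 * lanes))
    return total_blocks, total_blocks // lanes


if __name__ == "__main__":
    for lanes in (1, 4, 32):
        total, per_lane = lane_layout(64 * 1024, lanes)
        print(f"{lanes:>2} lanes: {total} KiB total, {per_lane} KiB per lane")
```

With 64 MiB the total stays at 65536 KiB in every case; at 32 lanes each lane works on only 2048 KiB (2 MiB), which may be part of why small memory sizes scale poorly across many threads.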
I did some tests on my laptop, which has 4 P-cores and 8 E-cores (16 threads on 12 physical cores). The 256 MiB tests could only push actual CPU usage to about 600% (compared to the 1260% we might expect); it took 1 GiB or more to reach 1000% CPU. Using more threads than cores didn’t achieve anything.
Overall, then, a higher thread count allows for using more memory in the same amount of time, provided enough CPU cores are available to support it. If memory and threads are both in limited supply, then time cost is the last resort for stretching the operation until it takes long enough.
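That order of priorities can be written down as a rough tuning loop. This is only a sketch under assumptions: `measure` is a hypothetical callback that you would supply to run one hash with the given parameters and return the elapsed seconds, the halving step and the `max_time_cost` cap are arbitrary choices of mine, and the limits and target are placeholders rather than recommendations.

```python
from typing import Callable, Tuple


def tune(measure: Callable[[int, int, int], float],
         max_memory_kib: int,
         max_threads: int,
         target_seconds: float,
         max_time_cost: int = 64) -> Tuple[int, int, int]:
    """Pick (memory_kib, time_cost, threads): memory first, time cost last.

    measure(memory_kib, time_cost, threads) is a hypothetical callback that
    hashes once with those parameters and returns the elapsed seconds.
    """
    # Use every available thread so the most memory fits in the time budget.
    memory_kib, time_cost, threads = max_memory_kib, 1, max_threads

    if measure(memory_kib, time_cost, threads) > target_seconds:
        # Too slow even with a single pass: give up memory until it fits.
        while (memory_kib // 2 >= 8 * threads
               and measure(memory_kib, time_cost, threads) > target_seconds):
            memory_kib //= 2
    else:
        # Memory is already at its limit: add passes until it takes long enough.
        while (time_cost < max_time_cost
               and measure(memory_kib, time_cost, threads) < target_seconds):
            time_cost += 1

    return memory_kib, time_cost, threads
```

Real timings are noisy, so in practice `measure` would probably want to average several runs, on hardware similar to where the hashing will actually happen.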
Bonus discovery: in PHP, the memory that argon2id uses is separate from, and not counted against, the memory limit.
memory_get_peak_usage() reported the same number at the beginning and end of my test script, even for the tests using 1 GiB or more.