Description
New feature, improvement proposal
This is a duplicate of an external license-maven-plugin issue.
I am a sysadmin for a number of sites, one of which is www.gnu.org. Since August we have been getting a significant amount of DDoS abuse on our server infrastructure, which takes our website offline if it goes unchecked. I am not referring to your own CI/CD process, but the CI/CD pipelines of your users are nearly indistinguishable from the other attacks as far as system resources go, and they are not helping at this time. While blocking the attackers, I have also been blocking and rate-limiting these unidentified Java user-agents as I see them. Related: mojohaus/license-maven-plugin#595
Looking in the logs, I saw 327,507 requests that include Java/ in their user-agent field, and every one that I checked was reading a license page. ~55,424 of those are from the last 24 hours.
I recommend providing contact information for your automation in the user-agent string; had it been there, I would have reached out months ago. As far as I can tell, the user-agent string just says some variation of Java and not much else. Please add identifying information such as the name of the plugin and contact information for the organization running the check. The standard way to do this might look like this: Java/23.0.1 (https://www.mojohaus.org/license-maven-plugin/; mailto: cicd@company.com) Without that, I had to rely on the scream test: I kept asking the netops people who complained about being blocked until someone told me the name of the tool.
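To illustrate the user-agent request, here is a minimal sketch of how a Java client could attach such a string. It is not the plugin's actual code: the class name UserAgentSketch, both method names, and the license URL in the usage note are made up for illustration, and it assumes Java 11+ with the standard java.net.http API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; not part of license-maven-plugin.
class UserAgentSketch {

    // Builds "Java/<version> (<projectUrl>; mailto: <contact>)",
    // the shape suggested in this issue.
    static String buildUserAgent(String javaVersion, String projectUrl, String contactEmail) {
        return "Java/" + javaVersion + " (" + projectUrl + "; mailto: " + contactEmail + ")";
    }

    // Creates a GET request that carries the identifying User-Agent header.
    // Nothing is sent here; the caller decides when (and how often) to send it.
    static HttpRequest licenseRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }
}
```

A caller might pass System.getProperty("java.version") as the version and a placeholder URL such as https://www.gnu.org/licenses/gpl-3.0.txt. The contact address should belong to whoever operates the pipeline, so that a blocked user can be reached directly.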
Can you rework the logic of these pipelines to be gentler on our servers? I ask because we self-host all of our infrastructure, and we do not have the system resources to carry the load these CI/CD jobs generate on top of the regular load of serving our webpages and the attacks.
Proposed method 1: A cronjob or process could locally mirror or cache the pages you want to view once per day and all of the CI/CD pipelines could check the internal resource instead of reaching out to our servers several times each day. I said once per day, but really once per week or month would probably be more than enough since these pages do not change often.
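As a rough sketch of method 1 on the client side, something like the following would only contact the remote server when the local copy is older than a chosen maximum age. The class name LicensePageCache, the file layout, and the fetcher callback are assumptions made for illustration; none of this is existing plugin code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical file-backed cache for a single license page.
class LicensePageCache {
    private final Path cacheFile;
    private final Duration maxAge;

    LicensePageCache(Path cacheFile, Duration maxAge) {
        this.cacheFile = cacheFile;
        this.maxAge = maxAge;
    }

    // Returns the cached copy while it is younger than maxAge; otherwise
    // calls the fetcher once, stores the result, and returns it.
    String get(Supplier<String> fetcher) {
        try {
            if (Files.exists(cacheFile)) {
                Instant modified = Files.getLastModifiedTime(cacheFile).toInstant();
                if (modified.plus(maxAge).isAfter(Instant.now())) {
                    return Files.readString(cacheFile);
                }
            }
            String body = fetcher.get(); // the only place a remote request happens
            Path parent = cacheFile.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.writeString(cacheFile, body);
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a maximum age of a week or a month, every pipeline run that shares the cache file would make at most one request per page in that period, instead of one per build.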
Proposed method 2: Instead of mirroring or caching several pages on the site, you could keep a local copy of the repository that the site is generated from, and the CI/CD pipelines could check the files served from that copy. A cronjob or process could update the local repository every so often, which would require the least amount of resources for both of us. You could then build the site locally and serve it on internal infrastructure, so that no logic in the check would need to change. This is ultimately the best way.
I have fail2ban helping out with this issue for me.