Hello,
Recently, we had DoS-like episodes on the main Linaro git server, http://git.linaro.org , which affected a number of Linaro users, including users of the Gerrit system, http://review.linaro.org .
These episodes were related to unfriendly usage of the native protocol, git:// (service port 9418). The implementation of this protocol is known to be resource-hungry and not to scale to many connections and users. The issue itself is not new: it has affected us in waves over the last 3 years, and a resolution was established a year ago, providing 2 HTTP-based protocols (the so-called "dumb" and "smart" protocols) as a more scalable replacement.
So, this is a gentle reminder that use of the git:// protocol is discouraged for Linaro engineers, and completely unsupported(*1) for third parties. Based on the analysis and outcome of the current DoS-like activity, we may need to make git:// access more limited and strict. So, please kindly:
1. Check the URLs you use for cloning and updating your local trees. If you use "ssh://" or "http(s)://" protocols, you're ok. If you use git://, please switch to using an http-based protocol instead. In most cases, this requires just replacing the "git://" scheme with "http://". If in doubt, please visit the gitweb page for your repositories, which lists all supported URLs to clone a repository, e.g.: https://git.linaro.org/arm/arm-trusted-firmware.git
2. If you set up or oversee CI or automated build jobs, please audit and apply similar changes to them.
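For CI configurations that carry many clone URLs, the replacement described in item 1 can be scripted. The following is a minimal Python sketch, not an official tool: the function name switch_scheme is made up for illustration, and it assumes the repository path is identical under both protocols, which holds in most cases but should be checked against the gitweb page.

```python
from urllib.parse import urlsplit, urlunsplit


def switch_scheme(url, new_scheme="http"):
    """Rewrite a git:// clone URL to an HTTP-based one.

    Hypothetical helper: assumes the host and repository path stay the
    same when moving away from the native git protocol.
    """
    parts = urlsplit(url)
    if parts.scheme != "git":
        # ssh://, http:// and https:// URLs are already fine; leave them alone.
        return url
    # Use only the hostname so an explicit native-protocol port
    # (such as :9418) is not carried over to the HTTP URL.
    host = parts.hostname or ""
    return urlunsplit((new_scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(switch_scheme("git://git.linaro.org/arm/arm-trusted-firmware.git"))
```

Passing new_scheme="https" gives the HTTPS form. For interactive use, git's url.<base>.insteadOf configuration setting can apply the same rewrite on the client without editing each remote.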
Thanks, Paul, on behalf of Linaro Systems Team and ITS
(*1) Unsupported in the current context means that "git://" URLs are not published in up-to-date information, and there's no warranty that any 3rd party will be able to complete a clone successfully using this protocol.
On Thu, Aug 28, 2014 at 10:05 AM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
Recently, we had DoS-like episodes on the main Linaro git server, http://git.linaro.org , which affected a number of Linaro users, including users of the Gerrit system, http://review.linaro.org .
These episodes were related to unfriendly usage of the native protocol, git:// (service port 9418). The implementation of this protocol is known to be resource-hungry and not to scale to many connections and users. The issue itself is not new: it has affected us in waves over the last 3 years, and a resolution was established a year ago, providing 2 HTTP-based protocols (the so-called "dumb" and "smart" protocols) as a more scalable replacement.
So, this is a gentle reminder that use of the git:// protocol is discouraged for Linaro engineers, and completely unsupported(*1) for third parties. Based on the analysis and outcome of the current DoS-like activity, we may need to make git:// access more limited and strict. So, please kindly:
So why does this affect us but not kernel.org?
- Check the URLs you use for cloning and updating your local trees. If you use "ssh://" or "http(s)://" protocols, you're ok. If you use git://, please switch to using an http-based protocol instead. In most cases, this requires just replacing the "git://" scheme with "http://". If in doubt, please visit the gitweb page for your repositories, which lists all supported URLs to clone a repository, e.g.: https://git.linaro.org/arm/arm-trusted-firmware.git
- If you set up or oversee CI or automated build jobs, please audit and apply similar changes to them.
So this is problematic, because there are folks out there in the community who already use the git:// urls for fetching work from the Linaro repos. (The 0day build/test bot, for instance..).
While the git:// urls are now off the gitweb (which is good for future users), this wasn't the case previously.
We already went through one painful transition where our URLs got scrambled, and I've had a few situations where folks have just recently realized that we still had trees, but the URLs were just different. So it's quite frustrating to have to go through that again.
What would be required to just make the git:// urls work properly?
Is this mainly an issue with the Android repos? If we reduce the git:// url load on the worst users, would that improve things enough? Do you have stats on which trees are hardest hit?
(*1) Unsupported in the current context means that "git://" URLs are not published in up-to-date information, and there's no warranty that any 3rd party will be able to complete a clone successfully using this protocol.
So as someone who has sent git pull requests in the past with the git urls, this is terrifying (and makes me hesitant to further use the linaro infrastructure). Do you have a pointer to why the git urls aren't coherent?
thanks -john
On 08/28/2014 12:28 PM, John Stultz wrote:
So, this is a gentle reminder that use of the git:// protocol is discouraged for Linaro engineers, and completely unsupported(*1) for third parties. Based on the analysis and outcome of the current DoS-like activity, we may need to make git:// access more limited and strict. So, please kindly:
So why does this affect us but not kernel.org?
We are still trying to understand this. We have some ideas for coping with this I've detailed at the bottom of this reply.
I'd mention that Google doesn't support the git protocol and GitHub is trying to discourage its usage by not advertising git:// links.
We already went through one painful transition where our URLs got scrambled, and I've had a few situations where folks have just recently realized that we still had trees, but the URLs were just different. So it's quite frustrating to have to go through that again.
NOTE: it's discouraged, not prevented.
What would be required to just make the git:// urls work properly?
Is this mainly an issue with the Android repos? If we reduce the git:// url load on the worst users, would that improve things enough? Do you have stats on which trees are hardest hit?
So far we've just had two attacks. The user was hitting multiple repos - I don't think it really had to do with a specific tree. This was against git.linaro.org, but the android server suffers from the same vulnerability.
We do have to try and take some steps to mitigate this risk right now. I'm sending out a separate email to the company on this, but let me say briefly:
1) Right now, my team is just looking at the problem mostly manually to identify when these attacks occur. We've come up with a quick way to block the attack that should allow the server to keep running for everyone else.
2) As #1 doesn't scale, we'd also like to change how the server is configured. The git-daemon itself only has logic for throttling the total number of concurrent connections. This still allows a single user to DoS us.
We'd like to create a new iptables rule that will only allow 3 concurrent connections to port 9418 from an IP address not in our EC2 cloud. Based on the previous attacks we've had this should mitigate our risk while also letting CI jobs run how they always have.
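To make the intended policy concrete, here is a small Python model of the decision logic only. It is a sketch, not the iptables rule itself: the class name PerIpLimiter is invented for illustration, and the exempt range 10.0.0.0/8 is a placeholder assumption standing in for our EC2 addresses, which are not listed here.

```python
import ipaddress
from collections import Counter


class PerIpLimiter:
    """Toy model of 'at most N concurrent connections per outside IP'."""

    def __init__(self, limit=3, exempt=("10.0.0.0/8",)):
        # limit: concurrent connections allowed per non-exempt address.
        # exempt: placeholder networks that are never throttled (assumption).
        self.limit = limit
        self.exempt = [ipaddress.ip_network(net) for net in exempt]
        self.active = Counter()

    def _is_exempt(self, addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self.exempt)

    def try_open(self, addr):
        """Return True if a new connection from addr would be accepted."""
        if self._is_exempt(addr):
            return True
        if self.active[addr] >= self.limit:
            return False
        self.active[addr] += 1
        return True

    def close(self, addr):
        """Record that one connection from addr has finished."""
        if self.active[addr] > 0:
            self.active[addr] -= 1


if __name__ == "__main__":
    limiter = PerIpLimiter()
    # The fourth simultaneous connection from one outside address is refused.
    print([limiter.try_open("203.0.113.7") for _ in range(4)])
```

The point of counting per address rather than in total is that one busy client runs into its own ceiling while everyone else, including CI jobs in the exempt range, is unaffected.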
(*1) Unsupported in the current context means that "git://" URLs are not published in up-to-date information, and there's no warranty that any 3rd party will be able to complete a clone successfully using this protocol.
So as someone who has sent git pull requests in the past with the git urls, this is terrifying (and makes me hesitant to further use the linaro infrastructure). Do you have a pointer to why the git urls aren't coherent?
I'm not sure what Paul meant here. I see no reason why the git URLs would become invalid for as long as we support the native git protocol.
Hello John,
On Thu, 28 Aug 2014 10:28:06 -0700 John Stultz john.stultz@linaro.org wrote:
On Thu, Aug 28, 2014 at 10:05 AM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
Recently, we had DoS-like episodes on the main Linaro git server, http://git.linaro.org , which affected a number of Linaro users, including users of the Gerrit system, http://review.linaro.org .
These episodes were related to unfriendly usage of the native protocol, git:// (service port 9418). The implementation of this protocol is known to be resource-hungry and not to scale to many connections and users. The issue itself is not new: it has affected us in waves over the last 3 years, and a resolution was established a year ago, providing 2 HTTP-based protocols (the so-called "dumb" and "smart" protocols) as a more scalable replacement.
So, this is a gentle reminder that use of the git:// protocol is discouraged for Linaro engineers, and completely unsupported(*1) for third parties. Based on the analysis and outcome of the current DoS-like activity, we may need to make git:// access more limited and strict. So, please kindly:
So why does this affect us but not kernel.org?
Regarding kernel.org, Milo Casagrande did some correspondence with them and can share specific details. IIRC, they use some custom infrastructure.
- Check the URLs you use for cloning and updating your local trees. If you use "ssh://" or "http(s)://" protocols, you're ok. If you use git://, please switch to using an http-based protocol instead. In most cases, this requires just replacing the "git://" scheme with "http://". If in doubt, please visit the gitweb page for your repositories, which lists all supported URLs to clone a repository, e.g.: https://git.linaro.org/arm/arm-trusted-firmware.git
- If you set up or oversee CI or automated build jobs, please audit and apply similar changes to them.
So this is problematic, because there are folks out there in the community who already use the git:// urls for fetching work from the Linaro repos. (The 0day build/test bot, for instance..).
So, it would be nice if they updated to use http://. We actually can be proactive and contact them regarding this change (we could use your help with that, or just with compiling a list of parties who should be contacted).
While the git:// urls are now off the gitweb (which is good for future users), this wasn't the case previously.
We already went through one painful transition where our URLs got scrambled, and I've had a few situations where folks have just recently realized that we still had trees, but the URLs were just different. So it's quite frustrating to have to go through that again.
I'm not sure which transition you mean, but this is indeed not the first time the matter of deprecating git:// and switching to http:// has come up. The previous time there were conservative (or skeptical) responses too, which left the transition far from complete, and now we need to approach the same matter again, instead of having done it once and for all.
But otherwise, the world is a dynamic place and changes all the time. For example, in the summer of 2011 the aforementioned kernel.org was down for an unbelievable 1.5 months, and people coped. We're not going to do it like kernel.org and go down, though; instead we're going to start a transition that is as seamless as possible (I hope you didn't get any other idea from this mail). For that, we need some help from the users, first of all internal users, as we care most about their access and its stability.
What would be required to just make the git:// urls work properly?
There can be different technical and organizational answers to this question, but the most productive I can give is: Systems and ITS will be working towards that; in the meantime, all parties which would like a sustainable service are encouraged to upgrade to http:// protocol.
Is this mainly an issue with the Android repos?
It used to be. It was a big problem in the summer of 2011, when, with kernel.org down, the AOSP tree went down too, and after a couple of weeks people rushed to fetch from us. But this time, it affected git.linaro.org directly.
If we reduce the git:// url load on the worst users, would that improve things enough? Do you have stats on which trees are hardest hit?
The case we have with git:// is that a small number of users can hog almost all resources of a server. This can happen at release time and block the work of Linaro engineers; something like that happened this time. So, we're working on technical means to avoid hogging all resources, but we would also like to be sure that we won't affect internal users, and the most productive way to ensure that is for them to use a scalable protocol.
(*1) Unsupported in the current context means that "git://" URLs are not published in up-to-date information, and there's no warranty that any 3rd party will be able to complete a clone successfully using this protocol.
So as someone who has sent git pull requests in the past with the git urls, this is terrifying (and makes me hesitant to further use the linaro infrastructure). Do you have a pointer to why the git urls aren't coherent?
Oh, I'm sorry for leaving room for such an interpretation. I just meant that we are going to serve git:// to 3rd-party users on a "best effort" basis, and apply measures to give priority to internal users. For example, if there's an important build doing a fetch, and an external user makes 5 (or maybe just 2) git:// connections, they may get a connection reset. Note that this does not change the status quo - for example, 2 days ago, *any* user who tried to connect was getting connection refused.
So, kernel.org being down for 1.5 months is terrifying. Myself trying to build OABI toolchain and seeing all support being removed from everywhere, and finding aligned historical releases of toolchain parts almost impossible (while OABI hardware is still in use!) - that's terrifying. But I don't see anything terrifying with being frank about our git access policies and giving users a choice - either get reliable service with http:// or "best effort" with git://.
thanks -john
On Thu, Aug 28, 2014 at 2:51 PM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Thu, 28 Aug 2014 10:28:06 -0700 John Stultz john.stultz@linaro.org wrote:
On Thu, Aug 28, 2014 at 10:05 AM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote: So this is problematic, because there are folks out there in the community who already use the git:// urls for fetching work from the Linaro repos. (The 0day build/test bot, for instance..).
So, it would be nice if they updated to use http://. We actually can be proactive and contact them regarding this change (we could use your help with that, or just with compiling list of parties who should be contacted).
Well, I can try to ping the users I know of, but the problematic spot is users I am not aware of.
We already went through one painful transition where our URLs got scrambled, and I've had a few situations where folks have just recently realized that we still had trees, but the URLs were just different. So it's quite frustrating to have to go through that again.
I'm not sure which transition you mean, but the matter of deprecating
So moving from what we had before to gitolite broke the git URLs that existed previously. I know folks tried to add compatibility urls, but those URLs were slightly different than what we had previously. So folks who were pulling regularly from our tree just stopped being able to connect and get any updates.
git:// and switching to http:// indeed comes up not for the first time. And the previous time there were conservative (or skeptical) responses too, which left the transition far from complete, and now we need to approach the same matter again, instead of having done it once and for all.
But otherwise, the world is a dynamic place and changes all the time. For example, in the summer of 2011 the aforementioned kernel.org was down for an unbelievable 1.5 months, and people coped. We're not going to do it like kernel.org and go down, though; instead we're going to start a transition that is as seamless as possible (I hope you didn't get any other idea from this mail). For that, we need some help from the users, first of all internal users, as we care most about their access and its stability.
Ok. I appreciate the transition. I was worried git:// url access was about to be turned off.
If we reduce the git:// url load on the worst users, would that improve things enough? Do you have stats on which trees are hardest hit?
The case we have with git:// is that a small number of users can hog almost all resources of a server. This can happen at release time and block the work of Linaro engineers; something like that happened this time.
Do we have a sense of who those users (IPs? which tree they are pulling?) are?
(*1) Unsupported in the current context means that "git://" URLs are not published in up-to-date information, and there's no warranty that any 3rd party will be able to complete a clone successfully using this protocol.
So as someone who has sent git pull requests in the past with the git urls, this is terrifying (and makes me hesitant to further use the linaro infrastructure). Do you have a pointer to why the git urls aren't coherent?
Oh, I'm sorry for leaving room for such an interpretation. I just meant that we are going to serve git:// to 3rd-party users on a "best effort" basis, and apply measures to give priority to internal users. For example, if there's an important build doing a fetch, and an external user makes 5 (or maybe just 2) git:// connections, they may get a connection reset. Note that this does not change the status quo - for example, 2 days ago, *any* user who tried to connect was getting connection refused.
Ok. I was worried you were claiming that the git:// url might serve different (possibly stale) data than the https:// urls. If I was making a pull request and someone only got half of the commits, that would be a major infrastructure trust issue.
If it's just that connections would be slow or refused, that's enough to bother folks but not something that would make folks think or worry our infrastructure was compromised.
So, kernel.org being down for 1.5 months is terrifying. Myself trying to build OABI toolchain and seeing all support being removed from everywhere, and finding aligned historical releases of toolchain parts almost impossible (while OABI hardware is still in use!) - that's terrifying. But I don't see anything terrifying with being frank about our git access policies and giving users a choice - either get reliable service with http:// or "best effort" with git://.
So yea, kernel.org being down (and some git URLs changing) was a big deal, but it was also due to a major compromise of the system, which required a from-scratch rebuild. Hopefully things aren't (ever, we can dream) so bad for us.
Ok. I'll try to make the change with my own uses, and I really think removing the git:// url on the gitweb was the right call for the best first step here (ie: stop any new users of the git urls).
My main concern was that we might break our current users just because we had under-provisioned infrastructure. So as long as the git urls continue to work for legacy users, I'm happy, and I can work on changing my own uses, and trying to pester folks I know to change their git remote urls.
Though I think having some tracking done via the connection logs and notifying repo owners if their repo is a problem would be good in helping get the worst offenders fixed up.
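As a rough idea of what such tracking could look like, here is a Python sketch that tallies connections per client and per repository. The log format is purely an assumption made for this example (one "IP repo-path" pair per line); the real git-daemon logs would need their own parsing, and the function name tally is hypothetical.

```python
from collections import Counter


def tally(lines):
    """Count connections per client IP and per repository.

    Assumes each log line is '<ip> <repo-path>'; lines that do not
    match this hypothetical format are skipped.
    """
    by_ip = Counter()
    by_repo = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        ip, repo = fields
        by_ip[ip] += 1
        by_repo[repo] += 1
    return by_ip, by_repo


if __name__ == "__main__":
    # Made-up sample data using documentation-only IP addresses.
    sample = [
        "203.0.113.7 /kernel/linux-linaro-tracking.git",
        "203.0.113.7 /people/example/android.git",
        "198.51.100.4 /kernel/linux-linaro-tracking.git",
    ]
    by_ip, by_repo = tally(sample)
    print(by_ip.most_common(1))
    print(by_repo.most_common(1))
```

Sorting both counters with most_common would give a short list of the heaviest clients and the trees they pull, which is what a repo owner would need to follow up.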
Also I think continuing discussion w/ the kernel.org folks to understand their infrastructure would be good. They really started taking things seriously after their compromise, and it would be good for us to learn from their experience and take things similarly seriously before any such problems arise for us.
Thanks again for the explanations here! -john
I do not see why you guys should limit git usage. Do you guys have ssh keys in place? Also, if git uses UDP, then that can be another source of said activity. I have seen UDP ports which I have open on my network get flooded by packets, inducing a denial-of-service-like situation.
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
On 08/28/2014 11:30 PM, John Stultz wrote:
On Thu, Aug 28, 2014 at 2:51 PM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
The case we have with git:// is that a small number of users can hog almost all resources of a server. This can happen at release time and block the work of Linaro engineers; something like that happened this time.
Do we have a sense of who those users (IPs? which tree they are pulling?) are?
It appears to have been one IP address for both "attacks". (I use that term loosely because they may not have known they were causing this).
Around 5 UTC this morning I noticed the same user was causing a small resource spike again. They were limiting themselves to about 4-5 concurrent connections, which the server had no problems with. The 2 trees being cloned were linux-linaro-tracking.git and your android.git.
This makes me think the user has no ill intentions; they just want to clone a bunch of code at the same time.
Also I think continuing discussion w/ the kernel.org folks to understand their infrastructure would be good. They really started taking things seriously after their compromise, and it would be good for us to learn from their experience and take things similarly seriously before any such problems arise for us.
+1 on that
You should plan in advance for the case where you come under a distributed denial-of-service (DDoS) attack. How do you plan to survive then?
On Fri, Aug 29, 2014 at 7:57 AM, Andy Doan andy.doan@linaro.org wrote:
On 08/28/2014 11:30 PM, John Stultz wrote:
On Thu, Aug 28, 2014 at 2:51 PM, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
The case we have with git:// is that a small number of users can hog almost all resources of a server. This can happen at release time and block the work of Linaro engineers; something like that happened this time.
Do we have a sense of who those users (IPs? which tree they are pulling?) are?
It appears to have been one IP address for both "attacks". (I use that term loosely because they may not have known they were causing this).
Around 5 UTC this morning I noticed the same user was causing a small resource spike again. They were limiting themselves to about 4-5 concurrent connections, which the server had no problems with. The 2 trees being cloned were linux-linaro-tracking.git and your android.git.
Interesting to hear the android.git tree is part of it. Will ping the few folks I know who pull regularly.
This makes me think the user has no ill intentions; they just want to clone a bunch of code at the same time.
Also I think continuing discussion w/ the kernel.org folks to understand their infrastructure would be good. They really started taking things seriously after their compromise, and it would be good for us to learn from their experience and take things similarly seriously before any such problems arise for us.
+1 on that
One more point of concern here. For all the git URLs that I have that use http (kernel.org as well as Google's Android urls), it's actually https they're using. Shouldn't we be using https: for these urls as well?
thanks -john
On 08/29/2014 02:57 PM, John Stultz wrote:
One more point of concern here. For all the git URLs that I have that use http (kernel.org as well as Google's Android urls), it's actually https they're using. Shouldn't we be using https: for these urls as well?
Hey John,
Sorry it took so long, but I've now updated git.linaro.org to advertise HTTPS.
On Mon, Sep 8, 2014 at 8:52 AM, Andy Doan andy.doan@linaro.org wrote:
On 08/29/2014 02:57 PM, John Stultz wrote:
One more point of concern here. For all the git URLs that I have that use http (kernel.org as well as Google's Android urls), it's actually https they're using. Shouldn't we be using https: for these urls as well?
Hey John,
Sorry it took so long, but I've now updated git.linaro.org to advertise HTTPS.
Awesome! This makes me feel much better! I'll ping the 0day build robot to get the URLs updated.
thanks -john