Hello, I'm Irena from the Networking Engineering team at Atlassian. I have been directly involved with this incident and wanted to provide some answers to the questions.
We’ve built our architectures based on the AWS Direct Connect service because it’s the most reliable and scalable solution based on our customer and network needs. The AWS Direct Connect service we use in the US East Region has multiple redundant links (4x 10Gbps) optimized for data throughput requirements and availability, and to our knowledge the AWS Direct Connect transit facilities have power backups that would help contribute to its reliability. But, as we saw from today’s event, something still failed.
I should note that we have both publicly and privately reachable resources in AWS. The publicly reachable resources have fail-overs built in for situations like these (it happens automatically), but the private reachable resources with our architecture depend solely on AWS Direct Connect. For example, our Bitbucket failure today was due to the fact that we rely on AWS Direct Connect to link between the Bitbucket Cloud components that we host in our data centers and others that we host on AWS. Bitbucket could continue connecting to services in our own data centers and the public Internet/AWS, but could not talk to the privately reachable resources in the Atlassian infrastructure hosted on AWS.
We understand the importance and the impact for our customers, and dedicated several teams to this issue as soon as it was reported. AWS has resolved the issue, but we will look into ways to help prevent and better mitigate these types of issues in the future as part of our incident review and improvement processes.
I should note that we have both publicly and privately reachable resources in AWS. The publicly reachable resources have fail-overs built in for situations like these (it happens automatically), but the private reachable resources with our architecture depend solely on AWS Direct Connect. For example, our Bitbucket failure today was due to the fact that we rely on AWS Direct Connect to link between the Bitbucket Cloud components that we host in our data centers and others that we host on AWS. Bitbucket could continue connecting to services in our own data centers and the public Internet/AWS, but could not talk to the privately reachable resources in the Atlassian infrastructure hosted on AWS.
We understand the importance and the impact for our customers, and dedicated several teams to this issue as soon as it was reported. AWS has resolved the issue, but we will look into ways to help prevent and better mitigate these types of issues in the future as part of our incident review and improvement processes.