Upstream AWS issue caused 5XX errors and timeouts this morning
Incident Report for Open Exchange Rates
Resolved
The issue that affected your API service on June 29th was caused by an unusual underlying hardware failure at Amazon Web Services, affecting our Redis ElastiCache cluster. AWS have informed us that the failover process (by which the master node is replaced by a replica in the event of failure) encountered a rare unrecoverable error, and manual intervention by their technicians was needed to correct it. There have not been any related issues since then.
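
For reference, the failover behaviour described above is the standard ElastiCache for Redis mechanism: when automatic failover is enabled on a replication group, a replica is promoted to primary if the existing primary fails. The snippet below is a minimal sketch of how such a group can be created with boto3; the group name, node type and node count are placeholders for illustration, not our actual configuration.

    import boto3

    # Placeholders for illustration; not Open Exchange Rates' real configuration.
    elasticache = boto3.client("elasticache", region_name="us-east-1")

    elasticache.create_replication_group(
        ReplicationGroupId="example-redis",
        ReplicationGroupDescription="Redis replication group with automatic failover",
        Engine="redis",
        CacheNodeType="cache.m4.large",
        NumCacheClusters=2,             # one primary plus one replica
        AutomaticFailoverEnabled=True,  # promote the replica if the primary fails
    )

In this incident, AWS report that the promotion step itself hit an unrecoverable error, which is why manual intervention was required despite automatic failover being in place.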
Posted Jul 01, 2017 - 14:40 UTC
Monitoring
We experienced an API outage this morning, caused by upstream issues at our infrastructure and cloud services provider, Amazon Web Services (AWS), in their US East (N. Virginia) region.

This incident manifested as a high number of 5XX errors and timeouts for our API clients, lasting roughly two hours, from approximately 06:05 to 08:07 UTC.

Our platform is built on AWS for high availability, with multiple layers of automatic failover and redundancy. As soon as we realised failover was not happening as expected, we worked with AWS support to identify the issue and were told: "There was an internal hardware failure that is supposed to be automatically recovered, but the recovery process failed."

They were able to resolve the issue and service returned to normal shortly after 08:00 UTC. We are continuing to monitor throughout the day and are awaiting a further post-mortem from AWS. Going forward, we are also investigating additional safeguards we can take against issues like this which arise outside of our control.

While our API did not suffer a total outage, during the failure period clients received a high proportion of failed requests, and any successful responses had extremely high latency. We sincerely apologise for any issues this outage caused.
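
As a general precaution, API clients can soften the impact of transient 5XX errors and timeouts by retrying failed requests with exponential backoff and, where possible, serving the last successfully fetched rates in the meantime. The sketch below is a minimal illustration in Python using the requests library, not an official client library; the endpoint, App ID placeholder and retry settings are assumptions to adapt to your own integration.

    import time
    import requests

    # Illustrative only: adjust the endpoint, App ID and retry settings
    # to suit your own integration.
    API_URL = "https://openexchangerates.org/api/latest.json"
    APP_ID = "YOUR_APP_ID"

    def fetch_latest_rates(max_attempts=4, timeout=5):
        """Fetch the latest rates, retrying on timeouts and 5XX responses
        with exponential backoff."""
        delay = 1  # seconds; doubled after each failed attempt
        for attempt in range(1, max_attempts + 1):
            try:
                response = requests.get(
                    API_URL, params={"app_id": APP_ID}, timeout=timeout
                )
                # Retry only on server-side (5XX) errors; 4XX errors are
                # client-side problems and will not improve with retries.
                if response.status_code < 500:
                    response.raise_for_status()
                    return response.json()
            except requests.exceptions.Timeout:
                pass  # treat a timeout like a transient server-side error
            if attempt < max_attempts:
                time.sleep(delay)
                delay *= 2
        raise RuntimeError("Rates unavailable after %d attempts" % max_attempts)

Retries alone cannot bridge a two-hour outage, but combined with a cached copy of the last good response they can keep most integrations serving sensible data through shorter disruptions.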

If you have any questions or concerns about this incident, please don't hesitate to contact us at support@openexchangerates.org.
Posted Jun 29, 2017 - 09:45 UTC