Blog

On March 31st, 2017, CentOS will halt support and maintenance updates for CentOS 5. On March 1st, 2017, Scalr will also end CentOS 5 support for the Scalarizr agent. This will allow Scalr to add new features to the agent without testing them on CentOS 5. Scalarizr agent 5.3.6 is the final version supporting CentOS 5. Full testing of Scalarizr will continue on current versions of CentOS (6 and later).

We strongly encourage all users to upgrade their systems to a newer version of CentOS moving forward.

If you have any questions or need assistance, please reach out to us via your normal Support channel and we will do our best to quickly address your inquiries.


Many thanks,

Wm. Marc O'Brien and the Scalr team


The Scalr team will soon be rolling out new updates and features to the Scalr CMP. A key update is that VPC configuration will be moved from the Farm level to the individual Farm Role level.

The intent of this change is to provide greater configuration options when creating a Farm that uses AWS VPCs, as well as a more consistent experience when configuring networks across different cloud service providers. This update also lays the groundwork for some exciting future updates. However, this change can potentially break compatibility with existing workflows:

User action may be required

IMPORTANT: This change will require a slight modification to your VPC configuration workflows in the Scalr UI and will also require minor updates to any API calls using Farm or Farm Role objects. The specifics of this change and the modifications required to maintain compatibility are documented in our Wiki FAQ here.


Until now, VPC configuration was done at the Farm level. Starting Tuesday, Jan 10th, all VPC configuration will be done at the Farm Role level.

In order to avoid any disruption to your normal Scalr activities, please review the Wiki FAQ page mentioned above. The FAQ page provides an overview of the relevant API calls and compares how they work before and after the update.

We are excited about the upcoming release and welcome your feedback and comments.  Please reach out to Scalr Support via your support channel if you have any questions or feedback related to this.

Happy New Year!

-The Scalr Team


Scalr 6.1.1 Released

Scalr is pleased to announce the release of Scalr 6.1.1, which is available immediately for our Enterprise Edition customers and has also been deployed to our Hosted Scalr platform. Scalr 6.1.1 is an important release with over 40 product enhancements and updates, including several key new features:

  • Discovery Manager - Phase 3: extends the capabilities of Discovery Manager, which allows users to import Virtual Machines that were not originally provisioned through Scalr but are "discoverable" under existing Cloud Service account credentials. Phase 3 adds the ability to install the Scalarizr agent directly on imported infrastructure, enabling all the features and functionality of the Scalr platform for imported machines.
  • Container Discovery - Docker Integration Phase 1: the first phase of container integration, which includes Container Discovery. Container Discovery automatically detects Docker containers deployed on virtual machines and provides a view into deployed containers, including key information such as container image, launch date, labels, volumes and network information. It also includes an intelligent search and filtering capability to locate containers based on various metadata.
  • Google Cloud Policy Additions: the Scalr Policy Engine now includes the ability to enforce policy for Google Cloud networks and subnets. 
  • RDS and ELB Tagging: updates to Scalr tagging policy to tag RDS and ELB instances. RDS and ELB instances are tagged based on the Farm that they are associated with. 
  • Support for EC2 Dedicated Instances: adds support for Amazon EC2 dedicated instance types. 
  • New Global Variables to support RDS defaults: Scalr Global Variables can be used to set defaults for Amazon RDS, including maintenance window, backup window and backup retention period.

We are excited about the 6.1.1 release and welcome your feedback and comments. We will be updating our documentation in the coming days with more information on accessing and using these features.

 

Thanks,

The Scalr Team. 

Summary of Incident

 

Last night we experienced a setback as we worked to resolve the final remaining issues associated with the incident that we experienced on Monday morning, as documented here.

During the issue resolution process, a database action was taken that had the unintended consequence of terminating the records of some additional active servers, which subsequently caused duplicate servers to be created. We believe this oversight was caused by fatigue, and we will have a definitive answer following the post-mortem.

Our Engineering team has been working around the clock (36 hours awake) to resolve the remaining issues and will continue to do so until all remaining issues have been addressed. We have organized into shifts to avoid additional mistakes caused by tiredness.

We sincerely apologize for the additional impact, and will continue to provide updates as we resolve these issues.




 

Summary of Incident


On Monday, July 18th at approximately 6:14AM PDT, an update containing a logic error was applied to the Hosted Scalr management platform, causing numerous server records to be removed from Scalr. While the server records were removed from Scalr, the actual (AWS, other cloud) servers themselves were not terminated.

Where Scalr lost communication with a running server (or when the server record was removed), the Scalr desired state engine launched a replacement server and then moved resources such as EBS volumes and Elastic IP addresses from the old server to the new one.

For approximately half of the impacted servers, Scalr was able to override the actions taken by the Scalr desired state engine and restore the database to its state prior to the incident. For the remaining servers (also approximately 50%), replacement servers were launched and came up as expected.


Incident Response


For those servers that were stuck in a non-running state (e.g. failed, initializing or pending), Scalr worked to restore the original status from a database backup and make Hosted Scalr consistent with the original state of the servers. Scalr also ensured that DNS records were updated, EBS volumes were restored, and Elastic IPs were re-attached.

For those servers that were re-launched and came back up successfully, if everything appeared to be functioning normally, Scalr took no action.

Scalr has worked with a number of users throughout the day to ensure that any pending servers are recovered and/or that any relaunched servers are configured correctly.

Scalr is adding an “Orphaned Server” page to the user interface (under Discovery Manager) that will list “orphaned” servers, i.e. servers whose record was lost and for which a replacement server was launched. We recommend that all Hosted Scalr users visit this page and determine whether any of the identified orphaned servers should be terminated. Orphaned servers that are no longer needed can be removed. In the event that an orphaned server needs to be preserved, please submit a new support ticket at support.scalr.net.

Additionally, for customers using Amazon’s Elastic Load Balancer (ELB), we did not modify the ELB configuration. ELBs should be checked and validated to ensure that any new servers are properly configured within the ELB.


Root Cause Explanation


At 6:14AM PDT, an update was applied to Hosted Scalr that contained an error in the server record removal logic. The change unfortunately passed peer review and was pushed to production.

This change meant that Scalr now removed server records under the wrong conditions. For each server for which there was no longer a server record, the Scalr desired state engine correctly started auto-healing by launching new replacement servers, and moving configuration such as storage (EBS) and network (DNS, EIP) to them. Unfortunately, it did so by “stealing” that configuration from healthy instances which were still running but for which Scalr had no record. It is this remediation that caused customer outages.


Corrective Action


While there are a number of benefits associated with the Scalr Desired State Engine, in this particular case the feature exacerbated the problem by launching new servers. Scalr does not see this as an inherent problem in the architecture, but rather as a sign that additional steps need to be taken to ensure issues are accurately identified and fixed.

To supplement our CI/CD pipeline, Scalr will introduce a new QA layer between trunk and our Hosted Scalr deployment for additional testing of any changes being introduced to the platform. Additionally, we are adding a CI/CD plug-in that will scan for changes to SQL queries and will not automatically merge those changes until they have received a higher level of scrutiny.

Currently, we have addressed all known, pending customer issues relating to the incident; however, we anticipate that users will identify further issues with their deployments and configurations. For any further issues, please submit a new support ticket at support.scalr.net, and we’ll work with you to resolve them.


Thank you for your patience, and we appreciate your collaboration and understanding in helping us to recover from this incident.

Edited to add: we are of course incredibly sorry about this, and will have a statement in the morning, plus a personal apology to every customer. 

Edit 2: We will be providing additional communication regarding ongoing issues by noon PDT today. Thank you for your patience in the meantime.


 

Earlier this morning, an update with flawed logic was deployed to Hosted Scalr causing server records to be removed but not actual (AWS, other cloud) servers.

By default, where Scalr lost communication with a running server (or when the server record is removed), the Scalr desired state engine will take action to launch a new server in replacement of the previous, and then move resources such as EBS volumes and EIP addresses from the old server to the new. As a result, there are two likely scenarios:

(1) In situations where servers were re-launched and restarted successfully, including reconfigured EBS volumes and instances added back to ELB, no further action should be required, and services should be restored normally. We will follow up to check for any remaining issues; however, no specific action is required at this time.

(2) In situations where servers were re-launched but did not restart successfully (e.g. servers still in a failed, initializing or pending state), and/or EBS, EIP, and ELB were not configured properly, Scalr is working on individually restoring these particular servers and will communicate any specific issues or information to the users directly. We are going through these on a case-by-case basis to make sure we handle them as cautiously as possible, and therefore complete recovery may take the better part of the day.

We will continue to provide regular status updates at support.scalr.net throughout the day, and will follow up, post-recovery, with a Root Cause / Corrective Action report.


Thank you for your patience and assistance with this issue.


 

Custom Scaling Metrics allow administrators to scale their applications up and down based on customized parameters that might not be covered by Scalr’s “out-of-the-box” Scaling Metrics.


Up until now, the available Custom Scaling Metrics were:

  • File-Read - Scalr will try to read a given file on your Server, and expect to find a number there. That number will be used as the Scaling Metric's value.

  • File-Execute - Scalr will try to execute a given script on your Server, and expect the script to output a number to standard output. That number will be used as the Scaling Metric's value.
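For example, a File-Execute script can be as small as the Python sketch below, which prints a single number and nothing else to standard output. The metric it reports (the one-minute load average from `os.getloadavg()`, available only on Unix-like systems) is an illustrative assumption; substitute whatever value you want to scale on. The same value could instead be written to a file periodically (for example from a cron job) for the File-Read method.

```python
#!/usr/bin/env python3
"""Hypothetical File-Execute metric script: prints one number to stdout."""
import os


def current_metric() -> float:
    # One-minute load average (Unix-only); an illustrative stand-in for any
    # application-specific value you want Scalr to scale on.
    one_minute, _five, _fifteen = os.getloadavg()
    return round(one_minute, 2)


if __name__ == "__main__":
    # Print only the number, with no labels, so it can be read as the metric value.
    print(current_metric())
```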


We recently introduced a new method for obtaining Custom Scaling Metrics:

  • URL-Request - Scalr will send an HTTP GET request to the specified URL and expect a number in the returned content body. That number will be used as the Scaling Metric's value. The URL-Request Custom Scaling Metric method supports Global Variable interpolation.


With this new method, you can easily export a metric from a third-party application, such as a monitoring platform, and use it to scale your Scalr Farms. For example, you could use a Custom Scaling Metric based on Datadog’s analytics to scale a Farm Role up or down.
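A minimal sketch of the endpoint side of the URL-Request method, using only Python's standard-library `http.server`: every GET request returns the metric as a bare number in a plain-text body. `fetch_metric_value()` is a hypothetical placeholder (it returns a constant here) for whatever would actually query your monitoring platform, and the host and port are arbitrary assumptions.

```python
"""Hypothetical endpoint for the URL-Request method: GET returns a bare number."""
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_metric_value() -> float:
    # Placeholder: in practice this might query a monitoring platform.
    # The constant is purely illustrative.
    return 42.0


def metric_body() -> bytes:
    # The response body contains only the number, e.g. b"42.0".
    return str(fetch_metric_value()).encode("ascii")


class MetricHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = metric_body()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        # Silence the default per-request logging to stderr.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    return HTTPServer((host, port), MetricHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A URL-Request metric would then be pointed at this server's URL; in a real deployment the server would need to be reachable from Scalr.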



For more information please refer to the documentation.


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

We recently introduced a number of enhancements to Scalr’s Governance and management capabilities. These features include:

Azure Cloud Resources Management

You can now create and manage Azure Resource Groups, Virtual Networks (including Subnets) and Security Groups through Scalr. Previously, administrators could only use existing resources and had no visibility into them. The added support for Azure Cloud Resources allows administrators to view, create and delete these resources, as well as use them in Farms.




New Governance Policy for OpenStack - Restrict Usage of Floating IP Pools

You are now able to choose which floating IP pools will be available for use in specific OpenStack regions. Use the new openstack.network.floating_ip_pools Governance policy to select the relevant pools:



The pools selected with this Governance policy will be the only ones permitted for use in the Environment for which the policy was configured. If openstack.network.floating_ip_pools is not configured, all floating IP pools will be available for use. Multiple policies can be configured for multiple regions.
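To make these rules concrete, here is a small Python sketch of how a permitted-pool check behaves. The in-memory mapping of region names to selected pools is a hypothetical stand-in, not Scalr's actual policy format, and treating a region without a policy as unrestricted is an assumption based on the description above.

```python
from typing import Mapping, Sequence

# Hypothetical representation of the policy: region name -> selected pools.
# This is NOT Scalr's actual configuration format.
FloatingIpPolicy = Mapping[str, Sequence[str]]


def is_pool_allowed(policy: FloatingIpPolicy, region: str, pool: str) -> bool:
    allowed = policy.get(region)
    if allowed is None:
        # No policy configured for this region: every pool is permitted (assumed).
        return True
    # Otherwise only the explicitly selected pools are permitted.
    return pool in allowed
```

For example, with `{"RegionOne": ["public"]}` (hypothetical names), the `public` pool is allowed in `RegionOne`, any other pool there is rejected, and pools in other regions remain unrestricted.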


New Governance Policy for AWS: Require Multi-AZ RDS Deployments

When using a Multi-AZ RDS deployment, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Multi-AZ RDS provides greater availability and durability for production database workloads.


You can now use the aws.rds.instance.require_multi_az Governance policy to require all RDS deployments in the relevant Environment to be Multi-AZ.
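A sketch of the kind of check this policy implies, written in Python against request parameters named as in the RDS CreateDBInstance API (`DBInstanceIdentifier`, `MultiAZ`). The `require_multi_az` flag stands in for whether the policy is enabled in the Environment; this is an illustration, not how Scalr enforces the policy internally.

```python
from typing import Any, Mapping


def check_multi_az(request: Mapping[str, Any], require_multi_az: bool) -> None:
    """Raise if an RDS create request would violate a require-Multi-AZ rule.

    `request` uses CreateDBInstance parameter names; `require_multi_az`
    represents whether aws.rds.instance.require_multi_az is enabled.
    """
    # In this sketch, an omitted MultiAZ is treated as a single-AZ deployment.
    multi_az = bool(request.get("MultiAZ", False))
    if require_multi_az and not multi_az:
        name = request.get("DBInstanceIdentifier", "<unnamed>")
        raise PermissionError(f"RDS instance {name!r} must be a Multi-AZ deployment")
```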



As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 


Cloud Services integration has been enhanced with the ability to designate ownership, as well as simplified management of Cloud Services in the Farm Designer.


Cloud application infrastructure often includes more than compute instances. Applications rely on Cloud services for functions like relational databases and load balancing. In Enterprise Scalr 6.1 and Hosted Scalr, administrators are now able to designate individual and team ownership for Cloud services.


For example, when creating an RDS DB instance you can now specify ownership:



Additionally, Cloud services can easily be linked to Farms from the Farm Designer. This provides greater visibility into the complete application stack and easier management.



The enhanced integration for Cloud services is currently available for AWS RDS and ELB, and will soon expand to include more services. For more information please refer to the documentation.


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

The new Garbage Collection feature prevents “orphaned” cloud resources from accumulating and generating avoidable spend.

Normally, EBS volumes persist when their owning instances are scaled down by Scalr, so they may be reused in the future to minimize downtime.


However, there are several cases in which storage resources may become “orphaned”.  This can occur when an application is deleted, or if a tier in an application is scaled down with Scalr’s “Re-use storage” option not enabled. These situations produce waste, and after a while this waste can have a significant impact on budget consumption.


The new Garbage Collector feature serves as an aggregator for orphaned storage resources and allows administrators to easily discard unused volumes. At the time of publication, the Garbage Collector is available for AWS EBS volumes; additional Garbage Collection options will be added soon!



With Garbage Collector’s current support for AWS, any orphaned EBS volumes will be placed in a clean-up queue. The administrator can then dispose of the collected volumes.
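As a rough illustration of the kind of volume the Garbage Collector targets, the Python sketch below lists unattached EBS volumes from a response shaped like the EC2 DescribeVolumes output (for example, what boto3's `describe_volumes()` returns). It is not Scalr's implementation: an unattached volume is not necessarily orphaned, since volumes kept with “Re-use storage” are also unattached, so the hypothetical `keep` argument lets you exclude volumes you intend to reuse.

```python
from typing import Any, Iterable, List, Mapping


def find_unattached_volumes(
    describe_volumes_response: Mapping[str, Any],
    keep: Iterable[str] = (),
) -> List[str]:
    """Return IDs of EBS volumes that are unattached and not explicitly kept.

    `describe_volumes_response` has the shape of the EC2 DescribeVolumes
    response; `keep` is a hypothetical list of volume IDs to preserve.
    """
    keep_set = set(keep)
    candidates = []
    for volume in describe_volumes_response.get("Volumes", []):
        # "available" is the EC2 state of a volume not attached to any instance.
        if volume.get("State") == "available" and volume["VolumeId"] not in keep_set:
            candidates.append(volume["VolumeId"])
    return candidates
```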

For more information please refer to the documentation.


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

Scalr now supports the x1.32xlarge instance, the largest memory-optimized EC2 instance. From the AWS announcement:


“X1 instances are ideal for running in-memory databases like SAP HANA, big data processing engines like Apache Spark or Presto, and high performance computing (HPC) applications. X1 instances are certified by SAP to run production environments of the next-generation Business Suite S/4HANA, Business Suite on HANA (SoH), Business Warehouse on HANA (BW), and Data Mart Solutions on HANA on the AWS cloud.”


Allow the usage of the X1 instance with the cloud.instance.types Governance policy:


Use the X1 instance in a Farm Role:


To learn more about the X1 instance, please refer to the AWS Blog.


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

On June 1, 2016, Scalr will deprecate support for MongoDB Automated Roles. Users who are currently running MongoDB Automated Roles are asked to redeploy using Base Roles.

To simplify the process of deploying MongoDB, Scalr has put together this tutorial for deploying a MongoDB cluster using orchestration.

Currently, Scalr has disabled the ability to launch new MongoDB Automated Roles, and on June 1st, all MongoDB Automated Roles will be converted to Base Roles.

If you are using MongoDB Automated Roles, we encourage you to re-deploy as soon as possible using Base Roles and the information provided in the above tutorial.

 

As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team

 

We will soon deprecate the first-gen version of MySQL built-in automation. A new, improved version of MySQL automation is already available. If you don’t have any running Farms that use the older version, you don’t need to do anything!


The next time you launch a MySQL Role it will automatically be equipped with the new automation tools, some of the benefits include:


  • Configuration Management - Easily change MySQL configs on the fly

  • Master storage control and visibility

  • Usage and performance statistics

  • Improved reliability




First-Gen MySQL automation will be deprecated on May 1st, 2016.

If you have instances currently running with First-Gen MySQL automation, they will not be disrupted; however, they will not work properly if they are rebooted after this date.


A notification will soon appear next to running Farms that use the old MySQL automation, making it easier to identify which database Roles need to be migrated.


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

The Scalr Discovery Manager is a tool that discovers and imports existing cloud instances onto the Scalr Cloud Management Platform. Once the instances are imported, they are mapped to Farms and Farm Roles, and share many of the benefits of Scalr provisioned infrastructure.


The Discovery Manager does not disrupt imported instances in any way: the entire process is handled through the cloud platform’s APIs.


Normally, cloud infrastructure provisioned through the Scalr self-service catalog uses the Scalarizr agent. The agent allows Scalr to provide performance metrics and perform various automation tasks.


Recently, we introduced the option to create Roles that don’t use the agent. This means management is limited to actions that can be performed via cloud APIs. Instances of agentless roles still have many of the benefits of Scalr, but lose some automation capabilities.



 

Discovery Manager’s goal is to import existing cloud infrastructure without disrupting applications, and provide a single pane of glass for Scalr provisioned applications and applications created before Scalr was introduced to the environment.


To avoid disruption, instances imported through the Discovery Manager are agentless. Currently, Discovery Manager is available only for AWS EC2 instances.


How To Use The Discovery Manager


The Discovery Manager uses a simple, straightforward importing process.


Step 1: Discovery


In the relevant Environment, from the main menu, select Discovery Manager, then Servers. You will then see a list of discovered instances. Only running instances can be imported.
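The rule that only running instances can be imported can be illustrated with a short Python sketch that filters a response shaped like the EC2 DescribeInstances output (for example, from boto3's `describe_instances()`), where each instance carries a `State` with a `Name`. This only illustrates the rule; the Discovery Manager performs discovery itself through the cloud APIs.

```python
from typing import Any, Dict, List, Mapping


def importable_instances(describe_instances_response: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return the running instances from an EC2 DescribeInstances-shaped response."""
    running = []
    # DescribeInstances groups instances into reservations.
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Only instances whose state is "running" are import candidates.
            if instance.get("State", {}).get("Name") == "running":
                running.append(instance)
    return running
```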






Step 2: Import


Select the instances you’d like to import into the same Farm Role, and click “Import”.

 

 

NOTE: Discovery Manager is subject to the Environment's Governance policy. This means that if an instance uses a VPC or subnet that’s not allowed in the Environment, you will not be able to import it. Make sure you adjust your Governance policies if needed.
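A sketch of why a Governance restriction can block an import, in Python. It checks the `VpcId` and `SubnetId` fields of an instance, as they appear in EC2 DescribeInstances output, against hypothetical allow-lists; representing "no restriction configured" as `None` is an assumption for this example, not Scalr's policy format.

```python
from typing import AbstractSet, Any, Mapping, Optional


def import_blocked_reason(
    instance: Mapping[str, Any],
    allowed_vpcs: Optional[AbstractSet[str]],
    allowed_subnets: Optional[AbstractSet[str]],
) -> Optional[str]:
    """Return why an instance cannot be imported, or None if it is allowed.

    An allow-list of None means no restriction is configured (assumption).
    """
    vpc = instance.get("VpcId")
    subnet = instance.get("SubnetId")
    if allowed_vpcs is not None and vpc is not None and vpc not in allowed_vpcs:
        return f"VPC {vpc} is not allowed in this Environment"
    if allowed_subnets is not None and subnet is not None and subnet not in allowed_subnets:
        return f"subnet {subnet} is not allowed in this Environment"
    return None
```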

 

 

 


Step 3:  Register Image in Scalr


Every Role in Scalr uses an image, and images need to be registered in the catalog. If the image is already registered, this step will complete automatically. If not, click the “Register Image in Scalr” link to go through a quick image registration wizard.


 


 

NOTE: When registering an image, make sure to get the OS family and version right, as choosing the wrong settings could potentially cause issues.

 



Step 4: Create a Role based on the registered image


Once the image is registered, a Role needs to be created. This Role will be used as the Farm Role later on, and can be used as a template for future instances of the application tier.


You only have to do this once per Role. If you’ve already imported instances that use the same image and Role, you will not need to create a new one. If a relevant Role does not yet exist, click the “Create Role With Image Above” link.



Clicking “Create Role With Image Above” will take you to the standard Role Creation screen. When you save the Role, you’ll be taken back to the Discovery Manager.


Step 5: Map imported instances to Farm and Farm Role


Once the Role is created, select a running Farm from the drop-down menu, and then select a Farm Role within that Farm. If a relevant Farm Role does not exist, you’ll have the option to create one.



 

NOTE: When creating a new Farm Role, make sure you select the correct subnet for VPC instances.

 


Once you’ve selected your Farm and Farm Role, click “Continue” at the bottom of the page.


Step 6: Done!


You will be prompted to review and confirm the configuration. Once you click “Confirm”, the instance will be instantly imported!



 

NOTE: Whenever you import instances into a Farm Role, new or existing, auto-scaling will be disabled on it. This default avoids a situation where importing new instances violates the scaling quota for the Farm Role, resulting in instance termination. Once the import process is done, you can go to the relevant Farm to review and re-enable auto-scaling.
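The arithmetic behind this default is simple. With the hypothetical Python helper below, a Farm Role with a maximum of 3 instances that already runs 2 would be 2 over its limit after importing 3 more, and an excess like that is what auto-scaling could try to remove. The function name and parameters are illustrative, not part of Scalr.

```python
def instances_over_quota(running: int, imported: int, max_instances: int) -> int:
    """Return how many instances exceed a Farm Role's maximum after an import.

    Illustrative only: if auto-scaling stayed enabled, this excess is what
    could lead to instances being terminated to get back under the limit.
    """
    return max(0, running + imported - max_instances)
```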

 


As always, if you have any trouble, please contact support!

For any suggestions, feedback or any questions you might have, feel free to contact me directly at ron@scalr.com .

-- Ron, for the Scalr Team


 

AWS has recently expanded its footprint with a new region in Seoul, South Korea.

Scalr users can now leverage the new region while orchestrating AWS resources. The new Seoul region has two Availability Zones and supports the following EC2 instance families:

  • T2

  • M4

  • C4

  • I2

  • D2

  • R3




The Seoul region marks the 12th region in Amazon’s global cloud infrastructure, and the fourth Asia Pacific region. Other Asia Pacific regions include Singapore, Tokyo and Sydney, all of which are supported by Scalr as well.


For more information about the new region, please read the official AWS announcement.


As always, if you have any trouble with these updates, please contact support!

For any suggestions or feedback, feel free to contact me directly at ron@scalr.com.

-- Ron, for the Scalr Team