OpenStack is an open-source cloud computing platform that powers many of today’s largest cloud environments. However, managing upgrades in a large-scale OpenStack deployment can present substantial challenges. This article examines the upgrade process and efficiencies gained by OpenInfra Foundation Associate Member, Indiana University (IU), which has been running OpenStack since 2015. The university’s experience highlights the importance of strategic upgrade planning, particularly for large deployments, and how the adoption of a more structured approach to upgrades has streamlined operations and minimized disruptions.
A Historical Perspective on OpenStack Upgrades at Indiana University
IU first deployed OpenStack in 2015, beginning with a cloud environment powered by 320 hypervisors. The university’s OpenStack deployment has grown significantly over the years, with their second cloud deployment expanding to 506 hypervisors. This scale introduces unique challenges during upgrades, especially in terms of database and message bus load—both critical components for the operation of OpenStack during the upgrade process. As the size of the deployment increased, the complexity of upgrades grew, and a more efficient approach became necessary.
Indiana University’s OpenStack upgrade path has evolved over the years, with several key milestones:
- Juno, Kilo, Liberty: 2015
- Mitaka, Newton: 2016
- Ocata, Pike: 2017
- Queens, Rocky: 2018
- Stein, Train: 2019
- Wallaby: 2021
- Xena, Yoga, Zed: 2023
- Antelope, Caracal: 2024
Each upgrade cycle presented challenges in managing and maintaining such a large and growing environment. As the number of hypervisors increased, so did the workload involved in the upgrade process, particularly when dealing with upgrades to the database, RPC systems, and network agents.
The Challenges of Large-Scale OpenStack Upgrades
One of the primary challenges IU faced during OpenStack upgrades was the sheer scale of the deployment. Somewhere in the 300 to 500 hypervisor range, the deployment reached an inflection point: the databases and the RabbitMQ message bus that handle inter-component communication (RPC) came under heavy load during upgrades, requiring careful planning and execution.
Before adopting more streamlined upgrade processes, IU faced several key pain points during each upgrade cycle:
- Upgrading Hypervisor Agents/Daemons: With hundreds of hypervisors, IU had to upgrade around 1500 agents and daemons across the environment. This required significant effort and coordination.
- Data Migrations: Each OpenStack release often required data migrations, which added to the complexity and duration of the upgrade process.
- Network Agent Outages: After upgrading centralized networking agents, IU experienced brief outages in tenant networks as agents were restarted, leading to service disruptions.
Despite having conducted 14 successful upgrades across production OpenStack clouds, the process was still a labor-intensive, multi-hour event.
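One common way to keep database and RabbitMQ load bounded at this scale is to roll agent upgrades out in waves rather than restarting everything at once, so only a bounded number of agents re-sync over RPC at any moment. The sketch below is illustrative only, not IU's actual tooling; the hostnames, batch size, and upgrade command are all assumptions:

```python
import subprocess
from itertools import islice

def waves(hosts, batch_size):
    """Split a host list into fixed-size batches for a rolling upgrade."""
    it = iter(hosts)
    while batch := list(islice(it, batch_size)):
        yield batch

def rolling_upgrade(hosts, batch_size=25, dry_run=True):
    """Upgrade agents one wave at a time, so at most batch_size hosts
    restart their compute/network agents (and re-register over RPC) at once."""
    for i, batch in enumerate(waves(hosts, batch_size), start=1):
        for host in batch:
            # Hypothetical upgrade command; real tooling would likely use
            # Ansible or similar rather than raw ssh in a loop.
            cmd = ["ssh", host, "sudo apt-get -y install nova-compute"]
            if dry_run:
                print(f"wave {i}: would run: {' '.join(cmd)}")
            else:
                subprocess.run(cmd, check=True)

# Example: 506 hypothetical hypervisors upgraded in waves of 25,
# i.e. 21 waves, with only 25 agents restarting at a time.
hypervisors = [f"hv{n:03d}.cloud.example.edu" for n in range(506)]
rolling_upgrade(hypervisors, batch_size=25, dry_run=True)
```

Throttling the wave size trades total upgrade duration against peak load on the control plane; the right batch size depends on how much headroom the database and message bus have.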
The Introduction of SLURP: Streamlining the Upgrade Process
The adoption of SLURP (Skip Level Upgrade Release Process) was a game-changer for IU. SLURP designates every other OpenStack release as a skip-level upgrade target, allowing operators to upgrade once a year instead of every six months while still receiving stability fixes and timely patching of critical components. By reducing the number of upgrades required, IU could streamline the process and minimize the disruptions typically associated with frequent upgrades.
Key Benefits of SLURP for IU:
- Faster Upgrades: The Zed to Antelope upgrade took about 4 hours, while the Antelope to Caracal upgrade took only 3 hours. Because SLURP allowed IU to skip Bobcat entirely, a single maintenance window covered two releases' worth of changes, cutting both the total upgrade time and the number of outages roughly in half.
- Minimized Outages: The upgrade process under SLURP resulted in half the outages compared to previous cycles. This was achieved through better planning and a more efficient process, which helped keep service disruptions to a minimum.
- Time Efficiency: Instead of spreading the upgrade over two days, the Antelope to Caracal upgrade was completed in just one day. The reduction in time required to complete upgrades allowed for smoother operations and better resource management.
Scheduling Upgrades for Minimal Impact
An additional advantage of SLURP is the flexibility it gives IU to schedule upgrades during off-peak times. The university traditionally experiences lower usage during the summer months, particularly in July and August, when many users are on vacation. By aligning the upgrade schedule with this quieter period, IU minimizes the impact on end users and ensures that any potential issues are resolved before the start of the academic year.
Improved User Experience
The change in upgrade cadence has had a noticeable positive effect on IU's end users. Previously, users were regularly affected by maintenance windows and service interruptions. Under the new approach with SLURP, users are far less likely to notice that an upgrade has taken place at all, unless they encounter new features or improvements.
With reduced maintenance windows and fewer disruptions, users now enjoy a more stable platform with fewer API issues and service outages, creating a more seamless and reliable experience.
The Evolution of OpenStack Upgrades: From a Multi-Day Ordeal to Routine Maintenance
IU’s approach to OpenStack upgrades has evolved significantly over the past decade. Initially, upgrades were a multi-day ordeal that occurred twice a year, requiring intensive work to patch bugs, deal with network outages, and resolve issues with intermediate versions. These upgrades were laborious, time-consuming, and disruptive to users.
Today, thanks to SLURP and a more efficient upgrade strategy, upgrades have become a routine maintenance task that takes place over the summer when user activity is lower. The upgrades are far less disruptive, and the team has become adept at handling them in a way that minimizes the impact on both infrastructure and users.
The experience of IU highlights how large-scale OpenStack deployments can benefit from a more structured and efficient upgrade process. By adopting SLURP, the university has been able to reduce the frequency of major upgrades, streamline the upgrade process, and minimize service disruptions.
Today, OpenStack upgrades at IU are no longer a significant event but rather a smooth, well-planned part of routine maintenance. This evolution from a challenging, multi-day ordeal to an efficient and predictable process has improved operational efficiency and user satisfaction, offering valuable lessons for other organizations running large-scale OpenStack environments.
As OpenStack continues to evolve, IU’s experience serves as a prime example of how strategic planning, reduced upgrade cadence, and the right tools can transform the upgrade experience and contribute to a more stable, reliable cloud environment.