Problem
Similar to the situation at ICAT, when I arrived at IMA Financial Group, the company had only one on-premises data center, located in its Wichita office. The building suffered from frequent power outages, and executive leadership deemed system downtime unacceptable. However, with no failover data center, there were no good solutions readily available. I immediately upgraded the UPSs for the data center, but due to power requirements and space constraints I was only able to achieve about 45 minutes of UPS runtime. Unfortunately, many of the power outages in the office lasted longer than that.
Solution
Also similar to ICAT, IMA was planning to build a new office in Wichita. As a member of the new office planning committee, I was able to allocate and design a larger data center space in the new facility. Working with my Infrastructure team, I developed a new data center plan and budget that included all new server racks, network switches, servers and storage.
While the new office building was under construction, I had my Infrastructure team build the new servers, switches and storage in their lab and then replicate the data and virtual machines from the production data center. As the new building neared completion, the new data center space was prioritized and finished early. Once the space was clean and secure, the team installed the new hardware and enabled SAN replication. In this instance, we used the SAN's native replication (Purity, from Pure Storage) to replicate all data and VMs to the new data center. Then, at close of business on the day before the office move, we completed a final replication, stood up the new systems and repointed public DNS to the new data center. Over the weekend, while the rest of the office was being moved, the Application and Data teams tested and validated their systems.
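For illustration, here is the kind of quick post-cutover check that can confirm public DNS is pointing at the new data center before the application teams begin their weekend validation. The hostnames and addresses below are placeholders, not IMA's actual records; this is a sketch, not the tooling we actually used.

```python
# A minimal post-cutover check (hypothetical hostnames and TEST-NET addresses).
# It confirms that public DNS now resolves to the new Wichita data center and
# that each public service still answers over HTTPS.
import socket
import ssl

import dns.exception
import dns.resolver  # third-party: pip install dnspython

NEW_DC_ADDRESSES = {"203.0.113.10", "203.0.113.11"}        # placeholder IPs
PUBLIC_HOSTS = ["portal.example.com", "mail.example.com"]  # placeholder names


def resolves_to_new_dc(host: str) -> bool:
    """True if every A record for host points at the new data center."""
    try:
        answers = dns.resolver.resolve(host, "A")
    except dns.exception.DNSException:
        return False
    return {rdata.address for rdata in answers} <= NEW_DC_ADDRESSES


def answers_https(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if the host completes a TLS handshake on the given port."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        return False


for host in PUBLIC_HOSTS:
    dns_ok = resolves_to_new_dc(host)
    https_ok = answers_https(host)
    print(f"{host}: DNS {'OK' if dns_ok else 'still on old DC'}, "
          f"HTTPS {'OK' if https_ok else 'unreachable'}")
```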
Once the new data center was live and validated, the old data center hardware was shipped to the IMA office in Denver, where the company had an empty data center that had been built when that office was constructed. The old hardware was reconfigured and powered up, and data and VMs were replicated from the primary data center in Wichita. To speed replication and data flow between IMA's two largest offices (Wichita and Denver), additional internet circuits were ordered from a different carrier than the original. We then implemented a Software-Defined Wide Area Network (SD-WAN) solution from Silver Peak between the office locations, which allowed production traffic to flow over one circuit while replication traffic was optimized over the other. F5 load balancers were also acquired and installed in each location. Combined with Border Gateway Protocol (BGP) routing, we achieved full internet circuit redundancy, load balancing and automatic failover between circuits.
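As a simple illustration of the kind of circuit check that backs up this design, the sketch below pings each carrier's next-hop gateway. The addresses are placeholders, and in practice the Silver Peak appliances, the F5s and BGP itself handled health checks and failover; this is only a conceptual example.

```python
# A simple circuit-health probe (a sketch, not the monitoring we actually ran).
# Gateway addresses are placeholders for each carrier's next-hop router.
import subprocess

CIRCUITS = {
    "carrier A (production traffic)": "198.51.100.1",  # placeholder next hop
    "carrier B (replication traffic)": "203.0.113.1",  # placeholder next hop
}


def circuit_up(gateway: str, count: int = 3, timeout_s: int = 2) -> bool:
    """Return True if the circuit's next-hop router answers ICMP echo (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), gateway],
        capture_output=True,
    )
    return result.returncode == 0


for name, gateway in CIRCUITS.items():
    state = "up" if circuit_up(gateway) else "DOWN -- BGP/SD-WAN should be steering around it"
    print(f"{name}: {state}")
```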
Outcome
When the project was complete, IMA gained two data centers for the price of one. The Recovery Time Objective (RTO) was reduced from several weeks to less than one hour, and the Recovery Point Objective (RPO) was reduced from 24 hours to less than 15 seconds. Additionally, annual system uptime increased from around 89% to 99.999%.
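To put those uptime numbers in perspective, here is the back-of-the-envelope downtime math (simple arithmetic, nothing IMA-specific):

```python
# Back-of-the-envelope downtime math behind the uptime figures above.
HOURS_PER_YEAR = 24 * 365

for availability in (0.89, 0.99999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} uptime -> ~{downtime_hours:.2f} hours of downtime per year")

# 89%     -> roughly 964 hours (about 40 days) of downtime per year
# 99.999% -> roughly 0.09 hours (about 5 minutes) of downtime per year
```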
Problem
At International Catastrophe Managers (ICAT), I inherited another problematic data center. In this scenario, though, the servers, storage and networks were fine and fairly new. The problem was with the on-premises data center itself. At the time, the company was leasing office space on the top floor of the University of Colorado (CU) Space Science Center in Boulder. Whenever it rained, or snow melted, the roof over the data center would leak and water would drip onto the server racks. The racks were covered with tarps (I'm not joking) that drained the water into 5-gallon Home Depot buckets. The data center also doubled as a storage room for the business, which was the source of numerous other problems. Compounding all of this, it was the company's only data center. While backup tapes were sent to Iron Mountain weekly, if the data center itself failed (and it clearly had issues), there was nowhere else to restore it to.
Solution
Fortunately, the lease on the office space was coming due and CU wanted to reclaim the space for its own needs. ICAT also wanted newer and nicer office space, so it was a win-win for all. After reviewing the available space and the data center requirements, I decided it would be better to move to a co-location solution. After seeking bids from multiple vendors, SunGard Availability Services was selected. Working with my Systems Administrators, we developed a new architecture that included a primary data center at the Denver, CO SunGard location and a failover data center at their Scottsdale, AZ location.
As we wanted to minimize downtime during business hours, and needed to purchase additional servers, network switches and storage for a second data center anyway, it made the most sense to buy all new hardware for the new, co-located primary data center. This allowed us to build the new data center while the old one was still in operation. Once the new data center was built, we used VMware Site Recovery Manager (SRM) to replicate all data and Virtual Machines (VMs) to it. When that replication was complete, we targeted the following weekend for the cutover. On the Friday evening before the cutover, after business hours, we completed a final replication. The next morning, working with the Development and Applications teams, we began bringing up the new servers and restoring connections. This process was completed by noon on Saturday, which gave both teams, plus my DBAs, the opportunity to test and validate all applications and data.
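As a rough illustration of why the Friday-evening final replication fit comfortably ahead of the Saturday cutover, here is the kind of back-of-the-envelope sizing involved. The change rate, bandwidth and efficiency figures are assumptions made up for the example, not ICAT's actual numbers.

```python
# Rough sizing of the final delta replication (all figures are illustrative
# assumptions for the example, not ICAT's actual numbers).
daily_change_gb = 500    # assumed data changed since the previous sync
wan_mbps = 1000          # assumed usable bandwidth to the SunGard site
efficiency = 0.7         # assumed factor for protocol overhead vs. compression

effective_mbps = wan_mbps * efficiency
transfer_seconds = (daily_change_gb * 8 * 1000) / effective_mbps  # GB -> megabits
print(f"Estimated final sync: ~{transfer_seconds / 3600:.1f} hours")

# With these assumptions the delta finishes in under two hours, comfortably
# inside the Friday-night window before the Saturday-morning cutover.
```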
Once the primary data center was live, we shipped all of the original servers, network switches and storage to the SunGard facility in Scottsdale. My Systems Administrators and I then flew to Scottsdale to populate the failover data center with the original hardware. As soon as the systems were up, running and tested, we used SRM to run a differential replication of the data and systems to the failover data center. After the Development, Applications and DBA teams completed and validated their testing, the project was complete.
Outcome
The data center overhaul, migration and expansion projects were completed on time and on budget. For a little over $1.2 million, ICAT gained two co-located data centers, one primary and one failover, in state-of-the-art SunGard colocation sites in two different states. System performance in the primary data center improved significantly thanks to the newer, faster hardware. The Recovery Time Objective (RTO) was reduced from several weeks to approximately 1.5 hours (based on failover testing). Also, we never had to worry about the rain again.
Problem
My first hands-on experience with data centers came when I was Manager of Technical Services for Longmont United Hospital. When I arrived in the role, I found the on-premises data center in complete disarray. The hospital data center had grown "organically" over the years: whenever a new server was needed, it was thrown into whatever rack had space, and if there was insufficient rack space, a used rack was procured from a secondary market such as eBay or Craigslist. The HVAC system was nearly 20 years old and couldn't keep up with the cooling demands of the installed equipment. Network and power cabling were jumbles of cable stuffed into every nook and cranny. Additionally, there was no redundancy in either the Local Area Network or the Storage Area Network. As a result, the hospital experienced frequent outages, which negatively impacted patient care.
Solution
As patient care was at risk, the state of the data center was unacceptable. Over the next year, I worked with my team of System Administrators, select vendors and the hospital's building services team to design an overhaul of the entire data center. Together, we developed a proposed design and budget, estimated at $2.5 million, for the data center overhaul project. I first presented the plan and budget to the CIO and, with his blessing, took the proposal to the hospital's Board of Directors, who approved the project and budget.
As the existing data center space was the only space available (hospital policy required the data center to be on premises), we had to plan for an in-place upgrade with minimal downtime so as not to negatively impact patient care. The project team executed that in-place overhaul over the course of the next six months.
Outcome
The project finished on time and came in a little over $100,000 under budget. With the CIO's blessing, we invested the savings into a badly needed upgrade of the Technical Lab and Warehouse, where hospital workstations were serviced and stored. Not only did the project come in under budget, we also incurred only 4 hours of planned downtime during the entire project. Upon completion, the data center achieved 99.997% uptime, an almost 200% improvement over the original data center.
You can check out a video of the entire project here:
Longmont United Hospital Data Center Project
As a Technical Support Manager for VMware, I managed the Systems, Network, and Storage support teams, which were responsible for supporting VMware's virtualization solutions for customers around the globe. Because virtualization was still a fairly new concept at the time, VMware Support was typically on the hook for a number of non-VMware issues as well, since customers regarded virtualization as "guilty until proven innocent." As such, my teams had to be knowledgeable about the entire server, operating system, network and storage stack, in addition to being experts in VMware's solutions. As Technical Support Manager, I was also responsible for all escalations, which required me to be as knowledgeable as my Technical Support and Escalation Engineers.