Archive for September, 2009

Downtime: Tuesday, Sept 15, 2009

Saturday, September 12th, 2009

Update 3:25pm: Copy operation is complete. E-mail service is back online. It may take an hour or two for all queued mail to be delivered.

Update 2:40pm: Copy operation is 86% complete. Our guess is now 3:20pm +/- 30 minutes.
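As an illustration of how a guess like this can be derived, here is a minimal sketch that assumes the copy runs at a constant rate between the two most recent progress readings (57% at 1:20pm and 86% at 2:40pm). The function name `estimate_completion` is hypothetical, and this is not necessarily how the posted estimate was produced. Because the copy rate varied considerably, any such figure needs a wide margin.

```python
from datetime import datetime, timedelta

def estimate_completion(t1, pct1, t2, pct2):
    """Linearly extrapolate the time at which progress reaches 100%.

    Assumes a constant copy rate between the two readings, which is
    only a rough approximation for a real disk-image copy.
    """
    if t2 <= t1 or pct2 <= pct1:
        raise ValueError("second reading must be later and further along")
    rate = (pct2 - pct1) / (t2 - t1).total_seconds()  # percent per second
    remaining = 100.0 - pct2                           # percent left to copy
    return t2 + timedelta(seconds=remaining / rate)

# Readings taken from the 1:20pm and 2:40pm updates.
t1 = datetime(2009, 9, 15, 13, 20)
t2 = datetime(2009, 9, 15, 14, 40)
eta = estimate_completion(t1, 57, t2, 86)
print(eta.strftime("%I:%M %p"))  # prints "03:18 PM"
```

With these two readings the extrapolation lands within a few minutes of the 3:20pm guess. The same calculation on the 11:10am and 1:20pm readings gives a much later time, which shows how sensitive the estimate is to the measured rate.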

Update 1:20pm: Copy operation is 57% complete. The last few percentage points have each taken wildly different amounts of time, so any estimate of the completion time is highly speculative. Our guess is now 3:30pm +/- 1 hour.

Update 11:10am: Copy operation is 42% complete. Our revised estimate for operational e-mail is now 2:15pm.

Update 9:45am: We are in the midst of making a new working copy of the disk image of the e-mail store. We will be able to restart the e-mail server shortly after the copy finishes. Based on the copy's progress meter (observed over only a few percentage points), we estimate that e-mail will be back up at approximately 2:00pm. We understand that this is a major inconvenience, and we will post periodic updates as the copy progresses.

Update 8:10am: Everything except E-mail service is back up and running. E-mail is down due to issues with the underlying virtual machine infrastructure. We are on the line with VMware now and hope to have this resolved soon.

Basically, we have two pools (the “old” and the “new”) of physical storage on which we run virtual machines (VMs). These VMs host the department’s various services. Over the past few months we have migrated services from the old storage pool to the new one. In the old pool, each VM is tied to a particular storage device, while the new pool lets us migrate VMs between devices as needed. On August 25, we were to migrate the last VMs (running e-mail) from the old pool to the new pool. Due to a bug in the VM control software, this failed, and we were forced to get the E-mail VMs running without having much control over them. The primary goal of today’s downtime is to complete this migration. For reasons unknown, the simple act of shutting down the virtual machines has put the system in a state where it can’t be started cleanly. We are on the line with VMware to address the issue.

On Tuesday, September 15, 2009, we will have a scheduled downtime from 4:00am to 8:00am EDT.

This downtime affects all users of the department’s computing and networking infrastructure.

During this time, most services (e.g., E-mail, web, cycle servers) will be unavailable. E-mail destined for the department will be queued and delivered at the end of the maintenance window.

Scheduled work includes:

  • Updating our virtual machine infrastructure
  • Rebooting remote switches in Sherrerd Hall and 221 Nassau

This work is a follow-up to our previous downtime on August 25 and will increase the stability of our infrastructure in preparation for the Fall semester.