Archive for January, 2012

Unplanned Outage – Email – January 28, 2012

Saturday, January 28th, 2012

We have a report of e-mail trouble. We are investigating.

UPDATE 10:05am: E-mail is “up” but running extremely slowly, leading to time-outs. We are continuing to work on the issue.

UPDATE 11:08am: We are repairing the e-mail system’s database. The system will continue to be slow for a while. The problem appears to be a latent fault introduced when our previous storage system failed on January 12, 2012. That hardware has since been replaced, but a flaw in the data was carried over to the new system.

UPDATE 11:48am: E-mail service is currently down while the data is being repaired. We are working to restore service ASAP.

UPDATE 2:00pm: E-mail service is running again, but still very slow. It may be turned off again at some point, as troubleshooting is still underway.

UPDATE 4:00pm: E-mail service has been restored. We have located a source of delays in our storage and eliminated it for the time being. Our apologies for the extended inconvenience.

Downtime: Tue, January 24, 2012

Friday, January 20th, 2012

On Tuesday, January 24, 2012, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime affects all users of the department’s computing and networking infrastructure.

Scheduled work includes:

  • Operating System updates and miscellaneous work on the infrastructure.

Unplanned Outage – January 16, 2012

Monday, January 16th, 2012

We are still having trouble with the storage array that hosts our virtual infrastructure. The CS websites had a short outage, but those sites are all back up as of 12:35pm. The following are currently down:

1) The submission server that runs the check submit script on dropbox.cs
2) opus – the public server

We will post updates to this page as more information becomes available.

UPDATE 1:32pm: The submission server that runs the check submit script on dropbox.cs is now online again.

UPDATE 1:33pm: opus is now online again.

CS Websites and Tux – Unplanned Outage – January 15, 2012

Sunday, January 15th, 2012

As of 12:12am, the main web pages are down. We are investigating.

UPDATE 9:35am: The filesystem for the webserver is corrupt and is in the process of being repaired.

UPDATE 10:00am: The main web pages are now online again.

UPDATE 10:10am: Tux is still down. Please use Opus until it can be restored.

E-mail / Web – Unplanned Outage – January 12, 2012

Thursday, January 12th, 2012

As of 7:00am, E-mail and the main web pages are down. We are investigating.

UPDATE 8:05am: Problem is isolated to an old storage server that is used by our virtual machine servers.

UPDATE 8:35am: Storage server is coming back online. After that, all the virtual machines will need a clean restart.

UPDATE 8:45am: Storage server is still having trouble. No ETA yet.

UPDATE 9:35am: We are waiting for a return call from the storage server vendor.

UPDATE 10:05am: Systems are beginning to come back online. Simultaneously, the storage system is rebuilding and remirroring the data. During this time, we expect that systems will be slower. We are also working to stabilize the systems.

UPDATE 11:20am: Systems are basically online. Storage system rebuilding (and associated slowness) continues. E-mail was delayed, but none was lost.