Further Emergency Downtime & Update

In response to the file server problem we have been working on since Tuesday, a vendor field engineer will be on site tonight (Thursday) to help troubleshoot and diagnose it. He is expected to arrive around 20:00 (8:00 PM).

The server will very likely need to be taken down while the engineer is here. This means that all services provided by CS Staff, including public cycle servers, clusters, email service, web service, DNS, etc., will be shut down, as they all depend on the file server. Please save your work regularly to protect against losing data when services are shut down.

We appreciate your patience throughout these last few days, and apologize for any inconvenience.

Update 2007/07/13 @ 03:20: After more than 7 grueling hours of emergency downtime and troubleshooting, the network and systems are up and running again, and initial signs look good. While we hesitate to declare victory, we ask that you report any instability you notice, with as much detail as possible about what you were doing and what failed.

We thank you again for your patience, especially those of you working toward deadlines. If we have indeed licked this issue, look forward to some exciting announcements in the coming weeks.


Emergency Maintenance

Our systems are still not playing nice with each other after the installation of a new file server. To get everything into a known state, we must perform a full shutdown and startup of the equipment in 218 at 8:45am today. We don't necessarily expect this to fix everything, but it will eliminate many variables. Thank you for your patience.

Update @ 13:45: We are continuing to experience unstable NFS performance, especially on the public Linux servers. The public Solaris machines (shades), while also affected, appear to be more usable under these conditions. We are working with vendor support to isolate the issue or issues. Further updates will appear on this site as they become available.

Update 7/12 @ 12:15: Problems continue. We are working with our vendor to determine whether this is a software or hardware issue.


File System Issues

During our downtime this morning we replaced our aging file server. Despite our efforts, the Linux cycle servers are not reliably maintaining their connection to it. We are rebooting the file server to (hopefully) bring things to a good state. This reboot will impact all our systems for 10 minutes or so. We apologize for the inconvenience.


Downtime: Tuesday, April 17, 2007

On Tuesday, April 17, 2007, we will have a scheduled downtime from 4:00am to 8:00am EDT.

Scheduled work includes:

  • Firmware upgrades on our disk arrays

This work will bring our infrastructure up to date with current software and firmware. Among other things, the firmware updates applied to the disk arrays during the last downtime will be refreshed to correct some remaining issues.


Update: Tuesday, February 27, 2007

Our maintenance window for today has closed. At this time, everything is back online with the following exceptions:

  • Sunray server and Sunray terminals in the building
  • Compute node for the hbar system (only applies to COS 598A)

We are working on these issues and will update this post when we have more information.

Update (9:45am): the Sunray server and terminals are back online.

Update (10:30am): everything is back up.


Downtime: Tuesday, February 27, 2007

On Tuesday, February 27, 2007, we will have a scheduled downtime from 4:00am to 8:00am EST.

Planned maintenance includes:

  • Operating System patches
  • Firmware upgrades to our disk arrays
  • Firmware upgrades to and installation of network hardware

This work will bring our infrastructure up to date with current software and firmware. Among the patches is support for the new start of Daylight Saving Time, which begins on March 11th this year.
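For anyone who wants to double-check the date, here is a minimal sketch of the calendar arithmetic. It assumes the US rules as we understand them: the new Daylight Saving Time start is the second Sunday in March, where it was previously the first Sunday in April. The helper name `nth_sunday` is ours, purely for illustration; this does not inspect whether a given machine has actually been patched.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday of the given month (hypothetical helper)."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

# Assumed rules: new start = second Sunday in March, old start = first Sunday in April
new_rule_start = nth_sunday(2007, 3, 2)
old_rule_start = nth_sunday(2007, 4, 1)

print("New DST start:", new_rule_start)
print("Old DST start:", old_rule_start)
```

An unpatched system would keep standard time for the weeks between those two dates, which is why the patches need to be in place before March 11th.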

