Archive for July, 2007

Further Emergency Downtime & Update

Thursday, July 12th, 2007

In response to the file server problem we have been working on since Tuesday, a vendor field engineer will be coming out tonight, Thursday, to help troubleshoot and diagnose the problem. He is expected to arrive around 20:00 (8:00 PM).

It is very likely that downtime of the server will be required while the engineer is here. This will mean that all services provided by CS Staff, including public cycle servers, clusters, email service, web service, DNS, etc., will be shut down, as they all depend on the file server. Please make sure to regularly save any data you are working on to protect against losing it when services are shut down.

We appreciate your patience throughout these last few days, and apologize for any inconvenience.

Update 2007/07/13 @ 03:20 After over 7 grueling hours of emergency downtime and troubleshooting, the network and systems are again up and running, and initial signs look good. While we hesitate to declare victory, we would ask that you please report any instability you notice with as much detail as possible about what you were doing and what failed.

We thank you again for your patience, especially those of you working toward deadlines. If we have indeed licked this issue, look forward to some exciting announcements in the coming weeks.

Emergency Maintenance

Wednesday, July 11th, 2007

Our systems are still not playing nice with each other after the installation of a new file server. To get everything in a known state, we must initiate a full shutdown/startup of the equipment in 218 at 8:45 this morning. We don’t necessarily expect this to fix everything, but it will eliminate many variables. Thank you for your patience.

Update @ 13:45: We are continuing to experience unstable NFS performance, especially on the public Linux servers. The public Solaris machines (shades), while also affected, appear to be more usable under these conditions. We are working with vendor support to isolate the issue or issues. Further updates will appear on this site as they become available.

Update 7/12 @ 12:15: Problems continue. We are working with our vendor to determine whether this is a software or hardware issue.

File System Issues

Tuesday, July 10th, 2007

During our downtime this morning we replaced our aging file server. Despite our efforts, the Linux cycle servers are not reliably maintaining their connection to the new one. We are rebooting the file server to (hopefully) bring things to a good state. This reboot will impact all our systems for 10 minutes or so. We apologize for the inconvenience.