Archive for December, 2008

Connectivity Problems

Monday, December 22nd, 2008

Due to a yet-to-be-identified source, we are seeing very large bursts of connections to large numbers of outside IP addresses. These hour-long bursts occurred at approximately 1:00am and 7:00pm on Sunday, and 1:00am and 7:00am on Monday. These events filled the firewall connection table and disrupted connections for about 3 hours each.

Update: While the source has been identified, we have not been able to reach the user. The traffic began again at 1:00pm today. We have disabled that port. You may notice some delays for a few more minutes while the network settles.
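
For readers curious how a source like this can be tracked down, here is a minimal sketch. It assumes connection records are available as (source, destination) address pairs, and it flags any source that contacts an unusually large number of distinct destinations. The function name, the threshold, and the sample addresses are hypothetical; this is not the tooling we used.

```python
from collections import defaultdict


def top_fanout_sources(connections, threshold):
    """Return (source, distinct-destination count) pairs at or above
    threshold, sorted with the busiest source first."""
    destinations = defaultdict(set)
    for src, dst in connections:
        # A set keeps each destination once, so repeats are not double-counted.
        destinations[src].add(dst)
    flagged = [
        (src, len(dsts))
        for src, dsts in destinations.items()
        if len(dsts) >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data: one host fans out to 50 outside addresses,
# another talks to only two.
sample = [("10.0.0.5", f"198.51.100.{i}") for i in range(50)]
sample += [("10.0.0.7", "192.0.2.10"), ("10.0.0.7", "192.0.2.11")]

flagged = top_fanout_sources(sample, threshold=20)
print(flagged)  # [('10.0.0.5', 50)]
```

Counting distinct destinations rather than raw connections matters here, because a burst to many different outside addresses is what fills a connection table; a busy but ordinary host tends to reuse a small set of destinations.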

Downtime: Thursday, December 18, 2008

Wednesday, December 17th, 2008

On Thursday, December 18, 2008, we will have a scheduled downtime from 8:00am to 10:00am EST.

This downtime only affects direct and indirect users of the project file server. This includes the web servers, cycle servers, c2 cluster, ftp server, and the ftp mirror.

Note that e-mail, networking, the CVS server, and the database machines will remain operational during this time.

As one of the steps to clean up the file system mess, we will do a final sync between our temporary storage and our re-built production storage.

Downtime: Thursday, December 11, 2008

Wednesday, December 10th, 2008

On Thursday, December 11, 2008, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime only affects direct and indirect users of the project file server. This includes the web servers, cycle servers, c2 cluster, ftp server, and the ftp mirror.

Note that e-mail, networking, the CVS server, and the database machines will remain operational during this time.

As one of the steps to clean up the file system mess, we will do a final sync between our problematic storage and temporary storage. We will then put the temporary storage into production until we rebuild our storage pool.

Downtime: Wednesday, December 10, 2008

Thursday, December 4th, 2008

On Wednesday, December 10, 2008, we will have a scheduled downtime from 4:00am to 8:00am EST.

This downtime only affects users of the Beowulf clusters (c2, c3, and hbar). All other services (e.g., e-mail, web, databases, file servers, and cycle servers) will remain operational.

Scheduled work includes:

  • Move the nodes in the test cluster (c3) into the production cluster (c2).
  • Upgrade the production cluster (c2) to Rocks 5.
  • Decommission the hbar cluster with its single compute node.

With the participation of Jennifer Rexford, Fei-Fei Li, and David Blei, we are adding 14 nodes to the cluster. Eleven of these nodes have 16 GB of RAM (instead of 8 GB) and 8 cores per node (instead of 4).

The hbar cluster was created specifically so that users could experiment with an 8-core machine. The expansion of c2 makes hbar obsolete. As a result, we will decommission hbar.