[downtime] Major Outage – Unplanned Power Failure in HPCRC

Good morning.
You may have noticed that CS systems had a significant outage overnight. This was due to major power problems at the campus data center facility at Forrestal. All of our systems in the HPCRC were unexpectedly powered off and suffered various other power effects throughout the evening and night, and some did not recover on their own.
CS Staff members have been on site at the data center and also working remotely to recover systems this morning. At this time, all major systems are expected to be online again. Some ionic cluster nodes are still down, but are being brought back.
If you notice any persistent issues with CS systems, please let us know and we will do our best to address them. Thank you for your patience.
-CS Staff

[downtime] CS Network Downtime, Wednesday, May 31, 2023, 08:00-10:00

Date: Wednesday, May 31, 2023 (08:00-10:00)
Who is affected:
All users of CS Department computing facilities and services, including cycles, ionic, web services, email, DNS, and wired networking.
What is happening:
During this window, the core switch handling CS Department network traffic at the HPCRC will be replaced. The actual outage time for any particular service or network access point should be only a few seconds, and the total outage window is likely to be shorter than announced. However, owing to the uncertain nature of technological change, outages may occur throughout this window and may be up to several minutes in length.
Why is it happening:
This upgrade will replace a 12-year-old core switch with a new, much faster device. This is the first stage of further upgrades planned for this summer, primarily focused on increasing network capacity for the ionic HPC cluster, the Department’s central storage cluster, and other related systems.
We will post updates to the status page: www.csstaff.org as necessary.
If this downtime will cause you undue hardship, please contact csstaff@cs.princeton.edu immediately, so we can discuss options to reduce any negative impact. Your patience is appreciated.
Sincerely,
CS Staff

Email Service Outage *Unplanned Delay*

The email server upgrade scheduled for this morning has run into unexpected issues. As a result, email service is not working properly. We are working to correct the situation as quickly as possible, and will update here as new information becomes available.

Update 08:57 – We continue to work to recover the mail systems, but they will not be ready in the original scheduled window. We apologize for the inconvenience.

Update 10:02 – We are working with the vendor to recover the mail systems. We apologize for the inconvenience.

Update 15:18 – We now believe the service is back to normal operation. Most incoming emails were likely queued and have probably been delivered by now. If you have ongoing issues, please reach out to CS Staff. Thank you for your patience!

[downtime] TONIGHT: CS Building Power Outage, Monday, March 13,

Today is the day of this scheduled power shutdown. Please see the announcement below and remember to power down and unplug any and all equipment you control in the CS Building or Friend Center before ending your day today!

Thanks for your time and attention.

Sincerely,
CS Staff

----- Forwarded Message -----
From: "csstaff"
To: "downtime"
Sent: Friday, February 3, 2023 10:06:44 AM
Subject: [downtime] CS Building Power Outage, Monday, March 13, 2023, 22:00-02:00

Date: Monday, March 13, 2023 (22:00-02:00)

Who is affected:
ALL users and occupants of the CS Building (35 Olden St) and Friend Center

What is happening:
From 22:00 (10PM) until 02:00 (2AM) on the night of Monday, March 13, 2023,
ALL power to the Computer Science building and the Friend Center will be
cut.

Emergency generator power will remain, so emergency lighting and the
building network will remain powered. ALL OTHER POWER will be off. It is
VERY IMPORTANT that all equipment, including that in labs and in Room 002,
be powered off on Monday evening prior to the shutdown. This includes
computers, printers, copiers, or anything else that runs on electricity.
Sensitive equipment will further benefit from being unplugged or physically
switched off in order to avoid any effects from possible fluctuations in
power quality during the work.

Why is it happening:
In August of 2022, a sprinkler head in the basement power vault opened,
flooding the main power feeds for the building with water and causing
damage to a main breaker. Since that time, the building has been operating
on a single breaker from a redundant set while the damaged breaker was sent
away for repairs. This outage will be used to re-install the repaired
breaker and return the building to normal operating status. Until this
repair is completed, the building power is more vulnerable than usual to a
long-term outage in the event the single remaining breaker is compromised.

We will post updates to the status page: http://www.csstaff.org
as necessary.

Note that this outage DOES NOT affect the CS computing infrastructure,
which is housed in the Forrestal Campus data center. All departmental
computing and network services are expected to continue unaffected.

If you have questions or concerns about this outage, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff
_______________________________________________
downtime mailing list
downtime@lists.cs.princeton.edu
https://lists.cs.princeton.edu/mailman/listinfo/downtime

[downtime] CS Email Downtime, Thursday, March 16, 2023, 07:00-09:00

Date: Thursday, March 16, 2023 (07:00-09:00)

Who is affected:
Users of CS Department email services

What is happening:
During this window, the CS Department email servers will be upgraded. The
actual outage for any given account will be relatively brief, but the
overall work may be longer. The outage for any particular account may occur
at any time during the scheduled window.

The expected outage behavior is that you may be unable to read email on
your account for several minutes. Sending email may also be interrupted for
some configurations. If you find your account behaving strangely, wait a
few minutes and reconnect or reload, which should reestablish expected
behavior.

Why is it happening:
This upgrade will apply security and maintenance updates to the mail
servers as part of routine system hygiene.

We will post updates to the status page: http://www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] REMINDER: CS Building Power Outage, Monday, March 13,

This is a reminder that we are ONE WEEK away from this scheduled power shutdown. Please see the announcement below and ensure you have a plan to power down and unplug any and all equipment you control in the CS Building or Friend Center before the start of the outage on Monday night.

Thanks for your time and attention.

Sincerely,
CS Staff

----- Forwarded Message -----
From: "csstaff"
To: "downtime"
Sent: Friday, February 3, 2023 10:06:44 AM
Subject: [downtime] CS Building Power Outage, Monday, March 13, 2023, 22:00-02:00

Date: Monday, March 13, 2023 (22:00-02:00)

Who is affected:
ALL users and occupants of the CS Building (35 Olden St) and Friend Center

What is happening:
From 22:00 (10PM) until 02:00 (2AM) on the night of Monday, March 13, 2023,
ALL power to the Computer Science building and the Friend Center will be
cut.

Emergency generator power will remain, so emergency lighting and the
building network will remain powered. ALL OTHER POWER will be off. It is
VERY IMPORTANT that all equipment, including that in labs and in Room 002,
be powered off on Monday evening prior to the shutdown. This includes
computers, printers, copiers, or anything else that runs on electricity.
Sensitive equipment will further benefit from being unplugged or physically
switched off in order to avoid any effects from possible fluctuations in
power quality during the work.

Why is it happening:
In August of 2022, a sprinkler head in the basement power vault opened,
flooding the main power feeds for the building with water and causing
damage to a main breaker. Since that time, the building has been operating
on a single breaker from a redundant set while the damaged breaker was sent
away for repairs. This outage will be used to re-install the repaired
breaker and return the building to normal operating status. Until this
repair is completed, the building power is more vulnerable than usual to a
long-term outage in the event the single remaining breaker is compromised.

We will post updates to the status page: http://www.csstaff.org
as necessary.

Note that this outage DOES NOT affect the CS computing infrastructure,
which is housed in the Forrestal Campus data center. All departmental
computing and network services are expected to continue unaffected.

If you have questions or concerns about this outage, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] CS Building Power Outage, Monday, March 13, 2023,

Date: Monday, March 13, 2023 (22:00-02:00)

Who is affected:
ALL users and occupants of the CS Building (35 Olden St) and Friend Center

What is happening:
From 22:00 (10PM) until 02:00 (2AM) on the night of Monday, March 13, 2023,
ALL power to the Computer Science building and the Friend Center will be
cut.

Emergency generator power will remain, so emergency lighting and the
building network will remain powered. ALL OTHER POWER will be off. It is
VERY IMPORTANT that all equipment, including that in labs and in Room 002,
be powered off on Monday evening prior to the shutdown. This includes
computers, printers, copiers, or anything else that runs on electricity.
Sensitive equipment will further benefit from being unplugged or physically
switched off in order to avoid any effects from possible fluctuations in
power quality during the work.

Why is it happening:
In August of 2022, a sprinkler head in the basement power vault opened,
flooding the main power feeds for the building with water and causing
damage to a main breaker. Since that time, the building has been operating
on a single breaker from a redundant set while the damaged breaker was sent
away for repairs. This outage will be used to re-install the repaired
breaker and return the building to normal operating status. Until this
repair is completed, the building power is more vulnerable than usual to a
long-term outage in the event the single remaining breaker is compromised.

We will post updates to the status page: http://www.csstaff.org
as necessary.

Note that this outage DOES NOT affect the CS computing infrastructure,
which is housed in the Forrestal Campus data center. All departmental
computing and network services are expected to continue unaffected.

If you have questions or concerns about this outage, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] CS Ionic/Cycles System Downtime, Tuesday, January 24,

Date: Tuesday, January 24, 2023 (06:00-10:00)

Who is affected:
All users of the CS Department Beowulf high performance computing cluster,
known as ionic.

All users of the CS Staff-managed public login systems, including the
cycles, courselab, and armlab systems.

What is happening:
Ionic nodes will have NVIDIA, CUDA, and kernel drivers updated to fix
GPU-related failures. After the upgrade, machines will be rebooted.

Cycles, courselab, and armlab machines will be rebooted during this window
to clear some defunct user processes interfering with some research work.

Why is it happening:
Ionic nodes are experiencing various GPU-related failures. In an attempt
to fix them, we will be updating NVIDIA, CUDA, and kernel drivers.

Some user processes have entered a defunct state and are interfering with
research work. Clearing them requires a system reboot.

We will post updates to the status page: http://www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff

[downtime] IMPORTANT – Project Web Server Upgrade 2022-08-09

Good morning,

Following up on the downtime/upgrade announcement below, it is important to note that this upgrade will bring a couple of significant changes to our project web server. Specifically, PHP on this server will be upgraded from version 5.6.25 to version 8.0.13, and Phusion Passenger, the system that supports web application frameworks, will be upgraded from version 5.0.30 to version 6.0.14. There are several backward-incompatible changes between these versions, and some project web sites will need code upgrades or adjustments in order to work properly on the new server. You can read more about the changes between the PHP versions on these pages:

https://www.php.net/manual/en/migration70.php
https://www.php.net/manual/en/migration71.php
https://www.php.net/manual/en/migration72.php
https://www.php.net/manual/en/migration73.php
https://www.php.net/manual/en/migration74.php
https://www.php.net/manual/en/migration80.php

Note that each page has a "Backward Incompatible Changes" link, which is worth reviewing to prepare for your site update.
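
As a rough starting point for that review, the sketch below scans PHP source for calls to a few functions that were removed between PHP 5.6 and 8.0. It is only a heuristic, not a tool provided by CS Staff: the function list is a small illustrative sample rather than a complete inventory, the names `REMOVED_FUNCTIONS`, `scan_php_source`, and `scan_tree` are made up for this example, and a plain regular expression will also flag matches inside comments or strings. The migration pages above remain the authoritative reference.

```python
import re
from pathlib import Path

# Illustrative, incomplete sample of functions removed between PHP 5.6 and 8.0.
# Consult the PHP migration guides for the full list.
REMOVED_FUNCTIONS = {
    "mysql_connect": "ext/mysql was removed in PHP 7.0; consider mysqli or PDO",
    "mysql_query": "ext/mysql was removed in PHP 7.0; consider mysqli or PDO",
    "ereg": "removed in PHP 7.0; consider preg_match",
    "split": "removed in PHP 7.0; consider explode or preg_split",
    "each": "removed in PHP 8.0; consider foreach",
    "create_function": "removed in PHP 8.0; consider an anonymous function",
}

# Match a bare call such as `each(`, but not `foreach(`, `preg_split(`,
# `$obj->split(`, `Foo::split(`, or `$split(`.
CALL_PATTERN = re.compile(
    r"(?<![\w>:$])(" + "|".join(map(re.escape, REMOVED_FUNCTIONS)) + r")\s*\(",
    re.IGNORECASE,  # PHP function names are case-insensitive
)


def scan_php_source(source):
    """Return a list of (line_number, function_name, hint) tuples."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            name = match.group(1).lower()
            findings.append((lineno, name, REMOVED_FUNCTIONS[name]))
    return findings


def scan_tree(root):
    """Print one line per suspicious call found in *.php files under root."""
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, name, hint in scan_php_source(text):
            print(f"{path}:{lineno}: {name}() {hint}")
```

Calling `scan_tree` with the path to a local copy of a site would print each flagged line for manual review. An empty result does not mean a site is compatible, since many PHP 7 and 8 changes affect behavior rather than function names.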

The most notable change in Passenger is that some configuration can no longer be set in .htaccess files and, for security reasons, must move to the server config files. The upgraded Passenger also introduces Generic Language Support, or "the ability to support any and all arbitrary apps". You can read more about the changes here:

https://blog.phusion.nl/2018/01/29/passenger-5-2-0/
https://blog.phusion.nl/2022/05/10/passenger-6-0-14/

CS Staff is performing a basic review of each project web site on the upgraded web server, and /most/ sites appear to be in good working order. For sites with obvious compatibility issues, we will reach out directly to the site owners to advise on expected changes. However, as it is impossible for us to review every aspect of your site, we strongly encourage you to review the PHP changes before the upgrade to anticipate changes you may need to make, and to check your site after the upgrade on August 9 to ensure it is working as expected.

Please note that the above changes apply ONLY to the project web sites at this time. Personal ("tilde") sites, and any other content hosted under "www.cs.princeton.edu", are not yet affected by this upgrade. If you are concerned that your site may need substantial change and would like to review it using the new web server prior to the upgrade, please reach out to csstaff@cs.princeton.edu for assistance in doing so. As always, please also let us know if you have any other questions or concerns.

Sincerely,
CS Staff

----- Forwarded Message -----
From: "csstaff"
To: "downtime"
Sent: Tuesday, July 26, 2022 1:32:22 PM
Subject: [downtime] CS Infrastructure Upgrades, Tuesday, August 9, 2022, 05:00-17:00

Date: Tuesday, August 9, 2022 (05:00-17:00)

Who is affected:
All users of the CS Department computing infrastructure.

What is happening:
CS Staff will upgrade the user-accessible servers in our infrastructure,
including cycles, ionic, courselab, armlab, and the project web servers.
The systems will be upgraded to the latest Springdale 8 distribution for
the x86_64 architecture and RockyLinux 8 distribution for the aarch64
architecture (i.e., armlab).

SPECIAL NOTE: As we are reloading the Linux servers, all crontabs will be
deleted. If you have crontabs that you wish to persist, you will need to
back up your crontabs before the downtime and restore them after.
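
One possible way to do this is sketched below. It wraps the standard `crontab -l` (print the current crontab) and `crontab FILE` (install FILE as the crontab) commands. The function names and the `run` parameter are invented for this illustration, and the sketch assumes the backup file is written somewhere that survives the reinstall, which this announcement does not specify.

```python
import subprocess
from pathlib import Path


def backup_crontab(backup_path, run=subprocess.run):
    """Save the current user's crontab to backup_path.

    Returns True if a crontab was saved, False if `crontab -l` failed
    (for example, because the user has no crontab installed).
    """
    result = run(["crontab", "-l"], capture_output=True, text=True)
    if result.returncode != 0:
        return False
    Path(backup_path).write_text(result.stdout)
    return True


def restore_crontab(backup_path, run=subprocess.run):
    """Install the crontab saved at backup_path, replacing any current one."""
    run(["crontab", str(backup_path)], check=True)
```

Under those assumptions, you would call `backup_crontab` with a file path on each machine before the downtime and `restore_crontab` with the same path afterward. Crontabs are per machine, so the steps need repeating on every host where you have one.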

Please note that the downtime window is significantly longer than our usual
windows due to the high-touch nature of OS reinstallations. It is our
intention that the cycles machines and web servers will see the earliest
returns to service. Some parts of the ionic cluster may extend later in the
day. Overall, we expect to finish all of the upgrades earlier than this
window, but the wide time frame acknowledges the uncertainties involved.

Why is it happening:
This is part of the routine maintenance of the publicly-accessible systems
and will bring newer versions of installed tools and software.

We will post updates to the status page: http://www.csstaff.org
as necessary.

If this downtime will cause you undue hardship, please contact
csstaff@cs.princeton.edu immediately, so we can discuss options to reduce
any negative impact. Your patience is appreciated.

Sincerely,
CS Staff