Blog: HPC Services at DellXL this week
Apr 24, 2015
Blog: Webspace End of Life Announcement
Apr 22, 2015
Blog: Policy on Smartwatches Apple Watch and Fitness Trackers
Apr 09, 2015
Blog: eRoom News - Down To 21 And Counting
Mar 31, 2015
Blog: World Backup Day 2015 - Time for a Backup Checkup
Mar 27, 2015
Blog: New Google Calendar App for iOS devices
Mar 16, 2015
Blog: Planned upgrade of IDM equipment - Friday March 13, 4pm
Mar 12, 2015
Blog: Unscheduled outage - Google Apps Marketplace Apps - March 5
Mar 05, 2015
Blog: Smartsheet Unscheduled Outage - March 4 2015 - Resolved
Mar 04, 2015
Blog: Intel® Xeon Phi™ Coprocessor Developer Training
Feb 28, 2015
Blog: IT and ESNet Help ALS Researchers Move their Big Data
Feb 28, 2015
Blog: Unscheduled IT Outage Feb 27 2015
Feb 27, 2015
HPC Services staff members Yong Qin and Michael Jennings gave talks highlighting their respective software tools, wwibcheck and NHC, at the DellXL High Performance Computing (HPC) conference this week, April 21-23, 2015, in Boulder, Colorado (agenda).
Most HPC systems rely on a high-performance, low-latency interconnect network to connect compute nodes together in a way that supports tightly-coupled computations, where the compute nodes need to exchange a lot of information as part of the computation. Yong’s talk will focus on how to troubleshoot failures in HPC InfiniBand interconnects using his software tool, wwwibcheck, which helps system administrators isolate and identify InfiniBand equipment failures or performance problems affecting the execution time of compute jobs.
Michael Jennings will also be giving a talk on his Warewulf Node Health Check (NHC) utility software. NHC runs in conjunction with the system’s job scheduler, carrying out a pre-check to detect potential problems with compute nodes before the job starts, optionally marking bad nodes as “offline.” This highly configurable utility works with popular job schedulers, such as SchedMD’s Slurm job scheduler, and Adaptive Computing’s Moab scheduler and TORQUE resource manager.
Yong and Michael are part of the High Performance Computing Services Group in the IT Division, which supports the Lawrencium computational cluster for the use of Berkeley Lab PIs.
Webspace was one of our early ventures into providing a web based collaboration tool that allowed easy sharing of documents and access from any location in the world. We plan to end service in July 2016.
Google Drive has become a viable alternative to Webspace for many customers at the lab. Google now does this in a better and more economical way. With the exception of a few capabilities (sharing via a "ticket" with an expiration date), Google storage and sharing can do it all.
As a result, we are announcing a longer term exit plan - with the goal of concluding our migration off of Webspace by July 2016 - over a year from now. We will work with customers over the next 15 months to migrate important data to Google (or other alternatives) and provide instruction on how you can continue to solve your business problems. Until we contact you, there is nothing you have to do.
Our project plan and status will be documented here.
IT has issued a new policy on the acquisition of smart watches and fitness trackers, including the new Apple Watch (which everyone calls iWatch but is not actually called that).
The acquisition of these products now requires additional justification and approvals.
The policy is available to authenticated LBL employees by clicking here.
We continue to work on the retirement of our legacy eRoom service (an on-premise web content service used for over a decade). We will terminate the service on or before December 2015.
Many of these eRooms have been migrated to AODocs libraries for archival and reference purposes. (We have also provided zip files to customers who want the data in that form.) AODocs is a Google Marketplace app and, much like eRoom, requires an administrator to create new libraries. (Just send a request to the IT Help Desk if this tool seems right for you.)
World Backup Day: Is your data backed up?
The IT Division is taking part in celebrating World Backup Day by encouraging the Lab community to double check that your backups are working and ensure that all your important data is backed up.
Don't Be An April Fool - Do A Backup Checkup
It's time for your checkup. We promise it won't hurt.
First: Check up on your strategy. What are you trying to achieve with your backup? Are you trying to back up all the data in your experiments, or just some, or just your important findings? Do your backups need to be able to survive a major earthquake that impacts the site? Are you using the best form of technology to ensure that your data remains safe? For example, if you're still using external hard drives as backup, make sure you've evaluated some of the newer alternatives that may provide more resilient backups. Remember that file sync services (like Dropbox and Google Drive Sync) are not the same thing as backups (see more on this below).
Second: Check up on your scope. Are you actually backing up the data you want to back up? Have your experimental results moved somewhere else, so that you're no longer backing them up? Make sure your backup software or process is correctly backing up the files, directories, and systems you need.
Third: Check up on your data. Now it's time to go do a quick spot check on your backups. Does your backup client report that it's working? Can you see recent files in your backups or in the logs provided by the backup client? Does the size of your backup correspond to the amount of data you think you've backed up?
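For backups that are reachable as an ordinary directory (for example, a mounted external drive), the spot check above can be partly automated. The following is a minimal sketch, not an IT-supported tool: the function names, the 7-day freshness threshold, and the 10% size tolerance are all illustrative assumptions, and it does not apply to services such as Carbonite that manage their own storage.

```python
import os
import time
from pathlib import Path


def summarize_tree(root):
    """Return (file_count, total_bytes, newest_mtime) for all files under root."""
    file_count = 0
    total_bytes = 0
    newest_mtime = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanish or cannot be read
            file_count += 1
            total_bytes += info.st_size
            newest_mtime = max(newest_mtime, info.st_mtime)
    return file_count, total_bytes, newest_mtime


def spot_check(source_root, backup_root, max_age_days=7.0, size_tolerance=0.10):
    """Compare a source tree with its backup copy and return a list of warnings.

    max_age_days and size_tolerance are assumed thresholds; tune them to
    match how often you back up and how much your data changes.
    """
    _src_count, src_bytes, _src_newest = summarize_tree(source_root)
    bak_count, bak_bytes, bak_newest = summarize_tree(backup_root)

    warnings = []
    if bak_count == 0:
        warnings.append("backup contains no files")
        return warnings

    # Does the size of the backup correspond to the amount of source data?
    if src_bytes > 0 and abs(src_bytes - bak_bytes) / src_bytes > size_tolerance:
        warnings.append(
            f"size mismatch: source {src_bytes} bytes, backup {bak_bytes} bytes"
        )

    # Are there recent files in the backup?
    age_days = (time.time() - bak_newest) / 86400
    if age_days > max_age_days:
        warnings.append(f"newest backup file is {age_days:.1f} days old")

    return warnings
```

Calling `spot_check("/path/to/data", "/path/to/backup")` with your own paths returns an empty list when both checks pass. A matching size and a recent file do not prove that a restore will work, so it is still worth opening a few backed-up files by hand.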
What if I don't know the answers?
If you can't complete the backup checkup because you're using systems managed by others, now is the time to ask some questions! Find your sysadmin or another cognizant person and confirm what is being backed up on your behalf, and how. Ask them to run through the backup checkup too.
Can IT Help Me Back Up My Data?
IT offers various options to help from simple desktop/laptop backup solutions (Carbonite) to infrastructure like Google Drive that is already backed up, to more complex backups for servers and shared storage.
What Else Should I Know?
A Quick Word About External Hard Drives
Historically, nothing has competed with external hard drives for ease and cost of doing major backups for research data. However, that's starting to change. Cloud services like Google Drive and Carbonite provide reasonably speedy, large-volume alternatives at competitive prices (or even free). While external hard drives are pretty good, they do have surprising failure rates and, unless they are reliably stored offsite, they are unlikely to allow your research data to survive a major event at the Lab (or even a minor one like a particularly nasty virus or a fire sprinkler release). If you use external hard drives, take a minute to consider other options. Need help? Contact email@example.com
A Quick Word About Google Drive Sync and Dropbox and Other File Synchronization Services
File synchronization services like the Google Drive Sync Client and Dropbox provide some of the features of backups but are not, fundamentally, backup tools. This is because sync clients are highly susceptible to accidental local changes that propagate through the backup. This is even more true in collaborative file sharing environments, where a collaborator could accidentally delete your important folder or file and, potentially, delete your local copy as well! While file synchronization does provide some resiliency, it doesn't equate to a full backup solution.
However, you can safely make use of the Lab's Google Drive storage space as a backup location (all employees have 30GB of shared mail and drive space) by doing the following:
- Create a folder for your backup in the web interface of Google Drive (not the file browser on your computer). Make sure it's named something obvious and don't share it with anyone.
- Ensure that all your local Google Drive Sync clients are set to sync only the directories you choose, and make sure that the new backup folder is excluded.
- Now use the web interface or file uploader interface at drive.lbl.gov to upload files.
Provided you don't accidentally delete the files or accidentally begin syncing these files locally, this should provide a safe backup destination for your work. Need help? Contact firstname.lastname@example.org
Following the successful launch of a Calendar app for Android last Fall, Google released a version for iOS devices last week.
For "power" calendar users, we hope this solves some of the issues we have seen with the built-in calendar app on iPhones. The description for the Android app is here.
Reference the Google Blog for a brief intro and a link to Apple's AppStore.
After downloading the app, tap the Google Calendar icon and watch the intro slides. You will then be prompted to log in at the Google login screen. Enter only your full LBL email address at this stage; leave the password field blank and continue. This will redirect you to LBL's Single Sign-On page, where you will enter your Berkeley Lab Identity credentials.
Once you are in the app, tap the three-bar icon in the top left-hand corner to see the available options, such as which calendars to display. Settings are found at the bottom of this list: tap Settings, then tap "Manage Accounts" to add multiple Google accounts.
The Identity Management Team will perform a planned upgrade to equipment that front ends our services (the enterprise directory, authentication systems, phonebook, etc). This is a security upgrade and will be done at 4pm, Friday March 13. The outage window is 30 minutes or less. Anyone already logged into a lab system will not be impacted.
From approximately 7:30am through 9:00am March 5, a problem at Google resulted in authentication failures when Lab users attempted to use any of our Marketplace Apps (Lucidchart, Smartsheet, GQueues, etc.).
This issue is resolved.
Smartsheet is currently experiencing an issue which prevents users from logging in. More at:
Intel is sponsoring a free one-day in-depth training on the Xeon Phi Coprocessor to be held at Perseverance Hall on March 27th, 2015 from 9:00am to 4:00pm. This training will provide software developers the foundation needed for modernizing their code to take advantage of parallel architectures found in both the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor, which are currently available to Lab researchers and collaborators on the LBNL Lawrencium Cluster.
Lunch will be provided. For more information and registration, please go here. This event, hosted by the IT Division High Performance Computing Services Group, is open to all LBNL and UC Berkeley staff and faculty. Space is limited so please register early!
Working at the ALS generates huge amounts of data, and for many years users have had to carry hard drives and USB drives between the ALS and their home institutions for acquisition and analysis of experimental data. To avoid the physical transport of data and to make real-time analysis possible, staff at the ALS, ESnet, and Berkeley Lab's IT Division have collaborated to implement several best practices that allow the fast and secure transfer of data over the network to a user's home institution. A case study, performed by ESnet, highlights the work of IT Division staff Susan James, Yong Qin, and Karen Fernsler to build the Data Transfer Node and 10GbE network, integrate it with the data acquisition system, and implement the Globus Online data transfer tools. The end result shows the improved workflow and data export for the x-ray tomography beamline.
Setting Up and Implementing Network Data Transfer
For researchers planning to use network data transfer, the following resources are available for assistance in setting up and implementing the workflow:
- To speak with a beamline scientist who has implemented the tools described below, contact Dula Parkinson.
- To obtain and use the best equipment to build a Data Transfer Node (DTN) or for software tools such as Globus Online, contact the High Performance Computing Services Group by sending email to email@example.com
- To connect your beamline to the Lab’s fast ScienceDMZ network, or to debug networking issues at LBNL, contact firstname.lastname@example.org
- To debug national network issues, or to find contact information for offsite campus or IT groups, contact email@example.com
To Achieve Faster Data Transfer
There are three main points for users and system administrators to consider:
1) Using the right file transfer tools
Instead of FTP or scp, use tools that have been designed specifically for high-speed data transfer. We recommend GridFTP or Globus Online. GridFTP is good if you want to automate transfers, but requires significant setup. Globus Online has a graphical user interface and is easy to use. Using a fast transfer tool is the simplest thing you can do to increase data transfer speeds. LBNL extensively uses both of these transfer tools and provides an overview from the 2014 LabTech workshop, with information on how to get additional help.
2) Using capable file transfer servers
Data can only be transferred as fast as it can be read from the source disk and written to the destination disk. Most systems aren’t tuned for high speed data transfer out of the box. Systems tuned for high speed data transfer are called Data Transfer Nodes (DTNs). Beamline 8.3.2 has recently implemented such a DTN based on the reference specification provided by ESnet, which, along with a new network designed by ESnet and LBLnet, has resulted in a more than 10-fold improvement in data transfer speeds.
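The first sentence of this point can be turned into a quick back-of-the-envelope estimate: the sustained end-to-end rate is capped by the slowest of the source disk, the network, and the destination disk. The sketch below is illustrative only; the function names and the example rates are assumptions, not measurements from Beamline 8.3.2.

```python
def effective_rate_mbps(source_read_mbps, network_mbps, dest_write_mbps):
    """Upper bound on the sustained transfer rate, in megabits per second.

    The transfer cannot run faster than its slowest stage.
    """
    rates = (source_read_mbps, network_mbps, dest_write_mbps)
    if min(rates) <= 0:
        raise ValueError("all rates must be positive")
    return min(rates)


def transfer_hours(dataset_gb, rate_mbps):
    """Hours needed to move dataset_gb gigabytes at rate_mbps megabits/s.

    Uses decimal units: 1 GB = 8000 megabits.
    """
    if rate_mbps <= 0:
        raise ValueError("rate must be positive")
    return dataset_gb * 8000 / rate_mbps / 3600


# Hypothetical example: a 10 Gbps network between an untuned source disk
# (800 Mbps) and a faster destination disk (1200 Mbps).
rate = effective_rate_mbps(800, 10_000, 1_200)
hours = transfer_hours(1000, rate)  # 1000 GB, i.e. 1 TB
print(f"bottleneck: {rate} Mbps, about {hours:.1f} hours for 1 TB")
```

In this made-up case the network is mostly idle and the source disk sets the pace, which is why tuning the servers at both ends matters as much as the network itself.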
3) Ensuring that the end-to-end network isn’t the bottleneck
If you are using fast data transfer tools between two fast data transfer nodes, the final thing to ensure is that the end-to-end network is not impeding the transfer. This becomes even more important over long distances. The need to resend just a small amount of data can dramatically increase transfer times. Unfortunately, this can also be the most complicated area to understand and correct. There are three main areas to consider:
Use capable network switches
For big, long distance data transfers, packet loss is a significant problem. Network switches (sometimes called hubs) are a notorious cause of retransmitted data. This can happen when there are several network connections on one side of the switch that share a single connection on the other side. In this case it’s important to have switches with enough memory to store packets from one connection long enough to allow the packets from other connections to move through the switch. LBNL or home institution networking professionals can recommend good switches for your environment and scientific application.
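To see why even a little packet loss hurts long-distance transfers so much, a rough estimate can help. The sketch below uses the widely cited Mathis et al. approximation for a single standard TCP stream, rate ≤ (MSS / RTT) × (C / √p) with C ≈ √(3/2). This is a simplified model brought in for illustration, not something taken from the ESnet case study, and the MSS, round-trip times, and loss rate in the example are assumed values.

```python
import math


def mathis_throughput_mbps(mss_bytes, rtt_ms, loss_rate):
    """Approximate upper bound on single-stream TCP throughput, in Mbps.

    mss_bytes: TCP maximum segment size in bytes
    rtt_ms:    round-trip time in milliseconds
    loss_rate: fraction of packets lost (0 < loss_rate <= 1)
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_ms <= 0 or mss_bytes <= 0:
        raise ValueError("mss_bytes and rtt_ms must be positive")
    c = math.sqrt(3 / 2)  # constant from the Mathis model
    bytes_per_second = (mss_bytes / (rtt_ms / 1000)) * c / math.sqrt(loss_rate)
    return bytes_per_second * 8 / 1e6


# Assumed example: same 0.01% loss, local path (1 ms) vs. long path (80 ms).
local = mathis_throughput_mbps(1460, 1, 0.0001)
distant = mathis_throughput_mbps(1460, 80, 0.0001)
print(f"local: {local:.0f} Mbps, long-distance: {distant:.1f} Mbps")
```

Under this model the bound falls in proportion to the round-trip time, so a loss rate that is barely noticeable inside a building can limit a cross-country transfer to a small fraction of the link speed. Real transfers with tuned hosts or parallel streams can behave differently, so treat the numbers as a guide to the trend rather than a prediction.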
Firewalls are a common device used to secure networks. Because they generally look at every packet that flows through them, they can create bottlenecks for big science data transfers. There is a secure alternative to firewalls, commonly referred to as the ScienceDMZ. It works by establishing a fast, dedicated, but secure path around the firewall. You’ll generally need one at both facilities you are transferring data between. LBNL personnel can help you use the lab’s ScienceDMZ. ESnet personnel may also be able to provide some help implementing a ScienceDMZ at your home institution. See the help contacts above.
Use a "healthy" network path
It is extremely difficult to know which network path your data is taking between LBNL and your home institution and/or whether that path is "healthy." This issue is best left to the networking professionals (see above) after ensuring that all of the critical items above are not the problem (good data transfer tools and nodes, good switches and no firewalls). While network debugging is beyond the scope of this brief article, one of the tools ESnet finds indispensable in network path analysis is perfSONAR.
Involve Your Local Experts!
If Network Data Transfer would significantly increase your productivity but you don't run your data servers yourself, please get your system and network administrators involved in the process.
Unscheduled outage resulting from power-related problem on Feb 27, 2015 beginning at 3:48 PM. As of 4:50 PM most services are now operational. More at status.lbl.gov
As reported in a January article, the IT Division is moving forward with a plan to offer an Enterprise Directory self service password reset service. The next step in the process is to allow existing lab users to register secondary contact information (a non LBL email address or a mobile phone number for text notifications). Note: all new employees and affiliates have experienced this as part of their initial account activation process since January 29, 2015.
Our soft launch of the registration process will include primarily IT employees, but we may reach out to groups we frequently work with to get feedback on the new process. Feedback gathered from this effort will dictate when we launch this capability for the entire lab. In April, we will start to use the new Password Change Page followed by the self service reset capability soon after.
Initial Launch Point
For those of you who also use our Windows Active Directory, the initial launch point also includes an option to reset this password, as shown below.
Our Account Management FAQ has additional details on these tools.
Register and join us for some exciting training opportunities provided by LBNL-IT!
Are you interested in learning how to write programs to get and share scientific data over the web? Looking to learn how to get started quickly with Arduino for a new project? Need to create a pivot table but don’t know how? Over the next several months the IT Division will offer a unique training schedule we hope best meets your training needs.
For more information on our course offerings, see the details below or visit us at:
Course Title and Description
Intro to Arduino (Hands On) | March 2 – AM Session:
Learn the basics of installing the program on your laptop; writing Arduino programs; and how to connect actuators (LEDs, motors, speakers) and control them from a program. Arduino kits will be provided.
Advanced Arduino (Hands On) | March 2 – PM Session:
You’ll be introduced to advanced Arduino concepts focusing on sensors, actuators, and programming techniques that might be used to monitor or control equipment. To attend this course we recommend you have a basic familiarity with Arduino software and hardware. Arduino kits will be provided.
Software Carpentry: Instructor Training | March 10-11:
The two-day course led by Software Carpentry founder, Greg Wilson, will introduce you to basic ideas in education psychology and instructional design. This course will provide you with an opportunity to teach your fellow scientist/engineer how to build better software to work more effectively. No previous training in teaching is required, but participants should be comfortable writing medium-sized programs and using the command line. Experience with version control tools such as Git is desirable as well.
Software Carpentry: Web Programming | March 13:
The one-day course led by Software Carpentry founder, Greg Wilson, will show you how to write programs to get, share, and syndicate data over the web, and how to write simple web applications. Participants must have previous programming experience in Python. Prior experience writing HTML is useful but not essential.
Looking for training in intermediate and advanced Excel? We recommend the following:
- Creating Advanced Functions | March 18
- Using the “What If” Analysis Tools & Recording Macros | March 18
- Excel for Science | May 20
- Becoming a Master of Data Analysis | April 15, June 17, June 24
- Presenting Financials: Make Numbers and Statistics Stand Out with Excel and PowerPoint | April 15, June 17
Don’t see a class in our course offerings and would like to suggest one or two? Feel free to drop us a note at:
We’d also love to hear how we’re doing so do share with us!
WWW, Today, and Newscenter will be unavailable during scheduled maintenance beginning at 8PM on February 2, 2015, for 4 hours. All other WordPress-hosted sites will also be under maintenance. No redirection will be in place. No other services will be impacted. Directory services will still be available at phonebook.lbl.gov
Additional information will be available on status.lbl.gov if the maintenance window changes during the outage.
Update: Outage completed normally at ~9:10PM