Experiments at the ALS generate huge amounts of data, and for many years users have had to carry hard drives and USB drives between the ALS and their home institutions to move experimental data from acquisition to analysis. To avoid physically transporting data and to make real-time analysis possible, staff at the ALS, ESnet, and Berkeley Lab's IT Division have collaborated to implement several best practices that allow fast, secure transfer of data over the network to a user's home institution. A case study by ESnet highlights the work of IT Division staff Susan James, Yong Qin, and Karen Fernsler to build the Data Transfer Node and 10 GbE network, integrate them with the data acquisition system, and implement the Globus Online data transfer tools. The result is an improved workflow and faster data export for the x-ray tomography beamline.
Setting Up and Implementing Network Data Transfer
For researchers planning to use network data transfer, the following resources are available for assistance in setting up and implementing the workflow:
- To speak with a beamline scientist who has implemented the tools described below, contact Dula Parkinson.
- For help choosing the best equipment to build a Data Transfer Node (DTN), or with software tools such as Globus Online, contact the High Performance Computing Services Group by sending email to [email protected]
- To connect your beamline to the Lab’s fast ScienceDMZ network, or to debug networking issues at LBNL, contact [email protected]
- To debug national network issues, or to find contact information for offsite campus or IT groups, contact [email protected]
To Achieve Faster Data Transfer
There are three main points for users and system administrators to consider:
1) Using the right file transfer tools
Instead of FTP or scp, use tools that have been designed specifically for high-speed data transfer. We recommend GridFTP or Globus Online. GridFTP is well suited to automated transfers but requires significant setup; Globus Online has a graphical user interface and is easy to use. Switching to a fast transfer tool is the simplest thing you can do to increase data transfer speeds. LBNL uses both of these tools extensively and provides an overview from the 2014 LabTech workshop, with information on how to get additional help.
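For scripted transfers, GridFTP's command-line client `globus-url-copy` can be wrapped in a small script. The sketch below only builds the command line; the hostnames and paths are hypothetical, and the choice of four parallel TCP streams (`-p`) is an illustrative assumption rather than a recommended setting. It also assumes `globus-url-copy` is installed and that credentials for both endpoints are already configured.

```python
import shlex


def build_gridftp_command(source_url, dest_url, parallel_streams=4):
    """Build (but do not run) a globus-url-copy command line."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-vb",                          # print transfer performance as it runs
        "-p", str(parallel_streams),    # number of parallel TCP data streams
        source_url,
        dest_url,
    ]


if __name__ == "__main__":
    # Hypothetical source and destination DTNs.
    cmd = build_gridftp_command(
        "gsiftp://dtn.als.example.org/data/scan_0001.h5",
        "gsiftp://dtn.home.example.edu/incoming/",
    )
    print(shlex.join(cmd))
    # To launch it: subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log, review, or retry a transfer from a scheduler or acquisition script.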
2) Using capable file transfer servers
Data can only be transferred as fast as it can be read from the source disk and written to the destination disk. Most systems aren’t tuned for high-speed data transfer out of the box; systems that are tuned for it are called Data Transfer Nodes (DTNs). Beamline 8.3.2 recently implemented such a DTN based on the reference specification provided by ESnet, which, together with a new network designed by ESnet and LBLnet, has improved data transfer speeds more than 10-fold.
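The point that a transfer is only as fast as its slowest stage can be made concrete with a little arithmetic. This Python sketch assumes the rates (in gigabits per second) for disk read, network, and disk write are known; the function names and sample figures are hypothetical, not measurements from Beamline 8.3.2.

```python
def effective_throughput_gbps(disk_read_gbps, network_gbps, disk_write_gbps):
    """A transfer can go no faster than its slowest stage."""
    return min(disk_read_gbps, network_gbps, disk_write_gbps)


def transfer_time_hours(dataset_tb, throughput_gbps):
    """Hours to move dataset_tb terabytes (10**12 bytes) at throughput_gbps (10**9 bit/s)."""
    bits = dataset_tb * 1e12 * 8
    return bits / (throughput_gbps * 1e9) / 3600


if __name__ == "__main__":
    # Hypothetical rates: fast source disk and network, slow destination disk.
    slow = effective_throughput_gbps(disk_read_gbps=10, network_gbps=10, disk_write_gbps=1)
    tuned = effective_throughput_gbps(disk_read_gbps=10, network_gbps=10, disk_write_gbps=10)
    print(f"untuned: {transfer_time_hours(1, slow):.2f} h for 1 TB")
    print(f"tuned DTNs: {transfer_time_hours(1, tuned):.2f} h for 1 TB")
```

With a 1 Gbps destination disk, the 10 Gbps network link cannot help; bringing every stage up to 10 Gbps cuts the time for 1 TB from about 2.2 hours to about 13 minutes.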
3) Ensuring that the end-to-end network isn’t the bottleneck
If you are using fast data transfer tools between two fast data transfer nodes, the final thing to ensure is that the end-to-end network is not impeding the transfer. This becomes even more important over long distances. The need to resend just a small amount of data can dramatically increase transfer times. Unfortunately, this can also be the most complicated area to understand and correct. There are three main areas to consider:
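Why a little retransmission matters so much over long distances can be illustrated with the Mathis et al. model, a classic approximation for a single TCP Reno-style stream: throughput ≲ (MSS / RTT) · (C / √p), where MSS is the maximum segment size, RTT the round-trip time, p the packet-loss rate, and C ≈ 1.22. Modern TCP variants such as CUBIC behave differently, so treat this Python sketch as a rough illustration; the MSS, RTT, and loss values are assumptions.

```python
import math

MATHIS_C = math.sqrt(3 / 2)  # ≈ 1.22, constant in the Mathis model for periodic loss


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound (bits/s) for one TCP Reno-style stream."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460       # assumed: typical Ethernet MSS in bytes
    loss = 1e-4      # assumed: 0.01% packet loss
    for label, rtt in [("campus, 1 ms", 0.001), ("cross-country, 50 ms", 0.050)]:
        gbps = mathis_throughput_bps(mss, rtt, loss) / 1e9
        print(f"{label}: at most ~{gbps:.3f} Gbps")
```

With identical loss, the 50 ms path is 50 times slower than the 1 ms path, and quadrupling the loss rate halves throughput on either path.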
Use capable network switches
For big, long-distance data transfers, packet loss is a significant problem. Network switches (sometimes called hubs) are a notorious cause of retransmitted data. This can happen when several network connections on one side of the switch share a single connection on the other side. In this case, it’s important to have switches with enough memory to hold packets from one connection long enough for packets from the other connections to move through the switch. LBNL or home institution networking professionals can recommend switches suited to your environment and scientific application.
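The buffering a switch needs when several connections share one uplink can be estimated with a simple fan-in model. This Python sketch assumes all ports run at the same line rate and every input starts a burst at the same instant toward one output port; the burst size, port count, and buffer sizes are hypothetical. Because the output port drains only one input's worth of data while all inputs arrive at once, the switch must hold (N - 1) bursts or drop packets.

```python
def fanin_buffer_bytes(n_inputs, burst_bytes):
    """Peak queue at one output port when n_inputs equal-rate ports each send a
    burst of burst_bytes toward it at the same instant.

    During the burst the inputs deliver n_inputs * burst_bytes, while the output
    drains only burst_bytes, so (n_inputs - 1) bursts must wait in the buffer.
    """
    if n_inputs < 1 or burst_bytes < 0:
        raise ValueError("need n_inputs >= 1 and burst_bytes >= 0")
    return (n_inputs - 1) * burst_bytes


def buffer_is_sufficient(port_buffer_bytes, n_inputs, burst_bytes):
    """True if the switch can absorb the burst without dropping packets."""
    return fanin_buffer_bytes(n_inputs, burst_bytes) <= port_buffer_bytes


if __name__ == "__main__":
    # Hypothetical: 4 hosts each send a 2 MB burst to one shared uplink.
    need = fanin_buffer_bytes(4, 2_000_000)
    print(f"buffer needed: {need / 1e6:.0f} MB")
    for buf in (1_000_000, 16_000_000):   # hypothetical per-port buffer sizes
        print(buf, buffer_is_sufficient(buf, 4, 2_000_000))
```

Any packets that do not fit are dropped and must be resent, which is exactly the retransmission that slows long-distance transfers.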
Avoid firewalls
Firewalls are commonly used to secure networks. Because they generally inspect every packet that flows through them, they can create bottlenecks for big science data transfers. There is a secure alternative to sending data through firewalls, commonly referred to as a ScienceDMZ. It works by establishing a fast, dedicated, but secure path around the firewall. You’ll generally need one at each of the two facilities you are transferring data between. LBNL personnel can help you use the Lab’s ScienceDMZ, and ESnet personnel may be able to help you implement a ScienceDMZ at your home institution. See the help contacts above.
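In ESnet's ScienceDMZ design, the data path to a DTN is typically protected by stateless access-control lists (ACLs) on a router rather than by a stateful firewall that inspects every packet. This Python sketch mimics such an allow-list check with the standard `ipaddress` module; the address block is a documentation-only range, and the port numbers (2811 for the GridFTP control channel, 50000-51000 for data channels) are illustrative assumptions, not LBNL's actual configuration.

```python
import ipaddress

# Hypothetical allow-list of (remote network, permitted destination ports).
# 198.51.100.0/24 is a documentation-only range; the ports are assumptions.
ALLOWED = [
    (ipaddress.ip_network("198.51.100.0/24"), range(2811, 2812)),     # GridFTP control
    (ipaddress.ip_network("198.51.100.0/24"), range(50000, 51001)),   # GridFTP data
]


def permitted(src_ip, dst_port, rules=ALLOWED):
    """Stateless check: may this source address reach this DTN port?"""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net and dst_port in ports for net, ports in rules)


if __name__ == "__main__":
    for src, port in [("198.51.100.7", 2811), ("198.51.100.7", 22), ("203.0.113.5", 2811)]:
        print(src, port, "permit" if permitted(src, port) else "deny")
```

Because each packet is checked only against a short list of addresses and ports, with no per-connection state, router hardware can typically apply this kind of filter at line rate. The tradeoff is coarser protection than a full firewall, which is why the DTN itself should run only a small set of services.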
Use a "healthy" network path
It is extremely difficult to know which network path your data is taking between LBNL and your home institution, or whether that path is "healthy." This issue is best left to the networking professionals (see above) after you have ruled out the critical items above (good data transfer tools and nodes, good switches, and no firewalls). While network debugging is beyond the scope of this brief article, one of the tools ESnet finds indispensable for network path analysis is perfSONAR.
Involve Your Local Experts!
If network data transfer would significantly increase your productivity but you don't run your data servers yourself, please involve your system and network administrators in the process.
Unscheduled outage resulting from a power-related problem on Feb 27, 2015, beginning at 3:48 PM. As of 4:50 PM, most services are operational. More at status.lbl.gov
As reported in a January article, the IT Division is moving forward with a plan to offer an Enterprise Directory self-service password reset service. The next step in the process is to allow existing Lab users to register secondary contact information (a non-LBL email address or a mobile phone number for text notifications). Note: all new employees and affiliates have completed this registration as part of their initial account activation since January 29, 2015.
Our soft launch of the registration process will primarily include IT employees, but we may reach out to groups we frequently work with to get feedback on the new process. Feedback gathered from this effort will determine when we launch this capability for the entire Lab. In April, we will begin using the new Password Change Page, followed soon after by the self-service reset capability.
Initial Launch Point
For those of you who also use our Windows Active Directory, the initial launch point also includes an option to reset that password.
Our Account Management FAQ has additional details on these tools.
Register and join us for some exciting training opportunities provided by LBNL-IT!
Are you interested in learning how to write programs to get and share scientific data over the web? Looking to get started quickly with Arduino for a new project? Need to create a pivot table but don’t know how? Over the next several months, the IT Division will offer a training schedule that we hope meets your needs.
For more information on our course offerings, see the details below or visit us at:
https://commons.lbl.gov/display/itdivision/Training+and+Awareness
| # | Course | Date and Description | Registration |
|---|---|---|---|
| 1 | Intro to Arduino (Hands On) | March 2, AM session: Learn the basics of installing the Arduino software on your laptop, writing Arduino programs, and connecting actuators (LEDs, motors, speakers) and controlling them from a program. Arduino kits will be provided. | http://go.lbl.gov/arduino-mar2 |
| 2 | Advanced Arduino (Hands On) | March 2, PM session: You’ll be introduced to advanced Arduino concepts, focusing on sensors, actuators, and programming techniques that might be used to monitor or control equipment. To attend this course, we recommend a basic familiarity with Arduino software and hardware. Arduino kits will be provided. | http://go.lbl.gov/arduino-mar2 |
| 3 | Software Carpentry: Instructor Training | March 10-11: This two-day course, led by Software Carpentry founder Greg Wilson, will introduce you to basic ideas in educational psychology and instructional design, and will prepare you to teach fellow scientists and engineers how to build better software and work more effectively. No previous training in teaching is required, but participants should be comfortable writing medium-sized programs and using the command line. Experience with version control tools such as Git is also desirable. | http://go.lbl.gov/sc-instructor-train-mar10-11 |
| 4 | Software Carpentry: Web Programming | March 13: This one-day course, led by Software Carpentry founder Greg Wilson, will show you how to write programs to get, share, and syndicate data over the web, and how to write simple web applications. Participants must have previous programming experience in Python. Prior experience writing HTML is useful but not essential. | http://go.lbl.gov/sc-web-prog-mar13 |
| 5 | Excel 2010: Intermediate and Advanced Courses | March-June: Looking for training in intermediate and advanced Excel? See the courses listed in the HRIS training catalog. | https://hris.lbl.gov/self_service/training/ |
Don’t see the class you want in our course offerings? Feel free to suggest one (or two) by dropping us a note at:
it-communications.lbl.gov
We’d also love to hear how we’re doing, so please share your feedback with us!