Advanced Clustering Technologies has been building custom, turn-key high performance computing solutions including HPC clusters for more than 15 years. Follow this blog to get more information on High Performance Computing.
For those seeking funding for high performance computing equipment, the National Science Foundation’s Major Research Instrumentation (MRI) Program serves as an opportunity to gain access to scientific and engineering instrumentation for research and research training.
With the current round of funding being announced, we are setting our sights on the next opportunity. Applications for the 2019 MRI program are due Jan. 22, 2019. Full details are available here.
We have prepared a white paper to help you write your grant proposal. Based on interviews with customers who have succeeded in securing grant funding, the white paper offers tips for preparing a proposal. You may download it here.
When updating to kernel 3.10.0-862.11.1.el7 or a later 3.10.0-862.11.* build, you may hit a bug where InfiniBand or Omni-Path cards do not come back up after the kernel update (i.e. they cannot communicate with the rest of the IB/OPA network). The failure is accompanied by the message “failed to modify QP to RTR: -22” in either dmesg or /var/log/messages.
To check whether a node is affected, run an RDMA write bandwidth test with ib_write_bw, which is provided by the perftest package. On the target node, run:
$ ib_write_bw
On the client side, run:
$ ib_write_bw <target-IP>
If the test fails with the “failed to modify QP to RTR” message, the node is affected by the kernel bug.
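The two symptoms described above (an affected kernel release and the error message in the logs) can also be checked from a script. The following is a minimal sketch, not a definitive test: the function names are hypothetical, and it assumes every 3.10.0-862.11.* build is affected, which would stop being true if a fixed 862.11 build ships.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel release string is a
# 3.10.0-862.11.* build (assumed here to be the affected range).
is_affected_kernel() {
    case "$1" in
        3.10.0-862.11.*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Hypothetical helper: succeeds if the log text on stdin contains the
# error message reported for this bug.
has_qp_rtr_error() {
    grep -q 'failed to modify QP to RTR'
}

# Example usage on a node (reading dmesg may require root):
#   is_affected_kernel "$(uname -r)" && echo "kernel is in the affected range"
#   dmesg | has_qp_rtr_error && echo "QP to RTR error found in dmesg"
#   has_qp_rtr_error < /var/log/messages && echo "error found in messages"
```

A match from either helper is only a hint; the ib_write_bw test remains the direct way to confirm that the fabric is not working.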
The next kernel version should include a patch that fixes the bug permanently. Until then, the issue can be worked around as described below.
If you have not yet updated to 7.5, you can prevent yum from installing the kernel with the bug by adding an exclude line for it to /etc/yum.conf.
You can then run yum update, and it will install 3.10.0-862.9.1 instead, which we have found still works with IB/OPA. If you want to update to a 3.10.0-862.11.* kernel in the future (e.g. if the fix lands in 862.11.7), just remove the exclude line from /etc/yum.conf and update the kernel as usual.
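The exact command used for the exclusion is not shown above. One way to add such an exclude line is sketched here, under stated assumptions: the pattern kernel-3.10.0-862.11.* and the function name are illustrative, only the kernel package itself is excluded, and the line is appended to the end of the file, which works when [main] is the only section in /etc/yum.conf.

```shell
#!/bin/sh
# Hypothetical helper: append an exclude line for the affected kernel builds
# to a yum configuration file, unless the identical line is already present.
# The first argument is the file to edit (defaults to /etc/yum.conf).
add_kernel_exclude() {
    yum_conf="${1:-/etc/yum.conf}"
    exclude_line='exclude=kernel-3.10.0-862.11.*'   # assumed pattern
    grep -qxF "$exclude_line" "$yum_conf" 2>/dev/null \
        || printf '%s\n' "$exclude_line" >> "$yum_conf"
}

# Example usage (as root):
#   add_kernel_exclude
#   yum update
```

If /etc/yum.conf already contains a different exclude= line, it is safer to edit that line by hand than to rely on a second one.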
If you’ve already updated to 7.5 and are experiencing this issue, you can get IB/OPA working again by booting the older kernel that was on your system before the update (i.e. any installed kernel older than 862.11.1). This can be done either interactively through the GRUB menu at boot, or by editing /etc/default/grub to change GRUB_DEFAULT from “saved” to the index of the previous kernel version (typically “1”). You can verify which index to use by listing the GRUB menu entries with their indexes, for example:
0 : CentOS Linux (3.10.0-862.11.6.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-862.9.1.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)
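The command that produced the listing above is not preserved here. A listing in the same format can be generated with a short awk filter over the generated GRUB configuration. This is a sketch: the function name is hypothetical, and the default path /boot/grub2/grub.cfg assumes a BIOS-booted CentOS 7 system.

```shell
#!/bin/sh
# Hypothetical helper: print each top-level GRUB menu entry with its index.
# The first argument is the grub.cfg to read (defaults to /boot/grub2/grub.cfg).
list_grub_entries() {
    # Split each line on single quotes; top-level entries start with
    # "menuentry " and carry their title in the second field.
    awk -F"'" 'BEGIN { n = 0 }
               $1 == "menuentry " { print n " : " $2; n++ }' \
        "${1:-/boot/grub2/grub.cfg}"
}

# Example usage (as root):
#   list_grub_entries
```

Entries nested inside a submenu are indented in grub.cfg and are therefore skipped by this filter.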
From the example above, the line in /etc/default/grub would be changed to:
GRUB_DEFAULT=1
Then rebuild the GRUB configuration file by running:
$ grub2-mkconfig -o /boot/grub2/grub.cfg
Then reboot so that the older kernel version is used. At this point, IB/OPA should be working on the node again.
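On a cluster with many nodes, the GRUB_DEFAULT change described above can be scripted rather than edited by hand. The following is a minimal sketch with a hypothetical function name. It assumes the file already contains a GRUB_DEFAULT= line and that the index has been verified on that node first.

```shell
#!/bin/sh
# Hypothetical helper: set GRUB_DEFAULT to the given menu index.
# Usage: set_grub_default INDEX [FILE]   (FILE defaults to /etc/default/grub)
set_grub_default() {
    grub_index="$1"
    grub_file="${2:-/etc/default/grub}"
    grub_tmp="$(mktemp)" || return 1
    # Rewrite only the GRUB_DEFAULT line; every other line is copied as-is.
    if sed "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=${grub_index}/" "$grub_file" > "$grub_tmp"; then
        # Write back with cat so the original file keeps its ownership and mode.
        cat "$grub_tmp" > "$grub_file"
        grub_status=$?
    else
        grub_status=1
    fi
    rm -f "$grub_tmp"
    return "$grub_status"
}

# Example usage (as root), followed by the rebuild step and a reboot:
#   set_grub_default 1
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

After the reboot, uname -r should report the older kernel, and the ib_write_bw test can be repeated to confirm that the fabric is working.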
As of August 21st, 2018:
The issue is being tracked with Red Hat’s bugzilla 1619624: Bug 1619624 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel [rhel-7.5.z] (RHEL 7.5.z). As of Tue, August 21 2018, the status of 1619624 is ASSIGNED. An engineer has been assigned to the bug but no patch has been posted that fixes the bug.
The issue is being tracked with Red Hat’s bugzilla 1616346: Bug 1616346 – [Intel] RC QP failure to modify QP to RTR on -862.11.1 kernel (RHEL 7). As of Tue, August 21 2018, the status of 1616346 is POST. A patch has been submitted to resolve this issue and is under review for inclusion in the next minor release of RHEL 7.
The session took place at 10:30 a.m. and was moderated by Thomas Hauser, Director of Research Computing at the University of Colorado – Boulder. The panel included Jonathan Anderson, Associate Director of Research Computing at the University of Colorado – Boulder; Tim Kaiser, Director of Research and High Performance Computing at Colorado School of Mines; and Jim Paugh, Director of Sales at Advanced Clustering Technologies.
The panel offered best practices about how detailed the RFP document for HPC should be, including such topics as:
meeting the hard requirements (budget, power, etc.)
formulating the RFP when you don’t know what you want
allowing for innovative solutions
structuring acceptance testing and payment parameters
When was the last time you evaluated your current storage and made plans for future growth?
Contact us to let our experts configure the perfect system for you, whether it’s our direct-attach storage boxes or one of our high-performance scalable parallel file systems. Contact us for a storage discussion.
Intel has just announced that it is discontinuing the Knights Landing or KNL Intel® Xeon Phi™, which was Intel’s first processor to deliver the performance of an accelerator with the benefits of a standard host CPU.
The product change notification covers the socketed Knights Landing 7200-series Xeon Phi line, including the following part numbers: 7210, 7210F, 7230, 7230F, 7250, 7250F, 7290, and 7290F.
The last day to order KNL products is August 31, 2018, and the final shipment date is July 19, 2019.
The Xeon Phi line launched in 2012. The second generation Phi, known as Knights Landing, debuted at ISC in 2016. KNL marked the introduction of the AVX-512 instruction set. It was also the first Intel processor to offer the company’s Omni-Path Architecture (OPA) fabric integrated into the package.
Intel’s Skylake-F Xeon chips, introduced in the third quarter of 2017, are still available.
RMACC is a collaborative event among academic and research institutions located throughout the Rocky Mountain region that use high performance computing.
Dr. Rick Stevens, a director of Argonne National Laboratory’s Exascale Computing Project, and Lorna Rivera, a voice for diversity in high performance computing, will deliver the keynote address.
Advanced Clustering will participate in a panel discussion, “Best Practices for Writing an RFP for HPC Hardware Purchases,” at 10:30 a.m. on Wednesday, Aug. 8. The company issued a white paper on this topic earlier this year. Download your copy of the white paper, “Best Practices for Writing an RFP for the Acquisition of High Performance Computing Equipment.”
You can register for RMACC. To schedule a meeting with us in Boulder, contact Advanced Clustering today toll-free at (866) 802-8222 or via email at email@example.com.
PEARC18 brings together scientists, engineers, scholars, artists, students and teachers for a program focused on the “efficiency, security, reliability and sustainability of increasingly complex and powerful digital infrastructure systems.”
The theme of this year’s event is Seamless Creativity. The tracks for this event, which will run in parallel, are:
Applications of Advanced Computing Infrastructure
Facilitation of Advanced Computing Infrastructure
Visualization and Data Analytics
Workforce Development and Diversity
PEARC18 offers tutorials, panel discussions, invited talks, workshops, posters, Birds of a Feather sessions and research papers. Some of the key topics to be covered include:
Bytes and BTUs: Lessons Learned from the World’s Most Energy Efficient Data Center, presented by Steve Hammond, National Renewable Energy Laboratory
Towards a National Cyberinfrastructure Ecosystem: The Role of Software Institutes, presented by Vipin Chaudhary, National Science Foundation
Neuroscience and Advanced Computing, presented by Gregory Farber, National Institute of Mental Health
Using Advanced Computing to Recover Black Women’s Lost History, presented by Ruby Mendenhall, University of Illinois at Urbana-Champaign
Town Hall: NSF Office of Advanced Cyberinfrastructure, presented by Manish Parashar, National Science Foundation, OAC
Hacking Academia/Trusted CI Panel Discussion, presented by Anita Nikolich, Illinois Institute of Technology
Sign up for your free trial of ACTnowHPC today and get 500 core hours free.
ACTnowHPC provides on-demand cloud high performance computing that gives you the freedom to scale a solution to fit your changing computational needs. There’s no complicated setup. Get online and start running your workload in minutes.
ACTnowHPC gives you on-demand, pay-as-you-go access to the latest HPC technology in the cloud. Experience the power of HPC without the upfront capital outlay, and enjoy the freedom of a scalable solution that fits any computational need. ACTnowHPC is owned and operated by the HPC experts at Advanced Clustering Technologies. Our engineers have been building customized, turn-key HPC solutions since our founding in 2001. The ACTnowHPC cloud is designed specifically for high performance computing users.