OpenStack is, without doubt, an exciting project and the leading open source Infrastructure-as-a-Service platform. In the last couple of years, I have had the privilege of architecting and deploying dozens of OpenStack clouds for multiple customers and use cases. One of the use cases I worked on in the last year was High-Performance Computing (HPC) on OpenStack. In this blog, I am going to cover some of the considerations for hosting high-performance and high-throughput workloads.

First, let's start with three types of architectures that could be used when hosting HPC workloads on OpenStack:

  1. Virtualized HPC on OpenStack
    • In this architecture, all components of the HPC cluster are virtualized in OpenStack.
  2. Bare-metal HPC on OpenStack
    • In this architecture, all components of the HPC cluster are deployed on bare-metal servers using OpenStack Ironic.
  3. Virtualized head node and bare-metal compute nodes
    • In this architecture, the head nodes (scheduler, master, and login nodes) are virtualized in OpenStack, and the compute nodes are deployed on bare-metal servers using OpenStack Ironic.

Now that we have discussed the three types of architectures we could use to deploy HPC software on OpenStack, I am going to discuss a few OpenStack best practices for hosting this type of workload.

Networking

For the networking aspect of OpenStack, there are two recommended configuration options:

  • Provider networks: The OpenStack administrator creates these networks and maps them directly to existing physical networks in the datacenter (L2). Because of the direct attachment to the L2 switching infrastructure, provider networks don’t need to route L3 traffic through the OpenStack control plane, as they should have an L3 gateway in the DC network topology.
  • SR-IOV: SR-IOV (single root input/output virtualization) is recommended for HPC workloads based on performance requirements. SR-IOV enables OpenStack to extend the physical NIC’s capabilities directly through to the instance by using the available SR-IOV NIC Virtual Functions (VFs). Also, support for IEEE 802.1br allows virtual NICs to integrate with, and be managed by, the physical switch.
    • It’s important to mention that in tests conducted by various vendors, results show that SR-IOV can achieve near line-rate performance at a low CPU overhead cost per virtual machine/instance.
    • When implementing SR-IOV, you need to take into consideration two essential limitations: instances using VF devices cannot be live migrated, and SR-IOV ports bypass OpenStack’s security groups. (A minimal provider network and SR-IOV port sketch follows this list.)
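
To illustrate both options, here is a minimal Heat template sketch that creates a VLAN provider network and boots an instance through an SR-IOV VF port. The physical network name, VLAN ID, CIDR, image, and flavor are placeholders for your environment, not values from this post.

heat_template_version: 2016-10-14
description: >
  Sketch: VLAN provider network plus an instance attached through an
  SR-IOV VF port (network, VLAN ID, image, and flavor are placeholders).

resources:
  hpc_provider_net:
    type: OS::Neutron::ProviderNet
    properties:
      name: hpc-data
      network_type: vlan
      physical_network: datacentre      # must match the Neutron ML2 configuration
      segmentation_id: 100

  hpc_provider_subnet:
    type: OS::Neutron::Subnet
    properties:
      network: { get_resource: hpc_provider_net }
      cidr: 10.10.10.0/24
      gateway_ip: 10.10.10.1            # the L3 gateway lives in the DC network

  sriov_port:
    type: OS::Neutron::Port
    properties:
      network: { get_resource: hpc_provider_net }
      binding:vnic_type: direct         # request an SR-IOV virtual function

  compute_node:
    type: OS::Nova::Server
    properties:
      name: hpc-compute-0
      image: rhel7-hpc                  # placeholder image
      flavor: hpc.large                 # placeholder flavor
      networks:
        - port: { get_resource: sriov_port }
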
Storage

For an HPC architecture, there are two major storage categories to consider:

  • OpenStack storage: image (glance), ephemeral (nova), and volume (cinder).
  • HPC cluster file-based data storage: Used by the HPC cluster to store data.

Based on both categories, here are a couple of recommendations to consider while architecting your cluster:

OpenStack Storage:

  • Glance and Nova: For the Glance and Nova (ephemeral) storage, I like to recommend Ceph. One of the significant advantages of Ceph (besides its tight integration with OpenStack) is the performance benefit you can obtain at instance creation time from the copy-on-write image cloning this backend offers. Another advantage for ephemeral workloads (when not using SR-IOV) is the ability to live migrate instances between the members of the compute cluster.
  • Cinder: For the Cinder backend in this HPC use case, I like to recommend Ceph (the same benefits from the previous point apply) and NFS/iSCSI backends like NetApp, EMC VNX, and similar systems with supported Cinder drivers.

HPC Cluster file-based data storage:

Commonly used parallel file systems in HPC, like Lustre, GPFS, and OrangeFS, should be accessed from dedicated SR-IOV/provider networks. Another recommended backend is Ceph, which should also be accessed directly from the SR-IOV/provider networks for better performance.

Important information:
Ceph as a backend is, in general, very flexible. A well-architected Ceph cluster can benefit multiple types of workloads in different configurations/architectures, e.g.:

  • Ethernet-based connectivity can benefit performance through higher-throughput NIC interfaces for frontend and backend storage traffic (10/25/40/50/100 Gbps), plus LACP configurations that can double the available bandwidth.
  • Storage server components can be a combination of NVMe, SSD, SAS, and SATA drives, tailored to provide the required I/O performance.
  • The distributed nature of the technology provides a flexible and resilient platform.

The next thing to consider is automating the deployment of your HPC application on OpenStack. Multiple tools can be used for that: Heat, Ansible, or API calls from an orchestrator system. A minimal Ansible sketch follows.
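
As an illustration only, here is a minimal Ansible sketch that provisions a set of compute instances with the os_server module. The cloud name, image, flavor, key pair, and network are assumptions for the example and would need to exist in your environment.

---
# Sketch: provision HPC compute instances with the os_server module.
# Assumes a clouds.yaml entry named "hpc-cloud" and that the image,
# flavor, key pair, and network below already exist.
- name: Provision HPC compute nodes
  hosts: localhost
  connection: local
  gather_facts: false

  tasks:
    - name: Create the compute instances
      os_server:
        cloud: hpc-cloud
        name: "hpc-compute-{{ item }}"
        image: rhel7-hpc
        flavor: hpc.large
        key_name: hpc-key
        network: hpc-data
        state: present
      with_items: [1, 2, 3, 4]
...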

Happy HPC on OpenStack hacking!


“Choices made, whether bad or good, follow you forever and affect everyone in their path one way or another.” --J.E.B. Spredemann, An Unforgivable Secret

For several years now, one of the most used words in every IT organization has been “cloud”. Why? Because using or providing “cloud” services is one of the main objectives for CIOs and CTOs across the globe.

In 2017, the word “cloud” is not new anymore, but it is still relevant and a big part of the IT transformation process. Here are some numbers that highlight the importance of cloud adoption:

  • According to an IDC study (The Salesforce Economy: Enabling 1.9 Million New Jobs and $389 Billion in New Revenue Over the Next Five Years, IDC), cloud computing has been growing at 4.5 times the rate of IT spending since 2009 and is expected to grow at better than 6 times the rate of IT spending from 2015 to 2020.
  • Gartner predicts the worldwide public cloud services market will grow 18% in 2017 to $246.8B, up from $209.2B in 2016.
  • Gartner also predicts that just Infrastructure-as-a-Service (IaaS) is projected to grow 36.8% in 2017 and reach $34.6B.

The importance of cloud was reaffirmed when big tech companies like Amazon, Salesforce, Google, Microsoft, IBM, VMware, Dell, Red Hat, Oracle, and HP joined the race to become “cloud” providers and get a piece of the market share. Enterprises across the world also knew that in order to compete and survive in a technology-driven world, IT transformation was imperative: one must join the cloud revolution.

The Problem

One of the main problems with Cloud adoption is, without a doubt, the lack of a Cloud Strategy. Gartner estimates that “less than one-third of enterprises have a documented Cloud Strategy”. In my opinion, having a cloud strategy will provide multiple benefits to an enterprise, including:

  • Maximize the cloud business impact and benefits, such as accelerated time to market, increased agility and efficiency, and reduced costs.
  • Translate business objectives and enterprise requirements into technology.
  • Be able to prepare for the cloud infrastructure needs.
  • Have a detailed and well-defined roadmap for a cloud adoption framework.

At the end of the day, having a Cloud Strategy should enable the IT leadership to plan and be effective using cloud technologies as the base for IT modernization and digital transformation.

Going beyond a Cloud Strategy: Adopting an Open Cloud Strategy

Open Cloud is not just using Open Source software to build private or hybrid clouds. In my opinion, it is also the adoption of the open source culture and best practices as the cornerstone of an Open Cloud Strategy.

For example, when talking about Open Cloud, let’s not forget that most public cloud providers use open source software as their foundation, and making their offerings interoperable with open source workloads has been a priority for large vendors like Amazon, Google, and Microsoft. The reason behind this strategy is simple: open source is great for business!

Here are some of the benefits of adopting an Open Cloud Strategy:

  • Expand the software support ecosystem by being part of open source projects and their communities.
  • Reduce costs.
  • Avoid vendor lock-in.
  • Improve code quality.
  • Improve security.
  • Increase interoperability.

Some of the benefits outlined above are exemplified by the following open source projects:

  • Linux, which is no longer just for hobbyists and amateurs, runs deep in the enterprise, powering critical applications in all industries. Linux is not only the best example of open source and community collaboration, but also the flagship product for several companies like Red Hat (my current employer).
  • OpenStack, a project where community work and best practices are providing an IaaS (Infrastructure-as-a-Service) alternative, 100% open source and with a vibrant community.

Here are some of the things to consider while adopting an Open Cloud Strategy:

  1. Business impact.
  2. Cultural impact
    a) Evaluate the cultural impact of adopting an open cloud approach in the organization.
    b) Evaluate the benefits of the open source community model and how that could drive collaboration and innovation in the organization.
  3. Workload impact
    a) What will it entail, from a technology point of view, to adopt open source? For example, which workloads will need to be migrated or re-architected?
  4. Learning curve
    a) The level of effort required from employees to efficiently manage the new technology.
    b) Is there internal talent inside the organization, with expertise in the technologies to be adopted, that could accelerate the learning curve?
  5. Software assessment
    a) When adopting an open source project as part of the cloud strategy, there are several factors that should be assessed to determine the complexity and impact of implementation and maintenance:
  • License type
  • Age of the project and maturity
  • Public references of success (enterprise usage)
  • Number of contributors
  • IT experts’ opinions about it
  • Enterprise support availability
  • Change rate: commits, frequency, number of releases
  • Size of community

Bottom line: adopting an Open Cloud Strategy is, at the end of the day, a business decision. It is a decision that is now easier than ever to make because of the increased benefits and popularity of open source projects and their communities, the impact of their use in the enterprise, and the number of quality cloud computing open source projects available (OpenStack, Kubernetes, Docker, LXC, KVM, Ansible, etc.).


“Automation is not magic” --me

One of my favorite open source projects of the last couple of years is Ansible. Things got even more interesting for me after Red Hat (my employer) acquired Ansible in 2015, and I got to architect, deploy, and write Ansible roles and playbooks as part of my day-to-day work. In this article, I want to share some recommendations based on first-hand experience in the field.

But before we start, what is Ansible?

  • Created by Michael DeHaan
  • Open source software written in Python (https://github.com/ansible/ansible)
    • Modules can be written in any language that can return JSON
  • YAML configuration files (easy to read, write and maintain)
  • A simple automation engine that automates:
    • Cloud provisioning, configuration management, application deployment, network devices, infra-service orchestration, and many other IT needs
  • Designed for multi-tiered deployments since day one, modeling IT infrastructure by describing how all systems interrelate
  • Agentless, SSH-based
  • Idempotent by design: f(x) = f(f(x))

What Ansible is not, and why:

  • You must know what you are doing when you write playbooks, roles, and modules
  • It’s not for all cases. You must ask yourself, “Can I do it in Ansible?” and “Should I do it in Ansible?”
  • It requires an SSH connection
  • Ansible is not a programming language (it’s meant to be declarative, where each task represents a desired state)

Here are some of the "best practices" to consider while using Ansible and Ansible Tower:

  1. Must Use a Version Control System
  • A version control system (VCS) allows users to manage changes to source code and keeps track of every modification. Some examples of VCSs are Git, SVN, and Mercurial
  • Ansible does not require the use of a VCS, but it’s highly encouraged
  • Storing Ansible code in a VCS and adopting software development methodologies is key to creating a scalable automation model
  • Scalable:
    • Share code and collaborate between teams
    • Distributed testing
    • Multiple life cycle environments for the Ansible code (i.e., dev, test, QA, and prod)
    • CI/CD pipeline integration and unit testing
  2. Playbook Repository Structure

One of the issues that we see in the field is the playbook repository structure. You must create a structure that works with both Ansible Core and Ansible Tower.
Here is an example of a repository structure:

playbook_repo
|-- group_vars
|----- all.yml
|----- dev.yml
|----- qa.yml
|----- prod.yml
|-- inventory
|-- library
|-- roles
|----- role_name
|-- ansible.cfg
|-- test_playbook.yml
|-- deploy_lamp.yml
|-- update_system.yml

Some notes about this layout:

  • ansible.cfg is at the root of the repository. This allows you to keep the basic Ansible settings under version control.
  • group_vars defines the group variables for the playbooks (a minimal example follows these notes)
  • The inventory file is ignored when using Ansible Tower
  • A best practice is to have a playbook repository per team. For example: Infrastructure, App_1, App_2, Patching
  • In this example, there are no roles in the repository’s /roles directory. A best practice is to separate roles into their own repositories so they can be version controlled, shared, and lifecycled independently of other playbooks
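
For illustration, a minimal group_vars/all.yml could look like the sketch below; the variable names and values are hypothetical, not taken from this post.

---
# group_vars/all.yml - variables shared by every host (example values)
ntp_server: time.example.local
http_port: 80
deploy_user: deployer
...
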
  3. Use and Create Roles

Roles are a way of automatically loading specific vars_files, tasks, and handlers based on a known file structure. Grouping content by roles also allows for easy sharing with other users. Here are a couple best practices around roles:

  • A repository should be created for each role. This allows you to share an individual role through Ansible Galaxy or with other playbook repositories inside your organization
  • You should be able to perform unit testing on each role in a CI model
  • A convenient way to test roles is by using containers. This allows you to test a role across multiple distributions
  • Virtual machines are recommended for testing roles that perform low-level actions like bootloader settings, kernel parameters, or firewall settings
  • Use the ansible-galaxy command to create the role structure: ansible-galaxy init role_name --offline
  • A .gitignore file should be used to ignore every role inside a playbook repository so that roles can be managed individually with a VCS (a requirements file sketch follows this list)
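
Once roles live in their own repositories, a playbook repository can pull them in with ansible-galaxy and a requirements file. Here is a minimal sketch; the repository URL, role name, and version tag are placeholders.

---
# roles/requirements.yml
# Install with: ansible-galaxy install -r roles/requirements.yml -p roles/
- src: https://git.example.local/ansible-roles/apache.git
  scm: git
  name: apache
  version: "1.0.0"
...
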
  4. Use YAML Properly

Here are some formatting guidelines to follow when writing Ansible playbooks and roles:

  • Every playbook should begin with three dashes “---” and end with three dots “...”. This does not apply to files inside roles
  • Indentation is important! Each line should be indented with two spaces underneath its parent
  • You should leave exactly one blank line between tasks and none inside a task. You should also separate “plays” by using two blank lines
  • When writing a task, avoid using the one-line syntax in your playbook. Not only is it ugly, it is difficult to read
  • Variables should be unique to each role, descriptive, and follow a naming convention, e.g., mysql_dnsname1

Here is an example of the use of indentation and proper formatting in a playbook:

  • Only use folded scalars “>” to separate arguments on each line inside a long shell or command task, for example:
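
(A hypothetical task, shown only to illustrate the folded-scalar style; the command and paths are made up.)

- name: Run a long command using a folded scalar
  command: >
    /usr/local/bin/backup_tool
    --source /var/lib/app
    --destination /backups/app
    --compress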

  • All members of a list are lines beginning at the same indentation level, starting with a "- " (a dash and a space), for example:
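
(Package names are illustrative.)

- name: Install the web server packages
  yum:
    name:
      - httpd
      - mod_ssl
      - httpd-tools
    state: present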

You should be able to verify the syntax of your playbooks and roles by using --syntax-check.
More examples at http://docs.ansible.com/ansible/latest/YAMLSyntax.html

  5. Use the Following Development Best Practices

  • Define the state parameter. In some modules this could be: present, latest, absent, etc.
  • Verify that the service you started is actually running! Just because you declared it in a playbook does not mean that it is working. You can do this in your playbooks by using “uri”, “wait_for_connection”, or any other validation method (see the combined example after this list)
  • Ensure that every play has a “name” set with a meaningful value. This will make it easier for others to read and understand your code
  • Define and use tags to target plays during execution
  • Do not store large files as part of your playbooks and Ansible repositories! If you need to distribute files, consider using a remote copy from network shares (NFS/FTP/web) or from a binary repository
  • Replace your “prompt” (vars_prompt) usage with surveys, since prompts will not work in Tower
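
Putting a few of these practices together, here is a minimal sketch; the host group, service, and URL are assumptions for illustration.

---
- name: Deploy and validate the web service
  hosts: webservers
  become: true
  tags:
    - web

  tasks:
    - name: Ensure httpd is installed
      yum:
        name: httpd
        state: present

    - name: Ensure httpd is started and enabled
      service:
        name: httpd
        state: started
        enabled: true

    - name: Verify that the web service actually responds
      uri:
        url: "http://{{ inventory_hostname }}/"
        status_code: 200
...
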
  6. Monitor Ansible Tower

You should monitor the following components of Tower:

  • API and web response (this check should also be used by load balancer VIPs):
    • You should do this by monitoring the following URL in your Tower installations: https://tower-server-hostname/api/v1/ping (an example check follows this list)
    • If you have more than one Tower instance, you should monitor each one of the instances for a successful response (2XX).
  • Check that RabbitMQ and supervisord are running: systemctl status supervisord; systemctl status rabbitmq-server
  • Main services in supervisord:
    • awx-uwsgi, awx-daphne, awx-celeryd
  • Logs: /var/log/tower and /var/log/supervisor/
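
If you drive the check itself with Ansible, it could look like this minimal sketch; the hostname is a placeholder, and certificate validation is disabled only for the example.

- name: Check that Ansible Tower answers on its ping endpoint
  uri:
    url: "https://tower-server-hostname/api/v1/ping"
    method: GET
    status_code: 200
    validate_certs: false
  delegate_to: localhost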

  7. Avoid the Following Things

  • When you develop playbooks and roles, avoid using the following modules: shell, command, raw, and script
  • Other modules should be used instead
  • If you can’t avoid using one of these modules, test what you are executing and ensure that it is idempotent
  • If you are using a shell task as a handler, ensure that the task calling the handler comes from a module that is idempotent
  • Modules that gather information should have check_mode set to “true” in order to be able to run them in check mode
  • Do not use set_fact to set a fact that has been registered by another task
  • Do not restart services without using a handler. Service restarts should always be done with a handler (see the example after this list)!
  • Do not chain handlers! If you do, tasks may fail if a previous handler fails
  • Not using dynamic inventories with cloud providers (prefer dynamic inventory over maintaining static host lists)
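
As an example of the handler rule above, here is a minimal sketch; the template file and host group are hypothetical.

---
- name: Manage the web server configuration
  hosts: webservers
  become: true

  tasks:
    - name: Deploy the httpd configuration
      template:
        src: httpd.conf.j2
        dest: /etc/httpd/conf/httpd.conf
      notify: restart httpd

  handlers:
    - name: restart httpd
      service:
        name: httpd
        state: restarted
...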

Thanks for reading and happy Ansible hacking!


First of all: 1. I work for Red Hat Inc. 2. Opinions are my own and not the views of my employer.

2017 is gone, and with it a great year for technology and innovation. Now it is time to look a little bit into the future and talk about a couple of technologies that I believe will become more relevant in this new year, 2018.

1. Managed Security

Last year was a terrible year for IT security, with multiple high-profile breaches like the ones that happened to Equifax, BCBS, Alteryx, and a few other companies.
I believe that this year security will be a top priority for CIOs and CTOs across the world; this will manifest in bringing in external vendors to complement in-house security teams as part of IT security strategies.
Managed Security service providers have a great opportunity in front of them in 2018: to help minimize breaches, drive compliance, and innovate in new ways to secure the cloud.

2. Blockchain

Blockchain is, without doubt, one of my favorite technologies out there. Initially created for the cryptocurrency use case, it is a great way to distribute and secure data.
For 2018, I believe that blockchain will become more mainstream, with its “distributed ledger” capabilities applied to multiple use cases in banking, trading, asset management, decentralized records, government, insurance, and healthcare.

3. Serverless

Serverless computing will continue to grow in 2018, making developers’ lives easier and faster time to market a reality. AWS Lambda should continue to grow this year, but I believe what will make serverless an exciting technology to follow is the rise of other players in this arena, like Microsoft Azure Functions, Google Cloud Functions, and the open source Apache OpenWhisk project.

4. Artificial Intelligence

Artificial Intelligence will continue to evolve in 2018, with more companies investing in AI initiatives. Here are a couple of use cases that I believe will be top priorities for AI and Machine Learning this year:

  • Self-driving cars and ships
  • Better personal digital assistants and perfecting the way that we interact with them (voice)
  • More and more healthcare applications
  • IT Security applications of AI will grow

5. IoT will continue growing

Internet of Things (IoT) should continue to grow in 2018, generating more and more data (which will be used by companies to make better decisions using Machine Learning and AI). For IoT providers, a couple of things will be among their top priorities:

  • Securing devices and data
  • Market focus will be beyond the consumer
  • More IoT startups will be created
  • Focus on home automation and interoperability with other devices
  • Edge computing applied to IoT will grow

6. Kubernetes

Kubernetes will win the PaaS war in 2018! How will this happen?

  • Growth of Kubernetes Certified Service Providers.
  • Adoption by the big players of Kubernetes as the core of their container offerings, for example:
    • Red Hat OpenShift Container Platform and OpenShift Online
    • Microsoft Azure Container Service
    • Google Kubernetes Engine
    • Amazon Elastic Container Service for Kubernetes
    • Oracle Container Services
  • Kubernetes will also go beyond containers, managing virtualization with KubeVirt.

7. Bare Metal is back!

Bare metal should grow in 2018, with more use cases that require bigger, specialized, and dedicated hardware. The comeback of bare metal will be possible thanks to cloud computing: allowing provisioning, configuration, management, and lifecycle as-a-service for physical hardware.
Bare metal needs will be addressed in public and private clouds, with projects like OpenStack Bare Metal Provisioning (Ironic) leading in the private cloud space.

8. Automation and Configuration Management

Adopting DevOps and Cloud Computing is an almost impossible task without automation and configuration management. In 2018 automation and configuration management tools like Ansible and Ansible Tower by Red Hat will go beyond cloud and datacenter automation. They will drive the adoption of serverless computing and be used in other use cases like application migrations, operating system modernization, and network device orchestration and configuration.

And by the way, Happy New Year 2018!


Here is a presentation that Rimma Iontel and I delivered at Red Hat Summit 2017. The topic of the presentation is "Best practices for successfully deploying NFV." Enjoy!

Best practices for successfully deploying NFV - YouTube

At OpenStack Summit Sydney, fellow Red Hatter Roger Lopez and I presented on designing and deploying Kubernetes in an OpenStack multi-site environment. Here are the abstract and the video of the presentation.

Bringing Worlds Together: Designing and Deploying Kubernetes on an OpenStack multi-site environment

As companies expand their reach to meet new customer demands and needs, so do their IT infrastructures. This expansion brings to the forefront the complexities of managing technologies such as OpenStack in multiple regions and/or countries. Prior to building and expanding these technologies, IT teams are likely to ask themselves:

  • How will we manage our growing infrastructure and applications?
  • How will we handle authentication between regions and/or countries?
  • How will we backup/restore these environments?

In order to simplify these complexities and to answer these questions, we look toward a multi-site solution. This session will focus on best practices for building a highly available multi-site Kubernetes container platform environment on OpenStack.

This session is best suited for OpenStack administrators, system administrators, cloud administrators and container platform administrators.

Video

Bringing Worlds Together Designing and Deploying Kubernetes on an OpenStack multi-site environment - YouTube

I had the honor of presenting at OpenStack Summit Sydney. One of my presentations was with my co-worker Rimma Iontel. Here are the abstract and the video recording of the presentation.

OpenStack: The Perfect Virtual Infrastructure Manager (VIM) for a Virtual Evolved Packet Core (vEPC)

Virtualizing core services to reduce costs and increase efficiency is a priority for the telecommunications industry. A great example of this trend is the virtualization of the evolved packet core (EPC), a key component that provides voice and data on 4G long-term evolution (LTE) networks.

This presentation will address, with real-life examples and architectures, why OpenStack is the perfect virtual infrastructure manager for this use case. We will also answer the following questions:

  • How does OpenStack fit within the ETSI NFV Reference Architecture?
  • What is the use case for virtual evolved packet core (vEPC)?
  • Why OpenStack?
  • How to architect and design a vEPC deployment on OpenStack to meet a provider’s scale and performance requirements?
  • What are the considerations and best practices?

This session is best suited for telco operators and OpenStack and cloud administrators that want to get exposure to real-life vEPC deployments, their use case, and architectures.

Video

OpenStack: The Perfect Virtual Infrastructure Manager (VIM) for a Virtual Evolved Packet Core (vEPC) - YouTube

First things first: I do work for Red Hat! Now, with that out of the way, let’s jump into today’s blog topic: what is new in Red Hat OpenStack Platform 10!

Red Hat OpenStack Platform 10 was released yesterday (press release here). This new version is based on the OpenStack “Newton” release and also includes fixes to known issues of the Red Hat OpenStack Platform.

The installer (Red Hat OpenStack Platform director) includes the following new features:

Custom and composable roles:
  • Templates have been decomposed into a set of multiple smaller, discrete templates, each representing a composable service.
  • Everything can be split except Pacemaker-managed services.
Graphical user interface:
  • Red Hat OpenStack Platform director can now be managed using a graphical user interface. The interface includes:
    • Templates
    • Built-in workflows
    • Pre- and post-flight validations
    • Role assignment
    • Node registration and introspection
Hardware deployment and generic node deployment separation:
  • There is a clear separation of the hardware deployment phase. This allows you to deploy Red Hat Enterprise Linux onto a hardware node and hand it over to a user.

The following new OpenStack features are included in this release:

Nova:
  • Guest Device Role Tagging and Metadata Injection: OpenStack Compute creates and injects an additional metadata file that allows the guest to identify its devices based on tags like:
    • type of device
    • the bus it is attached to
    • device address
    • MAC address
    • network
    • disk device name
Horizon:
  • Improvements to provide a better user experience and better integration with OpenStack core services.
Keystone:
  • Fernet tokens support
  • Multi-domain LDAP support
  • Support for domain-specific roles and implied roles.
Swift:
  • Update Container on Fast-POST allows fast, efficient updates of metadata without the need to fully re-copy the contents of an object.
Neutron:
  • Full support for DVR (Distributed Virtual Routing).
  • DSCP markings.
  • Enhanced NFV datapath with director integration:
    • Added support for SR-IOV physical function passthrough (using vnic_type=direct-physical) in addition to VF passthrough; SR-IOV deployment can now be automated using the director
    • OVS-DPDK 2.5 is now fully supported and integrated with the director
Manila:
  • Can now be deployed by the director and is fully supported.
Control plane high availability:
  • Big improvements in how HA is done for the control plane. The majority of OpenStack services are now managed by systemd.
  • Pacemaker is only used for the following services (which can’t be separated into individual roles):
    • HAProxy/virtual IPs
    • RabbitMQ
    • Galera (MariaDB)
    • Manila-share
    • Cinder Volume
    • Cinder Backup
    • Redis
Ironic:
  • Bare-metal-to-tenant support, allowing a pool of shared hardware resources to be provisioned on demand by OpenStack tenants.

This new release also includes a group of “Technology Preview” items. To check the support scope provided by Red Hat for Technology Preview items, visit: https://access.redhat.com/support/offerings/techpreview/

  • At-Rest Encryption: Objects can now be stored in encrypted form (using AES in CTR mode with 256-bit keys).

  • Erasure Coding (EC): The Object Storage service includes an EC storage policy type for devices with massive amounts of data that are infrequently accessed.

  • Neutron VLAN Aware Virtual Machines: Certain types of virtual machines require the ability to pass VLAN-tagged traffic over one interface, which is now represented as a trunk neutron port.

  • Open vSwitch Firewall Driver: The OVS firewall driver is now available as a Technology Preview. The conntrack-based firewall driver can be used to implement Security Groups. With conntrack, Compute instances are connected directly to the integration bridge for a more simplified architecture and improved performance.

Also, Red Hat will keep the following features, included in previously released products, as Technology Preview:

  • Benchmarking service
  • Nova cells
  • CephFS native driver for Manila
  • Containerized compute nodes
  • DNSaaS
  • FWaaS
  • Google Cloud Storage backup driver
  • OpenDaylight integration
  • Real Time KVM
  • Red Hat SSO
  • VPNaaS

For more details about Red Hat OpenStack Platform 10, you can visit https://www.redhat.com/en/insights/openstack


Here is the presentation that I delivered with fellow Red Hatter Dave Costakos at OpenStack Summit Boston 2017.

Here is the abstract:

"OpenStack is a unique set of cross-functional, fast-moving technologies that challenge the status quo in any IT organization. It has been easy to read about and buy into the interweaving of compute, storage, and networking technologies in a single platform. Despite such amazing promise and technology, technology is littered with companies who have tried and failed to deliver a successful, scaleable, supportable OpenStack cloud to their customers. Architects and IT teams tasked with designing"

Enjoy!

Don't Fail at Scale: How to Plan for, Build, and Operate a Successful OpenStack Cloud - YouTube

In this blog post, we will talk a little bit about High Availability and a little bit more about Pacemaker. Here are some of the topics of this post:

  • Introduction to High Availability (HA) and Clustering.

  • Benefits of Highly Available applications.

  • How HA is implemented on Red Hat Enterprise Linux (RHEL) 7.

  • HA requirements on RHEL 7.

  • Demo: Building a 3 node Apache cluster.

Introduction to High Availability (HA) and Clustering.

What is High Availability (HA)?

In IT, High Availability refers to a system or component that is continuously operational for a desirably long length of time.

Three core principles of HA:

  • Elimination of single points of failure.

  • Reliable crossover.

  • Detection of failures as they occur.

How is High Availability implemented on RHEL 7?

CLUSTERING!

What is Clustering?

A cluster is a set of computers working together on a single task. Which task is performed, and how that task is performed, differs from cluster to cluster.

There are four (4) different kinds of clusters:

High-availability clusters: Known as HA clusters or failover clusters, their function is to keep services as available as they can be. You can find them in two main configurations:

  • Active-Active (where a service runs on multiple nodes).
  • Active-Passive (where a service only runs on one node at a time).

Load-balancing clusters: All nodes run the same software and perform the same task at the same time and the requests from the clients are distributed between all the nodes.

Compute clusters: Also known as high-performance computing (HPC) clusters. In these clusters, tasks are divided into smaller chunks, which then get computed on different nodes.

Storage clusters: All nodes provide a single cluster file system that will be used by clients to read and write data simultaneously.

Benefits of Highly Available applications

In two words: "Application resiliency".

  • Apply patches.
  • Planned outages.
  • Unplanned outages due to failures (server, software, network, storage).

How is HA implemented on RHEL 7?

Red Hat Enterprise Linux High Availability Add-On.

The High Availability Add-On consists of the following major components:

Cluster infrastructure: Provides fundamental functions for nodes to work together as a cluster: configuration file management, membership management, lock management, and fencing.

High-availability Service Management: Provides failover of services from one cluster node to another.

Cluster administration tools: Configuration and management tools for setting up, configuring, and managing the cluster.

To provide the above services, multiple software components are required on the cluster nodes.

Software

The cluster infrastructure software is provided by Pacemaker and performs the next set of functions:

  • Cluster management
  • Lock management
  • Fencing
  • Cluster configuration management
Cluster software:

pacemaker: Responsible for all cluster-related activities, such as monitoring cluster membership, managing the services and resources, and fencing cluster members. The RPM contains several important components, including:

  • Cluster Information Base (CIB).
  • Cluster Resource Management Daemon (CRMd).

corosync: This is the framework used by Pacemaker for handling communication between the cluster nodes.

pcs: Provides a command-line interface to create, configure, and control every aspect of a Pacemaker/corosync cluster.

Requirements and Support.

Here are some requirements and limits for Pacemaker.

Number of nodes:

  • Up to 16 nodes per cluster.
  • Recommended minimum number of nodes: 3.
  • A 2-node cluster can be configured but is not recommended.

Cluster location:

Single site: A cluster setup where all cluster members are in the same physical location, connected by a local area network. (Supported).

Multisite: Two clusters, one active and one for disaster recovery. Failover for multisite clusters must be managed manually. (Supported).

Stretch (or Geo) clusters: Clusters stretched over multiple physical locations. (Requires an architecture review to be supported.)

Fencing:
Fencing is the process of cutting a node off from shared storage. This can be done by power cycling the node or by disabling communication at the storage level.

WARNING: Fencing is required for all nodes in the cluster, either via power fencing, storage fencing, or a combination of both.

NOTE: If the cluster will use integrated fencing devices like ILO or DRAC, the systems acting as cluster nodes must power off immediately when a shutdown signal is received, instead of initiating a clean shutdown.

Virtualization

Virtual machines are supported both as cluster nodes and as resources.

NOTE: A VM as a resource means that the virtualization host is participating in the cluster and the virtual machine is a resource that can move between cluster nodes.

Networking

Required:

  • Multicast and IGMP (Internet Group Management Protocol).
  • Gratuitous ARP, used for floating IP addresses.

Ports:

  • 5405/UDP - corosync
  • 2224/TCP - pcsd
  • 3121/TCP - pacemaker
  • 21064/TCP - dlm

RHN Channels
Required:

  • rhel-7-server-rpms
  • rhel-ha-for-rhel-7-server-rpms
Building a 3-node Apache cluster

Now, we will build a basic Pacemaker cluster serving Apache. To do this, we will need three systems running RHEL 7.x.

Preparing the systems:

All of these actions need to happen on all the cluster nodes.

Configure Firewall

Let's start by configuring FirewallD to allow traffic.

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

Install required software

yum install pcs fence-agents-all

The pcs package requires corosync and pacemaker, so all the required software will be installed by this command.

Enable pcsd

pcsd provides cluster configuration synchronization and the web front end. It needs to be enabled on all the servers.

systemctl enable pcsd; systemctl start pcsd

Set the hacluster user password

After the software install, a user named hacluster will be created. This user will be used by pcsd for all cluster communication.

NOTE: You should use the same password across all cluster nodes for this user. If you echo your password as shown below, clear your history afterward. :)

echo password | passwd hacluster --stdin

Configuring DNS

You should be able to resolve all the nodes in the cluster by name. In this example, we are going to use host files to define our three nodes. This is what I added to my hosts file (/etc/hosts):

192.168.1.10    node1   node1.example.local 
192.168.1.20    node2   node2.example.local
192.168.1.30    node3   node3.example.local

Authenticate pcsd

pcsd requires the cluster nodes to authenticate; we are going to use the hacluster user and password. This action only needs to happen on one of the nodes.

[root@node1] pcs cluster auth node1.example.local node2.example.local node3.example.local

Creating the cluster

Let's create the cluster:

pcs cluster setup --name demo-cluster --start node1.example.local node2.example.local node3.example.local

An important step is to enable the cluster services on all nodes. By default, if a node is rebooted, it will not join the cluster until the cluster is started manually. To avoid this, do:

[root@node1] pcs cluster enable --all

Check the cluster status:

[root@node1] pcs cluster status

Configuring Fencing

This is a critical step: you must have fencing in the cluster! In this example, I am assuming that we are using KVM virtual machines for the demo, so we will use fence_xvm.

[root@node1] pcs stonith create fence_node1_vm fence_xvm port="node1" pcmk_host_list="node1.example.local"

[root@node1] pcs stonith create fence_node2_vm fence_xvm port="node2" pcmk_host_list="node2.example.local"

[root@node1] pcs stonith create fence_node3_vm fence_xvm port="node3" pcmk_host_list="node3.example.local"

Open the port for the fencing agent:

[root@node1] for i in `seq 1 3`; do ssh root@node$i.example.local firewall-cmd --add-port=1229/tcp --permanent; done

[root@node1] for i in `seq 1 3`; do ssh root@node$i.example.local firewall-cmd --reload; done

Check fencing status.

[root@node1] pcs stonith show

Resources

Clustered services consist of one or more resources. A resource can be:

  • IP address
  • file system
  • Service (example: httpd)

Also, resources are usually members of resource groups.

Creating the resources for our demo-cluster

First, let's create the resource group for our Apache cluster. We are going to name it personal-web, and it will have a floating IP.

[root@node1] pcs resource create floatingip IPaddr2 ip=192.168.1.254 cidr_netmask=24 --group personal-web

Install Apache (httpd). This needs to be done on all the cluster nodes:

[root@node1] yum install httpd -y

Create the web-1 resource using the apache resource agent and put it in the personal-web group.

[root@node1] pcs resource create web-1 apache --group personal-web

Let's check that the created resources are present in the cluster configuration and check the cluster status.

[root@node1] pcs resource show

[root@node1] pcs status

Create a file in /var/www/html/ with this content:

echo "Website responding from $HOSTNAME" > /var/www/html/index.html
Manipulating the cluster

Here are some commands that will help you to manage the cluster.

Start and stop the cluster

To do it on all nodes, use the --all switch.

[root@node1] pcs cluster start

[root@node1] pcs cluster stop

Stop the cluster service on a specific remote node

[root@node1] pcs cluster stop node2.example.local

Disable cluster on reboot on a node

[root@node1] pcs cluster disable

How to add a node to the cluster

[root@node1] pcs cluster node add new.example.com

On the new node, you need to authenticate against the rest of the cluster. Also, you will need to add a fence device for it.

[root@new] pcs cluster auth

How to remove a node from the cluster

[root@node1] pcs cluster node remove new.example.com

[root@node1] pcs stonith remove fence_newnode

Set a node in standby

(This bans the node from hosting resources).

[root@node1] pcs cluster standby new.example.com

Set the cluster in standby

[root@node1] pcs cluster standby --all

Unset standby

[root@node1] pcs cluster unstandby --all

Displaying quorum status

[root@node1] corosync-quorumtool
Documentation

Slides:
https://github.com/juliovp01/txlf-2015-HA-slides

High Availability Add-On overview: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Overview/

Cluster Labs website:
http://clusterlabs.org/
