Over the last few months I've spent a lot of my time looking at ways to rework the heat auth model, in an attempt to solve two long-standing issues:


  1. Requirement to pass a password when creating a stack which may perform deferred orchestration actions (for example AutoScaling adjustments)
  2. Requirement for users to have administrative roles when creating certain types of resource.


So, fixes to these issues have been happening (in Havana and Icehouse respectively), but discussions with various folks indicate significant confusion about differentiating the two changes, probably because I've not got around to writing up the documentation yet (it's in progress, honest!) ;)

In an attempt to clear up the confusion, and provide some documentation ahead of the upcoming Icehouse Heat release, I'm planning to cover each feature in this and a subsequent post - below is a discussion of the "Requirement to pass a password" problem, and the method used to solve it.




What? Passwords? Don't we pass tokens?
Well, yes, mostly we do.  However the problem with tokens is that they expire, and we have no way of knowing how long a stack may exist, so we can't store user tokens to perform deferred operations after the initial creation of the heat stack (not that storing them would be a good idea from a security perspective either..)

So in previous versions of heat, we've required the user to pass a password (yes, even if they are passing us a token), which we'd then encrypt and store in the heat database, such that we can later obtain a token to act on behalf of the user and do whatever deferred operations are required during the lifetime of the stack.  It's not a nice design, but when it was implemented, Trusts did not exist in Keystone so there was no viable alternative.  Here's exactly what happens:


  • User requests stack creation, providing a token and username/password (python-heatclient or Horizon normally requests the token for you)
  • If the stack contains any resources marked as requiring deferred operations, heat will fail validation checks if no username/password is provided
  • The username/password are encrypted and stored in the heat DB
  • Stack creation is completed
  • At some later stage we retrieve the credentials and request another token on behalf of the user; the token is not limited in scope and provides access to all of the stack owner's roles.
Clearly this is suboptimal, and is the reason for this strange additional password box in horizon:

You already entered your password, right?!



Happily, after discussions with Adam Young, Trusts were implemented during Grizzly and Heat integrated with the functionality during the Havana cycle.  I get the impression not that many people have yet adopted it, so I'm hoping we can move towards making the new trusts based method the default, which has already happened for devstack quite recently.


Keystone Trusts 101
So, in describing the solution to Heat storing passwords, I will be referring to Keystone Trusts, because that is the method used to implement the solution.  There's quite a bit of good information out there, including the Keystone Wiki, Adam Young's blog and the API documentation, but here's a quick summary of the terminology, which should be sufficient to understand how we're using trusts in Heat:

Trusts are a keystone extension which provides a method to enable delegation, and optionally impersonation, via keystone.  The key terminology is trustor (the user delegating) and trustee (the user being delegated to).

To create a trust, the trustor (in this case the user creating the heat stack) provides keystone with the following information:


  • The ID of the trustee (who you want to delegate to, in this case the heat service user)
  • The roles to be delegated (configurable via the heat configuration file, but it needs to contain whatever roles are required to perform the deferred operations on the user's behalf, e.g launching a nova instance in response to an AutoScaling event)
  • Whether to enable impersonation
Keystone then provides a trust_id, which can be consumed by the trustee (and only the trustee) to obtain a trust scoped token.  This token is limited in scope, such that the trustee has access only to the delegated roles, along with effective impersonation of the trustor, if that was selected when creating the trust.
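
For example, here's roughly what creating a trust looks like directly via the keystone v3 API (heat does all of this for you automatically - the IDs below are placeholders):

curl -s -X POST http://<keystone>:5000/v3/OS-TRUST/trusts \
  -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"trust": {
        "trustor_user_id": "<stack-owner-id>",
        "trustee_user_id": "<heat-service-user-id>",
        "project_id": "<project-id>",
        "impersonation": true,
        "roles": [{"name": "heat_stack_owner"}]
      }}'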


Phew! Ok so how did you fix it?
Basically we now do the following:

  • User creates a stack via an API request (only the token is required)
  • Heat uses the token to create a trust between the stack owner (trustor) and the heat service user (trustee), delegating a special role (or roles) as defined in the trusts_delegated_roles list in the heat configuration file.  By default heat sets this to "heat_stack_owner", so this role must exist, and the user creating the stack must have it assigned in the project where they are creating the stack.  Deployers may modify this list to reflect local RBAC policy, e.g to ensure the heat process can only access those services expected while impersonating a stack owner.
  • Heat stores the trust id in the heat DB (still encrypted, although in theory it doesn't need to be since it's useless to anyone other than the trustee, e.g the heat service user)
  • When a deferred operation is required, Heat retrieves the trust id, and requests a trust scoped token which enables the service user to impersonate the stack owner for the duration of the deferred operation, e.g to launch some nova instances on behalf of the stack owner in response to an AutoScaling event.

The advantages of this approach are hopefully clear, but to clarify:
  • It's better for users: we no longer require a password and can provide full functionality when provided with just a token (like all other OpenStack services... and we can kill the Horizon password box, yay!)
  • It's more secure, as we no longer store any credentials or other data which could be used by an attacker - the trust_id can only be consumed by the trustee (the heat service user).
  • It provides much more granular control of what can be done by heat in deferred operations, e.g if the stack owner has administrative roles, there's no need to delegate them to Heat, just the subset required.

I'd encourage everyone to switch to using this feature - enabling it is simple.  First update your heat.conf file to contain the following lines:


deferred_auth_method=trusts
trusts_delegated_roles=heat_stack_owner

Hopefully this will become the default for Heat from Juno.

Then ensure all users creating heat stacks have the "heat_stack_owner" role (or whatever roles you want them to delegate to the heat service user based on your local RBAC policies).
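
Creating and assigning the role looks something like this (the user/tenant names here are illustrative):

keystone role-create --name heat_stack_owner
keystone user-role-add --user demo --tenant demo --role heat_stack_owner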

That is all, more coming soon on "stack domain users" which is new for Icehouse and resolves the second problem mentioned at the start of this post! :)
Some time ago I wrote a post about debugging TripleO heat templates, which contained some details of possible debug workflows when TripleO deployments fail.

In recent releases (since the Pike release) we've made some major changes to the TripleO architecture - we make more use of Ansible "under the hood", and we now support deploying containerized environments.  I described some of these architectural changes in a talk at the recent OpenStack Summit in Sydney.

In this post I'd like to provide a refreshed tutorial on typical debug workflow, primarily focussing on the configuration phase of a typical TripleO deployment, and with particular focus on interfaces which have changed or are new since my original debugging post.

We'll start by looking at the deploy workflow as a whole, and some heat interfaces for diagnosing the nature of the failure, then we'll look at how to debug directly via Ansible and Puppet.  In a future post I'll also cover the basics of debugging containerized deployments.

The TripleO deploy workflow, overview
A typical TripleO deployment consists of several discrete phases, which are run in order:

Provisioning of the nodes
  1. A "plan" is created (heat templates and other files are uploaded to Swift running on the undercloud)
  2. Some validation checks are performed by Mistral/Heat then a Heat stack create is started (by Mistral on the undercloud)
  3. Heat creates some groups of nodes (one group per TripleO role e.g "Controller"), which results in API calls to Nova
  4. Nova makes scheduling/placement decisions based on your flavors (which can be different per role), and calls Ironic to provision the baremetal nodes
  5. The nodes are provisioned by Ironic

This first phase is the provisioning workflow; it is complete when the nodes are reported ACTIVE by nova (e.g the nodes are provisioned with an OS and running).

Host preparation
The next step is to configure the nodes in preparation for starting the services, which again has a specific workflow (some optional steps are omitted for clarity):

  1. The node networking is configured, via the os-net-config tool
  2. We write hieradata for puppet to the node filesystem (under /etc/puppet/hieradata/*)
  3. We write some data files to the node filesystem (a puppet manifest for baremetal configuration, and some json files that are used for container configuration)

Service deployment, step-by-step configuration
The final step is to deploy the services, either on the baremetal host or in containers.  This consists of several tasks run in a specific order:

  1. We run puppet on the baremetal host (even in the containerized architecture this is still needed, e.g to configure the docker daemon and a few other things)
  2. We run "docker-puppet.py" to generate the configuration files for each enabled service (this only happens once, on step 1, for all services)
  3. We start any containers enabled for this step via the "paunch" tool, which translates some json files into running docker containers, and optionally does some bootstrapping tasks.
  4. We run docker-puppet.py again (with a different configuration, only on one node, the "bootstrap host"); this does some bootstrap tasks that are performed via puppet, such as creating keystone users and endpoints after starting the service.

Note that these steps are performed repeatedly with an incrementing step value (e.g step 1, 2, 3, 4, and 5), with the exception of the "docker-puppet.py" config generation which we only need to do once (we just generate the configs for all services regardless of which step they get started in).
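
Conceptually the per-step flow looks something like this (a simplification - the real loop is driven by ansible tasks, not a shell script):

for step in 1 2 3 4 5; do
  # the "step" hiera value controls what each puppet profile configures
  echo "{\"step\": ${step}}" > /etc/puppet/hieradata/config_step.json
  puppet apply /var/lib/tripleo-config/puppet_step_config.pp
  # containers enabled for this step are started by paunch from the per-step
  # json, e.g /var/lib/tripleo-config/docker-container-startup-config-step_${step}.json
done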

Below is a diagram which illustrates this step-by-step deployment workflow:
TripleO Service configuration workflow

The most common deployment failures occur during this service configuration phase of deployment, so the remainder of this post will primarily focus on debugging failures of the deployment steps.
 Debugging first steps - what failed?
Heat Stack create failed.
 

Ok something failed during your TripleO deployment, it happens to all of us sometimes!  The next step is to understand the root-cause.

My starting point after this is always to run:

openstack stack failures list --long <stackname>

(undercloud) [stack@undercloud ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
resource_type: OS::Heat::StructuredDeployment
physical_resource_id: 421c7860-dd7d-47bd-9e12-de0008a4c106
status: CREATE_FAILED
status_reason: |
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |

PLAY [localhost] ***************************************************************

...

TASK [Run puppet host configuration for step 1] ********************************
ok: [localhost]

TASK [debug] *******************************************************************
fatal: [localhost]: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
]
}
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8dd0b23a-acb8-4e11-aef7-12ea1d4cf038_playbook.retry

PLAY RECAP *********************************************************************
localhost : ok=18 changed=12 unreachable=0 failed=1
 

We can tell several things from the output (which has been edited above for brevity), firstly the name of the failing resource:

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0
  • The error was on one of the Controllers (ControllerDeployment)
  • The deployment failed during the per-step service configuration phase (the AllNodesDeploySteps part tells us this)
  • The failure was during the first step (Step1.0)
Then we see more clues in the deploy_stdout: ansible failed running the task which runs puppet on the host, and it looks like a problem with the puppet code.

With a little more digging we can see exactly which node this failure relates to: e.g we copy the SoftwareDeployment ID from the output above, then run:

(undercloud) [stack@undercloud ~]$ openstack software deployment show 421c7860-dd7d-47bd-9e12-de0008a4c106 --format value --column server_id
29b3c254-5270-42ae-8150-9fc3f67d3d89
(undercloud) [stack@undercloud ~]$ openstack server list | grep 29b3c254-5270-42ae-8150-9fc3f67d3d89
| 29b3c254-5270-42ae-8150-9fc3f67d3d89 | overcloud-controller-0 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | oooq_control |
 

Ok so puppet failed while running via ansible on overcloud-controller-0.
Debugging via Ansible directly
Having identified that the problem was during the ansible-driven configuration phase, one option is to re-run the same configuration directly via ansible-playbook, so you can either increase verbosity or potentially modify the tasks to debug the problem.

Since the Queens release, this is actually very easy, using a combination of the new "openstack overcloud config download" command and the tripleo dynamic ansible inventory.

(undercloud) [stack@undercloud ~]$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud ~]$ cd /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ls
common_deploy_steps_tasks.yaml external_post_deploy_steps_tasks.yaml templates
Compute global_vars.yaml update_steps_playbook.yaml
Controller group_vars update_steps_tasks.yaml
deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml upgrade_steps_playbook.yaml
external_deploy_steps_tasks.yaml post_upgrade_steps_tasks.yaml upgrade_steps_tasks.yaml
 

Here we can see there is a "deploy_steps_playbook.yaml", which is the entry point to run the ansible service configuration steps.  This runs all the common deployment tasks (as outlined above) as well as any service specific tasks (these end up in task include files in the per-role directories, e.g Controller and Compute in this example).

We can run the playbook again on all nodes with the tripleo-ansible-inventory from tripleo-validations, which is installed by default on the undercloud:

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml --limit overcloud-controller-0
...
TASK [Run puppet host configuration for step 1] ********************************************************************
ok: [192.168.24.6]

TASK [debug] *******************************************************************************************************
fatal: [192.168.24.6]: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
"exception: connect failed",
"Warning: Undefined variable '::deploy_config_name'; ",
" (file & line not available)",
"Warning: Undefined variable 'deploy_config_name'; ",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile
/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
]
}

NO MORE HOSTS LEFT *************************************************************************************************
to retry, use: --limit @/home/stack/tripleo-VOVet0-config/deploy_steps_playbook.retry

PLAY RECAP *********************************************************************************************************
192.168.24.6 : ok=56 changed=2 unreachable=0 failed=1
 

Here we can see the same error is reproduced directly via ansible, and we made use of the --limit option to only run tasks on the overcloud-controller-0 node.  We could also have added --tags to limit the tasks further (see tripleo-heat-templates for which tags are supported).

If the error were ansible related, this would be a good way to debug and test any potential fixes to the ansible tasks, and in the upcoming Rocky release there are plans to switch to this model of deployment by default.
Debugging via Puppet directly
Since this error seems to be puppet related, the next step is to reproduce it on the host (obviously the steps above often yield enough information to identify the puppet error, but this assumes you need to do more detailed debugging directly via puppet):

Firstly we log on to the node, and look at the files in the /var/lib/tripleo-config directory.

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ssh heat-admin@192.168.24.6
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
Last login: Fri Feb 9 14:30:02 2018 from gateway
[heat-admin@overcloud-controller-0 ~]$ cd /var/lib/tripleo-config/
[heat-admin@overcloud-controller-0 tripleo-config]$ ls
docker-container-startup-config-step_1.json docker-container-startup-config-step_4.json puppet_step_config.pp
docker-container-startup-config-step_2.json docker-container-startup-config-step_5.json
docker-container-startup-config-step_3.json docker-container-startup-config-step_6.json
 

The puppet_step_config.pp file is the manifest applied by ansible on the baremetal host.

We can debug any puppet host configuration by running puppet apply manually.  Note that hiera is used to control the step value; this will be at the same value as the failing step, but it can also be useful to manually modify it when developing or testing different steps for a particular service.

[root@overcloud-controller-0 tripleo-config]# hiera -c /etc/puppet/hiera.yaml step
1
[root@overcloud-controller-0 tripleo-config]# cat /etc/puppet/hieradata/config_step.json
{"step": 1}[root@overcloud-controller-0 tripleo-config]# puppet apply --debug puppet_step_config.pp
...
Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain
 

Here we can see the problem is a typo in the /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp file at line 181, I look at the file, fix the problem (ugeas should be augeas) then re-run puppet apply to confirm the fix.
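
If you want to test the configuration for a different step, you can simply overwrite the hiera data shown above then re-run puppet, e.g:

[root@overcloud-controller-0 tripleo-config]# echo '{"step": 2}' > /etc/puppet/hieradata/config_step.json
[root@overcloud-controller-0 tripleo-config]# puppet apply --debug puppet_step_config.pp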

Note that with puppet module fixes you will need to get the fix either into an updated overcloud image, or update the module via deploy artifacts for testing local forks of the modules.

That's all for today, but in a future post, I will cover the new container architecture, and share some debugging approaches I have found helpful when deployment failures are container related.
OpenStack Days UK
Yesterday I attended the OpenStack Days UK event, held in London.  It was a very good day, with a number of interesting talks, and it provided a great opportunity to chat with folks about OpenStack.

I gave a talk, titled "Deploying OpenStack at scale, with TripleO, Ansible and Containers", where I gave an update on the recent rework in the TripleO project to make more use of Ansible and enable containerized deployments.

I'm planning some future blog posts with more detail on this topic, but for now here's a copy of the slide deck I used, also available on github.



We've been having a productive week here in Boston at the OpenStack Summit, and one of the sessions I was involved in was a TripleO project Onboarding session.

The project onboarding sessions are a new idea for this summit, and provide the opportunity for new or potential contributors (and/or users/operators) to talk with the existing project developers and get tips on how to get started as well as ask any questions and discuss ideas/issues.

The TripleO session went well, and I'm very happy to report it was well attended and we had some good discussions.  The session was informal with an emphasis on questions and some live demos/examples, but we did also use a few slides which provide an overview and some context for those new to the project.

Here are the slides used (also on my github), unfortunately I can't share the Q+A aspects of the session as it wasn't recorded, but I hope the slides will prove useful - we can be found in #tripleo on Freenode if anyone has questions about the slides or getting started with TripleO in general.

During the newton/ocata development cycles, TripleO made changes to the architecture so that we make use of Mistral (the OpenStack workflow API project) to drive the workflows required to deploy your OpenStack cloud.

Prior to this change we had workflow logic defined inside python-tripleoclient, and most API calls were made directly to Heat.  This worked OK, but there was too much "business logic" inside the client, which doesn't work well if non-python clients (such as tripleo-ui) want to interact with TripleO.

To solve this problem, a number of mistral workflows and custom actions have been implemented, which are available via the Mistral API on the undercloud.  This can now be considered the primary "TripleO API" for driving all deployment tasks.

Here's a diagram showing how it fits together:

Overview of Mistral integration in TripleO


Mistral workflows and actions
There are two primary interfaces to mistral: workflows, which are a yaml definition of a process or series of tasks, and actions, which are a concrete definition of how to do a specific task (such as calling some OpenStack API).

Workflows and actions can be defined directly via the mistral API, or via a wrapper called a workbook.  Mistral actions are also defined via a python plugin interface, which TripleO uses to run some tasks, such as running jinja2 on tripleo-heat-templates prior to calling Heat to orchestrate the deployment.

Mistral workflows, in detail
Here I'm going to show how to view and interact with the mistral workflows used by TripleO directly, which is useful to understand what TripleO is doing "under the hood" during a deployment, and also for debugging/development.

First we view the mistral workbooks loaded into Mistral - these contain the TripleO specific workflows and are defined in tripleo-common:


[stack@undercloud ~]$ . stackrc 
[stack@undercloud ~]$ mistral workbook-list
+----------------------------+--------+---------------------+------------+
| Name | Tags | Created at | Updated at |
+----------------------------+--------+---------------------+------------+
| tripleo.deployment.v1 | <none> | 2017-02-27 17:59:04 | None |
| tripleo.package_update.v1 | <none> | 2017-02-27 17:59:06 | None |
| tripleo.plan_management.v1 | <none> | 2017-02-27 17:59:09 | None |
| tripleo.scale.v1 | <none> | 2017-02-27 17:59:11 | None |
| tripleo.stack.v1 | <none> | 2017-02-27 17:59:13 | None |
| tripleo.validations.v1 | <none> | 2017-02-27 17:59:15 | None |
| tripleo.baremetal.v1 | <none> | 2017-02-28 19:26:33 | None |
+----------------------------+--------+---------------------+------------+

The name of the workbook constitutes a namespace for the workflows it contains, so we can view the related workflows using grep (I also grep for tag_node to reduce the number of matches).


[stack@undercloud ~]$ mistral workflow-list | grep "tripleo.baremetal.v1" | grep tag_node
| 75d2566c-13d9-4aa3-b18d-8e8fc0dd2119 | tripleo.baremetal.v1.tag_nodes | 660c5ec71ce043c1a43d3529e7065a9d | <none> | tag_node_uuids, untag_nod... | 2017-02-28 19:26:33 | None |
| 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node | 660c5ec71ce043c1a43d3529e7065a9d | <none> | node_uuid, role=None, que... | 2017-02-28 19:26:33 | None | 

When you know the name of a workflow, you can inspect the required inputs, and run it directly via a mistral execution.  In this case we're running the tripleo.baremetal.v1.tag_node workflow, which modifies the profile assigned in the ironic node capabilities (see tripleo-docs for more information about manual tagging of nodes):


[stack@undercloud ~]$ mistral workflow-get tripleo.baremetal.v1.tag_node
+------------+------------------------------------------+
| Field | Value |
+------------+------------------------------------------+
| ID | 7a4220cc-f323-44a4-bb0b-5824377af249 |
| Name | tripleo.baremetal.v1.tag_node |
| Project ID | 660c5ec71ce043c1a43d3529e7065a9d |
| Tags | <none> |
| Input | node_uuid, role=None, queue_name=tripleo |
| Created at | 2017-02-28 19:26:33 |
| Updated at | None |
+------------+------------------------------------------+
[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| 30182cb9-eba9-4335-b6b4-d74fe2581102 | control-0 | None | power off | available | False |
| 19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9 | compute-0 | None | power off | available | False |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
[stack@undercloud ~]$ mistral execution-create tripleo.baremetal.v1.tag_node '{"node_uuid": "30182cb9-eba9-4335-b6b4-d74fe2581102", "role": "test"}'
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| ID | 6a141065-ad6e-4477-b1a8-c178e6fcadcb |
| Workflow ID | 7a4220cc-f323-44a4-bb0b-5824377af249 |
| Workflow name | tripleo.baremetal.v1.tag_node |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2017-03-03 09:53:10 |
| Updated at | 2017-03-03 09:53:10 |
+-------------------+--------------------------------------+

At this point the mistral workflow is running, and it'll either succeed or fail, and also create some output (which in the TripleO model is sometimes returned to the UI via a Zaqar queue).  We can view the status, and the outputs (truncated for brevity):


[stack@undercloud ~]$ mistral execution-list | grep  6a141065-ad6e-4477-b1a8-c178e6fcadcb
| 6a141065-ad6e-4477-b1a8-c178e6fcadcb | 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node | | <none> | SUCCESS | None | 2017-03-03 09:53:10 | 2017-03-03 09:53:11 |
[stack@undercloud ~]$ mistral execution-get-output 6a141065-ad6e-4477-b1a8-c178e6fcadcb
{
"status": "SUCCESS",
"message": {
...

So that's it - we ran a mistral workflow, it succeeded, and we looked at the output; now we can confirm the result by looking at the node in Ironic - it worked! :)


[stack@undercloud ~]$ ironic node-show 30182cb9-eba9-4335-b6b4-d74fe2581102 | grep profile
| | u'cpus': u'2', u'capabilities': u'profile:test,cpu_hugepages:true,boot_o |
Mistral workflows, create your own!
Here I'll show how to develop your own custom workflows (which isn't something we expect operators to necessarily do, but it is now part of many developers' workflows during feature development for TripleO).

First, we create a simple yaml definition of the workflow, as defined in the v2 Mistral DSL - this example lists all available ironic nodes, then finds those which match the "test" profile we assigned in the example above:
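
A minimal sketch of such a workflow is below (the exact YAQL filter expressions are illustrative, and for simplicity the profile is hardcoded in the filter rather than interpolated from the input):

---
version: '2.0'

test_nodes_with_profile:
  description: Find available ironic nodes matching a given profile
  type: direct
  input:
    - profile: test
  output:
    available_nodes: <% $.available_nodes %>
    matching_nodes: <% $.matching_nodes %>
  tasks:
    get_nodes:
      # "ironic" is the built-in pass-through action wrapping python-ironicclient
      action: ironic.node_list detail=true
      publish:
        # all nodes in the "available" provisioning state
        available_nodes: <% task(get_nodes).result.where($.provision_state = 'available').select($.uuid) %>
        # nodes whose capabilities string contains e.g "profile:test"
        matching_nodes: <% task(get_nodes).result.where($.properties.capabilities =~ 'profile:test').select($.uuid) %>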


This example uses the mistral built-in "ironic" action, which is basically a pass-through action exposing the python-ironicclient interfaces.  Similar actions exist for the majority of OpenStack python clients, so this is a pretty flexible interface.

Now we can upload the workflow (not wrapped in a workbook this time, so we use workflow-create), run it via execution-create, then look at the outputs - we can see that the matching_nodes output matches the ID of the node we tagged in the example above - success! :)

[stack@undercloud tripleo-common]$ mistral workflow-create shtest.yaml 
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| ID | Name | Project ID | Tags | Input | Created at | Updated at |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile | 660c5ec71ce043c1a43d3529e7065a9d | <none> | profile=test | 2017-03-03 10:18:48 | None |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
[stack@undercloud tripleo-common]$ mistral execution-create test_nodes_with_profile
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| ID | 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 |
| Workflow ID | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d |
| Workflow name | test_nodes_with_profile |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2017-03-03 10:19:30 |
| Updated at | 2017-03-03 10:19:30 |
+-------------------+--------------------------------------+
[stack@undercloud tripleo-common]$ mistral execution-list | grep 2392ed1c-96b4-4787-9d11-0f3069e9a7e5
| 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile | | <none> | SUCCESS | None | 2017-03-03 10:19:30 | 2017-03-03 10:19:31 |
[stack@undercloud tripleo-common]$ mistral execution-get-output 2392ed1c-96b4-4787-9d11-0f3069e9a7e5
{
"matching_nodes": [
"30182cb9-eba9-4335-b6b4-d74fe2581102"
],
"available_nodes": [
"30182cb9-eba9-4335-b6b4-d74fe2581102",
"19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9"
]
}

Using this basic example, you can see how to develop workflows which can then easily be copied into the tripleo-common workbooks, and integrated into the TripleO deployment workflow.

In a future post, I'll dig into the use of custom actions, and how to develop/debug those.
This is a follow-up to my previous post outlining the new composable services interfaces, which covered the basics of the new-for-Newton composable services model.

The final piece of the composability model we've been developing this cycle is the ability to deploy user-defined custom roles, in addition to (or even instead of) the built in TripleO roles (where a role is a group of servers, e.g "Controller", which runs some combination of services).

What follows is an overview of this new functionality, the primary interfaces, some usage examples, and a summary of future planned work.



Fully Composable/Custom Roles
As described in previous posts, TripleO has for a long time provided a fixed architecture with 5 roles (where "roles" means groups of nodes), e.g Controller, Compute, BlockStorage, CephStorage and ObjectStorage.

This architecture has been sufficient to enable standardized deployments, but it's not very flexible.  With the addition of the composable-services model, moving services around between these roles becomes much easier, but many operators want to go further, and have full control of service placement on any arbitrary roles.

Now that the custom-roles feature has been implemented, this is possible, and operators can define arbitrary role types to enable fully composable deployments.  Combined with composable services, this represents a huge step forward for TripleO flexibility! :)

Usage examples
To deploy with additional custom roles (or to remove/rename the default roles), a new interface has been added to the python-tripleoclient "overcloud deploy" command, so you simply need to copy the default roles_data.yaml, modify it to suit your requirements (for example by moving services between roles, or adding a new role), then do a deployment referencing the modified roles_data.yaml file:

cp /usr/share/openstack-tripleo-heat-templates/roles_data.yaml my_roles_data.yaml
<modify my_roles_data.yaml>
openstack overcloud deploy --templates -r my_roles_data.yaml


Alternatively you can copy the entire tripleo-heat-templates tree (or use a git checkout):

cp -r /usr/share/openstack-tripleo-heat-templates my-tripleo-heat-templates
<modify my-tripleo-heat-templates/roles_data.yaml>
openstack overcloud deploy --templates my-tripleo-heat-templates

Both approaches are essentially equivalent; the -r option simply overwrites the default roles_data.yaml during creation of the plan data (stored in swift on the undercloud), but it's slightly more convenient if you want to use the default packaged tripleo-heat-templates instead of constantly rebasing a copied tree.

So, let's say you wanted to deploy one additional node, only running the OS::TripleO::Services::Ntp composable service; you'd copy roles_data.yaml, and append a list entry like this:

- name: NtpRole
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::Ntp


(Note that in practice you'll probably also want some of the common services deployed on all roles, such as OS::TripleO::Services::Kernel, OS::TripleO::Services::TripleoPackages, OS::TripleO::Services::TripleoFirewall and OS::TripleO::Services::VipHosts)
 Nice, so how does it work?
The main change made to enable custom roles is a pre-deployment templating step which runs Jinja2.  We define a roles_data.yaml file (which can be overridden by the user), which contains a list of role names, and optionally some additional data related to default parameter values (such as the default services deployed on the role, and the default count in the group).

The roles_data.yaml definitions look like this:

- name: Controller
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CinderApi
    - ...

The format is simply a yaml list of maps, with a mandatory “name” key in each map, and a number of optional FooDefault keys which set the parameter defaults for the role (as a convenience so the user won't have to specify it via an environment file during the overcloud deployment).

A custom mistral action is used to run Jinja2 when creating or updating a “deployment plan” (which is a combination of some heat templates stored in swift, and a mistral environment containing user parameters) – and this basically consumes the roles_data.yaml list of required roles, and outputs a rendered tree of Heat templates ready to deploy your overcloud.
Custom Roles, overview


There are two types of Jinja2 templates which are rendered differently, distinguished by the file extension/suffix:

foo.j2.yaml
This will pass in the contents of the roles_data.yaml list, and iterate over each role in the list.  The resulting file in the plan swift container will be named foo.yaml.
Here's an example of the syntax used for j2 templating inside these files:

enabled_services:
  list_join:
    - ','
{% for role in roles %}
    - {get_attr: [{{role.name}}ServiceChain, role_data, service_names]}
{% endfor %}

This example is from overcloud.j2.yaml; it uses a jinja2 loop to append the service_names for every role's *ServiceChain resource (these resources are also dynamically generated via a similar loop), which is then processed on deployment via a heat list_join function.
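
For example, with the default Controller and Compute roles, the rendered overcloud.yaml would contain something like:

enabled_services:
  list_join:
    - ','
    - {get_attr: [ControllerServiceChain, role_data, service_names]}
    - {get_attr: [ComputeServiceChain, role_data, service_names]}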

foo.role.j2.yaml
This will generate a file per role, where only the name of the role is passed in during the templating step, with the resulting files being called rolename-foo.yaml.  (Note that if you have a role which requires a special template, it is possible to disable this file generation by adding the path to the j2_excludes.yaml file.)

Here's an example of the syntax used in these files (taken from the role.role.j2.yaml file, which is our new definition of server for a generic role):

resources:
  {{role}}:
    type: OS::TripleO::Server
    metadata:
      os-collect-config:
        command: {get_param: ConfigCommand}
    properties:
      image: {get_param: {{role}}Image}

As you can see, this simply allows use of a {{role}} placeholder, which is then substituted with the role name when rendering each file (one file per role defined in the roles_data.yaml list).
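
For example, given the NtpRole defined earlier, the file rendered for that role would contain something like:

resources:
  NtpRole:
    type: OS::TripleO::Server
    metadata:
      os-collect-config:
        command: {get_param: ConfigCommand}
    properties:
      image: {get_param: NtpRoleImage}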


Debugging/Development tips
When making changes to the roles_data.yaml, and particularly when making changes to the *.j2.yaml files in tripleo-heat-templates, it's often helpful to view the rendered templates before any overcloud deployment is attempted.

This is possible via use of the “openstack overcloud plan create” interface (which doesn't yet support the -r option above, so you have to copy or git clone the tree), combined with swiftclient:

openstack overcloud plan create overcloud --templates my_tripleo_heat_templates
mkdir tmp_templates && pushd tmp_templates
swift download overcloud

This will download the full tree of rendered files from the swift container (named “overcloud” due to the name passed to plan create), so you can e.g view the rendered overcloud.yaml that's generated by combining the overcloud.j2.yaml template with the roles_data.yaml file.

If you make a mistake in your *.j2.yaml file, the jinja2 error should be returned via the plan create command, but it can also be useful to tail -f /var/log/mistral/mistral-server.log for additional information during development (this shows the output logged from running jinja2 via the custom mistral action plugin).

Limitations/future work
These new interfaces allow for much greater deployment flexibility and choice, but there are a few remaining issues which will be addressed in future development cycles:
  1. All services managed by pacemaker are still tied to the Controller role. Thanks to the implementation of a more lightweight HA architecture during the Newton cycle, the list of services managed by pacemaker is considerably reduced, but there's still a number of services (DB & RPC services primarily) which are, and until the composable-ha blueprint is completed (hopefully during Ocata), these services cannot be moved to a non-Controller role.
  2. Custom isolated networks cannot be defined. Since arbitrary roles types can now be defined, there may be a requirement to define arbitrary additional networks for network-isolation, but right now this is not possible.
  3. roles_data.yaml must be copied. As in the examples above, it's necessary to copy either roles_data.yaml, (or the entire tripleo-heat-templates tree), which means if the packaged roles_data.yaml changes (such as to add new services to the built-in roles), you must merge these changes in with your custom roles_data. In future we may add a convenience interface which makes it easier to e.g add a new role without having to care about the default role definitions.
  4. No model for dependencies between services.  Currently ensuring the right combination of services is deployed on specific roles is left to the operator, there's no validation of incompatible or inter-dependent services, but this may be addressed in a future release.
Over the newton cycle, we've been working very hard on a major refactor of our heat templates and puppet manifests, such that a much more granular and flexible "Composable Services" pattern is followed throughout our implementation.

It's been a lot of work, but it's been a frequently requested feature for some time, so I'm excited to be in a position to say it's complete for Newton (kudos to everyone involved in making that happen!) :)

This post aims to provide an introduction to this work, an overview of how it works under the hood, some simple usage examples and a roadmap for some related follow-on work.



Why Composable Services?
It probably helps to start with some historical context here.  As described in previous posts, TripleO has provided a fixed architecture with 5 roles (where "roles" means groups of nodes): Controller, Compute, BlockStorage, CephStorage and ObjectStorage.

To configure each of these roles, we used puppet, and we had a large manifest per role, with some relatively inflexible assumptions about which services would run on each role.

This worked OK, but many users have been requesting more flexibility, such as:

  • Ability to easily disable services they don't need
  • Allow service placement choice, such as co-locating the Ceph OSD service with nova-compute services to reduce the required hardware footprint (so-called "hyperconverged" deployments)
  • Make it easier to integrate new services and integrate third-party pieces (get closer to a strongly defined "plugin" interface)


The pre-newton Tripleo architecture, one manifest and heat template per role.

So, how does it work?
Basically we've made two fundamental changes to our interfaces:

  • Each service, e.g "nova-api" is now defined by an individual heat template.  The interfaces for these are standardized so all services must implement a basic subset of input parameters and output values.
  • Every service defines a small puppet "profile", which is a puppet manifest fragment that defines how to configure that service.  Again a standard interface is used; in particular a "step" variable is passed to each puppet profile, so you can choose which step the configuration occurs in (we apply configuration in a series of six steps, so the author of the profile can choose when a service is configured relative to other services).
This is the basis of the TripleO "service plugin" interface, and it should enable *much* easier integration of new services, and hopefully provide a more accessible interface to new contributors.
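
To give a flavor of the interface, here's a minimal sketch of what a service template looks like (the service name and hiera keys below are made up - see the developer docs and the puppet/services directory in tripleo-heat-templates for real examples):

heat_template_version: 2016-04-08

description: Minimal sketch of a composable service template

parameters:
  # these parameters are part of the standard interface every service accepts
  ServiceNetMap:
    type: json
    default: {}
  DefaultPasswords:
    type: json
    default: {}
  EndpointMap:
    type: json
    default: {}

outputs:
  role_data:
    description: Role data for the example service
    value:
      service_name: my_example_service
      config_settings:
        # hiera data written to the nodes running this service
        tripleo::my_example_service::debug: true
      step_config: |
        # puppet fragment, applied with the "step" variable set
        include ::tripleo::profile::base::my_example_service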

Inside the TripleO templates, we made use of a new-for-mitaka Heat ResourceChain interface to compose a deployment of multiple services.  Basically a ResourceChain is a group of resources that may have different types, but conform to the same interfaces, which is what we need to combine a bunch of service templates that all have some standard interfaces.

Here's an illustration of how it works - essentially you define an input parameter which is a list of services, e.g OS::TripleO::Services::NovaApi, which then maps to the heat template for that service, e.g puppet/services/nova-api.yaml, via the resource_registry interface discussed in previous posts.
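
In environment file terms, the wiring looks something like this (an extract, not a complete environment):

resource_registry:
  OS::TripleO::Services::NovaApi: puppet/services/nova-api.yaml

parameter_defaults:
  ControllerServices:
    - OS::TripleO::Services::NovaApi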

For Newton, each role has a ServiceChain that combines the chosen services for that role.

If you'd like more information on the implementation details, I'd encourage you to check out the developer documentation where we're starting to document these interfaces in more detail.

Ok, how do I use it?
Here I'm going to focus on usage of the feature vs developing new services (which is pretty well covered in the aforementioned developer docs), and hopefully illustrate why this is an important step forward that improves operator deployment choices.

Scenario 1 - All in one minimal deployment
Let's say for a moment that you're a keystone developer and you want a shorter debug cycle and/or are resource constrained.  With the new interfaces, it's become very easy to deploy a minimal subset of services on a single node:

First you create an environment file that overrides the default ControllerServices list (which at the time of writing contains about 50 services!) so it only includes OS::TripleO::Services::Keystone and the services keystone depends on.  We also set ComputeCount to zero as we don't need any compute nodes.

$ cat keystone-only.yaml
parameter_defaults:
  ControllerServices:
      - OS::TripleO::Services::Keystone
      - OS::TripleO::Services::RabbitMQ
      - OS::TripleO::Services::HAproxy
      - OS::TripleO::Services::MySQL
  ComputeCount: 0

(Note that in some environments it may also be necessary to include the OS::TripleO::Services::Pacemaker too)

You can then deploy your single node keystone-only environment:

openstack overcloud deploy --templates -e keystone-only.yaml

When this completes, you'll see the following message, and you can source the overcloudrc and get a token to prove the deployed keystone is working:

...
Overcloud Endpoint: http://192.0.2.15:5000/v2.0
Overcloud Deployed
[stack@instack ~]$ . overcloudrc
[stack@instack ~]$ openstack token issue
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2016-08-05 10:16:16+00:00        |
| id         | 976d5fcf9f744a5a9cf840e83d825560 |
| project_id | 99e92ae58d1f4147a5d7eda0af516060 |
| user_id    | 29fe578e45b24406ba6c5fd0baaeaa9c |
+------------+----------------------------------+

We can see by looking at the undercloud nova (don't forget to source the stackrc after interacting with the overcloud above!) that there is one controller node:


[stack@instack ~]$ . stackrc
[stack@instack ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+--------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks           |
+--------------------------------------+------------------------+--------+------------+-------------+--------------------+
| d5155616-d2a6-4cee-a6d1-37bb83fccfe0 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.7 |
+--------------------------------------+------------------------+--------+------------+-------------+--------------------+


Scenario 2 - "hyperconverged" ceph deployment
In this case, we want to move the Ceph OSD services, which normally run on the CephStorage role, and instead have them run on the Compute role.

To do this, we first look at the default values for the ComputeServices and CephStorageServices parameters in overcloud.yaml (as in the example above for the Controller role, these lists define the services to be deployed on the Compute and CephStorage roles respectively):

ComputeServices:
    default:
      - OS::TripleO::Services::CephClient
      - OS::TripleO::Services::CephExternal
      - OS::TripleO::Services::Timezone
      - OS::TripleO::Services::Ntp
      - OS::TripleO::Services::Snmp
      - OS::TripleO::Services::NovaCompute
      - OS::TripleO::Services::NovaLibvirt
      - OS::TripleO::Services::Kernel
      - OS::TripleO::Services::ComputeNeutronCorePlugin
      - OS::TripleO::Services::ComputeNeutronOvsAgent
      - OS::TripleO::Services::ComputeCeilometerAgent

  CephStorageServices:
    default:
      - OS::TripleO::Services::CephOSD
      - OS::TripleO::Services::Kernel
      - OS::TripleO::Services::Ntp
      - OS::TripleO::Services::Timezone


Our aim is to deploy one Compute node, running both the standard compute services, and the OS::TripleO::Services::CephOSD service (the other services are clearly common to both roles).  We also don't need the OS::TripleO::Services::CephExternal service defined in ComputeServices, because we won't be referencing any external ceph cluster, which gives us this:

$ cat ceph_osd_on_compute.yaml
parameter_defaults:
  ComputeServices:
      - OS::TripleO::Services::CephClient
      - OS::TripleO::Services::CephOSD
      - OS::TripleO::Services::Timezone
      - OS::TripleO::Services::Ntp
      - OS::TripleO::Services::Snmp
      - OS::TripleO::Services::NovaCompute
      - OS::TripleO::Services::NovaLibvirt
      - OS::TripleO::Services::Kernel
      - OS::TripleO::Services::ComputeNeutronCorePlugin
      - OS::TripleO::Services::ComputeNeutronOvsAgent
      - OS::TripleO::Services::ComputeCeilometerAgent

That is all that's required to enable a hyperconverged ceph deployment!  :)

Since the default count for CephStorage is zero, we can then deploy like this:


[stack@instack ~]$ openstack overcloud deploy --templates /tmp/tripleo-heat-templates -e ceph_osd_on_compute.yaml -e /tmp/tripleo-heat-templates/environments/storage-environment.yaml

Here we can see I'm specifying a non-default location /tmp/tripleo-heat-templates for the template tree (this defaults to /usr/share/openstack-tripleo-heat-templates), passing the ceph_osd_on_compute.yaml environment to enable the OSD service on the Compute role, and finally passing the storage-environment.yaml that configures things so they are backed by Ceph.

Logging onto the compute node after deployment we see this:

[root@overcloud-novacompute-0 ~]# ps ax | grep ceph
17437 ?        Ss     0:00 /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f
17438 ?        Sl     0:00 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f

So, it worked, and we have the OSD service running on the Compute role! :)


Similar patterns to those described above can be used to achieve various deployment topologies which were not previously possible (an all-in-one deployment including nova-compute on a single node for example, as is done in one of our CI jobs now).

Future Work
Hopefully by now you can see that these new interfaces provide a much cleaner abstraction for services, and a lot more operator flexibility regarding their placement.  However for some environments this is not enough, and completely new roles may be needed.  We're working towards enabling that via the custom-roles blueprint, which will hopefully land for Newton.

Another related piece of work is enabling more flexible environment merging inside Heat.  This will mean there is less need to specify the full list of Services as described above, and instead we'll be able to build up a list of services based on multiple environment files (which are then merged appending to the final list).
 
Disclaimer: what follows is either pretty neat, or pure evil, depending on your viewpoint ;)  But it's based on a real use-case and it works, so I'm posting this to document the approach, why it's needed, and hopefully stimulate some discussion around optimizations leading to an improved/simplified implementation in the future.




The requirement
In TripleO we have a requirement to enable composition of different services onto different roles (groups of physical nodes); we need input data to configure the services which combines knowledge of the enabled services, which nodes/role they're running on, and which overlay network each service is bound to.

To do this, we need to input several pieces of data:

1. A list of the OpenStack services enabled for a particular deployment; expressed as a heat parameter, it looks something like this:


  EnabledServices:
    type: comma_delimited_list
    default:
      - heat_api
      - heat_engine
      - nova_api
      - neutron_api
      - glance_api
      - ceph_mon

2. A mapping of service names to one of several isolated overlay networks, such as "internal_api" "external" or "storage" etc:


  ServiceNetMap:
    type: json
    default:
      heat_api_network: internal_api
      nova_api_network: internal_api
      neutron_api_network: internal_api
      glance_api_network: storage
      ceph_mon_network: storage

3. A mapping of the network names to the actual IP address (either a single VIP pointing to a loadbalancer, or a list of the IPs bound to that network for all nodes running the service):

  NetIpMap:
    type: json
    default:
      internal_api: 192.168.1.12
      storage: 192.168.1.13

The implementation, step by step

Dynamically generate an initial mapping for all enabled services
Here we can use a nice pattern which combines the heat repeat function with map_merge:

  map_merge:
    repeat:
      template:
        SERVICE_ip: SERVICE_network
      for_each:
        SERVICE: {get_param: EnabledServices}



Step 1: repeat dynamically generates lists (including lists of maps, as in this case), so we use it to generate a list of maps for every service in the EnabledServices list, with a placeholder for the network, e.g:

  - heat_api_ip: heat_api_network
  - heat_engine_ip: heat_engine_network
  - nova_api_ip: nova_api_network
  - neutron_api_ip: neutron_api_network
  - glance_api_ip: glance_api_network
  - ceph_mon_ip: ceph_mon_network

Step 2: map_merge combines this list of single-key maps into one big map covering all EnabledServices:




  heat_api_ip: heat_api_network
  heat_engine_ip: heat_engine_network 
  nova_api_ip: nova_api_network
  neutron_api_ip: neutron_api_network
  glance_api_ip: glance_api_network
  ceph_mon_ip: ceph_mon_network

Substitute placeholder for the actual network/IP
We approach this in two passes, with two nested map_replace calls (a new function I wrote for newton Heat which can do key/value substitutions on any mapping):



  map_replace:
    - map_replace:
        - heat_api_ip: heat_api_network
          heat_engine_ip: heat_engine_network
          nova_api_ip: nova_api_network
          neutron_api_ip: neutron_api_network
          glance_api_ip: glance_api_network
          ceph_mon_ip: ceph_mon_network
        - values: {get_param: ServiceNetMap}
    - values: {get_param: NetIpMap}


Step 3: the inner map_replace substitutes each placeholder with the actual network provided in the ServiceNetMap mapping, which gives e.g:

  heat_api_ip: internal_api
  heat_engine_ip: heat_engine_network
  nova_api_ip: internal_api
  neutron_api_ip: internal_api
  glance_api_ip: storage
  ceph_mon_ip: storage

  
Note that if there's no network assigned in ServiceNetMap for a service, no replacement will occur, so the value will remain e.g heat_engine_network - more on this later.

Step 4: the outer map_replace substitutes the network name, e.g internal_api, with the actual VIP for that network provided by the NetIpMap mapping, which gives the final mapping of:


  heat_api_ip: 192.168.1.12
  heat_engine_ip: heat_engine_network 
  nova_api_ip: 192.168.1.12
  neutron_api_ip: 192.168.1.12
  glance_api_ip: 192.168.1.13
  ceph_mon_ip: 192.168.1.13

Filter any values we don't want
As you can see we got a value we don't want - heat_engine is like many non-api services in that it's not bound to any network; it only talks to rabbitmq, so we don't have any entry in ServiceNetMap for it.

We can therefore remove any entries which remain in the mapping using the yaql heat function, which is an interface to run yaql queries inside a heat template. 

It has to be said yaql is very powerful, but the docs are pretty sparse (but improving), so I tend to read the unit tests instead of the docs for usage examples.

  yaql:
    expression: dict($.data.map.items().where(isString($[1]) and not $[1].endsWith("_network")))
    data:
      map:
        heat_api_ip: 192.168.1.12
        heat_engine_ip: heat_engine_network
        nova_api_ip: 192.168.1.12
        neutron_api_ip: 192.168.1.12
        glance_api_ip: 192.168.1.13
        ceph_mon_ip: 192.168.1.13


Step 5: filter out all map values where the value is a string ending with "_network" via yaql, which gives:

  heat_api_ip: 192.168.1.12
  nova_api_ip: 192.168.1.12
  neutron_api_ip: 192.168.1.12
  glance_api_ip: 192.168.1.13
  ceph_mon_ip: 192.168.1.13


So, that's it - we've now transformed two input maps and a list into a dynamically generated mapping based on the list items! :)


Implementation, completed
Pulling all of the above together, here's a full example (you'll need a newton Heat environment to run this); it combines all the steps described above into one big combination of nested intrinsic functions:
 
Edit - also available on github


heat_template_version: 2016-10-14

description: >
  Example of nested heat functions

parameters:
  NetIpMap:
    type: json
    default:
      internal_api: 192.168.1.12
      storage: 192.168.1.13

  EnabledServices:
    type: comma_delimited_list
    default:
      - heat_api
      - nova_api
      - neutron_api
      - glance_api
      - ceph_mon

  ServiceNetMap:
    type: json
    default:
      heat_api_network: internal_api
      nova_api_network: internal_api
      neutron_api_network: internal_api
      glance_api_network: storage
      ceph_mon_network: storage


outputs:
  service_ip_map:
    description: Mapping of service names to IP address for the assigned network
    value:
      yaql:
        expression: dict($.data.map.items().where(isString($[1]) and not $[1].endsWith("_network")))
        data:
          map:
            map_replace:
              - map_replace:
                  - map_merge:
                      repeat:
                        template:
                          SERVICE_ip: SERVICE_network
                        for_each:
                          SERVICE: {get_param: EnabledServices}
                  - values: {get_param: ServiceNetMap}
              - values: {get_param: NetIpMap}
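
If you save the template above as, say, service_ips.yaml, you can create the stack and inspect the generated mapping like this (the file and stack names here are just examples):

$ openstack stack create -t service_ips.yaml service-ips
$ openstack stack output show service-ips service_ip_map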

  


For a while now, TripleO has supported a "DeployArtifacts" interface, aimed at making it easier to deploy modified/additional files on your overcloud, without the overhead of frequently rebuilding images.

This started out as a way to enable faster iteration on puppet module development (the puppet modules are by default stored inside the images deployed by TripleO, and generally you'll want to do development in a git checkout on the undercloud node), but it is actually a generic interface that can be used for a variety of deployment-time customizations.




Ok, how do I use it?
Let's start with a couple of usage examples, making use of some helper scripts that are maintained in the tripleo-common repo (in future, similar helper interfaces may be added to the TripleO CLI/UI, but right now this is targeted more at developers and advanced operator usage).

First clone the tripleo-common repo (you can skip this step if you're running a packaged version which already contains the following scripts):





[stack@instack ~]$ git clone https://git.openstack.org/openstack/tripleo-common




There are two scripts of interest: a generic script that can be used to deploy any kind of file (aka artifact), tripleo-common/scripts/upload-swift-artifacts, and a slightly modified version, tripleo-common/scripts/upload-puppet-modules, which optimizes the flow for deploying directories containing puppet modules.
To make using these easier, I append this to my .bashrc:


export PATH="$PATH:/home/stack/tripleo-common/scripts"
Example 1 - Deploy Artifacts "Hello World"
So, let's start with a really simple example. First, let's create a tarball containing a single /tmp/hello file:


[stack@instack ~]$ mkdir tmp
[stack@instack ~]$ echo "hello" > tmp/hello
[stack@instack ~]$ tar -cvzf hello.tgz tmp
tmp/
tmp/hello


Now, we simply run the upload-swift-artifacts script, accepting all the default options other than passing a reference to hello.tgz:


[stack@instack ~]$ upload-swift-artifacts -f hello.tgz
Creating heat environment file: /home/stack/.tripleo/environments/deployment-artifacts.yaml
Uploading file to swift: hello.tgz
hello.tgz
Upload complete.

There are currently only two supported file types:

  •     A tarball (will be unpacked from / on all nodes)
  •     An RPM file (will be installed on all nodes)
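
For example, deploying a patched package as an RPM works exactly the same way (the RPM name here is hypothetical):

[stack@instack ~]$ upload-swift-artifacts -f my-patched-package.rpm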

Taking a look inside the environment file the script generated, we can see it's using the DeployArtifactURLs parameter, passing a single URL (the parameter accepts a list of URLs). This happens to be a swift tempurl created by the upload-swift-artifacts script, but it could be any URL accessible to the overcloud nodes at deployment time.

[stack@instack ~]$ cat /home/stack/.tripleo/environments/deployment-artifacts.yaml
# Heat environment to deploy artifacts via Swift Temp URL(s)
parameter_defaults:
  DeployArtifactURLs:
    - 'http://192.0.2.1:8080/v1/AUTH_e9bcd2a11af94c319b164eba73c59a28/overcloud/hello.tgz?temp_url_sig=96ae277d85c3ee38dd61234b8c99351e64c8bd45&temp_url_expires=1502273853'

This environment file is automatically generated by the upload-swift-artifacts script and put into the special ~/.tripleo/environments directory. This directory is read by tripleoclient, and any environment files included here are always included automatically (no need for any -e options), but you can also pass a --environment option to upload-swift-artifacts if you prefer a different output location (e.g. so it can be explicitly included in your overcloud deploy command).
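
For example, to use an explicit path instead (the path here is arbitrary):

[stack@instack ~]$ upload-swift-artifacts -f hello.tgz --environment /home/stack/hello-artifacts.yaml
[stack@instack ~]$ openstack overcloud deploy --templates -e /home/stack/hello-artifacts.yaml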

To test this example, simply do an overcloud deployment - no additional arguments are needed if you use the default .tripleo/environments/deployment-artifacts.yaml environment path:

[stack@instack ~]$ openstack overcloud deploy --templates

Then check on one of the nodes for the expected file (note the tarball is unpacked from / in the filesystem):

[root@overcloud-controller-0 ~]# cat /tmp/hello
hello
Note the deploy artifact files are written to all roles; currently there is no way to deploy e.g. only to Controller nodes. We might consider an enhancement that allows role-specific artifact URL parameters in future, should folks require it.

Hopefully, despite the very simple example, you can see that this is a very flexible interface - you can deploy a tarball containing anything, even configuration files such as policy.json files.

Note that you have to be careful though - most service configuration files are managed by puppet, so attempting to use the deploy artifacts interface to overwrite puppet-managed files will not work: puppet runs after the deploy artifacts are unpacked (this is deliberate, as you will see in the next example), so you must use puppet hieradata to influence any configuration managed by puppet. (In the case of policy.json files, there is a puppet module that handles this, but currently TripleO does not use it - this may change in future though.)
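
For example, a minimal sketch of the hieradata approach - an environment file like the following (the specific hiera key is illustrative; it assumes the relevant puppet module exposes such a parameter):

  parameter_defaults:
    ExtraConfig:
      nova::debug: true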

Example 2 - Puppet development workflow
There is coupling between tripleo-heat-templates and the puppet modules it interfaces with (in particular the puppet profiles that live in puppet-tripleo, as discussed in my recent composable services tutorial), so a common pattern for a developer is:

  1. Modify some puppet code
  2. Modify tripleo-heat-templates to match the new/modified puppet profile
  3. Deploy an overcloud
  4. *OH NO* it doesn't work!
  5. Debug the issue (hint, "openstack stack failures list overcloud" is a super-useful new heatclient command which helps a lot here, as it surfaces the puppet error in most cases)
  6. Make coffee; goto (1) :)
Traditionally for TripleO deployments all puppet modules (including the puppet-tripleo profiles) have been built into the image we deploy (stored in Glance on the undercloud), so one missing step above is getting the modified puppet code into the image.  There are a few options:

  • Rebuild the image every time (this is really slow)
  • Use virt-customize or virt-copy-in to copy some modifications into the image, then update the image in glance (this is faster, but it still means you must redeploy the nodes every time and it's easy to lose track of what modifications have been made).
  • Use DeployArtifactURLs to update the puppet modules on the fly during the deployment!
This last use-case is actually what prompted implementation of the DeployArtifacts interface (thanks Dan!), and I'll show how it works below:

First, we clone one or more puppet modules to a local directory - note that the name of the repo, e.g. "puppet-tripleo", does not match the name of the deployed directory (on the nodes it's /etc/puppet/modules/tripleo), so you have to clone it to the "tripleo" directory:

mkdir puppet-modules
cd puppet-modules 
git clone https://git.openstack.org/openstack/puppet-tripleo tripleo 

Now you can make whatever edits are needed, pull under-review code (or just do nothing if you want to deploy the latest trunk of a given module). When you're ready, run the upload-puppet-modules script:

upload-puppet-modules -d puppet-modules

This works a little differently to the previous upload-swift-artifacts script: it takes the directory and creates a tarball using tar's --transform option, so the prefix is rewritten from /somewhere/puppet-modules to /etc/puppet/modules.
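
That's roughly equivalent to something like the following (a sketch - the exact invocation used by the script may differ):

tar --transform "s|^puppet-modules|etc/puppet/modules|" -czf puppet-modules.tgz puppet-modules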

The process after we create the tarball is exactly the same - we upload it to swift, get a tempurl, and create a heat environment file which references the location of the tarball. On deployment, the updated puppet modules will be untarred, and this always happens before puppet runs, which makes the debug workflow above much faster - nice!

NOTE: There is one gotcha here - by default, upload-puppet-modules creates a differently named environment file ($HOME/.tripleo/environments/puppet-modules-url.yaml) to upload-swift-artifacts, and their content conflicts - if both environment files exist, one will be ignored when they are merged together. (This is something we can probably improve in future when this heat feature lands, but right now the only options are to stick to one script or the other, or to manually merge the environment files, appending rather than overwriting the DeployArtifactURLs parameter.)
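
A manually merged environment file would look something like this (URLs abbreviated):

  parameter_defaults:
    DeployArtifactURLs:
      - 'http://192.0.2.1:8080/v1/AUTH_.../hello.tgz?temp_url_sig=...'
      - 'http://192.0.2.1:8080/v1/AUTH_.../puppet-modules.tgz?temp_url_sig=...'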




So how does it work?
Deploy Artifacts Overview

So, it's actually pretty simple, as illustrated in the diagram above:

  • A tarball is created containing the files you want to deploy to the nodes
  • This tarball is uploaded to swift on the undercloud
  • A Swift tempurl is created, so the tarball can be accessed using a signed URL (no credentials needed in the nodes to access)
  • A Heat environment passes the Swift tempurl to a nested stack "deploy-artifacts.yaml", which defines a DeployArtifactURLs parameter (a list)
  • deploy-artifacts.yaml defines a Heat SoftwareConfig resource, which references a shell script that can download files from a list of URLs, check the file type and do something appropriate (e.g. in the case of a tarball, untar it!)
  • The deploy-artifacts SoftwareConfig is deployed inside the per-role "PostDeploy" template, which is where we perform the puppet steps (5 deployment passes which apply puppet in a series of steps). 
  • We use the heat depends_on directive to ensure that the DeployArtifacts deployment (ControllerArtifactsDeploy in the case of the Controller role) always runs before any of the puppet steps.
  • This pattern is replicated for all roles (not just the Controller as in the diagram above)
As you can see, there are a few steps to the process, but it's pretty simple, and it leverages the exact same Heat SoftwareDeployment patterns we use throughout TripleO to deploy scripts (and apply puppet manifests, etc).
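
To illustrate the pattern, here's a heavily simplified sketch of the deploy-artifacts.yaml approach (this is not the actual template from tripleo-heat-templates, just an outline of the SoftwareConfig it describes):

  heat_template_version: 2016-10-14

  parameters:
    DeployArtifactURLs:
      type: comma_delimited_list
      default: []

  resources:
    DeployArtifactsConfig:
      type: OS::Heat::SoftwareConfig
      properties:
        group: script
        inputs:
          - name: artifact_urls
            default: {list_join: [' ', {get_param: DeployArtifactURLs}]}
        config: |
          #!/bin/bash
          # Sketch only: fetch each artifact, then untar it from / or
          # install it, depending on the file type
          TMP_DIR=$(mktemp -d)
          for URL in $artifact_urls; do
            curl --silent -o $TMP_DIR/artifact $URL
            if file -b $TMP_DIR/artifact | grep -q gzip; then
              tar xzf $TMP_DIR/artifact -C /
            else
              mv $TMP_DIR/artifact $TMP_DIR/artifact.rpm
              yum install -y $TMP_DIR/artifact.rpm
            fi
          done
          rm -rf $TMP_DIR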
Recently I was asked if it's possible to do a partial update of a TripleO overcloud - the answer is yes, so I thought I'd write a post showing how to do it.  Much of what follows is basically an update on my old post on nested resource introspection (some interfaces have changed a bit since I wrote that), combined with an introduction to heat PATCH updates.

Partial update?! Why?
So, the first question is why would you do this - TripleO heat templates are designed to enforce a consistent state for your entire OpenStack deployment, so in most cases you really should update the entire overcloud, and not mess with the underlying nested stacks directly.

However, for some development usage, this creates a long feedback loop - you change something (perhaps one line in a puppet manifest or heat template), then have to wait several minutes for Heat to walk the entire tree of nested stacks, puppet to run all steps on all nodes, etc.

So, while you would probably never do this in production (seriously, please don't!), it can be a useful technique for developers seeking a quicker hack-then-test cycle, and also when attempting to isolate root-causes for some subset of overcloud stack update behavior.

Ok, with that disclaimer clearly stated, here's how you do it:

Step 1 - Find the nested stack to update
Let's take a specific example - I want to update only the ControllerNodesPostDeployment resource which is defined in overcloud.yaml - this is a resource that maps to a nested stack that uses the cluster configuration interfaces I described in this previous post to apply puppet in a series of steps to all controller nodes.

Here's our overcloud (some CLI output removed for brevity):

$ heat stack-list
| 01c51e7e-ad2f-41d3-b056-3c4c84395114 | overcloud | CREATE_COMPLETE | 2016-06-08T18:07:00 | None |
Here's the ControllerNodesPostDeployment resource:


$ heat resource-list overcloud | grep ControllerNodesPost
| ControllerNodesPostDeployment | e67fff24-8089-4cf8-adf4-9c6064bf01d6 | OS::TripleO::ControllerPostDeployment | CREATE_COMPLETE | 2016-06-08T18:07:00 |

e67fff24-8089-4cf8-adf4-9c6064bf01d6 is the resource ID of ControllerNodesPostDeployment, which is a nested stack - you can confirm this via:

$ heat stack-list -n | grep "^| e67fff24-8089-4cf8-adf4-9c6064bf01d6"
| e67fff24-8089-4cf8-adf4-9c6064bf01d6 | overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | UPDATE_COMPLETE | 2016-06-08T18:10:34 | 2016-06-09T08:52:45 | 01c51e7e-ad2f-41d3-b056-3c4c84395114 |

Note here the first column is the stack ID, and the last is the parent stack ID (e.g. "overcloud" above).

overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 is the name of the stack that implements ControllerNodesPostDeployment - we can refer to it by either that name or the ID (e67fff24-8089-4cf8-adf4-9c6064bf01d6).

Step 2 - Basic update of the stack
Heat supports PATCH updates, so it is possible to trigger a no-op update without passing any template or parameters (the existing data will be used), or to patch in some specific modification.

Here's how it works: we simply use either the name or the ID we discovered above with heat stack-update (or the new openstack client equivalent commands).

First, however, we want to get the last event ID before triggering the update (or, on recent heatclient versions you can instead use openstack stack event list --follow):

$ heat event-list overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | tac | head -n2
+------------------------------------------------------+--------------------------------------+-------------------------------------+--------------------+---------------------+
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 89e535ef-d414-4121-b726-9924eccb4fc3 | Stack UPDATE completed successfully | UPDATE_COMPLETE    | 2016-06-09T09:09:09 |


So the last event logged by this nested stack has the ID of 89e535ef-d414-4121-b726-9924eccb4fc3 - we can use this as a marker so we hide all previous events for the stack:

 $ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
+----+------------------------+-----------------+------------+
| id | resource_status_reason | resource_status | event_time |
+----+------------------------+-----------------+------------+
+----+------------------------+-----------------+------------+
Now, we can trigger the update, and use the marker event-list to follow progress:

heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26

<wait a short time>

$ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
+------------------------------------------------------+
| resource_name | id | resource_status_reason | resource_status | event_time |
+------------------------------------------------------+
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 2e08a022-ce0a-4e57-bf30-719fea6cbb74 | Stack UPDATE started | UPDATE_IN_PROGRESS | 2016-06-09T10:00:52 |
| ControllerArtifactsConfig | a55f9b17-f26c-4664-9ea5-535949c368e8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | 21679c7f-c354-4319-9688-7fa290168664 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | f5761452-91dd-45dc-92e8-a5c371fa5004 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsConfig | 01abec3c-f472-4ec2-893d-0fddb8fc1696 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | f8f7a21f-9169-4f8c-ab46-46ecbb141be8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | 75937a57-e2f0-4d66-9b4c-2308593e56b1 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | 6058e29c-cded-4ad3-94d9-65909fd4911d | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | c9f93f1f-177c-4721-827f-a7d409b2cd50 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | 92409e4c-24f2-4e68-bad9-47ce09107d7a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | a9203aa1-c438-47c0-977b-8e34669777bc | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | aa7d78dc-d243-4d54-8ea6-3b59a6ed302a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | 4a1a6885-29d7-4708-a884-01f481ac1b35 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 7afd52c1-cbbc-431a-a22c-dd7459ed2255 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 0dac2e72-0919-4e91-ac94-100d8d811c67 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | ec57867f-e401-4756-bd30-0a566eced343 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | 427582fb-acd1-4939-a13c-7b3cbbc7527b | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:15 |
| ExtraConfig | 760fd961-fff6-4f4c-848e-80773e09e04b | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:15 |
| ExtraConfig | caee58b6-01bb-4805-b41f-4c48a8c7d767 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:16 |
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 35f527a5-0761-46bb-aecb-6eee0e0f083e | Stack UPDATE completed successfully | UPDATE_COMPLETE | 2016-06-09T10:01:25 |

So, we can see that we triggered an update on the nested stack, and it ran to completion in around 30 seconds (much less time than updating the entire overcloud).

Step 3 - Update of the stack with modifications
So, those paying attention may have noticed that 30 seconds is too fast for puppet to run on all the controller nodes - and it is. The reason is that we did a no-op update, so Heat detects that no inputs have changed and doesn't cause puppet to re-run.

To work around this, and enable puppet to re-assert state on every overcloud update, we have an identifier in the nested stack that is normally updated to a value that changes on every update (it includes a timestamp when updates are triggered via python-tripleoclient, vs heatclient directly).

We can emulate this behavior in our patch update, and force puppet to re-run through all the deployment steps - let's first look at the NodeConfigIdentifiers parameter value:


$ heat stack-show overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | grep NodeConfigIdentifiers
"NodeConfigIdentifiers": "{u'deployment_identifier': u'1465409217', u'controller_config': {u'0': u'os-apply-config deployment bb67a1d5-f0a5-48ec-9883-1f2ae578a8bd complet ed,Root CA cert injection not enabled.,TLS not enabled.,None,'}, u'allnodes_extra': u'none'}"

Here we can see various data, including a deployment_identifier, which is the timestamp-derived unique identifier normally passed via python-tripleoclient.

We could update just that field, but the content of this mapping isn't important, only that it changes (this data is not currently consumed by puppet on update, it's just used to trigger the SoftwareDeployment to re-apply the config due to an input value changing).

So we can create an environment file that looks like this (note it must use parameters, not parameter_defaults, so that it overrides the value passed from the parent stack). Any value can be used, but you must change it on each update if you want the SoftwareDeployment resources to be re-applied to the nodes:

$ cat update_env.yaml
parameters:
  NodeConfigIdentifiers: 123






Then we can trigger another PATCH update including this data:

heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml

This time I'm using the new openstack stack event list --follow approach to monitor progress (if you don't have this, you can repeat the marker event-list approach described above):


$ openstack stack event list --follow
2016-06-09 08:52:46 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_IN_PROGRESS  Stack UPDATE started
2016-06-09 08:52:54 [ControllerPuppetConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:54 [ControllerArtifactsConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:56 [ControllerPuppetConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsDeploy]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:58 [ControllerArtifactsDeploy]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:58 [ControllerLoadBalancerDeployment_Step1]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:53:32 [ControllerLoadBalancerDeployment_Step1]: UPDATE_COMPLETE  state changed
2016-06-09 08:53:32 [ControllerServicesBaseDeployment_Step2]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:00 [ControllerServicesBaseDeployment_Step2]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:00 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_COMPLETE  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:16 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:16 [ExtraConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:17 [ExtraConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:26 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_COMPLETE  Stack UPDATE completed successfully
So, here we can see the update of the stack took a little longer (around 5 minutes in my environment), and if you were to check the os-collect-config logs on each controller node, you would see puppet re-applying on each node, for every step defined in the template.

This approach can be extended if you want to e.g. test changes to the stack template (or files it references, such as puppet manifests or scripts); you would do something like:

$ cp -r /usr/share/openstack-tripleo-heat-templates .
$ cd openstack-tripleo-heat-templates/
$ heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml -f puppet/controller-post.yaml

Note that if you want to do a final update of the entire overcloud, you will need to point to this copied tree (assuming you want to maintain any changes), e.g.:

$ openstack overcloud deploy --templates /path/to/copy/openstack-tripleo-heat-templates
