Ever since we launched, Cognitive Class has hit many milestones. From name changes (raise your hand if you remember DB2 University) to our 1,000,000th learner, we’ve been through a lot.
But in this post, I will focus on the milestones and evolution of the technical side of things, specifically how we went from a static infrastructure to a dynamic and scalable deployment of dozens of Open edX instances using Docker.
Open edX 101
Open edX is the open source code behind edx.org. It is composed of several repositories, edx-platform being the main one. The official method of deploying an Open edX instance is by using the configuration repo which uses Ansible playbooks to automate the installation. This method requires access to a server where you run the Ansible playbook. Once everything is done you will have a brand new Open edX deployment at your disposal.
This is how we have run cognitiveclass.ai, our public website, since we migrated from a Moodle deployment to Open edX in 2015. It has served us well: we are able to serve hundreds of concurrent learners across more than 70 courses every day.
But this strategy didn’t come without its challenges:
Open edX mainly targets Amazon’s AWS services and we run our infrastructure on IBM Cloud.
Deploying a new instance requires creating a new virtual machine.
Open edX reads configurations from JSON files stored in the server, and each instance must keep these files synchronized.
While we were able to overcome these in a large single deployment, they would be much harder to manage for our new offering, the Cognitive Class Private Portals.
Cognitive Class for business
When presenting to other companies, we often hear the same question: “How can I make this content available to my employees?” That was the main motivation behind our Private Portals offering.
A Private Portal is a dedicated deployment created specifically for a client. From a technical perspective, this new offering requires us to spin up new deployments quickly and on demand. Of the challenges listed earlier, the second and third (a new virtual machine per instance, and configuration files that must be kept synchronized) become especially hard as the number of deployments grows.
Creating and configuring a new VM for each deployment is a slow and costly process. And if a particular Portal outgrows its resources, we would have to find a way to scale it and manage its configuration across multiple VMs.
At the same time, we were experiencing a similar demand in our Virtual Labs infrastructure, where the use of hundreds of VMs was becoming unbearable. The team started to investigate and implement a solution based on Docker.
The main benefits of Docker for us were twofold:
Increase server usage density;
Isolate services’ processes and files from each other.
These benefits are deeply related: since each container manages its own runtime and files, we can easily run different pieces of software on the same server without them interfering with each other. We do so with much lower overhead than VMs, since Docker provides lightweight isolation between containers.
By increasing usage density, we are able to run thousands of containers on a handful of larger servers that can be pre-provisioned ahead of time, instead of having to manage thousands of smaller instances.
For our Private Portals offering this means that a new deployment can be ready to be used in minutes. The underlying infrastructure is already in place so we just need to start some containers, which is a much faster process.
Herding containers with Rancher
Docker in and of itself is a fantastic technology but for a highly scalable distributed production environment, you need something on top of it to manage your containers’ lifecycle. Here at Cognitive Class, we decided to use Rancher for this, since it allows us to abstract our infrastructure and focus on the application itself.
In a nutshell, Rancher organizes containers into services and services are grouped into stacks. Stacks are deployed to environments, and environments have hosts, which are the underlying servers where containers are eventually started. Rancher takes care of creating a private network across all the hosts so they can communicate securely with each other.
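To make that vocabulary concrete, here is a minimal sketch of the hierarchy as plain Python data classes. It only models the terms used above (containers, services, stacks, environments, hosts); it is not Rancher's API, and every name in it is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    containers: List[str] = field(default_factory=list)


@dataclass
class Stack:
    name: str
    services: List[Service] = field(default_factory=list)


@dataclass
class Environment:
    name: str
    hosts: List[str] = field(default_factory=list)    # servers where containers start
    stacks: List[Stack] = field(default_factory=list)

    def container_count(self) -> int:
        """Total containers across every service of every stack."""
        return sum(len(svc.containers) for st in self.stacks for svc in st.services)


# Hypothetical Portal stack with two services.
portal = Stack("acme-portal", [Service("lms", ["lms-1", "lms-2"]), Service("cms", ["cms-1"])])
env = Environment("production", hosts=["host-a", "host-b"], stacks=[portal])
print(env.container_count())  # 3
```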
Getting everything together
Our Portals are organized in a microservices architecture and grouped together in Rancher as a stack. Open edX is the main component and is itself broken into smaller services. On top of Open edX we have several other components that provide additional functionality to our offering. Overall, this is what things look like in Rancher:
There is a lot going on here, so let’s break it down and quickly explain each piece:
lms: this is where students access course content
cms: used for authoring courses
forum: handles course discussions
nginx: serves static assets
rabbitmq: message queue system
glados: admin user interface to control and customize the Portal
companion-cube: API to expose extra functionalities of Open edX
compete: service to run data hackathons
learner-support: built-in learner ticket support system
lp-certs: issues certificates for students that complete multiple courses
cms-workers and lms-workers: execute background tasks for `lms` and `cms`
glados-worker: executes background tasks for `glados`
letsencrypt: automatically manages SSL certificates using Let’s Encrypt
load-balancer: routes traffic to services based on request hostname
mailer: proxies SMTP requests to an external server, or sends the emails itself otherwise
ops: group of containers used to run specific tasks
rancher-cron: starts containers following a cron-like schedule
The ops service behaves differently from the other ones, so let’s dig a bit deeper into it:
Here we can see that there are several containers inside ops and that they are usually not running. Some containers, like edxapp-migrations, run when the Portal is deployed but are not expected to be started again unless in special circumstances (such as if the database schema changes). Other containers, like backup, are started by rancher-cron periodically and stop once they are done.
In both cases, we can trigger a manual start by clicking the play button. This lets us run important operational tasks on demand without having to SSH into specific servers and figure out which script to run.
One key aspect of Docker is that the file system is isolated per container. This means that, without proper care, you might lose important files if a container dies. The way to handle this situation is to use Docker volumes to mount local file system paths into the containers.
Moreover, when you have multiple hosts, it is best to have a shared data layer to avoid creating implicit scheduling dependencies between containers and servers. In other words, you want your containers to have access to the same files no matter which host they are running on.
Each Portal has its own directory on the shared NFS drive, and its containers mount only that directory, so one Portal cannot access the files of another.
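As a rough illustration of these per-Portal mounts, the sketch below builds the `host_path:container_path` volume specification a container could be started with. The NFS root, the container path and the Portal name are all hypothetical; our actual layout is not shown here.

```python
from pathlib import PurePosixPath

# Hypothetical locations, chosen only for this example.
NFS_ROOT = PurePosixPath("/mnt/nfs/portals")
CONTAINER_PATH = PurePosixPath("/edx/portal-data")


def portal_volume(portal_name: str) -> str:
    """Return a `host_path:container_path` volume spec for one Portal."""
    # Reject names that could point outside the Portal's own directory.
    if not portal_name or "/" in portal_name or portal_name in {".", ".."}:
        raise ValueError(f"invalid portal name: {portal_name!r}")
    return f"{NFS_ROOT / portal_name}:{CONTAINER_PATH}"


print(portal_volume("acme"))  # /mnt/nfs/portals/acme:/edx/portal-data
```

Because the host path is derived from the Portal name alone, the same specification works on whichever host the container is scheduled to.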
One of the most important files is ansible_overrides.yml. As we mentioned at the beginning of this post, Open edX is configured using JSON files that are read when the process starts. The Ansible playbook generates these JSON files when executed.
To propagate changes made by Portal admins on glados to the lms and cms of Open edX we mount ansible_overrides.yml into the containers. When something changes, glados can write the new values into this file and lms and cms can read them.
We then restart the lms and cms containers which are set to run the Ansible playbook and re-generate the JSON files on start up. ansible_overrides.yml is passed as a variables file to Ansible so that any values declared in there will override the Open edX defaults.
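The precedence rule can be pictured with a few lines of Python. This is only a sketch of "overrides win over defaults" followed by JSON generation; it is not how Ansible is implemented, and the setting names and values are examples rather than our real configuration.

```python
import json

# Illustrative values only.
defaults = {"PLATFORM_NAME": "Open edX", "SITE_NAME": "localhost"}
overrides = {"PLATFORM_NAME": "Acme Academy"}  # e.g. a value written by glados


def render_config(defaults: dict, overrides: dict) -> str:
    """Merge overrides on top of defaults and serialize the result as JSON."""
    merged = {**defaults, **overrides}  # keys from overrides replace defaults
    return json.dumps(merged, indent=2, sort_keys=True)


print(render_config(defaults, overrides))
```

Any key missing from the overrides keeps its default, which is why the overrides file only needs to list what a Portal admin has actually changed.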
By having this shared data layer, we don’t have to worry about containers being rescheduled to another host since we are sure Docker will be able to find the proper path and mount the required volumes into the containers.
By building on top of the lessons we learned as our platform evolved and by using the latest technologies available, we were able to build a fast, reliable and scalable solution to provide our students and clients a great learning experience.
We covered a lot in this post, and I hope you were able to learn something new today. If you are interested in learning more about our Private Portals offering, fill out our application form and we will contact you.
Every data scientist I know spends a lot of time handling data that originates in CSV files. You can quickly end up with a mess of CSV files located in your Documents, Downloads, Desktop, and other random folders on your hard drive.
I greatly simplified my workflow the moment I started organizing all my CSV files in my Cloud account. Now I always know where my files are and I can read them directly from the Cloud using JupyterLab (the new Jupyter UI) or my Python scripts.
This article will teach you how to read your CSV files hosted on the Cloud in Python as well as how to write files to that same Cloud account.
I’ll use IBM Cloud Object Storage, an affordable, reliable, and secure Cloud storage solution. (Since I work at IBM, I’ll also let you in on a secret of how to get 10 Terabytes for a whole year, entirely for free.) This article will help you get started with IBM Cloud Object Storage and make the most of this offer. It is composed of three parts:
How to use IBM Cloud Object Storage to store your files;
Reading CSV files in Python from Object Storage;
Writing CSV files to Object Storage (also in Python of course).
The “Storage” part of object storage is pretty straightforward, but what exactly is an object and why would you want to store one? An object is basically any conceivable data. It could be a text file, a song, or a picture. For the purposes of this tutorial, our objects will all be CSV files.
Unlike a typical filesystem (like the one used by the device you’re reading this article on) where files are grouped in hierarchies of directories/folders, object storage has a flat structure. All objects are stored in groups called buckets. This structure allows for better performance, massive scalability, and cost-effectiveness.
By the end of this article, you will know how to store your files on IBM Cloud Object Storage and easily access them using Python.
Provisioning an Object Storage Instance on IBM Cloud
Visit the IBM Cloud Catalog and search for “object storage”. Click the Object Storage option that pops up. Here you’ll be able to choose your pricing plan. Feel free to use the Lite plan, which is free and allows you to store up to 25 GB per month.
Sign up (it’s free) or log in with your IBM Cloud account, and then click the Create button to provision your Object Storage instance. You can customize the Service Name if you wish, or just leave it as the default. You can also leave the resource group set to the default. Resource groups are useful to organize your resources on IBM Cloud, particularly when you have many of them running.
Working with Buckets
Since you just created the instance, you’ll now be presented with options to create a bucket. You can always find your Object Storage instance by selecting it from your IBM Cloud Dashboard.
There’s a limit of 100 buckets per Object Storage instance, but each bucket can hold billions of objects. In practice, how many buckets you need will be dictated by your availability and resilience needs.
For the purposes of this tutorial, a single bucket will do just fine.
Creating your First Bucket
Click the Create Bucket button and you’ll be shown a window like the one below, where you can customize some details of your bucket. All these options may seem overwhelming at first, but don’t worry, we’ll explain them shortly. They are part of what makes this service so customizable, should you have the need later on.
If you don’t care about the nuances of bucket configuration, you can type in any unique name you like and press the Create button, leaving all other options at their defaults. You can then skip to the Putting Objects in Buckets section below. If you would like to learn what these options mean, read on.
Configuring your bucket
Resiliency Options
Cross Region: Your data is stored across three geographic regions within your selected location. High availability and very high durability.
Regional: Your data is stored across three different data centers within a single geographic region. High availability and durability, very low latency for regional users.
Single Data Center: Your data is stored across multiple devices within a single data center.
Storage Class Options
Each IBM Cloud Object Storage class is matched to a frequency of data access, such as weekly or monthly, or less than once a month.
Feel free to experiment with different configurations, but I recommend choosing “Standard” for your storage class for this tutorial’s purposes. Any resilience option will do.
Putting Objects in Buckets
After you’ve created your bucket, store the name of the bucket into the Python variable below (replace cc-tutorial with the name of your bucket) either in your Jupyter notebook or a Python script.
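A minimal version of that variable could look like this. The variable name bucket_name is my choice for this sketch; the rest of the examples assume it.

```python
# Replace cc-tutorial with the name of your own bucket.
bucket_name = "cc-tutorial"
```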
There are many ways to add objects to your bucket, but we’ll start with something simple. Add a CSV file of your choice to your newly created bucket, either by clicking the Add objects button, or dragging and dropping your CSV file into the IBM Cloud window.
Whatever CSV file you decide to add to your bucket, assign the name of the file to the variable filename below so that you can easily refer to it later.
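For example, assuming you uploaded a file called my-data.csv (a hypothetical name used only in this sketch):

```python
# Hypothetical file name; use the name of the CSV file you added to your bucket.
filename = "my-data.csv"
```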
We’ve placed our first object in our first bucket. Now let’s see how we can access it. To access your IBM Cloud Object Storage instance from anywhere other than the web interface, you will need to create credentials. Click the New credential button under the Service credentials section to get started.
In the next window, you can leave all fields as their defaults and click the Add button to continue. You’ll now be able to click on View credentials to obtain the JSON object containing the credentials you just created. You’ll want to store everything you see in a credentials variable like the one below (obviously, replace the placeholder values with your own).
Note: If you’re following along within a notebook be careful not to share this notebook after adding your credentials!
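Here is a sketch of what such a credentials variable might look like. The field names are the ones I would expect to see in the View credentials JSON, but treat them as an assumption and copy whatever your own JSON contains; every value below is a placeholder.

```python
# Placeholder values only; paste the fields from your own credentials JSON.
credentials = {
    "apikey": "your-api-key",
    "endpoints": "your-endpoints-url",
    "iam_apikey_description": "your-iam_apikey_description",
    "iam_apikey_name": "your-iam_apikey_name",
    "iam_role_crn": "your-iam_role_crn",
    "iam_serviceid_crn": "your-iam_serviceid_crn",
    "resource_instance_id": "your-resource_instance_id",
}
```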
Reading CSV files from Object Storage using Python
The recommended way to access IBM Cloud Object Storage with Python is to use the ibm_boto3 library, which we’ll import below.
The primary way to interact with IBM Cloud Object Storage through ibm_boto3 is by using an ibm_boto3.resource object. This resource-based interface abstracts away the low-level REST interface between you and your Object Storage instance.
Run the cell below to create a resource Python object using the IBM Cloud Object Storage credentials you filled in above.
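The sketch below shows one way to import the library and build the resource. It assumes the ibm-cos-sdk package is installed and that your credentials contain the apikey and resource_instance_id fields. The two endpoint URLs are assumptions on my part and should be replaced with the ones that match your account and bucket location.

```python
def make_cos_resource(credentials, auth_endpoint, service_endpoint):
    """Create an ibm_boto3 resource from IBM Cloud Object Storage credentials."""
    # Imported inside the function so it can be defined even where the SDK
    # (pip install ibm-cos-sdk) is not installed yet.
    import ibm_boto3
    from ibm_botocore.client import Config

    return ibm_boto3.resource(
        "s3",
        ibm_api_key_id=credentials["apikey"],
        ibm_service_instance_id=credentials["resource_instance_id"],
        ibm_auth_endpoint=auth_endpoint,
        config=Config(signature_version="oauth"),
        endpoint_url=service_endpoint,
    )


# Assumed endpoints; check the ones listed for your own instance and bucket.
AUTH_ENDPOINT = "https://iam.cloud.ibm.com/identity/token"
SERVICE_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud"

# Usage, once real credentials are filled in:
# resource = make_cos_resource(credentials, AUTH_ENDPOINT, SERVICE_ENDPOINT)
```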
After creating a resource object, we can easily access any of our Cloud objects by specifying a bucket name and a key (in our case the key is a filename) to our resource.Object method and calling the get method on the result. In order to get the object into a useful format, we’ll do some processing to turn it into a pandas dataframe.
We’ll make this into a function so we can easily use it later:
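A sketch of such a function is shown here. The name get_item and its parameters are my own choices; it assumes pandas is installed and that resource is the object created from your credentials.

```python
import io

import pandas as pd


def get_item(resource, bucket_name, item_name):
    """Fetch a CSV object from Object Storage and return it as a DataFrame."""
    # .get() returns a dict whose "Body" entry is a readable byte stream.
    body = resource.Object(bucket_name, item_name).get()["Body"]
    return pd.read_csv(io.BytesIO(body.read()))


# Usage, assuming the variables defined earlier in this article:
# df = get_item(resource, bucket_name, filename)
# df.head()
```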
Adding files to IBM Cloud Object Storage with Python
IBM Cloud Object Storage’s web interface makes it easy to add new objects to your buckets, but at some point you will probably want to handle creating objects through Python programmatically. The put_object method allows you to do this.
In order to use it you will need:
The name of the bucket you want to add the object to;
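Putting this together, a write helper might look like the sketch below. Besides the bucket name, put_object also needs a key (the name of the new object) and a body (its content as bytes). The function name put_item is hypothetical, and dataframe is assumed to be a pandas DataFrame.

```python
def put_item(resource, bucket_name, item_name, dataframe):
    """Upload a DataFrame to Object Storage as a CSV object."""
    # Serialize the DataFrame to CSV text, then to bytes for the request body.
    body = dataframe.to_csv(index=False).encode("utf-8")
    resource.Bucket(bucket_name).put_object(Key=item_name, Body=body)
    return len(body)  # number of bytes sent, handy for a quick sanity check


# Usage, assuming the variables defined earlier in this article:
# put_item(resource, bucket_name, "my-new-data.csv", df)
```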
You can now easily access your newly created object using the function we defined above in the Reading from Object Storage using Python section.
Get 10 Terabytes of IBM Cloud Object Storage for free
You now know how to read from and write to IBM Cloud Object Storage using Python! Well done. The ability to programmatically read and write files to the Cloud will be quite handy when working from scripts and Jupyter notebooks.
If you build applications or do data science, we also have a great offer for you. You can apply to become an IBM Partner at no cost to you and receive 10 Terabytes of space to play and build applications with.
You can sign up by simply filling out the embedded form below. If you are unable to fill out the form, you can click here to open it in a new window.
Just make sure that you apply with a business email (even your own domain name if you are a freelancer) as free email accounts like Gmail, Hotmail, and Yahoo are automatically rejected.
If you’re reading this then you’ve most likely heard all the buzz around chatbots. In fact, you may have come up with a few scenarios where it would be really helpful for you to use one.
Most people consider chatbots to be in the realm of what only programmers can create, out of reach of business users who would otherwise have a need for them.
Thankfully, IBM provides the Watson Conversation service on their IBM Cloud platform which, combined with our WordPress plugin, solves that.
The plugin provides you with an easy way to deploy chatbots you create with IBM Watson Conversation to WordPress sites. In fact, you may have noticed a floating chatbot icon at the bottom of this page. Click on it to see the plugin in action.
What is Watson Conversation?
Watson Conversation is IBM’s chatbot service. Its intuitive interface allows chatbot creators to build their chatbot and have it ready to deploy in a short time. You can sign up for a free IBM Cloud Lite account to get started.
Building your chatbot won’t be covered in this article but we have a great Chatbot course that guides you through this process and doesn’t require any coding expertise.
How do I add a chatbot to my website?
This is where the Watson Conversation WordPress plugin saves you time and money. If you have a website built using WordPress, deploying your chatbot to your website takes about 5 minutes and no code at all (as opposed to having to build your own application just to deploy a chatbot on the web).
You can install it like any other WordPress plugin from your Admin page, that is, the first page you see after signing in.
Just search for Watson Conversation in the “Add New” section of the Plugins page and click “Install Now”.
Now you can find a page for “Watson” in your Settings. This is where you’ll find all the settings and customization options for the plugin. When you first open it, you’ll see several tabs along the top.
For now, the only one you have to worry about is “Main Setup”.
You can find the credentials for the three required fields on the Deploy page of your Watson Conversation workspace.
Now just click save changes and you’re done. Browse your website and see your chatbot in action!
If you’re not quite satisfied with the appearance, you can customize this in the “Appearance” tab of the settings page.
You can also choose which pages to display the chat box on from the “Behaviour” tab. However, that’s not all you can do.
If you want to make the options clear to the user, you can create predefined responses to the chatbot messages for the users to select. The VOIP feature can connect users to your phone line over the internet from directly within the plugin.
In this brief article, we focused on how to deploy Watson Conversation chatbots to WordPress. Stay tuned for future articles on how to customize and use these exciting advanced features!
Dealing with multiple dimensions is difficult, and the difficulty is compounded when working with data. This blog post is a guide to help you understand the relationship between dimensions, Python lists, and Numpy arrays, along with some hints and tricks for interpreting data in multiple dimensions. We provide an overview of Python lists and Numpy arrays, clarify some of the terminology, and give some helpful analogies for dealing with higher-dimensional data.
Before you create a deep neural network in TensorFlow, build a regression model, predict the price of a car, or visualize terabytes of data, you’re going to have to learn Python and deal with multidimensional data. This blog post expands on our introductory course on Python for Data Science; it will help you deal with nested lists in Python and give you some ideas about numpy arrays.
Nesting involves placing one or more Python lists inside another Python list. You can apply it to other data structures in Python, but we will stick to lists. Nesting is a useful feature in Python, but the indexing conventions can sometimes get a little confusing, so let’s clarify the process, expanding on our courses on Applied Data Science with Python. We will review the concepts of nesting lists to create 1-, 2-, 3- and 4-dimensional lists, then we will convert them to numpy arrays.
Lists and 1-D Numpy Arrays
Lists are a useful datatype in Python; a list is written as comma-separated values between square brackets. You can change the size of a Python list after you create it, and lists can contain integers, strings, floats, Python functions, and much more. Indexing a one-dimensional (1-D) list in Python is straightforward: each index corresponds to an individual element of the list. Python’s list convention is shown in figure 1, where each item is accessed using the name of the list followed by an index in square brackets. For example, the first element is obtained with A[0], shown in the figure as A[0]:“0”; this means that the zeroth element of the list contains the string “0”. Similarly, another element of A holds the integer 4. For the rest of this blog, we are going to stick with integer values and lists of uniform size, as you will see in many data science applications.
Figure 1: Indexing Conventions for a list “A”
Lists are useful, but for numerical operations such as the ones you will use in data science, Python has many useful libraries; one of the most commonly used is numpy.
From Lists to 1-D Numpy Arrays
Numpy is a fast Python library for performing mathematical operations. The numpy class “ndarray” is key to this framework; we will refer to objects of this class as numpy arrays. Some key differences from lists: numpy arrays are of fixed size, and they are homogeneous, i.e. all elements of an array have the same type, for example all floats or all strings. You can easily convert a list to a numpy array; for example, if you would like to perform vector operations, you can cast a list to a numpy array. In example 1 we import numpy, then cast the two lists to numpy arrays:
import numpy as np
u, v = np.array([1, 0]), np.array([0, 1])
Example 1: casting the lists [1,0] and [0,1] to the numpy arrays u and v.
If you check the type of u or v (type(v)) you will get “numpy.ndarray”. Although u and v are points in a 2-D space, their dimension is one; you can verify this using the data attribute “ndim”. For example, v.ndim will output 1. In numpy, a dimension or axis is better understood in the context of nesting; this will be discussed in the next section. It should be noted that sometimes the data attribute “shape” is referred to as the dimension of the numpy array.
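The difference between ndim and shape is easy to check for yourself. This short sketch uses the same two vectors as example 1:

```python
import numpy as np

u = np.array([1, 0])
v = np.array([0, 1])

print(type(v))   # <class 'numpy.ndarray'>
print(v.ndim)    # 1, one level of nesting
print(v.shape)   # (2,), two elements along that single axis
```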
The numpy array has many useful properties, for example vector addition. We can add the two arrays as follows:
Example 2: add numpy arrays u and v to form a new numpy array z.
Where the term “z:array([1,1])” means the variable z contains an array. The actual vector operation is shown in figure 2, where each component of the vector has a different color.
Figure 2: Example of vector addition
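In code, the addition of example 2 can be reproduced with a sketch like this one:

```python
import numpy as np

u = np.array([1, 0])
v = np.array([0, 1])

# Addition is applied component by component.
z = u + v
print(z.tolist())  # [1, 1]
```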
Numpy arrays follow similar conventions for vector scalar multiplication, for example when you multiply a numpy array by an integer or float:
Example 3.1: multiplying the numpy array y by the scalar 2.
The equivalent vector operation is shown in figure 3:
Figure 3: Vector scalar multiplication as shown in Example 3.1
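A small sketch of scalar multiplication follows. The values stored in y are my own choice for illustration:

```python
import numpy as np

y = np.array([1, 2])   # illustrative values

# Every component of y is multiplied by the scalar.
z = 2 * y
print(z.tolist())      # [2, 4]
```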
Like a list, you can access the elements by index; for example, you can access the first element of the numpy array u with u[0], which is 1. Many numpy array operations differ from the usual vector operations. For example, in numpy the * operator does not correspond to the dot product or matrix multiplication but to element-wise multiplication, like the Hadamard product. We can multiply two numpy arrays as follows:
Figure 4: multiplication of two numpy arrays expressed as a Hadamard product.
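To see the difference between the element-wise product and the dot product, here is a short sketch with two illustrative vectors:

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])

print(u[0])                # 1, indexing works as it does for lists
print((u * v).tolist())    # [3, 8], element-wise (Hadamard) product
print(int(np.dot(u, v)))   # 11, dot product: 1*3 + 2*4
```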
Nesting lists and two 2-D numpy arrays
Nesting lists is where things get interesting, and a little confusing; this 2-D representation is important because tables in databases, matrices, and grayscale images follow this convention. When each of the nested lists is the same size, we can view the result as a 2-D rectangular table, as shown in figure 5. The Python list “A” has three lists nested within it; each nested list is represented as a different color. Each list is a different row in the rectangular table, and each column represents a separate element in the list. In this case, we set the elements of the list to correspond to their row and column numbers respectively.
Figure 5: List “A” with its nested lists represented as a table
In Python, to access an element of a list nested in another list, we use two brackets: the first index corresponds to the row number and the second index corresponds to the column. This indexing convention is shown in figure 6; the top part of the figure corresponds to the nested list, and the bottom part corresponds to the rectangular representation.
Figure 6: Index conventions for list “A” also represented as a table
Let’s see some examples in figure 7, which shows the syntax for accessing three different elements of A.
Figure 7: Example of indexing elements of a list.
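The same idea in code: a sketch with a 3 by 3 nested list whose values encode their position. I assume a labelling where 23 means second row, third column; the figures may label the elements differently.

```python
# Each value encodes its 1-based row and column, e.g. 23 = row 2, column 3.
A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]

print(A[0])      # [11, 12, 13], the first row
print(A[0][0])   # 11
print(A[1][2])   # 23, second row, third column
print(A[2][1])   # 32
```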
We can also view the nesting as a tree, as we did in Python for Data Science, as shown in figure 8. The first index corresponds to the first level of the tree, and the second index corresponds to the second level.
Figure 8: An example of matrix addition
2-D numpy arrays
Turns out we can cast two nested lists into a 2-D array, with the same index conventions. For example, we can convert the following nested list into a 2-D array:
V = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
Example 4: creating a 2-D array, that is, an array with two axes
The convention for indexing is exactly the same, and we can represent the array in table form as in figure 5. In numpy the dimension of this array is 2; this may be confusing, as in linear algebra its three linearly independent columns span a three-dimensional space. In numpy, the dimension can be seen as the number of levels of nesting. 2-D arrays share properties with matrices, like scalar multiplication and addition. For example, adding two 2-D numpy arrays corresponds to matrix addition.
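You can confirm the dimension of the array from example 4 with a quick sketch:

```python
import numpy as np

V = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

print(V.ndim)     # 2, two levels of nesting
print(V.shape)    # (3, 3), three rows and three columns
print(V[1][1])    # 1, same bracket convention as the nested list
print(V[1, 1])    # 1, numpy also accepts both indices in one bracket
```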
Example 5.2: the result of multiplying numpy arrays
Or Hadamard product:
Figure 10: An example of the Hadamard product.
To perform standard matrix multiplication you would use np.dot(X,Y). In the next section, we will review some strategies to help you navigate your way through arrays in higher dimensions.
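The three operations side by side, as a sketch with two small illustrative matrices X and Y:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[0, 1], [1, 0]])

print((X + Y).tolist())       # [[1, 3], [4, 4]], matrix addition
print((X * Y).tolist())       # [[0, 2], [3, 0]], Hadamard (element-wise) product
print(np.dot(X, Y).tolist())  # [[2, 1], [4, 3]], standard matrix multiplication
```

Note that X * Y and np.dot(X, Y) give different results, which is the most common surprise when coming from linear algebra notation.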
Nesting List within a List within a List and 3-D Numpy Arrays
We can nest three lists: each of these lists in turn has nested lists that have their own nested lists, as shown in figure 11. List “A” contains three nested lists, each color-coded. You can access the first, second and third list using A[0], A[1] and A[2] respectively. Each of these lists contains three nested lists. We can represent these nested lists as rectangular tables, as shown in figure 11. The indexing conventions apply to these lists as well; we just add a third bracket. This is also demonstrated at the bottom of figure 11, where the three rectangular tables contain the syntax to access the values shown in the tables above.
Figure 11: A list with three nested lists, each of which has three nested lists.
Figure 12 shows an example of accessing the element at index A[0][2][1], which contains the value 132. The first index, A[0], selects a list that contains three lists, which can be represented as a rectangular table. We use the second index, i.e. A[0][2], to access the last list contained in A[0]. In the table representation, this corresponds to the last row of the table. A[0][2] is the list [131,132,133]. As we are interested in the second element, we simply append the index [1]; therefore the final result is A[0][2][1].
Figure 12: Visualization of obtaining A[0][2][1]
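To try this yourself, the sketch below builds a 3 by 3 by 3 nested list in which every value encodes its own position, so that A[0][2] is [131, 132, 133] as in the walkthrough. Generating the list with a comprehension is my shortcut; you could equally type it out by hand.

```python
# Each value is 100*(table) + 10*(row) + (column), all counted from 1.
A = [[[100 * (i + 1) + 10 * (j + 1) + (k + 1) for k in range(3)]
      for j in range(3)]
     for i in range(3)]

print(A[0][2])      # [131, 132, 133], last row of the first table
print(A[0][2][1])   # 132
print(A[2][2][1])   # 332
```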
A helpful analogy is finding a room in an apartment building on a street, as shown in Figure 13. The first index of the list represents the address on the road; in Figure 13 this is shown as depth. The second index of the list represents the floor where the room is situated, depicted by the vertical direction in Figure 13. To stay consistent with our table representation, the lower levels have a larger index. Finally, the last index of the list corresponds to the room number on a particular floor, represented by the horizontal arrow.
Figure 13: Street analogy for list indexing
For example, in figure 14 the element A[2][2][1] corresponds to building 2, on the first floor, in the middle room; the actual element is 332.
Figure 14: Example of list indexing using the street analogy
3D Numpy Arrays
The mathematical operations for 3D numpy arrays follow similar conventions, i.e. element-wise addition and multiplication, as shown in figure 15 and figure 16. In the figures, the first index or dimension of X and Y corresponds to an element in the outer square brackets, but instead of a number that element is a rectangular array. When we add or multiply X and Y, each element is added or multiplied independently. More precisely, each 2D array of X, represented as a table, is added to or multiplied with the corresponding 2D array of Y, as shown on the left; within those arrays, the same conventions as for 2D numpy addition are followed.
Figure 15: Add two 3D numpy arrays X and Y.
Figure 16: Multiplying two 3D numpy arrays X and Y.
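Here is a small sketch of both operations on 3D arrays. The arrays are made up for the example (X counts from 0 to 7, Y is filled with 2) and are not the ones in the figures.

```python
import numpy as np

X = np.arange(8).reshape(2, 2, 2)   # values 0..7 arranged as two 2x2 tables
Y = np.full((2, 2, 2), 2)           # same shape, every entry equal to 2

S = X + Y   # element-wise addition
P = X * Y   # element-wise multiplication

print(X.ndim)               # 3
print(int(S[1, 0, 1]))      # 7, because X[1, 0, 1] is 5
print(int(P[1, 0, 1]))      # 10
# Each 2D table is combined with its counterpart, exactly as in the 2D case.
print(np.array_equal(S[0], X[0] + Y[0]))  # True
```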
Beyond 3D Lists
Adding another layer of nesting gets a little confusing; you can’t really visualize it, as it can be seen as a 4-dimensional problem, but let’s try to wrap our heads around it. Examining figure 17, we see that list “A” has three lists, each of which contains two lists, which in turn contain two lists nested in them. Let’s go through the process of accessing the element that contains 3122. The third element, A[2], contains two lists; in figure 17 we use depth to distinguish them. We can access the second list by adding a second index. This can be viewed as a table, and from this point we follow the table conventions of the previous example, as illustrated in figure 17.
Figure 17: Example of an element in a list, with a list, within a list nested in list “A”
We can also use the apartment analogy, as shown in figure 18; this time the new list index is represented by the street name, 1st Street or 2nd Street. As before, the second list index represents the address, the third list index represents the floor number, and the fourth index represents the apartment number. The analogy is summarized in Figure 18. For example, the directions to one element of A would be 2nd Street, Building 1, Floor 0, Room 0.
Figure 18: Street analogy for figure 11
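The street analogy maps directly onto four brackets. In the sketch below I assume that indices start at 0, so that 2nd Street is index 1, and that each value encodes its 1-based position along the four levels. Both are my assumptions, and the labels in the figures may differ.

```python
import numpy as np

# Value = 1000*street + 100*building + 10*floor + room, each counted from 1.
A = [[[[1000 * (s + 1) + 100 * (b + 1) + 10 * (f + 1) + (r + 1)
        for r in range(2)]     # room on the floor
       for f in range(2)]      # floor in the building
      for b in range(2)]       # building on the street
     for s in range(3)]        # street

print(A[1][1][0][0])           # 2211: 2nd Street, Building 1, Floor 0, Room 0

X = np.array(A)
print(X.ndim, X.shape)         # 4 (3, 2, 2, 2)
print(int(X[1, 1, 0, 0]))      # 2211, same element through the numpy array
```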
We see that you can store multiple dimensions of data as a Python list. Similarly, a Numpy array is a more widely used method to store and process data. In both cases, you can access each element of the list using square brackets. Although Numpy arrays behave like vectors and matrices, there are some subtle differences in many of the operations and terminology. Finally, when navigating your way through higher dimensions it’s helpful to use analogies.