Ever since we launched, Cognitive Class has hit many milestones. From name changes (raise your hand if you remember DB2 University) to our 1,000,000th learner, we’ve been through a lot.
But in this post, I will focus on the milestones and evolution of the technical side of things, specifically how we went from a static infrastructure to a dynamic and scalable deployment of dozens of Open edX instances using Docker.
Open edX 101
Open edX is the open source code behind edx.org. It is composed of several repositories, edx-platform being the main one. The official method of deploying an Open edX instance is to use the configuration repo, which contains Ansible playbooks that automate the installation. This method requires access to a server where you run the Ansible playbook. Once everything is done, you will have a brand new Open edX deployment at your disposal.
This is how we have run cognitiveclass.ai, our public website, since we migrated from a Moodle deployment to Open edX in 2015. It has served us well: we are able to serve hundreds of concurrent learners across more than 70 courses every day.
But this strategy didn’t come without its challenges:
Open edX mainly targets Amazon’s AWS services, while we run our infrastructure on IBM Cloud.
Deploying a new instance requires creating a new virtual machine.
Open edX reads configurations from JSON files stored in the server, and each instance must keep these files synchronized.
While we were able to overcome these in a large single deployment, they would be much harder to manage for our new offering, the Cognitive Class Private Portals.
Cognitive Class for business
When presenting to other companies, we often hear the same question: “How can I make this content available to my employees?” That was the main motivation behind our Private Portals offer.
A Private Portal represents a dedicated deployment created specifically for a client. From a technical perspective, this new offering would require us to spin up new deployments quickly and on-demand. Going back to the points highlighted earlier, numbers two and three are especially challenging as the number of deployments grows.
Creating and configuring a new VM for each deployment is a slow and costly process. And if a particular Portal outgrows its resources, we would have to find a way to scale it and manage its configuration across multiple VMs.
At the same time, we were experiencing a similar demand in our Virtual Labs infrastructure, where the use of hundreds of VMs was becoming unmanageable. The team started to investigate and implement a solution based on Docker.
The main benefits of Docker for us were twofold:
Increased server usage density;
Isolation of each service’s processes and files.
These benefits are deeply related: since each container manages its own runtime and files, we can easily run different pieces of software on the same server without them interfering with each other, and with much lower overhead than VMs, since Docker provides lightweight isolation between containers.
By increasing usage density, we are able to run thousands of containers on a handful of larger servers that can be pre-provisioned ahead of time, instead of having to manage thousands of smaller instances.
For our Private Portals offering this means that a new deployment can be ready to be used in minutes. The underlying infrastructure is already in place so we just need to start some containers, which is a much faster process.
Herding containers with Rancher
Docker in and of itself is a fantastic technology but for a highly scalable distributed production environment, you need something on top of it to manage your containers’ lifecycle. Here at Cognitive Class, we decided to use Rancher for this, since it allows us to abstract our infrastructure and focus on the application itself.
In a nutshell, Rancher organizes containers into services and services are grouped into stacks. Stacks are deployed to environments, and environments have hosts, which are the underlying servers where containers are eventually started. Rancher takes care of creating a private network across all the hosts so they can communicate securely with each other.
Getting everything together
Our Portals are organized in a micro-services architecture and grouped together in Rancher as a stack. Open edX is the main component and is itself broken into smaller services. On top of Open edX we have several other components that provide additional functionality to our offering. Overall, this is how things look in Rancher:
There is a lot going on here, so let’s break it down and quickly explain each piece:
lms: this is where students access course content
cms: used for authoring courses
forum: handles course discussions
nginx: serves static assets
rabbitmq: message queue system
glados: admin user interface to control and customize the Portal
companion-cube: API to expose extra functionalities of Open edX
compete: service to run data hackathons
learner-support: built-in learner ticket support system
lp-certs: issues certificates for students who complete multiple courses
cms-workers and lms-workers: execute background tasks for `lms` and `cms`
glados-worker: executes background tasks for `glados`
letsencrypt: automatically manages SSL certificates using Let’s Encrypt
load-balancer: routes traffic to services based on request hostname
mailer: proxies SMTP requests to an external server, or otherwise sends emails itself
ops: group of containers used to run specific tasks
rancher-cron: starts containers following a cron-like schedule
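As a rough illustration, a heavily trimmed, compose-style definition of such a stack might look like the following (the service names come from the list above, but the image names and settings are invented for this sketch; the real stack carries many more options):

```yaml
# Hypothetical, heavily trimmed stack definition: one entry per service.
# Rancher reads a compose-style file like this and schedules the
# containers across the environment's hosts.
lms:
  image: example/edxapp-lms        # invented image name
  links: [rabbitmq]
cms:
  image: example/edxapp-cms        # invented image name
  links: [rabbitmq]
forum:
  image: example/forum             # invented image name
nginx:
  image: nginx
rabbitmq:
  image: rabbitmq
load-balancer:
  image: example/load-balancer     # invented image name
  ports: ["80:80", "443:443"]
```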
The ops service behaves differently from the other ones, so let’s dig a bit deeper into it:
Here we can see that there are several containers inside ops and that they are usually not running. Some containers, like edxapp-migrations, run when the Portal is deployed but are not expected to be started again except in special circumstances (such as when the database schema changes). Other containers, like backup, are started by rancher-cron periodically and stop once they are done.
In both cases, we can trigger a manual start by clicking the play button. This gives us the ability to easily run important operational tasks on demand without having to worry about SSHing into specific servers and figuring out which script to run.
One key aspect of Docker is that the file system is isolated per container. This means that, without proper care, you might lose important files if a container dies. The way to handle this situation is to use Docker volumes to mount local file system paths into the containers.
Moreover, when you have multiple hosts, it is best to have a shared data layer to avoid creating implicit scheduling dependencies between containers and servers. In other words, you want your containers to have access to the same files no matter which host they are running on.
Each Portal has its own directory in the NFS drive and the containers mount the directory of that specific Portal. So it’s impossible for one Portal to access the files of another one.
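For illustration, a service definition mounting a Portal-specific directory from the shared drive might look like this (the NFS mount point and container paths are hypothetical):

```yaml
# Hypothetical sketch: the host path lives on an NFS mount shared by all
# hosts, with one sub-directory per Portal, so the same files are visible
# no matter which host the container is scheduled on.
lms:
  image: example/edxapp-lms        # invented image name
  volumes:
    - /nfs/portals/client-a/config:/edx/etc
    - /nfs/portals/client-a/uploads:/edx/var/uploads
```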
One of the most important files is ansible_overrides.yml. As we mentioned at the beginning of this post, Open edX is configured using JSON files that are read when the process starts. The Ansible playbook generates these JSON files when executed.
To propagate changes made by Portal admins on glados to the lms and cms of Open edX we mount ansible_overrides.yml into the containers. When something changes, glados can write the new values into this file and lms and cms can read them.
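The flow can be sketched in a few lines of Python (a dependency-free illustration only: the real file is YAML consumed by Ansible, while this toy parser handles flat `KEY: value` lines, and the keys and paths are hypothetical):

```python
# Toy sketch of the shared-file override mechanism (hypothetical keys/paths):
# glados appends "KEY: value" pairs to a shared file; when lms/cms restart,
# they merge those pairs over their built-in defaults.

DEFAULTS = {"PLATFORM_NAME": "Open edX", "ENABLE_DISCUSSION_SERVICE": "true"}

def write_override(path, key, value):
    """Called by the admin interface when a Portal setting changes."""
    with open(path, "a") as f:
        f.write(f"{key}: {value}\n")

def load_settings(path):
    """Called on start up: defaults first, then overrides win."""
    settings = dict(DEFAULTS)
    try:
        with open(path) as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep:
                    settings[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no overrides written yet; defaults apply
    return settings
```

So a restart after `write_override(path, "PLATFORM_NAME", "Acme Portal")` picks up the new name, while untouched keys keep their defaults.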
We then restart the lms and cms containers which are set to run the Ansible playbook and re-generate the JSON files on start up. ansible_overrides.yml is passed as a variables file to Ansible so that any values declared in there will override the Open edX defaults.
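For illustration, such an overrides file might contain entries like these (the values are invented; the variable names follow the `EDXAPP_*` convention used by the Open edX playbooks):

```yaml
# Hypothetical per-Portal overrides; anything set here wins over
# the Open edX defaults when the playbook regenerates the JSON files.
EDXAPP_PLATFORM_NAME: "Acme Private Portal"
EDXAPP_SITE_NAME: "portal.acme.example.com"
EDXAPP_FEATURES:
  ENABLE_DISCUSSION_SERVICE: true
```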
By having this shared data layer, we don’t have to worry about containers being rescheduled to another host since we are sure Docker will be able to find the proper path and mount the required volumes into the containers.
By building on top of the lessons we learned as our platform evolved and by using the latest technologies available, we were able to build a fast, reliable and scalable solution to provide our students and clients a great learning experience.
We covered a lot in this post, and I hope you were able to learn something new today. If you are interested in learning more about our Private Portals offering, fill out our application form and we will contact you.
Do you remember these recent stories? On July 31, 2012 Dropbox admitted it had been hacked. (Information Week, 8/1/2012). Hackers had gained access to an employee’s account and from there were able to access LIVE usernames and passwords which could allow them to gain access to huge amounts of personal and corporate data. Just four days later, Wired® writer Mat Honan’s Twitter account was hacked via his Apple and Amazon accounts (story in Wired and also reported by CBS, CNN, NPR and others).
Did you notice the common theme behind these reports? Hackers didn’t get through the defenses of the Cloud by brute force. Instead, they searched out weak points and exploited the vulnerabilities those entry points exposed. In these examples – as in countless others – the weak points were processes and people.
The Dropbox hack was made possible by an employee using the same password to access multiple corporate resources, one of which happened to be a project site containing a “test” file of real, unencrypted usernames and passwords. Either one could be considered a lapse in judgment – I mean, who thinks it is a good idea to store unencrypted user access information on a project site? – but added together, these lapses combined into a result far more dangerous than the sum of its parts.
Mat Honan’s hack was made possible in part by process flaws at large and popular companies. Again, each chink taken individually would likely not have been as damaging as the series of flaws building on each other. Apple or Amazon individually didn’t provide enough information for hackers to take over Mr. Honan’s account, but taken together their processes and individual snippets of data provided the opportunity.
My purpose in writing this isn’t to scare anyone away from the Cloud or its legitimate providers. The Cloud is cost-effective, portable, scalable, stable, and here to stay. And it is as secure as technology will allow. But as these stories illustrate, technology isn’t the risk. Information wasn’t compromised by brute-force hacking or breaking encryption algorithms. Data was put at risk by people and processes.
Have you ever worked with someone who messed up something royally by not following a documented process? Or do you know someone who clicked a link in a bogus email and infected their laptop – or even the whole company – with a virus? They might be working for your Cloud provider now. Don’t rely on those folks to protect your data in the Cloud. Instead, protect it yourself with Backups, Password Safety and Data Encryption before entrusting your precious data to the Cloud. If a hacker gets into your Cloud, at least you won’t be the easiest target.
If you have ever shopped at Amazon you may have noticed a “Featured Recommendations” section that appears after your initial visit. These recommendations get automatically updated after the system notices a change in the shopping pattern of a particular member. This is real-time analytics at work. The system is using the data at hand and coming up with suggestions in near real-time. With more companies investing in a mobile business intelligence initiative, real-time analytics is an essential requirement to ensure a good return on investment.
I think that implementing a real-time analytics solution could be a costly endeavor. It would require implementing technologies like Master Data Management and delivery options like cloud and/or mobile BI. Cloud BI presents its own set of security concerns, which is why some of the region’s largest companies are hesitant to implement such a solution. According to one BI manager, the company’s executives do not support the notion of putting their data into the cloud without certain security measures in place. Their need for a mobile BI strategy would require security that would enable the company to delete everything from a device if it is stolen or misplaced.
Insurance companies and retail stores can greatly benefit from such technology. Off-site sales reps will be able to see current information about potential customers, including updated life-changing events, right on their mobile devices, which would increase the likelihood of either gaining a new customer or retaining an existing one*. In-store managers at grocery stores can get a real-time report about slow-moving items, allowing them to increase sales by changing displays. Real-time analytics can be on-demand, where the system responds to a certain request by an insurance sales rep, or it can be a continuous hourly report to the store manager of a grocery store**.
Overall, real-time analytics gives a company a competitive advantage over its rivals but requires heavy investment into the implementation of the technology and the guarantee of proper security measures being put in place with delivery options like the cloud. This information is helpful for quick decisions, but companies should still make all major decisions by looking at historical data and studying the trends.
*Pat Saporito, “Bring your Best”, Best’s Review, September 2011
**Jen Cohen Crompton, “Real-Time Data Analytics: On Demand and Continuous”.
Typically, strategic goals start off as high-level initiatives that involve revenue-based targets. Revenue targets are followed up with operational efficiency goals (or ratios) that keep expenses in line and improve profit margins. These goals and ratios serve as the ultimate yardstick in measuring top-end strategic performance. There may also be competitive goals that utilize different measures such as market share, product perception, etc. Companies believe they can achieve these results based on internal and external competitive factors. It is important to note that the internal and external factors typically drive the timing and define the tactical activities that will be employed to achieve results.
For example, a change in government regulation may present a significant opportunity for the company that is first to capitalize on the change. An example of an internal factor may be outstanding customer service that can serve as a market differentiator to attract and retain customers.
These competitive factors and performance measures drive the definition of the tactical operations (or plan) needed to achieve strategic goals. Tactical operations are ultimately boiled down to human activities and assigned to managers and their employees. Human activities impact revenue, profit, and quality. Even quality activities ultimately impact revenue and profit.
For example, an insurance company may excel at gathering high-quality claims data, resulting in lower claim expenses and legal costs.
Human activities are incorporated into an individual’s performance plan. Before defining the human activities though, the goals, competitive factors, and tactical operations need to be gathered into a data repository. Once gathered, they will be used to gain and communicate corporate alignment.
Depending on your role in the organization, you may be called upon to help define and capture the financial performance ratios. You may also be responsible for gathering and storing external factors such as survey results, industry statistics, etc.
If all goes well, the corporation captures the revenue and performance goals and defines how performance is to be measured. This is also communicated across the enterprise (gaining alignment). The performance goals and target financial ratios can be stored in the corporation’s data repository. The measuring and communicating of progress will be accomplished using a company’s reporting toolset. The company has to decide the best frequency to communicate actual performance compared to stated goals. This frequency can be daily, weekly, monthly, or quarterly with the emphasis on providing continual feedback. Reporting on performance results is the first, and most basic, step in the adoption of BI practices. Performance reporting answers the question “What happened?” (Davenport & Harris, 2007). It is very important but only the first step.
Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning (p. 8). Boston: Harvard Business School Press.
Wikipedia defines Forecasting as the process of making statements about events whose actual outcomes (typically) have not yet been observed.
Examples of forecasting would be predicting weather events, forecasting sales for a particular time period or predicting the outcome of a sporting event before it is played.
Wikipedia defines Predictive Analytics as an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.
Examples of predictive analytics would be determining customer behavior, identifying patients that are at risk for certain medical conditions or identifying fraudulent behavior.
Based on these definitions, forecasting and predictive analytics seem to be very similar…but are they? Let’s break it down.
Both forecasting and predictive analytics are concerned with predicting something in the future, something that has not yet happened. However, forecasting is more concerned with predicting future events whereas predictive analytics is concerned with predicting future trends and/or behaviors.
So, from a business perspective, forecasting would be used to determine how much of a material to buy and keep on stock based on projected sales numbers. Predictive analytics would be used to determine customer behavior like what and when are they likely to buy, how much do they spend when they do buy, and when they buy one product what else do they buy (also known as basket analysis).
Predictive analytics can be used to drive sales promotions targeting certain customers based on the information we know about their buying behavior. Likewise, the information obtained from predictive analytics can be used to influence sales projections and forecasting models.
Both predictive analytics and forecasting use data to achieve their purposes. But it’s how they use that data that differs.
In forecasting, data is used to look at past performance to determine future outcomes. For instance, how much did we sell last month, or how much did we sell last year at this time? In predictive analytics, we are looking for new trends – things occurring now and in the future that will affect our future business. It is more forward-looking and proactive.
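To make the contrast concrete, here is a toy sketch in Python (all numbers and items are invented for illustration): the forecast projects a future value from past performance, while the predictive side looks for a pattern in behavior – a miniature basket analysis.

```python
# Forecasting: project the next period from past performance.
monthly_sales = [120, 132, 128, 140, 138, 150]

def naive_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Predictive analytics: find a pattern in behavior (tiny basket analysis).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
]

def bought_together(baskets, item_a, item_b):
    """Fraction of baskets containing item_a that also contain item_b."""
    with_a = [b for b in baskets if item_a in b]
    if not with_a:
        return 0.0
    return sum(item_b in b for b in with_a) / len(with_a)

print(naive_forecast(monthly_sales))                # projection from past sales
print(bought_together(baskets, "bread", "butter"))  # likelihood of cross-sell
```

The first number answers “how much will we sell next month?”; the second answers “when customers buy bread, how often do they also buy butter?” – the kind of insight that can drive a targeted promotion.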
So, although forecasting and predictive analytics are similar and closely related to one another, they are two distinctively different concepts. In order to be successful at either one, you have to have the right resources and tools in place to be able to extract, transform and present the data in a timely manner and in a meaningful way.
A common problem in business today is that people spend much more time preparing and presenting information than they do actually determining what the data is telling them about their business. This is because they don’t have the right resources and tools in place.
At Making Data Meaningful we have the resources, strategies and tools to help businesses access, manage, transform and present their data in a meaningful way. If you would like to learn more about how we can help your business, visit our website or contact us today.
Are you looking for an analytics tool that is simple enough to get up and running fast and has the capability to keep up with your company as its Business Intelligence requirements mature? If so, you will want to check out the new offerings by MicroStrategy.
Industry Leading Analytics – Enterprise Capable
MicroStrategy has long been known for its large scale enterprise reporting and analytics solutions. They have also been the leaders in offering analytics on the mobile platform.
MicroStrategy’s best-in-class capabilities have traditionally been expensive to purchase and require expert technical assistance to implement and maintain. Large organizations are able to realize economies of scale but small and medium sized companies may find it difficult to justify an initial large investment in software, resources, and infrastructure.
To overcome the initial software investment, MicroStrategy offers a free enterprise version of its analytics suite for up to 10 users, called Analytics Suite. This 10-user license provides a try-before-you-buy opportunity before rolling out to the larger enterprise. The product can be hosted on-premise or in MicroStrategy’s cloud.
Companies still have to develop internal resources to handle security, data architecture, metadata, and reporting requirements.
Competition from the Edges
In recent years, companies like MicroStrategy, Cognos, SAP, and Oracle have lost ground to smaller, more agile startups like Tableau and QlikView. The newer companies have made it faster (and easier) to get up and running with appealing visualizations.
These smaller companies are now trying to make their products scalable with respect to handling complex security, data architecture, and metadata requirements that are part of all mid- to large-sized implementations.
MicroStrategy has responded to the competition by offering two smaller-scale solutions that can be implemented in a matter of weeks: Analytics Desktop and Analytics Express.
As its name implies, Analytics Desktop is installed on an individual’s computer. This product can attach to a variety of relational, columnar, and map reduce databases. It can also attach to Excel worksheets and SalesForce data. Analytics Desktop is designed for individual data discovery, possesses some advanced analytics capabilities, and can publish personal interactive dashboards. Data sharing is limited to distribution via PDF, Spreadsheet export, image files, and distributable flash files. Best of all, Analytics Desktop is free and offers free online training.
Analytics Express has all of the features of Desktop except that it is hosted completely in MicroStrategy’s cloud environment. There are no internal MicroStrategy hardware requirements. A secure VPN connection between MicroStrategy’s cloud and your company’s firewall can be configured to protect your data. The cloud-based analytics solution can import data from your organization’s back-end databases and refresh the data on a regularly scheduled basis. Importing the data provides the benefit of much improved analytics performance and data availability.
It’s a Mobile World
Additional visualization options are available, plus the ability to deploy solutions tailored for the iPad. Access to Dropbox and Google Drive is also available.
Security features include user authentication, single-sign-on, user and user group management, row-level security, dashboard level security, and user-based data filtering.
Dashboards can be embedded in other web pages or on intranet sites. Visualizations can also be scheduled for email distribution.
Deploy Before You Buy
Organizational risk is minimized because MicroStrategy offers a free one-year trial of Analytics Express. With all architecture hosted in the cloud, your organization won’t have to pony up any hardware or technical resources to support this product either.
Agile and Scalable
Both the Desktop and Express editions benefited from an improved web-based user interface designed to make the creation of dashboards easier. MicroStrategy also leveraged its extensive portfolio of Enterprise-Level features by making them available in the hosted solution. This ensures that MicroStrategy can meet the ever-evolving Business Intelligence and Analytics needs of your organization.
When I first heard the expression “Death by Meeting”, I thought it was the latest Stephen King novel, but after being the project manager of a project where I was expected to be involved in 20 meetings per week, dying seemed like a welcome alternative. You can avoid this slow, painful death by creating a project structure that focuses efforts and communications and reduces meetings.
In addition to the typical project management issues associated with the multitude of tasks required for large projects, there is a significant challenge in creating an efficient, effective project structure that drives the project effort to the correct worker-bee level and enables good project status communications, but streamlines the number of meetings required to achieve these goals. One approach that has worked for me is the use of Project Workgroups.
Most large projects consist of numerous tasks that can usually be grouped together in some manner. These groupings may be by departmental function (Finance, IT, Purchasing, etc.), by activity (sales, development, implementation, training, etc.), by deliverable (software release, management reporting, etc.), or perhaps some other logical division. Regardless of the grouping, there will be common goals and activities that enable the creation of workgroups reflecting these goals.
Once you have determined some logical workgroups, the next step is to define a project team structure. At the top of the structure is the Steering Committee. This is the group that is made up of senior management who are the key stakeholders for the project. The role of this group is to provide high-level direction, provide resources (monetary and personnel), and resolve major roadblocks to the success of the project. Steering Committees may oversee multiple concurrent projects, and will meet on a monthly or quarterly basis.
At this level, the Steering Committee members want to know where the project stands in terms of schedule, budget, and final deliverables. A fantastic tool for providing the Steering Committee this information is via a project dashboard. This dashboard should consist of a few key measurements with a status of each, using easy-to-read indicators like traffic lights or gauges. Here is an example:
This dashboard eliminates the need for developing voluminous detailed reports, and provides for exception level discussions. Only items that are yellow or red require explanation, so meetings are focused and their lengths are minimized.
The next level down from the Steering Committee is the Project Management Team, sometimes referred to as the Project Core Team. This team consists of key middle-management personnel representative of the primary functional areas affected by the project. The Core team should meet weekly or bi-weekly and is responsible for the direct management of the project activities. The RAID (Risks, Action Items, Issues, Decisions) document I referenced in my previous blog is the perfect communications tool for the Core Team. It provides a clear, concise mechanism for letting the team members see the critical items that require their attention.
The next level of the project organization below the Project Core Team contains the working groups for the project. The makeup of the workgroups will vary by project; however, this is the level where the daily tasks of the project are managed. This is the level that can bring you closest to a near-death experience since the number of teams and meetings is highest here.
Analyze your project and its deliverables to determine the best method for defining the workgroups. An excellent place to start is with the desired deliverables since it is difficult to split a single deliverable across workgroups. Another factor to consider is inter-departmental dependencies. Departments that closely interact with each other and/or are dependent upon each other can be combined on a workgroup to leverage that interdependency.
Meetings at this level of the project team need to be at least weekly. As above, the RAID document can be used to focus and track activities of the group, and facilitate communications to the project manager and the Project Core Team. If the tracking and reporting mechanism is standardized, then the project manager does not have to participate in all of these meetings. Focus the workgroups on the RAID documents and they will drive the agendas and reports so that meeting death takes a holiday!
In summary, to avoid the prospect of having the next project you manage being the planning of your own funeral after a painful “death by meeting” experience, try using the techniques described in this article. By constructing a project team structure as described, you can keep all the affected parties updated, involved, and focused in a manner that streamlines communications, maximizes resources, and minimizes wasteful meetings. The use of standardized task tracking and reporting tools will enable you as project manager to have visibility of all the project workgroups’ activities, and provide you the tools necessary to drive the project home successfully.