This afternoon I created a correlation matrix which I then conditionally formatted to give me a heatmap-like take on the data:
Going to Home | Conditional Formatting | Color Scales, I decided that green would be good and red bad (Logical, right? Thanks, Excel.).
Notice a problem? The 1’s across the diagonal are skewing the relative intensity of our colors. Of COURSE they are going to be the highest value as 1 is the highest possible correlation, anyway! Not only that, they are superfluous information.
Here’s a solution.
1. Copy and paste your correlation matrix below the original
2. Using an IF statement, we will replace the diagonal with dashes (not uncommon formatting for correlation matrices). We’ll do this with the ROW and COLUMN functions – when a cell’s relative row and column positions within the matrix are equal, we convert it to a dash.
3. Fill that formula through your matrix. Notice the color change! Excel does not format what is not a number, and you have a more useful conditional format.
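The steps above can be sketched with a formula like the following. The cell references are assumptions for illustration only: here the original matrix starts at B2 and the copy starts at B10.

```
=IF(ROW()-ROW($B$10)=COLUMN()-COLUMN($B$10), "-", B2)
```

Entered in the copy’s top-left cell and filled through the matrix, the formula compares each cell’s relative row and column position; on the diagonal they match, so the formula returns a dash, which the color scale then ignores.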
A histogram is a bar-chart visualization where the data is grouped into ranges. While Excel offers a built-in histogram chart, I like to build them using (drumroll…) PivotTables.
The exercise file contains sales quantity and dollars. I want to know the distribution of sales quantity per transaction. For example, how many transactions involved 2 items? 6 items?
1. Add a Count variable to your dataset
This looks goofy, but it’s a shortcut for our next steps. I will simply drag and fill the number 1 down our entire dataset.
2. Insert PivotTable
I put our sales quantity in the Rows box and the “Count” variable in the Values box. Now we have a count of the number of transactions per sales quantity which we will visualize with a histogram.
3. Insert PivotChart
Next, I insert a PivotChart based on this PivotTable data.
An extra tip: histograms are usually displayed with the bars quite close together, even touching.
To reduce the gaps between bars, right-click any bar in the histogram and select “Format Data Series.” You will see a slider labeled “Gap Width.” Reduce the width to about 25%, or whatever you prefer.
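If you prefer doing this programmatically, a one-line VBA sketch can set the gap width. This assumes the PivotChart is currently selected, and the macro name is my own:

```vba
Sub TightenHistogramBars()
    ' Assumes the chart is selected; 25 means a 25% gap width
    ActiveChart.ChartGroups(1).GapWidth = 25
End Sub
```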
4. Slice and Dice!
The true power of this method comes when you want to regroup the “bins” in your histogram. Let’s say that you want to visualize the groups in increments of two.
Simply right-click your PivotTable, select “Group” and build your table in increments of 2. Change to 3, 5… whatever.
Now you get a sense of how this variable is distributed, and a better sense of the average units per transaction – does it skew high or low, or does it appear normal? In this case, the number of units per transaction appears fairly consistent across the range.
Notice in the GIF that when I set the histogram to intervals of five, the difference between groups looked pretty serious!
That’s because I let Excel default to a cardinal sin of data visualization: a Y axis that doesn’t start at 0! Because the Y axis started at 63, of course the difference between 64 and 65 looks dramatic.
When exploring a workbook, I like to color-code cells so that I remember what I need to do with that information. Red cells indicate I need to take action on something, while green means all systems go.
Yellow means either neutral, or that I haven’t decided yet!
That’s my system, anyway. And maybe you have your own (please leave it in the comments!).
One thing we can all agree on, though, is that filling cells from the Home ribbon is annoying. It requires precise, time-consuming pointing-and-clicking with the mouse.
Wouldn’t it be great to just make our own keyboard shortcuts to fill cells? For example, Ctrl + Shift + R, and our selection is filled red, and so forth.
Consider it done.
Simply go to Developer | Visual Basic and paste the below code into a module in your Personal.XLSB workbook.
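The macro code itself isn’t reproduced here, but each macro is a one-liner along these lines; the macro names match the Red, Yellow and Green macros mentioned below, while the exact bodies are my own sketch:

```vba
Sub Red()
    ' Fill the selected cells red
    Selection.Interior.Color = vbRed
End Sub

Sub Yellow()
    ' Fill the selected cells yellow
    Selection.Interior.Color = vbYellow
End Sub

Sub Green()
    ' Fill the selected cells green
    Selection.Interior.Color = vbGreen
End Sub
```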
To assign keyboard shortcuts to the macros you created, go to Developer | Macros and select the pertinent macro (Red, Yellow or Green). Select Options and type your keyboard shortcut.
For example, I assigned Ctrl + Shift + G to the Green macro. When I hit that keystroke, I get a green cell.
These one-line macros save me lots of time when I’m marking up my worksheets, and I hope you can use them too.
This exercise takes advantage of Excel’s color system and VBA. This technical documentation from Microsoft has all the gory details of how colors in Excel work. It’s a deep topic – dark, ocean blue (see what I did there?).
Let’s say you need to do some data analysis or manipulation in R but want to bring it back into Excel for visualization or distribution to colleagues.
In the past, I would have used write.csv() to export the data to a csv, then opened it in Excel.
That’s still a solid option, but I really like the XLView() function from the DescTools package. Get this function running with the below packages…
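The package code isn’t shown above, but to my knowledge getting XLView() running is just a matter of installing and loading DescTools:

```r
# Install once, then load in each session
install.packages("DescTools")
library(DescTools)
```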
In this demonstration, I am using the famous iris dataset (download here)
Like magic, something like the below workbook should open in Excel. From there, simply head to Data | Text to Columns and split the columns as semicolon-delimited.
All about XLView()
As with any function in R, you can learn about the arguments it takes with the str() function. Try str(XLView):
XLView() has three optional arguments: whether you want column names (i.e. header names), whether you want row names (i.e. an index/ID number as your first column), and how to label missing values (usually a blank or “NA”). By default, XLView() gives you column names but not row names, and labels missing values as blanks.
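Based on the defaults described above, usage looks something like this; the argument names follow the DescTools documentation as I understand it, so treat the exact signature as an assumption:

```r
library(DescTools)

XLView(iris)                    # defaults: column names, no row names
XLView(iris, row.names = TRUE)  # include an index/ID column
XLView(iris, na = "NA")         # label missing values "NA" instead of blank
```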
I imagine this is what the majority of users would want. Still, it’s good to know the options.
This is a great function for integrating R and Excel. I like that it opens Excel for you on command, unlike write.csv(). It is also relatively easy to install, unlike some competing Excel/R packages.
Did this post help you? Please like, comment and share. Got questions about R, Excel or their colliding worlds? Leave a comment below.
Through its Get & Transform module, Excel can import data from a variety of sources. It’s really powerful, and this exercise barely scratches the surface of its capabilities. For more on Get & Transform, check out Oz du Soleil’s Lynda.com course.
In this lesson we will import crypto prices from Yahoo! Finance. Yahoo! Finance is a stalwart resource for financial analysts, and I was pleased to find that it would not be too hard to integrate their cryptocurrency price updates into Excel with Get & Transform.
To do this, open a new worksheet, then select Data | New Query | From Other Sources | From Web.
A large box entitled “Navigator” should appear on your screen and you should see something like this:
Click on Table 0 on the left-hand column of your box, then Load toward the lower right.
This process takes time!
It took about a minute on my laptop’s wi-fi with a very good signal, so let it run.
The whole process sped up looks like this:
Awesome – you’ve got real-time cryptocurrency prices from Yahoo! Finance delivered straight to your workbook. For updated information, right-click anywhere in that table and select Refresh.
But because I’ve spent too much time in a cubicle with CPAs, I won’t stop there. Oh no, let’s make this a little more user-friendly.
First, notice in the Workbook Queries pane on the right that our table is called Table 0. That’s not very helpful!
So I’m going to head over to that label, right-click and select Properties.
I’ll name the table Cryptos. I could even add a description (“Crypto prices from Yahoo! Finance,” for example).
Now after this part you’re going to wonder what happened to me as a child.
Click anywhere in your beautiful table.
Now, go ahead and type right over that.
Did I just cause the collapse of Bitcoin? Well, no, but that’s what the spreadsheet says.
Of course, you can refresh the workbook and everything will correct itself.
But I find this really annoying, so I am going to protect my worksheet so users cannot modify our table.
So I click Review | Protect Sheet. I have some options as to what I will allow users to modify. I can also choose whether to protect the sheet with a password. I choose not to, as this is public information anyway.
Nice. Now if you go to key over the table, you’ll get the following error message:
The only problem with this is that now you can no longer refresh your table, because it’s on a protected worksheet!
Damn Excel! Why you making this so difficult?
So let’s pause for a moment, and cope with a meme.
Fortunately, refreshing data in a protected sheet is possible with VBA.
Now, I’m not a VBA expert and I am not going to try teaching it. For that, I suggest my friends Chris Newman and Jon Acampora.
For this, I will simply instruct you to save your file as an .xlsm macro-enabled workbook, hit Alt+F11, and paste some code into a module in your workbook.
Learning VBA is frustrating. Fortunately there is a lot to work with that’s already online.
I was able to borrow the below from our friends at ExtendOffice with little modification:
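The borrowed ExtendOffice code isn’t reproduced here, but the general pattern is to unprotect the sheet, refresh, and re-protect. A minimal sketch, assuming no password:

```vba
Sub DataRefresh()
    ' Unprotect the sheet, refresh all queries, then re-protect
    ActiveSheet.Unprotect
    ActiveWorkbook.RefreshAll
    ActiveSheet.Protect
End Sub
```

If your sheet uses a password, pass it to both Unprotect and Protect.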
Now you’ll be able to keep your workbook protected and still refresh it. Simply head to Developer | Macros and you should see your “Data Refresh” and “Data Refresh2” macros. For “Data Refresh” click Options.
See how you can assign a keyboard shortcut for your macro? I’ll do Ctrl + Shift + B. Now when I use that keyboard shortcut, the macro will run, i.e. the table will refresh with the latest data from Yahoo! Finance.
If everything is running smoothly, you’ll see a spinning globe on the lower-left of your screen.
Now you’re ready to track crypto prices without leaving Excel, aka live the dream.
Did this post help you? Please like, comment and share.
Ah, guest posting. It’s something that you’re supposed to do as a blogger. It gives you access to new, bigger audiences that are eager to find new authors.
Sometimes, though, I find it hard to locate the right outlet. The economics of blogging push your voice to be so niche and unique that the only right place to publish your work is, well, your own blog.
But, what better match than writing for the blog of my alma mater, Hillsdale College?
My post is about the liberal arts and innovation. As a graduate student in design and innovation, I have found my liberal arts training to be invaluable to how I think about technology, rather than a waste of time, as commonly assumed.
Liberal arts are essential for innovation because liberal arts transcend any particular time, place or technology. They promote empathy for the human condition, a precondition for great design. The disdain for the liberal arts is counterproductive – they are a necessary piece of the innovation equation.
An old professor of mine used to quip that, for proof of the permanent income hypothesis, observe how engineering majors often drive nicer cars than philosophy majors. Might that be because, he jested, the top five engineering firms hire at more competitive salaries than the top five philosophy firms?
Madsbjerg is steeped in philosophy, and his knowledge transforms the organizations he works with. Know anything about Heidegger? Well, get ready to want to learn more. What’s the deal with Descartes? You will see how his influence permeates the Big Data paradigm – and what’s limiting about that.
As I was reading Madsbjerg’s arguments and perspectives, I kept noticing similarities with the design thinking movement. Madsbjerg addresses this comparison head-on – and he doesn’t like it. He even dismisses the design thinking process pioneered by firms like IDEO as a “BS tornado” (and it’s uncensored in the book!).
He believes that “design science” is self-contradictory; places like IDEO deal too much in the abstract for truly humanistic thinking to flourish. (It all goes back to Heidegger vs. Descartes… seriously. Read the book, because this post could easily turn dissertation-esque trying to explain it all.)
While these are compelling points, I am disappointed they were developed so late in the book. A good share of the book’s readers are at least familiar with, if not active practitioners of, the design thinking methodology, and Madsbjerg waits too long to stake out his unique perspective.
Will computers give a damn? Do you, anyway?
Another compelling argument that Madsbjerg waits too late in the book to develop is that computers do not “give a damn,” will thus always fail to care, and so will never empathize, a precondition for design and innovation. “Computers simply do not give a damn; they will never understand that caring is the whole point,” he argues.
True today – but for how long? Forever? I’m not going to predict whether computers will ever “care,” because I’d rather focus on what matters today – do you give a damn? I often come back to Oz du Soleil’s argument that a good analyst is someone who gives a damn, and you can’t teach that.
So, maybe computers won’t care. But do you? And what is caring?
As an incoming analyst, I was amazed at how much time most professionals spend writing and reading reports and how little time they spend immersed in the “analog world” of their business, their customers, and so forth.
Excel (and other platforms) is a tool to “automate the boring stuff.” To me, mastery of Excel paradoxically offered a promise to spend less time in Excel.
Maybe that third part should be “Giving a Damn”
I frequently reflect back on Drew Conway’s Data Science Venn Diagram (below). It’s a good model! That lower section, “Substantive Expertise,” is often overlooked, perhaps because it is so hard to define and unpredictable. Maybe another way to put it is “Giving a Damn?”
When you have a perspective—when you actually give a damn—you intuitively sense what’s important and what’s trivial. You can see what connects with what, and you know the data, input, and knowledge that matter. Caring is the connective tissue that makes all these things possible….If you are in the beauty business, you simply can’t make sense of cultural insights regarding beauty ideals if you don’t care about the meaning of beauty products. If you are in the car industry, you have to care about cars and transportation—otherwise, the human phenomenon of driving will not make sense to you.
Do you care about what your organization does? If not, do you have something else you care about that factors in professionally? Sure, learning the shiniest new algorithms and reporting tools is fun, and valuable. But giving a damn is not just about the data itself. Because, as computers improve at the “Hacking” and “Stats” parts of the equation, “Substantive Expertise” is increasingly what you have left to offer.
I don’t use Excel for everything these days, but I continue to use it for formatting tables.
Because, let’s be honest. Word is simply not up to the task when it comes to cleanly-formatted tables.
Generally tables in publications do not include gridlines – it’s pixel overload. So to remove my gridlines before saving the images into my publications, I hide gridlines in Excel.
Turns out, though, that I was doing it wrong…
Don’t be this guy (me)
I had been removing gridlines by filling the entire worksheet in white.
Try this instead
Recently I learned about going to the View tab and unchecking the “Gridlines” box. This requires fewer keystrokes and is more natural: if you can achieve the same result by removing something rather than adding it, Occam would approve.
This week, with a little time off between semesters, I’ve been helping a family member with financial reporting & analysis for his business.
It’s been over a year since I worked in finance and this exercise reminds me how frustrating this can be and how critical a strong command of Excel is to modern data literacy and career success.
It also made me double down on the staying power of Excel. Yes, with my research in Big Data and Natural Language Processing I barely touch it these days. But those circumstances are entirely different from those of a small business operator.
Big Data requires the data be, well… big. Even classical statistics requires some minimum sample size for meaningful analysis. So, is Small Data irrelevant? Is anything done in Excel by its very nature insignificant, unsophisticated?
Data can be small, and still critical to monitor. Small business operators don’t need million-dollar dashboards or ML algorithms, but they do need to track expenses, revenues and their trends so they know they won’t go broke. And they can’t afford to spend loads of time or money on either learning or implementing the hottest new BI tool.
Small business operators are better off in the field, interacting with customers and generating revenue, than exhaustively modeling their small datasets. Of course, they should not completely ignore their data, either.
Where am I going with this? My main point is that Big Data is not the Only Data. Small Business tends to generate Small Data, for which Excel is an optimal choice. The Excel doomsayers overlook Small Data – which is to say, they overlook Small Business. Neither is going away, and neither is Excel.