I’ve always been a big fan of the Import from Folder feature in Power Query. So much so that I wrote my first post on the topic way back in February of 2015. It’s still one of the most popular posts on Excel Unplugged and has been commented on 57 times and counting. I also did a follow-up post in May of 2016 about importing from multiple Excel files in one folder, and that one also quickly took its place among the 10 most read posts on Excel Unplugged. Just like the first post, it also got commented on quite a lot.
Consolidating many files from a folder into one range is obviously a hot topic, and I keep getting questions about the same snags in the process, so I decided to write a post on what I do to make the process “bulletproof”. This is paramount when you plan to load the result to the Power Pivot Data Model, where a failed query could result in a corrupted Data Model.
1. Making sure only the right files get transformed
Imagine you get monthly reports in Excel format. To consolidate them, you plan to create a Power Query function as I described in the articles mentioned above and then apply that function to every file in a folder to which you add new files each month. This all sounds good in theory, but in practice many things can happen in a folder. Someone might put their favourite song by the Beatles in that folder, or maybe a picture of themselves on vacation… Imagine applying your function to any of those. So, below is our folder with the monthly reports, but notice the irregularities.
1. The three non-Excel files
2. The XLSX (capital letters) extension of the February report
When we do a Get Data/From File/From Folder and select the folder above, we get something like this
Of course, we choose Edit and enter the Query Editor. We focus our attention on the Extension column.
Now I’m guessing a large percentage of users would just go wild with the filters, but the smart move is to first select the Extension column, then choose Transform/Format/UPPERCASE (lowercase would work just as well). This solves the XLSX vs. xlsx problem. It is a problem that needs solving because Power Query is case sensitive and treats XLSX and xlsx as two different extensions. After this step, all extensions are uppercase, whatever case they started in.
Now we are ready to do some filtering. And here again, things could get tricky. One might simply deselect Select All and then check the XLSX extension, but Power Query could interpret that differently than intended, so the right way is Filter/Text Filters/Equals…
We write .XLSX. Be careful of the dot preceding the extension.
And here we go. The list of our Reports. We could also Filter the Name column and say we only want files whose name contains Report and get rid of other unwanted Excel files.
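For readers who like to see the logic outside the Query Editor, here is a minimal Python sketch of the same two steps: uppercase the extension first, then keep only an exact match, optionally filtering on the name as well. The file names and the select_reports helper are made up for illustration; they are not part of the Power Query solution.

```python
from pathlib import PurePath

def select_reports(filenames, extension=".XLSX", name_contains="Report"):
    """Keep files whose upper-cased extension equals `extension`
    and whose name contains `name_contains`."""
    kept = []
    for name in filenames:
        # Same idea as Transform/Format/UPPERCASE on the Extension column
        ext = PurePath(name).suffix.upper()
        # Same idea as Text Filters/Equals (note the leading dot)
        if ext == extension and name_contains in name:
            kept.append(name)
    return kept

# Hypothetical folder content, including the irregularities described above
files = [
    "Report January.xlsx",
    "Report February.XLSX",
    "Yesterday.mp3",
    "Vacation.jpg",
    "Notes.txt",
    "Budget.xlsx",
]
reports = select_reports(files)
print(reports)
```

Because the comparison happens after uppercasing, the February file with the capital-letter extension is kept alongside the others.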
2. Fixing transformation errors before loading
Now we are at the “Applying the function to all remaining files in the Folder” step. And since the Function is nothing more than a series of transformation steps that worked on one file, there is no guarantee that they will work on the other files in the folder. All automation projects tend to become a hunt for sporadic changes in file contents. I had quite a few great VBA projects fail over one invisible space added to the name of a Sheet or something even more bizarre. So if we have a function, there is a chance that it might fail to do the desired transformations on one file or another. Let’s simulate this and show how to handle this sort of error.
To keep things simple, our anomaly will be one report that has its data on a Sheet named differently than in all the other reports. Since our function will be written to work with the regular reports, it will return an error when applied to the “black sheep” file.
Just to keep things “organized”, the “black sheep” will once again be the February Report. And to keep things simple, all our function is supposed to do is get the value of cell A1 from a sheet called Special Sheet in every Report in the target folder. Invoking the function will get me this.
All A-Okay except for the February Report. Now there are two ways to handle this. Number one is a Crude but Effective solution.
2.1 Removing errors
Warning: using this method will result in data loss! (Sounds almost like Power Query when you change a load destination.)
At this point we know that there are rows that result in an error. To make matters worse, this error is returned by one whole file. But at this point we can select the Column with errors and select Home/Remove Rows/Remove Errors.
And there it is…
No more errors and the data is ready to load. But again, this means that we lost the February data.
2.2 try … otherwise
The lower case in the title is intentional and very important since Power Query is case sensitive.
This will be a different approach, sort of an IFERROR from Excel. We know why the function we had failed: the Sheet name. So we create a copy of that function and change the name of the Sheet to whatever it is in the February file.
At this point we have two functions, each fitted to a specific case.
But now we want to use them both in the same step. This will require some custom code change. The right tool for this is the Advanced Editor. You can find the command on the Home tab, but it seems way cooler to select it from the View tab.
Now we see the entire code of the Import From Folder query. The highlighted line in the image below is the one returning errors.
This is the code
#"Invoked Custom Function" = Table.AddColumn(#"Merged Columns", "fnGetReportSpecialSheet", each fnGetReportSpecialSheet([Merged]))
It basically tells Power Query to add a column by invoking the function fnGetReportSpecialSheet (not the best name for a function I know) on column Merged that has our Filepaths to all the files in the folder. Now we will customise this by changing the last part to
each try fnGetReportSpecialSheet([Merged]) otherwise fnGetReportSheet1([Merged]))
So try to do the first and if it produces an error, use the second function.
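The same fallback pattern can be sketched in Python, purely as an illustration of the idea. The two functions below are hypothetical stand-ins for fnGetReportSpecialSheet and fnGetReportSheet1, the workbooks are plain dictionaries rather than real files, and the sheet name Sheet1 for the February file is an assumption based on the second function’s name.

```python
def get_special_sheet(workbook):
    # Stand-in for fnGetReportSpecialSheet: cell A1 of "Special Sheet"
    return workbook["Special Sheet"]["A1"]

def get_sheet1(workbook):
    # Stand-in for fnGetReportSheet1: cell A1 of "Sheet1" (assumed name)
    return workbook["Sheet1"]["A1"]

def read_a1(workbook):
    # try ... otherwise: use the first function, fall back to the second
    try:
        return get_special_sheet(workbook)
    except KeyError:
        return get_sheet1(workbook)

# Made-up data: February is the "black sheep" with a different sheet name
workbooks = {
    "January": {"Special Sheet": {"A1": 10}},
    "February": {"Sheet1": {"A1": 20}},
}
results = {name: read_a1(wb) for name, wb in workbooks.items()}
print(results)
```

One difference worth keeping in mind: try … otherwise in Power Query falls back on any error, while this sketch only catches a missing sheet. Either way, a file that fails both functions still produces an error.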
This week’s post will be a short testament to a development venture by a few of my countrymen. Like most Excel fans, I’m also a big fan of Power BI and its reporting capabilities. But having said that, the built-in visuals have often left me wishing I was in Excel and could do a bit of customising. About two weeks ago I started playing around with Zebra BI Power BI visuals and I must admit, I was extremely impressed. Up to a point where I said to myself, people should hear about this, and that is why I wrote this post.
This is a follow-up post on the final result of last week’s post Table.Join function in Power Query. So basically, we want to get from this
To this
Only by using Conditional Formatting.
First, we need to make sure that the column in which we will be simulating blank cells (Name) is sorted correctly. We need to group the same filenames together…
In our case, that was the result of Table.Join in Power Query and no sorting was needed, but in other scenarios, the table would have to be sorted by the Name column.
Now we can set up the Conditional Formatting rule that will hide all filenames beyond the first for each unique filename. As with any Conditional Formatting rule, we start off by selecting the cells where conditional formatting will be applied. In our case that’s the first column (Name).
Then we go to Home/Conditional Formatting/New Rule
And in the New Formatting Rule we choose Use a formula to determine which cells to format
The formula we enter is
=COUNTIF($A$2:A2,A2)<>1
Which basically translates to Check if the current filename appeared more than once in a range from the beginning of the column and up to the current row. Now we set up formatting if the condition is met. We select the Format… button and on the Number tab select Custom and write the following custom Format code
0;-0;0;
The code says this: “Show positive numbers and show them rounded to a whole number, do the same with negative numbers and zeroes. But don’t (!!!) show text“. If you want to learn more about this custom cell Format, a great place to start is the following article:
Only the first appearance of each filename is shown and every successive one is hidden by the conditional formatting rule we’ve set up.
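To see which rows the rule hits, here is a small Python sketch of the running count behind =COUNTIF($A$2:A2,A2)<>1. The filenames are invented for the example; the point is only that the count is taken from the top of the column down to the current row.

```python
def hidden_flags(names):
    """Return True for every row where the conditional format applies,
    i.e. the name has already appeared above the current row."""
    flags = []
    for i, name in enumerate(names):
        # COUNTIF($A$2:A2, A2): occurrences from the first row to this one
        count_so_far = names[: i + 1].count(name)
        flags.append(count_so_far != 1)
    return flags

# Hypothetical Name column, already grouped by filename
names = ["a.xlsx", "a.xlsx", "a.xlsx", "b.xlsx", "b.xlsx", "c.xlsx"]
flags = hidden_flags(names)
for name, hide in zip(names, flags):
    print("" if hide else name)
```

The first appearance of each filename gets False (shown) and every later one gets True (hidden by the custom format).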
To achieve the same in the second column (NumberOfApperances), we repeat the process from above with two distinct differences. First difference being the selection of the column. We select the NumberOfApperances column and not the Name column. Then we follow the checklist from the first sample up to a point where we write the same formula…
And now we arrive at the second difference, the custom Format. This time the code will be
;-0;0;@
This one translates to: “hide positive numbers, show negative numbers rounded to a whole number, show zeroes and text“.
It’s the newest “Quick Excel Guide” template in Excel and all I can say is BRAVO Microsoft. As the name implies, the target audience are Excel beginners, but it’s also quite a comprehensive short guide from A to Z or better said, it takes you from basic formulas to Pivot Tables. What I really like about it is the fact that on Page one there’s also a Ctrl+= shortcut for SUM. Leaving the usefulness of that one shortcut aside, I love the attitude and the completeness of the “educational” service where even an advanced Excel user can pick a few nice pointers. I think completing the tasks in this template should be a quarterly exercise for Excel newbies in every company.
I strongly encourage you to try it out! You can find it on Excel Start Screen or if you go to File|New
If you can’t find it there, you can use the following links
The idea of the template is to take you on a 10-stop journey (notice the 10 Sheets below).
Each Sheet is a topic of its own, but I suggest you go through them in the given order. Challenges presented on each sheet begin with a problem and a step-by-step solution of the given problem. Finishing that, you can dig deeper into the given topic with many tips & tricks and important remarks.
The only section that does leave you wanting more (even keeping in mind the target audience) is PivotTables. The feeling is that it was done in a hurry and with a lot of room for improvement which the two “More information” links don’t fill.
But I’ll say it once again: well done, Microsoft. I encourage readers to try it out and to test it on some Excel newbies. I think it is a useful resource that every company could (should) find a use for.
The other day I made a Dashboard with a classic analysis by time and other relevant dimensions. The data was pulled from Azure SQL DB, so Power Query was the method I used to get the data into Excel. The dashboard build was very straightforward, until the customer desired to see a whole section that would show a “Random” analysis, or rather an analysis by random conditions. Sort of an element of surprise on the dashboard. With that in mind I jumped right into the “random” side of Power Query, since my goal was to only import a random subset of data. This post will take you through the process of creating a query that will return a random subset of a table in a different workbook.
We will take data from a simple Excel Table called factTbl, that looks like this…
…and we want the “random” part of the analysis to focus on a random value of the “Goes To” column. So basically, all the rows belonging to one of the Fab Four :)
We start with a blank Excel File and go to
Excel 2016:
Data/New Query/From File/From Workbook
Excel 2010 or 2013:
Power Query/From File/From Excel
Select the desired file, and the right table and choose Edit
At this point you should see something similar to this.
Now we should rename the last step, so it will be easier for us to recall it later. We do this simply by right clicking on the step name (Changed Type) and selecting Rename.
I’ll name it BasicData. You can call it anything you want (if the name is not already in use), but make it sensible, so you can refer to it any time.
Now we are ready to plant that random seed :)
First we should remove all other columns, except for the “Goes To” column. Right click the “Goes To” column header and select Remove Other Columns.
Next up, we remove all the duplicate values from the “Goes To” column. Same procedure, right click the column header and select Remove Duplicates.
Now, we start writing a function that will select a random Beatle :)
Click the fx icon in the Formula Bar, leave the automatic formula as is, and just add [Goes To]{0} at the end.
If you did everything by the book, you got
= #"Removed Duplicates"[Goes To]{0}
What that essentially means is: “Based on the result of the Removed Duplicates step (that is the “table” we are working with), go to the “Goes To” column and return the first (!!!) element.” Power Query is zero-based, so 0 means the first element. You can play around with this by replacing the 0 with any other number up to 3, and you will get the corresponding Beatle. But the trick is how to make that number random. For that we need something like Excel’s RANDBETWEEN function in Power Query. The Power Query equivalents of the Excel functions are:
Excel Function | Power Query Function
=RAND() | =Number.Random()
=RANDBETWEEN(min,max) | =Number.RandomBetween(min,max)
But there is a catch! RAND and Number.Random are equivalent and work basically the same, but Number.RandomBetween does NOT work like RANDBETWEEN in Excel. The Excel version returns a whole (!!!) number between the min and max values, whereas the Power Query version returns a random decimal number between them. So to use it in place of the {0} in our formula, we need to round it down to a whole number. And to give all four numbers a (nearly) equal chance, we will use the following.
Number.RoundDown(Number.RandomBetween(0,3.99))
After replacing the 0 above with the function above, we get this:
Remark: Since we know there will never be another Beatle, the top can be a constant (3.99), but realistically we should add a row count of the table of unique values to the query to make the top dynamic. The function we would use to count the number of rows would be = Table.RowCount(#"Sorted Rows")
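As a rough illustration of the dynamic version, here is the same idea in Python: draw a decimal between 0 and the row count minus 0.01, then round down to get a zero-based index. The random_index helper and the order of the list are assumptions for the example; random.uniform stands in for Number.RandomBetween and math.floor for Number.RoundDown.

```python
import math
import random

def random_index(row_count, rng=random):
    """Pick a zero-based index by flooring a random decimal,
    mirroring Number.RoundDown(Number.RandomBetween(0, 3.99))
    with the constant 3.99 replaced by row_count - 0.01."""
    return math.floor(rng.uniform(0, row_count - 0.01))

# Hypothetical list of unique "Goes To" values
beatles = ["John", "Paul", "George", "Ringo"]
pick = beatles[random_index(len(beatles))]
print(pick)
```

The upper bound stops just short of the row count, so the floored result can never point past the last row. The last index is drawn very slightly less often than the others, which is the price of the 3.99 trick.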
Now go wild with the Refresh button and you will see, that you do get a random Beatle…
This is our magical step that will give us that randomness. To make the reuse of this result as simple as we can, we should rename this last step into something more sensible. I’ll rename it to Beatle.
Now let’s call back our original table. Click the fx icon in the Formula Bar and call back the name you gave the original table (in my case that is BasicData).
=BasicData
Now we set up the filter that we need. So select the filter in the “Goes To” column and filter for your favorite Beatle.
With this we created a formula that only needs a little tweak…
…all that remains is to replace the constant “John” (in my case) with our random variable Beatle (in my case, you might have used a different name).
And then once again go wild with the Refresh button :)
At this point you can try and create more filters by other columns and add some more randomness to your result.
But to bring this post home, just Close & Load…
…the data into Excel and then if you so desire, you can go wild with Refreshing the Query.
We start with a range of values in Excel (A1:G20).
Now we are looking for a formula to get the closest value to the value we input in cell I1. First we’re looking for the closest value lower than the selected one.
Closest value lower than selected
Here’s the formula
=SMALL($A$1:$G$20,COUNTIF($A$1:$G$20,"<"&I1))
The key to this formula is the function SMALL which works like this
=SMALL(Range,n)
And it returns the n-th smallest number in a selected range. In our case, we pair SMALL with COUNTIF which simply gives us a count of how many values are smaller than the chosen number. Once this count is inserted into the SMALL function, we actually get the largest value smaller than the selected number, so exactly what we need.
What about the closest number higher than the selected one?
Closest value higher than selected
Here’s the formula
=LARGE($A$1:$G$20,COUNTIF($A$1:$G$20,">"&I1))
The key to this formula is the function LARGE, which is the “big” sister of the SMALL function mentioned above. Here’s the function’s syntax.
=LARGE(Range,n)
You would never have guessed it, but LARGE returns the n-th largest number in a selected range. This time the COUNTIF function simply gives us a count of how many values are larger than the chosen number. This in combination with the LARGE function, gives us the smallest value larger than the selected number, so again, just what the doctor ordered.
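If it helps to see both formulas as plain logic, here is a short Python sketch of the SMALL/COUNTIF and LARGE/COUNTIF combinations. The sample values are made up and the helper names are mine. One assumption to note: when no value qualifies, Excel’s SMALL and LARGE return a #NUM! error, which the sketch represents as None.

```python
def closest_lower(values, target):
    # COUNTIF(range, "<" & target): how many values are below the target
    n = sum(1 for v in values if v < target)
    if n == 0:
        return None  # Excel would show #NUM! here
    # SMALL(range, n): the n-th smallest value
    return sorted(values)[n - 1]

def closest_higher(values, target):
    # COUNTIF(range, ">" & target): how many values are above the target
    n = sum(1 for v in values if v > target)
    if n == 0:
        return None  # Excel would show #NUM! here
    # LARGE(range, n): the n-th largest value
    return sorted(values, reverse=True)[n - 1]

# Hypothetical range contents and input value
values = [12, 7, 30, 18, 25, 7, 41]
print(closest_lower(values, 20))   # 18
print(closest_higher(values, 20))  # 25
```

Four values are below 20, and the 4th smallest is 18; three values are above 20, and the 3rd largest is 25.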
You can download the sample workbook that I used for all the screenshots here.
This was a very short post that could easily be entitled “what do functions LARGE and SMALL do?”, and trust me, these are two functions that you should know and use during your Excel adventures. Putting aside the obvious use of simply finding the second largest or 10th smallest value, and the use implied in this post, you could actually use them to SUM up the 10 largest values in a range, which I wrote about in THE SUM OF 10 LARGEST VALUES post, which I strongly urge you to read.