Excel help?
Feb. 15th, 2018 10:21 am
I have a spreadsheet with approx forty thousand rows. Around 6000 of them are irrelevant - they're mixed in with the rest, but are identifiable based on data in one of the columns. The data covers three years. The years are not recorded as proper dates, but as plain text saying things like 2015/16.
My task is that for each year, I need to count the unique values in one of the columns. This column contains only text, no blanks. It's not made out of meaningful English words, but serial numbers containing letters and digits. The values in each column are repeated anywhere from 1 to 60 times; I just want to know how many different serial numbers there are, not overall, but separately for each year.
| Year | Serial Number | Flag |
|---|---|---|
| 2015/16 | AAA111 | Relevant |
| 2015/16 | AAA128 | Irrelevant |
| 2016/17 | AAA111 | Irrelevant |
| 2016/17 | AAB139 | Relevant |
| 2016/17 | AAA111 | Relevant |
My own knowledge of Excel, supplemented by searching for things like "Excel count unique values", isn't quite sufficient. Anything that involves doing this semi-manually (e.g. sorting the columns then counting by hand) is unfeasible over tens of thousands of rows. Anything automated needs to not make Excel choke with a large-ish spreadsheet.
I tried following the instructions in this official Microsoft article, and it's not quite working for my case. As soon as I try to filter a column by "unique values only", it overrides the filter I started with for taking out the 6000 rows marked 'irrelevant' in a different column. Even worse than that, instead of copying the roughly 10,000 cells in the same column that I tried to select, it copies a whole chunk of the spreadsheet with several columns and I'm not sure exactly how it's related to the area I selected. The second method, count using functions, I don't understand well enough to try, and since the method I thought I understood behaved very unexpectedly, I don't want to start blindly pasting in a formula I really won't be able to debug.
Does anyone have any suggestions for how to approach this?
(The reason why I'm trying to wrangle this myself rather than delegating it to someone who has relevant expertise is, well, annoying work politics. But the fact remains that I need to do it.)
(no subject)
Date: 2018-02-15 10:38 am (UTC)
Then you want COUNTIFS. That lets you count on more than one condition.
There's a page here:
https://exceljet.net/formula/count-unique-text-values-with-criteria
which does counting of unique values using FREQUENCY and MATCH. I'm slightly too much at work myself to put these together, but they're the things I'd look at. I'd be aiming for a column which was TRUE if my serial number didn't occur in any of the rows above it, but FALSE if it did (that's the FREQUENCY and MATCH thing), and then I'd COUNTIFS where the year was 2015/16 and the FREQUENCY-MATCH thing was TRUE and the relevance was Relevant.
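For anyone who wants to sanity-check the logic outside Excel, here is a minimal Python sketch of the helper-column idea, using the five sample rows from the post. One assumption to flag loudly: the "first occurrence" test here is applied per year and only among rows flagged Relevant, because a serial number that first shows up in 2015/16 would otherwise not be counted again in 2016/17. The function names are made up for illustration.

```python
# Sample rows copied from the table in the post: (year, serial number, flag).
rows = [
    ("2015/16", "AAA111", "Relevant"),
    ("2015/16", "AAA128", "Irrelevant"),
    ("2016/17", "AAA111", "Irrelevant"),
    ("2016/17", "AAB139", "Relevant"),
    ("2016/17", "AAA111", "Relevant"),
]


def first_occurrence_flags(rows):
    """Helper column: True if the row is Relevant and its (year, serial)
    pair has not appeared in any Relevant row above it."""
    seen = set()
    flags = []
    for year, serial, flag in rows:
        key = (year, serial)
        is_first = flag == "Relevant" and key not in seen
        if is_first:
            seen.add(key)
        flags.append(is_first)
    return flags


def count_flags_by_year(rows, flags):
    """The COUNTIFS step: count the True helper values for each year."""
    counts = {}
    for (year, _serial, _flag), is_first in zip(rows, flags):
        counts[year] = counts.get(year, 0) + int(is_first)
    return counts


helper = first_occurrence_flags(rows)
print(helper)                             # [True, False, False, True, True]
print(count_flags_by_year(rows, helper))  # {'2015/16': 1, '2016/17': 2}
```

On the sample data this gives one unique relevant serial for 2015/16 and two for 2016/17, which matches counting by eye.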
(no subject)
Date: 2018-02-15 10:49 am (UTC)
(no subject)
Date: 2018-02-15 10:55 am (UTC)
(no subject)
Date: 2018-02-15 11:00 am (UTC)
=IF(AND(A2="2015/16",C2="Relevant"),B2,"")
which gives me the serial number only if the year is 2015/16 and the flag is Relevant. (You can type that in the first row and then drag it down, or copy and paste it down, and it will do the right thing in all further rows.)
Now you can filter on just that column, so you only have one filter in play. Does that work for filtering on unique values only?
(no subject)
Date: 2018-02-15 01:00 pm (UTC)
(no subject)
Date: 2018-02-15 02:00 pm (UTC)
Just repeating this, as it is the single most useful piece of Excel advice ever and I've had to dig someone else out of not doing it today...
(no subject)
Date: 2018-02-15 10:40 am (UTC)
Do you have Access or another database program you're familiar with? If so it can probably import the .xls, and databases are made for this sort of query. IIRC you want an aggregate query with three columns, 'group by' on the year and on the serial number, and 'count' on another column (maybe one that's unique for each row, like a primary id?)
Alternatively, looking at the knowledge base article, I don't quite know how the filter works, but can you specify both conditions (unique and year) in one filter, instead of two successive filters? I don't know if you can, but that seems to be what's needed.
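As a rough illustration of the database route, here is a sketch using Python's built-in sqlite3 module as a stand-in for Access, loaded with the five sample rows from the post. The table name `sheet` and the column names are assumptions made up for the example. Note that grouping by year and serial number gives the number of rows per serial; to get the number of different serials per year, one option is `COUNT(DISTINCT ...)` grouped by year alone, which is what this sketch does.

```python
import sqlite3

# Sample rows copied from the table in the post: (year, serial number, flag).
rows = [
    ("2015/16", "AAA111", "Relevant"),
    ("2015/16", "AAA128", "Irrelevant"),
    ("2016/17", "AAA111", "Irrelevant"),
    ("2016/17", "AAB139", "Relevant"),
    ("2016/17", "AAA111", "Relevant"),
]

# In-memory database; "sheet" is a hypothetical table name for the import.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet (year TEXT, serial TEXT, flag TEXT)")
conn.executemany("INSERT INTO sheet VALUES (?, ?, ?)", rows)

# Drop the irrelevant rows, then count distinct serials within each year.
query = """
    SELECT year, COUNT(DISTINCT serial) AS unique_serials
    FROM sheet
    WHERE flag = 'Relevant'
    GROUP BY year
    ORDER BY year
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)  # [('2015/16', 1), ('2016/17', 2)]
```

The filter and the uniqueness condition live in the same query here, so neither overrides the other.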
(no subject)
Date: 2018-02-15 10:57 am (UTC)
(no subject)
Date: 2018-02-15 11:03 am (UTC)
Can you create a new column which is the concatenation of the serial number and year? And then count all the unique entries in that? So you get "AAA111 2015/16" "30" "AAA112 2015/16" "25" "AAA111 2016/17" "19" etc
Or, can you filter by year, then copy all the cells to a separate sheet, and then do the unique counting thing? I'm not sure that's how copying when you filter works, but I think it might be (and of course, it might be too slow, I'm not sure).
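The concatenation idea can be sketched like this in Python, again with the five sample rows from the post. Two assumptions: irrelevant rows are dropped before the keys are built, and a single space is used as the separator (which only works because neither the serials nor the year labels contain spaces). Variable names are illustrative only.

```python
from collections import Counter

# Sample rows copied from the table in the post: (year, serial number, flag).
rows = [
    ("2015/16", "AAA111", "Relevant"),
    ("2015/16", "AAA128", "Irrelevant"),
    ("2016/17", "AAA111", "Irrelevant"),
    ("2016/17", "AAB139", "Relevant"),
    ("2016/17", "AAA111", "Relevant"),
]

# New "column": serial number and year joined together, relevant rows only.
keys = [f"{serial} {year}" for year, serial, flag in rows if flag == "Relevant"]

# How many rows carry each combined key, e.g. "AAA111 2015/16" -> 1.
occurrences = Counter(keys)

# Each distinct key is one (serial, year) pair, so counting the distinct
# keys per year gives the number of unique serials in that year.
unique_per_year = Counter(key.rsplit(" ", 1)[1] for key in occurrences)

print(sorted(unique_per_year.items()))  # [('2015/16', 1), ('2016/17', 2)]
```

The per-key row counts come out as a by-product, but the figure the post asks for is the number of distinct keys in each year.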
(no subject)
Date: 2018-02-15 11:18 am (UTC)
(no subject)
Date: 2018-02-15 11:20 am (UTC)
(no subject)
Date: 2018-02-15 11:28 am (UTC)
Then you can have a little summary set of your choice of sumifs or countifs for each year: =countifs( [Year column: Year column], "[Year]", [helper column: helper column ], 1)
If you need the spreadsheet to be resortable after, just copy the helper column and save as values first.
(no subject)
Date: 2018-02-15 11:51 am (UTC)
You can see how many variations you have to convert by selecting the entire Year column, copying it, opening a new sheet, pasting as values there, and while you have that copied column still selected using the extremely handy option under the Data tab to 'Remove Duplicates'.
(no subject)
Date: 2018-02-15 11:40 am (UTC)
- 2015/16
  AAA111
- 2016/17
  AAA111
  AAB139
Which you can easily then count from to get your answers (if necessary, you can select all the rows in a given category, and the "count" will show in the toolbar on the bottom right of the Excel window).
If you want any more help with pivot tables, feel free to give me a shout! I love them a lot.
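For comparison, the grouped layout shown above (serial numbers listed once under each year) can be reproduced with a small Python sketch on the five sample rows from the post. It assumes the rows flagged Irrelevant are filtered out first, which is consistent with AAA128 not appearing in the listing; the variable names are made up for the example.

```python
from collections import defaultdict

# Sample rows copied from the table in the post: (year, serial number, flag).
rows = [
    ("2015/16", "AAA111", "Relevant"),
    ("2015/16", "AAA128", "Irrelevant"),
    ("2016/17", "AAA111", "Irrelevant"),
    ("2016/17", "AAB139", "Relevant"),
    ("2016/17", "AAA111", "Relevant"),
]

# Year -> set of serials; a set keeps each serial once however often it repeats.
serials_by_year = defaultdict(set)
for year, serial, flag in rows:
    if flag == "Relevant":
        serials_by_year[year].add(serial)

# Print the grouped listing, with the per-year count underneath.
for year in sorted(serials_by_year):
    print(f"- {year}")
    for serial in sorted(serials_by_year[year]):
        print(f"  {serial}")
    print(f"  count: {len(serials_by_year[year])}")
```

The count line plays the role of selecting a category's rows and reading the count off the status bar.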
(no subject)
Date: 2018-02-15 07:17 pm (UTC)
(no subject)
Date: 2018-02-15 07:58 pm (UTC)
(no subject)
Date: 2018-02-15 08:44 pm (UTC)
Who are you? :)
(no subject)
Date: 2018-02-15 08:50 pm (UTC)
(no subject)
Date: 2018-02-15 08:51 pm (UTC)
If I did that right, it should return:
(no subject)
Date: 2018-02-17 09:10 am (UTC)