This will not be my most scintillating post of the month...or even the day. If you're not interested in transferring data from webpages to spreadsheets, just keep moving along, nothing to see here, folks. But today I found an easy way to take rows of HTML data and convert it to columns in an Excel spreadsheet--I'd expected this to be a gruesomely time-consuming task, so I thought I'd document the process for future reference.
To illustrate, let's say you have a some data in alternating rows on a web page, like this:
Name: John Doe
Site: www.johndoe.com
Email: [email protected]Name: Jane Roe
Site: www.janeroe.com
Email: [email protected]Name: Ron Mexico
Site: www.ronmexico.com
Email: [email protected]
And you want it all in columnar format in an Excel spreadsheet, sort of like this:
Name Site John Doe www.johndoe.com [email protected] Jane Roe www.janeroe.com [email protected] Ron Mexico www.ronmexico.com [email protected]
Here's what you need to do:
Step One: Transfer the HTML Data To Excel
- Click-and-drag to select the HTML data.
- Copy and paste it all into a single column on a sheet--easy.
Step Two: Remove Extraneous Data
- Every row in the spreadsheet will now have extraneous data, such as "Name:" or "Site:"
- Select the entire range, and use the Replace function to delete any extraneous data. (Replacing "Name:", for example, with nothing will delete it.) Repeat as necessary.
- You should be left with only the data you want to display in columnar form, but trapped in alternating rows.
Step Three: Filter and Copy the Data by Row Type
- OK, now it gets tricky. We need to configure the data in the column so that only the rows of a certain type are visible, and all the other cells are hidden. Then we'll copy and paste the visible cells to a separate column on a new sheet, repeating the process for each separate type of row.
- In the column adjoining the one with the data, insert the following formula in the first cell:
=MOD(ROW(1:1),x)=(x-(x-1))
- Having entered the formula in the first cell, grab the handle and extend it down the entire column (or just copy and paste, whatever method you prefer.)
- You should now have two columns: one with your data in alternating rows, and a second with cells reading TRUE or FALSE. If you have three types of rows, every third row, beginning with the first, should read TRUE and the others should read FALSE.
- Insert an Autofilter so you can sort the data by the TRUE and FALSE values. Go to Data > Filter, and select "Autofilter". A little rounded box with a downward-pointing arrow will appear in the top cell of each column.
- Click the rounded box at the top of the second column, and select "TRUE". Only the rows with TRUE as the value of the second column will remain visible; all others will disappear.
- Select the visible cells. If you have a large amount of data--and you should; otherwise, why the hell are you doing this?--hit F5 > Special > Visible Cells Only.
- Copy and paste the selected cells to a new sheet. You should have two columns, the first one consisting only of data from rows with TRUE in the second column from your original data set, and the second one a series of TRUE values. Delete the second column and return to the original sheet.
- Click the rounded box at the top of the second column (the TRUE/FALSE column, which now shows only TRUE values), and select (All). All of your original data should now be visible again.
- You now need to alter the formula in the second column, so that it reads as follows:
=MOD(ROW(1:1),x)=(x-(x-2))
- Repeat the steps above x times, increasing the last number in the final variable by 1 each time until it equals x. At that point, substitute 0 for the entire final variable before proceeding with the final series. Each time you complete these series of steps, you'll have a selection of data to paste to your second sheet consisting only of data from one type of row in the original data set (along with some TRUE values, which can be deleted.)
- The final result should be an Excel sheet with data in columnar form, which will allow you to easily select all the data in a single column, such as names or email addresses.
You can thank me later.
tag: excel