From HTML to Excel

From HTML to Excel

One of our projects makes extensive use of web-based reports. Our customer, however, often prefers to extract these reports to Excel, for further analysis or manipulation. This is easy with Internet Explorer, as there's a right-click option "Export to Excel", which handily takes anything on a web page, especially a table, and automatically formats and imports it into a spreadsheet. Except for formatting. Our customer has been troubled with leading zeros in account numbers (a lot of these being General Ledger reports), and Excel stubbornly insists on treating these as numbers, and thus dropping the leading zero. We needed to find a way to give a hint to Excel, to tell it to treat this data as text, not numbers.

So, we searched for a solution on the web, but it surprisingly took a bit of time to find a simple solution. In fact, could find nothing on Microsoft's own support site. So, when we finally solved it, we thought it'd be worth a write-up, and hopefully make this solution easier to find for future searchers.

The solution was finally found courtesy of 15 Seconds, an ASP tutorials website. An article written by Bill Jeffries, Updating Excel from the Web, gave us the hint we needed.

Now, we're pretty much sticklers for keeping to web standards. Would normally frown upon any non-standard HTML being inserted into our code; our application's front-end uses CSS for all formatting, and the output (driven by servlets, which themselves deal with a data presentation layer that's pure XML) is HTML 4.0 Strict compliant.

But, we'll make a bit a bit of exception in this case, to adopt a Microsoft-specific style attribute. Actually, the output is still HTML 4.0 compliant; it just isn't standard CSS. No problem; doesn't mess up other browsers, and it's just a hint for the Internet Explorer "Export to Excel" option anyway. At least Microsoft chose to do this the "right" way -- formatting hints are declared as style sheet directives, thus cleanly separating data from formatting. Nice.

The solution? The mso-number-format style attribute, to be put on table cells (<td>). Several number formats are available. These are some of the more common:

mso-number-format:\@
text
mso-number-format:"0\.000"
3 decimals
mso-number-format:\#\,\#\#0\.000
comma separators (and 3 decimals)
mso-number-format:"mm\/dd\/yy"
Date format
mso-number-format:"d\\-mmm\\-yyyy"
another date format
mso-number-format:Percent
percent

'mso' stands for Microsoft Office, so these formatting hints will hold if table data is imported into any Office product.

To solve our initial problem -- not dropping leading zeros from account numbers, we used the text format in a CSS style sheet:

 td.accountnum {mso-number-format:\@} 

Then, on the the actual table, just use the  accountnum class:
 <td class="accountnum">01070000<td> 

On the web, the report table still looks the same. But when imported into Excel, via the "Export to Excel" menu item in Internet Explorer, or via a "web query" from within Excel, the account numbers won't be converted into numbers, thus saving the leading zero. Problem solved

There are several other ways to interface between web pages and Excel. We've considered creating a XSL stylesheet that will convert our XML output into the Microsoft XML definition. This would be fairly simply (we already use XSL extensively on this project), but for now, taking advantage of Internet Explorer and Excel's integration capabilities, this simple trick for converting HTML tables into a spreadsheet works just fine for our users.

你可能感兴趣的:(Excel)