Reading Excel Files In .NET Core

Opening Excel files in code has been a painful experience long before .NET Core came along. In many cases, you actually needed the Excel application installed on the target/users machine to be able to open excel files via code. If you’ve ever had to use those “OLE DB Jet ” queries before, you know it’s not a great experience. Luckily there are some pretty good open source solutions now that don’t require excel on the target machine. This is good for Windows users so that you don’t have to install excel on a target users machine or web server, but also for people hosting .NET Core applications on Linux (And even Mac/ARM) – where Excel is obviously no where to be seen!

My methodology for this article is pretty simple. Create a standardized excel workbook with a couple of sheets, couple of formulas, and a couple of special formatting cases. Read the same data out in every single library and see which one works the best for me. Simple! Let’s get going!

Note On CSV Formats

I should note that if you are reading a CSV, or more so a single excel sheet that doesn’t have formulas or anything “excel” specific on the sheet, you should instead just parse it using standard CSV technique. We have a great article here on parsing CSV in .NET Core that you should instead follow. CSV parsers are great for taking tabular data and deserializing it into objects and should be used where they can.

Example Data

I figure the best way to compare the different libraries on offer is to create a simple spreadsheet to compare the different ways we can read data out. The spreadsheet will have two “sheets”, where the second sheet references the first.

Sheet 1 is named “First Sheet” and looks like so :

Notice that cell A2 is simply the number “1”. Then in column B2, we have a reference to cell A2. This is because we want to check if the libraries allow us to not only get the “formula” from the cell, but also what the computed value should be.

We are also styling cell A2 with a font color of red, and B2 has a full border (Although hard to see as I’m trying to show the fomula). We will try and extract these styling elements out later.

Sheet 2 is named “Second Sheet” and looks like so :

So we are doing a simple “SUM” formula and referencing the first sheet. Again, this is so we can test getting both the formula and the computed value, but this time across different sheets. It’s not complicated for a person used to working with Excel, but let’s see how a few libraries handle it.

In general, in my tests I’m looking for my output to always follow the same format of :

Sheet 1 Data
Cell A2 Value   : 
Cell A2 Color   :
Cell B2 Formula :
Cell B2 Value   :
Cell B2 Border  :

Sheet 2 Data
Cell A2 Formula :
Cell A2 Value   :

That way when I show the code, you can pick the library that makes the most sense to you.

EPPlus

When I first started hunting around for parsing excel in .NET Core, I remembered using EPPlus many moons ago for some very lightweight excel parsing. The nuget package can be found here : https://www.nuget.org/packages/EPPlus/. It’s also open source so you can read through the source code if that’s your thing here : https://github.com/JanKallman/EPPlus

The code to read our excel spreadsheet looks like so :

static void Main(string[] args)
{
    using(var package = new ExcelPackage(new FileInfo("Book.xlsx")))
    {
        var firstSheet = package.Workbook.Worksheets["First Sheet"];
        Console.WriteLine("Sheet 1 Data");
        Console.WriteLine($"Cell A2 Value   : {firstSheet.Cells["A2"].Text}");
        Console.WriteLine($"Cell A2 Color   : {firstSheet.Cells["A2"].Style.Font.Color.LookupColor()}");
        Console.WriteLine($"Cell B2 Formula : {firstSheet.Cells["B2"].Formula}");
        Console.WriteLine($"Cell B2 Value   : {firstSheet.Cells["B2"].Text}");
        Console.WriteLine($"Cell B2 Border  : {firstSheet.Cells["B2"].Style.Border.Top.Style}");
        Console.WriteLine("");

        var secondSheet = package.Workbook.Worksheets["Second Sheet"];
        Console.WriteLine($"Sheet 2 Data");
        Console.WriteLine($"Cell A2 Formula : {secondSheet.Cells["A2"].Formula}");
        Console.WriteLine($"Cell A2 Value   : {secondSheet.Cells["A2"].Text}");
    }
}

Honestly what can I say. This was *super* easy and worked right out of the box. It picks up formulas vs text perfectly! The styles on our first sheet was also pretty easy to get going. The border is slightly annoying because you have to check the “Style” of the border, and if it’s a style of “None”, then it means there is no border (As opposed to a boolean for “HasBorder” or similar). But I think I’m just nit picking, EPPlus just works!

NPOI

NPOI is another open source option with a Github here : https://github.com/tonyqus/npoi and Nuget here : https://www.nuget.org/packages/NPOI/. It hasn’t had a release in over a year which isn’t that bad because it’s not like Excel itself has tonnes of updates throughout the year, but the Issues list on Github is growing a bit with a fair few bugs so keep that in mind.

The code to read our data using NPOI looks like so :

…..

…Actually you know what. I blew a bunch of time on this to try and work out the best way to use NPOI and the documentation is awful. The wiki is here : https://github.com/tonyqus/npoi/wiki/Getting-Started-with-NPOI but it has a few samples but most/all of them are about creating excel workbooks not reading them. I saw they had a link to a tutorial on how to read an Excel file which looked promising, but it was literally reading the spreadsheet and then dumping the text out.

After using EPPlus, I just didn’t see any reason to continue with this one. Almost every google answer will lead you to StackOverflow with people using NPOI with such specific use cases that it never really all pieced together for me.

ExcelDataReader

ExcelDataReader appeared in a couple of stackoverflow answers on reading excel in .NET Core. Similar to others in this list, it’s open source here : https://github.com/ExcelDataReader/ExcelDataReader and on Nuget here : https://www.nuget.org/packages/ExcelDataReader/

I wanted to make this work but…. It just doesn’t seem intuitive at all. ExcelDataReader works on the premise that you are reading “rows” and “columns” sequentially in almost a CSV fashion. That sort of works but if you are looking for a particular cell, it’s rough as hell.

Some example code :

static void Main(string[] args)
{
    System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
    using (var stream = File.Open("Book.xlsx", FileMode.Open, FileAccess.Read))
    {
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            do
            {
                while (reader.Read()) //Each ROW
                {
                    for (int column = 0; column < reader.FieldCount; column++)
                    {
                        //Console.WriteLine(reader.GetString(column));//Will blow up if the value is decimal etc. 
                        Console.WriteLine(reader.GetValue(column));//Get Value returns object
                    }
                }
            } while (reader.NextResult()); //Move to NEXT SHEET

        }
    }
}

The first line in particular is really annoying (Everything blows up without it). But you’ll notice that we are moving through row by row (And sheet by sheet) trying to get values. Ontop of that, calling things like “GetString” doesn’t work if the value is a decimal (Implicit casts would have been better IMO). I also couldn’t find any way to get the actual formula of the cell. The above only returns the computed results.

I was going to slog my way through and actually get the result we were looking for, but it’s just not a library I would use.

Syncfusion

Syncfusion is one of those annoying companies that create pay-to-use libraries, upload them to nuget, and then in small print  say you need to purchase a license or else. Personally, I would like to see Microsoft not allow paid libraries into the public Nuget repo. I’m going to include them here but their licensing starts at $995 per year, per developer, so I don’t see much reason to use it for the majority of use cases. The nuget page can be found here https://www.nuget.org/packages/Syncfusion.XlsIO.Net.Core/

The code looks like :

static void Main(string[] args)
{
    ExcelEngine excelEngine = new ExcelEngine();
    using (var stream = File.Open("Book.xlsx", FileMode.Open, FileAccess.Read))
    {
        var workbook = excelEngine.Excel.Workbooks.Open(stream);

        var firstSheet = workbook.Worksheets["First Sheet"];
        Console.WriteLine("Sheet 1 Data");
        Console.WriteLine($"Cell A2 Value   : {firstSheet.Range["A2"].DisplayText}");
        Console.WriteLine($"Cell A2 Color   : {firstSheet.Range["A2"].CellStyle.Font.RGBColor.Name}");
        Console.WriteLine($"Cell B2 Formula : {firstSheet.Range["B2"].Formula}");
        Console.WriteLine($"Cell B2 Value   : {firstSheet.Range["B2"].DisplayText}");
        Console.WriteLine($"Cell B2 Border  : {firstSheet.Range["B2"].CellStyle.Borders.Value}");
        Console.WriteLine("");

        var secondSheet = workbook.Worksheets["Second Sheet"];
        Console.WriteLine($"Sheet 2 Data");
        Console.WriteLine($"Cell A2 Formula : {secondSheet.Range["A2"].Formula}");
        Console.WriteLine($"Cell A2 Value   : {secondSheet.Range["A2"].DisplayText}");
    }
}

So not bad. I have to admit, I fiddled around trying to understand how it worked out borders (As the above code doesn’t work), but gave up. The font color also took some fiddling where the library returns non standard objects as the color. Some of the properties for the actual data are also a bit confusing where you have value, text, displaytext etc. All returning slightly different things so you sort of have to just spray and pray and see which one works.

If EPPlus didn’t exist, and Syncfusion wasn’t fantastically overpriced, this library would actually be pretty good.

TL;DR;

Use EPPlus. https://github.com/JanKallman/EPPlus

 

ENJOY THIS POST?
Join over 3.000 subscribers who are receiving our weekly post digest, a roundup of this weeks blog posts.
We hate spam. Your email address will not be sold or shared with anyone else.

11 comments

  1. There is also ClosedXML which is a thin wrapper around Microsoft’s OpenXML implementation. It’s not that popular comparing to EPPlus, but mature and still actively maintained.

    1. Assuming this is a plug, do you want to write a little bit about what makes ZetExcel great? It looks to be a paid library that can cost up to $300 so it’s going to have to be pretty good to beat the cost of Epplus (free).

    1. Yup, they just got me on that when I upgraded to version 5. You can still pull down 4.x, but you’ve got no support if you do.

  2. Great post. Just what I was looking for.
    Thank you so much.
    Btw, what will happen with apps built with EPPlus once it stops being free?

    1. As long as you’re running 4.x or older, you’re fine. It’s only 5.0 or newer that’s moved to the paid license.

  3. Awesome Rick, thank you.
    Do you have a recomendation on something to use to convert xlsx to pdf?
    Currently I’m using:
    excelWorkbook.ExportAsFixedFormat(Microsoft.Office.Interop.Excel.XlFixedFormatType.xlTypePDF, pdfFile);

    But this requires Excel installed on the server.

  4. I suspect that the limitations in ExcelDataReader are intentional, because it’s intended as an implementation of .NET’s data reader (via the IDataReader interface), which are always sequential and forward only (see https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/ado-net-architecture#choosing-a-datareader-or-a-dataset). The idea is, if you need random access, you use a DataSet which can be loaded from a DataReader; AFAICT ExcelDataReader supplies such access with an AsDataSet method.

Leave a Reply

Your email address will not be published. Required fields are marked *