One of the most popular posts on this blog is a very simple write-up on how to parse JSON in C# .NET. I mostly wrote it because I thought that there was definitely a “proper” way of doing things, and people were almost going out of their way to make life difficult for themselves when working with JSON.
I think working with XML is slightly different because (just IMO), there still isn’t a “gold standard” library for XML.
JSON has the incredible JSON.NET library to handle anything and everything, but in the majority of cases when you work with XML you’ll use one of the XML parsers built into .NET Core. These can be frustrating at times and incredibly brittle. Part of it is that they were created very early in the life of .NET and have had to stay backwards compatible ever since, so you lose out on things like generics. The other part is that the XML spec itself, with things like namespaces and DTDs, looks simple at first but can be incredibly harsh. By harsh I mean that things will just plain not work if you are missing one piece of the puzzle, and it can take hours to work out what’s wrong.
Anyway, let’s jump right in and check out our options for working with XML in C# .NET.
Our Example XML File
I’m going to be using a very simple XML file that has an element, an element with an attribute, and a list. I’ll use this same file for every option so that we are always comparing like with like.
<?xml version="1.0" encoding="utf-8" ?>
<MyDocument xmlns="https://dotnetcoretutorials.com/namespace">
    <MyProperty>Abc</MyProperty>
    <MyAttributeProperty value="123" />
    <MyList>
        <MyListItem>1</MyListItem>
        <MyListItem>2</MyListItem>
        <MyListItem>3</MyListItem>
    </MyList>
</MyDocument>
Using XmlReader
So the first option we have is the XmlReader class. It’s a forward-only XML parser, meaning you read the document one node at a time from start to finish and can’t go back. I’ll warn you now, it’s very, very primitive. For example, our code might look a bit like this:
using System;
using System.IO;
using System.Xml;

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (var fileStream = File.OpenText("test.xml"))
using (XmlReader reader = XmlReader.Create(fileStream, settings))
{
    // Read() advances to the next node and returns false at the end of the document.
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                Console.WriteLine($"Start Element: {reader.Name}. Has Attributes? : {reader.HasAttributes}");
                break;
            case XmlNodeType.Text:
                Console.WriteLine($"Inner Text: {reader.Value}");
                break;
            case XmlNodeType.EndElement:
                Console.WriteLine($"End Element: {reader.Name}");
                break;
            default:
                Console.WriteLine($"Unknown: {reader.NodeType}");
                break;
        }
    }
}
With the output looking like :
Unknown: XmlDeclaration
Start Element: MyDocument. Has Attributes? : True
Start Element: MyProperty. Has Attributes? : False
Inner Text: Abc
End Element: MyProperty
Start Element: MyAttributeProperty. Has Attributes? : True
Start Element: MyList. Has Attributes? : False
Start Element: MyListItem. Has Attributes? : False
Inner Text: 1
End Element: MyListItem
Start Element: MyListItem. Has Attributes? : False
Inner Text: 2
End Element: MyListItem
Start Element: MyListItem. Has Attributes? : False
Inner Text: 3
End Element: MyListItem
End Element: MyList
End Element: MyDocument
It sort of reminds me of using ADO.NET and reading data row by row and trying to store it in an object. The general idea is that because you only ever hold one node at a time, it’s less memory intensive. But you also have to handle each node individually, with any number of permutations of elements, attributes, lists and so on. I think the only reasons to use this method would be if you have extremely large XML files (100+MB), or you are looking for something very specific, e.g. you only want to read a single element from the file and you don’t want to load the entire thing while looking for it.
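To give a rough idea of that “single element” case, here is a minimal sketch. It assumes C# 9 top-level statements, and it reads an inline copy of the sample document (declaration left out) from a string instead of test.xml so that it runs on its own. ReadToFollowing and ReadElementContentAsString are standard XmlReader methods, but the surrounding structure is just one way you might do it.

```csharp
using System;
using System.IO;
using System.Xml;

// Inline copy of the sample document so the sketch does not need test.xml on disk.
const string xml = @"<MyDocument xmlns=""https://dotnetcoretutorials.com/namespace"">
  <MyProperty>Abc</MyProperty>
  <MyAttributeProperty value=""123"" />
  <MyList>
    <MyListItem>1</MyListItem>
    <MyListItem>2</MyListItem>
    <MyListItem>3</MyListItem>
  </MyList>
</MyDocument>";

string value = "(not found)";

using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    // Skip forward until the next element named MyProperty, reading nothing past it.
    if (reader.ReadToFollowing("MyProperty"))
    {
        // Reads the element's text content and moves past its end tag.
        value = reader.ReadElementContentAsString();
    }
}

Console.WriteLine(value); // prints "Abc"
```

With a real file you would pass a stream to XmlReader.Create instead of the StringReader, and the reader stops pulling data as soon as the element is found.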
Another thing I will point out is that the usual difficulty around XML namespaces wasn’t there with XmlReader. It just sort of powered through, and there wasn’t any issue around prefixes, namespaces, DTDs etc.
But again in general, I wouldn’t use XmlReader in the majority of cases.
Using XPathDocument/XPathNavigator
Another way of getting individual XML nodes, this time with the ability to “search” the document, is the XPathNavigator object.
First, the code :
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

using (var fileStream = File.Open("test.xml", FileMode.Open))
{
    //Load the file and create a navigator object.
    XPathDocument xPath = new XPathDocument(fileStream);
    var navigator = xPath.CreateNavigator();

    //Compile the query with a namespace prefix.
    XPathExpression query = navigator.Compile("ns:MyDocument/ns:MyProperty");

    //Do some BS to get the default namespace to actually be called ns.
    var nameSpace = new XmlNamespaceManager(navigator.NameTable);
    nameSpace.AddNamespace("ns", "https://dotnetcoretutorials.com/namespace");
    query.SetContext(nameSpace);

    Console.WriteLine("My Property Value : " + navigator.SelectSingleNode(query).Value);
}
Now honestly… This is bad and I made it bad for a reason. Namespaces here are really painful. In my particular case because I have a default namespace, this was the only way I could find out there that would get the XPath working. Without the namespace, things would actually be a cinch. So with that said I’m going to admit something here… I have totally used string replace functions to remove namespaces before… Now I know someone will jump in the comments and say “but the XML spec says blah blah blah”. I honestly think every headache I’ve ever had with working with XML has been because of namespaces.
So let me put a caveat on my recommendation here. If the document you are working with does not make use of namespaces (or you are willing to remove them), and you need to use an XPath expression to get a single node, then using the XPathNavigator actually isn’t a bad option. But that’s a big if.
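If you’d rather not strip namespaces out of the file, one alternative that is sometimes used is to match on local-name() in the XPath itself, which compares the element name and ignores the namespace. This is a rough sketch rather than a recommendation. It assumes C# 9 top-level statements and an inline copy of the sample document in place of test.xml.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Inline copy of the sample document so the sketch does not need test.xml on disk.
const string xml = @"<MyDocument xmlns=""https://dotnetcoretutorials.com/namespace"">
  <MyProperty>Abc</MyProperty>
  <MyAttributeProperty value=""123"" />
</MyDocument>";

XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();

// local-name() looks only at the element name, so no XmlNamespaceManager is needed.
XPathNavigator node = navigator.SelectSingleNode(
    "/*[local-name()='MyDocument']/*[local-name()='MyProperty']");

Console.WriteLine("My Property Value : " + node?.Value); // My Property Value : Abc
```

The trade-off is that the query will happily match an element called MyProperty from any namespace, and the expression gets noisy quickly, so it only really suits small one-off lookups.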
Using XmlDocument
XmlDocument can be thought of as an upgraded version of the XPathNavigator. It has a few easier methods to load documents, and allows you to modify the XML in memory too!
using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.Load("test.xml");

// Map the document's default namespace to a prefix we can use in XPath.
XmlNamespaceManager m = new XmlNamespaceManager(document.NameTable);
m.AddNamespace("ns", "https://dotnetcoretutorials.com/namespace");

Console.WriteLine(document.SelectSingleNode("ns:MyDocument/ns:MyProperty", m).InnerText);
Overall you still have to deal with some namespace funny business (e.g. default namespaces are not handled well), and you still have to get each element one by one as you need it, but I do think this is the best option if you are looking to pull out only a small subset of the XML doc. The fact that you can modify the XML and save it back to file is also a pretty good bonus.
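As a quick illustration of the modify-and-save part, here is a minimal sketch. It assumes C# 9 top-level statements, loads an inline copy of the sample document with LoadXml, and saves into a StringWriter so that nothing on disk is touched. The new value "Xyz" is made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;

// Inline copy of the sample document so the sketch does not need test.xml on disk.
const string xml = @"<MyDocument xmlns=""https://dotnetcoretutorials.com/namespace"">
  <MyProperty>Abc</MyProperty>
  <MyAttributeProperty value=""123"" />
</MyDocument>";

XmlDocument document = new XmlDocument();
document.LoadXml(xml);

XmlNamespaceManager m = new XmlNamespaceManager(document.NameTable);
m.AddNamespace("ns", "https://dotnetcoretutorials.com/namespace");

// Change the text of a single node in memory.
XmlNode property = document.SelectSingleNode("ns:MyDocument/ns:MyProperty", m);
if (property != null)
{
    property.InnerText = "Xyz"; // hypothetical new value
}

// Save to a string here; document.Save("test.xml") would write to a file instead.
StringWriter writer = new StringWriter();
document.Save(writer);
string saved = writer.ToString();

Console.WriteLine(saved);
```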
Using XmlSerializer
Now we are cooking with gas. XmlSerializer is, in my opinion, the very best way to parse XML in .NET Core. If you’ve deserialized JSON into classes with JsonConvert from JSON.NET before, then this is very close to being the same sort of setup.
First we simply create a class that models our actual XML file. We use a bunch of attributes to specify how to read the doc, which namespace we are using, and even what kind of node each property maps to (e.g. an attribute, element or array).
using System.Collections.Generic;
using System.Xml.Serialization;

[XmlRoot("MyDocument", Namespace = "https://dotnetcoretutorials.com/namespace")]
public class MyDocument
{
    public string MyProperty { get; set; }

    public MyAttributeProperty MyAttributeProperty { get; set; }

    // string is an assumption for the item type; List<int> would also fit the sample values.
    [XmlArray]
    [XmlArrayItem(ElementName = "MyListItem")]
    public List<string> MyList { get; set; }
}

public class MyAttributeProperty
{
    [XmlAttribute("value")]
    public int Value { get; set; }
}
Really really simple. And then the code to actually read our XML and turn it into this class :
using System;
using System.IO;
using System.Xml.Serialization;

using (var fileStream = File.Open("test.xml", FileMode.Open))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyDocument));
    var myDocument = (MyDocument)serializer.Deserialize(fileStream);

    Console.WriteLine($"My Property : {myDocument.MyProperty}");
    Console.WriteLine($"My Attribute : {myDocument.MyAttributeProperty.Value}");

    foreach (var item in myDocument.MyList)
    {
        Console.WriteLine(item);
    }
}
No messing about trying to get namespaces right, no trying to work out the correct XPath, it just works. I think once you start using XmlSerializer, you will wonder why you ever bothered trying to manually read out XML documents again.
Now there is a big caveat. If you don’t really care about the bulk of the document and you are just trying to get a really deep element, it can be painful creating these huge models and classes just to get a single element.
Overall, in 99.9% of cases, try and use XmlSerializer to parse XML. It’s less brittle than other options and follows a very similar “pattern” to that of JSON serialization, meaning anyone who has worked with one can work with the other.
What about LinqToXml and XDocument.Load() ?
True. I haven’t used XDocument that much. The Linq is nice but you’re still plucking the elements similar to XmlDocument. I also found the Linq wasn’t as straightforward (e.g. you still have to use XML-specific things like .Descendants() etc). It’s an option for sure, but I think if you want to pluck a single element, using XPath (with XmlDocument or XDocument) would be the way to go, and if you wanted the entire document XmlSerializer would be the way to go.
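For anyone curious what the XDocument route looks like against the same sample document, here is a rough sketch. It assumes C# 9 top-level statements and parses an inline copy of the document with XDocument.Parse; with a file you would call XDocument.Load("test.xml") instead.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Inline copy of the sample document so the sketch does not need test.xml on disk.
const string xml = @"<MyDocument xmlns=""https://dotnetcoretutorials.com/namespace"">
  <MyProperty>Abc</MyProperty>
  <MyAttributeProperty value=""123"" />
  <MyList>
    <MyListItem>1</MyListItem>
    <MyListItem>2</MyListItem>
    <MyListItem>3</MyListItem>
  </MyList>
</MyDocument>";

XDocument document = XDocument.Parse(xml);

// The namespace still matters: element names are the namespace plus the local name.
XNamespace ns = "https://dotnetcoretutorials.com/namespace";

string myProperty = document.Root?.Element(ns + "MyProperty")?.Value;

// Unprefixed attributes are not in any namespace, so a plain name works here.
string myAttribute = document.Root?.Element(ns + "MyAttributeProperty")?.Attribute("value")?.Value;

var items = document.Descendants(ns + "MyListItem").Select(x => x.Value).ToList();

Console.WriteLine($"My Property : {myProperty}");   // My Property : Abc
Console.WriteLine($"My Attribute : {myAttribute}"); // My Attribute : 123
Console.WriteLine(string.Join(", ", items));        // 1, 2, 3
```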
Why not use the edit->paste special->paste xml as class option to have VS build your classes for you?
Paste XML as Classes is quite a nice idea, but it is done with a “good enough” approach, which means that some more complicated real-world XML files end with:
—————————
Microsoft Visual Studio
—————————
Paste XML As Classes The operation failed due to System.NullReferenceException: Object reference not set to an instance of an object.
—————————
OK
—————————
In your opinion, is XmlSerializer good enough when dealing with files larger than 1 GB? Or would you recommend, as you said in the post, the use of XmlReader?
For a 1GB file, streaming the file is the correct option (so XmlReader). You could try XmlSerializer though, I’ve definitely opened some rather large files using XmlSerializer without issue.
Just come across this Wade – absolutely spot on and chimes exactly with my experience, especially with the dreaded Namespaces! Great article