All about C#8
By now, a lot of developers are more than familiar with the fact that DotNet core 3+ is the mainstream version of DotNet and that .NET 5 is fast approaching. What many don't realize, however, is that a lot of the changes and features added to DotNet core 3/3.1 have been made available because of changes to the underlying C# compiler.
Many of the efforts in moving the C# compiler forward to version 8 (C#8) have been instrumental in allowing Microsoft to improve existing functionality and add exciting new features.
The C# Compiler
Before we go any further however, just what is the C# compiler?
As many folks might be aware, C# is the primary programming language used to create dotnet and dotnet core applications.
It's not by any means the only one either. There are currently compilers for "Visual Basic" and "F#" from Microsoft themselves, and a number of others for other languages created by third-party companies.
The compiler's job is to take the code written by the programmer and turn it into something that the computer is able to run.
In the case of DotNet the code produced is called "IL Code", a special type of machine independent code that runs inside the DotNet virtual machine environment. This Virtual Machine environment is a big part of what makes DotNet portable from one platform to another.
Any of the compilers currently available can produce code for this DotNet virtual machine environment, so a software developer working in Visual Basic, for example, can create exactly the same projects and code as a software developer working in C#.
If we use the online web tool "SharpLab" at (https://sharplab.io) we can see exactly what happens with a simple console mode program:
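As a rough stand-in for what might be typed into SharpLab, a console program along these lines is enough to try it out. The class name and the message are my own illustrative choices, and the IL described in the comments is only an approximation of what a release build shows; the exact listing depends on the compiler version and settings.

```csharp
using System;

public static class HelloSharpLab
{
    // Hypothetical message, kept in a constant so it is easy to spot in the IL.
    public const string Greeting = "Hello World";

    public static void Main()
    {
        // In the IL view this becomes, roughly:
        //   ldstr "Hello World"
        //   call  void System.Console::WriteLine(string)
        //   ret
        Console.WriteLine(Greeting);
    }
}
```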
In the image above we have simple C# code on the left-hand side, similar to how it might be written in Visual Studio, and on the right we see the IL code that the C# compiler produces for us.
We can see other outputs too such as the low level JIT assembly code produced, but we'll leave it there for now as this article is purely just about the C# compiler itself.
C# Version 8
With the release of DotNet core 3, Microsoft also introduced a new version of the C# compiler, and this new version brought with it some interesting new abilities.
In the remainder of this post we'll go through some of the more interesting ones, but because of space restrictions we won't be able to cover them all. We'll kick off the list with one of my personal favourites…
Switch Expressions
The humble switch statement has been the backbone of many multiple case decisions in program code over the years.
Implemented in some fashion in more or less every language ever produced, its purpose is simple: take a value, run it through a list of possible matches, and if a match is in the list, execute the code at that position. It looks something like the following:
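A minimal sketch of a classic switch statement, assuming an integer called myValue and a handful of made-up messages (the class name and the messages for the other cases are my own placeholders):

```csharp
using System;

public static class SwitchStatementDemo
{
    public static void Main()
    {
        int myValue = 3;

        // Each case is compared against myValue in turn; only the matching one runs.
        switch (myValue)
        {
            case 1:
                Console.WriteLine("Your value was One");
                break;
            case 2:
                Console.WriteLine("Your value was Two");
                break;
            case 3:
                Console.WriteLine("Your value was Three");
                break;
            default:
                Console.WriteLine("Your value was something else");
                break;
        }
    }
}
```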
In the image above, myValue is set to 3, so the line of code that writes out the message "Your value was Three" will be executed.
In many cases, people use switch statements primarily in functions, and then they call those functions to produce a result into a variable of some description, for example:
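One way this commonly looks, sketched with a hypothetical GetNumberName function whose result is stored in a variable by the caller:

```csharp
using System;

public static class NumberNames
{
    // A switch statement wrapped in a function; every branch returns a value.
    public static string GetNumberName(int value)
    {
        switch (value)
        {
            case 1:
                return "One";
            case 2:
                return "Two";
            case 3:
                return "Three";
            default:
                return "Unknown";
        }
    }

    public static void Main()
    {
        // The caller stores the function's result in a variable.
        string name = GetNumberName(3);
        Console.WriteLine($"Your value was {name}"); // prints "Your value was Three"
    }
}
```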
In some cases, these switch statements can become very, very complex, for example in an app I wrote not too long ago, I had to use a switch statement to calculate the cost of postage on the shipping of a package.
Not only did I have to decide based on the "Shipping Type", but I also had to create switch statements within switch statements to decide on further criteria, such as weight and package size.
All of this led to a pretty long, hard to debug and difficult to read switch statement, one which I'm not even going to attempt to screen grab for this blog, as it's just too long.
With C#8, Microsoft introduced "Switch Expressions".
Switch Expressions have all the power of regular Switch Statements, but are far more compact, are far better at deciding on multiple criteria, and are designed mostly to be expressed as variable declarations directly on a class or application module, rather than as a bunch of insanely long functions to be called.
Switch Expressions should also ALWAYS cover every possible input, usually by ending with a default (discard) case, unlike Switch Statements which can fall through without executing any code.
With the older Switch Statement, it was always possible for something not to match and for you to end up with bad data in an application. The new Switch Expression guards against this: the compiler warns you if your cases don't cover every possible value, and if an unmatched value does turn up at runtime an exception is thrown rather than the code silently carrying on.
The image below shows what our "Switch Expression" version of the "Postage Price Cost" switch statement, might look like:
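As a sketch only, with an invented ShippingType enum and invented prices (none of these names or figures come from the real application), a switch expression for the postage cost could be as small as this:

```csharp
using System;

// Hypothetical shipping options, for illustration only.
public enum ShippingType { FirstClass, SecondClass, Courier }

public static class SimplePostage
{
    // The whole decision is one expression that produces a value.
    public static decimal GetPostageCost(ShippingType shippingType) => shippingType switch
    {
        ShippingType.FirstClass => 1.50m,
        ShippingType.SecondClass => 1.00m,
        ShippingType.Courier => 7.50m,
        // The discard "_" is the default case; here it rejects anything unexpected.
        _ => throw new ArgumentOutOfRangeException(nameof(shippingType))
    };

    public static void Main()
    {
        decimal cost = GetPostageCost(ShippingType.SecondClass);
        Console.WriteLine(cost);
    }
}
```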
If we had to include extra checks on weight, size etc now, we might change the shipping type to a small class and add a couple of methods to check the weight and size, which would then allow us to do the following:
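A sketch of that idea, assuming a small PackageInfo class. The property names, the two checking methods, the size limit and the prices are all my own placeholders rather than the original code:

```csharp
using System;

public enum ShippingType { FirstClass, SecondClass }

// Hypothetical data object carrying the shipping type plus weight and size.
public class PackageInfo
{
    public ShippingType Shipping { get; set; }
    public decimal WeightInKg { get; set; }
    public decimal Size { get; set; }

    public bool IsHeavyPackage() => WeightInKg > 10;
    public bool IsLargePackage() => Size > 10;
}

public static class PackagePostage
{
    public static decimal GetPostageCost(PackageInfo package) => package switch
    {
        // Two lines per shipping type: the first applies only to heavy or large packages.
        { Shipping: ShippingType.FirstClass } when package.IsHeavyPackage() || package.IsLargePackage() => 2.00m,
        { Shipping: ShippingType.FirstClass } => 1.50m,
        { Shipping: ShippingType.SecondClass } when package.IsHeavyPackage() || package.IsLargePackage() => 1.50m,
        { Shipping: ShippingType.SecondClass } => 1.00m,
        _ => throw new ArgumentException("Unknown package", nameof(package))
    };

    public static void Main()
    {
        var package = new PackageInfo { Shipping = ShippingType.FirstClass, WeightInKg = 12, Size = 5 };
        Console.WriteLine(GetPostageCost(package));
    }
}
```

Cases are tried from top to bottom, so the more specific "heavy or large" line has to come before the plain line for the same shipping type.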
As you can see, we now have 2 lines in the switch expression for each of the FirstClass and SecondClass cases, which increase the postage cost slightly if the weight or size exceeds 10.
Checking a switch expression in this manner is called "Property Matching". The next feature I'm going to introduce you to takes switch statements, and a whole lot more, to the next level and beyond.
Pattern Matching
The first thing that goes through many developers' minds when they hear the phrase "Pattern Matching" is regex or "Regular Expressions". Regular Expressions are a branch of software development where you match complicated patterns describing your intent against a string of data, for example to extract a name or an address so that it can be split up into different bits of data.
Pattern matching in C#8 provides a similar idea, but NOT for strings. Instead, the use case is against data objects already defined in C# code.
It's more than just searching for patterns though; it allows the developer to be extremely specific about how an object should be matched against given criteria.
Take for example the postage calculation scenario we used in the last section on switch expressions.
In this example, we used property matching to work out if an entry used a specific delivery type and if it was over a specific weight or not.
Using pattern matching we could have explicitly queried the actual variables in the data object, instead of having to add extra logic to our objects for the switch statement to access.
We could even have gone one step further, and checked the actual type of the data being accessed, and used the same decision logic for multiple types of object.
Consider the following example, taken from the Microsoft Docs page on the subject, that allows you to implement the game of "Rock, Paper, Scissors" in ONE SINGLE EXPRESSION, known as a Tuple pattern:
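The version below follows the shape of the Microsoft Docs sample as best I recall it; the wrapping class and the Main method are my own additions so that it runs on its own.

```csharp
using System;

public static class Game
{
    // Both inputs are combined into a tuple and matched in a single switch expression.
    public static string RockPaperScissors(string first, string second)
        => (first, second) switch
        {
            ("rock", "paper") => "rock is covered by paper. Paper wins.",
            ("rock", "scissors") => "rock breaks scissors. Rock wins.",
            ("paper", "rock") => "paper covers rock. Paper wins.",
            ("paper", "scissors") => "paper is cut by scissors. Scissors wins.",
            ("scissors", "rock") => "scissors is broken by rock. Rock wins.",
            ("scissors", "paper") => "scissors cuts paper. Scissors wins.",
            // Any other combination, including two identical choices.
            (_, _) => "tie"
        };

    public static void Main()
    {
        Console.WriteLine(RockPaperScissors("rock", "scissors")); // prints "rock breaks scissors. Rock wins."
    }
}
```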
The entire logic of the game is in the tuple pattern based switch expression, allowing us to make decisions on two user provided strings at exactly the same time, without ever having to build a class or data object containing checking methods to express our intent.
What's more, reading the code, even for a non-programmer, is pretty straightforward and very understandable, a great bonus for developers trying to explain and demonstrate application logic to a client, boss or manager.
Another disadvantage of the "Property Pattern" approach is that the person using the data object needs to have knowledge of the internals of the code used to create it. That is, they need to be aware that there's a property called "Weight", and they need to know there's a function called "IsHeavyPackage" and that it returns a "Boolean" data type.
Very often when developers create classes to form data objects in an application, they like to "hide" parts of the code from the users of those objects. They don't do this out of malice; they do it to make it easier to understand an object's intent, or to stop internal data from being changed by a user of the object.
Let's think back to our shipping cost calculator. We hard coded a limit of 10 into that code, so that if the weight was more than 10, the package was classed as a heavy package.
To the developer of the PackageInfo object, it was implicit that the 10 represented 10kg, as the check was defined as "WeightInKg" being greater than 10.
If that 10 value were assigned to a property for the user of the object to access, they might be tempted to set the value to 100 in the belief that 100 is heavy, not realizing that the number is expressed in Kg. Since 100 Kg is an exceptionally heavy weight, you might have just allowed your code to treat huge weights as if they fall into a lower category.
Positional pattern matching rescues you from that problem by keeping the internals of the object in question hidden from the user, while still making a value easy to read should it be needed for something else.
In short, you can extract values from the data object without running the risk of accidentally changing them, and without needing to know what those values represent.
More to the point, with positional pattern matching you get to be much, much more expressive with your switch expressions, and can use more traditional C# operators such as equals, greater than and and/or, along with others that you might use in a traditional IF/THEN statement.
Let's borrow another example from the Microsoft Docs site:
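The sketch below is modelled on the Point and Quadrant sample from the Microsoft Docs, reproduced from memory; the wrapping Quadrants class and the Main method are my own additions.

```csharp
using System;

public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) => (X, Y) = (x, y);

    // Deconstruct is what allows a Point to be used in a positional pattern.
    public void Deconstruct(out int x, out int y) => (x, y) = (X, Y);
}

public enum Quadrant { Unknown, Origin, One, Two, Three, Four, OnBorder }

public static class Quadrants
{
    public static Quadrant GetQuadrant(Point point) => point switch
    {
        (0, 0) => Quadrant.Origin,
        var (x, y) when x > 0 && y > 0 => Quadrant.One,
        var (x, y) when x < 0 && y > 0 => Quadrant.Two,
        var (x, y) when x < 0 && y < 0 => Quadrant.Three,
        var (x, y) when x > 0 && y < 0 => Quadrant.Four,
        // Any remaining point sits on an axis.
        var (_, _) => Quadrant.OnBorder,
        _ => Quadrant.Unknown
    };

    public static void Main()
    {
        Console.WriteLine(GetQuadrant(new Point(-2, 5))); // prints "Two"
    }
}
```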
In this example, we have a plain old data object called Point, and all it has added to it is a "Deconstruct" function.
The Deconstruct function is what makes positional pattern matching possible.
It's entirely up to the object what it fills the returned x & y with; all the consumer of the object cares about is that it gets 2 values.
In our example above, we are extracting them out into the switch expression as x & y, but we could just as easily call them bob and june if we wanted to; it really doesn't matter.
The only thing you really have to be careful of is the position.
A positional pattern match is exactly as the name suggests: the first variable you name will always receive whatever the object puts in the first deconstructed parameter, the second will receive the second, and so on.
As you can see in the previous image, we have far more control in what we can actually check.
For a (0,0) we don't do anything other than straight return a result, same with the (default, default) case and the straightforward "default" all on its own. In the four others, however, you can see we're doing some actual if style logic, right in the object user's code and not in the object itself.
The last type of pattern matching that C#8 brings into reality for us is Type and Property matching.
Here the object is first matched against a type, and once that match succeeds the properties belonging to that type become available to check, in much the same way that deconstructed values are with positional matching.
Let's imagine that we're building a system to handle shapes.
The different types of shapes we need to deal with are "Circles", "Rectangles", "Squares" and "Polygons".
A "Circle" is an easy item to work with, it only has one data element, and that's a radius.
A "Rectangle" meanwhile, has a "Width" and a "Height", and a "Square" is really just a special rectangle whose sides are equal.
A "Polygon" is a multisided shape whose only data element is the number of sides it has, but technically both a rectangle and a square are 4 sided polygons, so could possibly also have just a number of sides.
The point here is that we have 4 distinct entities that our application needs to keep track of, each with different parameters but each also similar in more ways than one.
Take a look at the next example:
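A sketch of how this could be written, assuming three concrete classes (Circle, Rectangle and Polygon) derived from a base Shape class. The property names and the returned messages are my own guesses at a reasonable layout, not the original code.

```csharp
using System;

public abstract class Shape { }

public class Circle : Shape
{
    public double Radius { get; set; }
}

public class Rectangle : Shape
{
    public double Width { get; set; }
    public double Height { get; set; }
}

public class Polygon : Shape
{
    public int Sides { get; set; }
}

public static class Shapes
{
    public static string GetShapeInfo(Shape shape) => shape switch
    {
        // Type pattern: "c" is the shape already cast to Circle.
        Circle c => $"Circle with a radius of {c.Radius}",
        // Type pattern plus a "when" check catches the square edge case.
        Rectangle r when r.Width == r.Height => $"Square with sides of {r.Width}",
        Rectangle r => $"Rectangle of {r.Width} by {r.Height}",
        // Type and property patterns combined.
        Polygon { Sides: 3 } => "Triangle",
        Polygon { Sides: 4 } => "Rectangle described as a 4 sided polygon",
        Polygon p => $"Polygon with {p.Sides} sides",
        _ => "Unknown shape"
    };

    public static void Main()
    {
        Console.WriteLine(GetShapeInfo(new Rectangle { Width = 4, Height = 4 })); // prints "Square with sides of 4"
    }
}
```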
On the right, we have 4 different object types, all inherited from a base "Shape" class. Each of these technically are of type "Shape", so we can easily use them as the entry type into the "GetShapeInfo" expression in the left window.
Without the ability to cast the type as we check, we could easily have had to do this as 3 separate switch statements (one for each type), simply so we could be sure we were actually accessing an object that had a radius, or an object that had a width and height.
Look closer though. Not only have we removed the need for 3 switch statements, but we've also been able to detect the edge cases of an equal sided rectangle being a square, and a 4 sided polygon being a rectangle, without having to build or check extra data objects to represent them. As a bonus, we've also been able to check for triangles too, an edge case that was obviously overlooked when we put together the specifications.
When you push the inbound object into a type bound pattern in this manner, you get FULL access to all public properties and methods in the object, with the ability to do pretty much any calculation you wish on the left side of the switch case before returning the result on the right.
Not only that, but as you can see in the previous image, the cast variable AND its properties were also available on the right, allowing me to use them in an interpolated string for return to the caller.
That string, however, could just as easily have been a function to calculate a single value from a specific object type, or even another set of switch expressions to break things down even further.
So far we've used switch statements to demonstrate the power of the C#8 pattern matching features, but if you start to think of other uses, you'll quickly realise that all of this is available at an object level, and thus all of this extra power for checking on and deconstructing objects is now available in all the usual places too, such as if statements using the is operator, LINQ queries over in-memory objects, and many, many more. One caveat: queries that are translated from expression trees, as Entity Framework criteria are, cannot contain pattern matching, so there the matching has to happen once the data has been loaded.
The last feature we're going to cover in detail is another very useful one, designed to make a developer's life much easier.
Indices and Ranges
We've all had to deal with arrays at some point in our software development work.
Sometimes there are millions of items, sometimes only a few, but all arrays work the same way: they take an index, and providing that index is not outside the number of items they hold, they return that item.
Quite simple right?
Well, yes and no.
For all they are easy to access, it's fairly difficult to get a slice of an array, or to get an exact number of elements from the end, without extra code.
For example, if I had an array of words, and that array has 20 words in it, but I only needed words 10, 11 and 12 for the task I was using the array for, I would need to check the length of the array was greater than 10, then I'd need to make sure there were at least 3 more after that, then I'd need 3 statements to copy the 3 found entries into my variables, and that's just a simple case.
Going beyond that, I could easily be creating the array dynamically from an IEnumerable list, and have no idea how many elements I'm getting or even if they have valid data in them.
Then there's the "getting stuff from the end" use case, where I need to find the length of the array, then count back from there, remembering to subtract 1 due to arrays being zero based, before looping over the count of entries I want and reading them out. It all gets quite messy, and often quite fast too.
Two new data types, "System.Index" and "System.Range", arrived alongside the C#8 compiler in an attempt to make array access for these common use cases easier.
Take a look at the following example:
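A minimal sketch of a range in use. The ten-word array is made up for illustration; only the four words at positions 3 to 6 are taken from this article.

```csharp
using System;

public static class RangeDemo
{
    // Hypothetical ten-word array, indexes 0 to 9.
    public static readonly string[] Words =
    {
        "I", "think", "now", "is", "an", "interesting", "time", "to", "write", "code"
    };

    public static void Main()
    {
        // A range from 3 up to, but not including, 6.
        Range range = 3..6;
        string[] slice = Words[range];

        Console.WriteLine(string.Join(" ", slice)); // prints "is an interesting"
    }
}
```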
At line 21 we define a range from 3 to 6, and number the elements in the array:
You can see that 3,4,5 and 6 are the words "is", "an", "interesting" and "time".
The length of the array is 10 items (0 to 9), and because arrays start at 0, the 10th item is actually item 9, hence the subtraction of 1 you normally need to do in order to index an array correctly.
This minus 1 on an index means that "Ranges" in C#8 have a peculiar property that might catch you out if you're not careful. The first number in a range is always inclusive, and the last number is always exclusive. In the image above you might be thinking that "3..6" will return all 4 of the words mentioned, but in actual fact it only returns 3 ("is", "an" and "interesting") because the 6 is exclusive.
It has to be done this way so that the range type knows when it overruns the end of an array, and more importantly so that it can calculate the end correctly too. With an exclusive end, the number of items in a range is simply the end minus the start, and a range that ends at the array's length covers everything up to and including the last item.
A range type can also use the new hat operator the '^' symbol to refer to a count from the end of a sequence, for example if we wanted to take the last 3 words from the array, without caring how long the array was, we'd use the following:
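As a sketch, using a made-up ten-word array, the range '^3..' reads as "from three before the end, through to the end":

```csharp
using System;

public static class LastThreeDemo
{
    // Hypothetical ten-word array, for illustration only.
    public static readonly string[] Words =
    {
        "I", "think", "now", "is", "an", "interesting", "time", "to", "write", "code"
    };

    public static void Main()
    {
        // Start three items from the end and run to the end, whatever the length.
        Range lastThree = ^3..;
        string[] slice = Words[lastThree];

        Console.WriteLine(string.Join(" ", slice)); // prints "to write code"
    }
}
```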
As you can see in the image above we've changed the range used in our array to '^3..' (a hat symbol, then the number 3, then two full stops).
If you try to use '^0' as an index then you'll get an exception thrown, because 'words[^0]' is equivalent to 'words[words.Length]' and, as we mentioned previously, the last element in an array is always length minus one due to the start being zero based. '^0' is allowed by the compiler however, and will cause an error at runtime, just as [Length] is allowed and will cause an error, so as not to make things too much of a change for developers used to the 0 based numbering. The last element itself is '^1', and because the end of a range is exclusive, '^0' is perfectly valid as the end of a range.
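A small sketch of the difference between '^1' and '^0', again using a made-up ten-word array and a hypothetical helper method:

```csharp
using System;

public static class HatIndexDemo
{
    // Hypothetical ten-word array, for illustration only.
    public static readonly string[] Words =
    {
        "I", "think", "now", "is", "an", "interesting", "time", "to", "write", "code"
    };

    // Returns true when indexing with ^0 throws, just as Words[Words.Length] would.
    public static bool HatZeroThrows()
    {
        try
        {
            _ = Words[^0];
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Words[^1]);          // prints "code", the last element
        Console.WriteLine(HatZeroThrows());    // prints "True"
        Console.WriteLine(Words[..^0].Length); // prints "10", ^0 is fine as an exclusive end
    }
}
```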
As you can see from the previous image too, I've defined the "Range" in a separate variable, and passed that variable into the array accessor.
When I covered this subject in the "DotNet core new features" presentation I did for Hainton at their Leeds venue (and later at their Newcastle venue - https://www.youtube.com/channel/UCVhSZ1HyN98wTs3uHGS2BXQ) the most common question that popped up around Indices and Ranges was the subject of defining them as a variable and re-using them.
The Range variable has a "Start" and an "End" property which unfortunately are read only, but because it's a variable you can easily assign a complete new value to it at any point in your code, allowing you to dynamically take slices from your array.
The range type is also open ended, so if you want to start at item number 4 and select the rest of the array after that, it's simply a case of using '4..' as your range criteria.
Likewise if you want everything from the beginning up to 5 items before the end without caring about or having to work out any values for the array length then the following will do the trick:
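Both open-ended forms are sketched below with a made-up ten-word array. The same Range variable is given a new value part way through, which is one way of taking different slices dynamically.

```csharp
using System;

public static class OpenRangeDemo
{
    // Hypothetical ten-word array, for illustration only.
    public static readonly string[] Words =
    {
        "I", "think", "now", "is", "an", "interesting", "time", "to", "write", "code"
    };

    public static void Main()
    {
        // From item 4 to the end of the array.
        Range slice = 4..;
        Console.WriteLine(string.Join(" ", Words[slice])); // prints "an interesting time to write code"

        // Start and End are read only, but the variable itself can be reassigned.
        // From the beginning up to 5 items before the end.
        slice = ..^5;
        Console.WriteLine(string.Join(" ", Words[slice])); // prints "I think now is an"
    }
}
```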
One thing you cannot do with an array or an array slice, however, is deconstruct it using the pattern matching techniques we looked at above. Well, not by default anyway.
The "Deconstruct" functionality used by positional pattern matching can be expressed as an extension method onto any existing type, so with a little bit of clever coding, you can do the following:
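One possible shape for such an extension method is sketched below. The class names, the made-up word array and the decision to hand back null for missing items are all my own assumptions.

```csharp
using System;

public static class StringArrayExtensions
{
    // Extension "Deconstruct" that unpacks the first three items of a string array.
    // Items that are missing from a short array come back as null.
    public static void Deconstruct(this string[] items, out string word1, out string word2, out string word3)
    {
        word1 = items.Length > 0 ? items[0] : null;
        word2 = items.Length > 1 ? items[1] : null;
        word3 = items.Length > 2 ? items[2] : null;
    }
}

public static class DeconstructDemo
{
    // Hypothetical ten-word array, for illustration only.
    public static readonly string[] Words =
    {
        "I", "think", "now", "is", "an", "interesting", "time", "to", "write", "code"
    };

    public static void Main()
    {
        // Slice the array with a range, then unpack the slice using tuple syntax.
        var (word1, word2, word3) = Words[3..6];
        Console.WriteLine($"{word1} {word2} {word3}"); // prints "is an interesting"
    }
}
```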
What we've done here is define a deconstructor for our array type, which unpacks the first 3 items in the array (or part of it) passed to it into the 3 separate variables "word1", "word2" and "word3" using the tuple syntax.
This allows us to use the range type to deconstruct the first three items in any slice taken from the array, which means we can easily slide up and down an array, taking items in groups of 3 at a time, something which could be useful for parsing things like GPS sentences or log file entries.
At this point in the post, we've covered almost 3500 words just on those 3 topics alone, and there's still much, much more we haven’t even mentioned.
From the official Microsoft Docs page, the full list of features added to version 8 of the C# compiler goes something like:
Default interface methods
Pattern matching enhancements (switch expressions, property patterns, tuple patterns and positional patterns)
Static local functions
Disposable ref structs
Nullable reference types
Indices and ranges
Unmanaged constructed types
Stackalloc in nested expressions
Enhancement of interpolated verbatim strings
I cover more in the linked video above, and the full list and documentation can be found on the Microsoft Docs site at: https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-8#more-patterns-in-more-places
I may, in a future post if the interest is high enough, revisit this subject and go into detail on other features. It won't be long now, however, before C#9 is with us, and that will bring its own long list of further new features with it.