Learn about Linq in C# (part 1 – Functional programming)

If you are a .NET (or Mono) developer and have learned C#, you have very likely used Linq already, perhaps without knowing it by name. Linq allows you to write expressions like

or
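As a rough illustration of the two forms such expressions usually take, here is a small sketch. The `numbers` array and the query itself are hypothetical, chosen only for demonstration: the first form is query syntax, the second is the same query written as chained method calls.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
int[] numbers = { 5, 12, 8, 21, 3, 14 };

// Query syntax: reads almost like a sentence.
var squaresQuery = from n in numbers
                   where n % 2 == 0
                   orderby n
                   select n * n;

// Method syntax: the same query as a chain of method calls.
var squaresMethod = numbers
    .Where(n => n % 2 == 0)
    .OrderBy(n => n)
    .Select(n => n * n);

Console.WriteLine(string.Join(", ", squaresQuery));  // 64, 144, 196
Console.WriteLine(string.Join(", ", squaresMethod)); // 64, 144, 196
```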

Code this clean and readable is rare outside C# (or VB). Unfortunately, most Vietnamese-language material on Linq focuses only on using Linq syntax to access a database or to perform simple operations on arrays. It rarely explains the nature of Linq, which is far more powerful than those small applications suggest.

Linq is designed to help programmers manipulate data sources with a highly functional syntax (this will be explained in more detail later). It is one of the features that gives C# an elegance few languages have. Understanding the nature of Linq will help you master any of its applications, from array processing to data retrieval through Linq to Sql. It also helps you grasp the concepts of functional programming, so that you can take full advantage of C# and .NET.

This series of articles covers the theory and the nature of Linq. You will not learn how to use Linq to access a database here; such applications appear only where they support the discussion. The main goal is to understand what Linq is and how it lets us write such neat and beautiful code.

Linq in .NET works on two main interfaces: IEnumerable<T> and IObservable<T>. The articles in this series focus mainly on IEnumerable<T>. IObservable<T> will be covered later, for those who want to learn more about the Reactive Extensions library.

What is functional programming?

To understand Linq, we need to know a little about functional programming (FP from here on). Why? Because Linq is designed in the FP style.

The best way to understand FP is to find out why people came up with it and which problems it solves. Think back to your first days of programming: the teacher gave you a simple "Hello World" exercise, and you were happy to complete it. As the exercises grew more complex, you had to write more code and manipulate many different data structures. This raises three basic problems.

First, you want to reuse a piece of logic many times, on many different data objects. To do this, you split your program into separate procedures. If these procedures (call them A and B) work on the same data structure and both modify it, then A and B must each know at all times how the other one behaves. Take the simplest example: showing a message when the software hits an error:
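A minimal sketch of the kind of code this describes. Only the names Calculate, ShowErrorMessage and HandleError come from the text; the `Calculator` class, its fields and the square-root computation are assumptions made for illustration.

```csharp
using System;

var calculator = new Calculator { Input = -4 };
calculator.Calculate();
Console.WriteLine(calculator.Result);

// Hypothetical class: every field here is an assumption.
class Calculator
{
    public double Input;
    public double Result;
    public string ErrorMessage = "";

    public void Calculate()
    {
        if (Input < 0)
        {
            ShowErrorMessage();
            HandleError(); // silently depends on ShowErrorMessage() having run
            return;
        }
        Result = Math.Sqrt(Input); // side effect: writes a field
    }

    void ShowErrorMessage()
    {
        // Not just "showing": it also stores the message in a field.
        ErrorMessage = "Input must not be negative";
        Console.WriteLine(ErrorMessage);
    }

    void HandleError()
    {
        // Reads the state left behind by ShowErrorMessage().
        Console.WriteLine("Logged: " + ErrorMessage);
        Result = double.NaN;
    }
}
```

If HandleError() ran before ShowErrorMessage() in this sketch, it would log an empty message, and nothing in the method signatures would warn you about it.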

At first glance, the code looks simple and works correctly. But think carefully: you must always remember that HandleError() may only be called after ShowErrorMessage(); otherwise HandleError() will not work correctly. In a large project you cannot keep every detail like this in your head, so you end up rereading the same code again and again, and even then errors still slip in. Worse, when you work with others, they will have a hard time guessing what ShowErrorMessage() really does and will have to read your code to find out.

The second problem is that when procedures A and B both modify a shared data structure, the code becomes confusing, even for its own author. In the example above, nobody would expect Calculate() to do anything besides calculating, yet it may also change some properties of the class. Such a change is called a side effect of the function.

The third problem is that because these procedures do not operate independently of the shared state, they are hard to test and debug. Testing the Calculate() method above is difficult, because many properties must be set up correctly before Calculate() is called.

Of course, the code above was contrived for the sake of example (although code like it really is in use). But if you have programmed for any length of time, you have probably found yourself in a similar situation.

The question is: how can FP help us deal with such situations?

Answer: FP introduces the concept of a "pure" function. A pure function behaves like a mathematical function: it receives input values and returns an output value, has no hidden state, and does not change any data structure. "No hidden state" means the function may use only its inputs, plus fixed operations (addition, subtraction, multiplication, loops, …), to produce its output. As a result, for the same set of inputs, the function returns the same value no matter how many times you call it.

Let's see how, using these concepts, the code above can be improved:
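One possible shape for the improved code, again only a sketch. The function names follow the text; the signatures, the tuple return type and the messages are assumptions. ShowErrorMessage here returns the message rather than printing it, which is one reading of "merely receive and return values".

```csharp
using System;

// Pure: the result depends only on the argument; no field is read or written.
static (bool Ok, double Value) CalculateResult(double input) =>
    input < 0 ? (false, double.NaN) : (true, Math.Sqrt(input));

// Pure: returns the message instead of storing it somewhere.
static string ShowErrorMessage(double input) =>
    $"Input must not be negative (got {input})";

// Pure: the message it needs arrives as a parameter, so the call order is explicit.
static string HandleError(string message) => "Logged: " + message;

// The one procedure with an effect (console output); it returns nothing.
static void Calculate(double input)
{
    var (ok, value) = CalculateResult(input);
    Console.WriteLine(ok ? $"Result: {value}" : HandleError(ShowErrorMessage(input)));
}

Calculate(16);
Calculate(-4);
```

Because HandleError takes the message as an argument, the compiler now enforces the ordering that previously had to be remembered, and each function can be tested by simply calling it.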

Now our code is much better: Calculate becomes a void function, in other words an Action, which signals that it is a procedure. The CalculateResult, ShowErrorMessage and HandleError functions merely receive and return values without changing any properties. The code also has an implicit benefit: when you want to change the computation, whether to optimize it or to change the program logic, you only need to edit the CalculateResult function. This makes your functions highly reusable; the code is said to be modularized.

Today's software keeps getting more complex, so FP is increasingly becoming a trend. Some languages are designed for programming almost entirely in the FP style (Haskell, F#, …). C# is an object-oriented language, but it provides many functional programming features. For example, C# has a data type that describes a function: Func<>. A Func<int, bool> is a function that takes one argument of type int and returns a bool. Procedures that return no value (also called void functions) are represented in C# as Action<>.
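A short sketch of these two types in use. The variable names are made up for illustration; Func<> and Action<> themselves live in the System namespace.

```csharp
using System;

// A function from int to bool, stored in a variable.
Func<int, bool> isEven = n => n % 2 == 0;

// A Func with two inputs: the last type argument is the return type.
Func<int, int, int> add = (a, b) => a + b;

// A procedure that takes a string and returns nothing.
Action<string> print = message => Console.WriteLine(message);

print($"isEven(4) = {isEven(4)}"); // isEven(4) = True
print($"add(2, 3) = {add(2, 3)}"); // add(2, 3) = 5
```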

Let's see how we can change the Calculate method above using these data types:
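A sketch of what such a Calculate method might look like. The parameter names `compute` and `report` are assumptions, as is the choice to pass the output step as an Action<string>.

```csharp
using System;

// Calculate no longer knows which computation or which output it uses:
// both are handed in by the caller.
static void Calculate(
    double input,
    Func<double, double> compute,
    Action<string> report)
{
    report($"Result: {compute(input)}");
}

// Any function with a matching signature can be plugged in.
Calculate(16, Math.Sqrt, Console.WriteLine);
Calculate(3, x => x * x, Console.WriteLine);
```

Passing the output step as a parameter also makes the method easy to test: a caller can collect the messages in a list instead of writing them to the console.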

The modularity of the code has now reached a new level. The Calculate method is no longer tied to the CalculateResult function; it can use any function supplied by the caller. This technique is called decoupling, and it lets Calculate() be used in far more situations.

In short: what does this have to do with Linq?

If you knew nothing about FP before reading this article, congratulations: you have gained some useful knowledge. However, our main topic is still Linq.

I devoted this long article to introducing FP because it is the foundation on which Microsoft's engineers designed Linq. Linq is a library for manipulating collections or, more broadly, groups of data. One of its first design criteria is therefore preserving data integrity: the functions in Linq do not modify the existing data structure; they only return a new one. If you study software deeply enough, you will find that most of what it does comes down to manipulating and using data sets. That is why the functions in Linq are among the most powerful you can learn, and they would not be possible without the concepts of functional programming.
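A small sketch of this property, using a hypothetical `source` array: the Linq operators return a new sequence, and the original array is left exactly as it was.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
int[] source = { 3, 1, 2 };

// OrderBy and Select build a new sequence; they never write to source.
int[] sortedDoubled = source
    .OrderBy(n => n)
    .Select(n => n * 2)
    .ToArray();

Console.WriteLine(string.Join(", ", source));        // 3, 1, 2
Console.WriteLine(string.Join(", ", sortedDoubled)); // 2, 4, 6
```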

ITZone via goatysite
