Differences between programming languages

Tram Ho

On today's IT forums, there are many blog posts and Q&A threads on the question "How do programming languages A and B differ?". I think this is an important topic for both beginners and experienced programmers. However, after browsing a number of online forums, I found that most articles on this topic were unsatisfactory and, worse, some contained a lot of false information.

This article tries to explain the differences between programming languages systematically, and at the same time to correct the misconceptions I have seen on many forums and blogs.

Why are there so many different programming languages?

First of all, we need to understand that a programming language is a tool that lets the programmer issue commands for the computer to execute. It is like a craftsman's tools: each type of job calls for a different tool (a welding machine for welding, a drill for drilling). Likewise, people building software noticed that there are many different types of software with distinct properties. Therefore, different languages were developed to work effectively with each of these types.

In this article, I will classify programming languages by the two most common criteria: 1) how a language's code is translated into something the computer can execute (compilation); 2) the programming style the language supports (paradigm).

Compilation process

Instruction set, machine language and Assembly language

Every CPU has a small amount of internal storage (called registers) and a set of instructions it can execute, called an instruction set. CPUs with different architectures (e.g. x86, x64, ARM) have different registers and instruction sets. Whatever language you use, your program must ultimately be converted into instructions from this instruction set before the computer can execute it.

An instruction can be expressed in two forms. The first is binary code, a sequence of 0 and 1 bits. This is the form the computer understands and executes directly, and programs written in it are called machine language.

Obviously, this kind of language is not very human-friendly. Therefore, instructions are also represented as text, called Assembly language. Assembly contains the most basic commands the CPU can execute and manipulates the CPU registers directly. Typical examples are an instruction that loads a value from memory into a register and an instruction that adds the contents of two registers.

Native language

Although Assembly is easier to read than machine language, it is still just a list of direct instructions for the CPU hardware, so it remains very difficult for programmers to use. For example, to add two variables a and b, we just want to write a+b instead of a series of register-level instructions. In other words, programmers want to write commands that express what they want, not instructions that follow the CPU's operating mechanism. (When you go to a restaurant, you just want to order a bowl of pho, not instruct the chef step by step.) In computer science, this concept is called abstraction.

Higher-level languages were developed to provide abstraction to programmers. Through compilation, high-level syntax is transformed into machine language that the computer can execute. The tool that does this is called a compiler. If a language's compiler translates code directly into machine code, the language is called a Native language. Common examples include C, C++, Go and Rust. With these languages, the programmer must compile the code before the program can run. Compiling directly to machine language often takes a long time, but it gives the compiler the best opportunity to optimize the code. This is why Native languages often perform better than other types of languages.

Common misunderstanding: C/C++ is a low-level programming language. By definition, a low-level language is one that provides little or no abstraction over the hardware. By this definition, low-level languages include Assembly and machine language, while C/C++ are classified as high-level languages.

Managed language

The Native languages described above offer abstraction to programmers, but they come with some difficulties:

  • Since the program is compiled directly into machine language, a compiled Native program is bound to one CPU architecture (x86, x64, ARM) and must be recompiled for each one.
  • If the program interacts with the operating system (working with files, networks, ...), the code is also bound to that operating system's APIs. For example, to open a file the Linux API is open(), while on Windows it is OpenFile() (or CreateFile() in modern code). This makes it hard to write software that runs on multiple operating systems.

A family of programming languages, including Java, C# and Scala, was developed to solve these problems. Besides a compiler, these languages come with a virtual platform, such as the JVM for Java or .NET for C#. This platform makes it possible for programs written in these languages to run on many different operating systems. These languages work as follows:

  • The developer writes code against a common API that is the same for every OS and CPU architecture the language/platform supports.
  • At compile time, the compiler translates the code into an intermediate code. This code is relatively close to Assembly, but it keeps enough abstraction that it is not tied to any particular OS or CPU architecture. For Java/Scala this code is called Java bytecode; for C# it is called MSIL (Microsoft Intermediate Language).
  • This intermediate code does not run directly on the CPU. Instead, it is executed by a program that comes with the language, called a Virtual Machine (or Runtime, depending on the language's terminology). The runtime converts the intermediate code into machine code for the CPU while the program is running; this is called Just-In-Time compilation, or JIT. Because JIT compilation happens while the software runs, there is limited time for optimization, so the resulting machine code may not be as well optimized. For this reason, Managed languages often perform worse than Native languages.

Trivia: Because of these performance limitations, platform developers often provide a tool that converts intermediate code into machine code before the software runs. This tool is called an Ahead-Of-Time (AOT) compiler. If you have heard of Android Runtime (ART) on Android, it uses this kind of AOT compilation.

Scripting language

The Scripting family of languages was developed to make writing and running software even simpler. Examples include Python, Ruby, PHP, JavaScript, and shell languages such as Bash and PowerShell. With these languages, there is no separate compilation step for the programmer. Instead, the code is run by an interpreter. For example, the command python program.py runs the code in the file program.py directly, where python is the interpreter. The interpreter reads and executes the code while the program runs (some engines, such as V8 for JavaScript, also JIT-compile frequently used code into machine code).

The difference between Scripting and Managed languages is that Managed languages have a compilation step that produces intermediate code closer to machine code. This gives the Managed language's runtime more room to optimize the code. So, in general, Scripting languages have lower performance than Managed or Native languages. However, their simplicity and ease of use make Scripting languages such as Python and JavaScript very popular.

Common Misunderstanding: Scripting languages (Python, Ruby, ...) let you write more concise code than other languages. Brevity or verbosity depends on the syntax of each particular language, not on how it is compiled or executed. Traditional OOP languages such as Java and C# generally have more verbose syntax than Scripting languages. However, some newer, functional-programming-oriented Managed languages such as Scala and F# have very concise syntax, sometimes even more concise than common Scripting languages. Of course, brevity is not the only factor that determines whether one language is better than another.

Garbage collection

Every running program needs to use the computer's RAM (memory). In modern operating systems, a process requests memory when it needs it, through APIs such as malloc/calloc in C or new in C++, which in turn obtain memory from the operating system. Once it is finished with that memory, the process is responsible for releasing (freeing) it. If memory is not released, a memory leak occurs, which can degrade the program's behavior and performance.

C/C++ require programmers to manage memory themselves and free it when needed. In many cases, this management is quite complicated. In general, the more code a piece of software contains, the harder it is for programmers to manage memory correctly.

Therefore, many programming languages include a garbage collector, which saves the programmer from having to manage and release memory manually. C#, Java, Go and the Scripting languages mentioned above all have this feature. Essentially, a garbage collector keeps track of the memory blocks the process has allocated and frees a block once the program no longer uses it. To do this, the garbage collector generally runs alongside the program and sometimes pauses it to free and rearrange memory. For this reason, garbage collection can reduce the program's performance.

Programming style (Paradigm)

Static & dynamic data types (static & dynamic typing)

Traditional languages often require the programmer to declare data types when using them in a program. These declarations are fixed and cannot change anywhere in that program. A typical example is a struct in C.

In a C struct person with a name field of type char[] and an age field of type int, any code working with person must respect those types; otherwise the compiler will not compile the program. This rule is called static typing and is used in languages such as C/C++, C#, Java, Rust and Go.

For many people, having to declare data types makes programming more verbose and laborious. Therefore, many languages, especially Scripting languages (Python, Ruby, JavaScript), drop this step and let programmers use data without fixed types. This is called dynamic typing. For example, in JavaScript you can create an object person with the fields name and age. Later you can add a field that was never declared, such as person.job, or store a string in person.age, and the language will not complain.

So which approach is better? It really depends on the point of view of each programmer. The biggest advantage of dynamic typing is that you do not have to declare all the data types, which saves programming time. However, I personally prefer working with statically typed languages, for the following reasons:

  • Performance: since data types in dynamic typing can change at any time, accessing the fields of an object is more complicated. In the example above, when you access person.job, the runtime has to work out whether the job field exists and what its value is. The simplest way to do this is to store each object as a hash table whose keys are the field names (job here), so every field access involves a hash lookup. Modern engines reduce this cost with optimizations such as hidden classes, but the extra work does not disappear entirely. With static typing, you can only access declared fields, so an access is as simple as reading memory at a known offset. Therefore, statically typed languages often perform better than dynamically typed ones.
  • Catching errors at compile time: with static typing, if we use data incorrectly, the compiler reports an error and we can fix it immediately. This reduces errors when the program actually runs. A silly mistake I often make when writing JavaScript is a typo (e.g. writing name as nane). This error is hard to catch in JavaScript because person.nane is perfectly valid syntax; it simply evaluates to undefined.
  • Better tooling support: because static typing defines data types clearly, programming tools (IDEs and code editors) can help you more during programming, for example with autocompletion and refactoring.

Object Oriented Programming (OOP)

C follows a procedural programming style: the logic of a program is defined in procedures, which in C are functions. For many practical business applications, this organization makes it difficult to separate the program into distinct logical areas. Therefore, object-oriented programming (OOP) was invented. Its main idea is to model a software program on the way entities/objects interact with each other in real life. The program's data and logic are grouped into objects, which interact by calling each other's methods. Languages that support OOP usually provide concepts such as classes/objects, inheritance and overloading. Examples include C++, C#, Java, Kotlin and Swift.

Common Misconception: Java/C++/C# are OOP languages. This statement is not quite correct, because these languages let you program in several styles, such as aspect-oriented or functional programming, not just OOP. You can write a Java/C# program without using OOP. So OOP is better seen as a feature a language supports than as a definition of the language.

Functional programming

This style of programming is increasingly popular. A full treatment of Functional programming would be very long, so to keep things easy to understand, I will briefly explain why Functional programming is used (i.e. which problems it solves).

Software programs all work by calling the functions/methods (collectively called methods here) defined in the source code. A method usually has input (the values you pass in), output (the value it returns), and its internal logic. In a traditional language (Java or C++, for example), the following can happen when you call a method:

  • The method can change the value of its input. This means that after the method returns, the input you passed in may have been modified.
  • That method can change other data in the program.

Such changes are called side effects. The main problem with side effects is that it is hard to tell when they happen and when they don't, so data in your code can be changed at any point. This makes code hard to read, understand and debug, especially when the program is large.

Functional programming solves this problem based on the following principles:

  • Functions are the main building blocks of the program. Functions are treated like regular data: they can be passed as parameters to other functions.
  • Functions are not allowed to have side effects: the output of a function depends only on its input.
  • The program's objects cannot be changed (they are immutable). When new data is needed, a new object must be created.

Let's take a simple example: computing the square of each element in an array of integers. In a traditional programming style, we would loop over the array nums and overwrite each element with its square.

Such code changes the value of the input nums, which is contrary to the principles of Functional programming. A Functional-style version written in F# defines let square x = x * x and then computes let result = List.map square [ 1; 2; 3; 4; 5 ].

Here square is a function, defined like a variable in the program. It is passed to the List.map function as a parameter, and the output is the list result. Note that the input [ 1; 2; 3; 4; 5 ] is unchanged; instead, result is a new list created by List.map to hold the output.

F#, Haskell and Erlang are a few languages designed with Functional programming as their focus. Although Functional programming solves the problem above, it is relatively hard to use effectively, especially in systems that do a lot of I/O. Therefore, Functional programming today is often combined with traditional programming styles.

Fun comparison of the performance of programming languages

The Computer Language Benchmarks Game ( https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html ) is a website that compares the computing performance of many languages. Note that the comparison covers only a few specific problems and is for reference only; it is not conclusive about real-world performance.

Similarly, TechEmpower Benchmarks ( https://www.techempower.com/benchmarks/ ) is a website that compares the performance of web frameworks.


Source : Viblo