Introducing the char data type in C++

Tram Ho

Up to this point, the fundamental data types we have looked at hold numbers (integers and floating point numbers) or true/false values (booleans). But what if we want to store text? The char data type is designed for that purpose.


The char data type is an integral type, meaning the underlying value is stored as an integer, and it is guaranteed to be 1 byte in size. However, much like a boolean value is interpreted as true or false, a char value is interpreted as an ASCII character.

ASCII stands for American Standard Code for Information Interchange. It defines a particular way to represent English characters (plus a few other symbols) as numbers between 0 and 127 (called an ASCII code). For example, ASCII code 97 is interpreted as the character 'a'.

Character literals are always placed between single quotes.

Here is the full table of ASCII characters:

Code  Symbol                            Code  Symbol   Code  Symbol  Code  Symbol
0     NUL (null)                        32    (space)  64    @       96    `
1     SOH (start of header)             33    !        65    A       97    a
2     STX (start of text)               34    "        66    B       98    b
3     ETX (end of text)                 35    #        67    C       99    c
4     EOT (end of transmission)         36    $        68    D       100   d
5     ENQ (inquiry)                     37    %        69    E       101   e
6     ACK (acknowledge)                 38    &        70    F       102   f
7     BEL (bell)                        39    '        71    G       103   g
8     BS (backspace)                    40    (        72    H       104   h
9     HT (horizontal tab)               41    )        73    I       105   i
10    LF (line feed / new line)         42    *        74    J       106   j
11    VT (vertical tab)                 43    +        75    K       107   k
12    FF (form feed / new page)         44    ,        76    L       108   l
13    CR (carriage return)              45    -        77    M       109   m
14    SO (shift out)                    46    .        78    N       110   n
15    SI (shift in)                     47    /        79    O       111   o
16    DLE (data link escape)            48    0        80    P       112   p
17    DC1 (device control 1)            49    1        81    Q       113   q
18    DC2 (device control 2)            50    2        82    R       114   r
19    DC3 (device control 3)            51    3        83    S       115   s
20    DC4 (device control 4)            52    4        84    T       116   t
21    NAK (negative acknowledge)        53    5        85    U       117   u
22    SYN (synchronous idle)            54    6        86    V       118   v
23    ETB (end of transmission block)   55    7        87    W       119   w
24    CAN (cancel)                      56    8        88    X       120   x
25    EM (end of medium)                57    9        89    Y       121   y
26    SUB (substitute)                  58    :        90    Z       122   z
27    ESC (escape)                      59    ;        91    [       123   {
28    FS (file separator)               60    <        92    \       124   |
29    GS (group separator)              61    =        93    ]       125   }
30    RS (record separator)             62    >        94    ^       126   ~
31    US (unit separator)               63    ?        95    _       127   DEL (delete)

Codes 0-31 are called unprintable characters. They are mostly used for formatting and for controlling printers, and most of them are now obsolete.

Codes 32-127 are called printable characters. They represent the letters, digits, and punctuation marks that most computers use to display basic English text. (Strictly speaking, code 127, DEL, is a control character as well.)


Character initialization

You can initialize a char variable with a character literal such as 'a'. This is the preferred form.

It is also possible to initialize a char with an integer (for example 97, the ASCII code for 'a'), but this should be avoided if possible.

Be careful not to mix up character digits with integer numbers. Initializing a char with the integer 5 is not the same as initializing it with the character '5': the first stores ASCII code 5, while the second stores ASCII code 53.

Character digits are meant for representing numbers as text, not as numbers to do arithmetic with.

Printing chars

When std::cout prints a char, it outputs the value as an ASCII character. Printing a char variable that holds the value 97, for example, produces the output a.

Character literals can also be printed directly. Sending 'c' to std::cout produces the output c.

The fixed-width integer type int8_t is usually treated the same as signed char in C++, so it will generally print as a character instead of an integer.

Printing chars as integers via static_cast

If we want to output a char as a number instead of a character, we have to tell std::cout to print the char as if it were an integer. One way to do this is to assign the char to an int variable and print that variable.

However, this is clunky. A better way is to use a type cast. A type cast creates a value of one type from a value of another type. To convert between fundamental data types (for example, from a char to an int, or vice versa), we use a type cast called a static cast.

The syntax for a static cast is as follows:

static_cast<new_type>(expression)

static_cast takes the value of an expression as input and converts it to whichever fundamental type new_type represents (e.g., int, bool, char, double).

Applying static_cast<int> to a char that holds 'a' yields the integer 97, so printing the result outputs 97 rather than a.

It is important to note that the argument to static_cast is evaluated as an expression. When we pass in a variable, that variable is evaluated to produce its value, which is then converted to the new type. The variable itself is not affected by casting its value to a new type. In the case described above, the variable ch is still a char and still holds the same value.

Also note that static_cast does not do any range checking, so if you cast an integer that is too large to fit into a char, the value will silently overflow.

We will talk more about static_cast and the other kinds of casts in a future lesson.

Inputting chars

A program can ask the user to enter a character, extract it from std::cin into a char variable, and then print both the character and its ASCII code. If the user enters q, for example, such a program reports that q has ASCII code 113.

Note that std::cin will let you enter multiple characters. However, the variable ch can only hold 1 character, so only the first input character is extracted into ch. The rest of the user input is left in the input buffer that std::cin uses, and can be extracted by subsequent calls to std::cin.

For example, if the user types abcd at the first prompt, the first extraction stores a in ch, and a second extraction into a char receives b immediately, without waiting for more input.

Char size, range, and default sign

char is defined by C++ to always be 1 byte in size. Whether a plain char is signed or unsigned is left to the implementation (it is usually signed). If you are using chars to hold ASCII characters, you don't need to specify a sign, because both signed and unsigned chars can hold values between 0 and 127.

If you are using a char to hold small integers (something you should not do unless you are explicitly optimizing for space), you should always specify whether it is signed or unsigned. A signed char can hold a number between -128 and 127. An unsigned char can hold a number between 0 and 255.

Escape sequences

There are several characters in C++ that have special meaning. These characters are called escape sequences. An escape sequence starts with a '\' (backslash) character, followed by a letter or number.

You have already seen the most common escape sequence: '\n', which embeds a newline in a string of text. Printing "First line\nSecond line\n", for example, puts First line and Second line on separate lines.

Another commonly used escape sequence is '\t', which embeds a horizontal tab. Printing "First part\tSecond part" puts a tab between the two parts.

Three other notable escape sequences are:

\' prints a single quote
\" prints a double quote
\\ prints a backslash

Here is a table of all of the escape sequences:

Name             Symbol        Meaning
Alert            \a            Makes an alert, such as a beep
Backspace        \b            Moves the cursor back one space
Formfeed         \f            Moves the cursor to the next logical page
Newline          \n            Moves the cursor to the next line
Carriage return  \r            Moves the cursor to the beginning of the line
Horizontal tab   \t            Prints a horizontal tab
Vertical tab     \v            Prints a vertical tab
Single quote     \'            Prints a single quote
Double quote     \"            Prints a double quote
Backslash        \\            Prints a backslash
Question mark    \?            Prints a question mark
Octal number     \(number)     Translates into the char represented by the octal number
Hex number       \x(number)    Translates into the char represented by the hex number

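Some examples: \" lets a string literal contain double quotes, \\ produces a single backslash, and \x6F produces the character o (ASCII code 111, which is 6F in hexadecimal).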

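Newline (\n) vs. std::endl

Both '\n' and std::endl end the current line, but std::endl also flushes the output buffer, so '\n' is generally preferred when an immediate flush is not needed.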

The difference between putting symbols in single quotes and double quotes

Stand-alone characters are always placed in single quotes (e.g. 'a', '+', '5'). A char can only represent one symbol (for example, the letter a, the plus symbol, the digit 5). Writing '56' to initialize a char, for instance, is wrong, because a char can only hold one symbol.

Text placed between double quotes (e.g. "Hello, world!") is called a string. A string is a sequence of consecutive characters (and therefore, a string can hold multiple symbols).

For now, you are welcome to use string literals in your code, for example by printing "Hello, world!" with std::cout.

However, strings are not a fundamental data type in C++ and are a little more complicated, so we will hold off discussing them until we cover compound data types.

Always put stand-alone characters in single quotes (e.g. 't' or '\n', not "t" or "\n"). This helps the compiler optimize more effectively.

What about the other char types, wchar_t, char16_t, and char32_t?

wchar_t should be avoided in almost all cases (except when interfacing with the Windows API). Its size is implementation-defined and is not reliable. It has largely been deprecated.

Much like ASCII maps the integers 0-127 to American English characters, other character encoding standards exist to map integers (of varying sizes) to characters in other languages. The best-known mapping outside of ASCII is the Unicode standard, which maps over 110,000 integers to characters in many different languages. Because Unicode contains so many code points, a single Unicode code point needs 32 bits to represent a character (called UTF-32). However, Unicode characters can also be encoded using multiple 16-bit or 8-bit units (called UTF-16 and UTF-8, respectively).

char16_t and char32_t were added in C++11 to provide explicit support for 16-bit and 32-bit Unicode characters. char8_t was added in C++20.

You won't need to use char8_t, char16_t, or char32_t unless you are planning to make your program Unicode compatible. Unicode and localization are generally outside the scope of these tutorials, so we will not cover them further.

In the meantime, you should only use ASCII characters when working with characters (and strings). Using characters from other character sets may cause your characters to display incorrectly.


Source : Techtalk