Introducing the char data type in C++

Tram Ho

4 years ago

Up to this point, the fundamental data types we've looked at have been used to hold numbers (integers and floating point numbers) or true/false values (booleans). But what if we want to store text? The char data type is designed for exactly that purpose.

The char data type is an integral type, meaning the underlying value is stored as an integer and it is guaranteed to be 1 byte in size. However, similar to how boolean values are interpreted as true or false, char values are interpreted as ASCII characters.

ASCII stands for American Standard Code for Information Interchange, and it defines a particular way to represent English characters (plus a few other symbols) as numbers between 0 and 127 (called an ASCII code). For example, ASCII code 97 is interpreted as the character 'a'.

Character literals are always placed between single quotes (e.g. 'g', '1', ' ').

Here is a full table of ASCII characters:

Code	Symbol	Code	Symbol	Code	Symbol	Code	Symbol
0	NUL (null)	32	(space)	64	@	96	`
1	SOH (start of header)	33	!	65	A	97	a
2	STX (start of text)	34	"	66	B	98	b
3	ETX (end of text)	35	#	67	C	99	c
4	EOT (end of transmission)	36	$	68	D	100	d
5	ENQ (inquiry)	37	%	69	E	101	e
6	ACK (acknowledge)	38	&	70	F	102	f
7	BEL (bell)	39	'	71	G	103	g
8	BS (backspace)	40	(	72	H	104	h
9	HT (horizontal tab)	41	)	73	I	105	i
10	LF (line feed / new line)	42	*	74	J	106	j
11	VT (vertical tab)	43	+	75	K	107	k
12	FF (form feed / new page)	44	,	76	L	108	l
13	CR (carriage return)	45	-	77	M	109	m
14	SO (shift out)	46	.	78	N	110	n
15	SI (shift in)	47	/	79	O	111	o
16	DLE (data link escape)	48	0	80	P	112	p
17	DC1 (device control 1)	49	1	81	Q	113	q
18	DC2 (device control 2)	50	2	82	R	114	r
19	DC3 (device control 3)	51	3	83	S	115	s
20	DC4 (device control 4)	52	4	84	T	116	t
21	NAK (negative acknowledge)	53	5	85	U	117	u
22	SYN (synchronous idle)	54	6	86	V	118	v
23	ETB (end of transmission block)	55	7	87	W	119	w
24	CAN (cancel)	56	8	88	X	120	x
25	EM (end of medium)	57	9	89	Y	121	y
26	SUB (substitute)	58	:	90	Z	122	z
27	ESC (escape)	59	;	91	[	123	{
28	FS (file separator)	60	<	92	\	124	|
29	GS (group separator)	61	=	93	]	125	}
30	RS (record separator)	62	>	94	^	126	~
31	US (unit separator)	63	?	95	_	127	DEL (delete)

Codes 0-31 are called unprintable characters, and they are mostly used for formatting and for controlling printers. Most of these are now obsolete.

Codes 32-126 are called printable characters, and they represent the letters, digits, and punctuation that most computers use to display basic English text. Code 127 (DEL) is another control character rather than a printable one.

Character initialization

You can initialize char variables using character literals:

char ch2{ 'a' }; // initialize with code point for 'a' (stored as integer 97) (preferred)

You can also initialize a char with an integer:

char ch1{ 97 }; // initialize with integer 97 ('a') (not preferred)

Initializing chars with integers is possible, but should be avoided where you can.

Be careful not to mix up character digits with integers. The following two initializations are not the same:

char ch{5}; // initialize with integer 5 (stored as integer 5)
char ch{'5'}; // initialize with code point for '5' (stored as integer 53)

Character digits are intended for cases where we want to represent numbers as text, rather than as numbers to apply mathematical operations to.

Printing chars

When using std::cout to print a char, std::cout outputs the char variable as an ASCII character:

#include <iostream>
  
int main()
{
    char ch1{ 'a' }; // (preferred)
    std::cout << ch1; // cout prints a character
  
    char ch2{ 98 }; // code point for 'b' (not preferred)
    std::cout << ch2; // cout prints a character
  
  
    return 0;
}

				
					
This produces the result:

ab

We can also print char literals directly:

std::cout << 'c';

This produces the result:

c

The fixed-width integer type int8_t is usually treated the same as a signed char in C++, so it will generally print as a char instead of an integer.

Printing chars as integers via static_cast

If we want to output a char as a number instead of a character, we have to tell std::cout to print the char as if it were an integer. One way to do this is to assign the char to an integer and print the integer:

#include <iostream>
  
int main()
{
    char ch{97};
    int i(ch); // assign the value of ch to an integer
    std::cout << i; // print the integer value
    return 0;
}

				
					
However, this is clunky. A better way is to use a type cast. A type cast creates a value of one type from a value of another type. To convert between fundamental data types (for example, from a char to an int, or vice versa), we use a type cast called static_cast.

The syntax for static_cast is as follows:

static_cast<new_type>(expression)

static_cast takes the value of an expression as input and converts it into whichever fundamental type new_type represents (e.g. int, bool, char, double).

Here is an example that uses static_cast to create an integer value from our char value:

#include <iostream>
  
int main()
{
    char ch{ 'a' };
    std::cout << ch << '\n';
    std::cout << static_cast<int>(ch) << '\n';
    std::cout << ch << '\n';
    return 0;
}

This results in:

a
97
a

It is important to note that the argument to static_cast is evaluated as an expression. When we pass in a variable, that variable is evaluated to produce its value, which is then converted to the new type. The variable itself is not affected by casting its value to a new type. In the above case, the variable ch is still a char and still holds the same value.

Also note that static_cast does not do any range checking, so if you cast a large integer into a char, the value will not fit and the result will not be the number you started with.

We will talk more about static_cast and different types in a future lesson.

Inputting chars

The following program asks the user to enter a character, then prints both the character and its ASCII code:

#include <iostream>
  
int main()
{
    std::cout << "Input a keyboard character: ";
  
    char ch{};
    std::cin >> ch;
    std::cout << ch << " has ASCII code " << static_cast<int>(ch) << '\n';
  
    return 0;
}

				
					
Here is the output from one run:

Input a keyboard character: q
q has ASCII code 113

Note that std::cin will let you enter multiple characters. However, the variable ch can only hold 1 character. Consequently, only the first input character is extracted into ch. The rest of the user input is left in the input buffer that std::cin uses, and can be extracted by subsequent calls to std::cin.

You can see this behavior in the following example:

#include <iostream>
  
int main()
{
    std::cout << "Input a keyboard character: "; // assume the user enters "abcd" (without quotes)
  
    char ch{};
    std::cin >> ch; // ch = 'a', "bcd" is left queued.
    std::cout << ch << " has ASCII code " << static_cast<int>(ch) << '\n';
  
    // Note: The following cin doesn't ask the user for input, it grabs queued input!
    std::cin >> ch; // ch = 'b', "cd" is left queued.
    std::cout << ch << " has ASCII code " << static_cast<int>(ch) << '\n';
     
    return 0;
}

Input a keyboard character: abcd
a has ASCII code 97
b has ASCII code 98

Char size, range, and default sign

Char is defined by C++ to always be 1 byte in size. By default, a char may be signed or unsigned (though it is usually signed). If you are using chars to hold ASCII characters, you don't need to specify a sign (since both signed and unsigned chars can hold values between 0 and 127).

If you are using a char to hold small integers (something you should not do unless you are explicitly optimizing for space), you should always specify whether it is signed or unsigned. A signed char can hold a number between -128 and 127. An unsigned char can hold a number between 0 and 255.

Escape sequences

There are some characters in C++ that have special meaning. These characters are called escape sequences. An escape sequence starts with a '\' (backslash) character, followed by a letter or number.

You have already seen the most common escape sequence: '\n', which can be used to embed a newline in a string of text:

#include <iostream>
  
int main()
{
    std::cout << "First line\nSecond line\n";
    return 0;
}

				
					
This outputs:

First line
Second line

Another commonly used escape sequence is '\t', which embeds a horizontal tab:

#include <iostream>
  
int main()
{
    std::cout << "First part\tSecond part";
    return 0;
}

				
					
Which outputs:

First part        Second part

Three other notable escape sequences are:

\' prints a single quote
\" prints a double quote
\\ prints a backslash

Here is a table of all of the escape sequences:

Name	Symbol	Meaning
Alert	\a	Makes an alert, such as a beep
Backspace	\b	Moves the cursor back one space
Formfeed	\f	Moves the cursor to the next logical page
Newline	\n	Moves the cursor to the next line
Carriage return	\r	Moves the cursor to the beginning of the line
Horizontal tab	\t	Prints a horizontal tab
Vertical tab	\v	Prints a vertical tab
Single quote	\'	Prints a single quote
Double quote	\"	Prints a double quote
Backslash	\\	Prints a backslash
Question mark	\?	Prints a question mark
Octal number	\(number)	Translates into the char represented by the octal number
Hex number	\x(number)	Translates into the char represented by the hex number

Here are some examples:

#include <iostream>

int main()
{
    std::cout << "\"This is quoted text\"\n";
    std::cout << "This string contains a single backslash \\\n";
    std::cout << "6F in hex is char '\x6F'\n";
    return 0;
}

				
					
Prints:

"This is quoted text"
This string contains a single backslash \
6F in hex is char 'o'

Newline (\n) vs. std::endl

What is the difference between putting symbols in single and double quotes?

Standalone chars are always put in single quotes (e.g. 'a', '+', '5'). A char can only represent one symbol (e.g. the letter a, the plus symbol, the number 5). The following example is wrong:

char ch('56'); // a char can only hold one symbol

Text put between double quotes (e.g. "Hello, world!") is called a string. A string is a collection of sequential characters (and thus, a string can hold multiple symbols).

For now, you can use string literals in your code:

std::cout << "Hello, world!"; // "Hello, world!" is a string literal

However, strings are not a fundamental data type in C++ and are a bit more complicated, so we'll hold off discussing them until we cover compound data types.

Always put standalone chars in single quotes (e.g. 't' or '\n', not "t" or "\n"). This helps the compiler optimize more effectively.

What about the other char types, wchar_t, char16_t, and char32_t?

wchar_t should be avoided in almost all cases (except when interfacing with the Windows API). Its size is implementation-defined, so it is not reliable. It has largely fallen out of use.

Much like ASCII maps the integers 0-127 to American English characters, other character encoding standards exist to map integers (of varying sizes) to characters in other languages. The best-known mapping outside of ASCII is the Unicode standard, which maps over 110,000 integers to characters in many different languages. Because Unicode contains so many code points, a single Unicode code point needs 32 bits to represent a character (called UTF-32). However, Unicode characters can also be encoded using multiple 16-bit or 8-bit characters (called UTF-16 and UTF-8, respectively).

char16_t and char32_t were added in C++11 to provide explicit support for 16-bit and 32-bit Unicode characters. char8_t was added in C++20.

You won't need to use char8_t, char16_t, or char32_t unless you are planning on making your program Unicode compatible. Unicode and localization are generally outside the scope of these tutorials, so we won't cover them further.

In the meantime, you should only use ASCII characters when working with characters (and strings). Using characters from other character sets may cause your characters to display incorrectly.
