In this blog post, you will learn about C string literals and their types. As programmers, we frequently use string literals in C/C++ code, so it is important to understand the concept.
I will also write some programming examples using string literals. But before going into depth, let's first understand what a string literal is in C programming.
What is a String Literal?
A character string literal is a sequence of zero or more multibyte characters enclosed in double quotes.
Example: "abc" is a string literal because it is a sequence of characters enclosed in double quotes.
A wide string literal is the same, except that it is prefixed by the letter L, u, or U. Likewise, a UTF-8 string literal is the same, except that it is prefixed by u8.
Syntax of different string literals:
(1) " s-char-sequence "
(2) u8" s-char-sequence " (since C11)
(3) u" s-char-sequence " (since C11)
(4) U" s-char-sequence " (since C11)
(5) L" s-char-sequence "
where,
s-char-sequence: zero or more characters, each of which is either any member of the source character set except the double quotation mark ("), backslash (\), or newline character, or a character escape, hex escape, octal escape, or universal character name (since C99) as defined in escape sequences.
Before explaining each type, I want to explain a very important concept. In translation phase 7, a byte or code of value zero (the terminating null character) is appended to each multibyte character sequence that results from a string literal. It marks the end of each string literal. The multibyte character sequence is then used to initialize an array of static storage duration whose length is just sufficient to contain the sequence.
Consider the below example code,
// String literal
char* ptr = "aticleworld";
// Creates a static char[12] array holding {'a','t','i','c','l','e','w','o','r','l','d','\0'}
// and sets ptr to point to the first element of the array
1. Character string literals:
For character string literals, the array elements have type char and are initialized with the individual bytes of the multibyte character sequence.
2. UTF-8 string literals:
For UTF-8 string literals, the array elements have type char (char8_t since C23) and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8.
3. Wide string literals:
For wide string literals, we use the letter L as a prefix. The array elements have type wchar_t and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the mbstowcs function with an implementation-defined current locale.
4. 16-bit wide string literals:
For 16-bit wide string literals prefixed by the letter u, the array elements have type char16_t and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by calls to the mbrtoc16 function with an implementation-defined current locale.
5. 32-bit wide string literals:
For 32-bit wide string literals prefixed by the letter U, the array elements have type char32_t and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by calls to the mbrtoc32 function with an implementation-defined current locale.
Important points related to C string literals:
1. A string literal might not be a string, because a null character may be embedded in it by a \0 escape sequence. In that case, it represents an array that contains more than one string.
char* ptr = "aticle\0world"; // strlen(ptr) == 6, but the array has size 13
2. String literals are not modifiable (immutable). If a program attempts to modify the static array formed by a string literal, the behavior is undefined.
// You cannot modify a string literal
char* ptr = "aticleworld";
ptr[2] = 'I'; // Undefined behavior
3. In translation phase 6, adjacent string literals (that is, string literals separated by whitespace only) are concatenated.
For example,
#include <stdio.h>

int main()
{
    char* ptr = "abc" "d";
    printf("%s\n", ptr);
    return 0;
}
Output: abcd
But the C standard has different concatenation rules in different versions. So let's see them one by one.
If one literal is unprefixed, the resulting string literal has the width/encoding specified by the prefixed literal (since C99). In the example below, the resulting string takes the prefix of the prefixed literals.
Each of the sequences
"a" "b" L"c"
"a" L"b" "c"
L"a" "b" L"c"
L"a" L"b" L"c"
is equivalent to the string literal L"abc".
If the two string literals have different encoding prefixes, concatenation is implementation-defined (since C11, until C23).
Note: A UTF-8 string literal and a wide string literal cannot be concatenated.
If the two string literals have different encoding prefixes, concatenation is ill-formed (since C23).
4. Using a backslash (\) immediately followed by a newline, you can split a string literal across multiple lines. The backslash and the newline are removed before compilation, so the two physical lines are joined into one.
For example,
#include <stdio.h>

int main()
{
    char* ptr = "Aticle\
world";
    printf("%s\n", ptr);
    return 0;
}
Output: Aticleworld
Note: When using a backslash (\), take care with the indentation of the next line; otherwise the leading whitespace becomes part of the string literal.
5. In translation phase 7, a terminating null character is appended to each string literal.
6. A string literal can be used to initialize arrays.
For example,
char arr1[] = "aml";  // arr1 is char[4] holding {'a', 'm', 'l', '\0'}
char arr2[4] = "aml"; // arr2 is char[4] holding {'a', 'm', 'l', '\0'}
7. A pointer to char that is initialized with a string literal points to the first character of that literal.
char* ptr = "Aticleworld";
In the above C code, the pointer ptr points to the first character of the string literal "Aticleworld", which means it points to the character 'A'.
8. You can calculate the size of a string literal using the sizeof operator. Consider the below example code,
#include <stdio.h>

int main()
{
    const unsigned int len = sizeof("Aticleworld");
    printf("%u\n", len);
    return 0;
}
Output: 12