Primitive Type char

1.0.0

A character type.

The char type represents a single character. More specifically, since ‘character’ isn’t a well-defined concept in Unicode, char is a ‘Unicode scalar value’.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.

Validity

A char is a ‘Unicode scalar value’, which is any ‘Unicode code point’ other than a surrogate code point. This has a fixed numerical definition: code points are in the range 0 to 0x10FFFF, inclusive. Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.

No char may be constructed, whether as a literal or at runtime, that is not a Unicode scalar value:

// Each of these is a compiler error
['\u{D800}', '\u{DFFF}', '\u{110000}'];

// Panics; from_u32 returns None.
char::from_u32(0xDE01).unwrap();

// Undefined behaviour
unsafe { char::from_u32_unchecked(0x110000) };

USVs are also the exact set of values that may be encoded in UTF-8. Because char values are USVs and str values are valid UTF-8, it is safe to store any char in a str or read any character from a str as a char.
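
As a rough illustration of that guarantee, the sketch below pushes a char into a String and reads it back. `round_trip` is a hypothetical helper written for this example, not part of the standard library.

```rust
/// Hypothetical helper (not a std API): stores `c` in a String, then reads it back.
/// Returns the UTF-8 length in bytes and the char recovered from the string.
fn round_trip(c: char) -> (usize, Option<char>) {
    let mut s = String::new();
    s.push(c); // cannot fail: every char has a UTF-8 encoding
    (s.len(), s.chars().next())
}
```

Under these assumptions, the recovered value is the original char, and the byte length is between one and four depending on the code point.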

The compiler understands the gap in valid char values, so in the example below the two ranges cover every possible char value and the match is accepted as exhaustive.

let c: char = 'a';
match c {
    '\0' ..= '\u{D7FF}' => false,
    '\u{E000}' ..= '\u{10FFFF}' => true,
};

All USVs are valid char values, but not all of them represent a real character. Many USVs are not currently assigned to a character, but may be in the future (“reserved”); some will never be a character (“noncharacters”); and some may be given different meanings by different users (“private use”).
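
To make the distinction concrete, here is a minimal sketch that checks whether a number is a Unicode scalar value by way of `char::from_u32`. The name `is_scalar_value` is an assumption made for this example.

```rust
/// Hypothetical helper: true when `n` is a Unicode scalar value,
/// that is, when it can be converted to a `char`.
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}
```

With this helper, the noncharacter U+FFFF and the private-use code point U+E000 are both accepted, because they are valid char values even though they do not represent a real character. The surrogates 0xD800 and 0xDFFF, and anything above 0x10FFFF, are rejected.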

Representation

char is always four bytes in size. This differs from the representation a given character has as part of a String, where it is encoded as UTF-8 and takes between one and four bytes. For example:

let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * std::mem::size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * std::mem::size_of::<u8>());

As always, remember that a human intuition for ‘character’ might not map to Unicode’s definitions. For example, despite looking similar, the ‘é’ character is one Unicode code point while ‘é’ is two Unicode code points:

let mut chars = "é".chars();
// U+00e9: 'latin small letter e with acute'
assert_eq!(Some('\u{00e9}'), chars.next());
assert_eq!(None, chars.next());

let mut chars = "é".chars();
// U+0065: 'latin small letter e'
assert_eq!(Some('\u{0065}'), chars.next());
// U+0301: 'combining acute accent'
assert_eq!(Some('\u{0301}'), chars.next());
assert_eq!(None, chars.next());

This means that the contents of the first string above will fit into a char while the contents of the second string will not. Trying to create a char literal with the contents of the second string gives an error:

error: character literal may only contain one codepoint: 'é'
let c = 'é';
        ^^^

Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:

let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();

assert_eq!(12, std::mem::size_of_val(&s[..]));
assert_eq!(32, std::mem::size_of_val(&v[..]));
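
The same comparison can be wrapped in a small function to estimate the cost before collecting into a `Vec<char>`. This is only a sketch; `utf8_and_char_bytes` is a hypothetical name, and the second number counts the element storage only, not the `Vec` header or any spare capacity.

```rust
/// Hypothetical helper: returns (bytes used as UTF-8, bytes needed for one
/// 4-byte `char` per scalar value).
fn utf8_and_char_bytes(s: &str) -> (usize, usize) {
    let utf8 = s.len(); // str::len counts bytes, not chars
    let as_chars = s.chars().count() * std::mem::size_of::<char>();
    (utf8, as_chars)
}
```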

Implementations

impl char

pub const MAX: char = '\u{10ffff}'

pub const REPLACEMENT_CHARACTER: char = '�'

pub const UNICODE_VERSION: (u8, u8, u8) = crate::unicode::UNICODE_VERSION

pub fn decode_utf16<I>(iter: I) -> DecodeUtf16<<I as IntoIterator>::IntoIter>
where
    I: IntoIterator<Item = u16>,
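
A possible use of `decode_utf16` together with `REPLACEMENT_CHARACTER` is a lossy decoder that substitutes U+FFFD for each unpaired surrogate. The function name `decode_lossy` is an assumption for this sketch; the standard library already offers `String::from_utf16_lossy` for the same job.

```rust
/// Hypothetical helper: decodes UTF-16 code units, replacing every
/// unpaired surrogate with U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        // Each item is Ok(char) or Err for an unpaired surrogate.
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```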

pub const fn from_u32(i: u32) -> Option<char>

pub unsafe fn from_u32_unchecked(i: u32) -> char

pub const fn from_digit(num: u32, radix: u32) -> Option<char>

pub fn is_digit(self, radix: u32) -> bool

pub const fn to_digit(self, radix: u32) -> Option<u32>
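
As an illustration of the digit methods, the sketch below builds a small radix parser on top of `to_digit`. `parse_radix` is a hypothetical helper; it assumes `radix` is in the range 2 to 36 (outside that range `to_digit` panics) and it returns `Some(0)` for an empty string.

```rust
/// Hypothetical helper: parses `s` as an unsigned number in the given radix.
/// Returns None on an invalid digit or on overflow.
/// Assumes 2 <= radix <= 36.
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    let mut acc: u32 = 0;
    for c in s.chars() {
        let d = c.to_digit(radix)?; // None if `c` is not a digit in this radix
        acc = acc.checked_mul(radix)?.checked_add(d)?;
    }
    Some(acc)
}
```

`from_digit` goes the other way, from a numeric value to a lowercase digit character.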

pub fn escape_unicode(self) -> EscapeUnicode

pub fn escape_debug(self) -> EscapeDebug

pub fn escape_default(self) -> EscapeDefault

pub const fn len_utf8(self) -> usize

pub const fn len_utf16(self) -> usize

pub fn encode_utf8(self, dst: &mut [u8]) -> &mut str

pub fn encode_utf16(self, dst: &mut [u16]) -> &mut [u16]
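
The encoding methods write into a caller-provided buffer and return the part that was filled. A minimal sketch, assuming a buffer of four bytes (or two 16-bit units), which is large enough for any char; the helper names are hypothetical.

```rust
/// Hypothetical helper: the UTF-8 bytes of `c`.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // four bytes is enough for any char
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: the UTF-16 code units of `c`.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2]; // two units is enough for any char
    c.encode_utf16(&mut buf).to_vec()
}
```

Both methods panic if the buffer is too small, so `len_utf8` and `len_utf16` can be used to size a buffer exactly when that matters.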

pub fn is_alphabetic(self) -> bool

pub fn is_lowercase(self) -> bool

pub fn is_uppercase(self) -> bool

pub fn is_whitespace(self) -> bool

pub fn is_alphanumeric(self) -> bool

pub fn is_control(self) -> bool

pub fn is_numeric(self) -> bool

pub fn to_lowercase(self) -> ToLowercase

pub fn to_uppercase(self) -> ToUppercase
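
The case-mapping methods return iterators rather than a single char because one char can map to several. The sketch below shows this with U+00DF (ß), whose uppercase mapping is the two chars "SS". `to_upper` is a hypothetical helper; `str::to_uppercase` is the usual choice for whole strings.

```rust
/// Hypothetical helper: uppercases a string one char at a time.
fn to_upper(s: &str) -> String {
    // Each char may expand to more than one char, hence flat_map.
    s.chars().flat_map(char::to_uppercase).collect()
}
```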

pub const fn is_ascii(&self) -> bool

pub const fn as_ascii(&self) -> Option<AsciiChar>

pub const fn to_ascii_uppercase(&self) -> char

pub const fn to_ascii_lowercase(&self) -> char

pub const fn eq_ignore_ascii_case(&self, other: &char) -> bool

pub fn make_ascii_uppercase(&mut self)

pub fn make_ascii_lowercase(&mut self)

pub const fn is_ascii_alphabetic(&self) -> bool

pub const fn is_ascii_uppercase(&self) -> bool

pub const fn is_ascii_lowercase(&self) -> bool

pub const fn is_ascii_alphanumeric(&self) -> bool

pub const fn is_ascii_digit(&self) -> bool

pub fn is_ascii_octdigit(&self) -> bool

pub const fn is_ascii_hexdigit(&self) -> bool

pub const fn is_ascii_punctuation(&self) -> bool

pub const fn is_ascii_graphic(&self) -> bool

pub const fn is_ascii_whitespace(&self) -> bool

pub const fn is_ascii_control(&self) -> bool
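
The ASCII methods only affect the 128 ASCII characters and leave everything else unchanged. As a sketch of what that means in practice, the hypothetical function below compares two strings char by char with `eq_ignore_ascii_case`; real code would more likely call `str::eq_ignore_ascii_case` directly.

```rust
/// Hypothetical helper: compares two strings, ignoring ASCII case only.
fn eq_ignore_ascii_case_str(a: &str, b: &str) -> bool {
    let (mut xs, mut ys) = (a.chars(), b.chars());
    loop {
        match (xs.next(), ys.next()) {
            (None, None) => return true, // both ended together
            (Some(x), Some(y)) if x.eq_ignore_ascii_case(&y) => continue,
            _ => return false, // length mismatch or differing chars
        }
    }
}
```

Non-ASCII letters are compared exactly, so U+00E9 and U+00C9 are treated as different.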

Trait Implementations

impl AsciiExt for char

type Owned = char

fn is_ascii(&self) -> bool

fn to_ascii_uppercase(&self) -> Self::Owned

fn to_ascii_lowercase(&self) -> Self::Owned

fn eq_ignore_ascii_case(&self, o: &Self) -> bool

fn make_ascii_uppercase(&mut self)

fn make_ascii_lowercase(&mut self)

impl Clone for char

fn clone(&self) -> char

fn clone_from(&mut self, source: &Self)

impl Debug for char

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Default for char

fn default() -> char

impl Display for char

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl<'a> Extend<&'a char> for String

fn extend<I>(&mut self, iter: I)
where
    I: IntoIterator<Item = &'a char>,

fn extend_one(&mut self, _: &'a char)

fn extend_reserve(&mut self, additional: usize)

impl Extend<char> for String

fn extend<I>(&mut self, iter: I)
where
    I: IntoIterator<Item = char>,

fn extend_one(&mut self, c: char)

fn extend_reserve(&mut self, additional: usize)
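
The two `Extend` implementations let a String grow from an iterator of chars, either by value or by reference. A minimal sketch, with a hypothetical function name and parameters chosen for illustration:

```rust
/// Hypothetical helper: keeps the alphanumeric chars of `input`,
/// then appends every char in `suffix`.
fn keep_alphanumeric(input: &str, suffix: &[char]) -> String {
    let mut out = String::new();
    // Extend<char>: the iterator yields chars by value.
    out.extend(input.chars().filter(|c| c.is_alphanumeric()));
    // Extend<&char>: the iterator yields references into the slice.
    out.extend(suffix.iter());
    out
}
```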

impl From<char> for String

fn from(c: char) -> String

impl From<char> for u128

fn from(c: char) -> u128

impl From<char> for u32

fn from(c: char) -> u32

impl From<char> for u64

fn from(c: char) -> u64
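
These `From` implementations convert a char to its code point as an integer, or to a one-char String, without any possibility of failure. The sketch below exercises them; the helper names are hypothetical.

```rust
/// Hypothetical helper: the code point of `c` in three integer widths.
fn widen(c: char) -> (u32, u64, u128) {
    (u32::from(c), u64::from(c), u128::from(c))
}

/// Hypothetical helper: a String holding only `c`.
fn single_char_string(c: char) -> String {
    String::from(c)
}
```

The reverse direction is fallible for these integer types, which is why it goes through `char::from_u32` or `TryFrom<u32>` instead.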