| Unicode {ISOcodes} | R Documentation |
Basic Unicode data, including the Universal Character Set (UCS) code points as defined by the ISO/IEC 10646 International Standard.
data("Unicode")
A data frame with the following variables:
Code:
Name:
General_Category:
Canonical_Combining_Class:
Bidi_Class:
Decomposition:
Numeric_Value_Decimal_Digit:
Numeric_Value_Digit:
Numeric_Value:
Bidi_Mirrored: "Y" and "N", indicating whether the character has been identified as a “mirrored” character in bidirectional text or not.
Unicode_1_Name:
ISO_Comment:
Simple_Uppercase_Mapping:
Simple_Lowercase_Mapping:
Simple_Titlecase_Mapping:
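Since Bidi_Mirrored is coded as "Y"/"N" strings, a logical vector can be more convenient for subsetting. The following is a minimal sketch on a hand-made data frame that stands in for the real data; `toy`, `is_mirrored` and `mirrored` are illustrative names, and the format of the Code column is an assumption rather than something taken from the data set.

```r
# Toy stand-in for the data set; only three of the variables are mimicked,
# and the hexadecimal string format of Code is an assumption.
toy <- data.frame(
  Code = c("0028", "0041", "0029"),
  Name = c("LEFT PARENTHESIS", "LATIN CAPITAL LETTER A", "RIGHT PARENTHESIS"),
  Bidi_Mirrored = c("Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Convert the "Y"/"N" flag to a logical vector.
is_mirrored <- toy$Bidi_Mirrored == "Y"

# Keep only the rows flagged as mirrored.
mirrored <- toy[is_mirrored, ]
```

The same comparison should carry over to the real data frame, provided the variable is stored as character or factor there.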
Variable General_Category has the following property values
(levels).
| Lu | Letter, Uppercase |
| Ll | Letter, Lowercase |
| Lt | Letter, Titlecase |
| Lm | Letter, Modifier |
| Lo | Letter, Other |
| Mn | Mark, Nonspacing |
| Mc | Mark, Spacing Combining |
| Me | Mark, Enclosing |
| Nd | Number, Decimal Digit |
| Nl | Number, Letter |
| No | Number, Other |
| Pc | Punctuation, Connector |
| Pd | Punctuation, Dash |
| Ps | Punctuation, Open |
| Pe | Punctuation, Close |
| Pi | Punctuation, Initial quote (may behave like Ps or Pe depending on usage) |
| Pf | Punctuation, Final quote (may behave like Ps or Pe depending on usage) |
| Po | Punctuation, Other |
| Sm | Symbol, Math |
| Sc | Symbol, Currency |
| Sk | Symbol, Modifier |
| So | Symbol, Other |
| Zs | Separator, Space |
| Zl | Separator, Line |
| Zp | Separator, Paragraph |
| Cc | Other, Control |
| Cf | Other, Format |
| Cs | Other, Surrogate |
| Co | Other, Private Use |
| Cn | Other, Not Assigned (no characters in the file have this property) |
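As the table above shows, the first letter of each General_Category value identifies its major class (Letter, Mark, Number, Punctuation, Symbol, Separator, Other). A minimal sketch of grouping by that letter follows; it uses a small hand-written vector rather than the actual data set, and `major_class`, `gc`, `gc_major` and `gc_counts` are illustrative names, not part of the package.

```r
# Major classes keyed by the first letter of the two-letter category code,
# taken from the table above.
major_class <- c(L = "Letter", M = "Mark", N = "Number", P = "Punctuation",
                 S = "Symbol", Z = "Separator", C = "Other")

# Hypothetical sample of General_Category values (not read from the data set).
gc <- c("Lu", "Ll", "Nd", "Zs", "Pd", "Lu")

# Look up the major class from the first character of each code.
gc_major <- unname(major_class[substr(gc, 1, 1)])

# Count how often each major class occurs in the sample.
gc_counts <- table(gc_major)
```

Applied to the General_Category variable of the data frame, the same lookup would give a coarse breakdown by major class.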
Variable Canonical_Combining_Class has the following property
values (levels).
| 0 | Spacing, split, enclosing, reordrant, and Tibetan subjoined |
| 1 | Overlays and interior |
| 7 | Nuktas |
| 8 | Hiragana/Katakana voicing marks |
| 9 | Viramas |
| 10 | Start of fixed position classes |
| 199 | End of fixed position classes |
| 200 | Below left attached |
| 202 | Below attached |
| 204 | Below right attached |
| 208 | Left attached (reordrant around single base character) |
| 210 | Right attached |
| 212 | Above left attached |
| 214 | Above attached |
| 216 | Above right attached |
| 218 | Below left |
| 220 | Below |
| 222 | Below right |
| 224 | Left (reordrant around single base character) |
| 226 | Right |
| 228 | Above left |
| 230 | Above |
| 232 | Above right |
| 233 | Double below |
| 234 | Double above |
| 240 | Below (iota subscript) |
Variable Bidi_Class has the following property values (levels).
| L | Left-to-Right |
| LRE | Left-to-Right Embedding |
| LRO | Left-to-Right Override |
| R | Right-to-Left |
| AL | Right-to-Left Arabic |
| RLE | Right-to-Left Embedding |
| RLO | Right-to-Left Override |
| PDF | Pop Directional Format |
| EN | European Number |
| ES | European Number Separator |
| ET | European Number Terminator |
| AN | Arabic Number |
| CS | Common Number Separator |
| NSM | Non-Spacing Mark |
| BN | Boundary Neutral |
| B | Paragraph Separator |
| S | Segment Separator |
| WS | Whitespace |
| ON | Other Neutrals |
The decomposition types in variable Decomposition are as
follows.
| <font> | A font variant (e.g., a blackletter form). |
| <noBreak> | A no-break version of a space or hyphen. |
| <initial> | An initial presentation form (Arabic). |
| <medial> | A medial presentation form (Arabic). |
| <final> | A final presentation form (Arabic). |
| <isolated> | An isolated presentation form (Arabic). |
| <circle> | An encircled form. |
| <super> | A superscript form. |
| <sub> | A subscript form. |
| <vertical> | A vertical layout presentation form. |
| <wide> | A wide (or zenkaku) compatibility character. |
| <narrow> | A narrow (or hankaku) compatibility character. |
| <small> | A small variant form (CNS compatibility). |
| <square> | A CJK squared font variant. |
| <fraction> | A vulgar fraction form. |
| <compat> | Otherwise unspecified compatibility character. |
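In UnicodeData.txt, a compatibility decomposition is written as one of the tags above followed by the code points of the mapping, while a canonical decomposition carries no tag. A minimal sketch of separating the two parts follows, assuming the Decomposition variable keeps that raw format (this has not been checked against the data set); `split_decomposition` and `decomp` are illustrative names.

```r
split_decomposition <- function(x) {
  x <- as.character(x)
  # A leading "<tag>" marks a compatibility decomposition.
  has_tag <- grepl("^<[^>]+>", x)
  # Extract the tag, or NA when the entry is canonical or empty.
  type <- ifelse(has_tag, sub("^(<[^>]+>).*$", "\\1", x), NA_character_)
  # Drop the tag and surrounding whitespace to leave the code points.
  mapping <- trimws(sub("^<[^>]+>", "", x))
  data.frame(type = type, mapping = mapping, stringsAsFactors = FALSE)
}

# Sample values in UnicodeData.txt style: superscript two, A with grave,
# no-break space, and a character without a decomposition.
decomp <- split_decomposition(c("<super> 0032", "0041 0300", "<noBreak> 0020", ""))
```

Entries with an NA type and a non-empty mapping are the canonical decompositions in this sketch.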
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
http://en.wikipedia.org/wiki/Unicode, http://en.wikipedia.org/wiki/ISO_10646; http://www.unicode.org/Public/UNIDATA/UCD.html for details on the Unicode data sets.