Text file encoding
You can use these calculators to encode text in a chosen encoding.
In the previous article, I touched on the topic of text encodings and described Unicode and UTF-8, which represents each character as a variable-length sequence of bytes, in more detail.
This calculator can convert text to several outdated encodings. I call them outdated because, in modern applications, it is possible to use Unicode and its most convenient representation, UTF-8.
However, old encodings can still be useful when you need to encode text compactly, for example, for subsequent compression and transmission, when the receiving party knows for sure which encoding is used. For example, Russian text encoded in Windows-1251 takes up roughly half the space of the same text in UTF-8, because each Cyrillic letter occupies one byte instead of two.
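As a rough illustration of the size difference, the Python sketch below encodes one short Russian word both ways. The sample word and the use of Python's built-in `cp1251` codec are assumptions made for this example; the calculator's own implementation is not shown in this article.

```python
# Compare the encoded size of the same Russian text in two encodings.
text = "Привет"  # six Cyrillic letters, no spaces or punctuation

cp1251_bytes = text.encode("cp1251")  # Windows-1251: one byte per letter
utf8_bytes = text.encode("utf-8")     # UTF-8: two bytes per Cyrillic letter

print(len(cp1251_bytes), len(utf8_bytes))  # 6 12
```

Spaces, digits, and ASCII punctuation take one byte in both encodings, so for real text the saving is usually somewhat less than half.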
So the calculator below allows you to download a file in the selected encoding or view a hexadecimal dump of the encoded text.
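Both outputs can be approximated in a few lines of Python. This is only a sketch: the helper names `hex_dump` and `save_encoded` are hypothetical, and the dump layout (offset followed by hex bytes) is an assumption, not necessarily the format the calculator prints.

```python
from pathlib import Path


def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a hex dump with `width` bytes per line, prefixed by the offset."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


def save_encoded(text: str, encoding: str, path: str) -> None:
    """Write the text to `path` as raw bytes in the given encoding."""
    Path(path).write_bytes(text.encode(encoding))


print(hex_dump("Привет".encode("cp1251")))
# 00000000  cf f0 e8 e2 e5 f2
```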
You can view the created file using the Text file decoder.
The calculator will return an error if an incompatible encoding is selected. In the case of Unicode, this is not possible - it contains characters from all modern languages. But outdated 8-bit encodings contain a limited set of characters. For text in several languages, the required encoding may not be found at all.
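The same kind of failure can be reproduced with Python's built-in codecs, which raise `UnicodeEncodeError` when a character has no place in the target encoding. The wrapper `try_encode` below is a hypothetical helper for illustration, not the calculator's code.

```python
from typing import Optional


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Encode the text, or report the first unencodable span and return None."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as err:
        bad = text[err.start:err.end]
        print(f"{encoding}: cannot encode {bad!r} at position {err.start}")
        return None


try_encode("Привет", "latin_1")  # fails: Latin-1 has no Cyrillic letters
try_encode("Привет", "cp1251")   # succeeds
try_encode("Привет", "utf-8")    # always succeeds for valid text
```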
Many encodings were invented for different languages and character sets in the years before Unicode, so choosing the right encoding for your text can be a daunting task. The following calculator finds all encodings compatible with the entered text.
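One simple way to do such a search is to try every candidate encoding and keep those that accept the whole text. The sketch below assumes Python's codec names and a short, arbitrary candidate list (`CANDIDATES`); the calculator covers many more encodings and may work differently.

```python
# A small, arbitrary subset of legacy encodings, named as in Python's codecs.
CANDIDATES = [
    "cp037", "cp437", "cp866", "cp1250", "cp1251", "cp1252",
    "latin_1", "iso8859_5", "koi8_r", "koi8_u", "mac_cyrillic", "mac_roman",
]


def compatible_encodings(text: str, candidates=CANDIDATES) -> list:
    """Return the candidates that can encode every character of the text."""
    result = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        result.append(name)
    return result


print(compatible_encodings("Привет"))
print(compatible_encodings("Привет שלום"))  # Russian plus Hebrew
```

For text that mixes languages, as in the second call, the result may be empty: none of these candidates contains both Cyrillic and Hebrew letters.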
The calculators support 70 different encodings:
IBM EBCDIC
EBCDIC - standard 8-bit encoding developed by IBM for use on IBM mainframes.
Encoding | Languages / Countries |
---|---|
EBCDIC 424 Hebrew | Hebrew |
EBCDIC 037 USA/Canada | USA, Canada, Portugal, Brazil, Australia, New Zealand, South Africa |
EBCDIC 1026 Turkish | Turkish |
EBCDIC 500 International | International |
EBCDIC 875 Greek | Greek |
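Unlike most encodings listed further down, EBCDIC does not share its byte values with ASCII, so even plain English text looks different. A minimal sketch, assuming Python's `cp037` codec corresponds to the EBCDIC 037 entry in the table:

```python
text = "Hello 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # EBCDIC 037 USA/Canada


def as_hex(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


print(as_hex(ascii_bytes))   # ASCII: 'H' is 0x48
print(as_hex(ebcdic_bytes))  # EBCDIC: 'H' is 0xc8

# Decoding with the same code page restores the original text.
print(ebcdic_bytes.decode("cp037"))
```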
ISO 8859 encodings
Family of ASCII-compatible 8-bit encodings developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
Encoding | Languages / Countries |
---|---|
ISO 8859-2 (Latin-2) | Eastern European languages using the Latin alphabet |
ISO 8859-5 | Cyrillic |
ISO 8859-6 | Arabic |
ISO 8859-7 | Modern Greek |
ISO/IEC 8859-1 (Latin-1) | Western European languages |
ISO/IEC 8859-10 (Latin-6) | Northern European languages |
ISO/IEC 8859-11 | Thai |
ISO/IEC 8859-13 (Latin-7) | Estonian, Latvian, Lithuanian |
ISO/IEC 8859-14 | Celtic languages |
ISO/IEC 8859-15 (Latin-9) | Western European languages |
ISO/IEC 8859-16 (Latin-10) | South-Eastern European languages using the Latin alphabet |
ISO/IEC 8859-3 | Turkish, Maltese, Esperanto |
ISO/IEC 8859-4 (Latin-4) | Estonian, Latvian, Lithuanian, Greenlandic, Sami |
ISO/IEC 8859-8 | Hebrew |
ISO/IEC 8859-9 | Turkish |
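ASCII compatibility means that the lower half of each table (bytes 0x00 to 0x7F) is identical to ASCII, while the upper half is where the parts of the family differ. The sketch below checks both properties with Python's codecs; the helper name `is_ascii_transparent` and the choice of parts are assumptions for illustration.

```python
PARTS = ["latin_1", "iso8859_2", "iso8859_5", "iso8859_7"]


def is_ascii_transparent(encoding: str) -> bool:
    """True if all 128 ASCII characters encode to their own ASCII byte values."""
    ascii_chars = "".join(chr(code) for code in range(128))
    return ascii_chars.encode(encoding) == ascii_chars.encode("ascii")


for name in PARTS:
    print(name, is_ascii_transparent(name))

# The same high byte decodes to a different letter in each part.
decoded = {name: bytes([0xE0]).decode(name) for name in PARTS}
print(decoded)
```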
KOI8 encoding family
KOI8 - 8-bit ASCII-compatible encoding family for the letters of Cyrillic alphabets.
Encoding | Languages |
---|---|
KOI8-R | Russian |
KOI8-U | Ukrainian |
Mac OS encodings
Encoding | Languages / Countries |
---|---|
Mac OS Celtic | Celtic languages |
Mac OS Gaelic | Gaelic |
Mac OS Central European | Central European languages |
Mac OS Croatian | Croatian |
Mac OS Cyrillic | Cyrillic |
Mac OS Greek | Greek |
Mac OS Icelandic | Icelandic |
Mac OS Inuit | Inuit |
Mac OS Roman | Western European languages |
Mac OS Romanian | Romanian |
Mac OS Turkish | Turkish |
DOS code pages
Encodings for MS-DOS and similar operating systems.
Encoding | Languages / Countries |
---|---|
DOS Latin US (CP437) | English (United States) |
DOS Greek (CP737) | Greek |
DOS Baltic Rim (CP775) | Estonian, Latvian, Lithuanian |
DOS Latin 1 (CP850) | Western European languages |
DOS Latin 2 (CP852) | Eastern European languages using the Latin alphabet |
DOS Cyrillic (CP855) | Cyrillic |
CP 856 Hebrew | Hebrew |
DOS Turkish (CP857) | Turkish |
DOS Portuguese (CP860) | Portuguese |
DOS Icelandic (CP861) | Icelandic |
DOS Hebrew (CP862) | Hebrew |
DOS French Canada (CP863) | French |
DOS Arabic (CP864) | Arabic |
DOS Nordic (CP865) | Nordic |
DOS Cyrillic Russian (CP866) | Russian |
DOS Greek 2 (CP869) | Greek |
Windows encodings
Encoding | Languages / Countries |
---|---|
Windows-1250 | Central and Eastern European languages |
Windows-1251 | Russian, Ukrainian, Belarusian, Serbian, Macedonian, Bulgarian |
Windows-1252 | Western European languages |
Windows-1253 | Modern Greek |
Windows-1254 | Turkish |
Windows-1255 | Hebrew |
Windows-1256 | Arabic |
Windows-1257 | Estonian, Latvian, Lithuanian |
Windows-1258 | Vietnamese |
Windows-874 | Thai |
Windows-932 | Japanese |
Windows-936 | Simplified Chinese |
Windows-949 | Korean |
Windows-950 | Traditional Chinese |
KZ-1048 | Kazakh |
Others
Encoding | Description |
---|---|
Atari ST | Encoding used in Atari home personal computers |
GSM 03.38 | The encoding was used in GSM networks for SMS, CB (broadcast short messages), and USSD |
KPS 9566 | An encoding developed in North Korea to support Hangul Korean characters |
ISO 8-bit Urdu (IBM CP1006) | The encoding used by IBM on the AIX operating system in Pakistan for the Urdu language |
ISO-IR-68 | Encoding for representing characters in the APL programming language |
The rules for converting encodings to Unicode were obtained from the Unicode encoding mappings published on the unicode.org site: http://www.unicode.org/Public/MAPPINGS/
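Mapping files for single-byte encodings in that directory are plain text: each line typically holds a byte value, the corresponding Unicode code point, and a comment starting with `#`. The sketch below parses a small hand-written sample in that style. The sample text, the `parse_mapping` helper, and the assumption of exactly this three-column layout are illustrative; files for multi-byte encodings are laid out differently.

```python
# Hand-written sample in the style of a single-byte mapping file (not a real download).
SAMPLE = """\
# byte <TAB> Unicode code point <TAB> # comment
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x80\t0x0402\t#CYRILLIC CAPITAL LETTER DJE
0x98\t      \t#UNDEFINED
0xC0\t0x0410\t#CYRILLIC CAPITAL LETTER A
"""


def parse_mapping(text: str) -> dict:
    """Build a {byte value: character} table from mapping-file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the comment part
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) < 2:
            continue  # byte without a Unicode assignment
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


table = parse_mapping(SAMPLE)
print(table)
```

Inverting such a table (character to byte) gives what is needed to encode text, and a character missing from it is exactly the case in which encoding fails.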