Online calculator: Text file encoding

In the previous article, I touched on the topic of text encodings and described in more detail Unicode and its UTF-8 representation, which encodes each character as a variable-length sequence of bytes.
This calculator can convert text to several outdated encodings. I call them outdated because, in modern applications, it is possible to use Unicode and its most convenient representation, UTF-8.
However, old encodings can still be useful when you need to encode text compactly, for example, for subsequent compression and transmission, when the receiving party knows for sure which encoding the text is in. For example, Russian text encoded in Windows-1251 takes up about half the space of the same text in UTF-8, because Windows-1251 stores each Cyrillic letter in one byte, while UTF-8 needs two.
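The size difference is easy to check. The sketch below uses Python's built-in codecs, which are an assumption of this example and not necessarily what the calculator uses; the sample string is arbitrary.

```python
# Compare the encoded size of the same Russian text in two encodings.
text = "Привет, мир"

cp1251_bytes = text.encode("cp1251")  # one byte per character
utf8_bytes = text.encode("utf-8")     # two bytes per Cyrillic letter, one per ASCII character

print(len(text), "characters")
print(len(cp1251_bytes), "bytes in Windows-1251")
print(len(utf8_bytes), "bytes in UTF-8")
```

The ratio is not exactly one half here because the comma and the space take one byte in both encodings.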
So the calculator below allows you to download a file in the selected encoding or view a hexadecimal dump of the encoded text.
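For readers who want to reproduce the hex dump offline, here is a minimal sketch. The function name `hex_dump` and the output layout are illustrative assumptions; the calculator's own dump format may differ.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of an offset followed by hexadecimal byte values."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


# Encode a short string in Windows-1251 and dump the resulting bytes.
encoded = "Привет".encode("cp1251")
print(hex_dump(encoded))
```

Writing `encoded` to a file opened in binary mode (`open(path, "wb")`) would give the equivalent of the downloadable file.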

Encoded text

You can view the created file using the Text file decoder.

The calculator will return an error if an incompatible encoding is selected. In the case of Unicode, this is not possible - it contains characters from all modern languages. But outdated 8-bit encodings contain a limited set of characters. For text in several languages, the required encoding may not be found at all.
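The same failure can be reproduced with Python's built-in codecs, used here as a stand-in for the calculator. The helper name `try_encode` is hypothetical.

```python
from typing import Optional


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as err:
        bad = err.object[err.start:err.end]
        print(f"{encoding}: cannot encode {bad!r}")
        return None


mixed = "Привет, Ελλάδα"           # Russian and Greek in one string
print(try_encode(mixed, "cp1251"))  # fails on the Greek letters
print(try_encode(mixed, "cp1253"))  # fails on the Cyrillic letters
print(try_encode(mixed, "utf-8"))   # Unicode covers both
```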
Many encodings were invented for different languages and character sets in the years before Unicode, so choosing the right encoding for your text can be a daunting task. The following calculator finds all encodings compatible with the entered text.
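One straightforward way to find compatible encodings is to try each candidate and keep those that encode the whole text without error. The sketch below does this with Python's built-in codecs; the candidate list is a small illustrative subset, and the calculator's actual method is not documented here.

```python
# A few Python codec names, chosen only for illustration.
CANDIDATES = [
    "cp037", "iso8859_1", "iso8859_5", "koi8_r", "koi8_u",
    "mac_cyrillic", "cp437", "cp855", "cp866", "cp1251", "cp1252",
]


def compatible_encodings(text, candidates=CANDIDATES):
    """Return the candidate encodings that can represent every character of text."""
    result = []
    for name in candidates:
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            continue  # a character is missing, or the codec is unavailable
        result.append(name)
    return result


print(compatible_encodings("Привет"))
print(compatible_encodings("café"))
```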

Choose text encoding

The calculators support 70 different encodings:

IBM EBCDIC

EBCDIC is a family of 8-bit encodings developed by IBM for use on IBM mainframes.

Encoding	Languages / Countries
EBCDIC 424 Hebrew	Hebrew
EBCDIC 037 USA/Canada	USA, Canada, Portugal, Brazil, Australia, New Zealand, South Africa
EBCDIC 1026 Turkish	Turkish
EBCDIC 500 International	International
EBCDIC 875 Greek	Greek
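Unlike the other families on this page, EBCDIC does not share its byte values with ASCII, even for plain Latin letters and digits. A short sketch with Python's `cp037` codec (an assumption of this example, corresponding to EBCDIC 037) shows the difference:

```python
# The same ASCII-only text produces different bytes in ASCII and EBCDIC 037.
text = "Hello"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

print(ascii_bytes.hex(" "))   # ASCII byte values
print(ebcdic_bytes.hex(" "))  # EBCDIC byte values

# Decoding with the matching codec restores the original text.
print(ebcdic_bytes.decode("cp037"))
```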

ISO 8859 encodings

A family of ASCII-compatible encodings developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

Encoding	Languages / Countries
ISO/IEC 8859-1 (Latin-1)	Western European languages
ISO/IEC 8859-2 (Latin-2)	Eastern European languages using the Latin alphabet
ISO/IEC 8859-3	Turkish, Maltese, Esperanto
ISO/IEC 8859-4 (Latin-4)	Estonian, Latvian, Lithuanian, Greenlandic, Sami
ISO/IEC 8859-5	Cyrillic
ISO/IEC 8859-6	Arabic
ISO/IEC 8859-7	Modern Greek
ISO/IEC 8859-8	Hebrew
ISO/IEC 8859-9	Turkish
ISO/IEC 8859-10 (Latin-6)	Northern European languages
ISO/IEC 8859-11	Thai
ISO/IEC 8859-13 (Latin-7)	Estonian, Latvian, Lithuanian
ISO/IEC 8859-14	Celtic languages
ISO/IEC 8859-15 (Latin-9)	Western European languages
ISO/IEC 8859-16 (Latin-10)	Eastern European languages using the Latin alphabet

KOI8 encoding family

KOI8 is a family of 8-bit ASCII-compatible encodings for the letters of Cyrillic alphabets.

Encoding	Languages
KOI8-R	Russian
KOI8-U	Ukrainian
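"ASCII compatible" means that the first 128 byte values decode to the same characters as in ASCII. The sketch below checks this with Python's built-in codecs, and also shows one practical difference between KOI8-R and KOI8-U. The helper name `is_ascii_compatible` is hypothetical.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    low_bytes = bytes(range(128))
    return low_bytes.decode(encoding) == low_bytes.decode("ascii")


print(is_ascii_compatible("koi8_r"))  # KOI8-R keeps the ASCII range
print(is_ascii_compatible("koi8_u"))  # so does KOI8-U
print(is_ascii_compatible("cp037"))   # EBCDIC does not


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The Ukrainian letter "ї" is present in KOI8-U but not in KOI8-R.
print(can_encode("ї", "koi8_u"), can_encode("ї", "koi8_r"))
```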

Mac OS Encodings

Encoding	Languages / Countries
Mac OS Celtic	Celtic languages
Mac OS Gaelic	Gaelic
Mac OS Central European	Central European languages
Mac OS Croatian	Croatian
Mac OS Cyrillic	Cyrillic
Mac OS Greek	Greek
Mac OS Icelandic	Icelandic
Mac OS Inuit	Inuit
Mac OS Roman	Western European languages
Mac OS Romanian	Romanian
Mac OS Turkish	Turkish

DOS Code Pages

Encodings for MS-DOS and similar operating systems.

Encoding	Languages / Countries
DOS Latin US (CP437)	English (USA)
DOS Greek (CP737)	Greek
DOS Baltic Rim (CP775)	Estonian, Latvian, Lithuanian
DOS Latin 1 (CP850)	Western European languages
DOS Latin 2 (CP852)	Eastern European languages using the Latin alphabet
DOS Cyrillic (CP855)	Cyrillic
CP 856 Hebrew	Hebrew
DOS Turkish (CP857)	Turkish
DOS Portuguese (CP860)	Portuguese
DOS Icelandic (CP861)	Icelandic
DOS Hebrew (CP862)	Hebrew
DOS French Canada (CP863)	French
DOS Arabic (CP864)	Arabic
DOS Nordic (CP865)	Nordic
DOS Cyrillic Russian (CP866)	Russian
DOS Greek 2 (CP869)	Greek

Windows encodings

Encoding	Languages / Countries
Windows-1250	Central and Eastern European languages
Windows-1251	Russian, Ukrainian, Belarusian, Serbian, Macedonian, Bulgarian
Windows-1252	Western European languages
Windows-1253	Modern Greek
Windows-1254	Turkish
Windows-1255	Hebrew
Windows-1256	Arabic
Windows-1257	Estonian, Latvian, Lithuanian
Windows-1258	Vietnamese
Windows-874	Thai
Windows-932	Japanese
Windows-936	Simplified Chinese
Windows-949	Korean
Windows-950	Traditional Chinese
KZ-1048	Kazakh

Others

Encoding	Description
Atari ST	Encoding used in Atari home personal computers
GSM 03.38	The encoding was used in GSM networks for SMS, CB (broadcast short messages), and USSD
KPS 9566	An encoding developed in North Korea to support Hangul Korean characters
ISO 8-bit Urdu (IBM CP1006)	The encoding used by IBM on the AIX operating system in Pakistan for the Urdu language
ISO-IR-68	Encoding for representing characters in the APL programming language

The rules for converting encodings to Unicode were obtained from the unicode.org¹ site.

¹ Unicode encoding mappings: http://www.unicode.org/Public/MAPPINGS/
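For single-byte encodings, the mapping files are plain text tables: each data line holds a byte value and a Unicode code point in hexadecimal, followed by a `#` comment. The sketch below parses such lines into a decoding table. The sample lines are written in the style of the Windows-1251 table and are an assumption of this example, as is the function name `parse_mapping`; real files should be checked against their own header comments.

```python
SAMPLE = """\
# Sample lines in the style of a single-byte mapping table
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x80\t0x0402\t#CYRILLIC CAPITAL LETTER DJE
0x98\t      \t#UNDEFINED
0xC0\t0x0410\t#CYRILLIC CAPITAL LETTER A
"""


def parse_mapping(lines):
    """Build a {byte value: character} table from mapping-file lines."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the comment, split columns
        if len(fields) < 2:
            continue  # comment-only, blank, or undefined entry
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


table = parse_mapping(SAMPLE.splitlines())
print(table)

# Decode bytes with the parsed table; bytes missing from the table raise KeyError.
print("".join(table[b] for b in b"\xc0\x41"))
```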

PLANETCALC Online calculators

This page exists due to the efforts of the following people:

Anton

Timur
