Your source code is sent to the interpreter encoded in UTF-8,
and your program is expected to write UTF-8-encoded output to STDOUT.
For languages where it matters, your code is run in the
en_US locale with a UTF-8 output encoding.
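As a quick sanity check, a Python solution can inspect the
environment it runs in; the exact strings reported depend on the
interpreter build, so treat this as an illustrative sketch:

    import locale
    import sys

    # On the setup described above, expect a UTF-8 STDOUT encoding
    # and an en_US locale.
    print(sys.stdout.encoding)
    print(locale.setlocale(locale.LC_ALL, ""))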
In Unicode-aware languages like Python, this means
print("🙂") and print(chr(0x1f642))
both produce the emoji U+1F642 "Slightly Smiling Face" 🙂, which
is encoded as f0 9f 99 82 in UTF-8.
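For instance, under CPython 3 both spellings denote the same single
codepoint, and its UTF-8 encoding is exactly those four bytes:

    s = "🙂"
    assert s == chr(0x1F642)   # one codepoint, two spellings
    assert len(s) == 1         # the chars scoring sees one codepoint
    assert s.encode("utf-8") == b"\xf0\x9f\x99\x82"  # bytes sees four
    print(s)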
In less Unicode-aware languages where strings are byte strings,
you might still get away with UTF-8 in string literals. For
example, OCaml treats "🙂" as a string of length 4
(four bytes), but Char.chr 0x1f642 is an error.
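Python's bytes type offers the same byte-level view OCaml takes; the
sketch below mirrors the OCaml behaviour rather than being OCaml
itself:

    b = "🙂".encode("utf-8")
    assert len(b) == 4  # the same length OCaml reports for "🙂"

    # Like OCaml's Char.chr, bytes() only accepts values 0-255, so a
    # codepoint such as 0x1F642 cannot be a single byte "character".
    try:
        bytes([0x1F642])
    except ValueError:
        pass  # raises, just as Char.chr 0x1f642 errors in OCaml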
In yet other languages, like brainfuck, you have to print the
individual bytes f0 9f 99 82 one by one.
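In Python terms, that byte-by-byte approach amounts to writing each
raw byte to the binary layer of STDOUT:

    import sys

    # Emit the UTF-8 encoding of U+1F642 one byte at a time, the way
    # a brainfuck program would have to.
    for byte in (0xF0, 0x9F, 0x99, 0x82):
        sys.stdout.buffer.write(bytes([byte]))
    sys.stdout.buffer.flush()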
There are two scorings in use: bytes and chars. Bytes is the number
of bytes of a solution encoded in UTF-8; chars is the number of
Unicode codepoints. Users may submit up to two solutions per hole
per language; this happens automatically when you enter two
solutions that each minimize a different scoring. Each scoring has
its own set of leaderboards. Under the chars scoring, both “A”
(U+0041 Latin Capital Letter A) and “😉” (U+1F609 Winking Face) cost
the same, despite the 1:4 ratio in their UTF-8 byte counts.
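Computing both scorings for a solution is straightforward; a minimal
sketch, assuming the solution is held in a Python string:

    def bytes_score(solution: str) -> int:
        # Bytes: length of the solution encoded in UTF-8.
        return len(solution.encode("utf-8"))

    def chars_score(solution: str) -> int:
        # Chars: number of Unicode codepoints.
        return len(solution)

    assert bytes_score("A😉") == 5  # 1 + 4 UTF-8 bytes
    assert chars_score("A😉") == 2  # two codepoints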
Some experimental languages that make heavy use of multi-byte
characters (APL, Uiua, 05AB1E, BQN, Vyxal) have a third scoring
method that does not yet affect leaderboards or solution saving.
Under this method, multi-byte characters commonly used in the
language count as a single stroke, while every other character
counts as one stroke per UTF-8 byte in its encoding.
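A sketch of that rule, where COMMON is a hypothetical stand-in for
the set of characters a language counts as single strokes (the real
per-language sets live in the PR referenced below):

    # COMMON is hypothetical; code.golf defines the real set.
    COMMON = set("⍳⍴⌽")  # e.g. a few APL glyphs

    def stroke_score(solution: str) -> int:
        total = 0
        for ch in solution:
            if ch in COMMON:
                total += 1  # common glyph: one stroke
            else:
                total += len(ch.encode("utf-8"))  # one per UTF-8 byte
        return total

    assert stroke_score("⍳") == 1   # common multi-byte glyph counts once
    assert stroke_score("🙂") == 4  # uncommon char counts per byte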
See PR#2513 for more detail.