An article from all the way back in 2003, but still very relevant.
TL;DR: state the encoding. If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII.
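To make "a string without an encoding makes no sense" concrete, here is a small Python sketch (the byte string is an arbitrary example, not from the article). The same five bytes read as three different texts depending on which encoding the reader assumes, and nothing in the bytes says which reading is right.

```python
# The same five bytes, decoded under three different assumptions.
data = b"caf\xc3\xa9"

readings = {
    "utf-8": data.decode("utf-8"),                    # 'café'
    "latin-1": data.decode("latin-1"),                # 'cafÃ©' (mojibake)
    # Strict ASCII would raise UnicodeDecodeError; 'replace' substitutes U+FFFD.
    "ascii": data.decode("ascii", errors="replace"),  # 'caf\ufffd\ufffd'
}

for encoding, text in readings.items():
    print(f"{encoding:>8}: {text!r}")
```

Only the writer of the bytes knows which of these is correct, which is why the encoding has to travel with the data (an HTTP header, a file format rule, a documented default).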
Also worth a read: "Unicode: The Good, the Bad, and the (mostly) Ugly". It gives a good overview of how well different languages support various Unicode features, and of problems you can run into with things like regexes and passwords. Sadly, the original website is down more often than not, but the unformatted slides are available at https://dheeb.files.wordpress.com/2011/07/gbu.pdf
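As one illustration of the regex and password problems mentioned above (this example is mine, not taken from the slides): "é" can be stored either as one code point or as "e" plus a combining accent. The two look identical but compare unequal, and a regex's "." sees code points rather than visible characters. The `normalize_password` helper is hypothetical, and the choice of NFC is an assumption; check whichever spec your system follows.

```python
import re
import unicodedata

# Two encodings of the same visible text "café".
composed = "caf\u00e9"     # é as a single code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, U+0301

print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5


def normalize_password(pw: str) -> str:
    # Hypothetical helper: normalize before hashing or comparing, so both
    # spellings of the same visible password are treated as equal.
    return unicodedata.normalize("NFC", pw)


print(normalize_password(composed) == normalize_password(decomposed))  # True

# In the decomposed form, "." matches the bare 'e' and leaves the combining
# accent unmatched, so fullmatch fails.
print(re.fullmatch(r"caf.", composed) is not None)    # True
print(re.fullmatch(r"caf.", decomposed) is not None)  # False
```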
I read this back when it came out, and while it makes some good points, ASCII is still king in the embedded world. In that realm, Unicode support often adds overhead that is unnecessary and undesirable. Just sayin'. Things are changing, but slowly (e.g. many embedded systems now run some form of Linux under the hood and have plenty of power to spare for Unicode support).
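Storage is one measurable piece of that overhead, and it depends heavily on which encoding is used. The sketch below (the sample string is an arbitrary stand-in for an embedded log line) only counts bytes; it says nothing about the code size of decoders, validation, or fonts, which are often the bigger cost on small devices. The little-endian codec names are used so no byte-order mark is added.

```python
# Byte count of the same ASCII-only string in several encodings.
text = "TEMP=21C"

sizes = {
    enc: len(text.encode(enc))
    for enc in ("ascii", "utf-8", "utf-16-le", "utf-32-le")
}
print(sizes)  # {'ascii': 8, 'utf-8': 8, 'utf-16-le': 16, 'utf-32-le': 32}

# For code points below 128, UTF-8 output is byte-for-byte identical to ASCII.
print(text.encode("utf-8") == text.encode("ascii"))  # True
```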
This is pretty topical. Just yesterday I learned that a program I use only accepts ASCII file names. It gives a useless error if you pass it a Unicode file name via the Python API.
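If a tool like that can't be fixed, one workaround is to check names yourself before calling it, so the failure at least says what went wrong. This is a minimal sketch: the program and its API aren't named, so `require_ascii_filename` is a hypothetical guard, and it checks the whole path on the assumption that directory names may be affected too.

```python
from pathlib import Path


def require_ascii_filename(path: str) -> Path:
    """Hypothetical guard: fail early with a clear message if the path
    contains characters an ASCII-only downstream tool cannot handle."""
    if not path.isascii():  # str.isascii() needs Python 3.7+
        bad = sorted({ch for ch in path if not ch.isascii()})
        raise ValueError(
            f"path {path!r} contains non-ASCII characters {bad}; "
            "this tool only accepts ASCII file names"
        )
    return Path(path)


if __name__ == "__main__":
    print(require_ascii_filename("data/report.csv"))
    try:
        require_ascii_filename("data/résumé.csv")
    except ValueError as exc:
        print(exc)
```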