4:30 Germany calling (again): Now we DO have capital ß. It was created a few months ago, to deal with exactly that problem. It makes things slightly easier for the future, but right now you have to change your code once more. ;-)
@@mymo_in_Bb No. Before 2017, the official rules were to capitalise ß as SS exclusively. That is still recommended. An actual uppercase character ẞ has existed for a long time, but it only became part of official spelling rules in 2017.
The two dots above the e in Chloë are not an umlaut, they're a diaeresis. Many think that they're just two names for the same thing, but they're not; they're different characters that just happen to look the same. An umlaut, as the German name suggests, is something changing the sound of something; an ö is pronounced differently from o. They're two separate vowels, basically. A diaeresis, on the other hand, is there to mark that those two vowels in a row should be read as two distinct syllables and not as a diphthong. I.e. Chlo-ee and not Chlow. A diaeresis doesn't change the sound of a letter, it's just there to mark that it should be treated as a separate syllable.
You're absolutely right, that's a good spot. I had "heavy metal umlaut" in my head and just came out with the wrong word. (I was also briefly concerned that I'd put the diaeresis over the wrong vowel, but it turns out that Chloe works with one over either vowel!)
You may think of umlauts as different vowels, but in German they're still counted as practically the same letter similarly to accents (when sorting, for instance) - in Swedish, we have the letters åäö which look like umlauts but are distinct letters. Of course our neighbours use other variants. It just keeps getting worse.
LoneTech The Swedish ö is *exactly* the same as the German ö though. How is it just "an accented letter" in one language, and a different letter altogether in the other, when they're identical in both languages? An accent on a letter is there to indicate if it should be stressed or unstressed, long or short, etc, but it doesn't change the actual sound of the letter. That is different from what the ¨ does to the ö in either Swedish or German (or to the ä for that matter). It makes it into a whole new sound, a whole new vowel. It's not an accented o, it's a different vowel altogether. Just like å and a are different vowels, and just like o and ø are different vowels (yes I'm Norwegian). Of this there can be no doubt.
And then you get a call from your users around the world saying "You've hired so many low quality translators just to cut costs that I prefer to use the site in English, thank you!"
@@tanmaypanadi1414 It's less lack of quality of translators than lack of translators in general. The culture among Wikipedia editors is overwhelmingly monolingual and English-speaking.
"Or do what programmers have been doing for many years and say "yea we are only doing it in English" " Also known as "screw it" - the ultimate rage quit
As a Spanish programmer, I suffer this many times with problems using non-English characters like tildes, or having errors in decimals and dates because by default in many systems all conversions are in English format unless the code is adapted. Special problems when connecting with other systems or companies when we need to change the format case by case, because many use Spanish format and many others use English formats
@@____-pb1lg I mean, he's still not wrong though. With the way computers were built, in the mind of an English-speaking person, and where they developed most, in the English-speaking world, only languages close to English are really even plausible for use in computers. You can make it work for other languages, but the further you get from English, the more difficult it gets to program, to the point where it's really not worth it. Luckily, English is the closest thing Earth has to a universal language, so if we had to program it in such a way as to be exclusive to one language, we chose the right one.
It does exist. I'm partial to "reverse italics", which are basically italics but leaning the other way. There is also something called the double point (I think). Just look up irony punctuation.
+PrimaPunchy we do. I've even mapped it to AltGr+I on my keyboard. Behold the ⸮ (please note this character isn't the Arabic question mark, as it is left to right. It is encoded as 'reversed question mark')
Fun fact: if you create a character in World of Warcraft on a Russian game server, when you put your character's name in, you also need to provide them with how you want the name to show up for 7 different tenses/situations.
And then in England they use the comma as a thousand separator and the full stop as a decimal separator. So, in conclusion: Europe: 100.000,00 UK: 100,000.00 Finland: 100 000,00 Could you please deal with that?
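For anyone who does have to "deal with that", here is a minimal Python sketch of the three separator conventions listed in this comment. The function name `format_amount` and the labels in `STYLES` are made up for illustration; real code would more likely pull the separators from locale data (CLDR or similar) than hard-code them.

```python
def format_amount(value, thousands, decimal):
    """Format value with two decimals using the given separators."""
    # Start from Python's built-in grouping ("100,000.00"), then swap both
    # separators in a single pass so one replacement cannot clobber the other.
    english = f"{value:,.2f}"
    return english.translate(str.maketrans({",": thousands, ".": decimal}))


# Separator pairs taken from the comment; the keys are just illustrative labels.
STYLES = {
    "dot-comma": (".", ","),
    "comma-dot": (",", "."),
    "space-comma": (" ", ","),
}

for label, (thousands, decimal) in STYLES.items():
    print(label, format_amount(100000, thousands, decimal))
```

Swapping both separators in one `translate` call matters: two chained `replace` calls would turn every separator into the same character.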
This is even more complicated by inflections when using passive voice. In Czech, you don't say ’A likes this’ but ‘This is liked by A’ where ’by A’ is an inflection of A (it changes suffix and sometimes also root based on very complex rules). On Facebook, the translators use ‘This is liked by user A’ which avoids inflection of A (the proper word for ‘user’ also depends on gender but that was already sorted out by Facebook). But that causes the problem that it is very long and sounds quite unnatural. The proper way would be to put there all those rules but, well, there were at least four PhDs I know about that tried to put those rules to computers as their theses, and it still does not work reliably.
Yes, that is a trick I have used too. There is no way you can know for sure how "A" must be inflected if it is a person's name. It might not even be a name in the same language as that of the text it appears in.
I would love to just add support for inflection of names. So instead of "This is liked by %NAME%" you write "This is liked by %NAME-dative%" or whatever case it is. Then you have a function that deals with Czech declensions. It will be updated when users report incorrect declensions, and it'll be very nice :)
Looking it up, Dothraki doesn't seem like it'd be particularly difficult other than the limited vocabulary, but that's the translator's problem, not the programmer's.
And that actually sometimes breaks some software in some programming languages (I specifically have Java in mind) where the standard library just "deals with it" and converting between upper and lowercase does it for you depending on system language. Because if it's some internal text that is expected to be in English, it's just going to break things if it's suddenly a different character. Even when the program isn't localized at all. It's so weird to have a "Fixed crashes when the OS is in Turkish" in a changelog.
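A rough illustration of that failure, written in Python. Python's own `str.lower()` ignores the system locale, so the Turkish behaviour is simulated here with a hypothetical helper, `turkish_lower`; the `HANDLERS` table is also invented for the example. In Java the usual culprit is `String.toLowerCase()` called without a `Locale` argument, which uses the default locale.

```python
def turkish_lower(text):
    """Lowercase using the Turkish dotted/dotless i convention (simplified)."""
    # In Turkish, "I" lowercases to dotless "ı" and dotted "İ" lowercases to "i".
    return text.replace("I", "ı").replace("İ", "i").lower()


# Hypothetical internal lookup table keyed by lowercase ASCII names.
HANDLERS = {"title": "render_title", "id": "render_id"}

key = "TITLE"
print(HANDLERS.get(key.lower()))         # locale-independent: key is found
print(HANDLERS.get(turkish_lower(key)))  # Turkish rules: "tıtle" is not a key
```

The common fix is to lowercase internal identifiers with a fixed, locale-independent rule (in Java, `toLowerCase(Locale.ROOT)`) and keep locale-aware casing for user-visible text only.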
THIS. Unlike social networks, operating systems are simply omnipresent, even in places with no internet access, so, depending on the popularity of your OS, you'll have to deal with tribal languages with loads of exceptions in verb flexing and such
The Linux (kernel) strategy, if I understand it correctly, is not to deal with this. It's a userspace problem. The kernel just deals with bits, it doesn't need to care what language/encoding those bits are meant to represent. As far as the rest of the os goes, it just depends on how well those programs are 'internationalis(z)ed'. +Knuxfan24 Many of the programs you'll use on Linux are the same ones you'd use on windows/mac, and if the problem is strictly a userspace one, you're in the same boat as with any other operating system. (and remember it's easier to file a bug for gnome-calculator than it is microsoft paint)
I'm not sure about other OSs, but Windows developers don't even bother to translate from Jargonese into English. I don't feel the least bit sorry for them.
"ß -> SS" That's absolutely right. But the vast majority of Germans doesn't know how to deal with the upper case "ß". I'm glad one of the greatest nerds on RUclips knows. ;) Thanks for the vid!
+PunktKommaNull You may also use the pretty, newly designed letter ẞ instead of SS or SZ - but this is still against the official rules. But I think it's the best alternative for writing ß in upper-case letters.
I think the special case he was referring to is when it's a name. In that case you don't use "SS", but SZ. At least that's what we do in Austria.
For Masse and Maße, it is the difference between weight and measurement. And don't forget the reform of the reform which brought us a beautiful exception: ß is to be used in particular when writing proper names, even if by the rules it should have been written with ss.
***** as a Romanian I've never noticed the different plural. Just wow, I had to count sheep up to twenty too, and noticed 19 sheep followed by 20 "of" sheep. Loved your vid, it reminded me of similar problems trying to support Resource Table translations for one of my apps.
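The 19 sheep / 20 "of" sheep switch can be sketched in a few lines of Python. This assumes the integer plural rule as I understand it from CLDR (numbers whose last two digits are 01-19 take the bare plural, everything else from 20 upwards inserts "de"); the function name `romanian_count` and its defaults are just for illustration, and fractions are ignored.

```python
def romanian_count(n, singular="oaie", plural="oi"):
    """Return a count phrase, adding "de" where Romanian requires it."""
    if n == 1:
        return f"{n} {singular}"
    # 0, 2-19 and numbers ending in 01-19 take the bare plural...
    if n == 0 or 1 <= n % 100 <= 19:
        return f"{n} {plural}"
    # ...everything else (20-100, 120-200, ...) inserts "de".
    return f"{n} de {plural}"


for n in (1, 2, 19, 20, 100, 101):
    print(romanian_count(n))
```

A translation system that only knows "singular" and "plural" cannot express this, which is why plural categories have to be defined per language.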
Tarik Hamani more importantly the Korean Chinese Japanese top down script writes from right to left. But Manchurian and traditional Mongolian top down script writes from left to right.
4:28 FUN FACT: Specifically because of this issue, Germany has very recently added a capital ß (double-s) character to its orthography. There was no capital ß until then, it was strictly a lowercase letter, and that's why it becomes SS when it turns into capital letters.
Lower case: ß (Straße) Upper case: ẞ (STRAẞE) The traditional capitalization with SS is still valid and common (STRASSE)
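For what it's worth, Python's built-in case conversion still follows the traditional SS rule, so getting the capital ẞ takes an explicit step. A small sketch; the helper name `upper_with_capital_eszett` is made up for this example:

```python
def upper_with_capital_eszett(text):
    """Uppercase text, mapping ß to the capital ẞ (U+1E9E) instead of SS."""
    return text.replace("ß", "\u1e9e").upper()


print("Straße".upper())                     # traditional: STRASSE
print(upper_with_capital_eszett("Straße"))  # with capital ẞ: STRAẞE
print("STRAẞE".lower())                     # round-trips to straße
print("Maße".upper() == "Masse".upper())    # True: SS loses the distinction
```

Note the asymmetry: uppercasing with SS changes the string length and cannot be reversed, while the ẞ form round-trips cleanly.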
And that is why nothing ever comes in Hebrew. Arabic at least has many millions of users, but when you only have 7-9 million native speakers, you don't get very high up on the localization ladder - especially when you are a non-latin, RTL language with masculine and feminine rules and dual plurals and many other rules.
Gavers23 the most annoying was when Instagram made a Hebrew version but not for iPhone. Or when you have English and Hebrew in the same document and nothing goes where you want it to.
*+Gavers23* But your argument falls apart. - Certainly, getting a Hebrew localisation before an Arabic localisation might not happen most of the time. But if Arabic is supported, you already got non-Latin, RTL language with masculine and feminine rules and multiple plural forms and so on. And as you said, Arabic has so many speakers that Arabic usually gets support, and when it does, Hebrew can just easily tag along.
@@terner1234 Yes, supporting Hebrew when you can already fully support Arabic is a much better start than only supporting English. I think the hardest part is when you have multiple languages mixed together. In the worst case you could have the overall layout in Arabic, with some long quotation in English (meaning that the quotation must wrap over multiple lines in the middle of Arabic text) and some Japanese names with Ruby text above the name. And when you can successfully support all that, some joker comes by and messes with your user interface with zalgo text overflowing over all the content.
+sean wilkerson In English, and pretty much all other languages, there are certain words you don't use in formal contexts. In some other languages, the use of common every day words (most often pronouns and verbs) completely changes.
+thermate93 If we are bitching about so many things here, then your version is not okay in Poland and Hungary :P "sz" in Polish is the English "sh" and "sz" in Hungarian is the English "s", just to mess things up
For time, the 24 hour clock makes way more sense. It avoids a lot of ambiguity that arises when discussing time in a vague manner. For dates, it makes more sense to go Year-Month-Day than any other system. The start of the week is arbitrary. My Spanish Teacher told me in Mexico they start with Monday because they like putting the weekend next to one another on the calendar. I like that reasoning, honestly. It does look cleaner. Plus, it makes the "weekend" actually make sense. In America the "weekend" is actually the "weekend-week-start" since Sunday is the first day of the week.
Sunday as a weekend always made sense to me- Sunday and Saturday are on either end of the week. If you see the week as a string (...the stringy kind, not the textual programmy kind), then Sunday and Saturday are at the ends.
Day-Month-Year is better for human form because it means you can leave off the year or the month when they're obvious, but Year-Month-Day is better for computer systems since it can be more easily sorted.
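The sorting claim is easy to check. A small Python sketch with three arbitrary example dates: sorting the Year-Month-Day strings as plain text gives chronological order, while the same sort on Day-Month-Year strings does not.

```python
from datetime import date

# Arbitrary example dates, deliberately out of order.
dates = [date(2021, 12, 1), date(2022, 1, 15), date(2021, 2, 28)]

iso = [d.strftime("%Y-%m-%d") for d in dates]
dmy = [d.strftime("%d-%m-%Y") for d in dates]

chronological = sorted(dates)

# Plain string sorting of Year-Month-Day matches chronological order...
print(sorted(iso) == [d.strftime("%Y-%m-%d") for d in chronological])
# ...but the same sort on Day-Month-Year does not.
print(sorted(dmy) == [d.strftime("%d-%m-%Y") for d in chronological])
```

This only holds for zero-padded, fixed-width fields, which is one reason the ISO 8601 form insists on them.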
Nah, that's not a problem. Users have every right to expect things should work. It's more of a problem if developers resent that user expectations are that the software just works.
In college I had one course of translation theory... this is one of the biggest problems we face when we are translating any content to another culture. It is not only for coding, but it is very nice to see that more than one group of people are working to solve this issue :)
Pssssh you think that's bad Mr. Scott? Internationalization for video games is all that times 10. Imagine not only having those incredible language barriers and nuances, but also having to literally cut or completely change gameplay elements to meet national standards and norms. For example, if your game has a Nazi symbol anywhere in it, your game cannot be legally sold in Germany, so the only solution is to completely censor that image. As you might guess, many developers choose to just not release their game in Germany if it has anything to do with Nazis. But it can be worse than that - for example in The Legend of Zelda: Ocarina of Time, large parts of the fire dungeon had to be censored because certain symbols and sounds were extremely offensive to the Islamic audience, which led to boycotting and other social repercussions for Nintendo. There have been many games with instances of homosexual content, particularly between males, which were released without controversy in Japan but required heavy censorship of the homosexual content for an American release. Many times developers have to completely change parts of their design documents because they are told by their investors that they intend to release the game in places where certain design elements could make the video game illegal, so it's the developers' burden to fix all those issues.... or just dump it on the internationalization team when you're done with it ;-P . Any computer scientist will agree with this: when code crosses borders, that's when the s*%t hits the fan.
But... America? Europe? Catholics? Protestants? Idiots? Life? Why. -The only thing humans can all agree is that we don't all agree. Not on death, not existence, not how, not why, not where, not when. Not now, not ever. --Demotivational speech of the day
Business applications typically need to deal with grammars of foreign languages, but video games have an artistic component, therefore video games need to deal with cultures in story and gameplay. I've heard some games change difficulty between localizations (for example, harder in Japanese/American versions). Then some of the fans get angry over ruining the purity of the work (the debate "how much localization is enough"). I grew up with crappy Chinese translations of Japanese Game Boy Advance games.
When ANYTHING crosses borders, everything becomes orders of magnitude more complex. I work in shipping and logistics, and luckily the company I work for only has a small international presence. International sales make up maybe 5% of our overall sales, but they make up for probably 30-40% of my headaches.
@@Spikeupine That's new! Am I right to assume from your profile picture that you're Norwegian? The only time I would use an apostrophe to separate numbers is with base-60, like 125' 45" for 125 minutes and 45 seconds, or base-12, like 6'4" for six-foot-four. I'm in Canada.
I'm getting my degree in engineering, but I've always loved foreign languages. Videos like this prove that people in such fields (like computer science) can apply their knack.
I so agree with that. I'm a native Swede and all my software is in English. For one, it makes it much easier to find help and tutorials in English, as you are more likely to get a Google hit on an English error than on the same error in Swedish. And some translations just sound stupid.
"because in general stuff makes more sense in English" I'd rather say: The translations make bog all sense. I mean "Tools" is in German translated with "Extras". WTF? The things in tools aren't extras, they are essential things. No wonder it's the last place the user search for the important functions. Why not use "Werkzeuge"? Because it's a tiny bit longer than "Extras"? Also all help/documentation/tutorials/books are in English, so it's easier to get help/google for help when you use the English version.
blenderpanzi Ah, Extras, my favourite mystery. In Outlook, this is where "Senden/Empfangen" (Send/Receive) is located. The most important function in the program. In a menu called 'Extras'. Huehuehue.
I am a Spanish speaker and I downloaded a game developed on Linux, and it used many colloquial words from my country (Argentina). Later I realized that it was because of my language pack, and it surprised me. I had never played a game in "Spanish-Argentine"
I am a dev in a Japanese company ... that has branches all over the world. Even internal services have to be internationalized ... this ... this made me less lonely and now I know I have brothers and sisters that share the pains of ... this ...
As a Slovak who has localized software in the past, I can identify with many of the exceptions mentioned in the video. - There are different verb suffixes that go with male and female subjects in the past tense. - Two different plurals, one for 2-4 and the other for 5 and more. - But most importantly, nouns change their suffixes based on the action related to the noun. For example: in English you have a single word, such as Peter, and when you mention him with some action, you can just add prepositions, e.g. "to Peter" and create a sentence "Give it to Peter" which can be sent out to localizers as "Give it to NAME" and it works just fine. However, in Slovak (and many Slavic languages too), we change suffixes instead of (or sometimes in addition to) using prepositions. The name Peter would turn to "Petrovi", so the sentence "Give it to Peter" would be "Daj to Petrovi". And now you can't use "Daj to NAME" because our language(s) just don't work like this. It is similar with "from Peter" ("od Petra"), "with Peter" ("s Petrom"), etc. Oh and by the way, there are 12 categories for words in Slovak and each has a different set of suffixes. So remember "from Peter" was "od Petra"? Well, you can't use the same suffix for let's say the name Anna, because "from Anna" is "od Anny". Let's try another name, Lucia: "from Lucia" is "od Lucie". Hats off to Facebook, they have implemented all these rules and categories a few years back and now they use naturally sounding sentences when mentioning users' names. Because often when some software doesn't support this, a workaround is used, which is grammatically (almost) correct, but sounds far from natural. When you add in the word "user", you can apply all the suffixes on it, instead of the name, because that becomes the subject of the sentence. 
Change "Peter" to "user Peter" and suddenly you can use variables, because the localizer knows exactly what suffixes to use on the word "user" since it does not change and you can throw in the name in its original form after it. Thus "to Peter" is often translated as if it was "to user Peter" and in Slovak that is "používateľovi Peter" and you can finally use variables, like "používateľovi NAME". ("user" is "používateľ", so the suffix "ovi" was applied on that one; oh and notice that when applying suffixes, vowels are omitted and consonants are kept... for the most part, there are more rules for that but I just mentioned it so that you see how many things there are to work out). Before I finish, I should note that even "používateľovi Peter" sounds a bit weird, because if it was done properly, suffixes would be applied to both the word "user" and the name, so the proper version would be "používateľovi Petrovi", but hey, that is what we have to live with.
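The workaround described here can be sketched roughly like this in Python: use a properly inflected name when one is known, otherwise inflect the fixed word "user" and leave the name alone. The names `KNOWN_DATIVE`, `dative_phrase` and `give_it_to` are hypothetical, and the only Slovak forms used are the ones quoted in this comment; a real system would need the full rule set rather than a lookup table.

```python
# Dative forms known so far; the single entry comes from the comment.
KNOWN_DATIVE = {"Peter": "Petrovi"}
USER_DATIVE = "používateľovi"  # dative of "používateľ" (user)


def dative_phrase(name):
    """Return a dative phrase for name, falling back to 'user NAME'."""
    if name in KNOWN_DATIVE:
        return KNOWN_DATIVE[name]
    # Unknown name: inflect the fixed word "user" and keep the name as-is.
    return f"{USER_DATIVE} {name}"


def give_it_to(name):
    """Build the Slovak sentence for 'Give it to NAME'."""
    return f"Daj to {dative_phrase(name)}"


print(give_it_to("Peter"))  # natural form
print(give_it_to("Lucia"))  # fallback form
```

The fallback is grammatical but stilted, which matches the trade-off the comment describes.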
this was a really good video topic; I remember one of my CS professors talking about this once, saying that natural language processing would create a great leap forward in terms of applications... I'm just thankful English and Spanish are in the top 5 most used
Thanks for this! It reminds me of how happy I am to have escaped mobile phone game development. At various points in time I've been through nearly all of the topics listed and some extras. Nothing is more frustrating than when your translations come in but with words that are too long to fit into your text box or in some cases on the screen at all (yes Germany, I'm looking at you). When you ask for a shorter alternative you're told there is nothing suitable. Then you have to re-work your code and add special hidden characters that can be used to denote where it is suitable to place a hyphen and wrap the remainder of the word on to the next line, but only if it needs to... Typically a better solution (if you can get away with it) is to remove all text and replace it with icons. Yes, different cultures may need their own icons that they can relate to, but it's a lot easier to ask an artist to create an image to signify "this concept", than to deal with each individual language. As they say: A picture paints a thousand words!
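The "special hidden characters" trick is presumably something like the Unicode soft hyphen (U+00AD). Here is a minimal Python sketch of wrapping one long word at soft hyphens only when it does not fit; the function name `wrap_soft_hyphens` and the example word with its break points are chosen for illustration, and character count stands in for real text measurement.

```python
SHY = "\u00ad"  # soft hyphen: invisible unless a line break happens there


def wrap_soft_hyphens(word, width):
    """Split word into lines of at most width characters at soft hyphens."""
    segments = word.split(SHY)
    lines = []
    current = ""
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        # Reserve one column for a visible hyphen unless the word ends here.
        needed = len(current) + len(segment) + (0 if is_last else 1)
        if current and needed > width:
            lines.append(current + "-")
            current = segment
        else:
            current += segment
    lines.append(current)
    return lines


word = f"Geschwindig{SHY}keits{SHY}begrenzung"
print(wrap_soft_hyphens(word, 15))  # breaks once, after "Geschwindig"
print(wrap_soft_hyphens(word, 40))  # fits, so no hyphen appears
```

A single segment longer than the width still overflows in this sketch; a real layout engine would also measure rendered glyph widths rather than counting characters.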
+Hendlton Unfortunately, in my experience everyone around the world is happy to use that in online communication to arrange meet-ups, except US Americans.
Davixxa Yes, it is but we have to pick one, we can't just invent a new time zone. As long as everyone follows it, it doesn't matter what time zone it is.
I am American and in Italian class once I learned they did day/month/year, it made so much more sense that I accidentally did it in all of my other classes for about a month.
Yeah, but unlike that video, there's no silver lining here. Timezones are easy when you just keep track of timezones everywhere and use the black box libraries that can handle all the details. The only hard part about timezones is getting non-developers to wrap their heads around the fact that a "date" is not a thing worldwide. When you have a date such as 2022-03-15 (ISO 8601 syntax), it starts and ends at different times around the globe. You cannot say that e.g. the deadline for a homework is 2022-03-15 because that would be 2022-03-15 plus or minus 12 hours. And if you're close to the switch between summertime and wintertime, make it plus or minus 13 hours. Plus maybe an extra hour if some country is also changing timezones that year. Any deadline or other exact time should always include date, time and timezone. And the timezone is important because when non-developers set time, they may say that they want "2035-03-15 23:55 Europe/Helsinki" and that means the moment when clocks show that time in Helsinki after all *future* changes to timezones have already been implemented. As a result, you cannot store timezones as a time delta to UTC, no matter how many existing systems are already doing so.
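To make the "a date is not a thing worldwide" point concrete, here is a small Python sketch that computes when 2022-03-15 begins at UTC+12 and at UTC-12. Fixed offsets are used only to keep the example self-contained, which is exactly what this comment warns against for storage; the helper name `day_start` is made up.

```python
from datetime import datetime, timedelta, timezone


def day_start(year, month, day, utc_offset_hours):
    """Instant at which the given calendar date begins at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime(year, month, day, tzinfo=tz)


east = day_start(2022, 3, 15, +12)
west = day_start(2022, 3, 15, -12)

# The "same" date begins a full day apart at the two extremes.
print(west - east)
# In UTC, the eastern start of 2022-03-15 still falls on 2022-03-14.
print(east.astimezone(timezone.utc))
```

For real deadlines the safer design is the one described in the comment: store the local wall-clock time together with a zone name such as "Europe/Helsinki" and resolve it to an instant as late as possible, for example with Python's `zoneinfo` module.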
@@Z4KIUS Whenever I try to make a clip on Twitch, it sends me to the Polish version of the page, with the error probably being along the lines of 'No clip found'.
The timezone video and this one make Tom sound so like matt smith, the "AAAND THEN!"'s from the timezone video and the "There are So..many..changes" the matt smith-isms are strong in this series :D
Holy moly! My respect to those dealing with internationalisation! Omg! Depending on the app or design you're working on, you might also have to figure out register as you translate and, consequently, define whether it's either formal or casual (usted/tu; Sie/du; vous/tu, etc.).
One game I like to play is when you buy something that has a little leaflet with the user instructions in all the European languages. Study the translation into your language and try to figure out in which language the instructions were written. Study the odd turns of phrase and word order. It can be fun. One note: In translations from Japanese: Some sentences no verb! (don’t ask why 😊 ).
Alan Bacon Yet since not all jobs run the standard work week you run into other weird things where some have their schedules and pay periods start on Monday because that's when a standard work week starts, but others start on Sunday because that's when a standard week starts. I have two jobs and they each do it the other way. It might also have something to do with military time (24 hour clock) because one uses that and the other doesn't. But then the day you actually get paid is generally Friday, presumably for processing, and the ones that have their pay period start on Sunday still usually do that just to conform with the ones that start on Monday and the ones that give out a salaried pay, but a few just hand it out a day earlier (like my one job) because the pay period ends a day earlier and oh no I've gone cross eyed. tl;dr: America is just a pain in the ass. You have no idea how many people I've not only explained military time to, but also non-American date format and they still just can't wrap their head around it even though it, you know, actually makes real sense and not magical fantasy land sense.
Alan Bacon I'm American and my entire life, at home, or in any institution, school, or workplace, I've never seen anyone start the week on Sunday. Everything I've ever seen has started the week on Monday. In fact, I've always considered it natural; I configure my Outlook calendar to start the week on Monday. In fact, now that I notice paper calendars and such starting the week on Sunday, it's going to annoy me tremendously.
1:15 e.g. to translate the single word "relatable" in French, you have to make a whole sentence explaining that you can understand the feeling/situation very well because you have experienced something very similar in your own life. There is, to my knowledge, no simple equivalent for "relatable" which doesn't alter the meaning of your sentence (I am a native French speaker, but I'm only 16, so maybe I just don't know or don't remember the equivalent).
(I haven't finished reading all the comments so I don't know if this is mentioned by anyone else already... but here we are.) The traditional Mongolian script is supposed to be written top to bottom ONLY. Have fun redesigning your whole interface. This problem escaped the limelight because the Soviet Union pushed its Cyrillic alphabet onto Mongolia in the 20th century. Thus, the difficulty of computerizing the traditional Mongolian script was ignored until Russia became weak and Mongolian society / government wanted to revive some of their older traditional culture. In fact, Chinese / Japanese were traditionally supposed to be written top to bottom too. One can still see many books published in Japan printed in vertical text flow even nowadays. But it is a lot easier for CJK characters to adapt to a horizontal text flow, so outsiders almost never notice. Meanwhile, letters in Mongolian script are linked like Arabic, so a faithful conversion to horizontal layout doesn't work for Mongolian. Caution: it is vertical-left-to-right for traditional Mongolian but vertical-right-to-left for traditional Chinese / Japanese. Have extra fun redesigning your interface for both of them.
When I was in Israel, the 5 day work week started on Sunday and ended on Thursday. Btw, Americans do this because G-d rested on the seventh day which is, of course, Saturday. This distinction makes more sense in other languages like Spanish or French.
There are so many American standards I hate, like using Imperial, MM/DD/YYYY format, etc. However I do code in American English (from UK) because all libraries and stuff are written using American English. I like to be consistent. A few big changes include: Colour -> Color, Initialisation -> Initialization, Centre -> Center. (Side note: I actually prefer the American spelling as Center)
George_E As far as I can tell, the American date format is like that because it's in order of the lowest maximum to the highest maximum. So as there are only 12 months in a year and up to 31 days in a month, months come before days. It's a weird way of doing it but it does look nicer, just as long as you don't have to actually read it :P
He seems to love the word subtle. His last sentence hits the nail on the head though. This is a problem we have had for decades, maybe centuries if you talk about translation of letters. Human languages weren't built by the same people or by the same cultures. There is no solid fix, but you also cannot possibly try to account for everything. He said the social network was for the English world to start with. You have to handle things as they come after that, unfortunately. The best way to fix these specifics is a system similar to Facebook's in which the translations can be rated and you can submit quality control reviews. That being said, I do love Tom Scott, very emotive, very informative.
I often get an error when trying to enter my name, claiming "your name cannot contain special characters". Ø is a completely ordinary letter in Denmark, and happens to be part of my last name. It's hilarious being told my name is invalid.
And that can happen even without internationalization considerations. My hyphenated last name sometimes gets rejected by websites built by my fellow US English-speaking developers.
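Both rejections usually come from a validator along the lines of the first function below. A hedged Python sketch; the function names and the sample names are invented for illustration, and even the lenient check is still too strict for some real names (decomposed accents, for instance), so accepting almost anything non-empty is often the safer choice.

```python
import re

ASCII_ONLY = re.compile(r"[A-Za-z]+")


def strict_ascii_name(name):
    """The kind of check that rejects Ø and hyphens."""
    return ASCII_ONLY.fullmatch(name) is not None


def lenient_name(name):
    """Accept any Unicode letters, plus a few common separators."""
    stripped = name.strip()
    return bool(stripped) and all(
        ch.isalpha() or ch in " -'" for ch in stripped
    )


for sample in ("Sørensen", "Smith-Jones", "O'Neil"):
    print(sample, strict_ascii_name(sample), lenient_name(sample))
```

`str.isalpha()` is Unicode-aware in Python 3, so ø, å and ö count as ordinary letters.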
I think you should have included a sentence or two about text input. When you have mixture of LTR and RTL input, your text caret can split into two to show where the next letter is going to be depending on the next letter (the future left to right letter would go to one caret, the future right to left letter would go to another caret). I'm pretty sure implementing that after-the-fact would be pretty hard indeed. And to make things even worse, many languages require IME to enter the text (e.g. traditional Chinese) where you have to render something after entering it partially. For more latin-like letters, combining characters are one example, too.
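A small piece of that puzzle, deciding whether a piece of text starts out left-to-right or right-to-left, can be sketched with Python's `unicodedata` module. This is a simplified take on the "first strong character" idea from the Unicode Bidirectional Algorithm, not a full implementation; the function name `base_direction` and the LTR default are choices made for this sketch.

```python
import unicodedata


def base_direction(text):
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
    return "ltr"  # no strong character found; default chosen for this sketch


for sample in ("hello", "שלום", "123 مرحبا", "123"):
    print(repr(sample), base_direction(sample))
```

Digits and spaces are not strongly directional, which is why "123 مرحبا" still comes out as right-to-left. Caret placement in mixed text needs the full algorithm plus help from the text layout engine.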
Plural rules, date formatting, number formatting, month and weekday names, currency symbols and formatting; all that is supported by CLDR. That's your black box you throw into your code. - But you still have to translate the UI.
Solutions to all the problems (not really): 1) Use Unicode but sanitize your inputs 2) Use scientific notation 3) Write out all dates (e.g. 5/3/19 becomes May 3rd, 2019) 4) When cutting off text, just cut straight through it; even if part of a character is not visible, it's fine.
There is a branch of translation, Localization, that deals with exactly this. It requires web/software designers trusting translators with the source code, but the upside is the translators know what their target language needs, and so a lot of time is saved and a lot of messing around back and forth is avoided.
Oh, while you're at it, I'm in the US, but I want 24-hour time because it makes more sense. And I'd like my clock in UTC so that I can coordinate with people in Poland without us trying to do time zone conversions in conversation. Sure, your other US users don't want this, but just because my neighbors... no, I don't want to spell "neighbors" with a "u" just because I've got some non-US preferences.
I agree, you need a mixture... that's the way it should be. IMO windows got this right, although choosing your location as "US" will set much to US standard, you can modify various bits. I have my location set to the UK, because that's where I am, 24 hr clock (the UK setting in windows seems to like to do am/pm), US keyboard layout (with a British keyboard, but that doesn't matter as I touchtype) And currency using - for negative values rather than ( ) because it makes more sense to me. Windows DID get wrong the 'mess with the computer clock when the time changes' thing. Other operating systems use UTC and simply apply an offset based on your locale.
@@TheChipmunk2008 I agree on the clock thing. I guess they figured some BIOS features such as Autowake would be best served by such an arrangement, but they missed the mark on this.
But you'd still have to do time conversions in conversations... Poland uses CET, which is UTC+1/2 (depending on the time of the year). I totally agree about your other point though. I prefer using a 24-hour clock but at the same time I prefer American spellings over British ones. And I very rarely can have both. For example, on Discord I have to choose - either a 24-hour clock and British spellings, or a 12-hour clock and American spellings. It's really annoying.
@@Matihood1 Oh, yeah, they're not locally in UTC, but they could work 8-16 UTC or 7-15 UTC (depending on the time of year), and be sharing CET business hours while having their clocks read the same as my UTC clock.
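The arithmetic in the reply above can be checked with a small sketch. It assumes a 9-17 local working day and uses fixed offsets (UTC+1 for CET, UTC+2 for CEST) instead of a real time zone database, so it does not know when the clocks actually change.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration; a real application would use tz data.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def local_hour_to_utc(local_hour: int, tz: timezone) -> int:
    """Convert a local wall-clock hour to the matching UTC hour."""
    # The date is arbitrary because the offset is fixed.
    local = datetime(2019, 1, 15, local_hour, tzinfo=tz)
    return local.astimezone(timezone.utc).hour


def business_hours_utc(tz: timezone) -> tuple:
    """UTC start and end hours for an assumed 9-17 local working day."""
    return (local_hour_to_utc(9, tz), local_hour_to_utc(17, tz))
```

With these assumptions, `business_hours_utc(CET)` gives 8-16 and `business_hours_utc(CEST)` gives 7-15, matching the hours quoted in the comment.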
@@LowestofheDead Of course, and some of the units don't match either - which wouldn't be that much of an issue except Americans are convinced the world revolves around them and will definitely whine about it.
@@JonasDAtlas Well, most programming languages came from the US of A. Microsoft and Apple are American. Git gud @ murican english or git lost... Is all there is to say.
@@mitpoker7319 Just because the programming language is based on American English doesn't mean any of the user-facing text needs to be... you're kind of missing the point here.
if i had a nickel for every time Tom Scott acknowledged non-binary people with the phrase "if that surprises you then you need to get out more." i would have two nickels. not a lot but cool that it happened twice.
4:27 Well explained. Greetings from Germany 🙂 Surnames and place names follow their own rules. And exceptions to avoid misunderstandings. Trink in Maßen -> Drink moderately. TRINK IN MASSEN -> DRINK IN MASSIVE AMOUNTS. TRINK IN MAßEN -> Drink moderately 😅
Weeks don't really have start though... and Daylight savings needs to exist, just in a standardized format. Otherwise 0100 would be dark for some people, and light for others.
@Santiago Colla its more about making use of that time. if you have more sunlight throughout half the year, why would you waste it by waking up later? it makes much more sense to adjust clocks to utilise the sunlight rather than to go to bed before the sun sets fully or to wake up an hour after the sun rises. if im not mistaken, this practice comes from long before when people would wake up early and go and work out in the field, they would adjust to when the sun was out the longest so that they would not need to work after the sun starts to set.
@Santiago Colla yeah, I agree in a way, but growing up in a country where we change our clocks I can definitely tell you that if you are a kid and your mum tells you to be back home by say 7 or 8 pm, you'd be much happier to stay outside in sunlight for another hour free of charge. But in all seriousness though, it's not a big deal, but those countries who've been using it for decades don't really have a reason to not do it. A thing that does bother me is why some countries have been doing it for many many years, but others haven't. If people relied on sunlight a lot more before, wouldn't it be more global?
@@GamerCo29 No no no no no. Not 0100. 01:00. I know it's technically correct but not having a colon in a timestamp just looks weird, no matter how you look at it.
The two dots above the "e" in "Chloë" are not an umlaut, but a trema. An umlaut would change the sound of the vowel, whereas a trema shows that o and e should be pronounced as two independent vowels.
@@ScorieDivine Depends. I have a guy friend who merged his last name and his wife's last name to make a unique surname. Everyone has their own reasons for taking, or making, new names for themselves.
I love that almost every time Tom has talked about gender non-conformity/non-binaryism he says that those to whom the existence of these concepts may be surprising need to get out more.
@Dill Stevens I get it not being literal, but the meaning my head got was "go out of your usual limited space". While that can apply to "being open minded", my thinking was that back then, spaces where people are educated on concepts such as non-binaryism were niche while the outside world (majority of people, online or offline) was mostly not educated on such matters. Things have changed now within the internet thankfully, but outside the internet and outside 1st world countries, it still feels somewhat niche nowadays (although thankfully, such matters are more and more becoming mainstream in developing countries). Hence, why I felt it ironic.
Now need a playlist of "X causes Tom Scott's descent into madness". He does these really well.
Maybe literally with the recent developments with Twitter
@@quokka_yt 😂😂😂
The vape one counts too?
4:30 Germany calling (again): Now we DO have capital ß. It was created a few months ago, to deal with exactly that problem.
It makes things slightly easier for the future, but right now you have to change your code once more. ;-)
@@mymo_in_Bb It had been a Unicode character since 2008, but it became part of official spelling rules in 2017, the time of the original comment.
@@mymo_in_Bb No. Before 2017, the official rules were to capitalise ß as SS exclusively. That is still recommended. An actual uppercase character ẞ has existed for a long time, but it only became part of official spelling rules in 2017.
@@grmpf how can you make them on a keyboard? And a phone keyboard?
@@GoogleUser-dwcy don't know about normal keyboard but on a phone uppercase + long press s
@@GoogleUser-dwcy regular ß is the key to the right of the numberline 0 while uppercase ẞ is that key + Ctrl + Alt + Shift. Or just AltGr and Shift.
The two dots above the e in Chloë are not an umlaut, they're a diaeresis. Many think that they're just two names for the same thing, but they're not; they're different marks that just happen to look the same. An umlaut, as the German name suggests, is something changing the sound of something; an ö is pronounced differently from o. They're two separate vowels, basically. A diaeresis, on the other hand, is there to mark that those two vowels in a row should be read as two distinct syllables and not as a diphthong. I.e. Chlo-ee and not Chlow. A diaeresis doesn't change the sound of a letter, it's just there to mark that it should be treated as a separate syllable.
You're absolutely right, that's a good spot. I had "heavy metal umlaut" in my head and just came out with the wrong word. (I was also briefly concerned that I'd put the diaeresis over the wrong vowel, but it turns out that Chloe works with one over either vowel!)
Any more surprises like that and you'll give Tom an aneurysm.
You may think of umlauts as different vowels, but in German they're still counted as practically the same letter similarly to accents (when sorting, for instance) - in Swedish, we have the letters åäö which look like umlauts but are distinct letters. Of course our neighbours use other variants. It just keeps getting worse.
LoneTech The Swedish ö is *exactly* the same as the German ö though. How is it just "an accented letter" in one language, and a different letter altogether in the other, when they're identical in both languages? An accent on a letter is there to indicate if it should be stressed or unstressed, long or short, etc, but it doesn't change the actual sound of the letter. That is different from what the ¨ does to the ö in either Swedish or German (or to the ä for that matter). It makes it into a whole new sound, a whole new vowel. It's not an accented o, it's a different vowel altogether. Just like å and a are different vowels, and just like o and ø are different vowels (yes I'm Norwegian). Of this there can be no doubt.
Just how. how do people use those.
And then you get a call from your users around the world saying "You've hired so many low quality translators just to cut costs that I prefer to use the site in English, thank you!"
Or we will do it for you.
I guess that's what happens with Wikipedia in many situations.
@@tanmaypanadi1414 It's less lack of quality of translators than lack of translators in general. The culture among Wikipedia editors is overwhelmingly monolingual and English-speaking.
I mean I prefer most things in English anyway. Why should I use incomplete or weird sounding translations when I am able to understand English?
@@4cps777 This. So. Much.
"Or do what programmers have been doing for many years and say "yea we are only doing it in English" "
Also known as "screw it" - the ultimate rage quit
Asura same
As a Spanish programmer, I suffer this many times with problems using non-English characters like tildes, or having errors in decimals and dates because by default in many systems all conversions are in English format unless the code is adapted. Special problems when connecting with other systems or companies when we need to change the format case by case, because many use Spanish format and many others use English formats
@ebulating the fact that computers were designed primarily with English in mind somewhat biases that
@@____-pb1lg I mean, he's still not wrong though, with the way computers were built, in the mind of an English-speaking person, and where they developed most, in the English-speaking world, only languages close to English are really even plausible for use in computers. You can make it work for other languages, but the further you get from English the more difficult it gets to program, to the point where it's really not worth it. Luckily, English is the closest thing Earth has to a universal language, so if we had to program it in such a way to be exclusive to one language, we chose the right one.
@@fiadhgrimm8197 That's true
Our country uses 0's as true and 1's as false.
Still waiting for a fix...
JessLe Berry Some languages don't even allow implicit conversion between int's and booleans.
JessLe Berry Which country is this? It sounds cool.
JessLe Berry My country can't even handle unsigned words.
JessLe Berry that's why languages like algol and pascal were sensible enough to include a Boolean type.
JessLe Berry Were they just trying to be different?
In our language we sometimes write with irony. Can your software add a case for this too?
I'm pretty sure there's an irony mark.
Fernando Santos It exists, but it doesn't have a unicode assigned to it as far as I know
thought2007 That's what emoticons are for, right?
It does exist. I'm partial to "reverse italics", which are basically italics but leaning the other way.
There is also something called the double point (I think). Just look up irony punctuation..
+PrimaPunchy we do. I've even mapped it to AltGr+I on my keyboard. Behold the ⸮ (please note this character isn't the Arabic question mark, as it is left to right. It is encoded as 'reversed question mark')
Fun fact: if you create a character in World of Warcraft on a Russian game server, when you put your character's name in, you also need to provide them with how you want the name to show up for 7 different tenses/situations.
Mental
@@cheeseburgermonkey7104 no, beautiful
@@fedferno.
Europeans use "." as a thousand separator? In Finland we use space. So 100.000,00 would be 100 000,00
Could you please deal with that?
And in Denmark it varies from person to person, can you deal with that as well?
The same in Polish
Well in Singapore (textbooks) and UK, they also use space to separate thousands, and it is used if the number is at least 10 000.
In the UK you do 100 000.00 or 100,000.00
And then in England they use the comma as a thousand separator and the full stop as a decimal separator. So, in conclusion:
Europe: 100.000,00
UK: 100,000.00
Finland: 100 000,00
Could you please deal with that?
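For what it's worth, the three conventions listed above can be handled by one function and a lookup table. This is a minimal sketch: the `SEPARATORS` table is hand-written for illustration and keyed by made-up tags, whereas a real project would take the separators from CLDR data instead of hard-coding them.

```python
# Hypothetical table: (thousands separator, decimal separator) per convention.
SEPARATORS = {
    "europe": (".", ","),        # 100.000,00
    "uk": (",", "."),            # 100,000.00
    "finland": ("\u00a0", ","),  # 100 000,00 (non-breaking space so it never wraps)
}


def format_amount(value: float, convention: str) -> str:
    """Format a number with two decimals using the given separator convention."""
    group_sep, decimal_sep = SEPARATORS[convention]
    # Format with ASCII separators first, then swap them for the local ones.
    integer_part, fraction_part = f"{value:,.2f}".split(".")
    return integer_part.replace(",", group_sep) + decimal_sep + fraction_part
```

Splitting on the ASCII decimal point before replacing avoids the classic bug of swapping "," for "." and then swapping it straight back.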
"[The] first problem you get... is France"-English history in a nutshell ;-)
North Africans as well.
bruh
To borrow from another Tom Scott series, "The wheel spins and lands on France!"
I got an anxiety attack just listening to him
Kiron Kabir I
420 likes to relieve your anxiety attack after all these years.
It was 430, I made it 431, now I feel evil
You can hear him getting frustrated and it somehow creates a deafening low pitch sound like the ones they use in horror movies to create tension
My brain is fried
This is even more complicated by inflections when using passive voice. In Czech, you don't say ’A likes this’ but ‘This is liked by A’ where ’by A’ is an inflection of A (it changes suffix and sometimes also root based on very complex rules). On Facebook, the translators use ‘This is liked by user A’ which avoids inflection of A (the proper word for ‘user’ also depends on gender but that was already sorted out by Facebook). But that causes the problem that it is very long and sounds quite unnatural. The proper way would be to put there all those rules but, well, there were at least four PhDs I know about that tried to put those rules to computers as their theses, and it still does not work reliably.
But that's NLP, not i18n...
Yes, that is a trick I have used too. There is no way you can know for sure how "A" must be inflected if it is a person's name. It might not even be a name in the same language as that of the text it appears in.
Ah so perhaps that's why Turkish twitter says "so-and-so named user(s) liked this"!
And this is why we should all use Chinese: No grammar necessary
I would love to just add a support for inflection of names. So instead of "This is liked by %NAME%" you write "This is liked by %NAME-dative%" or whatever case it is. Then you have a function that deals with Czech conjugations. It will be updated when users report incorrect conjugations, and it'll be very nice :)
"Hey, that's a nice program you have there, but, how will it work with Klingon?"
* -Smashes computer. *
Congrats, you just won free Internet, and likes
Minecraft works with Klingon :D
What about Dothraki?
Looking it up, Dothraki doesn't seem like it'd be particularly difficult other than the limited vocabulary, but that's the translator's problem, not the programmer's.
...pretty easily honestly. It's a pretty boring conlang; basically just English with a weird vocabulary.
In Turkish "I" and "i" are two different characters, capital "i" is not "I" but "İ".
for me i is I because ı is İ
That's a cursed i
Bruh
Yes. 'I' is pronounced as 'e' (but not really e) in 'brother' and lower case version is ı.
and that actually sometimes breaks some software in some programming languages (I specifically have Java in mind) where the standard library just "deals with it" and converting between upper and lowercase does it for you depending on system language. Because if it's some internal text that is expected to be in English, it's just going to break things if it's suddenly a different character. Even when the program isn't localized at all. It's so weird to have a "Fixed crashes when the OS is in Turkish" in the changelog.
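To make the dotted/dotless i problem concrete, here is a small Python sketch. Python's `str.upper()` and `str.lower()` ignore the system locale, so the Turkish rules have to be applied by hand; the two helper functions are illustrative, not a library API.

```python
def turkish_upper(text: str) -> str:
    """Uppercase using Turkish rules: i -> İ and ı -> I."""
    # Handle the two special letters first, then let the default mapping do the rest.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish rules: İ -> i and I -> ı."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


def is_title_key(key: str) -> bool:
    """Compare an internal identifier without locale rules (the safe choice)."""
    return key.lower() == "title"
```

The point of `is_title_key` is the bug described above: if internal identifiers such as "TITLE" were lowercased with Turkish rules, the result would be "tıtle" and the comparison would silently fail.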
I feel sorry for the operating system developers.
THIS. Unlike social networks, operating systems are simply omnipresent, even in places with no internet access, so, depending on the popularity of your OS, you'll have to deal with tribal languages with loads of exceptions in verb flexing and such
Linux ftw.
The Linux (kernel) strategy, if I understand it correctly, is not to deal with this. It's a userspace problem. The kernel just deals with bits, it doesn't need to care what language/encoding those bits are meant to represent. As far as the rest of the OS goes, it just depends on how well those programs are 'internationalis(z)ed'.
+Knuxfan24 Many of the programs you'll use on Linux are the same ones you'd use on windows/mac, and if the problem is strictly a userspace one, you're in the same boat as with any other operating system. (and remember it's easier to file a bug for gnome-calculator than it is microsoft paint)
I'm not sure about other OSs, but Windows developers don't even bother to translate from Jargonese into English. I don't feel the least bit sorry for them.
android is the most international OS, as far as I am concerned.
Welcome back to another episode of Tom Scott slowly descending into insanity!
my favourite
I love these videos with Tom Scott :D
It's especially entertaining when he goes on a rant like this. I loved the timezone video.
Sagamir he is the best
+Sagamir Yes, he is the reason, I clicked on the video.
+Sagamir at the risk of embarrassing him, he is very cool.
+Sagamir exactly :D his rants are awesome :D
And you didn’t even get to conjugations/inflecting! Shame on you! /s
I think that's what the Icelandic bit was about; case.
Fun fact: Chinese and Japanese characters do not have italic variants, instead you're supposed to use special characters for emphasis.
"ß -> SS" That's absolutely right. But the vast majority of Germans doesn't know how to deal with the upper case "ß". I'm glad one of the greatest nerds on RUclips knows. ;) Thanks for the vid!
+PunktKommaNull You may also use the pretty, newly designed letter ẞ instead of SS or SZ - but this is still against the official rules. But I think it's the best alternative for writing ß in upper-case letters.
I think the special case he was referring to is when it's a name. In that case you don't use "SS", but SZ. At least that's what we do in Austria.
Martin Steindl It's official by now
+Martin Steindl Why is the capital letter just a bold version of the lowercase letter?
For the Masse and Maße, it is the difference between weight and measurement.
And don't forget the reform of the reform, which brought us a beautiful exception: ß is to be used especially when writing proper names, even if it should've been written with ss by the rules.
And then your Indian translator calls and asks "Are you satisfied with your internet service provider package"?
+Linus Fedora Tips Kindly advise.
+Linus Fedora Tips I lol'd.
"Have you tried turning your language off and back on?"
HAHAHAHAHAH!!!
"Please start Event Viewer."
***** as a Romanian I've never noticed the different plural.
Just wow, I had to count up to twenty too sheep, and noticed 19 sheep followed by 20 "of" sheep.
Loved your vid, it reminded me of similar problems trying to support Resource Table translations for one of my apps.
PATRU SUUUUUUUTE ȘI CINCIZECI DE OI (Romanian: "FOUR HUUUUUUUNDRED AND FIFTY SHEEP")
@@bitterjames Legend says he's still counting sheep...
@@everythingaboutromania4278 I was making a reference to a song by Zdob și Zdub
ok?
And then Japan, China and Korea call: "Hey, could you also make it so that we can read vertically?"
Omagari Toshi You could also rotate everything (including the monitor) clockwise.
Almost all people in china no longer read vertically
And then Mongolia calls and says: we still use top-down script on computers even if Chinese and Japanese have mostly switched.
@@tariik.h they also use Cyrillic. Which is a pain to read for me as a Russian speaker because it sounds so different than it looks.
Tarik Hamani more importantly the Korean Chinese Japanese top down script writes from right to left. But Manchurian and traditional Mongolian top down script writes from left to right.
I love the “Tom Scott descends into madness talking about global compatibility online” series
"and if that surprises you, you need to get out more..."
The more I watch Tom Scott the more I realize how awesome he is.
Finally subscribed!
To Computerphile, or to Tom Scott?
4:28 FUN FACT: Specifically because of this issue, Germany has very recently introduced a capital ß (double-s) character into its orthography. There was no capital ß until then, it was strictly a lowercase letter, and that's why it becomes SS when it turns into capital letters.
Lower case: ß (Straße)
Upper case: ẞ (STRAẞE)
The traditional capitalization with SS is still valid and common (STRASSE)
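A short Python sketch of the two capitalizations listed above. Python's built-in `str.upper()` follows the traditional ß -> SS mapping; the `use_capital_eszett` switch is a made-up option for illustration.

```python
CAPITAL_ESZETT = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def upper_german(text: str, use_capital_eszett: bool = False) -> str:
    """Uppercase German text, optionally keeping ß as the capital ẞ."""
    if use_capital_eszett:
        # Swap in the capital letter first so the default mapping leaves it alone.
        text = text.replace("\u00df", CAPITAL_ESZETT)
    return text.upper()


def same_word(a: str, b: str) -> bool:
    """Case-insensitive comparison that treats ß, ẞ and ss alike."""
    return a.casefold() == b.casefold()
```

Note that the SS form cannot be reversed: `"STRASSE".lower()` gives "strasse", not "straße", so a program cannot recover the original spelling from the capitalized text. Comparisons are safer with `casefold()` than with `lower()`.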
And that is why nothing ever comes in Hebrew. Arabic at least has many millions of users, but when you only have 7-9 million native speakers, you don't get very high up on the localization ladder - especially when you are a non-latin, RTL language with masculine and feminine rules and dual plurals and many other rules.
Gavers23 the most annoying was when instagram made hebrew version but not for iphone. Or when you have English and Hebrew on the same document and nothing goes where you want it to.
@@dm7626
*_trailer horn_*
_"this summer"_
_"prepare for a horror story"_
*_rising strings_*
_"like you've never seen before,"_
_"as one israeli guy"_
_"tries to use"_
*_musical climax_*
_"the arrow keys"_
*_distant screaming_*
*+Gavers23* But your argument falls apart. - Certainly, getting a Hebrew localisation before an Arabic localisation might not happen most of the time. But if Arabic is supported, you already got non-Latin, RTL language with masculine and feminine rules and multiple plural forms and so on. And as you said, Arabic has so many speakers that Arabic usually gets support, and when it does, Hebrew can just easily tag along.
@@Liggliluff hebrew and arabic aren't identical, this is like saying french is easy to implement if you have english implemented
@@terner1234 Yes, supporting Hebrew when you can already fully support Arabic is just much better start than only supporting English. I think the hardest part is when you have multiple languages mixed together. In worst case you could have overall layout in Arabic, have some long quotation in English (meaning that the quotation must wrap over multiple lines in the middle of Arabic text) with some Japanese names with Ruby text above the name.
And when you can successfully support all that, some joker comes by and messes with your user interface with zalgo text overflowing over all the content.
In Japanese, the words change depending on who you are talking to and how polite you want to be.
+zboodles2 Spanish and French, too
Yeah, in American it's called swearing.
*America (not American. That's not a place, or a language)
+sean wilkerson In English, and pretty much all other languages, there are certain words you don't use in formal contexts. In some other languages, the use of common every day words (most often pronouns and verbs) completely changes.
+-Double Negative- While that's true, I don't think it will be a problem for coders because it's going to be a single politeness for every user.
The title should say, "Internationali(s|z)ing Code - Computerphile"
+Jonny Bloom What about Internationalis?z?ing Code - Computerphile?
kabliekke If you were to do that then, Internationaliszing would also be valid, which it shouldn't.
It also could be internationali[sz]ing, it's the shortest
+thermate93 If we are bitching about so many things here, then your version is not okay in Poland and Hungary :P
"sz" in Polish is the English "sh" and "sz" in Hungarian is the English "s", just to mess things up
Bitching got us the "/" in the end of the videos :D
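The three patterns suggested in this thread can be compared directly. A quick sketch with Python's `re` module; the candidate list is made up for the test.

```python
import re

# The three patterns proposed in the thread.
PATTERNS = {
    "(s|z)": r"internationali(s|z)ing",
    "s?z?": r"internationalis?z?ing",
    "[sz]": r"internationali[sz]ing",
}

CANDIDATES = [
    "internationalising",
    "internationalizing",
    "internationaliszing",  # both letters
    "internationaliing",    # neither letter
]


def accepted(pattern_name: str) -> list:
    """Return the candidate spellings that a pattern matches in full."""
    pattern = PATTERNS[pattern_name]
    return [word for word in CANDIDATES if re.fullmatch(pattern, word)]
```

`(s|z)` and `[sz]` accept exactly the two real spellings, while `s?z?` also lets through "internationaliszing" and "internationaliing", which is the objection raised above.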
Lol Tom Scott on Computerphile...I love how he always breaks into a mental collapse as he assesses a complex problem.
For time, the 24 hour clock makes way more sense. It avoids a lot of ambiguity that arises when discussing time in a vague manner.
For dates, it makes more sense to go Year-Month-Day than any other system.
The start of the week is arbitrary. My Spanish Teacher told me in Mexico they start with Monday because they like putting the weekend next to one another on the calendar. I like that reasoning, honestly. It does look cleaner. Plus, it makes the "weekend" actually make sense. In America the "weekend" is actually the "weekend-week-start" since Sunday is the first day of the week.
Sunday as a weekend always made sense to me- Sunday and Saturday are on either end of the week. If you see the week as a string (...the stringy kind, not the textual programmy kind), then Sunday and Saturday are at the ends.
ISO 8601 for the win
Year-Month-Day makes sense for computers, but for users Day-Month-Year is more readable
@@marekbartos4808 in Hungary, we use y.m.d way, and we read it pretty well :)
Day-Month-Year is better for human form because it means you can leave off the year or the month when they're obvious, but Year-Month-Day is better for computer systems since it can be more easily sorted.
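The sorting claim is easy to check. In this sketch the same dates are written both ways and sorted as plain strings; the sample dates are arbitrary.

```python
from datetime import date

SAMPLE_DATES = [date(2019, 5, 3), date(2018, 12, 25), date(2019, 1, 31)]


def sorted_as_text(dates: list, fmt: str) -> list:
    """Format each date with fmt, then sort the resulting strings alphabetically."""
    return sorted(d.strftime(fmt) for d in dates)


def sorts_chronologically(dates: list, fmt: str) -> bool:
    """True when plain string sorting gives the same order as sorting real dates."""
    expected = [d.strftime(fmt) for d in sorted(dates)]
    return sorted_as_text(dates, fmt) == expected
```

With `"%Y-%m-%d"` the string order matches the chronological order; with `"%d-%m-%Y"` the list starts with 03-05-2019, which is actually the latest date of the three.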
The bigger problem is the users thinking that it's easy...
Nah, that's not a problem. Users have every right to expect things should work. It's more of a problem if developers resent that user expectations are that the software just works.
@@godofbiscuitssf You misunderstood: Users think that developing software is easy.
@@godofbiscuitssf 5 y e a r s a g o
Zai it’s not true anymore?
outasi free vs paid doesn’t matter if it’s a product. “Entitlement” is exactly the bad attitude I’m talking about.
In college I had one course of translation theory... this is one of the biggest problems we face when we are translating any content to another culture. It is not only for coding, but it is very nice to see that more than one group of people are working to solve this issue :)
Pssssh you think that's bad Mr. Scott?
Internationalization for video games is all that times 10. Imagine not only having those incredible language barriers and nuances, but also having to literally cut or completely change gameplay elements to meet national standards and norms. For example, if your game has a Nazi symbol anywhere in it, your game cannot be legally sold in Germany, so the only solution is to completely censor that image. As you might guess, many developers choose to just not release their game in Germany if it has anything to do with Nazis. But it can be worse than that - for example in The Legend of Zelda: Ocarina of Time, large parts of the fire dungeon had to be censored because certain symbols and sounds were extremely offensive to the Islamic audience, which led to boycotting and other social repercussions for Nintendo. There have been many games with instances of homosexual content, particularly between males, which were released without controversy in Japan but required heavy censorship of the homosexual content for an American release. Many times developers have to completely change parts of their design documents because they are told by their investors that they intend to release the game in places where certain design elements could make the video game illegal, so it's the developers' burden to fix all those issues.... or just dump it on the internationalization team when you're done with it ;-P .
Any computer scientist will agree with this: when code crosses borders, that's when the s*%t hits the fan.
But...
America?
Europe?
Catholics?
Protestants?
Idiots?
Life?
Why.
-The only thing humans can all agree is that we don't all agree. Not on death, not existence, not how, not why, not where, not when.
Not now, not ever.
--Demotivational speech of the day
Business applications typically need to deal with grammars of foreign languages, but video games have an artistic component therefore video games need to deal with cultures in story and gameplay.
I've heard some games change difficulty between localizations (for example, harder in Japanese/American versions).
Then some of the fans get angry over ruining the purity of the work (the debate over "how much localization is enough").
I grew up with crappy Chinese translations of Japanese Game Boy Advance games.
Censorship is not internationalization. That’s a different story.
When ANYTHING crosses borders, everything becomes orders of magnitude more complex. I work in shipping and logistics, and luckily the company I work for only has a small international presence. International sales make up maybe 5% of our overall sales, but they make up for probably 30-40% of my headaches.
false.
I would wonder how every translator in the world got my number.
You gave it to them when you contracted with them.
Because numbers are easy to translate, except for the comma, fullstop thing.
coweatsman I learnt using an apostrophe to divide numbers like this 100'000.00
@@Spikeupine That's new! Am I right to assume from your profile picture that you're Norwegian?
The only time I would use an apostrophe to separate numbers is with base-60, like 125' 45" for 125 minutes and 45 seconds, or base-12, like 6'4" for six-foot-four. I'm in Canada.
From the contact information on the site, you provide your service on I'd say.
I'm getting my degree in engineering, but I've always loved foreign languages. Videos like this prove that people in such fields (like computer science) can apply their knack.
I don't install software in my native language, I install software in English because in general stuff makes more sense in English
Yup. Most translations are so awful you wonder why they even got paid for it. And if the translation isn't bad, it's the VO that ruins it.
I so agree with that. I'm a native Swede and all my software is in English. For one, it makes it much easier to find help and tutorials in English, as you are more likely to get a Google hit on an English error than the same error in Swedish. And some translations just sound stupid.
Same here, in everything I do OS and Internet wise, I pretend to be U.S. American if possible. It nearly always offers advantages.
"because in general stuff makes more sense in English"
I'd rather say: The translations make bog all sense. I mean "Tools" is in German translated with "Extras". WTF? The things in tools aren't extras, they are essential things. No wonder it's the last place the user searches for the important functions. Why not use "Werkzeuge"? Because it's a tiny bit longer than "Extras"?
Also all help/documentation/tutorials/books are in English, so it's easier to get help/google for help when you use the English version.
blenderpanzi Ah, Extras, my favourite mystery. In Outlook, this is where "Senden/Empfangen" (Send/Receive) is located.
The most important function in the program.
In a menu called 'Extras'.
Huehuehue.
Every computerphile video he makes he genuinely seems done with everything.
Oh wow. You just made this translator so happy with this nice summary of all the "problems" we are confronted with!
4:20 Good news, German has officially now a capital version of the "ß"... ;-)
I am Spanish speaker and I downloaded a game developed in Linux, and he used many colloquial words from my country (Argentina), later I realized that it was because of my language pack and it surprised me.
I had never played a game in "Spanish-Argentine"
As a developer and a native Hebrew speaker, I can say that I never want to translate my apps to Hebrew. My language is terrible :-(
I lowkey support LTR flipped Hebrew just to not deal with bidi and reversed text and-
_a-and the.._
*_THE ARROW KEYS_*
@@unflexian Let's just invent an LTR Hebrew alphabet, or just flip the Hebrew alphabet.
Just use Yiddish, problem solved 😂
@@unflexian The arrows are the worst part!!!!!
I can never scroll through my bilingual documents! I always get stuck somewhere in a Unicode loophole!
ok?
And then your translator from Inner Mongolia calls and says: "Well, our script goes in vertical lines"
I am a dev in a Japanese company ... that has branches all over the world. Even internal services have to be internationalized ... this ... this made me less lonely and now I know I have brothers and sisters that share the pains of ... this ...
As a Slovak who has localized software in the past, I can identify with many of the exceptions mentioned in the video.
- There are different verb suffixes that go with male and female subjects in the past tense.
- Two different plurals, one for 2-4 and the other for 5 and more.
- But most importantly, nouns change their suffixes based on the action related to the noun. For example: in English you have a single word, such as Peter, and when you mention him with some action, you can just add prepositions, e.g. "to Peter" and create a sentence "Give it to Peter" which can be sent out to localizers as "Give it to NAME" and it works just fine. However, in Slovak (and many Slavic languages too), we change suffixes instead of (or sometimes in addition to) using prepositions. The name Peter would turn to "Petrovi", so the sentence "Give it to Peter" would be "Daj to Petrovi". And now you can't use "Daj to NAME" because our language(s) just don't work like this. It is similar with "from Peter" ("od Petra"), "with Peter" ("s Petrom"), etc. Oh and by the way, there are 12 categories for words in Slovak and each has a different set of suffixes. So remember "from Peter" was "od Petra"? Well, you can't use the same suffix for let's say the name Anna, because "from Anna" is "od Anny". Let's try another name, Lucia: "from Lucia" is "od Lucie". Hats off to Facebook, they have implemented all these rules and categories a few years back and now they use naturally sounding sentences when mentioning users' names. Because often when some software doesn't support this, a workaround is used, which is grammatically (almost) correct, but sounds far from natural. When you add in the word "user", you can apply all the suffixes on it, instead of the name, because that becomes the subject of the sentence. Change "Peter" to "user Peter" and suddenly you can use variables, because the localizer knows exactly what suffixes to use on the word "user" since it does not change and you can throw in the name in its original form after it. Thus "to Peter" is often translated as if it was "to user Peter" and in Slovak that is "používateľovi Peter" and you can finally use variables, like "používateľovi NAME". 
("user" is "používateľ", so the suffix "ovi" was applied to that one; oh and notice that when applying suffixes, vowels are omitted and consonants are kept... for the most part, there are more rules for that but I just mentioned it so that you see how many things there are to work out). Before I finish, I should note that even "používateľovi Peter" sounds a bit weird, because if it was done properly, suffixes would be applied to both the word "user" and the name, so the proper version would be "používateľovi Petrovi", but hey, that is what we have to live with.
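A minimal Python sketch of the two approaches described above, for illustration only. The `DATIVE` table and both function names are hypothetical; the table holds just the one declined form quoted in the comment, and a real system would need rules for every noun category. `plural_category` encodes only the "1 / 2-4 / 5 and more" split from the list and ignores fractions.

```python
# Hypothetical lookup of dative forms; only the example from the comment is known.
DATIVE = {"Peter": "Petrovi"}


def give_to(name: str) -> str:
    """Build the Slovak sentence for 'Give it to NAME'."""
    declined = DATIVE.get(name)
    if declined is not None:
        # Natural-sounding version: the name itself is declined.
        return f"Daj to {declined}"
    # Workaround: decline the fixed word "používateľ" (user) instead
    # and append the name in its original form.
    return f"Daj to používateľovi {name}"


def plural_category(n: int) -> str:
    """Pick a plural form for a whole number, per the 1 / 2-4 / 5+ split."""
    if n == 1:
        return "one"
    if 2 <= n <= 4:
        return "few"
    return "other"


print(give_to("Peter"))    # declined name
print(give_to("Anna"))     # falls back to the "user NAME" workaround
print(plural_category(3))
```

The fallback keeps the template usable with a plain variable, at the cost of the slightly unnatural phrasing the comment describes.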
Computerphile is just the kind of channel I would love to binge watch, and actually learn something useful! Nice work team!
this was a really good video topic; I remember one of my CS professors talking about this once, saying that natural language processing would create a great leap forward in terms of applications... I'm just thankful English and Spanish are in the top 5 most used
ok?
Thanks for this! It reminds me of how happy I am to have escaped mobile phone game development. At various points in time I've been through nearly all of the topics listed and some extras. Nothing is more frustrating than when your translations come in but with words that are too long to fit into your text box or in some cases on the screen at all (yes Germany, I'm looking at you). When you ask for a shorter alternative you're told there is nothing suitable. Then you have to re-work your code and add special hidden characters that can be used to denote where it is suitable to place a hyphen and wrap the remainder of the word on to the next line, but only if it needs to...
Typically a better solution (if you can get away with it) is to remove all text and replace it with icons. Yes, different cultures may need their own icons that they can relate to, but it's a lot easier to ask an artist to create an image to signify "this concept", than to deal with each individual language. As they say: A picture paints a thousand words!
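The "special hidden characters" mentioned above are presumably something like the Unicode soft hyphen (U+00AD); that is an assumption, since the comment does not name the character. A rough Python sketch of breaking a long word only at those points, where `wrap_word` is a hypothetical helper and not part of any library:

```python
SOFT_HYPHEN = "\u00ad"  # invisible marker for an allowed break point


def wrap_word(word: str, width: int) -> list[str]:
    """Split `word` at soft hyphens so each line fits in `width` characters.

    A visible "-" is added only where a break actually happens.
    A single segment longer than `width - 1` will still overflow.
    """
    plain = word.replace(SOFT_HYPHEN, "")
    if len(plain) <= width:
        return [plain]  # fits as-is, so no hyphen is shown at all

    lines = []
    current = ""
    for part in word.split(SOFT_HYPHEN):
        if not current:
            current = part
        elif len(current) + len(part) + 1 <= width:  # +1 leaves room for "-"
            current += part
        else:
            lines.append(current + "-")
            current = part
    lines.append(current)
    return lines


word = "Donau\u00addampf\u00adschiff\u00adfahrt"
print(wrap_word(word, 12))
print(wrap_word(word, 30))
```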
When I colonize a distant planet, we're only going to have one universal language.
..and time zone.
Well we have UTC and most of the world speaks English so we're heading there.
+Hendlton Unfortunately, in my experience everyone around the world is happy to use that in online communication to arrange meet-ups, except US Americans.
+Hendlton UTC is just a rebranded GMT though?
Davixxa Yes, it is but we have to pick one, we can't just invent a new time zone. As long as everyone follows it, it doesn't matter what time zone it is.
Hendlton I mean, for example, in Denmark, we use UTC+1 (+2 with Daylight savings) - That's not following UTC per se.
I am American and in Italian class once I learned they did day/month/year, it made so much more sense that I accidentally did it in all of my other classes for about a month.
ok?
@@Triantalex i made this comment 7 years ago that's crazy
i don't even like d/m/y anymore
Tom Scott is such a wonderful host, I could watch him all day...
This reminded me of the time zone video. His rants are hilarious. There really are so many irregular cases you can't get your head around all of them.
Yeah, but unlike that video, there's no silver lining here. Timezones are easy when you just keep track of timezones everywhere and use the black box libraries that can handle all the details. The only hard part about timezones is to wrap it around non-developers that a "date" is not a thing worldwide. When you have date such as 2022-03-15 (ISO 8601 syntax) it starts and ends at different times around the globe. You cannot say that e.g. deadline for a homework is 2022-03-15 because that would be 2022-03-15 plus or minus 12 hours. And if you're close to switch between summertime and wintertime, make it plus or minus 13 hours. Plus maybe an extra hour if some country is also changing timezones that year. Any deadline or other exact time should always include date, time and timezone. And the timezone is important because when non-developers set time, they may say that they want "2035-03-15 23:55 Europe/Helsinki" and that means the moment when clocks show that time in Helsinki after all *future* changes to timezones have already been implemented. As a result, you cannot store timezones as time delta to UTC, no matter how many existing systems are already doing so.
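A small Python sketch of the point about storing the zone name instead of a UTC offset. It assumes Python 3.9+ with the standard `zoneinfo` module and an installed time zone database; `store_deadline` and `resolve_deadline` are hypothetical names made up for this example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def store_deadline(local_wall_time: str, zone_name: str) -> dict:
    """Keep the wall-clock time and the zone *name*, not an offset."""
    return {"local": local_wall_time, "zone": zone_name}


def resolve_deadline(record: dict) -> datetime:
    """Convert to UTC as late as possible, using whatever zone rules
    the installed time zone database has at that moment."""
    naive = datetime.fromisoformat(record["local"])
    local = naive.replace(tzinfo=ZoneInfo(record["zone"]))
    return local.astimezone(timezone.utc)


record = store_deadline("2035-03-15T23:55", "Europe/Helsinki")
print(record)
print(resolve_deadline(record))  # result depends on the current tz rules
```

If the rules for Europe/Helsinki change before 2035, the stored record stays correct and only the resolved UTC instant moves, which is the behaviour the comment asks for.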
false.
I'm Croatian and I never use localized software because in comparison with English it just feels bizarre.
Right. In English you know what it means. Localized, you have to think what the translator meant, it just doesn't feel natural.
this Polish user of apps in English hops on your train :D
@@Z4KIUS Whenever I try to make a clip on Twitch, it sends me to the Polish version of the page, with the error probably being along the lines of 'No clip found'.
isto
ok?
The timezone video and this one make Tom sound so like matt smith, the "AAAND THEN!"'s from the timezone video and the "There are So..many..changes" the matt smith-isms are strong in this series :D
that's why I love him sooo much
"AAAND THEN, Daleks attack and exterminate everyone."
But they can fly now, since 2005 :(
(or 2012... whichever way you look at it)
***** the Van Statten's vault episode in the Christopher Eccleston series aired in 2005
Holy moly! My respect to those dealing with internationalisation! Omg!
Depending on the app or design you're working on, you might also have to figure out register as you translate and, consequently, define whether it's either formal or casual (usted/tu; Sie/du; vous/tu, etc.).
3:39 I’m Romanian and I didn’t realise we did that until now xD
Edit: btw, if you’re curious, we say (2-19) ducks; (20+) *of* ducks
1 duck, 2 ducks, 3 ducks... 18 ducks 19 ducks, 20 of ducks, 21 of ducks
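As a sketch, the rule exactly as stated in this comment can be written in Python like this. `count_ducks` is a hypothetical helper, the output uses the comment's English glosses, not real Romanian, and the behaviour for 0 and for numbers above 100 is an assumption that a real project should take from proper locale data such as CLDR.

```python
def count_ducks(n: int) -> str:
    """Apply the commenter's rule: 1 duck, 2-19 ducks, 20+ 'of ducks'."""
    if n == 1:
        return "1 duck"
    if n < 20:
        # Covers 2-19; treating 0 the same way is an assumption.
        return f"{n} ducks"
    return f"{n} of ducks"


for n in (1, 2, 19, 20, 21):
    print(count_ducks(n))
```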
Like that?
ok?
I just love his way of explaining, especially with the mental breakdown thing going on in the mind of the coder
One game I like to play, is when you buy something that has a little leaflet with the user instructions in all the European languages. Study the translation into your language and try to figure out in which language the instructions were written. Study the odd turns of phrase and word order. It can be fun.
One note: In translations from Japanese: Some sentences no verb! (don’t ask why 😊 ).
Norwegians would need *two* translations - Bokmål and Nynorsk.
I just made your headache even worse.
false.
@@Triantalex You say that on practically every comment
I had no idea Americans start the week on Sunday.
Alan Bacon We do. Europe did too, once upon a time when they were still religious.
Alan Bacon Yet since not all jobs run the standard work week you run into other weird things where some have their schedules and pay periods start on Monday because that's when a standard work week starts, but others start on Sunday because that's when a standard week starts. I have two jobs and they each do it the other way. It might also have something to do with military time (24 hour clock) because one uses that and the other doesn't. But then the day you actually get paid is generally Friday, presumably for processing, and the ones that have their pay period start on Sunday still usually do that just to conform with the ones that start on Monday and the ones that give out a salaried pay, but a few just hand it out a day earlier (like my one job) because the pay period ends a day earlier and oh no I've gone cross eyed.
tl;dr: America is just a pain in the ass. You have no idea how many people I've not only explained military time to, but also non-American date format and they still just can't wrap their head around it even though it, you know, actually makes real sense and not magical fantasy land sense.
Alan Bacon We start it on Sunday in Brazil, too... Always thought it was weird, though haha
Alan Bacon I'm American and my entire life, at home, or in any institution, school, or workplace, I've never seen anyone start the week on Sunday. Everything I've ever seen has started the week on Monday. In fact, I've always considered it natural; I configure my Outlook calendar to start the week on Monday.
In fact, now that I notice paper calendars and such starting the week on Sunday, it's going to annoy me tremendously.
Alan Bacon The start of the week is entirely arbitrary. Sunday made more sense in the past, Monday makes more sense today.
1:15 e.g. to translate the single word "relatable" into French, you have to make a whole sentence explaining that you can understand the feeling/situation very well because you have experienced something very similar in your own life. There is, to my knowledge, no simple equivalent for "relatable" which doesn't alter the meaning of your sentence (I am a native French speaker, but I'm only 16, so maybe I just don't know or don't remember the equivalent).
And do I have $5 in my pocket or 5$?
$5
£5
$5, I'm pretty sure. Assuming you're using USD.
It doesn't make sense. You have five dollars, not dollars five. But of course Americans never make sense.
$5.- of course, or preferably €5.-; Never forget to add cents.
Basically: Why coders get depression when they have to code multi-lingual systems
false.
(I haven't finished reading all the comments so I don't know if this is mentioned by anyone else already... but here we are.)
The traditional Mongolian script is supposed to be written top to bottom ONLY. Have fun redesigning your whole interface. This problem escaped the limelight because the Soviet Union pushed its Cyrillic alphabet onto Mongolia in the 20th century. Thus, the difficulty of computerizing the traditional Mongolian script was ignored until Russia became weak and Mongolian society / government wanted to revive some of their older traditional culture.
In fact, Chinese / Japanese were traditionally supposed to be written top to bottom too. One can still see many books published in Japan printed in vertical text flow even nowadays. But it is a lot easier for CJK characters to adapt to a horizontal text flow, so outsiders almost never notice. Meanwhile, letters in Mongolian script are linked like Arabic, so a faithful conversion to horizontal layout doesn't work for Mongolian.
Caution: it is vertical-left-to-right for traditional Mongolian but vertical-right-to-left for traditional Chinese / Japanese. Have extra fun redesigning your interface for both of them.
6:49 I like to imagine Tom with this manic smirk on his face as he drives the horror home. "But... that doesn't work here."
Me : "I love how this is basically an eight minute rant"
Tom : "The last rant..."
Me : Well... shit.
"And then someone promptly breaks your site"
The matter of fact way he said this struck me as very funny.
Surprised you didn't mention Turkish. :)
In short: i18n is hell. I worked on that, you always find new fun ways that the code can break.
I like this guy. He's really interesting to listen to.
I really love it when Tom gets irked up. It's very enjoyable to watch. I can almost feel the blood and tears of a thousand coders.
After learning HTML, I finally understand the thing at the end.
"and if that surprises you, you need to get out more." based tom scott living in the future
Exactly haha 8 years ago as well
Wow, I didn't even check the upload date. An even bigger W then
That was the best sentence of the whole video. And 8 years ago?! I would hardly have understood it back then.
@@drunkenhobo8020 I didn’t know catholic priests moved to reddit
false.
American weeks start on a sunday? Now I've heard everything.
we call them weekends because of our work week not because of where they are in the week itself
A Monday. As it should.
***** So little kids and the unemployed aren't entitled to Saturdays and Sundays? What a country you live in /s/
***** Don't forget about Imperial System too
When I was in Israel, the 5 day work week started on Sunday and ended on Thursday.
Btw, Americans do this because G-d rested on the seventh day which is, of course, Saturday. This distinction makes more sense in other languages like Spanish or French.
There are so many American standards I hate, like using Imperial, MM/DD/YYYY format, etc.
However I do code in American English (from UK) because all libraries and stuff are written using American English. I like to be consistent.
A few big changes include:
Colour -> Color
Initialisation -> Initialization
Centre -> Center
(Side note: I actually prefer the American spelling as Center)
George_E As far as I can tell, the American date format is like that because it's in order of the lowest maximum to the highest maximum. So as there are only 12 months in a year and up to 31 days in a month, months come before days. It's a weird way of doing it but it does look nicer, just as long as you don't have to actually read it :P
Absolutely agree. ISO8601 or Unix time and measure everything in seconds would be so much better.
He seems to love the word subtle. His last sentence hits the nail on the head though: this is a problem we have had for decades, maybe centuries if you talk about translation of letters. Human languages weren't built by the same people or by the same cultures. There is no solid fix but you also cannot possibly try to account for everything. He said the social network was for the English world to start with. You have to handle things as they come after that, unfortunately. The best way to fix these specifics is a system similar to Facebook's, in which the translations can be rated and you can submit quality control reviews. That being said, I do love Tom Scott, very emotive, very informative.
The sum up is brilliant and yet this is how it is for many cases
I often get an error when trying to enter my name, claiming "your name cannot contain special characters".
Ø is a completely ordinary letter in Denmark, and happens to be part of my last name. It's hilarious being told my name is invalid.
And that can happen even without internationalization considerations. My hyphenated last name sometimes gets rejected by websites built by my fellow US English-speaking developers.
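One hedged way to avoid rejecting names like these is to check Unicode character categories instead of an ASCII-only pattern. The sketch below uses Python's standard `unicodedata` module; `is_plausible_name` and its list of allowed punctuation are assumptions made for illustration, and the sample names are invented.

```python
import unicodedata

# Punctuation that commonly appears in names; this list is an assumption.
ALLOWED_PUNCTUATION = set(" -'’.")


def is_plausible_name(name: str) -> bool:
    """Accept letters from any script, combining marks, and a little punctuation."""
    name = name.strip()
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)  # e.g. "Lu", "Ll", "Mn"
        if category[0] in ("L", "M"):
            continue  # a letter or a combining mark in any alphabet
        if ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True


print(is_plausible_name("Søren Østergaard"))
print(is_plausible_name("Smith-Jones"))
print(is_plausible_name("<script>"))
```

Even a permissive check like this can reject someone's real name, so it may be safer to validate as little as possible and escape output properly instead.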
I think you should have included a sentence or two about text input. When you have a mixture of LTR and RTL input, your text caret can split into two to show where the next letter would go, depending on its direction (a left-to-right letter would go to one caret, a right-to-left letter would go to the other caret). I'm pretty sure implementing that after-the-fact would be pretty hard indeed.
And to make things even worse, many languages require IME to enter the text (e.g. traditional Chinese) where you have to render something after entering it partially. For more latin-like letters, combining characters are one example, too.
I appreciate 7:28 being brought up...the factor which will influence who can remain understanding technology and who won’t.
Plural rules, date formatting, number formatting, month and weekday names, currency symbols and formatting; all that is supported by CLDR. That's your black box you throw into your code. - But you still have to translate the UI.
This is brilliant, Tom. Thank you for this video!
"If that surprises you, you need to get out more" ...nice one
false.
Sad for you
Solutions to all the problems(not really):
1)Use Unicode but sterilize your inputs
2)Use scientific notation
3)Write out all dates(eg 5/3/19 becomes May 3rd, 2019)
4)When cutting off text, just cut straight through it, even if part of a character is not visible it's fine.
5/3/19 is wrong
@@anarhistul7257 exactly!
@@anarhistul7257 You're right, it should be 2019/05/03
(This comment was made by the yyyy/mm/dd gang)
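The disagreement over 5/3/19 is easy to reproduce. A short Python sketch using only the standard `datetime` module shows the same string parsed under two conventions and the unambiguous ISO 8601 form of each result:

```python
from datetime import datetime

ambiguous = "5/3/19"

# Month first, as in the US convention.
as_month_first = datetime.strptime(ambiguous, "%m/%d/%y").date()
# Day first, as in most of Europe.
as_day_first = datetime.strptime(ambiguous, "%d/%m/%y").date()

print(as_month_first.isoformat())  # 2019-05-03
print(as_day_first.isoformat())    # 2019-03-05
```

The input alone cannot say which reading was intended, so the format has to come from the user's locale or be stated explicitly.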
There is a branch of translation, Localization, that deals with exactly this. It requires web/software designers trusting translators with the source code, but the upside is the translators know what their target language needs, and so a lot of time is saved and a lot of messing around back and forth is avoided.
The Existential Angst of Tom Scott is my favorite series on youtube, thanks!
Oh, while you're at it, I'm in the US, but I want 24-hour time because it makes more sense. And I'd like my clock in UTC so that I can coordinate with people in Poland without us trying to do time zone conversions in conversation. Sure, your other US users don't want this, but just because my neighbors... no, I don't want to spell "neighbors" with a "u" just because I've got some non-US preferences.
I agree, you need a mixture... that's the way it should be. IMO windows got this right, although choosing your location as "US" will set much to US standard, you can modify various bits. I have my location set to the UK, because that's where I am, 24 hr clock (the UK setting in windows seems to like to do am/pm), US keyboard layout (with a British keyboard, but that doesn't matter as I touchtype)
And currency using - for negative values rather than ( ) because it makes more sense to me.
Windows DID get wrong the 'mess with the computer clock when the time changes' thing. Other operating systems use UTC and simply apply an offset based on your locale.
@@TheChipmunk2008 I agree on the clock thing. I guess they figured some BIOS features such as Autowake would be best served by such an arrangement, but they missed the mark on this.
But you'd still have to do time conversions in conversations... Poland uses CET, which is UTC+1/2 (depending on the time of the year). I totally agree with your other point though. I prefer using a 24-hour clock but at the same time I prefer American spellings over British ones. And I very rarely can have both. For example, on Discord I have to choose - either a 24-hour clock and British spellings, or a 12-hour clock and American spellings. It's really annoying.
@@Matihood1 Oh, yeah, they're not locally in UTC, but they could work 8-16 UTC or 7-15 UTC (depending on the time of year), and be sharing CET business hours while having their clocks read the same as my UTC clock.
ok?
"we are just producing it in english" - +100500! I agree with that even though I'm not a native English speaker
And then he produces it in British English and the Americans complain the spellings and dates are wrong
@@LowestofheDead Of course, and some of the units don't match either - which wouldn't be that much of an issue except Americans are convinced the world revolves around them and will definitely whine about it.
@@JonasDAtlas
Well, most programming languages came from the US of A. Microsoft and Apple are American. Git gud @ murican english or git lost... Is all there is to say.
@@mitpoker7319 Just because the programming language is based on American English doesn't mean any of the user-facing text needs to be... you're kind of missing the point here.
ok?
if i had a nickel for every time Tom Scott acknowledged non-binary people with the phrase "if that surprises you then you need to get out more." i would have two nickels.
not a lot but cool that it happened twice.
false.
4:27 Well explained. Greetings from Germany 🙂
Surnames and place names follow their own rules. And exceptions to avoid misunderstandings.
Trink in Maßen -> Drink moderately.
TRINK IN MASSEN -> DRINK IN MASSIVE AMOUNTS.
TRINK IN MAßEN -> Drink moderately
😅
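The Maßen/Massen collision shows up directly in Python's default case mapping. The sketch below uses only built-in string methods; `shout` and `shout_keep_eszett` are hypothetical helper names, and swapping in the capital ẞ (U+1E9E) is just one possible workaround, not an official recommendation.

```python
CAPITAL_ESZETT = "\u1e9e"  # ẞ, the uppercase sharp s


def shout(text: str) -> str:
    """Default uppercasing: ß becomes SS."""
    return text.upper()


def shout_keep_eszett(text: str) -> str:
    """Replace ß with the capital ẞ first so the distinction survives."""
    return text.replace("ß", CAPITAL_ESZETT).upper()


print(shout("Trink in Maßen"))              # TRINK IN MASSEN
print(shout("Trink in Massen"))             # TRINK IN MASSEN
print(shout_keep_eszett("Trink in Maßen"))  # TRINK IN MAẞEN
```

Note that uppercasing can change the length of a string here, which is one more assumption that code written for English tends to get wrong.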
Can we get more Tom Scott ranting about code? I love these videos!
yy:mm:dd hh:mn:ss
24 hour time
weeks start with monday
day light savings does not exist
Weeks don't really have start though... and Daylight savings needs to exist, just in a standardized format. Otherwise 0100 would be dark for some people, and light for others.
@Santiago Colla its more about making use of that time. if you have more sunlight throughout half the year, why would you waste it by waking up later? it makes much more sense to adjust clocks to utilise the sunlight rather than to go to bed before the sun sets fully or to wake up an hour after the sun rises. if im not mistaken, this practice comes from long before when people would wake up early and go and work out in the field, they would adjust to when the sun was out the longest so that they would not need to work after the sun starts to set.
@Santiago Colla yeah, I agree in a way, but growing up in a country where we change our clocks I can definitely tell you that if you are a kid and your mum tells you to be back home by say 7 or 8 pm, you'd be much happier to stay outside in sunlight for another hour free of charge. But in all seriousness though, it's not a big deal, but those countries who've been using it for decades don't really have a reason to not do it. A thing that does bother me is why some countries have been doing it for many many years, but others haven't. If people relied on sunlight a lot more before, wouldn't it be more global?
@@GamerCo29 No no no no no. Not 0100. 01:00. I know it's technically correct but not having a colon in a timestamp just looks weird, no matter how you look at it.
yyyy-mm-dd
The two dots above the "e" in "Chloë" are not an umlaut, but a trema. An umlaut would change the sound of the vowel, whereas a trema shows that o and e should be pronounced as two independent vowels.
It's actually called a diaeresis
+KasabianFan44 Actually, both names are correct.
+KasabianFan44 That's what you get after you try to internationalize code, give up, and go to taco bell
This is why I only localize for the two languages I know for my personal websites/games -- English and Esperanto. :B
:D
I had never met a Singh with such a WASP face, Rachel. Was your grandfather a very very white Indian?
@@ScorieDivine I married an Indian
@@MoosieSingh Ok. I thought women in the States had stopped taking their husband names a while back. Thanks for replying.
@@ScorieDivine Depends. I have a guy friend who merged his last name and his wife's last name to make a unique surname. Everyone has their own reasons for taking, or making, new names for themselves.
Best channel for me right now
So many wonderful topics and every single one is a reason why humans have problems understanding each other
The Unix timezone database is indeed a work of art, lol.
The sharp s (ß) actually sometimes becomes an SZ when an SS could lead to misunderstandings :)
4:35 if I remember right, there's an uppercase ß now
Tom Scott rants are the best
Wow, truly remarkable that large sites can manage such complexity
I love that almost every time Tom has talked about gender non-conformity/non-binaryism, he says that those who find the existence of those concepts surprising need to get out more.
Tom in 2014 was ahead of much of the world. Wonderful to see.
Which is weird considering now, and especially back then, most people found out about non-binaryism on the internet, not "in the outside world."
@Dill Stevens I get it not being literal, but the meaning my head got was "go out of your usual limited space". While that can apply to "being open minded", my thinking was that back then, spaces where people are educated on concepts such as non-binaryism were niche while the outside world (majority of people, online or offline) was mostly not educated on such matters. Things have changed now within the internet thankfully, but outside the internet and outside 1st world countries, it still feels somewhat niche nowadays (although thankfully, such matters are more and more becoming mainstream in developing countries).
Hence, why I felt it ironic.