LurkingHusband am I right in thinking that the original specs for email deliberately excluded the use of extended character sets in addressing?
Er, quite the reverse. The original spec (RFC822) and its successor both allowed this. From the specification:
quote
3.2.4. Atom
Several productions in structured header field bodies are simply
strings of certain basic characters. Such productions are called
atoms.
Some of the structured header field bodies also allow the period
character (".", ASCII value 46) within runs of atext. An additional
"dot-atom" token is defined for those purposes.
atext           =   ALPHA / DIGIT / ; Any character except controls,
                    "!" / "#" /     ;  SP, and specials.
                    "$" / "%" /     ;  Used for atoms
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"
atom = [CFWS] 1*atext [CFWS]
dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
Both atom and dot-atom are interpreted as a single unit, comprised of
the string of characters that make it up. Semantically, the optional
comments and FWS surrounding the rest of the characters are not part
of the atom; the atom is only the run of atext characters in an atom,
or the atext and "." characters in a dot-atom.
endquote
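To make the grammar above concrete, here is a rough Python sketch of the dot-atom-text production only. The names ATEXT, DOT_ATOM_TEXT and is_dot_atom_text are mine, not from the RFC, and this deliberately ignores CFWS, quoted strings and everything else in the spec, so treat it as an illustration of the token rather than an address validator.

```python
import re

# atext as listed in the quoted grammar: ALPHA / DIGIT plus the
# punctuation characters shown. (Name is an assumption for this sketch.)
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
# i.e. one or more runs of atext, separated by single dots.
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(s: str) -> bool:
    """Return True if s matches the dot-atom-text production (no CFWS)."""
    return DOT_ATOM_TEXT.fullmatch(s) is not None
```

With this, something like is_dot_atom_text("o'brien+tag") is accepted, while a leading dot, a trailing dot or two dots in a row are rejected, which is exactly where a lot of hand-rolled checks go wrong in one direction or the other.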
But there are some quite crap coders out there who insist on rewriting their own email address validation routines - and getting it wrong.
With the current push for internationalisation it seems likely the updated spec will be expanded to cover non-ASCII characters. So expect a tsunami of really crap coding.
For those that care, I used variants of this as one of my interview questions. If a candidate had said "hang on", googled, and then mentioned RFC822 or 2822, they would have been hired on the spot!
My (real) name is in an RFC from 1987 