Is there anything in the framework which will help translate accented
characters in strings to their standard counterparts?

eg. "Gráda" to "Grada"

Re: Translate accented characters by Mark

Mark
Mon May 09 07:44:59 CDT 2005

"JezB" <jezbroadsword@blueyonder.co.uk> wrote in message
news:uQ54T9IVFHA.3636@TK2MSFTNGP14.phx.gbl...

> Is there anything in the framework which will help translate accented
> characters in strings to their standard counterparts?
>
> eg. "Gráda" to "Grada"

Interestingly enough, I was looking for exactly the same thing recently, and
was unable to find anything native to the Framework, so I ended up writing
my own mapping function. Easy enough for the Latin languages (e.g. French,
Spanish, Italian, Portuguese etc.), fairly simple for German (e.g. any vowel
with an umlaut is replaced by the unmodified vowel + 'e'), a little messier
for the Scandinavian languages, even worse for Greek and Cyrillic, and
almost impossible for the Eastern European languages with diacritics.

As a matter of interest, what is the business purpose behind your need to do this?



Re: Translate accented characters by JezB

JezB
Mon May 09 07:53:32 CDT 2005

I'm passing artist/album names stored within mp3 files through Amazon's web
service, to look up album details. Many of the artist names have accented
characters, since I am interested in world/celtic music, but Amazon's search
criteria seem to be based on normalized unaccented strings. A real pain to
edit all my ID3 tags!



Re: Translate accented characters by Ignacio

Ignacio
Mon May 09 08:26:19 CDT 2005

Hi,

There is nothing like this in the framework. What you can do is use
String.Replace; it will be slower, but there are only 5 vocals after all :)

Cheers,

--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation




Re: Translate accented characters by Morten

Morten
Mon May 09 08:34:51 CDT 2005

Hi Jez,

There is nothing pre-made in .Net that will do what you want. You need to create a translation table and translate each character as necessary.

There is a method that seems to work in most cases involving translation between different encodings, but I cannot guarantee that it works in all cases.

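using System.Text;

// Encode to a code page that lacks these letters (1251 is Cyrillic); the
// encoder appears to substitute the nearest unaccented letter for each one.
string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
string t = Encoding.ASCII.GetString(b);

// t == "aaaaaaeeeeiiiioooooouuuuyy"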


--
Happy coding!
Morten Wennevik [C# MVP]
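
For anyone reading this later on .NET Framework 2.0 or newer (which postdates the replies above), here is a minimal sketch of a different route: Unicode normalization instead of an encoding round-trip. It assumes String.Normalize and CharUnicodeInfo are available; the class and method names (AccentStripper, RemoveDiacritics) are made up for illustration and are not from any post in this thread.

```csharp
using System.Globalization;
using System.Text;

public static class AccentStripper
{
    // Hypothetical helper: decompose each character (form D), then drop
    // the combining marks that the decomposition exposes.
    public static string RemoveDiacritics(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // "á" decomposes to "a" followed by U+0301 (a non-spacing mark);
            // only the base letter is kept.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append(c);
            }
        }

        // Recompose whatever is left into form C.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, AccentStripper.RemoveDiacritics("Gráda") gives "Grada". Letters that have no canonical decomposition, such as "ø" and "ß", pass through unchanged, so they would still need a table entry of their own.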

Re: Translate accented characters by Mark

Mark
Mon May 09 08:57:32 CDT 2005

"Morten Wennevik" <MortenWennevik@hotmail.com> wrote in message
news:op.sqif0dt2klbvpo@stone...

> There is a method that seems to work in most cases involving translation
> between different encodings, but I cannot guarantee that it works in all
> cases.
>
> string s = "áàäãâåéèëêíìïîóòöõôøúùüûýÿ";
> byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
> string t = Encoding.ASCII.GetString(b);
>
> //t == aaaaaaeeeeiiiioooooouuuuyy

That's no use at all for German, where the usual replacements are:

ä = ae
ö = oe
ü = ue
ß = ss
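
The mapping above could be sketched as a small translation table, roughly as follows. This is only an illustration: the class and method names (GermanFolding, Fold) are invented here, and the upper-case entries are an assumption, since only the lower-case letters are listed above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class GermanFolding
{
    // Translation table: one entry per character that needs replacing.
    // The upper-case rows are an assumption added for illustration.
    private static readonly Dictionary<char, string> Table = new Dictionary<char, string>
    {
        { 'ä', "ae" }, { 'ö', "oe" }, { 'ü', "ue" }, { 'ß', "ss" },
        { 'Ä', "Ae" }, { 'Ö', "Oe" }, { 'Ü', "Ue" }
    };

    // Replace every character found in the table; copy the rest unchanged.
    public static string Fold(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (Table.TryGetValue(c, out replacement))
            {
                result.Append(replacement);
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, GermanFolding.Fold("Müller") gives "Mueller" and GermanFolding.Fold("Straße") gives "Strasse". Because each entry maps one character to a string, the same table shape can hold other languages' rules by adding rows.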



Re: Translate accented characters by Mark

Mark
Mon May 09 09:00:38 CDT 2005

"Ignacio Machin ( .NET/ C# MVP )" <ignacio.machin AT dot.state.fl.us> wrote
in message news:%23a82wqJVFHA.3252@TK2MSFTNGP10.phx.gbl...

> There is nothing like this in the framework, what you can do is use
> String.Replace , it will be slower but there are only 5 vocals after all
> :)

If by "vocals" you mean "vowels", then that just isn't the case in many
languages...



Re: Translate accented characters by Mark

Mark
Mon May 09 09:02:20 CDT 2005

"JezB" <jezbroadsword@blueyonder.co.uk> wrote in message
news:OwYiAXJVFHA.1040@TK2MSFTNGP10.phx.gbl...

> I'm passing artist/album names stored within mp3 files through Amazon's
> web service, to look up album details. Many of the artist names have
> accented characters, since I am interested in world/celtic music, but
> Amazon's search criteria seem to be based on normalized unaccented
> strings. A real pain to edit all my id3 tags !

Then the translation table approach is what you need here...



Re: Translate accented characters by JezB

JezB
Mon May 09 09:05:58 CDT 2005

That's good enough for me! This is just a hobby program, so it doesn't need
to be foolproof.
Many thanks, Morten.



Re: Translate accented characters by Ignacio

Ignacio
Mon May 09 12:47:20 CDT 2005

Hi,

Yes, I meant vowels :)

It was "Spanglish": vocales = vowels in Spanish :)


You are right, but in any particular language there are not that many, and
from the OP's description his mp3 tags are most likely in a single language.

cheers,

--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation