Removing a server: Removing accented characters

I have a list of French words and want to replace all of the accented
characters with unaccented characters. I could go through and replace each
accented character one at a time, but if I can somehow do it with collations
or some other automatic method, that would be great!You can use the function REPLACE as the source for another REPLACE.
Example:
use northwind
go
create table t1 (
c1 varchar(25)
)
go
insert into t1 values('áé_óú')
go
select
*
from
t1
go
update
t1
set
c1 = replace(replace(replace(replace(replace(
c1, 'á', 'a'), 'é', 'e'), '_
',
'i'), 'ó', 'o'), 'ú', 'u')
GO
select
*
from
t1
go
drop table t1
go
AMB
"MatthewR" wrote:

> I have a list of French words and want to replace all of the accented
> characters with unaccented characters. I could go through and replace eac
h
> accented character one at a time, but if I can somehow do it with collatio
ns
> or some other automatic method, that would be great!|||In playing with this further I built on Alejandro's code to take care of mor
e
accented characters with fewer REPLACE functions:
use northwind
go
-- Create the c1 column with an accent-sensitive collation so any
comparisons done on c1 ignore accents.
create table t1 (
c1 varchar(25) COLLATE SQL_Latin1_General_CP1_CI_AI
)
go
--use even more accented characters
insert into t1 values('áé_óúà?êí????')
go
select
*
from
t1
go
-- If you replace a letter with itself in an accent-insensitive column, the
accented letters will also be replaced.
update
t1
set
c1 = replace(replace(replace(replace(replace(
c1, 'a', 'a'), 'e', 'e'), 'i',
'i'), 'o', 'o'), 'u', 'u')
GO
select
*
from
t1
go
drop table t1
go
The value in c1 after running this is 'aeiouaaeiiuao'.
You'll end up doing more replacements this way since even non-accented
characters will be replaced, but you'll cover more accented characters with
fewer REPLACE functions.
Now if I could just figure out how to do this with just collations instead
of REPLACE functions, I'd be set!|||Matthew,
Here is a generic way to do this (not thoroughly tested, though,
particularly with different system and column collations). It will
be slow, but you are less likely to miss cases. It also replaces
with E and with e, which might be helpful.
Without using REPLACE, I can't think of any solution other
than possibly exporting to text and reimporting, either with some
bcp code page option (which I couldn't make work) or using
a third-party utility on the intermediate text file.
create table T (
s nvarchar(20)
)
insert into T values ('cre')
insert into T values ('pt de fois gras')
insert into T values ('HLNE MELANON')
go
create function unaccented(
@.s nvarchar(80)
) returns nvarchar(80) as begin
declare @.a nchar(12)
declare @.c nchar(1)
set @.a = N'AaEeIiOoUuCc'
declare @.pos int set @.pos = 0
while @.pos < len(@.s) begin
set @.pos = @.pos + 1
set @.c = substring(@.s,@.pos,1)
if @.c like '%['+@.a+']%' collate Latin1_General_CS_AI
and @.c not like '%['+@.a+']%' collate Latin1_General_CS_AS
set @.s = stuff(@.s,@.pos,1,substring(@.a,patindex(N'
%'+@.c+N'%' collate
Latin1_General_CS_AI,@.a),1))
end
return @.s
end
go
select dbo.unaccented(s)
from T
go
drop table T
drop function dbo.unaccented
-- Steve Kass
-- Drew University
"MatthewR" <MatthewR@.discussions.microsoft.com> wrote in message
news:B8577707-A67C-444A-B753-D6C8A12A663D@.microsoft.com...
> In playing with this further I built on Alejandro's code to take care of
> more
> accented characters with fewer REPLACE functions:
> --
> use northwind
> go
> -- Create the c1 column with an accent-sensitive collation so any
> comparisons done on c1 ignore accents.
> create table t1 (
> c1 varchar(25) COLLATE SQL_Latin1_General_CP1_CI_AI
> )
> go
> --use even more accented characters
> insert into t1 values('')
> go
> select
> *
> from
> t1
> go
> -- If you replace a letter with itself in an accent-insensitive column,
> the
> accented letters will also be replaced.
> update
> t1
> set
> c1 = replace(replace(replace(replace(replace(
c1, 'a', 'a'), 'e', 'e'),
> 'i',
> 'i'), 'o', 'o'), 'u', 'u')
> GO
> select
> *
> from
> t1
> go
> drop table t1
> go
> --
> The value in c1 after running this is 'aeiouaaeiiuao'.
> You'll end up doing more replacements this way since even non-accented
> characters will be replaced, but you'll cover more accented characters
> with
> fewer REPLACE functions.
> Now if I could just figure out how to do this with just collations instead
> of REPLACE functions, I'd be set!
>

Monday, February 20, 2012

Removing accented characters

No comments:

Post a Comment

Removing a server

Blog Archive

About Me