[Info-ingres] Micro-madness

Martin Bowes martin.bowes at ndph.ox.ac.uk
Thu Jun 17 14:26:05 UTC 2021


Yeah, that’s the idea I’ve explored with the user. It’s amazing what you can do with the replace function.

Something a bit more general may still be required as I’m pretty well guaranteed to bump into this elsewhere.

Marty

From: Paul A. <paul at ipauland.com>
Sent: 17 June 2021 15:21
To: info-ingres at lists.planetingres.org
Subject: Re: [Info-ingres] Micro-madness

Choose one representation and change the codes, use an insert/modify rule to force consistency?

On 17/06/2021 14:17, Martin Bowes wrote:
I’m seeing some progress…nvarchar stores Unicode points as UTF-8.

And:

The UTF-8 encoding of mu (U+03BC) is 0xCE 0xBC

https://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128&names=-&utf8=0x



Also, the UTF-8 encoding of mu (U+00B5) is 0xC2 0xB5

https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=128&names=-&utf8=0x

So we have two Unicode code points for mu…why I know not.

And I still don’t know how to get them to equate.
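
One possible way to get the two to equate, sketched in Python rather than Ingres SQL (so purely an illustration outside the database): in the Unicode character database, U+00B5 is named MICRO SIGN and carries a compatibility mapping to U+03BC, GREEK SMALL LETTER MU, so compatibility normalisation (NFKC) folds one onto the other. The helper name `fold` is made up for this example.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # U+00B5 MICRO SIGN, UTF-8 bytes C2 B5
GREEK_MU = "\u03bc"    # U+03BC GREEK SMALL LETTER MU, UTF-8 bytes CE BC


def fold(text: str) -> str:
    """Return text in NFKC form, which maps U+00B5 onto U+03BC."""
    return unicodedata.normalize("NFKC", text)


# The raw code points differ, and so do their UTF-8 encodings.
print(MICRO_SIGN == GREEK_MU)                # False
print(MICRO_SIGN.encode("utf-8").hex())      # c2b5
print(GREEK_MU.encode("utf-8").hex())        # cebc

# After NFKC folding they compare equal.
print(fold(MICRO_SIGN) == fold(GREEK_MU))    # True
```

NFKC also rewrites other compatibility characters (the ¼ at U+00BC becomes the three characters 1, fraction slash, 4, for instance), so it would be worth checking that this is acceptable for the data before applying it wholesale.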

Marty

From: Tony Douglas <tonyd08068 at netscape.net>
Sent: 17 June 2021 14:05
To: Martin Bowes <martin.bowes at ndph.ox.ac.uk>
Cc: info-ingres at lists.planetingres.org
Subject: Re: [Info-ingres] Micro-madness

Unicode… There be dragons. Might be something to do with normalisation form: NFC and NFD say how codes can combine to form different characters. This page https://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html might help, or it might not. I was just about getting unconfused with the terminology of Unicode when I stopped looking at it a few years ago :( But weird things could happen. Have you tried a UTF8 client to see what happens (assuming you’ve got an installation where transliteration is available)?
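
For anyone wanting to see what NFC and NFD actually do, here is a minimal Python sketch (Python is only a convenient way to poke at the Unicode tables; nothing here is Ingres-specific). It uses é as the example character, which is not from the thread, and then checks what the two canonical forms do to U+00B5.

```python
import unicodedata

composed = "\u00e9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

# Same visible character, different code point sequences.
print(composed == decomposed)                                # False

# NFC composes, NFD decomposes; each turns one spelling into the other.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# U+00B5 has no canonical decomposition, so NFC and NFD leave it unchanged.
micro = "\u00b5"
print(unicodedata.normalize("NFC", micro) == micro)          # True
print(unicodedata.normalize("NFD", micro) == micro)          # True
```

So NFC and NFD on their own would not make U+00B5 and U+03BC compare equal; only the compatibility forms (NFKC, NFKD) touch U+00B5.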

Looking forward to seeing how this pans out!

Thanks,
- Tony
Sent from my iPhone



On 17 Jun 2021, at 13:54, Martin Bowes <martin.bowes at ndph.ox.ac.uk> wrote:

Hi All,

Can someone please explain this one…please use small words…

My Linux installation uses the ISO-8859-1 charset. We have a table which has an nvarchar(20) column.

Now the Greek mu symbol is U+00B5, a capital A with a circumflex is U+00C2, the ¼ is U+00BC, and a capital I with a circumflex is U+00CE.

And in a terminal monitor connection, how does this work…
select U&'\00c2', U&'\00b5', U&'\00c2\00b5'\g
┌──────┬──────┬──────┐
│col1  │col2  │col3  │
├──────┼──────┼──────┤
│▒     │▒     │µ    │
└──────┴──────┴──────┘
(1 row)
select u&'\00ce', u&'\00bc', u&'\00ce\00bc'\g
┌──────┬──────┬──────┐
│col1  │col2  │col3  │
├──────┼──────┼──────┤
│▒     │▒     │μ    │
└──────┴──────┴──────┘
(1 row)

So two weird codes have both combined to make a mu. I didn’t just invent these. I got them from two distinct data sets which were being compared.

And they are clearly not the same thing.
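
One plausible reading of the displays above, offered as a guess rather than a diagnosis: if the installation sends each code point out as a single ISO-8859-1 byte and the terminal interprets the byte stream as UTF-8, then a lone 0xC2 or 0xB5 is invalid and shows as a placeholder, while the pair 0xC2 0xB5 happens to be the valid UTF-8 sequence for U+00B5 (and 0xCE 0xBC the one for U+03BC). The Python sketch below reproduces that effect; the function name `as_seen_on_utf8_terminal` is invented for the example, and the UTF-8 terminal is an assumption.

```python
def as_seen_on_utf8_terminal(text: str) -> str:
    """Encode text as ISO-8859-1, then decode those bytes as UTF-8.

    Assumption: this mimics an ISO-8859-1 installation writing to a
    terminal that expects UTF-8. Invalid bytes become U+FFFD.
    """
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")


# Each code point on its own is not valid UTF-8, so a placeholder appears.
print(as_seen_on_utf8_terminal("\u00c2") == "\ufffd")        # True
print(as_seen_on_utf8_terminal("\u00b5") == "\ufffd")        # True

# The pairs form valid two-byte UTF-8 sequences for two different mu's.
print(as_seen_on_utf8_terminal("\u00c2\u00b5") == "\u00b5")  # True
print(as_seen_on_utf8_terminal("\u00ce\u00bc") == "\u03bc")  # True
```

If that reading is right, the stored values are each two code points long, and the mu only appears when their bytes are reinterpreted on the way to the screen.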

create table test(id integer1, a nvarchar(20));

insert into test values (1, U&'\00c2\00b5'), (2, U&'\00ce\00bc');
select * from test\g
┌──────┬────────────────────────────────────────┐
│id    │a                                       │
├──────┼────────────────────────────────────────┤
│     1│µ                                      │
│     2│μ                                      │
└──────┴────────────────────────────────────────┘
(2 rows)

select a, count(1) from test group by a\g
┌────────────────────────────────────────┬─────────────┐
│a                                       │col2         │
├────────────────────────────────────────┼─────────────┤
│µ                                      │            1│
│μ                                      │            1│
└────────────────────────────────────────┴─────────────┘
(2 rows)

So could someone please explain this, and also how I can write some code which will say these two mu’s are the same thing.
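
As a rough illustration of code that treats the two as the same thing, here is a Python sketch that works on values after they have been fetched from the table (not an Ingres-side solution). It assumes the stored strings are UTF-8 bytes that were loaded as ISO-8859-1 code points, tries to undo that, and then applies NFKC so that U+00B5 and U+03BC collapse to one code point. The names `canonical` and `rows` are hypothetical; `rows` stands in for the two values inserted into the test table.

```python
import unicodedata
from collections import Counter


def canonical(value: str) -> str:
    """Undo a suspected ISO-8859-1/UTF-8 mix-up if possible, then apply NFKC."""
    try:
        # Treat each code point as one byte and re-read the bytes as UTF-8.
        repaired = value.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not that kind of damage; keep the value as it is.
        repaired = value
    return unicodedata.normalize("NFKC", repaired)


# Stand-ins for the two rows in the test table.
rows = ["\u00c2\u00b5", "\u00ce\u00bc"]

# Rough equivalent of "select a, count(1) from test group by a".
counts = Counter(canonical(v) for v in rows)
print(counts == Counter({"\u03bc": 2}))  # True
```

The repair step is a heuristic: a value that genuinely contains Â followed by µ would be rewritten as well, so it only makes sense where the data is known to have been damaged this way.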

Marty
_______________________________________________
Info-ingres mailing list
Info-ingres at lists.planetingres.org
https://lists.planetingres.org/mailman/listinfo/info-ingres


