[Info-ingres] Micro-madness

Tony Douglas tonyd08068 at netscape.net
Thu Jun 17 13:05:15 UTC 2021


Unicode... there be dragons. This might be something to do with normalisation forms: NFC and NFD define how code points combine into (or decompose from) a single character. This page (https://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html) might help, or it might not. I was just about getting unconfused with Unicode terminology when I stopped looking at it a few years ago :( But weird things could happen. Have you tried a UTF8 client to see what happens (assuming you've got an installation where transliteration is available)?
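
A minimal Python sketch of one possible explanation, not a confirmed diagnosis. It assumes something the thread does not state: that the terminal decodes the ISO-8859-1 bytes it receives as UTF-8. Under that assumption the byte pair C2 B5 displays as U+00B5 (MICRO SIGN) and CE BC displays as U+03BC (GREEK SMALL LETTER MU). These are two different code points that look alike. NFC and NFD leave them distinct; only the compatibility forms (NFKC/NFKD) fold the micro sign into the Greek letter. The helper names below are hypothetical, and this is Python rather than Ingres SQL.

```python
import unicodedata


def reinterpret_latin1_as_utf8(text: str) -> str:
    """Encode code points as Latin-1 bytes, then decode those bytes as UTF-8.

    Assumption: this mimics an ISO-8859-1 client writing to a UTF-8 terminal.
    """
    return text.encode("latin-1").decode("utf-8")


def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after compatibility normalisation (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# The two values inserted in the quoted example, as code points.
row1 = "\u00c2\u00b5"
row2 = "\u00ce\u00bc"

seen1 = reinterpret_latin1_as_utf8(row1)
seen2 = reinterpret_latin1_as_utf8(row2)

for label, ch in (("row 1", seen1), ("row 2", seen2)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
# row 1 U+00B5 MICRO SIGN
# row 2 U+03BC GREEK SMALL LETTER MU

# NFC keeps the two characters distinct; NFKC maps both to U+03BC.
nfc_equal = unicodedata.normalize("NFC", seen1) == unicodedata.normalize("NFC", seen2)
print(nfc_equal)                      # False
print(same_after_nfkc(seen1, seen2))  # True
```

If the assumption holds, comparing the two data sets would mean first undoing the mis-encoding and then comparing under NFKC; comparing the stored two-character strings directly would still report them as different.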

Looking forward to seeing how this pans out !

Thanks,
- Tony


> On 17 Jun 2021, at 13:54, Martin Bowes <martin.bowes at ndph.ox.ac.uk> wrote:
> 
> 
> Hi All,
>  
> Can someone please explain this one…please use small words…
>  
> My Linux installation uses the ISO-8859-1 charset. We have a table which has an nvarchar(20) column.
>  
> Now the micro sign (which looks like a Greek mu) is U+00B5, a capital-A with a circumflex is U+00C2, the ¼ is U+00BC, and a capital-I with a circumflex is U+00CE.
>  
> And in a terminal monitor connection, how does this work...
> select U&'\00c2', U&'\00b5', U&'\00c2\00b5'\g
> ┌──────┬──────┬──────┐
> │col1  │col2  │col3  │
> ├──────┼──────┼──────┤
> │▒     │▒     │µ    │
> └──────┴──────┴──────┘
> (1 row)
> select u&'\00ce', u&'\00bc', u&'\00ce\00bc'\g
> ┌──────┬──────┬──────┐
> │col1  │col2  │col3  │
> ├──────┼──────┼──────┤
> │▒     │▒     │μ    │
> └──────┴──────┴──────┘
> (1 row)
>  
> So two weird codes have both combined to make a mu. I didn’t just invent these. I got them from two distinct data sets which were being compared.
>  
> And they are clearly not the same thing.
> create table test(id integer1, a nvarchar(20));
> insert into test values (1, U&'\00c2\00b5'), (2, U&'\00ce\00bc');
> select * from test\g
> ┌──────┬────────────────────────────────────────┐
> │id    │a                                       │
> ├──────┼────────────────────────────────────────┤
> │     1│µ                                      │
> │     2│μ                                      │
> └──────┴────────────────────────────────────────┘
> (2 rows)
>  
> select a, count(1) from test group by a\g
> ┌────────────────────────────────────────┬─────────────┐
> │a                                       │col2         │
> ├────────────────────────────────────────┼─────────────┤
> │µ                                      │            1│
> │μ                                      │            1│
> └────────────────────────────────────────┴─────────────┘
> (2 rows)
>  
> So could someone please explain this, and also how I can write some code which will say these two mu’s are the same thing.
>  
> Marty