[Info-ingres] Micro-madness

Paul A. paul at ipauland.com
Thu Jun 17 16:08:06 UTC 2021


I just googled this.

U+00B5 is the micro sign, U+03BC is the Greek lower-case mu (upper-case mu is U+039C)

So they are different.

https://www.compart.com/en/unicode/U+00B5
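
A minimal Python sketch of the difference, for anyone who wants to check it outside Ingres. Python is only my choice for illustration, and the helper names describe and same_after_nfkc are made up here. It prints the name and UTF-8 bytes of each code point, then compares them after NFKC normalisation. NFKC folds the micro sign into Greek mu because U+00B5 carries a compatibility decomposition to U+03BC. Plain NFC does not, so NFC/NFD alone would not make them equal.

```python
import unicodedata

MICRO_SIGN = "\u00b5"   # U+00B5, in the Latin-1 Supplement block
GREEK_MU = "\u03bc"     # U+03BC, in the Greek block


def describe(ch: str) -> str:
    """Return code point, Unicode name and UTF-8 bytes for one character."""
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} utf8={utf8}"


def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after NFKC (compatibility) normalisation."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(describe(MICRO_SIGN))                    # U+00B5 MICRO SIGN utf8=C2 B5
print(describe(GREEK_MU))                      # U+03BC GREEK SMALL LETTER MU utf8=CE BC
print(MICRO_SIGN == GREEK_MU)                  # False
print(same_after_nfkc(MICRO_SIGN, GREEK_MU))   # True
```

Whether Ingres offers an equivalent compatibility fold on the server side is not something I have checked; the sketch only shows what the Unicode data says.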

On 17/06/2021 15:26, Martin Bowes wrote:
>
> Yeah, that’s the idea I’ve explored with the user. It’s amazing what 
> you can do with the replace function.
>
> Something a bit more general may still be required as I’m pretty well 
> guaranteed to bump into this elsewhere.
>
> Marty
>
> *From:* Paul A. <paul at ipauland.com>
> *Sent:* 17 June 2021 15:21
> *To:* info-ingres at lists.planetingres.org
> *Subject:* Re: [Info-ingres] Micro-madness
>
> Choose one representation and change the codes, use an insert/modify 
> rule to force consistency?
>
> On 17/06/2021 14:17, Martin Bowes wrote:
>
>     I’m seeing some progress…nvarchar stores Unicode points as UTF-8.
>
>     And:
>
>     The UTF-8 encoding of mu (U+03BC) is 0xCE 0xBC
>
>     https://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128&names=-&utf8=0x
>
>     Also the UTF-8 encoding of mu (U+00B5) is 0xC2 0xB5
>
>     https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=128&names=-&utf8=0x
>
>     So we have two Unicode code points for mu…why I know not.
>
>     And I still don’t know how to get them to equate.
>
>     Marty
>
>     *From:* Tony Douglas <tonyd08068 at netscape.net>
>     *Sent:* 17 June 2021 14:05
>     *To:* Martin Bowes <martin.bowes at ndph.ox.ac.uk>
>     *Cc:* info-ingres at lists.planetingres.org
>     *Subject:* Re: [Info-ingres] Micro-madness
>
>     Unicode…. There be dragons. Might be something to do with
>     normalisation form - NFC and NFD say how codes can combine to form
>     different characters - this page
>     https://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html might help,
>     or it might not - I was just about getting unconfused with the
>     terminology of Unicode when I stopped looking at it a few years
>     ago :( But weird things could happen. Have you tried a UTF8 client
>     to see what happens (assuming you’ve got an installation where
>     transliteration is available) ?
>
>     Looking forward to seeing how this pans out !
>
>     Thanks,
>
>     - Tony
>
>     Sent from my iPhone
>
>
>
>
>         On 17 Jun 2021, at 13:54, Martin Bowes
>         <martin.bowes at ndph.ox.ac.uk> wrote:
>
>         
>
>         Hi All,
>
>         Can someone please explain this one…please use small words…
>
>         My Linux installation is an ISO-8859-1 charset. We have a 
>         table which has an nvarchar(20) column.
>
>         Now the mu symbol (strictly, the micro sign) is U+00B5, a
>         capital A with a circumflex is U+00C2, the ¼ is U+00BC, and a
>         capital I with a circumflex is U+00CE.
>
>         And in a _terminal monitor_ connection, how does this work…
>
>         select U&'\00c2', U&'\00b5', U&'\00c2\00b5'\g
>
>         ┌──────┬──────┬──────┐
>         │col1  │col2  │col3  │
>         ├──────┼──────┼──────┤
>         │▒     │▒     │µ     │
>         └──────┴──────┴──────┘
>
>         (1 row)
>
>         select u&'\00ce', u&'\00bc', u&'\00ce\00bc'\g
>
>         ┌──────┬──────┬──────┐
>         │col1  │col2  │col3  │
>         ├──────┼──────┼──────┤
>         │▒     │▒     │μ     │
>         └──────┴──────┴──────┘
>
>         (1 row)
>
>         So two weird codes have both combined to make a mu. I didn’t
>         just invent these. I got them from two distinct data sets
>         which were being compared.
>
>         And they are clearly not the same thing.
>
>         create table test(id integer1, a nvarchar(20));
>
>         insert into test values (1, U&'\00c2\00b5'), (2, U&'\00ce\00bc');
>
>         select * from test\g
>
>         ┌──────┬────────────────────────────────────────┐
>         │id    │a                                       │
>         ├──────┼────────────────────────────────────────┤
>         │     1│µ                                       │
>         │     2│μ                                       │
>         └──────┴────────────────────────────────────────┘
>
>         (2 rows)
>
>         select a, count(1) from test group by a\g
>
>         ┌────────────────────────────────────────┬─────────────┐
>         │a                                       │col2         │
>         ├────────────────────────────────────────┼─────────────┤
>         │µ                                       │            1│
>         │μ                                       │            1│
>         └────────────────────────────────────────┴─────────────┘
>
>         (2 rows)
>
>         So could someone please explain this, and also how I can write
>         some code which will say these two mu’s are the same thing.
>
>         Marty
>
>         _______________________________________________
>         Info-ingres mailing list
>         Info-ingres at lists.planetingres.org
>         <mailto:Info-ingres at lists.planetingres.org>
>         https://lists.planetingres.org/mailman/listinfo/info-ingres
>         <https://lists.planetingres.org/mailman/listinfo/info-ingres>
>
>
>
>
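
On the original question at the bottom of the thread (how to make the two stored values compare equal), here is a hedged Python sketch. It rests on an assumption the thread suggests but does not confirm: that the source data sets held UTF-8 bytes which were loaded as if each byte were an ISO-8859-1 character, so C2 B5 became U+00C2 U+00B5 and CE BC became U+00CE U+00BC. Under that assumption the fix has two steps: turn the code points back into bytes and decode them as UTF-8, then apply NFKC so the micro sign and Greek mu collapse to one code point. The function name repair_and_fold is hypothetical.

```python
import unicodedata


def repair_and_fold(stored: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up, then fold compatibility forms."""
    try:
        # Map each code point back to a single byte, then decode as UTF-8.
        fixed = stored.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mix-up (assumption: leave such values untouched).
        fixed = stored
    # NFKC maps U+00B5 MICRO SIGN to U+03BC GREEK SMALL LETTER MU.
    return unicodedata.normalize("NFKC", fixed)


row1 = "\u00c2\u00b5"   # value inserted with id 1 in the test table
row2 = "\u00ce\u00bc"   # value inserted with id 2 in the test table

print(repair_and_fold(row1) == repair_and_fold(row2))   # True
```

The byte round trip is a heuristic: a genuine Latin-1 string that happens to form valid UTF-8 would also be rewritten, so it is safer to run it only on columns known to be affected. NFKC also folds other compatibility characters (ligatures, superscripts and so on), which may or may not be wanted.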
