[Info-ingres] More similar to

Laframboise, André (BAC/LAC) andre.laframboise at canada.ca
Fri Aug 31 13:41:49 UTC 2018


Our term for that is plain ‘data correction’.

Oddly, for stuff like that, I’m more comfortable with C than SQL.
I usually dump the data into a file and process it from there and re-insert the corrected data.
When processing millions of rows, it’s way more efficient than within the database.

André Laframboise

Conseiller Principal Base de Données, Direction générale de l'Innovation et du Dirigeant principal de l'information
Bibliothèque et Archives Canada<http://www.bac-lac.gc.ca/fra/Pages/bac-web.aspx> / Gouvernement du Canada
andre.laframboise at canada.ca<mailto:andre.laframboise at canada.ca> / Tél. : 613-298-1346

Senior Database Advisor, Innovation and Chief Information Officer Branch
Library and Archives Canada<http://www.bac-lac.gc.ca/eng/Pages/lac-web.aspx> / Government of Canada
andre.laframboise at canada.ca<mailto:andre.laframboise at canada.ca> / Tel: 613-298-1346

[http://www.bac-lac.gc.ca/eng/events/PublishingImages/banner.jpg]<http://www.bac-lac.gc.ca/eng/events/Pages/events.aspx>

From: info-ingres-bounces at lists.planetingres.org <info-ingres-bounces at lists.planetingres.org> On Behalf Of Allan Biggs
Sent: Friday, August 31, 2018 9:21 AM
To: Martin Bowes <martin.bowes at ndph.ox.ac.uk>; paul.white at shift7solutions.com.au; info-ingres at lists.planetingres.org
Subject: Re: [Info-ingres] More similar to

Have you thought of using OpenROAD to do this?
It might enable you to do some of the processing in SQL and some in the 4GL?


Or maybe report writer .... or maybe not ...
Allan
--
Sent from my Android phone with mail.com<http://mail.com> Mail. Please excuse my brevity.
On 31/08/2018, 12:29 Martin Bowes <martin.bowes at ndph.ox.ac.uk<mailto:martin.bowes at ndph.ox.ac.uk>> wrote:
I would like to keep it in the database, but if push comes to shove I’m more than willing to look at some utility outside the database.


Marty


From: Paul White [mailto:paul.white at shift7solutions.com.au]
Sent: 31 August 2018 12:20
To: Martin Bowes; info-ingres at lists.planetingres.org<mailto:info-ingres at lists.planetingres.org>
Subject: RE: [Info-ingres] More similar to


So you want output like this?


res_note for snapshot_id = 14 is dodgy.
pos=55 char=0x03
pos=103 char=0x04
pos=11999 char=0x00


res_note for snapshot_id = 28 is dodgy.




Text processing in the database. yuk.
Do you have the option of extracting the matching records to a file then do the grunt work with text processing tools?


Paul




From: info-ingres-bounces at lists.planetingres.org<mailto:info-ingres-bounces at lists.planetingres.org> [mailto:info-ingres-bounces at lists.planetingres.org] On Behalf Of Martin Bowes
Sent: Friday, 31 August 2018 8:41 PM
To: info-ingres at lists.planetingres.org<mailto:info-ingres at lists.planetingres.org>
Subject: [Info-ingres] More similar to


Hi All,


I just knew this non printing character stuff was going to come back and bite me…


I can identify the rows with non printing characters excluding \t, \r and \n.


Now I need to find which characters are being objected to on each row. I may need to know their position as well.


The data in question is a long varchar. But (fortunately) the longest item is under 12000 characters, so we are under the 32000 char varchar conversion limit.


The easiest way I can think of doing this is to do it in a database procedure…


create procedure identify_bad_char
as declare
    snapshot_id integer4 not null not default;
    res_note    long varchar not null not default;
    msgid       integer4 not null not default;
    msg         varchar(256) not null not default;
begin
    msgid = 0;
    FOR SELECT snapshot_id, res_note INTO :snapshot_id, :res_note
        FROM basket_snapshot
        WHERE res_note SIMILAR TO '%[^' + x'09' + x'0a' + x'0d' + '[:print:]]%'
    DO
        msgid = msgid + 1;
        msg = 'res_note for snapshot_id = ' + varchar(:snapshot_id) + ' is dodgy.';
        message :msgid :msg;


        /* Insert manic while loop here and process the line character by character */
    ENDFOR;
end;


Before I start coding up the manic while loop … does anyone have a suggestion as to how I can do this more efficiently.


The data was probably added to a web form via a cut and paste from a word document, so we’ve picked up things like smart quotes. Hence we may ultimately be interested in replacing multi character sequences with simple ansi equivalents. I have a suspicion a clean() function is galloping in my direction.


Martin Bowes
_______________________________________________ Info-ingres mailing list Info-ingres at lists.planetingres.org<mailto:Info-ingres at lists.planetingres.org> https://lists.planetingres.org/mailman/listinfo/info-ingres
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.planetingres.org/pipermail/info-ingres/attachments/20180831/b6462ceb/attachment.html>


More information about the Info-ingres mailing list