[Info-ingres] Why I love table procedures!

Martin Bowes martin.bowes at ndph.ox.ac.uk
Thu Nov 22 10:51:35 UTC 2018


Hi All,

Gloat mode engaged....

I had a table which was very big, it had more than the maximum integer4 (2+billion) rows and had to be partitioned 32 ways just to store the data. It had become a serious issue within the database and in particular meant I couldn't transfer this database to another installation which was becoming a necessary step.

It looked like this:
Run_id integer4
Pid    integer4
Gid    integer2
Status integer1
For any run_id there may be as many as (and typically are as many as) 500,000 rows.

Then I got told, 'What? You're storing that stuff! We don't need it.' ... but after examining the code for a while I found that they did need it during a particular operational phase and then not until another process which occurs once every 6months.

So I couldn't remove the data. But the good news was that those processes accessed the data by run_id only.

So I made an archive table which looked like this:
Run_id   integer4
All_data long varchar
Tinsert  ingresdate

The all_data field was composed of all the old rows pid,gid,status data separated by commas and each of the original rows data separated by colons.
Eg. 1000780,-1,0:1000800,-1,0:1000889,-1,0:1000959,-1,0...

The all_data would typically be about 65M.

So how to extract the data when actually needed it? Enter a table procedure and the similar to regular expression pattern matching.

select first 10 pid, gid, status from all_data(run_id =146)
┌─────────────┬────────┬──────┐
│pid          │gid     │status│
├─────────────┼────────┼──────┤
│      1000018│      -1│     0│
│      1000020│      -1│     0│
│      1000034│      -1│     0│
│      1000041│      -1│     0│
│      1000056│      -1│     0│
│      1000062│      -1│     0│
│      1000075│      -1│     0│
│      1000089│      -1│     0│
│      1000093│      -1│     0│
│      1000104│      -1│     0│
└─────────────┴────────┴──────┘
(10 rows in 0.004865 secs)

select count(*) from all_data(run_id = 162)
┌─────────────┐
│col1         │
├─────────────┤
│       502713│
└─────────────┘
(1 row in 20.629744 secs)

By comparison on the source table such a select would take about 0.5 seconds.

The trade off of performance for reliably maintaining the data was worth it.

Marty
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.planetingres.org/pipermail/info-ingres/attachments/20181122/e4127a50/attachment.html>


More information about the Info-ingres mailing list