[Info-ingres] Weird copy timeout error E_LC0030_WRITE_SEND_FAIL

Thu Sep 14 13:09:19 UTC 2017

Hi All,

I have a job which runs on a host with Ingres version II 10.2.0 (a64.lnx/100) + 15162.

It uses a vnode to connect to a database on another host, which runs Ingres version II 10.2.0 (a64.lnx/100) + 15151.

Having established the connection to the remote database, the job does some initial work and then must pause activity on that connection while work is performed on other hosts/databases. Once that work is complete, the connection does the following:
drop table if exists targetable; /* Which works with no error */

create table targetable(
    a integer4 not null not default,
    b integer4 not null not default,
    c integer4 not null not default
) with nojournaling;
/* And this too works with no error. Note that the columns are all just plain old integers. Nothing fancy: no blobs, no nvarchar.
*/

copy table targetable(a=c0tab, b=c0tab, c=c0nl) from 'a/raging/great/data/file';

And on that last step we have recently started getting an error:
E_LC0030_WRITE_SEND_FAIL       GCA protocol service (GCA_SEND) failure with message type GCA_CDATA.
Internal service status E_GCfe06 -- Write to peer process failed; it may have exited. - System communication error: Connection reset by peer..Exiting session because of communications failure.

In the errlog on the target installation we see:
biota             ::[39831        IIGCC, 13193     , 0000000000000002]: Thu Sep 14 13:15:24 2017 E_GC2820_CONN_FAIL_INFO    Connection to node '::ffff:10.131.0.3', port '59824' for user 'ingres' failed: reason follows.
biota             ::[39831        IIGCC, 13193     , 0000000000000002]: Thu Sep 14 13:15:24 2017 E_CLFE07_BS_READ_ERR   Read from peer process failed; it may have exited.
biota             ::[39831        IIGCC, 13193     , 0000000000000002]: System communication error: Connection reset by peer.
BIOTA             ::[39278             , 13088     ,  00007f4eacc6f180, scscopy.c:613         ]: Thu Sep 14 13:15:24 2017 E_SC022E_WRONG_BLOCK_TYPE Internal Protocol Error: SCF received block type 00000005 (5.) when expecting type 00000019 (25.).
BIOTA             ::[39278             , 13088     ,  00007f4eacc6f180, scscopy.c:614         ]: Thu Sep 14 13:15:24 2017 E_SC0250_COPY_OUT_OF_SEQUENCE A COPY data block was received when one was not expected (or not received when expected).

I have managed to show that this is a weird-ass timeout: it appears once the job's pause exceeds 15 minutes (where have I seen that number before?). The really curious thing is that the connection is perfectly fine with anything other than a copy.

I'm working on a test case at the moment, but as it relies on a very large data file it's a bit of a nuisance.
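
For anyone who wants to try reproducing this without the real data, here is a minimal sketch of how a synthetic input file in the same layout could be generated. It assumes the copy format above (a=c0tab, b=c0tab, c=c0nl) means three integer fields per record, the first two terminated by a tab and the last by a newline. The function name, the file name, the row count and the random values are all made up for illustration; this is not the original data file or the job's actual code.

```python
import random


def write_copy_test_file(path, rows, seed=0):
    """Write `rows` records of three tab-separated integers, one record per line.

    Hypothetical helper: the layout is meant to match a copy format of
    (a=c0tab, b=c0tab, c=c0nl) into three integer4 columns.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="\n") as fh:
        for _ in range(rows):
            # Values stay within the positive range of a 4-byte signed integer.
            a, b, c = (rng.randint(0, 2**31 - 1) for _ in range(3))
            fh.write(f"{a}\t{b}\t{c}\n")
    return path
```

Calling something like write_copy_test_file("copytest.dat", 50_000_000) should give a file of well over a gigabyte; whether the file size matters at all for triggering the error is exactly what the test case would need to establish.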

Anyone seen anything like this before?

Marty