TCP timeout on Solaris 9?


Hi folks,

I have a Solaris 9 server running an app that talks to a database on a
Linux server. Between these servers is my PIX firewall.

The Problem...
In the morning and at lunch, the webserver starts timing out when
trying to talk to the database server. Once it re-establishes comms
it's OK until the next day or the next lunch.

What I think is happening...
The PIX has a TCP timeout value of 1 hour and the Solaris box has a TCP
timeout of 2 hours. I think the Solaris box doesn't realise the
connection has been dropped, so it doesn't try to re-establish it. The
PIX doesn't like this, so it blocks the packets. After the initial hour
of connection, the firewall will only let the SUSE box get past if it
creates a new connection.

What I've done..
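I've run the following command in a startup script and can confirm the
change has been applied successfully...
ndd -set /dev/tcp tcp_keepalive_interval 1200000 (this was meant to change
the timeout to 45 minutes, though the value is in milliseconds, so
1200000 is really 20 minutes; 45 minutes would be 2700000)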

This still doesn't work, so to verify that it is indeed a timeout issue
I've set the PIX to a 24 hour timeout. Since then the Solaris box hasn't
timed out.

So what settings other than tcp_keepalive_interval will hold a
TCP connection open?

A bit long-winded, but got there in the end!

thanks a lot
Dave

Reply dave_h194 (16) 10/14/2005 8:01:16 PM

Hi Dave,

>the PIX has a TCP timeout value of 1 hour, the Solaris box has a TCP
>timeout of 2 hours, I think the Solaris box doesn't realise the
>connection's been dropped so it doesn't try to re-establish the
>connection, the pix doesn't like this so it blocks the packets. After

How does the PIX then "drop" the connection?
Should it not send RSTs in some direction when it feels like interrupting
the existing connection? AFAIK that is the way TCP connections are
"aborted"...

If it just sits in the middle and blocks traffic belonging to that
connection, I don't think either of the two systems involved can notice
that.

I would suggest not trying to adjust timeout values (I wonder if they help
here anyway), but making the connection abort happen in a way both parties
can recognize...

Cheers,
  Thiemo


-- 
Query a PGP key server (e.g. http://www.pgp.net/) for my public key 41068629.
Strange sender address? Please see http://www.thiemo.net/misc/list-mail.shtml
Reply Thiemo 10/14/2005 8:28:36 PM


In article <1129320076.495743.103560@g43g2000cwa.googlegroups.com>, 
dave_h194@yahoo.co.uk says...
> So what settings other than tcp_keepalive_interval will hold a
> TCP connection open?
> 
Perhaps if you run a ping from crontab to your database server every
30 minutes (I know, it's not nice), it will solve your problem.
Reply miki 10/14/2005 9:19:41 PM

In <1129320076.495743.103560@g43g2000cwa.googlegroups.com> dave_h194@yahoo.co.uk writes:

>What I've done..
>I've ran the following command in a start up script and can confirm the
>changes have been implemented successfully...
>ndd -set /dev/tcp tcp_keepalive_interval 1200000 (this changes the
>timeout to 45 minutes)

In addition, you need to modify or configure your application so
that it sends TCP keepalive packets.  Something like this will do it:

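    int oval = 1;   /* non-zero switches keepalives on */

    /* 0 is the socket descriptor here; use your connection's fd */
    if (setsockopt(0, SOL_SOCKET, SO_KEEPALIVE, (const void *)&oval,
                   sizeof(oval)) < 0) {
        perror("keepalive");
    }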

Most programs do not do this by default.  Some have a configuration
setting to enable keepalives.
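
A fuller sketch of the same idea, assuming the application is written in C
and you can get at its socket descriptor. The function name
enable_keepalive is made up for illustration:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Turn on TCP keepalive probes for one socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;  /* any non-zero value enables the option */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, (const void *)&on,
                   sizeof(on)) < 0) {
        perror("setsockopt(SO_KEEPALIVE)");
        return -1;
    }
    return 0;
}
```

This would be called once per socket, for example right after socket() or
connect(). It is application code that has to be compiled into the
program; it is not something that can go into the ndd startup script.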

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Reply Gary 10/14/2005 9:27:28 PM

Thiemo Nordenholz <list+ip37be01jlc4qtr2@thiemo.net> wrote:
> Hi Dave,

>>the PIX has a TCP timeout value of 1 hour, the Solaris box has a TCP
>>timeout of 2 hours, I think the Solaris box doesn't realise the
>>connection's been dropped so it doesn't try to re-establish the
>>connection, the pix doesn't like this so it blocks the packets. After

> How does the PIX then "drop" the connection?
> Should it not send RSTs in some direction when it feels like interrupting
> the existing connection? AFAIK that is the way TCP connections are
> "aborted"...

Not if it's a NAT box like this.  After an idle period, it simply times
out the cache entry establishing the NAT mapping.  Later traffic will
simply not work.

> If it just sits in the middle and blocks traffic belonging to that
> connection, I don't think any one of the both systems involved can notice
> that.

Right, but they can send data so that the entry does not age out.
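
One way an application could do that is with a small heartbeat of its own.
A rough sketch, assuming a C client with direct access to the socket;
heartbeat_if_idle and the one-byte payload are made up for illustration,
and a real database connection would need a harmless query or whatever
no-op its protocol defines:

```c
#include <time.h>
#include <unistd.h>

/* Send a one-byte placeholder message if the connection has been idle
 * for at least max_idle_s seconds.
 * Returns 1 if a heartbeat was sent, 0 if none was needed, -1 if the
 * write failed. The payload is hypothetical, not a real protocol. */
int heartbeat_if_idle(int fd, time_t *last_activity, time_t now,
                      long max_idle_s)
{
    static const char beat = '\0';

    if ((long)(now - *last_activity) < max_idle_s)
        return 0;               /* recent traffic, nothing to do */
    if (write(fd, &beat, 1) != 1)
        return -1;
    *last_activity = now;       /* the heartbeat counts as activity */
    return 1;
}
```

The caller would update last_activity on every normal send or receive and
call this from a timer, with max_idle_s set well below the firewall's idle
timeout.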

-- 
Darren Dunham                                           ddunham@taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
Reply Darren 10/14/2005 9:31:36 PM

Hi Gary,

So do I just copy and paste that into the same script as the
"ndd" command? Do I need to change any of the values?

If that doesn't work I'll try the ping cron job.

cheers guys, you've been very helpful, but if anyone else has any other
ideas let me know!
Dave

Reply dave_h194 10/14/2005 10:54:41 PM

In article <1129330480.995145.249270@g47g2000cwa.googlegroups.com>,
 <dave_h194@yahoo.co.uk> wrote:
>
>If that doesn't work I'll try the Ping Cron job.

That will not work.
It does not affect the session timeout for a given connection.
You have to ensure that the session itself produces
some data periodically, within the smallest timeout of all
components.
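
As a rough worked example of that rule, assuming the values mentioned in
this thread (PIX idle timeout of 1 hour, Solaris default keepalive of
2 hours) and that tcp_keepalive_interval is given in milliseconds. The
function names are invented for illustration:

```c
#include <stddef.h>

/* Convert minutes to the milliseconds tcp_keepalive_interval expects. */
long minutes_to_ms(long minutes)
{
    return minutes * 60L * 1000L;
}

/* Smallest idle timeout (in seconds) among all components on the path.
 * n must be at least 1. */
long smallest_timeout_s(const long *timeouts_s, size_t n)
{
    long min = timeouts_s[0];
    size_t i;

    for (i = 1; i < n; i++)
        if (timeouts_s[i] < min)
            min = timeouts_s[i];
    return min;
}

/* Non-zero if traffic sent every interval_ms arrives before the
 * smallest idle timeout expires. */
int interval_is_safe(long interval_ms, const long *timeouts_s, size_t n)
{
    return interval_ms < smallest_timeout_s(timeouts_s, n) * 1000L;
}
```

With these numbers, 1200000 ms (20 minutes) and 2700000 ms (45 minutes)
are both under the 1 hour PIX timeout, while the 2 hour default of
7200000 ms is not.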

best regards 
hans 

Reply mayer42 10/16/2005 12:10:22 PM

So I have to get my webserver app to periodically communicate to the
Database server after the initial connection has been made?

Reply dave_h194 10/16/2005 3:51:55 PM

dave_h194@yahoo.co.uk wrote:
> So I have to get my webserver app to periodically communicate to the
> Database server after the initial connection has been made?

Or at least get TCP to send a TCP keepalive.

rick jones
-- 
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 10/17/2005 9:43:37 PM

>> So I have to get my webserver app to periodically communicate to the
>> Database server after the initial connection has been made?

>Or at least get TCP to send a TCP keepalive.


Which is what I thought I had done to begin with when I set the TCP
Keepalive Time to 45 minutes...

Reply dave_h194 10/18/2005 12:48:36 PM

In article <1129639716.874993.276270@g14g2000cwa.googlegroups.com>,
 <dave_h194@yahoo.co.uk> wrote:
>
>Which is what I thought I had done to begin with when I set the TCP
>Keepalive Time to 45 minutes...

I can't believe that your TCP stack can send a keepalive.
Isn't it rather that the session will be dropped after 45 minutes?
Then you would have to increase the keepalive timeout on all participating
components.


best regards 
hans 

Reply mayer42 10/18/2005 9:08:58 PM

dave_h194@yahoo.co.uk wrote:
>>> So I have to get my webserver app to periodically communicate to
>>> the Database server after the initial connection has been made?
>> Or at least get TCP to send a TCP keepalive.

> Which is what I thought I had done to begin with when I set the TCP
> Keepalive Time to 45 minutes...

What you set was how long it would be before TCP would issue a TCP
keepalive.  However, the application has to enable TCP keepalives on a
connection-by-connection basis via setsockopt() (unless Solaris has
added an "enable TCP keepalives by default" option).
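
To see that per-connection switch from inside a program, something like
the following can read it back. A minimal sketch; keepalive_enabled is a
hypothetical helper name. On a freshly created socket it would normally
report 0, since keepalives stay off unless the application asks for them:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if SO_KEEPALIVE is enabled on fd, 0 if it is not,
 * and -1 if the option could not be read. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```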

rick jones
gotta love firewalls...
-- 
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 10/18/2005 10:01:47 PM

Hi lads,

It seems we may have fixed the problem, but we don't know how!

I initially kept this question simple, so I left out the fact that we
have another webserver, an exact duplicate of the webserver we've been
talking about here.

To get the system running and to verify that the timeout values were
indeed the cause, I increased the timeout on the PIX firewall to 24
hours. We left this running for a couple of days and everything worked
fine. We then updated the JDBC connector and set its timeout on the
Solaris box to 10 minutes (it comes with one that's set to 10 minutes,
but that apparently doesn't work), and changed the firewall back to a
1 hour timeout.

Since then BOTH servers have worked fine, so it appears the problem
is fixed. The puzzle is that we never made any changes to the other
webserver apart from changing its tcp_keepalive_interval value, which
didn't work initially.

any ideas???
Dave

Reply dave_h194 10/19/2005 8:43:13 AM

dave_h194@yahoo.co.uk wrote:
> Since then BOTH servers have worked fine, it appears as if the
> problem is fixed, the issue is that we never done any changes to the
> other webserver apart from change it's TCP_KEEPALIVE_INTERVAL value,
> which didn't work initially.

Well, a change to the tcp_keepalive_interval is unlikely to have
altered any existing connections.  Someone from Sun would have to
comment on whether new listen endpoints would have to be created to
get newly accepted connections to get the updated value.

rick jones
-- 
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 10/19/2005 7:06:38 PM
