SQL*Net (a.k.a Oracle TNS) and firewalls…

Most vendor’s firewalls have a SQL ALG that handles SQL*Net traffic. They listen on TCP port 1521. SQL*Net is based on Oracle’s TNS protocol. The specification for this protocol is proprietary and inaccessible, but you can figure it out by reading Oracle’s docs and looking at the Wireshark dissector source code.

Essentially TNS was specified in such a way that the session on port 1521 was a “control” session of sorts. The Oracle listener process listens on this port. Once connected to this port the client requests an Oracle “service.” If the listener knows of such a service, it would send a “redirect” message back to the client with the IP and port of the process providing that service. That other process is either a dedicated server process or a “dispatcher” process that arbitrates requests to a shared server process. This second TCP connection was commonly referred to as the “data session” or “data connection.” On the firewall the ALG inspects traffic in the “control” session looking for “HOST=” and “PORT=” in order to perform NAT properly and to open pinholes. That’s all the ALG does.

However, the terms “control” and “data” are somewhat misleading.  Oracle TNS isn’t like FTP.   Its not really a control session. A TNS connection is just a TNS connection. There isn’t a separate port or protocol messaging for control vs data. In fact, in most cases once the redirection occurs the original connection to 1521 should be closed according to the documentation.

The default behavior has changed in more recent versions of Oracle, however.  When the secondary process (dispatcher process or dedicated server process) resides on the same host as the listener process, a redirect does not occur. Instead, internal to the server, the socket handle of the connection to port 1521 is passed to the secondary process. This means there is no second TCP connection. Subsequent SQL queries and the result sets are passed over this connection. Oracle refers to this method as “bequeathing the connection” or “direct hand-off.” Many Oracle deployments are built in this fashion now.

This poses two problems for the ALGs: (1) The firewalls must inspect the packets in the “control” session looking for IPs and ports. If a large amount of data is transiting the firewall over port 1521 this could impact the CPU of the firewall. (2) The programmer’s of the ALGs tuned down the buffers of the control session (why would you need large buffers for a control session?). This negatively impacts the performance of the Oracle Session.

Cisco tunes the buffers down to 16k (this is the equivalent of the SDU paramenter in Oracle) while Juniper tunes it down to 2k. The impact is not good on an SRX therefore. Juniper did “fix” this by increasing the window in 10.4R1, but it looks like its back to being 2k in 10.4R4. Even with the the larger buffer though, if you are using a newer version of Oracle that behaves this way by default and there are a lot of sessions then why impact the CPU unnecessarily? It has to inspect those packets looking for IPs and ports. If Oracle isn’t redirecting then there is no point to this.

As it happens, both Cisco and Juniper recommend bypassing the SQL ALG when using newer versions of Oracle that behave this way. You then would only enable or disable the ALG as required by policy on a per IP/port basis.

In Checkpoint firewalls, there are two ALGs for SQL*Net: “sqlnet1″ and “sqlnet2.” sqlnet1 should be used for non-redirected sessions and sqlnet2 should be used for redirected sessions. The implication is that non-redirected sessions evaluated against sqlnet2 could negatively impact the CPU of the firewall.

For Cisco firewalls you can disable the SQL*Net ALG globally and then enable it as needed per policy using policy-maps.   For Juniper SRXs, it appears that you must *disable* the ALG per policy as required using a custom application configuration. It may be possible to disable it globally and then enable it per policy as with Cisco, but I believe if you disable the ALG globally on the SRX then the ALG can’t be used. Honestly, I haven’t verified this yet.

1. How to disable an ALG globally on a Cisco ASA then enable it per-policy

2. Disabling an ALG globally or per-policy on a Juniper SRX

Notes from Cisco’s documentation:

##########

SQL*Net Version 2 TNSFrames, Redirect, and Data packets will be scanned for ports to open and addresses to NAT, if preceded by a REDIRECT TNS frame type with a zero data length for the payload.

SQL*Net Version 1 is assumed for all other cases. TNSFrame types (Connect, Accept, Refuse, Resend, Marker, Redirect, and Data) and all packets will be scanned for ports and addresses.

Note:Disable SQL*Net inspection when SQL data transfer occurs on the same port as the SQL control TCP port 1521. The security appliance acts as a proxy when SQL*Net inspection is enabled and reduces the client window size from 65000 to about 16000 causing data transfer issues.

##########

Notes from Juniper KB article 8087:

########

In general TCP port 1521 is the default port to trigger the SQL ALG and is a default port assigned to SQL. If there is an Oracle application which uses the SQL port 1521 for both the Control and Data channel, then TCP port 1521 being this the signalling channel for or SQLNET ALG, each packet is sent to the CPU. Depending on the number of packets hitting the firewall we can expect the firewall to experience high CPU.

When you are sure that all SQL applications are single threaded and using only port TCP 1521 for both the Control and Data channel, then you should consider turning off  SQL ALG. Otherwise, you can consider having the DBA’s to change the port for such applications to something other than 1521.

The SQL ALG is recommended when Oracle applications are multi-threaded and where they open a high range of TCP ports dynamically for data channels and TCP port 1521 is only used for the Control connection.

##########

There is an internal note in Juniper PR524444 that indicates, essentially, that Oracle sessions that are not redirected can bypass the ALG as it is not needed. This PR was opened because of the ALG’s impact on the performance of the Oracle sessions.

Cloud Toad
CloudToad is adrift on the great sea of network serenity. - CCIE #15672 (RS, SP) JNCIE-M #721 Twitter: @cloudtoad LinkedIn: http://www.linkedin.com/in/derickwinkworth - Derick's opinions are his own and do not reflect those of the company he works for.
  • Anonymous

    Wow I used to do a lot of TNS network protocol analysis in the late 90s and some of my papers on this are located here on my site. Just email me for them

    Mostly I was able to get more information about TNS from oracle directly when escalating an issue. plus there is a good book called Oracle Networking from Oracle Press. dated but good.

    Ora 15xxs etc full table scans et. al. Most of the problems I analyzed were not network related but SQL related. Poor SQl programming, invalid use of cursors, LOV processing  etc over wan links made the application appear slow to users but it was never the network. Agilent at the time had the best TNS decodes going. 2 and 3 tier architectures analysis and table normalization, wow you bring back some memories.

  • Writemike

    “The default behavior has changed in more recent versions of Oracle, however.” Any chance you could be more specific on when you saw this default behavior occur?  Which version of Oracle or a date range? We are currently troubleshooting this exact issue on some SRX5800s and your article has been a big help! Thank you!

  • sven

    Great article

  • Sam

    Your Article was good help troubleshooting our Oracle issue on SRX240, issue was with big query (29K) applicaton freezed but not with small query or just -1 record from the big query also worked , port was used 1521. by disabling ALG resolved the issue.