2010-11-12

11.2.0.2 multicast fails totally

Recently Oracle released a new Patch Set: 11g Release 2 (11.2.0.2) Patch Set 1

In its README there is no word about multicast or the IP 230.0.1.0. It just contains a link to the Known Issues specific to the 11.2.0.2 Patch Set. Still no word there about multicast or the IP 230.0.1.0.
But why is this more important to me than it seems to be to Oracle?

Oracle has introduced a new feature in 11.2.0.2 Grid Infrastructure: it uses multicast communication on the cluster interconnect for some purposes.
To be more precise, it uses the IP 230.0.1.0 by default for this purpose.

Multicast per se is not a bad thing at all, but it should be done right, not the Oracle way:

The IANA has defined a set of IP addresses to be used for multicast communication. Altogether this is the range from 224.0.0.0 to 239.255.255.255.
As 230.0.1.0 is within this range - everything fine? NO!
The block 225.0.0.0-231.255.255.255 is marked as RESERVED. And the IETF has a clear wording about these addresses:

Applications MUST NOT use addressing in the IANA reserved blocks.
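
To make the range check above concrete, here is a minimal Python sketch. The block boundaries are only the ones cited in this post, not a full copy of the IANA registry, and the function name classify is my own invention for illustration.

```python
import ipaddress

# Boundaries as cited in this post; NOT a complete copy of the IANA registry.
MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # 224.0.0.0 - 239.255.255.255
RESERVED_FIRST = ipaddress.ip_address("225.0.0.0")
RESERVED_LAST = ipaddress.ip_address("231.255.255.255")


def classify(addr: str) -> str:
    """Say whether addr is multicast and whether it sits in the RESERVED block."""
    ip = ipaddress.ip_address(addr)
    if ip not in MULTICAST:
        return "not multicast"
    if RESERVED_FIRST <= ip <= RESERVED_LAST:
        return "multicast, RESERVED block"
    return "multicast"


for candidate in ("230.0.1.0", "224.0.0.251", "192.168.1.1"):
    print(candidate, "->", classify(candidate))
```

With these boundaries, 230.0.1.0 lands in the RESERVED block, which is exactly the complaint.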


It seems I'm not the first person to run into this issue. If you read
11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting Requirement [ID 1212703.1], you will find the

Cause


The cause of this issue is CSSD being unable to establish network communication on the private interconnect network.  Assuming that Cluster Verify (cluvfy) has succeeded on all network checks or you are upgrading to 11.2.0.2 whereas the previous release was not experiencing communication issues, multicast not being enabled on the private network is a potential cause.

Oracle Grid Infrastructure 11.2.0.2 introduces new feature called Redundant Interconnect allowing for Oracle Supplied redundancy for the cluster interconnect.  With this new feature, multicast network communication on the private interconnect network is utilized on bootstrap to establish communication with peer nodes in the cluster, once communication is established network communication is then switched to unicast.  This mulitcast communication utilizes the 230.0.1.0 address (port 42424) on the private interconnect network.  Therefore multicast on the private interconnect network must be enabled and properly functioning on all cluster nodes for the mulitcast address 230.0.1.0 (port 42424).  Should multicast communication fail, the end result will be the inability for the node to join the cluster (as shown above in the symptoms). 


and a suggested

Solution


It has been found that the functionality of multicast on the 230.0.1.0 (port 42424) network address has been problematic with some network environments resulting in the issue stated above.  To address this issue Oracle has released Patch: 9974223 on top of 11.2.0.2 .  This patch makes use of the 224.0.0.251 (port 42424) multicast network address in addition to the 230.0.1.0 (port 42424) multicast address.  Multicast must be enabled on one of these two addresses to allow for Oracle Grid Infrastructure to successfully start on all cluster nodes.
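
If you want to see for yourself whether multicast on either of these addresses gets through your private interconnect, a rough probe could look like the following Python sketch. This is not Oracle's tooling and not what CSSD does internally; the function names, the payload and the timeout are my own assumptions. Only the two group addresses and port 42424 come from the note quoted above.

```python
import socket

GROUPS = ("230.0.1.0", "224.0.0.251")  # the two addresses named in the note
PORT = 42424                           # the port named in the note


def membership_request(group: str, iface_ip: str) -> bytes:
    """Pack an ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def listen(group: str, iface_ip: str, port: int = PORT, timeout: float = 10.0):
    """Join group on the given interface and wait for one datagram.

    Returns (data, sender) or None if nothing arrived before the timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, iface_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None


def send_probe(group: str, iface_ip: str, port: int = PORT,
               payload: bytes = b"mcast-probe") -> None:
    """Send one datagram to group via the interface that owns iface_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
        sock.sendto(payload, (group, port))
```

Usage would be something like listen("230.0.1.0", "<private IP of node 1>") on one node and send_probe("230.0.1.0", "<private IP of node 2>") on another, repeated for each address in GROUPS. If the clusterware is already running on the node, the port may be in use, so a different port might be needed for the probe.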


Patch 9974223 has a size of 153M for Linux x86-64, but that's not a problem per se.


The real problem is the spirit of the patch: Instead of making the multicast IP configurable (which would make it possible to use a dedicated IP address assigned for exactly that purpose), Oracle decided to use another IP: 224.0.0.251. So let's look at that IP in more detail: In the IANA's document for multicast IPs, 224.0.0.251 is defined for use with mDNS - Multicast DNS.
Ok. Now we have at least an IP which is defined to be used with a protocol to announce available resources within a local network. But still, Oracle failed:
Now 224.0.0.251 (port 42424) is used. But this conflicts with the IETF's document for Multicast DNS. There you can read:


When this document uses the term "Multicast DNS", it should be taken
to mean: "Clients performing DNS-like queries for DNS-like resource
records by sending DNS-like UDP query and response packets over IP
Multicast to UDP port 5353."


5353 != 42424


In the end, my plea to Oracle is: Make the multicast IP of a cluster configurable by the customer! Follow existing standards! Many companies have different requirements regarding the usage of IPs - so any hard-coded value will have disadvantages.

Update: I've just created SR:3-2347539801 for this - just out of curiosity.