Bad Gateway error after applying July 2023 CPU
On Tuesday Oracle released the quarterly security patch set, which contains a number of security-related patches for both the Tech Stack and the Application Tier of E-Business Suite. While applying the patch for some of our customers running E-Business Suite on OCI, I realized that access to the environment was no longer possible. Instead, a "502 Bad Gateway" error was shown.
Analysis and Root Cause
While analyzing why the OCI Load Balancer was marking the backend as unhealthy, I found a large number of errors like the following in the access log of the Oracle HTTP Server that is part of E-Business Suite:
[2023-07-19T19:16:14.0055+02:00] [OHS] [ERROR:32] [OHS-9999] [core.c] [host_id: master230703app01] [host_addr: 172.31.11.76] [pid: 18249] [tid: 139845577930496] [user: oracle] [VirtualHost: main] [client ::1] ModSecurity: Access denied with code 400 (phase 1). Match of "rx ^mastebsapp.oci.promatis.de(:443)?$" against "REQUEST_HEADERS:Host" required. [file "/u01/install/APPS/fs1/FMW_Home/webtier/instances/EBS_web_OHS1/config/OHS/EBS_web/security2.conf"] [line "77"] [id "100017"] [hostname "mastebsapp.oci.promatis.de"] [uri "/index.html"] [unique_id "ZLgaXqwfC0wAAEdJHK8AAAAU"]
Analyzing further, I found that this is due to a new line in security2.conf generated through the latest template for that file (security2_conf_FMW.tmp 120.5.12020000.19):
< SecRule REQUEST_HEADERS:Host !^mastebsapp.oci.promatis.de(:443)?$ phase:1,id:100017,deny,log,t:lowercase,status:400
While this is generally a good idea, it is not compatible with the health checks that are usually performed by OCI Load Balancers:
Regular Health Check for OCI Load Balancer
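To make the effect of the new rule easier to see, here is a minimal Python sketch of its matching logic. It is not ModSecurity itself: the function name `is_denied` is hypothetical, and the example Host values (in particular the IP address) are assumptions about what a client without the proper host name might send.

```python
import re

# Pattern copied from the SecRule in security2.conf shown above.
ALLOWED_HOST = re.compile(r"^mastebsapp.oci.promatis.de(:443)?$")


def is_denied(host_header: str) -> bool:
    """Return True if the sketched rule would reject this Host header value."""
    # t:lowercase: the header value is lowercased before the regex is applied.
    value = host_header.lower()
    # The rule uses a negated match ("!"), so it fires (deny, status 400)
    # whenever the pattern does NOT match.
    return ALLOWED_HOST.search(value) is None


if __name__ == "__main__":
    # Example values only; the IP address stands in for a request that
    # does not carry the configured host name.
    for host in ["mastebsapp.oci.promatis.de",
                 "MASTEBSAPP.oci.promatis.de:443",
                 "172.31.11.76"]:
        print(host, "denied" if is_denied(host) else "allowed")
```

Any request whose Host header is not the configured name (optionally with port 443) is rejected in phase 1, before the request is processed any further.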
Performing an HTTP health check is usually a clever idea. However, the OCI Load Balancer has no way to pass a host name to the backend during the health check, so the request does not carry the expected Host header and is rejected with status 400. That leads to the health check failing, putting the seemingly down backend out of rotation:
Failed Load Balancer health check
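If you want to reproduce the behaviour without the Load Balancer, a small probe like the following can help. This is only a sketch: `probe_status` is a hypothetical helper, it assumes the web entry point is reachable over plain HTTP, and the address and port in the usage note are placeholders you would need to replace.

```python
import http.client


def probe_status(address, port, host_header, path="/index.html", timeout=5.0):
    """Send a GET request with an explicit Host header and return the status code."""
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        # skip_host=True stops http.client from adding its own Host header,
        # so the value passed in is the only one that is sent.
        conn.putrequest("GET", path, skip_host=True)
        conn.putheader("Host", host_header)
        conn.endheaders()
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling, for example, `probe_status("172.31.11.76", 8000, "172.31.11.76")` and then the same call with `"mastebsapp.oci.promatis.de"` as the Host value should show whether the host name alone decides between a 400 and a normal response (port 8000 is an assumption here).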
I have filed an SR for this, and Oracle is working on it under bug 35626508.
There are several workarounds. I decided to change the health check to a TCP-level check, which is just a click of a button in the OCI Console:
Update the health check to be just TCP
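Conceptually, a TCP-level check only verifies that a connection to the backend port can be opened, so no HTTP request (and no Host header) is involved. A rough Python equivalent, with the hypothetical helper name `tcp_health_check`, could look like this:

```python
import socket


def tcp_health_check(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again right away because no data needs to be exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

The trade-off is that such a check only proves the port is listening. It no longer notices when the HTTP server accepts connections but answers every request with an error.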
Other alternatives would have been:
- Change the expected status code of the health check from 200 to 400 (which is what the server answers). I dislike this approach because the log file still fills up with the errors shown above.
- Change security2.conf to remove that line: This has the disadvantage that the file is overwritten on the next AutoConfig run.
- Change the template security2_conf_FMW.tmp: This is probably the best way, but since I'm waiting for an official solution anyway, it was not worth the additional effort.