Turn off all the lights at night - Reducing costs by automatically pausing EBS@OCI instances

Johannes Michler PROMATIS Horus Oracle


Executive Vice President – Head of Platforms & Development


Costs Analysis dashboard in OCI

Running services on a public cloud platform, even 24/7, is often cheaper in terms of Total Cost of Ownership (TCO) than running these services on premises. Especially for development and testing instances, this cost advantage gets even better when taking the following aspects into consideration:

  • The number of environments required is often volatile: There are phases when you need many environments with high performance (e.g. during a UAT of a major "new feature" or an "upgrade" implementation), but then there are many other months where a single "small" testing instance to troubleshoot issues might be enough.
  • Even during peak periods, such instances are often not needed 24/7 but only during usual business hours.

When running Oracle E-Business Suite on Oracle Cloud Infrastructure (OCI), this advantage is even more significant than with other cloud vendors: OCI allows you to easily "rent" the database and application server licenses required for E-Business Suite in a PaaS model "by the hour". This means: when you reduce the average number of CPU cores used throughout the year (by one or both of the methods shown above), the cost savings can be very significant.
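
To get a feeling for the order of magnitude of the second bullet above, here is a back-of-the-envelope calculation (the 12-hours-on-weekdays schedule is purely an assumed example): such an instance is up 60 out of 168 hours per week, so OCPU-hour based compute and PaaS license billing shrinks to roughly a third.

echo "scale=2; (12*5*100)/(24*7)" | bc   # => 35.71 (percent of full-time hours)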

Let's look at the second aspect and see how we can easily shut down E-Business Suite instances on OCI during nights and weekends.

Clean E-Business Suite shutdown

To automate pausing instances (and billing) during nighttime, you first have to cleanly shut down the E-Business Suite application (middle) tier. At least as of today, there is no way of really "pausing"/"freezing" an instance without shutting everything down. Even though this is on the OCI roadmap, it remains to be seen whether an environment as complex as E-Business Suite will survive such a hibernation. I suggest the following stopEBS.sh script; you will obviously replace APPS_PWD and WEBLOGIC_PWD with your own values:

#!/bin/bash
# stopEBS.sh - cleanly stop the E-Business Suite application tier
source /u01/install/APPS/EBSapps.env run

# Stop all application tier services on all nodes
adstpall.sh -mode=allnodes apps/APPS_PWD << EOF
WEBLOGIC_PWD
EOF

# Site-specific helper script that stops the APEX services on this host
sh /home/oracle/stop_apex122.sh

echo -n "Waiting for Concurrent Manager to go down"

# Poll the Internal Concurrent Manager until it is no longer active
while true; do
  $FND_TOP/bin/FNDSVCRG STATUS > /tmp/icmstatus2.txt
  cat /tmp/icmstatus2.txt
  if test `grep "Internal Concurrent Manager is Active" /tmp/icmstatus2.txt | wc -l` -eq 0 ; then
    echo
    echo -n "Concurrent Manager is down now"
    break
  fi
  sleep 10
done

# Show remaining oracle processes, give stragglers a minute, then check again
ps -fu oracle
sleep 60
ps -fu oracle

Stopping the OCI infrastructure (and billing)

With that in place, you can initiate the shutdown of the VMs (in this case a Base Database on VM and a single apps tier on Compute) using a stopInstance.sh script, to which you pass the environment name:

#!/bin/bash
# stopInstance.sh <ENV_NAME> - cleanly stop EBS and the database, then power off the OCI resources
COMPARTMENT_ID=ocid1.compartment.oc1..XXXXXXX
CONFIG_FILE=/u01/install/APPS/.oci/johannes.michler@promatis.de
ENV_NAME=$1

echo "Instance Name:"$ENV_NAME
# Resolve the apps and DB hosts of the environment via DNS
HOST_APPS=${1,,}app01
IP_APPS=`dig +short ${HOST_APPS}.appssubnet.ebsnetwork.oraclevcn.com`
echo "IP Address Apps:"$IP_APPS
HOST_DB=${1,,}db
IP_DB=`dig +short ${HOST_DB}.dbsubnet.ebsnetwork.oraclevcn.com`
echo "IP Address DB:"$IP_DB

# Look up the OCIDs of the apps compute instance, the DB system and its DB node
OCID_APPS=$(oci compute instance list --compartment-id $COMPARTMENT_ID --query "data [?\"display-name\" == '${ENV_NAME}app01'].id|join(',',@)" --config-file $CONFIG_FILE | tr -d '\"')
echo "OCID-APPS:"$OCID_APPS
OCID_DB_SYS=$(oci db system list --compartment-id $COMPARTMENT_ID --config-file $CONFIG_FILE --query "data [?\"display-name\" == '${ENV_NAME}'].id|join(',',@)"| tr -d '\"')
echo "OCI-DB-Sys:"$OCID_DB_SYS

OCID_DB_NODE=$(oci db node list --db-system-id $OCID_DB_SYS --config-file $CONFIG_FILE --compartment-id $COMPARTMENT_ID --query "data[].id|join(',',@)"| tr -d '\"')
echo "OCI-DB-Node:"$OCID_DB_NODE

# Stop the EBS application tier, then the database, then power off the VMs
echo Stopping apps tier
ssh $IP_APPS "./stopEBS.sh"
echo Stopping VM-DB
ssh $IP_DB "srvctl stop database -d \$ORACLE_UNQNAME -stopoption IMMEDIATE"
oci db node stop --config-file $CONFIG_FILE --db-node-id $OCID_DB_NODE

echo Now stopping Apps
oci --config-file $CONFIG_FILE compute instance action --action STOP --instance-id $OCID_APPS

The script makes use of the OCI Command Line Interface (CLI) to stop the database and the compute instance(s).

If you have the database on a plain compute instance instead of a DB system, you can stop it with commands like the ones used to stop the compute instance for the apps tier.
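
As a hedged illustration (the display name ${ENV_NAME}db01 and the use of srvctl on that host are assumptions; adjust them to your setup), stopping such a database compute instance follows the same pattern as the apps tier above:

# shut down the database cleanly, then power off its compute instance
ssh $IP_DB "srvctl stop database -d \$ORACLE_UNQNAME -stopoption IMMEDIATE"
OCID_DB_COMPUTE=$(oci compute instance list --compartment-id $COMPARTMENT_ID --config-file $CONFIG_FILE --query "data [?\"display-name\" == '${ENV_NAME}db01'].id|join(',',@)" | tr -d '\"')
oci --config-file $CONFIG_FILE compute instance action --action STOP --instance-id $OCID_DB_COMPUTE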

Keep in mind that you might also need to disable advanced monitoring to stop the billing for OCI Database Management.

Bringing everything up again

Starting everything again works similarly; I'm using a startInstance.sh script as follows:

echo "starting shutdown"
COMPARTMENT_ID=ocid1.compartment.oc1..XXXXX
CONFIG_FILE=/u01/install/APPS/.oci/johannes.michler@promatis.de
ENV_NAME=$1
echo "Instance Name:"$ENV_NAME
HOST_APPS=${1,,}app01
IP_APPS=`dig +short ${HOST_APPS}.appssubnet.ebsnetwork.oraclevcn.com`
echo "IP Address Apps:"$IP_APPS
HOST_DB=${1,,}db
IP_DB=`dig +short ${HOST_DB}.dbsubnet.ebsnetwork.oraclevcn.com`
echo "IP Address DB:"$IP_DB

OCID_APPS=$(oci compute instance list --compartment-id $COMPARTMENT_ID --query "data [?\"display-name\" == '${ENV_NAME}app01'].id|join(',',@)" --config-file $CONFIG_FILE | tr -d '\"')
echo "OCID-APPS:"$OCID_APPS
OCID_DB_SYS=$(oci db system list --compartment-id $COMPARTMENT_ID --config-file $CONFIG_FILE --query "data [?\"display-name\" == '${ENV_NAME}'].id|join(',',@)"| tr -d '\"')
echo "OCI-DB-Sys:"$OCID_DB_SYS

OCID_DB_NODE=$(oci db node list --db-system-id $OCID_DB_SYS --config-file $CONFIG_FILE --compartment-id $COMPARTMENT_ID --query "data[].id|join(',',@)"| tr -d '\"')
echo "OCI-DB-Node:"$OCID_DB_NODE

echo Starting DB
oci db node start --config-file $CONFIG_FILE --db-node-id $OCID_DB_NODE --wait-for-state AVAILABLE
echo Now Starting Apps
oci --config-file $CONFIG_FILE compute instance action --action START --instance-id $OCID_APPS --wait-for-state RUNNING &
wait
echo "started DB and Apps"
sleep 30
# Wait until the DB host answers via SSH
until ssh $IP_DB "echo da" 2> /dev/null
do
  echo "not ready, waiting 5"
  sleep 5
done
echo "sleeping another 30, then starting the database"
sleep 30

# Retry until Grid Infrastructure is ready and the database starts
until ssh $IP_DB "srvctl start database -d \$ORACLE_UNQNAME"
do
  echo "not ready, waiting 5"
  sleep 5
done

# Wait until the apps host answers via SSH
until ssh $IP_APPS "echo da" 2> /dev/null
do
  echo "not ready, waiting 5"
  sleep 5
done
echo "sleeping another 30, then starting the apps tier"
sleep 30
echo "Just in case the mountpoint is not yet there"
ssh opc@$IP_APPS sudo mount /u01

ssh $IP_APPS ./startEBS.sh

More fine-grained scaling - even without downtime and maybe on PROD

Of course, the above approach can be applied in a less drastic manner. Instead of stopping the entire environment, you could, for example, simply stop some of the application tier nodes during nighttime: if you need 6 oacore servers to handle the daily load but only one during the night, you can shut down five of them every evening. Since the system stays available this way, you could even do so on production. OCI also has dynamic scaling of CPU and memory on the roadmap, so this may give even more flexibility in the future.
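
As a sketch of this idea, reusing the variables from the stopInstance.sh script above (the node display names app02 through app06 are assumptions, and the services on those nodes should be stopped cleanly first, e.g. with adstpall.sh on each node), the "extra" apps-tier compute instances could be stopped every evening like this:

for NODE in app02 app03 app04 app05 app06; do
  OCID=$(oci compute instance list --compartment-id $COMPARTMENT_ID --config-file $CONFIG_FILE --query "data [?\"display-name\" == '${ENV_NAME}${NODE}'].id|join(',',@)" | tr -d '\"')
  oci --config-file $CONFIG_FILE compute instance action --action STOP --instance-id $OCID
done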

Summary

Using the above scripts and some crontab entries (e.g. on the E-Business Suite Cloud Manager machine), you can easily eliminate most of the costs of E-Business Suite dev and test instances whenever they're not needed.
There are many reasons to run E-Business Suite on OCI, as I've shown in previous blog posts. By dynamically scaling down the infrastructure during low- or no-use periods, costs can be reduced significantly! If you're interested in trying it out, have a look at the Free Trial for OCI and the E-Business Suite on OCI Hands On Lab - see my previous posts on things to consider when doing so with brand-new tenancies.
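
For illustration, crontab entries like the following (the times, environment name and paths are assumptions) would stop a DEV environment on weekday evenings and bring it back up before business hours:

# m h dom mon dow command
0 20 * * 1-5 /home/oracle/stopInstance.sh DEV >> /home/oracle/stopInstance.log 2>&1
0 6 * * 1-5 /home/oracle/startInstance.sh DEV >> /home/oracle/startInstance.log 2>&1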

Migrating E-Business Suite to File Storage Service

Johannes Michler PROMATIS Horus Oracle


Executive Vice President – Head of Platforms & Development

Recently, I described that starting with Release 22.2.1, Oracle E-Business Suite Cloud Manager supports OCI File Storage Service (FSS) in addition to block storage. Furthermore, the use of FSS is mandatory for (new) multi-node environments: First Experience with E-Business Suite Cloud Manager 22.2.1

Given the advantages of FSS described in my previous blog, one of my customers decided to migrate their apps-tier to File Storage Service. Let's see how we did this:

Pre-Downtime

In preparation for the move, we first had to create a new file system:

Creating the new file system.

This file system then has to be attached to a mount target and needs the proper export options:

NFS Export Options

The file system can then be mounted on a temporary directory:

sudo mkdir /mnt/fss
sudo vi /etc/fstab
# add an entry like the following to /etc/fstab:
10.22.9.97:/ENTW220811 /mnt/fss nfs rw,bg,hard,timeo=600,nfsvers=3 0 0

Then:

sudo mount /mnt/fss
sudo chown oracle:oinstall /mnt/fss
sudo yum install -y fss-parallel-tools

Downtime Operations

After these preparatory steps, we stop the entire environment and copy the content from the previous /u01 block volume over to the new FSS mount point /mnt/fss. Performance can be greatly improved by using fss-parallel-tools and parcp for this. For volumes of considerable size, an incremental operation can also be handy; but for the roughly 300 GB usually used for the apps-tier, doing the migration within a 1-2 hour downtime is usually not a problem:

. setenv_run.s
adstpall.sh -mode=allnodes
~/stop_apex122.sh

sudo parcp --restore /u01/install /mnt/fss/
If working in a multi-node environment, the NFS mount on the second apps-tier that points to the primary apps-tier then needs to be replaced with a mount of the FSS mount target.

After the copy is finished, the old /u01/install mountpoint is retired and FSS is mounted to /u01:

sudo umount /u01
sudo vi /etc/fstab
# replace the /u01 entry in /etc/fstab with:
10.22.9.97:/ENTW220811 /u01 nfs rw,bg,hard,timeo=600,nfsvers=3 0 0
sudo mount /u01

Then, start the apps-tier again:

. setenv_run.s
adstrtal.sh -mode=allnodes
~/start_apex122.sh

Post Downtime

After the downtime, the old block volume should be detached through the OCI console. Furthermore, it is necessary to re-discover the environment in E-Business Suite Cloud Manager. For this, either unregister the existing environment or (as I did, in order to pick up the latest operating system image and add compatibility with the latest OCI shapes for the Cloud Manager) set up a new Cloud Manager environment. Make sure the Cloud Manager network profile is aware of the FSS mount target (usually, you have to create a new network profile for this).

Then, issue a re-discovery request:

Rediscovery of the moved environment

Setting up FSS Snapshots

FSS is a highly durable service. However, that does not help against user errors (such as issuing "rm -rf /u01"). To remedy this, it is helpful to set up FSS snapshots, e.g. an hourly snapshot (kept for a day) plus a daily snapshot (kept for a week). Unfortunately, so far this cannot be set up from the OCI console. Furthermore, for FSS it is currently not possible to perform automatic backups to OCI object storage (this is on the roadmap, though). However, using the utility fss-scheduler, such a snapshot policy can easily be set up.
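
If you prefer not to introduce an additional utility, a minimal cron-driven sketch using the OCI CLI could look like this (the file system OCID is a placeholder; pruning old snapshots would be scripted similarly with oci fs snapshot list/delete):

FS_OCID=ocid1.filesystem.oc1.eu_frankfurt_1.XXXXXXX
oci fs snapshot create --file-system-id $FS_OCID --name "hourly-$(date +%Y%m%d-%H%M)"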

Costs

As described in my previous post, FSS is considerably more expensive "per gigabyte" (see the OCI price list) than block storage: 30 cents compared to just around 4.5 cents per GB per month. However, due to the "sparse" nature of FSS, for an environment with production plus 3 clones for dev/test, our storage usage changed from 4x400 GB = 1,200 GB with block storage (roughly 54 USD/month) to 200 GB + (3*15) GB = 245 GB (which equals 74 USD/month). The relative increase in costs may seem significant; compared with the total costs of hosting 4 E-Business Suite environments with terabytes of block storage for the database, it is, however, peanuts. The ratio might even improve if, in the future, Oracle eventually uses FSS clones for creating the patch file system (see, and vote for, my idea over there).

Summary

By performing the above steps, you can easily migrate an E-Business Suite apps-tier to File Storage Service within a downtime of about 1 hour. Doing so makes it possible, for example, to create very fast clones that are furthermore "sparse".

Backup Concepts for Oracle Cloud Infrastructure

Oracle Cloud Infrastructure (OCI) is used by many of our customers to run their Oracle E-Business Suite. Especially when not only development and test systems are operated in OCI, a solid concept for backing up (and restoring) the environment in the event of a disaster is needed – regardless of whether that disaster is triggered by user or system error. Let's take a closer look at the available options.

Basic Concepts and Terminology – RPO and RTO

The two most important terms when designing a backup strategy are undoubtedly Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The RPO is the amount of data one accepts losing in the event of a disaster. For example, an RPO of 30 minutes means that after a disaster you never want to lose more than 30 minutes of transactions in any situation. The RTO denotes the time required to make the instance operational again after a disaster. The Oracle database documentation – in particular the "High Availability Overview" – provides further details.

OCI Isolation Levels

There are three levels of isolation associated with Oracle Cloud Infrastructure that help protect against outages …

Read the complete article (in German) in Red Stack Magazin 03/2021.

Image source: © pixabay.com

New Compute Shape E4.Flex available on OCI in Frankfurt

Johannes Michler PROMATIS Horus Oracle


Senior Vice President – Head of Platforms & Development

While upgrading one of our customers from 12.1.3 to 12.2.10, I realized (by chance) that Oracle made the latest E4.Flex (AMD EPYC) shapes available in the Frankfurt OCI data center last week. This was not the case at the time of the official "news update".

Back in March, the new instance type was only available in a single data center in Europe (Zurich).

This has now obviously changed: the new instance type can be used at least in the AD-1 and AD-3 availability domains of the EU-FRANKFURT1 data center. This is now also confirmed in the "Service Limits" documentation.

Both my first personal tests and the benchmarks with the new shape suggest another 15-20 % speedup compared to the E3.Flex instance. This is really great, given that even the older E3.Flex instance provides greater performance at maximum flexibility when compared with both the traditional Intel XEON VMStandard2.x shapes on OCI and with other cloud vendors.


OCI I/O-Performance with E-Business Suite (Part 3/3) - LVM

Johannes Michler PROMATIS Horus Oracle


Senior Vice President – Head of Platforms & Development

While moving a customer from on-premises to Oracle Cloud Infrastructure, we recently conducted some performance testing before the final cutover. For the E-Business Suite database server, a physical box with 16 CPU cores was migrated to a VMStandard E3.Flex VM with 12 OCPUs. From a CPU performance perspective, the new VM was much faster than the old box. Unfortunately, this was not the case on the I/O performance side. The first blog post (part 1) contains some general information about I/O performance in Oracle Cloud Infrastructure. In part 2 I covered how we can use ASM to overcome the limitations.

In this part we'll see how we can migrate an E-Business Suite database instance on OCI to an LVM "RAID 0" (striped) volume spread across a number of smaller disks.

Logical Volume Manager LVM

Wikipedia gives a good explanation of the purpose of Linux LVM:

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. LVM is used for the following purposes:

  • Creating single logical volumes of multiple physical volumes or entire hard disks (somewhat similar to RAID 0, but more similar to JBOD), allowing for dynamic volume resizing.
  • Managing large hard disk farms by allowing disks to be added and replaced without downtime or service disruption, in combination with hot swapping.
  • On small systems (like a desktop), instead of having to estimate at installation time how big a partition might need to be, LVM allows filesystems to be easily resized as needed.
  • Performing consistent backups by taking snapshots of the logical volumes.
  • Encrypting multiple physical partitions with one password.

LVM can be considered as a thin software layer on top of the hard disks and partitions, which creates an abstraction of continuity and ease-of-use for managing hard drive replacement, repartitioning and backup.

For our purpose we are mainly interested in the first aspect of LVM.

Creating a new Logical Volume

Unfortunately, E-Business Suite instances provisioned through E-Business Suite Cloud Manager "on Compute" (be it new installations or restores of a backup) only use a single block volume to store all the data and the Oracle Home of the database (the whole structure beginning with /u01). Fortunately, this can be changed with the following sequence of steps:

1. In the Oracle Cloud Infrastructure console, create 9 new block volumes (lvm1-lvm9) with 500 GB each and attach them to the database compute instance (a hedged OCI CLI sketch for these two steps follows after this list).
2. Run the iSCSI attach commands shown in the OCI console for each of the volumes (making them available in the Linux VM).
3. Convert the LUNs into LVM physical volumes using:
pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
4. Extend the volume group to use those disks:
vgextend ebs_vg /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
5. Create a new logical volume:
lvcreate -i9 -I4M -l100%FREE -n ebs_lv3 ebs_vg
Note: I have had good experience with a stripe size of 4 MB; however, this may depend on your workload and DB block size.
6. Verify with lvdisplay -m that the new logical volume is available and appropriately striped.
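
For reference, a hedged OCI CLI equivalent of steps 1 and 2 could look like the following (the compartment, availability domain and instance OCID variables are placeholders; the iSCSI login commands from step 2 still have to be run on the VM afterwards):

for i in $(seq 1 9); do
  VOL_ID=$(oci bv volume create --compartment-id $COMPARTMENT_ID --availability-domain "$AD_NAME" --display-name "lvm$i" --size-in-gbs 500 --wait-for-state AVAILABLE --query 'data.id' --raw-output)
  oci compute volume-attachment attach --type iscsi --instance-id $DB_INSTANCE_OCID --volume-id $VOL_ID
done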

Moving data to the new Volume

The easiest approach is to do an "offline" copy of all the data to the new logical volume. For this I did:

1. Stop everything and unmount /u01
2. Move everything to the new volume:
dd if=/dev/dm-0 of=/dev/dm-1 bs=128M status=progress oflag=direct iflag=direct
Double-check that you pick the appropriate source and target devices. If you want to resize to a smaller target in the same process, you can also use a filesystem-level copy (while preserving all links and permissions); a hedged sketch for this follows after the list. The copy ran at approx. 600 MByte/s, so it took a bit more than 2 hours for our database.
3. Change /etc/fstab to mount the new logical volume (ebs_lv3) instead of ebs_lv and remount /u01
4. Bring the database up again.
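
As mentioned in step 2, a filesystem-level copy is an alternative when shrinking the target; a minimal sketch (the filesystem type, temporary mount point and read-only remount are assumptions) could look like this:

sudo mount -o remount,ro /u01
sudo mkfs.xfs /dev/ebs_vg/ebs_lv3
sudo mkdir -p /mnt/newu01
sudo mount /dev/ebs_vg/ebs_lv3 /mnt/newu01
sudo rsync -aHAX --numeric-ids /u01/ /mnt/newu01/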

It is also possible to do an online-migration using an approach as described here:

1. Instead of step (5) of the LV creation, do an lvconvert to create a RAID 1 mirror between the existing LUN and the new LUNs:
lvconvert --type mirror -m 1 --mirrorlog core --stripes 8 /dev/ebs_vg/ebs_lv /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
2. If this gives an error like "Number of extents requested (1165823) needs to be divisible by 8.", first extend the logical volume by 1-7 extents to get an evenly divisible size.
3. Then break the mirror:
lvconvert -m0 /dev/ebs_vg/ebs_lv /dev/sdb
4. Reduce the VG by the old LUN:
vgreduce /dev/ebs_vg /dev/sdb
5. Remove the PV:
pvremove /dev/sdb
6. Detach the old 4.5 TB volume using the iSCSI commands, then detach it in the OCI console. Finally, destroy the old/large volume.

Performance

Measuring performance gives us an I/O performance that is a factor of 7.7 (out of the theoretical maximum of 9) above the performance of a single volume. The throughput is a bit more than a factor of 3 higher than with a single volume:
Max IOPS = 178,571
Max MBPS = 1,409
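
One hedged way to double-check such numbers from the operating system side is a read-only fio run against the striped logical volume (device path as created above; the parameters are examples, not what produced the figures here):

sudo fio --name=lvmtest --filename=/dev/ebs_vg/ebs_lv3 --readonly --direct=1 --rw=randread --bs=8k --ioengine=libaio --iodepth=64 --numjobs=8 --runtime=60 --time_based --group_reporting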

This comparatively poor increase in throughput is due to the fact that we have hit the network speed of our 12 OCPU VM: with "only" 12 Gbps of network bandwidth, we cannot reach more than approx. 1,400 MByte per second.

To check whether this scales even further with the added network bandwidth, we changed the number of OCPUs from 12 to 20, which gives us:

max_iops = 213,391 (theoretical max. by storage: 225,000)
max_mbps = 2,115 (theoretical max. by storage: 4,320; by network: 2,500 MByte/s)

Regarding throughput, this is again close to the physical limit of the network interface, and as we can see, the IOPS went up a bit as well.

Overall, the performance was more or less the same as with ASM. Note that for ASM I chose to use only 7 LUNs (since that was enough to contain all the data; in addition, I had a 1 TB LUN for the Oracle Home/...), while with LVM everything went to the same logical volume, requiring 9 LUNs (and thus providing even better performance).

Summary

Using multiple block storage devices behind a single striped LVM logical volume provides a convenient and reliable way to store the contents of an Oracle database. In Oracle Cloud Infrastructure, more than a single block volume can easily be used to store the data. By striping the data across multiple volumes, one gets much better I/O performance than with the single block volume Oracle provides "out of the box". For a database with multiple terabytes of storage, this is a massive improvement in performance without additional OCI costs.

There are only two major drawbacks of this approach:

  • Unfortunately, so far it is not possible to perform a fast clone using E-Business Suite Cloud Manager. There is an enhancement request to add support for similar functionality.
  • Creating a backup and restoring it (e.g. for cloning production to test) using the Cloud Manager tooling is possible. However, the target will again have just a single block volume. This leads to the testing environment differing (slightly, and hopefully only with respect to performance) from production.

To wrap up, I would recommend the LVM approach over ASM for the time being - especially if you do not already use ASM for your "other" Oracle databases.

The numbers with this approach are so excellent that, for now, I will not give an "LVM cache" on top of DenseIO instances with locally attached NVMe storage a try (as I had initially considered). According to the documentation, this would not help much for an 8 OCPU instance anyway, given that DenseIO would then also "only" provide 250,000 IOPS. And it adds a lot of complexity and additional cost, especially since no "DenseIO E3 shape" is currently available.