SAP on Amazon Web Services (AWS)

A collection of AWS SAP-related resources. This is a work in progress. Please revisit this page from time to time.

AWS (Amazon Web Services) SAP Benchmarks in Cloud Environments

Resources

Benchmarks published

Two-tier Internet Configuration

2-Tier Internet Configuration
Certification Number Date Benchmark Instances  OS 
2016021 May 2016 Sales and Distribution x1.32xlarge instance  Windows Server 2012 R2 Standard Edition
2015032 July 2015 Sales and Distribution m4.10xlarge instance  Windows Server 2012 Standard Edition
2015006 Mar. 2015 Sales and Distribution c4.8xlarge instance  Windows Server 2012 Standard Edition
2015005 Mar. 2015 Sales and Distribution c4.4xlarge instance  Windows Server 2012 Standard Edition
2014041 Oct. 2014 Sales and Distribution c3.8xlarge instance  Windows Server 2012 Standard Edition
2014035 June 2014 Sales and Distribution r3.8xlarge instance  Windows Server 2012 Standard Edition
2014010 Mar. 2014 Sales and Distribution cr1.8xlarge instance  Windows Server 2008 R2 Datacenter

 

Three-tier Internet Configuration

3-Tier Internet Configuration
Certification Number Date Benchmark Instances  OS 
2013035 Nov. 2013 Sales and Distribution  9 m2.4large instances Windows Server 2008 R2 Datacenter

 

SAP BW Enhanced Mixed Load (BW EML)

SAP BW EML
Certification Number Date Benchmark Ad-Hoc Navigation Steps/Hour Instances  OS 
2014001 Jan. 2014 SAP BW Enhanced Mixed Load (BW EML) 500,000 records 113390 1 cr1.8xlarge DB server + 2 c3.8xlarge appl. server instances SuSE Linux Enterprise Server 11
2014013 Apr. 2014 SAP BW Enhanced Mixed Load (BW EML) 5,000,000 records 137510 cr1.8xlarge DB server + 2 c3.8xlarge appl. server instances

SuSE Linux Enterprise Server 11 (DB Server), Windows Server 2008R2 Datacenter Edition (app servers)

2014014 Apr. 2014 SAP BW Enhanced Mixed Load (BW EML) 2,000,000,000 records 177590 cr1.8xlarge DB server + 3 c3.8xlarge appl. server instances

SuSE Linux Enterprise Server 11 (DB Server), Windows Server 2008R2 Datacenter Edition (app servers)

 

AWS Data Provider for SAP

 Resources

Users in the Chinese region will have to use:

Testing the Collector

A correctly operating collector runs a web server which reports the current status through a URL of the following form:

http://localhost:8888/vhostmd

For security reasons, the collector is supposed to bind to localhost only.
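
A quick functional test, run on the instance itself, is to fetch this URL (the installation document below uses the same check):

curl http://localhost:8888/vhostmd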

Information flow around the AWS Data Provider for SAP

Data Provider Installation through AWS Systems Manager (for SLES)

Prerequisites

Execute the following two steps to enable an instance to be managed by Systems Manager:

  • Add the AWS managed policy AmazonSSMAutomationRole to the role of the instance
  • Install the Systems Manager agent (SSM Agent) according to the AWS documentation

Creation of a Systems Manager Document

Use the AWS console.

Move to "System Manager"->"Documents". Create a new document with the following content:

{
  "schemaVersion" : "2.2",
  "description" : "Command Document Example JSON Template",
  "mainSteps" : [ {
    "action" : "aws:runShellScript",
    "name" : "test",
    "inputs" : {
      "runCommand": [ "wget https://s3.amazonaws.com/aws-data-provider/bin/aws-agent_install.sh;",
        "chmod ugo+x aws-agent_install.sh;",
        "sudo ./aws-agent_install.sh;",
        "curl http://localhost:8888/vhostmd"
      ],
      "workingDirectory":"/tmp",
      "timeoutSeconds":"3600",
      "executionTimeout":"3600"
    }
  } ]
}

Save the document with the name SAP-Data-Provider-Installation-Linux.
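
If you prefer the command line over the console, the same document can be created with the AWS CLI. A minimal sketch, assuming the JSON above has been saved locally as sap-data-provider.json (the file name is illustrative):

aws ssm create-document --name "SAP-Data-Provider-Installation-Linux" \
   --document-type "Command" \
   --content file://sap-data-provider.json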

Command Line Execution of the Systems Manager Document

New data providers can then be provisioned with the AWS console or the following AWS CLI command:

aws ssm send-command --document-name "SAP-Data-Provider-Installation-Linux" \
   --comment "SAP Data Provider Installation" --targets "Key=instanceids,Values=i-my-instance-id"  \
   --timeout-seconds 600 --max-concurrency "50" \
   --max-errors "0" --region my-region

Replace the following variables:

  • i-my-instance-id with the instance id
  • my-region with the matching region
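
The result of the execution can be reviewed afterwards. A sketch, using the command id returned by send-command as a placeholder:

aws ssm list-command-invocations --command-id "my-command-id" \
   --details --region my-region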

 

AWS Quickstarts for SAP

AWS offers CloudFormation templates to automate the installation of SAP applications.

Name / Manual / Launch / GitHub sources
SAP HANA / Deployment Guide / Launch / quickstart-sap-hana
Netweaver ABAP / Quick Start Reference Deployment / new VPC, existing VPC / quickstart-sap-netweaver-abap

 

Command Line Creation of an AWS Instance for SAP HANA

The bash script shown here allows you to create SAP HANA instances from the command line. It uses the AWS CLI; the aws command needs to be in the search path.

Consider using the AWS Quick Start for HANA deployment. It provisions the AWS instance as well as the HANA software.

The script below will create an AWS instance only. The script solves a number of issues for administrators:

The limitations

The script requires a file with the name disks.json. The file shown below is configured to create a 200 GB boot disk, four 667 GB gp2 data volumes, and one 50 GB gp2 volume.

Warning

This script will create AWS resources that AWS will charge you for. Be careful using this script. I don't warrant anything.

Preconditions

Download

Using the Script

Provide all parameters when calling it in the following format:

./createHANA.sh ami-id pem-name instance-type ip-address cidr security-group "name tag"

Example:

./createHana.sh ami-6b5a5601 myPEM r3.4xlarge 10.79.7.95 10.79.7.0/24 mySecGroup "my-HANA-System95"

*** 1. Prerequisites checking
*** 1.1 OK: requested IP address 10.79.7.95 is available
*** 1.2 OK: requested CIDR 10.79.7.0/24 belongs to subnet-6b964f32 in us-east-1c , vpc-2e976742
*** 1.3 Warning: No check whether 10.79.7.95 fits into CIDR 10.79.7.0/24 !
*** 1.4 OK: requested security group mySecGroup is sg-582eec37 (Connect to Lab)
*** 1.5 OK: requested PEM myPEM exists.
*** 1.6 OK: AMI ami-6b5a5601 exists (amazon/suse-sles-12-sp1-v20160322-hvm-ssd-x86_64)
Do you want to create this instance? (y/n) Yes
*** 2. About to create the instance
*** 2.1 Created system with Id: i-9dd0e707
*** 2.2 Tagged system with Id: i-9dd0e707 with Name: my-HANA-System95
*** 2.3.1 Tagged all volumes from system with Id: i-9dd0e707 with Name: my-HANA-System95
*** 2.4 The created instance: i-9dd0e707 with Name: my-HANA-System95
RESERVATIONS 752040392274 r-684b74d9
INSTANCES 0 x86_64 None True xen ami-6b5a5601 i-9dd0e707 r3.4xlarge myPEM 2016-06-24T13:47:25.000Z None 10.79.7.95 None /dev/sda1 ebs True None subnet-6b964f32 hvm vpc-2e976742
BLOCKDEVICEMAPPINGS /dev/sda1
EBS 2016-06-24T13:47:26.000Z True attaching vol-8c836528
BLOCKDEVICEMAPPINGS /dev/sdf
EBS 2016-06-24T13:47:26.000Z True attaching vol-0f8365ab
BLOCKDEVICEMAPPINGS /dev/sdg
EBS 2016-06-24T13:47:26.000Z True attaching vol-0e8365aa
BLOCKDEVICEMAPPINGS /dev/sdh
EBS 2016-06-24T13:47:26.000Z True attaching vol-fe83655a
BLOCKDEVICEMAPPINGS /dev/sdi
EBS 2016-06-24T13:47:26.000Z True attaching vol-e983654d
BLOCKDEVICEMAPPINGS /dev/sdj
EBS 2016-06-24T13:47:26.000Z True attaching vol-ac836508
MONITORING pending
NETWORKINTERFACES None 0e:0f:2c:30:0b:b7 eni-522c1700 752040392274 10.79.7.95 True in-use subnet-6b964f32 vpc-2e976742
ATTACHMENT 2016-06-24T13:47:25.000Z eni-attach-c0de2d15 True 0 attaching
GROUPS sg-582eec37 mySecGroup
PRIVATEIPADDRESSES True 10.79.7.95
PLACEMENT us-east-1c None default
SECURITYGROUPS sg-582eec37 mySecGroup
STATE 0 pending
TAGS Name my-HANA-System95

The second way to use the script is through an interactive dialog:

./createHANA.sh
Enter AMI name:
ami-6b5a5601
Enter name of security key:
myPEM
Enter instance type:
r3.4xlarge
Enter IP address:
10.79.7.94
Enter CIDR in the format xxx.xxx.xxx.xxx/yy:
10.79.7.0/24
Enter security group name:
mySecGroup
Enter name tags for instance and volumes:
my-HANA-System94
*** 1. Prerequisites checking
*** 1.1 OK: requested IP address 10.79.7.94 is available
*** 1.2 OK: requested CIDR 10.79.7.0/24 belongs to subnet-6b964f32 in us-east-1c , vpc-2e976742
*** 1.3 Warning: No check whether 10.79.7.94 fits into CIDR 10.79.7.0/24 !
*** 1.4 OK: requested security group mySecGroup is sg-582eec37 (Connect to Lab)
*** 1.5 OK: requested PEM myPEM exists.
*** 1.6 OK: AMI ami-6b5a5601 exists (amazon/suse-sles-12-sp1-v20160322-hvm-ssd-x86_64)
Do you want to create this instance? (y/n) Yes
*** 2. About to create the instance
*** 2.1 Created system with Id: i-30ddeaaa
*** 2.2 Tagged system with Id: i-30ddeaaa with Name: my-HANA-System94
*** 2.3.1 Tagged all volumes from system with Id: i-30ddeaaa with Name: my-HANA-System94
*** 2.4 The created instance: i-30ddeaaa with Name: my-HANA-System94
RESERVATIONS 752040392274 r-2a49769b
INSTANCES 0 x86_64 None True xen ami-6b5a5601 i-30ddeaaa r3.4xlarge myPEM 2016-06-24T13:53:11.000Z None 10.79.7.94 None /dev/sda1 ebs True None subnet-6b964f32 hvm vpc-2e976742
BLOCKDEVICEMAPPINGS /dev/sda1
EBS 2016-06-24T13:53:12.000Z True attaching vol-5a8d6bfe
BLOCKDEVICEMAPPINGS /dev/sdf
EBS 2016-06-24T13:53:12.000Z True attaching vol-b38a6c17
BLOCKDEVICEMAPPINGS /dev/sdg
EBS 2016-06-24T13:53:12.000Z True attaching vol-b28a6c16
BLOCKDEVICEMAPPINGS /dev/sdh
EBS 2016-06-24T13:53:12.000Z True attaching vol-5b8d6bff
BLOCKDEVICEMAPPINGS /dev/sdi
EBS 2016-06-24T13:53:12.000Z True attaching vol-4e8d6bea
BLOCKDEVICEMAPPINGS /dev/sdj
EBS 2016-06-24T13:53:12.000Z True attaching vol-198d6bbd
MONITORING pending
NETWORKINTERFACES None 0e:cc:0c:19:f4:b1 eni-621c2730 752040392274 10.79.7.94 True in-use subnet-6b964f32 vpc-2e976742
ATTACHMENT 2016-06-24T13:53:11.000Z eni-attach-0ec635db True 0 attaching
GROUPS sg-582eec37 mySecGroup
PRIVATEIPADDRESSES True 10.79.7.94
PLACEMENT us-east-1c None default
SECURITYGROUPS sg-582eec37 mySecGroup
STATE 0 pending
TAGS Name my-HANA-System94

The script createHANA.sh

#!/bin/bash
# version 1.0 June 24, 2016
# This script is using the AWS cli.
# It assumes that the aws command is part of the search path
AMI=$1
PEM=$2
INSTANCETYPE=$3
IP=$4
CIDR=$5
SGNAME=$6
NAMETAG=$7
case $1 in
-h | -help)
echo "Use this command with the following options:"
echo "$0 -h : to obtain this output"
echo "$0 -help : to obtain this output"
echo "$0 : enter information through a dialog"
echo "$0 ami-id pem-name instance-type ip-address cidr security-group \"name tag\" "
echo "Example:"
echo " ./createHana.sh ami-6b5a5601 myPEM r3.4xlarge 10.79.7.96 10.79.7.0/24 mySecGroup \"my-HANA-System96\""
exit
;;
esac
if [[ -z $AMI ]]; then
echo "Enter AMI name:"
read AMI
fi
if [[ -z $PEM ]]; then
echo "Enter name of security key:"
read PEM
fi
if [[ -z $INSTANCETYPE ]]; then
echo "Enter instance type:"
read INSTANCETYPE
fi
if [[ -z $IP ]]; then
echo "Enter IP address:"
read IP
fi
if [[ -z $CIDR ]]; then
echo "Enter CIDR n the format xxx..xxx.xxx.xxx/yy:"
read CIDR
fi
if [[ -z $SGNAME ]]; then
echo "Enter security group name:"
read SGNAME
fi
if [[ -z $NAMETAG ]]; then
echo "Enter name tags for instance and volumes:"
read NAMETAG
fi


echo "*** 1. Prequisites checking"
EXISTINGIP=$(aws ec2 describe-network-interfaces --filter Name=private-ip-address,Values=$IP | awk -F'\t' '/PRIVATEIPADDRESSES/ {print $3}' | grep $IP)
if [ $EXISTINGIP ]
then
INSTID=$(aws ec2 describe-network-interfaces --filter Name=private-ip-address,Values=$IP | awk -F'\t' '/ATTACHMENT/ {print $6}')
echo "*** 1.1 ERROR: requested IP address $IP is already in use by instance $INSTID. Will stop here..."
exit 1
else
echo "*** 1.1 OK: requested IP address $IP is available"
fi
SUBNET=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $8}')
AZ=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $2}')
VPC=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $9}')
if [ $SUBNET ]
then
echo "*** 1.2 OK: requested CIDR $CIDR belongs to $SUBNET in $AZ , $VPC"
else
echo "*** 1.2 ERROR: no subnet found for CIDR $CIDR . Will stop here..."
exit 1
fi
echo "*** 1.3 Warning: No check whether $IP fits into CIDR $CIDR !"
SECURITY=$(aws ec2 describe-security-groups --filters Name=group-name,Values=${SGNAME} | awk -F'\t' '/SECURITYGROUPS/ {print $3}')
SECURITYTEXT=$(aws ec2 describe-security-groups --filters Name=group-name,Values=${SGNAME} | awk -F'\t' '/SECURITYGROUPS/ {print $2}')
if [ $SECURITY ]
then
echo "*** 1.4 OK: requested security group $SGNAME is $SECURITY ($SECURITYTEXT)"
else
echo "*** 1.4 ERROR: requested security group $SGNAME not found. Will stop here"
exit 1
fi
PEMRESULT=$(aws ec2 describe-key-pairs --filters Name=key-name,Values=$PEM | awk -F'\t' '/KEYPAIRS/ {print $3}')
if [ $PEMRESULT ]
then
echo "*** 1.5 OK: requested PEM $PEM exists."
else
echo "*** 1.5 ERROR: requested PEM $PEM not found. Will stop here"
exit 1
fi
AMINAME=$(aws ec2 describe-images --image-ids $AMI | awk -F'\t' '/IMAGES/ {print $6}')
if [ $AMINAME ]
then
echo "*** 1.6 OK: AMI $AMI exists ($AMINAME)"
else
echo "*** 1.6 ERROR: AMI $AMI does not exist. Will stop here"
exit 1
fi
echo -n "Do you want to create this instance? (y/n) "
old_stty_cfg=$(stty -g)
stty raw -echo ; answer=$(head -c 1) ; stty $old_stty_cfg # Care playing with stty
if echo "$answer" | grep -iq "^y" ;then
echo Yes
else
echo No
exit
fi
echo "*** 2. About to create the instance"
ID=$(aws ec2 run-instances \
--key-name $PEM \
--instance-type $INSTANCETYPE \
--count 1 \
--block-device-mappings file://disks.json \
--image-id $AMI \
--monitoring Enabled=true \
--instance-initiated-shutdown-behavior stop \
--security-group-ids $SECURITY \
--subnet-id $SUBNET \
--private-ip-address $IP \
--ebs-optimized | \
awk '/INSTANCES/ {print $8}' \
)
echo "*** 2.1 Created system with Id: $ID"
aws ec2 create-tags --resources $ID --tags Key=Name,Value=${NAMETAG}
echo "*** 2.2 Tagged system with Id: $ID with Name: $NAMETAG "
#echo "*** 2.3.0 will sleep for 2s before tagging the volumes with $NAMETAG "
#sleep 2
aws ec2 describe-instances --instance-ids $ID | awk '/EBS/ {print "aws ec2 create-tags --resources " $5 " --tags Key=Name,Value='"$NAMETAG"'" }' | bash -
echo "*** 2.3.1 Tagged all volumes from system with Id: $ID with Name: $NAMETAG "
echo "*** 2.4 The created instance: $ID with Name: $NAMETAG "
aws ec2 describe-instances --instance-ids $ID

The file disks.json

This file has to be in the directory from which the script is called.

[
  {"DeviceName":"/dev/sda1",
   "Ebs":{"VolumeSize":200,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdf",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdg",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdh",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdi",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdj",
   "Ebs":{"VolumeSize":50,"VolumeType":"gp2","DeleteOnTermination":true}}
]

 Feedback

The script is limited. Leave a comment to get in touch with me. I'll be happy to improve the script and integrate better code.

Configuring SAProuter (as a service) on Linux

Installing a saprouter on Linux is straightforward.

... at least without using SNC.

SAP Routers can be used to

The playbook for the installation is

Have a routing table file for saprouter

Create a configuration file with the name saprouttab. The simplest one, which routes all ABAP traffic in all directions, is a file /usr/sap/saprouter/saprouttab with the content:

P * * *

This means: P(ermit) all source IPs/hostnames to all destination IPs/hostnames on any destination port.
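
A slightly more restrictive saprouttab is often preferable. The following sketch uses illustrative addresses and host names; each entry has the format P/D source destination service:

# permit SAP GUI traffic (instance 00) from the admin network to one application server
P 192.168.1.* sapapp01 3200
# deny everything else
D * * *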

Create a Policy which grants Access to an S3 Bucket to Download all required Software

Create a policy which looks like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucket-name/bucket-folder/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:HeadBucket"],
      "Resource": "arn:aws:s3:::bucket-name"
    }
  ]
}

Replace the following variables with your individual settings:

Add this policy to a new role.

Attach the role to the instance when you create it.
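
The same steps can be scripted with the AWS CLI. This is a sketch only; the policy, role, and file names are illustrative, the account id is a placeholder, and the instance profile name saprouter-inst matches the run-instances call further down:

# policy document from above saved as s3-download-policy.json
aws iam create-policy --policy-name saprouter-s3-download \
   --policy-document file://s3-download-policy.json
# role with an EC2 trust policy (ec2-trust.json)
aws iam create-role --role-name saprouter-role \
   --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name saprouter-role \
   --policy-arn arn:aws:iam::111122223333:policy/saprouter-s3-download
# wrap the role into an instance profile which can be attached to the instance
aws iam create-instance-profile --instance-profile-name saprouter-inst
aws iam add-role-to-instance-profile --instance-profile-name saprouter-inst \
   --role-name saprouter-role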

Creation of a Service

SLES 12, SLES 15, or Red Hat will need a systemd service to restart the saprouter whenever needed. Create a file saprouter.service:

[Unit]
Description=SAP Router Configuration
After=syslog.target network.target

[Service]
Type=simple
RemainAfterExit=yes
WorkingDirectory=/usr/sap/saprouter
ExecStart=/usr/sap/saprouter/saprouter -r
ExecStop=/usr/sap/saprouter/saprouter -s
KillMode=none
Restart=no

[Install]
WantedBy=multi-user.target

Start the service with the commands:

systemctl daemon-reload
systemctl enable saprouter.service
systemctl start saprouter.service

Create an Installation Script

Create a file install.sh:

#!/usr/bin/env bash
# version 0.2
# December, 2018
## Run script as super user:
# This script needs one parameter, the URL to access the S3 bucket
# with all downloadable files
# Use the notation s3://my-bucket/myfolder
##BUCKET="s3://stefanschneider-saptesting/saprouter"
BUCKET=$1
SAPSAPROUTTAB="saprouttab"
SERVICE="saprouter.service"
ROUTDIR="/usr/sap/saprouter"
echo "*** 1. Create /usr/sap/saprouter"
mkdir -p ${ROUTDIR}/install
echo "*** 2. Download files"
aws s3 sync ${BUCKET} ${ROUTDIR}/install
cd ${ROUTDIR}/install
# All files will become lower case files
for f in `find`; do mv -v "$f" "`echo $f | tr '[A-Z]' '[a-z]'`"; done
chmod u+x uninstall.sh
mv uninstall.sh ..
mv ${SERVICE} /etc/systemd/system/${SERVICE}
for f in `find . -name saprouter*.sar`; do mv -v $f saprouter.sar; done
for f in `find . -name sapcryptolib*.sar`; do mv -v $f sapcryptolib.sar; done
for f in `find . -name sapcar*`; do mv -v $f sapcar; done
chmod u+x sapcar
mv saprouttab ..
echo "*** 3. Unpack files"
cd ${ROUTDIR}
./install/sapcar -xf ${ROUTDIR}/install/saprouter.sar
./install/sapcar -xf ${ROUTDIR}/install/sapcryptolib.sar
echo "*** 4. Start service"
systemctl daemon-reload
systemctl enable ${SERVICE}
systemctl start ${SERVICE}
echo "5. Done..."

The script will work if there are three unique files in the download bucket which are the only ones with names like sapcar*, sapcrypto*.sar, and saprouter*.sar. Capitalization will not matter. Update the bucket-name and the bucket-folder values to match your individual needs.

Create a De-installation Script

Create a file with the name uninstall.sh:

#!/usr/bin/env bash
# version 0.1
# December, 2018
## Run as super user:
echo "1. Stopping and disabling service"
systemctl stop saprouter.service
systemctl disable saprouter.service
systemctl daemon-reload
echo "2. Removing files"
rm /etc/systemd/system/saprouter.service
rm -rf /usr/sap/saprouter
echo "3. Completed deinstallation"

Files Upload

Upload the following files to the S3 bucket:

There is no need to make this bucket public. The instance will have an IAM profile which entitles the instance to download the files needed.
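
An upload could look like the following sketch. The SAR archive and SAPCAR file names are illustrative; install.sh only requires that they match the patterns sapcar*, saprouter*.sar, and sapcryptolib*.sar:

aws s3 cp install.sh s3://bucket-name/bucket-folder/
aws s3 cp uninstall.sh s3://bucket-name/bucket-folder/
aws s3 cp saprouttab s3://bucket-name/bucket-folder/
aws s3 cp saprouter.service s3://bucket-name/bucket-folder/
aws s3 cp SAPCAR_1234.EXE s3://bucket-name/bucket-folder/
aws s3 cp saprouter_123.sar s3://bucket-name/bucket-folder/
aws s3 cp sapcryptolib_456.sar s3://bucket-name/bucket-folder/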

Create a UserData file on your Administration PC

Create a file prep.sh:

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
BUCKET="s3://bucket-name/bucket-folder"
# take a one second nap before moving on...
sleep 1
aws s3 cp ${BUCKET}/install.sh /tmp/install.sh
chmod u+x /tmp/install.sh
/tmp/install.sh $BUCKET
--//

Replace bucket-name and bucket-folder with the appropriate values.

This file will get executed when the instance gets created.

Installation of Instance

The following command will launch an instance with an automated saprouter installation. It assumes that

The command is

aws ec2 run-instances --image-id ami-XYZ \
--count 1 --instance-type m5.large \
--key-name aws-key \
--associate-public-ip-address \
--security-group-ids sg-XYZ \
--subnet-id subnet-XYZ \
--iam-instance-profile Name=saprouter-inst \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=PublicSaprouter}]' \
--user-data file://prep.sh

This command will create an instance with

Installation as a VPC-internal saprouter relaying traffic from on-premises users

Omit the parameter --associate-public-ip-address. This parameter creates a public IP address. You don't want this for an internal saprouter.

Installation with the help of an AWS Cloudformation template

Use this template (saprouter.template). It works with SLES 12 SP3. Replace the AMIs if you need a newer revision.

  1. Upload the template to an S3 bucket
  2. Upload the SAP installation media and the file saprouttab to an S3 bucket
  3. Execute the file in CloudFormation

Warning: Please check the template upfront. It'll allocate resources in your AWS account. It has the potential to do damage.

More Information

Consult the SAP documentation to configure SNC or more detailed routing entries.

HANA Cheat Sheet

Starting and stopping HANA 

Start HANA instance with hostctrl as root:

/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start

Stop HANA instance with hostctrl as root:

/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Stop

Start HANA as <sid>adm:

/usr/sap/<SID>/HDB<instance number>/HDB start

Example: /usr/sap/KB1/HDB26/HDB start

Stop the SAP HANA system as <sid>adm by entering the following command:

/usr/sap/<SID>/HDB<instance number>/HDB stop

HANA Backups Command Line

Systems with XSA may have multiple tenants which all need to get backed up. Example as <sid>adm user:

$ hdbsql -u system -d systemdb -i 00 "BACKUP DATA USING FILE ('backup')"
$ hdbsql -u system -d systemdb -i 00 "BACKUP DATA FOR HDB USING FILE ('backup')"

High Availability Solutions for SAP on AWS

The SAP on Amazon Web Services High Availability Guide describes Windows and Linux architectures with failover scenarios.

This page focuses on solutions which can automatically fail over SAP services from one AWS server to another. 

The AWS cloud implements high availability in a different way than traditional on-premises implementations do:

SAP has a list of certified HA-Interface partners. AWS is not part of this list since the certified HA-Interface partners support the AWS platform in their configurations. The following partners and solutions are known to support the AWS platform:

NEC Express Cluster 3.3

Product: NEC Express Cluster 3.3 (Product landing page)

Failover Services: HANA scale-up databases on Red Hat Linux

Licensing: NEC licenses depending on the services

Status: released, supported

The NEC Cluster relies on the SAP HANA system replication. It works across AWS availability zones within a region.

The NEC cluster uses AWS overlay IP addresses which support a fast failover. The NEC cluster will not shut down a node which is no longer providing the service; it will fail over to the standby node.

More Resources

AWS Specific Configuration Details

Be aware that the NEC cluster will change the network topology. The privileges required for these operations allow changing the AWS network topology in an account. Verify and test all entries very carefully. Limit access for users working on the NEC Express Cluster nodes to the required minimum.

Required Routing Entries

The NEC Cluster will typically operate in a single VPC. The cluster nodes are typically located in different availability zones for increased availability. Therefore they will have their primary IP addresses in different subnets.

The AWS overlay IP addresses are based on a concept which allows creating routing entries that direct traffic for an IP address to an instance (NEC cluster node). The NEC Express Cluster will change these routing entries when needed. It will, however, not create the routing entries. The initial creation of the routing entries needs to happen manually. The same routing entry has to be created in all routing tables of the given VPC.

The AWS VPC console can be used to add this entry. The legacy EC2 API tools offer the following command as well:

ec2addrt ROUTE_TABLE -r CIDR -i INSTANCE

The user will have to pick an arbitrary AWS instance id from a cluster node as option -i. The NEC Express cluster will then update this entry as needed.
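
With the current AWS CLI the initial entry can be created roughly as follows (route table id, overlay IP address, and instance id are placeholders); repeat the call for every routing table of the VPC:

aws ec2 create-route --route-table-id rtb-xxxxxxxx \
   --destination-cidr-block 192.168.10.21/32 \
   --instance-id i-xxxxxxxxxxxxxxxxx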

The NEC cluster will only operate correctly if the routing entry has been created in all routing tables of the VPC!

AWS Instance Configuration for Cluster Nodes

The AWS cluster nodes will have to be able to communicate through a second IP address. The document IP Failover with Overlay IP Addresses on this site describes how to disable the source/destination check for AWS instances and how to host a second IP address on the same Linux system.

IAM Policies: NEC-HA-Policy

The cluster nodes will require the following privileges to operate:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1424870324000",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceAttribute",
        "ec2:DescribeTags",
        "ec2:DescribeVpcs",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Stmt1424860166260",
      "Action": [
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:DescribeRouteTables",
        "ec2:ReplaceRoute"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

 

Red Hat Pacemaker for SAP Applications

Red Hat supports the protection of the SAP HANA DB starting with Red Hat 7.4 on AWS.

Access to documentation requires a Red Hat customer account with the appropriate entitlement. Please read:

 

Bad Hair Days (with Red Hat Pacemaker)

This page documents known problems with the Red Hat Pacemaker cluster. The problems typically arise from incorrect configurations...

Symptom: Virtual IP Service doesn't start

Problem: A manual start leads to the following problem:

[root@myNode1 ~]# pcs resource debug-start s4h_vip_ascs20 --full
... ...
> stderr: Unknown output type: test
> stderr: WARNING: command failed, rc: 255

Solution: Fix the AWS CLI configuration. The output format may be wrong; it has to be text.

[root@myNnode1 ~]# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [us-east-1]:
Default output format [test]: text
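
The configured value can be verified without re-running the dialog, for example (add --profile <profile-name> if the cluster agents use a dedicated profile):

aws configure get output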

 

SUSE SLES for SAP

Product: SLES for SAP 12 (Product landing page)

Failover Services: HANA Scale Up databases and Netweaver central systems

Licensing: Bring your own SUSE subscription or use the AWS Marketplace SUSE Linux Enterprise Server for SAP Applications 12 SP3 offering.

Status: Full support starting with SLES for SAP 12 SP1

This product relies on SAP HANA system replication. It will monitor the master and the slave node for health. The Linux cluster will fail over a service IP address to the previous slave node when needed. The fencing agents will then reboot the previous master node.

See:

More Resources:

 

Trouble Shooting the Configuration

Verification and debugging of the aws-vpc-move-ip Cluster Agent

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=cluster OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip monitor

Check the console output (DEBUG keyword) for error messages.

Stop the overlay IP Address to be hosted on a given Node

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=cluster OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip stop 

Check the DEBUG output for errors and verify that the virtual IP address is NOT active on the current node with the command ip a.

Start the overlay IP Address to be hosted on a given Node

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=<AWS-profile> OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip start

Check DEBUG output for error messages and verify that the virtual IP address is active on the current node with the command ip a.

Testing the Stonith Agent

The STONITH agent will shut down the other node if it thinks that this node is no longer reachable. The agent can be called manually as super user on cluster node 1 to shut down cluster node 2. Use it with the same parameters as used in the STONITH agent configuration:

# stonith -t external/ec2 profile=<AWS-profile> port=<cluster-node2> tag=<aws_tag_containing_hostname> -T off <cluster-node2>

This command will shut down cluster node 2. Check the errors reported during execution of the command if it does not work as planned.
Restart cluster node 2 and test STONITH the other way around.

The parameters used here are:

  • AWS-profile : The profile which will be used by the AWS CLI. Check the file ~/.aws/config for the matching one. Using the AWS CLI command aws configure list will provide the same information
  • cluster-node2: The name or IP address of the other cluster node
  • aws_tag_containing_hostname: This is the name of the tag of the EC2 instances for the two cluster nodes. We used the name pacemaker in this documentation

Checking Cluster Log Files

Check the file: /var/log/cluster/corosync.log
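
A quick way to spot problems is to filter the log for errors and warnings, for example:

grep -iE "error|warn" /var/log/cluster/corosync.log | tail -50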

Useful Commands

As super user:

crm_resource -C Reset warnings showing up in the command crm status
crm configure edit Configure all agents in vi
crm configure property maintenance-mode=true Set Pacemaker into maintenance mode. This allows you to reconfigure, start, stop, and resync SAP HANA.
crm configure property maintenance-mode=false Bring Pacemaker from maintenance mode back into controlling, production mode. Allow Pacemaker to explore the current configuration. This can take a few seconds.

SAP HANA related commands (as <sid>adm user)

hdbcons -e hdbindexserver 'replication info' Check whether HANA is replicating, detailed
hdbnsutil -sr_state  Check whether HANA is replicating. Show the master, slave relationship
 SAPHanaSR-showAttr  Cluster tool which checks the current configuration. Run as super user

 

Bad Hair Days (with SLES for SAP)

Bugs I ran into:

Symptom: Virtual IP Address doesn't get hosted

Manual testing of virtual IP address agent (start option) creates the following output:

INFO: EC2: Moving IP address 192.168.10.22 to this host by adjusting routing table rtb-xxx 
INFO: monitor: check routing table (API call) 
DEBUG: executing command: /usr/bin/aws --profile cluster --output text ec2 describe-route-tables --route-table-ids rtb-xxx 
DEBUG: executing command: ping -W 1 -c 1 192.168.10.22 
WARNING: IP 192.168.10.22 not locally reachable via ping on this system 
INFO: EC2: Adjusting routing table and locally configuring IP address 
DEBUG: executing command: /usr/bin/aws --profile cluster ec2 replace-route --route-table-id rtb-xxx --destination-cidr-block 192.168.10.22/32 --instance-id i-1234567890 
DEBUG: executing command: ip addr delete 192.168.10.22/32 dev eth0 
RTNETLINK answers: Cannot assign requested address 
WARNING: command failed, rc 2
INFO: monitor: check routing table (API call)

The host can't add the IP address to eth0.

Problem: SUSE netconfig hasn't been disabled

Solution: Set CLOUD_NETCONFIG_MANAGE='no' in /etc/sysconfig/network/ifcfg-eth0

Symptom: Virtual IP Address gets removed after some minutes

corosync logs show a line like:

rsc_ip_XXX_XXXX_start_0:17147:stderr [ An error occurred (UnauthorizedOperation) when calling the ReplaceRoute operation: You are not authorized to"

Problem: The instance does not have the right to modify routing tables

Solution: The virtual IP address policy has a problem. It may be missing. It may have a typo. Another policy may disallow access to routing tables.

Symptom: Nodes fence each other

The log file shows lines like:

2018-10-11T11:14:06.597541-04:00 my-hostname pengine[1234]: error: Resource rsc_ip_ABC_DEF01 (ocf::aws-vpc-move-ip) is active on 2 nodes attempting recovery
2018-10-11T11:14:06.597766-04:00 my-hostname pengine[1234]: warning: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.

Problem: There is a bug in the aws-vpc-move-ip agent. The monitoring has a glitch. The cluster thinks that both sides host the IP address on eth0 and they fence each other.

Solution: Update the package in question. Contact SUSE if this doesn't work or...

Modify all aws-vpc-move-ip resources in your CIB by adding monapi=true to the parameters of each aws-vpc-move-ip resource.

Symptom: Nodes fence each other

Both nodes shut down. The corosync log looks like:

Jan 07 07:31:17 [4750] my-hostname corosync notice  [TOTEM ] A processor failed, forming new configuration.
Jan 07 07:31:25 [4750] my-hostname corosync notice [TOTEM ] A new membership (w.x.y.z:52) was formed. Members left: 2
Jan 07 07:31:25 [4750] my-hostname corosync notice [TOTEM ] Failed to receive the leave message. failed: 2

Problem: The corosync token didn't arrive 6 times in a row within 5 seconds. Check whether the communication between the two servers works as intended or...

Solution: Increase the following corosync parameter:

  • token: from 5000 to 30000
  • consensus: from 7500 to 32000
  • token_retransmits_before_loss_const: from 6 to 10

You can decrease these parameters later on as long as the cluster runs stably. These changes have the following impact:

  • The cluster will give up on corosync communication after (token) 30 seconds
  • The timeout for an individual token gets increased to token/retransmits: 30000 ms / 10 = 3 s
  • The cluster will attempt (token_retransmits_before_loss_const) 10 times to reestablish communication instead of 6 times
  • The consensus parameter has to be larger than the token parameter

This configuration will increase the time for a cluster to recognize the communication failure and take over!
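
The three values map to the totem section of /etc/corosync/corosync.conf. A sketch showing only the changed settings; keep your remaining configuration as it is:

totem {
        version: 2
        token: 30000
        token_retransmits_before_loss_const: 10
        consensus: 32000
        # ... keep all other existing totem settings unchanged ...
}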

Symptom: Both nodes shut down after a while

The log file shows lines like:

2018-10-12T08:33:10.477900-04:00 xxx stonith-ng[2199]: warning: fence_legacy[32274] stderr: [ An error occurred (UnauthorizedOperation) when calling the StopInstances operation: You are not authorized to perform this operation. Encoded authorization failure message: Q5Edo8F0xvippgHSKd11QKshu_Hhc3Z8Es_D9O4PYkrLrqY_o6ziaM0JkUrCwadpplJsJreOGxwCTEGd-f68XYc82Dz- HqBZmIrwacTFsYxa0fAQLOA6stHTc2OolBqD-X-HsKZ-bOMjAXs69RT04MRAgNVWJPXeAtq4PHZqN5nne8ocnsshgCt_5xkdjGnxp5VsfzE6o75OUtdHKtblq- 8MokX1ItkZKdohocthhQdQyhGlG8HT1loxdDSuG50LE-kHwGo1slNnZOa-Rw3rPKi0tNzpPvDvlMR3_OXwyC
2018-10-12T08:33:10.478589-04:00 xxx stonith-ng[2199]: error: Operation 'poweroff' [32274] (call 56 from crmd.2205) for host 'haawnulsmqaci' with device 'res_AWS_STONITH' returned: -62 (Timer expired)
2018-10-12T08:33:10.478793-04:00 xxx stonith-ng[2199]: warning: res_AWS_STONITH:32274 [ Performing: stonith -t external/ec2 -T off xxx ]
2018-10-12T08:33:10.478978-04:00 xxx stonith-ng[2199]: error: Operation poweroff of haawnulsmqaci by awnulsmqaci for crmd.2205@awnulsmqaci.98fa9afe: Timer expired
2018-10-12T08:33:10.479151-04:00 xxx crmd[2205]: notice: Stonith operation 56/53:87:0:c76c1861-5fd3-4132-a36c-8f22794a6f1b: Timer expired (-62)
2018-10-12T08:33:10.479340-04:00 xx crmd[2205]: notice: Stonith operation 56 for haawnulsmqaci failed (Timer expired): aborting transition.

Problem: A node can't shut down the other one since the STONITH policies are missing or not configured appropriately

Solution: Add the stonith policy as indicated in the installation manual. Make sure that the policy is using the appropriate AWS instance ids. Test them individually!

Symptom: Confusing messages after crm configure commands

Example:

host01:~ # crm configure property maintenance-mode=false
 WARNING: cib-bootstrap-options: unknown attribute 'have-watchdog'
 WARNING: cib-bootstrap-options: unknown attribute 'stonith-enabled'
 WARNING: cib-bootstrap-options: unknown attribute 'placement- strategy'
 WARNING: cib-bootstrap-options: unknown attribute 'maintenance- mode'

Problem:  This is a bug in crmsh. See:  https://github.com/ClusterLabs/crmsh/pull/386 . It shouldn't affect functionality.

Solution: Wait for a fix.

Symptom: Cluster loses quorum after one node leaves the cluster

Problem: A cluster starts but it breaks the quorum

The corosync-quorumtool command lists the following incorrect status:

# corosync-quorumtool
(...)
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2 --> Quorum
Flags: Quorate

A correctly configured cluster will show the following output:

# corosync-quorumtool
(...)
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1 --> Quorum
Flags: 2Node Quorate WaitForAll

Solution: Fix the typo in the corosync configuration.

One line is probably incorrect. It may look like

two_nodes: 1

Remove the plural s and change it to

two_node: 1

 

Checklist for the Installation of SAP Central Systems with SLES HAE

This checklist is supposed to help with the installation of SLES HAE for ASCS protection.

The various identifiers will be needed at different stages of the installation. This checklist should be complete before the SAP and the SLES HAE installation begins.

Tip: Click on "Generate printer friendly layout" at the bottom of the page before you print this file.

Item Status/Value

SLES subscription and update status

  • All systems have a SLES for SAP subscription
  • All systems have been updated to the latest patch level
 

AWS User Privileges for the installing person

  • Creation of EC2 instances and EBS volumes
  • Creation of security groups
  • Creation of EFS file systems
  • Modification of AWS routing tables
  • Creation of policies and attaching them to IAM roles
  • Optional for Route 53 agent installation
    • Create and modify A records in a private hosted zone
  • Potentially needed
    • Creation of subnets and routing tables
 

VPC

  • VPC Id
  • CIDR range of VPC
 
Subnet id A for systems in first AZ  
Subnet id B for systems in second AZ  
Routing table id for subnet A and B
  • Is this routing table in charge of routing both subnets?
    • Is it associated to both subnets?
    • Alternative: Is it associated to VPC?
      • The subnets do not have their own routing tables
 

Optional:

  • Name of hosted Route 53 zone
  • Name of DHCP option set
    • Verify options!
    • Is option set associated to VPC?
 
AWS Policies Creation
  • Name of Data Provider policy
  • Name of STONITH policy
  • Name of Move IP (Overlay IP) policy
  • Optionally: Name of Route53 policy
 

First cluster node (ASCS and ERS)

  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A?
  • instance has all 3 or 4 policies attached?
 
Second cluster node (ASCS and ERS)
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet B?
  • instance has all 3 or 4 policies attached?
 
PAS system
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A or B?
  • instance has data provider policy attached?
 
AAS system
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A or B
  • instance has data provider policy attached?
 
DB system (is potentially node 1 of a database failover cluster)
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A
  • instance has data provider policy attached?
    • a cluster node has 2 to 3 more policies attached
 

 Overlay IP address: service ASCS

  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 
Overlay IP address: service ERS
  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 
Optional: Overlay IP address DB server
  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 

 Optional: Route 53 configuration

  • The Route 53 private hosted zone has an A record with
    • the name of the ASCS system
    • the IP address of the first cluster node
 

 Creation of EFS filesystem

  • DNS name of EFS filesystem
 

 All instances have Internet access

  • Check routing tables
  • Alternative: Add http proxies for data providers and cluster software
 

 

Open Source Agents being used by SLES-for-SAP

SUSE is a dedicated open source provider. SUSE tends to use agents published upstream in the ClusterLabs open source project.

The open source agents published via SLES for SAP are the only ones with SUSE support. Customers have ever-growing requirements. SUSE and AWS work on improving the agents.

This page lists the ClusterLabs agents as well as experimental agents without support.

Current ClusterLabs agent
Name Location in SLES file system GitHub sources As of GitHub commit Comment Shortcomings
STONITH agent /usr/lib64/stonith/plugins/external/ec2 ec2 34a217f on ~ Aug 6, 2018

Stops and monitors EC2 instances.

This version filters the EC2 API calls, which has the following advantages:

  • no problems with Unicode EC2 tags
  • smaller result sets, faster
  • fewer problems with EC2 CLI response syntax
  • doesn't contribute to the EC2 API call limit

Cosmetic:

The --output text option in the AWS CLI calls is missing. This would lower the risk of configuration errors with the AWS profile.

SUSE Bug 1106700: - AWS: ec2 agent has fixes implemented upstream

Move Overlay IP /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip aws-vpc-move-ip 7ac4653 on Sept. 4, 2018 Reassign an AWS overlay IP address in a routing table

Heads up:

This agent is not compatible with the proprietary agent from SUSE. SUSE uses a parameter with the name address. The upstream version uses the parameter name ip.

I haven't yet been able to make this agent work in a SUSE cluster :-(

Bug 1106707 - AWS: aws-vpc-move-ip agent needs maintenance

Pull request for multi routing table support

Route 53 /usr/lib/ocf/resource.d/heartbeat/aws-vpc-route53 aws-vpc-route53.in  7632a85 ~August 6, 2018 Update a record in an AWS Route 53 hosted zone (DNS server)

calls of ec2metadata will fail if the AWS user data contains strings like "local-ipv4". This can happen in specific AWS Quickstart implementations

Bug 1106706 - AWS: Route 53 agent has fixes implemented upstream

There is an ongoing discussion about updating the agents. Here are some experimental agents without any SUSE support.

Experimental ClusterLabs agent
Name Location in SLES file system GitHub sources As of GitHub commit Comment Shortcomings
Move Overlay IP /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip ...soon here... . Reassign an AWS overlay IP address in a routing table. The new monitoring doesn't work when a cluster node rejoins a cluster; use the old monitoring mode by adding the parameter monapi="true" to the primitive. The monitoring function got updated, the new mode works, and no parameter is needed.
Route 53 /usr/lib/ocf/resource.d/heartbeat/aws-vpc-route53 aws-vpc-route53 319ba06 on 2 Jul, 2018 Update a record in an AWS Route 53 hosted zone (DNS server). Calls of ec2metadata will fail if the AWS user data contains strings like "local-ipv4"; this can happen in specific AWS Quickstart implementations. The implementation of ec2metadata has been replaced with a more specific implementation.

 

SLES HAE Cluster Tests with Netweaver on AWS

This is an example of tests to be performed with a SLES HAE HANA cluster.

You will want to execute these tests before going into production.

No. Topic Expected behavior
1.0 Set a node on standby/offline
Set a node on standby by means of Pacemaker Cluster Tools ("crm node standby").
 
The cluster stops all managed resources on the standby node (master resources will be migrated / slave resources will just stop)
1.1 Set <nodenameA> to standby.
 
Time until all managed resources were stopped / migrated to the other node: XX sec
1.2 Set <nodenameB> to standby Time until all managed resources were stopped / migrated to the other node: XX sec
2.0 Switch off cluster node A
Power-off the EC2 instance (hard / instant stop of the VM).
 
The cluster notices that a member node is down. The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.
2.1 Failover time of ASCS / HANA primary XXX sec.
3 Switch off cluster node B
Power-off the EC2 instance (hard / instant stop of the VM).
 
The cluster notices that a member node is down. The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.
3.1 Failover time of ASCS / HANA primary XXX sec.
4 un-plug network connection (Split Brain)
The cluster communication over the network is down.
 

Both nodes detect the split brain scenario and try to fence each other (using the AWS STONITH agent). One node shuts down – the other will take over all resources

Failovertime: XXX sec

5

Failure (crash) of ASCS instance
The processes of the SAP instance are killed via OS command:

ps -ef | grep ASCS | awk '{print $2}' | xargs kill -9

The cluster notices the problem and promotes the ERS instance to ASCS while keeping all locks from the ENQ replication table.

ASCS Failover time: XXX sec

6

Failure of ERS instance
The processes of the SAP instance are killed via OS command:

ps -ef | grep ERS | awk '{print $2}' | xargs kill -9
 

The cluster notices the problem and restarts the ERS instance.

 

Time until ERS got restarted on same node: XX sec
 

7 Failure of HANA primary
 
Time until HANA DB is available again: XXX sec
8 Failure of corosync
Kill the corosync cluster daemon ("kill -9") on one node.
 

The node without corosync is fenced by the remaining node (since it appears down). The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.

Failover of all managed resources: xxx sec

Keep logfiles of all relevant resources to prove functionality. For instance, after an ASCS failover keep a copy of /usr/sap/<SID>/ASCS<nr>/work/dev_enqserver. This logfile should list that an ENQ replication table was found in memory and that all locks got copied into the new ENQ table. Customers may request to acquire ENQ locks before the failover test and then check the status of those locks after successful failover (please document with screenshots of SM12 on both nodes before and after failover).

Keep corosync / cluster log of all actions taken during failover tests.

Ask the customer for additional failover tests / requirements / scenarios they would like to cover.

Have the customer sign the protocol (!) acknowledging that all tested failover scenarios worked as expected.

Remind the customer to regularly re-test all failover scenarios if the SAP / OS / cluster configuration changed or patches were applied.
 

Testing SLES clusters with SAP HANA Database

The following three tests should be done before a HANA DB cluster is taken into production.
The tests will use all configured components.

Primary HANA server becomes unavailable

Simulated Failures

  • Instance failure: the primary HANA instance crashes or is no longer reachable through the network
  • Availability zone failure.

Components getting tested

  • EC2 STONITH agent
  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Shut down eth0 on the instance to isolate it
  • The cluster will shut down the node
  • The cluster will fail over the HANA database
  • The cluster will not restart the failed node

Initial Configuration

Check whether the overlay IP address gets hosted on the interface eth0 on the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

The AWS console shows that both nodes are running:

Screenshot two running nodes

Damage the Instance

There are two ways to "damage" an instance

Corrupt Kernel

Become super user on the master HANA node.

Issue the command:

echo 'b' > /proc/sysrq-trigger

Isolate Instance

Become super user on the master HANA node.

Issue the command:

$ ifdown eth0

The current session will now hang. The system will not be able to communicate with the network anymore.

SUSE has a recommendation to do the isolation with firewalls and iptables.

Monitor Fail Over

Expect the following in a correctly working cluster:

  • The second node will fence the first node. This means it will force a shutdown through AWS CLI commands
  • The first node will be stopped
  • The second node will take over the overlay IP address and it will host the HANA database.

The cluster will now switch the master node and the slave node. 

Monitor progress from the healthy node!

The first node gets reported being offline:

hana02:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Wed Sep 19 13:18:21 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 1537362888 offline logreplay hana02 WDF sync hana01
hana02 PROMOTED 1537363101 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

hana02:/home/ec2-user # crm_mon -1rfn

Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:18:52 2018
Last change: Wed Sep 19 13:18:21 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Node hana01: OFFLINE
Node hana02: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Slave
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started

Inactive resources:

res_AWS_STONITH (stonith:external/ec2): Stopped
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Slaves: [ hana02 ]
Stopped: [ hana01 ]

Migration Summary:
* Node hana02:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 13:18:00 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana02 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 13:18:00 2018', queued=0ms, exec=0ms

The AWS console will now show that the second node has fenced the first node. It gets shut down:

Screenshot: node gets shut down

The second node will wait until the first node is shut down. The AWS console will look like:

 First node being shut down

The cluster will now promote the instance on the second node to be the primary instance:

hana02:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Wed Sep 19 13:19:14 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 1537362888 offline logreplay hana02 WDF sync hana01
hana02 PROMOTED 1537363154 online logreplay hana01 4:P:master1:master:worker:master 100 ROT sync PRIM 2.00.030.00.1522209842 hana02

The cluster status will be the following:

hana02:/home/ec2-user #  crm_mon -1rfn
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:19:16 2018
Last change: Wed Sep 19 13:19:14 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Node hana01: OFFLINE
Node hana02: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started

Inactive resources:

Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Stopped: [ hana01 ]

Migration Summary:
* Node hana02:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 13:18:00 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana02 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 13:18:00 2018', queued=0ms, exec=0ms

Check whether the overlay IP address gets hosted on the eth0 interface of the second node. Example:

hana02:/tmp # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 06:4f:41:53:ff:76 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.129/24 brd 10.0.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::44f:41ff:fe53:ff76/64 scope link
valid_lft forever preferred_lft forever

Last step: Clean up the message on the second node:

hana02:/home/ec2-user # crm resource cleanup res_AWS_STONITH hana02
Cleaning up res_AWS_STONITH on hana02, removing fail-count-res_AWS_STONITH
Waiting for 1 replies from the CRMd. OK
hana02:/home/ec2-user # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:20:44 2018
Last change: Wed Sep 19 13:20:34 2018 by hacluster via crmd on hana02

2 nodes configured
6 resources configured

Online: [ hana02 ]
OFFLINE: [ hana01 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana02
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana02
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Stopped: [ hana01 ]

Recovering the Cluster

Restart your stopped node. See:

Starting first node

Check whether the cluster services get started

Check whether the first node becomes a replicating server

See:

hana02:/home/ec2-user # SAPHanaSR-showAttr;
Global cib-time
--------------------------------
global Wed Sep 19 13:57:41 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 DEMOTED 30 online logreplay hana02 4:S:master1:master:worker:master 100 WDF sync SOK 2.00.030.00.1522209842 hana01
hana02 PROMOTED 1537365461 online logreplay hana01 4:P:master1:master:worker:master 150 ROT sync PRIM 2.00.030.00.1522209842 hana02

 

Secondary HANA server becomes unavailable

Simulated Failures

  • Instance failures. The secondary HANA instance crashes or is no longer reachable through the network
  • Availability zone failure.

Components getting tested

  • EC2 STONITH agent
  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Shut down eth0 on the secondary server to isolate it
  • The cluster will shut down the secondary node
  • The cluster will keep the primary node running without replication
  • The cluster will not restart the failed node

Initial Configuration

Check whether the overlay IP address is hosted on the eth0 interface of the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

Status of HANA replication:

hana01:/home/ec2-user # SAPHanaSR-showAttr

Global cib-time
--------------------------------
global Wed Sep 19 14:23:11 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537366980 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

The AWS console shows that both nodes are running:

Screenshot two running nodes

Damage the Instance

There are two ways to "damage" an instance:

Corrupt Kernel

Become super user on the secondary HANA node.

Issue the command:

echo 'b' > /proc/sysrq-trigger

Isolate secondary Instance

Become super user on the secondary HANA node.

Issue the command:

$ ifdown eth0

The current session will hang, and the system will no longer be able to communicate over the network.

SUSE recommends performing the isolation with firewalls and iptables rather than taking the interface down; a sketch of this approach follows.
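The following is only a minimal sketch of such an isolation, not the exact procedure from the SUSE documentation. It assumes that the peer node hana01 uses the IP address 10.0.1.115 from the examples above; run the commands as super user on the secondary node:

hana02:~ # iptables -A INPUT -s 10.0.1.115 -j DROP
hana02:~ # iptables -A OUTPUT -d 10.0.1.115 -j DROP

This drops all traffic to and from the cluster peer and simulates a network partition. In contrast to ifdown, the session on the isolated node stays usable.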

Monitor Fail Over

Expect the following in a correctly working cluster:

  • The first node will fence the second node. This means it will force a shutdown through AWS CLI commands

  • The second node will be stopped

  • The first node will remain the master node of the HANA database.

  • There is no more replication!

Monitor progress from the master node!
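A convenient way to follow the failover is to refresh the cluster status every few seconds, for example:

hana01:/home/ec2-user # watch -n 5 "crm_mon -1rfn"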

The second node gets reported as offline:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:24:13 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537367044 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 offline logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

The cluster will figure out that the secondary node is in an unclean state:

hana01:/home/ec2-user # crm_mon -1rfn
Stack: corosync
Current DC: hana01 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 14:24:26 2018
Last change: Wed Sep 19 14:24:13 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Node hana01: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started
Node hana02: UNCLEAN (offline)
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Slave

Inactive resources:
Migration Summary:
* Node hana01:

The AWS console will now show that the master node has fenced the secondary node. It gets shut down:

Screenshot node gets shut down

The master node will wait until the secondary  node is shut down. The AWS console will look like:

 Secondary node being shut down

The cluster will now reconfigure its HANA configuration. The cluster knows that the node is offline and replication has been stopped:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:24:13 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537367044 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 30 offline logreplay hana01 ROT sync hana02

The cluster status is the following:

hana01:/home/ec2-user # crm_mon -1rfn
Stack: corosync
Current DC: hana01 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 14:27:05 2018
Last change: Wed Sep 19 14:24:13 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Node hana01: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started
Node hana02: OFFLINE

Inactive resources:

Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 ]
Stopped: [ hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Stopped: [ hana02 ]

Migration Summary:
* Node hana01:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 14:26:17 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana01 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 14:26:17 2018', queued=0ms, exec=0ms

Check whether the overlay IP address is still hosted on the eth0 interface of the master node. Example:

hana01:/home/ec2-user # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.10.21/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ca:c9ff:feca:a652/64 scope link
       valid_lft forever preferred_lft forever

Recovering the Cluster

  • Restart your stopped node.

  • Check whether the cluster services get started

  • Check whether the second node becomes a replicating server again

See:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:59:15 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537369155 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

 

Take over a HANA DB by killing the Database

Simulated Failures

  • Database failures. The database is not working as expected

Components getting tested

  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Kill database
  • The cluster will fail over the database without fencing the node

Initial Configuration

Check whether the overlay IP address is hosted on the eth0 interface of the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

Kill Database

hana01 is the node with the leading HANA database.

The failover will only work if the re-syncing of the slave node is completed. Check this with the command SAPHanaSR-showAttr. Example:

hana02:/tmp # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Tue Sep 11 09:11:16 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1536657075 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

The synchronisation state (column sync_state) of the slave node has to be SOK.

Become the HANA DB admin user and execute the following command:

hdbadm@hana01:/usr/sap/HDB/HDB00> HDB kill
killing HDB processes:
kill -9 462 /usr/sap/HDB/HDB00/hana01/trace/hdb.sapHDB_HDB00 -d -nw -f /usr/sap/HDB/HDB00/hana01/daemon.ini pf=/usr/sap/HDB/SYS/profile/HDB_HDB00_hana01
kill -9 599 hdbnameserver
kill -9 826 hdbcompileserver
kill -9 828 hdbpreprocessor
kill -9 1036 hdbindexserver -port 30003
kill -9 1038 hdbxsengine -port 30007
kill -9 1372 hdbwebdispatcher
kill orphan HDB processes:
kill -9 599 [hdbnameserver] <defunct>
kill -9 1036 [hdbindexserver] <defunct>

Monitoring Fail Over

The cluster will now switch the master node and the slave node. The failover is completed when the HANA database on the first node has been synchronized again as well:

hana02:/tmp # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Tue Sep 11 09:20:38 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
---------------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 DEMOTED 30 online logreplay hana02 4:S:master1:master:worker:master -INFINITY WDF sync SOK 2.00.030.00.1522209842 hana01
hana02 PROMOTED 1536657638 online logreplay hana01 4:P:master1:master:worker:master 150 ROT sync PRIM 2.00.030.00.1522209842 hana02

Check the cluster status as super user with the command crm status. Example:

hana02:/tmp # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 09:28:10 2018
Last change: Tue Sep 11 09:28:06 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana02
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Slaves: [ hana01 ]

Failed Actions:
* rsc_SAPHana_HDB_HDB00_monitor_61000 on hana01 'not running' (7): call=273, status=complete, exitreason='none',
last-rc-change='Tue Sep 11 09:18:47 2018', queued=0ms, exec=1867ms
* res_AWS_IP_monitor_60000 on hana01 'not running' (7): call=264, status=complete, exitreason='none',
last-rc-change='Tue Sep 11 08:57:15 2018', queued=0ms, exec=0ms

All resources are started. The overlay IP address is now hosted on the second node. Delete the failed actions with the commands:

hana02:/tmp # crm resource cleanup rsc_SAPHana_HDB_HDB00
Cleaning up rsc_SAPHana_HDB_HDB00:0 on hana01, removing fail-count-rsc_SAPHana_HDB_HDB00
Cleaning up rsc_SAPHana_HDB_HDB00:0 on hana02, removing fail-count-rsc_SAPHana_HDB_HDB00
Waiting for 2 replies from the CRMd.. OK
hana02:/tmp # crm resource cleanup res_AWS_IP
Cleaning up res_AWS_IP on hana01, removing fail-count-res_AWS_IP
Cleaning up res_AWS_IP on hana02, removing fail-count-res_AWS_IP
Waiting for 2 replies from the CRMd.. OK

The crm status command will no longer show the failures.

Check whether the overlay IP address is now hosted on the eth0 interface of the second node. Example:

hana02:/tmp # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 06:4f:41:53:ff:76 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.129/24 brd 10.0.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::44f:41ff:fe53:ff76/64 scope link
valid_lft forever preferred_lft forever

SteelEye Protection Suite for Linux 8 & 9

Resources

 

RHEL related Topics for SAP Installations on AWS

Change Hostname on RHEL 7.x for SAP Installations on AWS

SAP systems require hostnames which are not longer than 13 characters. The default AWS naming schema uses the IP address separated with dashes to create hostnames. This naming schema can lead to host names which are too long for SAP installations.

The fix is based on the following assumptions:

  • The SAP system is being operated in a VPC with its own network interface
  • The IP address is a private one.
  • No DNS or NIS naming has to be used by the clients

The following procedure renames a system to node1

RHEL 7.x

  1. Change the content of the file /etc/hostname to node1. This entry will be used to set the host name for future reboots
  2. Edit the file /etc/cloud/cloud.cfg
    1. Add the line preserve_hostname: true at the beginning. This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  3. Edit the file /etc/hosts
    1. Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  4. Set the host name with the command "$ hostname node1". This command performs a dynamic change. Its effect will not last beyond a reboot. (A scripted sketch of all steps follows this list.)
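A minimal scripted sketch of these steps, to be executed with root privileges. The IP address 10.79.7.92 is only the example value from above and has to be replaced with the primary IP address of the instance:

echo node1 > /etc/hostname
sed -i '1i preserve_hostname: true' /etc/cloud/cloud.cfg
echo "10.79.7.92 ip-10-79-7-92 node1" >> /etc/hosts
hostname node1

The sed call simply inserts the preserve_hostname line at the top of cloud.cfg. If the file already contains a preserve_hostname entry, edit that entry instead.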

 

SAP Cloud Appliance Library

SAP has a rich collection of preconfigured SAP systems which can be run in the Amazon Web Services (AWS) cloud. This collection is called the SAP Cloud Appliance Library.

SAP Notes related to Amazon Web Services (AWS)

Readers will need the appropriate SAP authorizations to access the pages in the SAP support system

SAP Notes
SAP Note Title last known update Comment
500235 Network Diagnosis with NIPING April 08, 2014 Checking latencies in between AWS Als...
560499 Customer Interaction Center: Hotline Numbers & E-mail Addresses June 12, 2017 How to open SAP tickets...
1380654 SAP support in public cloud environments Dec. 6, 2012 Provides a general introduction to cloud and cloud service categories. Lists AWS as only supported IAAS provider (as of Apr. 9, 2014)
1588667 SAP on AWS: Overview of related SAP Notes and Web-Links July 30, 2015 How to pick the right Linux AMIs
1618572 Linux: Support Statement for RHEL on Amazon Web Services Jan. 10, 2014  
1618590 Support: Oracle database on Amazon Web Services Jan. 10, 2014 Oracle support for productive and non productive SAP systems on AWS platform
1656099 SAP Applications on AWS: Supported DB/OS and AWS EC2 products June 17, 2016 Supported EC2 instances, databases, SAP products
1656250 SAP on AWS: Support prerequisites Aug. 7, 2014 Explains license requirements, support contracts required, AWS specific data collector required .
1697114 Determining hardware ID in Amazon clouds Mar. 26, 2012  
1758890 SAP HANA: Information needed by Product/ Development Support June 20, 2014 Information needed to open an incident at SAP
1788665 SAP HANA Support for virtualized / partitioned (multi-tenant) environments May 5, 2015  
1798212 Support for SAP HANA One Dec. 12, 2012  Explains special peer community support mode for this product 
1838364 Performance and CPU Affinity Mar. 10, 2014  Explains how to map CPUs for best performance
1964437 SAP HANA on AWS: Supported AWS EC2 products Jun. 17, 2016  This note is basically retired. It points to the SAP HANA Hardware Directory
2058870 SAP Business One, version for SAP HANA on public Infrastructure-as-a-Service (IaaS) platforms Aug. 26, 2014  Explains that B1 is supported on AWS EC2
2198693 Key Monitoring Metrics for SAP on Amazon Web Services (AWS) July 29, 2015 Details the AWS specific metrics gathered for EC2 systems running SAP

2240028 SAP Host Agent Patches specific to Linux Jan. 19, 2018 Documents the SAP host agent which had a problem without AWS data provider
2288345 EIM Applications on Amazon Web Services (AWS) March 3, 2016 DS, Data Services Support
2302728 Supported scenarios with NEC Expresscluster on Amazon Web Services Aug. 24, 2016  
2309342 SUSE Linux Enterprise High Availability Extension on AWS June 29, 2016 All AWS specific information to set up a SUSE HAE cluster for HANA
2358420 Oracle Database Support for Amazon Web Services EC2 Aug. 23, 2016 All AWS specific information to set up the Oracle RDBMS on Oracle Linux
2449062 Error getting the hardware key on Amazon AWS server March 29, 2017  
2646715 SAP GUI Terminal Virtualization with Amazon AppStream 2.0 June 1, 2018  
2772496 AWS File Systems EFS and FSx for SAP Solutions March 2019  

HANA Sizing, Limits, Operations and Patches

SAP Notes
SAP Note Title last known update Comment
  SAP Quicksizer    
1514966 General HANA Sizing May 7, 2014 General HANA Sizing
2382421 Optimizing the Network Configuration on HANA- and OS-Level September 12, 2017 HANA Tuning
1514967 SAP HANA: Central Note Jan 22, 2016 Recommendation for 10GB interface etc.
1651055 Scheduling SAP HANA Database Backups in Linux Nov. 27, 2014  
1736976 Sizing Report for BW on HANA May 5, 2014 The note details the requirements for existing SAP BW users who want to migrate to HANA
1781986 Business Suite on SAP HANA Scale Out Dec. 12, 2013 .
1793345 Sizing for SAP Suite on HANA Apr. 7, 2015 .
1825774 SAP Business Suite Powered by SAP HANA - Multi Node Support Feb. 27, 2014 .
1840954 Alerts related to HANA memory consumption Feb 12, 2014 .
1872170 Suite on HANA Memory Sizing Report June 6, 2013 Determine your memory space requirements on a HANA system
1963779 HANA row store limits Aug. 14, 2014 Maximum limits depending on service pack
1984422 SAP HANA: Analysis of Out-of-memory (OOM) Dumps May 5, 2015 .
2057595 FAQ: SAP HANA High Availability January 2nd, 2017 .
2001528 Linux: SAP HANA Database SPS 08 revision 80 (or higher) on RHEL 6 or SLES 11 July 6, 2014 Details the glib C++ package update which is required
2205917 SAP HANA DB: Recommended OS settings for SLES 12 / SLES for SAP Applications 12 May 5, 2016 .
2235581 SAP HANA: Supported Operating Systems Oct. 23, 2017 .
2455582 Linux: Running SAP applications compiled with GCC 6.x Apr. 4, 2018 .

 

General Purpose SAP Notes

SAP Notes
SAP Note Title last known update Comment
212876 SAPCAR, The SAP archiving tools April 4, 2011 The note explains where to find the tool which allows you to decompress all SAP downloads
1275776 Linux: Preparing SLES for SAP environments Nov. 26, 2013 All SLES related system settings
1825774 SAP Business Suite Powered by SAP HANA - Multi-Node Support Feb. 28, 2013 The note explains the support status of scale out configurations for SAP HANA Business Suite Solutions

 

SAP related AWS technical White Papers

Amazon Web Services has an SAP micro site which also references the SAP related publications.

SAP Product White Paper Last Update Size Summary
HANA Setting up AWS Resources and the SLES Operating System for SAP HANA Installation March 2015 36 pages Setup guide for SAP HANA. The document discusses all aspects of a SAP HANA installation on SLES like security, network and disk related requirements.
HANA SAP HANA on the Amazon Web Services Cloud: Quick Start Reference Deployment July 2014 27 pages This document describes the fully automated installation of scale up or scale out HANA systems on AWS.
HANA SAP HANA on AWS Implementation and Operations Guide Feb. 2014 38 pages The document discusses all aspects of operating the SAP HANA database on AWS. It covers aspects like backup, support, security, administration, architecture and  high availability.
General Implementing SAP Solutions on Amazon Web Services April 2013 28 pages This document covers: planning of installations, licensing, AWS architecture, EC2 instance types for SAP, sizing and performance
General SAP on AWS Operations Guide Feb. 2013 19 pages Discussion of AWS specific SAP topics like image cloning, SAP patching, troubleshooting, on premises printing, system copies etc.
General SAP on Amazon Web Services High Availability Guide Dec. 2014 29 pages Discussion of Windows and Linux related AWS architectures and implementations for SAP applications
General SAP on Amazon Web Services Backup and Recovery Guide Dec. 2014 20 pages Discussion of backup and recovery for production and non production systems. Covers the relevant operating systems and database products
General AWS Data Provider for SAP March 2015 28 pages Setup and installation guide for the AWS SAP Data Provider which is required to gather AWS specific system information for the SAP monitoring utilities
General VMS: TCO Study for SAP on AWS Feb. 2013 27 pages AWS references this document. The document got published by the VM AG
B1 SAP Business One version for SAP HANA on AWS Cloud Reference Sheet April 2015 2 pages Documents the key benefits of using B1 on AWS including sizing information for AWS
B1 SAP Business One, version for SAP HANA, on the AWS Cloud: Deployment Guide Sept. 2014 15 pages Document outlines step by step the deployment steps of B1 on AWS

Non AWS Publications

SAP Product White Paper Last Update Summary
SAP HANA Developer Edition How to create a SAP HANA Developer Edition in the cloud June 2014 Covers setup information for AWS and other cloud services
HANA SAP HANA on AWS Certified Feb. 2014 SAP blog entry about the support of SAP HANA on AWS
General SAP on Amazon Web Services (AWS) March 2015 SAP SCN article with supported SAP products on AWS
Netweaver 7.3 SAP Netweaver 7.3 on Amazon Cloud (RedHat 6 Install) July 2013 31 pages Step by step installation guide from Thusjanthan Kubendranathan

 

SUSE SLES related Topics

A number of tidbits needed when working with SUSE SLES.

Disclaimer:

Consult the appropriate documentation before you apply them and understand the implications.

Other interesting topics

 

yast bug in SLES for SAP 12 SP1 with AWS Elastic File System (EFS)

There is a bug in the SLES command line installation tool yast which may affect SAP customers using SLES for SAP 12 SP1 (suse-sles-sap-12-sp1-byos-v20160308-hvm-ssd-x86_64, ami-4a8fb520) on AWS in conjunction with the Elastic File System (EFS).

The Architecture

A customer uses EFS for shared SAP file systems like /sapmnt or /usr/sap. Before the installation of the SAP software, the output of the command df -k on such an AWS system may look as follows:

nw11:~ # df -k
Filesystem    1K-blocks        Used    Available        Use% Mounted on
/dev/hda1     103078876        3286120 95477452         4% /
devtmpfs      8222944          8       8222936          1% /dev
tmpfs         12347764         0       12347764         0% /dev/shm
tmpfs         8231840          9716    8222124          1% /run
tmpfs         8231840          0       8231840          0% /sys/fs/cgroup
10.79.8.181:/ 9007199254740992 0       9007199254740992 0% /usr/sap/SI1
10.79.8.15:/  9007199254740992 0       9007199254740992 0% /sapmnt/SI1

SLES reports two file systems which have 8 Exabyte of free space whereas nothing is being used.

The Bug

I have been calling yast from the command line to install an X11 environment for the upcoming SAP Netweaver installation.

yast seems to be overwhelmed by the capacity of 8 Exabyte in these two additional file systems. It seems to have an integer overrun and thinks that there isn't enough disk space. It will report the following, irrelevant message:

yast error with EFS file systems, screenshot 1

 You will want to continue by pushing the button [Continue anyway]. The installation will happen on the root file system and not on the two NFS mounted EFS file systems.

Then yast will come up with the following dialog:

yast error with EFS file systems, screenshot 2

Activate the option Do not Show This Message Again and move on with the option [Yes].

Avoiding the Problem

SUSE is processing this bug as: 991090 (yast sw_single reports "error out of diskspace" while filesystems with 8 ExaByte are mounted)

The problem can be avoided in the meantime through the following three options:

  1. Install all SLES software through yast before you create the EFS file systems
  2. Unmount the EFS file systems before you use yast (see the sketch after this list).
  3. Overrule the warning and move ahead. You will risk a full file system somewhere else.
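A minimal sketch of option 2, assuming the two EFS mount points from the example above and that they are defined in /etc/fstab:

nw11:~ # umount /usr/sap/SI1 /sapmnt/SI1
nw11:~ # yast
nw11:~ # mount -a

The final mount -a remounts everything listed in /etc/fstab once the yast installation is done.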

 

Add swap space

Disclaimer: The following commands document how to add a raw device as swap volume. Selecting the wrong raw device will lead to data corruption in file systems!

All commands have to be executed with root privileges

  • Create a separate AWS volume with the required space. This ensures that there is no contention for maximum IOs with other volumes, and the solution is price neutral
  • I assume that the swap volume is /dev/xvdg. Format and add the volume to swap:
$ mkswap /dev/xvdg
$ swapon /dev/xvdg

Make it persistent through reboots by adding the following line to /etc/fstab

/dev/xvdg     swap     swap defaults
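To verify that the swap space is active, check for example:

$ swapon -s
$ free -m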

 

Allow User Access without Certificates (Password only)

AWS systems allow access with a key pair (certificate) only by default. This is a security measure.

Administrators who decide to lower the security standards by allowing ssh access through user/password credentials on SUSE SLES have to execute the following steps (a scripted sketch follows the list):

  • Edit the /etc/ssh/sshd_config file.
    • Change the entry "PasswordAuthentication no" to  "PasswordAuthentication yes"
    • Save the changes
  • Restart the sshd daemon with the command
    • $ service sshd restart
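A minimal scripted sketch of these steps, assuming sshd_config contains the entry PasswordAuthentication no as described above:

$ sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
$ service sshd restart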

 

Change Hostname on SUSE SLES for SAP Installations on AWS

SAP systems require hostnames which are not longer than 13 characters. The default AWS naming schema uses the IP address separated with dashes to create hostnames. This naming schema can lead to host names which are too long for SAP installations.

The fix is based on the following assumptions:

  • The SAP system is being operated in a VPC with its own network interface
  • The IP address is a private one.
  • No DNS or NIS naming has to be used by the clients

The following procedure renames a system to node1

SLES 11

  1. Change the content of file /etc/HOSTNAME to node1. This entry will be used to set the host name in future reboots
  2. Edit the file /etc/cloud/cloud.cfg
    1. Modify the line preserve_hostname: false to preserve_hostname: true . This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  3. Edit the file /etc/hosts
    1. Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  4. Set the host name with the command "# hostname node1". This command performs a dynamic change. Its effect will not last beyond a reboot.
  5. Configure the DHCP client not to configure the hostname
    1. Enter the command yast lan
    2. Move to the entry Hostname/DNS (Tab / arrow right) and select it
    3. Set the hostname to node1 in the host name field
    4. Deselect (remove the x from) the entry to set the hostname dynamically
    5. Save all settings and leave yast

SLES 12 & SLES 15

  1. Edit the file /etc/cloud/cloud.cfg
    • Modify the line preserve_hostname: false to preserve_hostname: true . This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  2. Edit the file /etc/hosts
    • Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  3. Use the command (see also the sketch below):
    • $ hostnamectl set-hostname node1
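A minimal scripted sketch of the SLES 12 / SLES 15 steps, assuming cloud.cfg contains the line preserve_hostname: false and using the example IP address from above:

sed -i 's/^preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg
echo "10.79.7.92 ip-10-79-7-92 node1" >> /etc/hosts
hostnamectl set-hostname node1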

 

Enable root Access for Linux Instances

AWS doesn't grant root access by default to EC2 instances. This is an important security best practice. Users are supposed to open an ssh connection using the secure key pair to log in as ec2-user and to use the sudo command as ec2-user to obtain elevated privileges.

Problems arise with a number of software packages which require remote root access for installation and operation. The following cheat sheet explains how to enable root access. It hasn't been tested with all Linux distributions.

Disclaimer: Enabling direct root access to EC2 systems is a bad security practice which AWS doesn't recommend. It creates vulnerabilities, especially for systems which are facing the Internet (see AWS documentation).

Use these commands at your own risk. Understand the function of the commands and the related risks before you apply them.

All commands require root privileges which can be obtained through the sudo command.

Create a root Password

$ passwd root
Enter the new password twice when prompted.

Configure and Restart the ssh Service for root Access

Edit the configuration file /etc/ssh/sshd_config. Change the following two parameters to the values shown below:

PermitRootLogin yes
PasswordAuthentication yes

Restart the service with the command

$ service sshd reload

Patch the authorized Keys File for the root User

The simplest way is to reuse the ec2-user's authorized_keys file and certificate for the root user. Copy the ec2-user file over to the root user:

$ cp ~ec2-user/.ssh/authorized_keys ~root/.ssh/authorized_keys

This also allows logging in as root with the same key which is available for the ec2-user.

Update the AWS Cloud Configuration File

Edit the file /etc/cloud/cloud.cfg and change the following entry to this value:

disable_root: false
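If you prefer to script this change, a minimal sed sketch (assuming cloud.cfg already contains a disable_root entry) is:

$ sed -i 's/^disable_root:.*/disable_root: false/' /etc/cloud/cloud.cfg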

 

 

Installation of a Graphical Desktop with RDP Access for SUSE SLES 11, 12, 15

SAP installations may require graphical tools to be operated on the target server.

Important dependencies

  • xrdp uses vnc
  • vnc uses X11 and window managers

Software Installation

AWS SUSE installations come by default without a GNOME desktop environment. The following commands will install a GNOME desktop and an xrdp service to connect to the systems:

SLES 11 & 12

# zypper install -t pattern gnome-basic

SLES 15

Use yast and install the pattern "GNOME Desktop Environment (Basic)":

  • Start yast
  • Select "Software", press Tab
  • Select "Software Management", press Enter
  • Move the active field to "Filter Search" by pressing "Shift"+"Tab"
  • Use the down arrow key to unfold the selection list
  • Select "Patterns"
  • Select "GNOME Desktop Environment (Basic)"
  • Select "Accept"

Install xRDP

# zypper install xrdp

Enable VNC Remote Login

  • Start yast
  • Select " Network Services"
  • Select first entry "Remote Administration with VNC"
  • Enable service

Configure Window Manager to use Gnome

  • Edit the file /etc/sysconfig/windowmanager
  • Change the entry DEFAULT_WM="" to DEFAULT_WM="gnome" (see the one-liner below)
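A minimal one-liner for this change, assuming the DEFAULT_WM entry is currently empty as shown above:

# sed -i 's/^DEFAULT_WM=.*/DEFAULT_WM="gnome"/' /etc/sysconfig/windowmanager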

Start the RDP service and make it start automatically after reboot

These commands need to be executed with the sudo command from the ec2-user.

SLES 11

# service xrdp start
# chkconfig --set xrdp on

SLES 12 & 15

# systemctl start xrdp
# systemctl enable xrdp 
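To verify that xrdp is up and listening (assuming the default RDP port 3389):

# ss -tlnp | grep 3389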

Register a Subscription in SLES (and keep the AWS CLI working!)

SLES 12 & 15

Use this command to register your system at SUSE.

# SUSEConnect -r <YourActivationCode> -e <YourEmailAddress>

More details can be found in the SUSE documentation.

SUSE BYOS AMIs on AWS do not tend to update their cloud module. Execute the following commands as super user to get this done:

SLES 12

# SUSEConnect --list-extensions
# SUSEConnect -p sle-module-public-cloud/12/x86_64

SLES 15

# SUSEConnect --list-extensions
# SUSEConnect -p sle-module-public-cloud/15/x86_64

The AWS CLI is an important part of this module. Updating it will allow you to use the latest services and new regions. Don't forget to update your packages with

# zypper update
Important

The AWS CLI will not work by default on SLES 15! The required patch for boto will only be installed if this repository is configured.

See SUSE support document 7023686.

 

Registering Repositories for AWS SuSE AMIs

 SuSE SLES 11 and 12 AMIs use AWS specific repositories to install and update packages.

There are situations when SuSE systems aren't able to install new packages or update them because they have lost their AWS repository configuration.

This problem can be fixed by issuing the following command as super user:

/usr/sbin/registercloudguest --force-new

Disclaimer: This command will perform major changes to your system. Handle it with care and consult the SuSE documentation upfront!