SAP on Amazon Web Services (AWS)

A collection of AWS SAP-related resources. This is a work in progress. Please revisit this page from time to time.

AWS (Amazon Web Services) SAP Benchmarks in Cloud Environments

Resources

Benchmarks published

Two-tier Internet Configuration

2-Tier Internet Configuration
Certification Number Date Benchmark Instances  OS 
2016021 May 2016 Sales and Distribution x1.32xlarge instance  Windows Server 2012 R2 Standard Edition
2015032 July 2015 Sales and Distribution m4.10xlarge instance  Windows Server 2012 Standard Edition
2015006 Mar. 2015 Sales and Distribution c4.8xlarge instance  Windows Server 2012 Standard Edition
2015005 Mar. 2015 Sales and Distribution c4.4xlarge instance  Windows Server 2012 Standard Edition
2014041 Oct. 2014 Sales and Distribution c3.8xlarge instance  Windows Server 2012 Standard Edition
2014035 June 2014 Sales and Distribution r3.8xlarge instance  Windows Server 2012 Standard Edition
2014010 Mar. 2014 Sales and Distribution cr1.8xlarge instance  Windows Server 2008 R2 Datacenter

 

Three-tier Internet Configuration

3-Tier Internet Configuration
Certification Number Date Benchmark Instances  OS 
2013035 Nov. 2013 Sales and Distribution  9 m2.4large instances Windows Server 2008 R2 Datacenter

 

SAP BW Enhanced Mixed Load (BW EML)

SAP BW EML
Certification Number Date Benchmark Ad-Hoc Navigation Steps/Hour Instances  OS 
2014001 Jan. 2014 SAP BW Enhanced Mixed Load (BW EML) 500,000 records 113390 1 cr1.8xlarge DB server + 2 c3.8xlarge appl. server instances SuSE Linux Enterprise Server 11
2014013 Apr. 2014 SAP BW Enhanced Mixed Load (BW EML) 5,000,000 records 137510 cr1.8xlarge DB server + 2 c3.8xlarge appl. server instances

SuSE Linux Enterprise Server 11 (DB Server), Windows Server 2008R2 Datacenter Edition (app servers)

2014014 Apr. 2014 SAP BW Enhanced Mixed Load (BW EML) 2,000,000,000 records 177590 cr1.8xlarge DB server + 3 c3.8xlarge appl. server instances

SuSE Linux Enterprise Server 11 (DB Server), Windows Server 2008R2 Datacenter Edition (app servers)

 

AWS Data Provider for SAP

 Resources

Users in the Chinese region will have to use:

Testing the Collector

A correctly operating collector runs a web server which reports the current status through a URL of the following form:

http://localhost:8888/vhostmd

For security reasons, the collector is supposed to bind to localhost only.
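
A quick functional test, run on the instance itself, is to fetch this URL (the installation document below uses the same check):

curl http://localhost:8888/vhostmd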

Information flow around the AWS Data Provider for SAP

Data Provider Installation through AWS Systems Manager (for SLES)

Prerequisites

Execute the following two steps to enable an instance to be managed by Systems Manager:

  • Add the AWS managed policy AmazonSSMAutomationRole to the role of the instance
  • Install the Systems Manager agent (SSM Agent) according to the AWS documentation

Creation of a Systems Manager Document

Use the AWS console.

Move to "System Manager"->"Documents". Create a new document with the following content:

{
  "schemaVersion" : "2.2",
  "description" : "Command Document Example JSON Template",
  "mainSteps" : [ {
    "action" : "aws:runShellScript",
    "name" : "test",
    "inputs" : {
      "runCommand": [ "wget https://s3.amazonaws.com/aws-data-provider/bin/aws-agent_install.sh;",
        "chmod ugo+x aws-agent_install.sh;",
        "sudo ./aws-agent_install.sh;",
        "curl http://localhost:8888/vhostmd"
      ],
      "workingDirectory":"/tmp",
      "timeoutSeconds":"3600",
      "executionTimeout":"3600"
    }
  } ]
}

Save the document with the name SAP-Data-Provider-Installation-Linux.
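
If you prefer the command line over the console, the same document can be created with the AWS CLI. A minimal sketch, assuming the JSON above has been saved locally as sap-data-provider.json (the file name is illustrative):

aws ssm create-document --name "SAP-Data-Provider-Installation-Linux" \
   --document-type "Command" \
   --content file://sap-data-provider.json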

Command Line Execution of the Systems Manager Document

New data providers can then be provisioned with the AWS console or the following AWS CLI command:

aws ssm send-command --document-name "SAP-Data-Provider-Installation-Linux" \
   --comment "SAP Data Provider Installation" --targets "Key=instanceids,Values=i-my-instance-id"  \
   --timeout-seconds 600 --max-concurrency "50" \
   --max-errors "0" --region my-region

Replace the following variables:

  • i-my-instance-id with the instance id
  • my-region with the matching region
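
The result of the execution can be reviewed afterwards. A sketch, using the command id returned by send-command as a placeholder:

aws ssm list-command-invocations --command-id "my-command-id" \
   --details --region my-region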

 

AWS Quickstarts for SAP

AWS offers CloudFormation templates to automate the installation of SAP applications.

Name / Manual / Launch / GitHub sources
SAP HANA / Deployment Guide / Launch / quickstart-sap-hana
Netweaver ABAP / Quick Start Reference Deployment / new VPC, existing VPC / quickstart-sap-netweaver-abap

 

Command Line Creation of an AWS Instance for SAP HANA

The bash script shown here allows you to create SAP HANA instances from the command line. It uses the AWS CLI; the aws command needs to be in the search path.

Consider using the AWS Quick Start for HANA deployment. It provisions the AWS instance as well as the HANA software.

The script below will create an AWS instance only. The script solves a number of issues for administrators:

The limitations

The script requires a file with the name disks.json. The file shown below is configured to create a 200 GB boot disk, four 667 GB gp2 data volumes, and one 50 GB gp2 volume.

Warning

This script will create AWS resources that AWS will charge you for. Be careful using this script. I don't warrant anything.

Preconditions

Download

Using the Script

Provide all parameters when calling it in the following format:

./createHANA.sh ami-id pem-name instance-type ip-address cidr security-group "name tag"

Example:

./createHana.sh ami-6b5a5601 myPEM r3.4xlarge 10.79.7.95 10.79.7.0/24 mySecGroup "my-HANA-System95"

*** 1. Prerequisites checking
*** 1.1 OK: requested IP address 10.79.7.95 is available
*** 1.2 OK: requested CIDR 10.79.7.0/24 belongs to subnet-6b964f32 in us-east-1c , vpc-2e976742
*** 1.3 Warning: No check whether 10.79.7.95 fits into CIDR 10.79.7.0/24 !
*** 1.4 OK: requested security group mySecGroup is sg-582eec37 (Connect to Lab)
*** 1.5 OK: requested PEM myPEM exists.
*** 1.6 OK: AMI ami-6b5a5601 exists (amazon/suse-sles-12-sp1-v20160322-hvm-ssd-x86_64)
Do you want to create this instance? (y/n) Yes
*** 2. About to create the instance
*** 2.1 Created system with Id: i-9dd0e707
*** 2.2 Tagged system with Id: i-9dd0e707 with Name: my-HANA-System95
*** 2.3.1 Tagged all volumes from system with Id: i-9dd0e707 with Name: my-HANA-System95
*** 2.4 The created instance: i-9dd0e707 with Name: my-HANA-System95
RESERVATIONS 752040392274 r-684b74d9
INSTANCES 0 x86_64 None True xen ami-6b5a5601 i-9dd0e707 r3.4xlarge myPEM 2016-06-24T13:47:25.000Z None 10.79.7.95 None /dev/sda1 ebs True None subnet-6b964f32 hvm vpc-2e976742
BLOCKDEVICEMAPPINGS /dev/sda1
EBS 2016-06-24T13:47:26.000Z True attaching vol-8c836528
BLOCKDEVICEMAPPINGS /dev/sdf
EBS 2016-06-24T13:47:26.000Z True attaching vol-0f8365ab
BLOCKDEVICEMAPPINGS /dev/sdg
EBS 2016-06-24T13:47:26.000Z True attaching vol-0e8365aa
BLOCKDEVICEMAPPINGS /dev/sdh
EBS 2016-06-24T13:47:26.000Z True attaching vol-fe83655a
BLOCKDEVICEMAPPINGS /dev/sdi
EBS 2016-06-24T13:47:26.000Z True attaching vol-e983654d
BLOCKDEVICEMAPPINGS /dev/sdj
EBS 2016-06-24T13:47:26.000Z True attaching vol-ac836508
MONITORING pending
NETWORKINTERFACES None 0e:0f:2c:30:0b:b7 eni-522c1700 752040392274 10.79.7.95 True in-use subnet-6b964f32 vpc-2e976742
ATTACHMENT 2016-06-24T13:47:25.000Z eni-attach-c0de2d15 True 0 attaching
GROUPS sg-582eec37 mySecGroup
PRIVATEIPADDRESSES True 10.79.7.95
PLACEMENT us-east-1c None default
SECURITYGROUPS sg-582eec37 mySecGroup
STATE 0 pending
TAGS Name my-HANA-System95

The second way to use the script is through an interactive dialog:

./createHANA.sh
Enter AMI name:
ami-6b5a5601
Enter name of security key:
myPEM
Enter instance type:
r3.4xlarge
Enter IP address:
10.79.7.94
Enter CIDR in the format xxx.xxx.xxx.xxx/yy:
10.79.7.0/24
Enter security group name:
mySecGroup
Enter name tags for instance and volumes:
my-HANA-System94
*** 1. Prerequisites checking
*** 1.1 OK: requested IP address 10.79.7.94 is available
*** 1.2 OK: requested CIDR 10.79.7.0/24 belongs to subnet-6b964f32 in us-east-1c , vpc-2e976742
*** 1.3 Warning: No check whether 10.79.7.94 fits into CIDR 10.79.7.0/24 !
*** 1.4 OK: requested security group mySecGroup is sg-582eec37 (Connect to Lab)
*** 1.5 OK: requested PEM myPEM exists.
*** 1.6 OK: AMI ami-6b5a5601 exists (amazon/suse-sles-12-sp1-v20160322-hvm-ssd-x86_64)
Do you want to create this instance? (y/n) Yes
*** 2. About to create the instance
*** 2.1 Created system with Id: i-30ddeaaa
*** 2.2 Tagged system with Id: i-30ddeaaa with Name: my-HANA-System94
*** 2.3.1 Tagged all volumes from system with Id: i-30ddeaaa with Name: my-HANA-System94
*** 2.4 The created instance: i-30ddeaaa with Name: my-HANA-System94
RESERVATIONS 752040392274 r-2a49769b
INSTANCES 0 x86_64 None True xen ami-6b5a5601 i-30ddeaaa r3.4xlarge myPEM 2016-06-24T13:53:11.000Z None 10.79.7.94 None /dev/sda1 ebs True None subnet-6b964f32 hvm vpc-2e976742
BLOCKDEVICEMAPPINGS /dev/sda1
EBS 2016-06-24T13:53:12.000Z True attaching vol-5a8d6bfe
BLOCKDEVICEMAPPINGS /dev/sdf
EBS 2016-06-24T13:53:12.000Z True attaching vol-b38a6c17
BLOCKDEVICEMAPPINGS /dev/sdg
EBS 2016-06-24T13:53:12.000Z True attaching vol-b28a6c16
BLOCKDEVICEMAPPINGS /dev/sdh
EBS 2016-06-24T13:53:12.000Z True attaching vol-5b8d6bff
BLOCKDEVICEMAPPINGS /dev/sdi
EBS 2016-06-24T13:53:12.000Z True attaching vol-4e8d6bea
BLOCKDEVICEMAPPINGS /dev/sdj
EBS 2016-06-24T13:53:12.000Z True attaching vol-198d6bbd
MONITORING pending
NETWORKINTERFACES None 0e:cc:0c:19:f4:b1 eni-621c2730 752040392274 10.79.7.94 True in-use subnet-6b964f32 vpc-2e976742
ATTACHMENT 2016-06-24T13:53:11.000Z eni-attach-0ec635db True 0 attaching
GROUPS sg-582eec37 mySecGroup
PRIVATEIPADDRESSES True 10.79.7.94
PLACEMENT us-east-1c None default
SECURITYGROUPS sg-582eec37 mySecGroup
STATE 0 pending
TAGS Name my-HANA-System94

The script createHANA.sh

#!/bin/bash
# version 1.0 June 24, 2016
# This script is using the AWS cli.
# It assumes that the aws command is part of the search path
AMI=$1
PEM=$2
INSTANCETYPE=$3
IP=$4
CIDR=$5
SGNAME=$6
NAMETAG=$7
case $1 in
-h | -help)
echo "Use this command with the following options:"
echo "$0 -h : to obtain this output"
echo "$0 -help : to obtain this output"
echo "$0 : enter information through a dialog"
echo "$0 ami-id pem-name instance-type ip-address cidr security-group \"name tag\" "
echo "Example:"
echo " ./createHana.sh ami-6b5a5601 myPEM r3.4xlarge 10.79.7.96 10.79.7.0/24 mySecGroup \"my-HANA-System96\""
exit
;;
esac
if [[ -z $AMI ]]; then
echo "Enter AMI name:"
read AMI
fi
if [[ -z $PEM ]]; then
echo "Enter name of security key:"
read PEM
fi
if [[ -z $INSTANCETYPE ]]; then
echo "Enter instance type:"
read INSTANCETYPE
fi
if [[ -z $IP ]]; then
echo "Enter IP address:"
read IP
fi
if [[ -z $CIDR ]]; then
echo "Enter CIDR n the format xxx..xxx.xxx.xxx/yy:"
read CIDR
fi
if [[ -z $SGNAME ]]; then
echo "Enter security group name:"
read SGNAME
fi
if [[ -z $NAMETAG ]]; then
echo "Enter name tags for instance and volumes:"
read NAMETAG
fi


echo "*** 1. Prequisites checking"
EXISTINGIP=$(aws ec2 describe-network-interfaces --filter Name=private-ip-address,Values=$IP | awk -F'\t' '/PRIVATEIPADDRESSES/ {print $3}' | grep $IP)
if [ $EXISTINGIP ]
then
INSTID=$(aws ec2 describe-network-interfaces --filter Name=private-ip-address,Values=$IP | awk -F'\t' '/ATTACHMENT/ {print $6}')
echo "*** 1.1 ERROR: requested IP address $IP is already in use by instance $INSTID. Will stop here..."
exit 1
else
echo "*** 1.1 OK: requested IP address $IP is available"
fi
SUBNET=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $8}')
AZ=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $2}')
VPC=$(aws ec2 describe-subnets --filter Name=cidrBlock,Values=$CIDR | awk -F'\t' '/SUBNETS/ {print $9}')
if [ $SUBNET ]
then
echo "*** 1.2 OK: requested CIDR $CIDR belongs to $SUBNET in $AZ , $VPC"
else
echo "*** 1.2 ERROR: no subnet found for CIDR $CIDR . Will stop here..."
exit 1
fi
echo "*** 1.3 Warning: No check whether $IP fits into CIDR $CIDR !"
SECURITY=$(aws ec2 describe-security-groups --filters Name=group-name,Values=${SGNAME} | awk -F'\t' '/SECURITYGROUPS/ {print $3}')
SECURITYTEXT=$(aws ec2 describe-security-groups --filters Name=group-name,Values=${SGNAME} | awk -F'\t' '/SECURITYGROUPS/ {print $2}')
if [ $SECURITY ]
then
echo "*** 1.4 OK: requested security group $SGNAME is $SECURITY ($SECURITYTEXT)"
else
echo "*** 1.4 ERROR: requested security group $SGNAME not found. Will stop here"
exit 1
fi
PEMRESULT=$(aws ec2 describe-key-pairs --filters Name=key-name,Values=$PEM | awk -F'\t' '/KEYPAIRS/ {print $3}')
if [ $PEMRESULT ]
then
echo "*** 1.5 OK: requested PEM $PEM exists."
else
echo "*** 1.5 ERROR: requested PEM $PEM not found. Will stop here"
exit 1
fi
AMINAME=$(aws ec2 describe-images --image-ids $AMI | awk -F'\t' '/IMAGES/ {print $6}')
if [ $AMINAME ]
then
echo "*** 1.6 OK: AMI $AMI exists ($AMINAME)"
else
echo "*** 1.6 ERROR: AMI $AMI does not exist. Will stop here"
exit 1
fi
echo -n "Do you want to create this instance? (y/n) "
old_stty_cfg=$(stty -g)
stty raw -echo ; answer=$(head -c 1) ; stty $old_stty_cfg # Care playing with stty
if echo "$answer" | grep -iq "^y" ;then
echo Yes
else
echo No
exit
fi
echo "*** 2. About to create the instance"
ID=$(aws ec2 run-instances \
--key-name $PEM \
--instance-type $INSTANCETYPE \
--count 1 \
--block-device-mappings file://disks.json \
--image-id $AMI \
--monitoring Enabled=true \
--instance-initiated-shutdown-behavior stop \
--security-group-ids $SECURITY \
--subnet-id $SUBNET \
--private-ip-address $IP \
--ebs-optimized | \
awk '/INSTANCES/ {print $8}' \
)
echo "*** 2.1 Created system with Id: $ID"
aws ec2 create-tags --resources $ID --tags Key=Name,Value=${NAMETAG}
echo "*** 2.2 Tagged system with Id: $ID with Name: $NAMETAG "
#echo "*** 2.3.0 will sleep for 2s before tagging the volumes with $NAMETAG "
#sleep 2
aws ec2 describe-instances --instance-ids $ID | awk '/EBS/ {print "aws ec2 create-tags --resources " $5 " --tags Key=Name,Value='"$NAMETAG"'" }' | bash -
echo "*** 2.3.1 Tagged all volumes from system with Id: $ID with Name: $NAMETAG "
echo "*** 2.4 The created instance: $ID with Name: $NAMETAG "
aws ec2 describe-instances --instance-ids $ID

The file disks.json

This file has to be in the directory from which the script is called.

[
  {"DeviceName":"/dev/sda1",
   "Ebs":{"VolumeSize":200,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdf",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdg",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdh",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdi",
   "Ebs":{"VolumeSize":667,"VolumeType":"gp2","DeleteOnTermination":true}},
  {"DeviceName":"/dev/sdj",
   "Ebs":{"VolumeSize":50,"VolumeType":"gp2","DeleteOnTermination":true}}
]

 Feedback

The script is limited. Leave a comment to get in touch with me. I'll be happy to improve the script and integrate better code.

Configuring SAProuter (as a service) on Linux

Installing a saprouter on Linux is straightforward.

... at least without using SNC.

SAP Routers can be used to

The playbook for the installation is

Have a routing table file for saprouter

Create a configuration file with the name saprouttab. The simplest one, which routes all ABAP traffic in all directions, is a file /usr/sap/saprouter/saprouttab with the content:

P * * *

This means: P(ermit) all source IPs/hostnames to all destination IPs/hostnames on any destination port.
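
A slightly more restrictive saprouttab is often preferable. The following sketch uses illustrative addresses and host names; each entry has the format P/D source destination service:

# permit SAP GUI traffic (instance 00) from the admin network to one application server
P 192.168.1.* sapapp01 3200
# deny everything else
D * * *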

Create a Policy which grants Access to an S3 Bucket to Download all required Software

Create a policy which looks like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucket-name/bucket-folder/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:HeadBucket"],
      "Resource": "arn:aws:s3:::bucket-name"
    }
  ]
}

Replace the following variables with your individual settings:

Add this policy to a new role.

Attach the role to the instance when you create it.
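
The same steps can be scripted with the AWS CLI. This is a sketch only; the policy, role, and file names are illustrative, the account id is a placeholder, and the instance profile name saprouter-inst matches the run-instances call further down:

# policy document from above saved as s3-download-policy.json
aws iam create-policy --policy-name saprouter-s3-download \
   --policy-document file://s3-download-policy.json
# role with an EC2 trust policy (ec2-trust.json)
aws iam create-role --role-name saprouter-role \
   --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name saprouter-role \
   --policy-arn arn:aws:iam::111122223333:policy/saprouter-s3-download
# wrap the role into an instance profile which can be attached to the instance
aws iam create-instance-profile --instance-profile-name saprouter-inst
aws iam add-role-to-instance-profile --instance-profile-name saprouter-inst \
   --role-name saprouter-role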

Creation of a Service

SLES 12, SLES 15, or Red Hat will need a systemd service to restart the saprouter whenever needed. Create a file saprouter.service:

[Unit]
Description=SAP Router Configuration
After=syslog.target network.target

[Service]
Type=simple
RemainAfterExit=yes
WorkingDirectory=/usr/sap/saprouter
ExecStart=/usr/sap/saprouter/saprouter -r
ExecStop=/usr/sap/saprouter/saprouter -s
KillMode=none
Restart=no

[Install]
WantedBy=multi-user.target

Start the service with the commands:

systemctl daemon-reload
systemctl enable saprouter.service
systemctl start saprouter.service

Create an Installation Script

Create a file install.sh:

#!/usr/bin/env bash
# version 0.2
# December, 2018
## Run script as super user:
# This script needs one parameter, the URL to access the S3 bucket
# with all downloadable files
# Use the notation s3://my-bucket/myfolder
##BUCKET="s3://stefanschneider-saptesting/saprouter"
BUCKET=$1
SAPSAPROUTTAB="saprouttab"
SERVICE="saprouter.service"
ROUTDIR="/usr/sap/saprouter"
echo "*** 1. Create /usr/sap/saprouter"
mkdir -p ${ROUTDIR}/install
echo "*** 2. Download files"
aws s3 sync ${BUCKET} ${ROUTDIR}/install
cd ${ROUTDIR}/install
# All files will become lower case files
for f in `find`; do mv -v "$f" "`echo $f | tr '[A-Z]' '[a-z]'`"; done
chmod u+x uninstall.sh
mv uninstall.sh ..
mv ${SERVICE} /etc/systemd/system/${SERVICE}
for f in `find . -name saprouter*.sar`; do mv -v $f saprouter.sar; done
for f in `find . -name sapcryptolib*.sar`; do mv -v $f sapcryptolib.sar; done
for f in `find . -name sapcar*`; do mv -v $f sapcar; done
chmod u+x sapcar
mv saprouttab ..
echo "*** 3. Unpack files"
cd ${ROUTDIR}
./install/sapcar -xf ${ROUTDIR}/install/saprouter.sar
./install/sapcar -xf ${ROUTDIR}/install/sapcryptolib.sar
echo "*** 4. Start service"
systemctl daemon-reload
systemctl enable ${SERVICE}
systemctl start ${SERVICE}
echo "5. Done..."

The script will work if there are three unique files in the download bucket which are the only ones with names like sapcar*, sapcrypto*.sar, and saprouter*.sar. Capitalization will not matter. Update the bucket-name and the bucket-folder values to match your individual needs.

Create a De-installation Script

Create a file with the name uninstall.sh:

#!/usr/bin/env bash
# version 0.1
# December, 2018
## Run as super user:
echo "1. Stopping and disabling service"
systemctl stop saprouter.service
systemctl disable saprouter.service
systemctl daemon-reload
echo "2. Removing files"
rm /etc/systemd/system/saprouter.service
rm -rf /usr/sap/saprouter
echo "3. Completed deinstallation"

Files Upload

Upload the following files to the S3 bucket:

There is no need to make this bucket public. The instance will have an IAM profile which entitles the instance to download the files needed.
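
An upload could look like the following sketch. The SAR archive and SAPCAR file names are illustrative; install.sh only requires that they match the patterns sapcar*, saprouter*.sar, and sapcryptolib*.sar:

aws s3 cp install.sh s3://bucket-name/bucket-folder/
aws s3 cp uninstall.sh s3://bucket-name/bucket-folder/
aws s3 cp saprouttab s3://bucket-name/bucket-folder/
aws s3 cp saprouter.service s3://bucket-name/bucket-folder/
aws s3 cp SAPCAR_1234.EXE s3://bucket-name/bucket-folder/
aws s3 cp saprouter_123.sar s3://bucket-name/bucket-folder/
aws s3 cp sapcryptolib_456.sar s3://bucket-name/bucket-folder/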

Create a UserData file on your Administration PC

Create a file prep.sh:

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
BUCKET="s3://bucket-name/bucket-folder"
# take a one second nap before moving on...
sleep 1
aws s3 cp ${BUCKET}/install.sh /tmp/install.sh
chmod u+x /tmp/install.sh
/tmp/install.sh $BUCKET
--//

Replace bucket-name and bucket-folder with the appropriate values.

This file will get executed when the instance gets created.

Installation of Instance

The following command will launch an instance with an automated saprouter installation. It assumes that

The command is

aws ec2 run-instances --image-id ami-XYZ \
--count 1 --instance-type m5.large \
--key-name aws-key \
--associate-public-ip-address \
--security-group-ids sg-XYZ \
--subnet-id subnet-XYZ \
--iam-instance-profile Name=saprouter-inst \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=PublicSaprouter}]' \
--user-data file://prep.sh

This command will create an instance with

Installation as a VPC-internal saprouter relaying traffic from on-premises users

Omit the parameter --associate-public-ip-address. This parameter creates a public IP address. You don't want this for an internal saprouter.

Installation with the help of an AWS Cloudformation template

Use this template (saprouter.template). It works with SLES 12 SP3. Replace the AMIs if you need a newer revision.

  1. Upload the template to an S3 bucket
  2. Upload the SAP installation media and the file saprouttab to an S3 bucket
  3. Execute the file in CloudFormation

Warning: Please check the template upfront. It'll allocate resources in your AWS account. It has the potential to do damage.

More Information

Consult the SAP documentation to configure SNC or more detailed routing entries.

HANA Cheat Sheet

Starting and stopping HANA 

Start HANA instance with hostctrl as root:

/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Start

Stop HANA instance with hostctrl as root:

/usr/sap/hostctrl/exe/sapcontrol -nr <instance number> -function Stop

Start HANA as <sid>adm:

/usr/sap/<SID>/HDB<instance number>/HDB start

Example: /usr/sap/KB1/HDB26/HDB start

Stop the SAP HANA system as <sid>adm by entering the following command:

/usr/sap/<SID>/HDB<instance number>/HDB stop

HANA Backups Command Line

Systems with XSA may have multiple tenants which all need to get backed up. Example as <sid>adm user:

$ hdbsql -u system -d systemdb -i 00 "BACKUP DATA USING FILE ('backup')"
$ hdbsql -u system -d systemdb -i 00 "BACKUP DATA FOR HDB USING FILE ('backup')"

High Availability Solutions for SAP on AWS

The SAP on Amazon Web Services High Availability Guide describes Windows and Linux architectures with failover scenarios.

This page focuses on solutions which can automatically fail over SAP services from one AWS server to another. 

The AWS cloud implements high availability in a different way than traditional on-premises implementations do:

SAP has a list of certified HA-Interface partners. AWS is not part of this list since the certified HA-Interface partners support the AWS platform in their configurations. The following partners and solutions are known to support the AWS platform:

NEC Express Cluster 3.3

Product: NEC Express Cluster 3.3 (Product landing page)

Failover Services: HANA scale-up databases on Red Hat Linux

Licensing: NEC licenses depending on the services

Status: released, supported

The NEC Cluster relies on the SAP HANA system replication. It works across AWS availability zones within a region.

The NEC cluster uses AWS overlay IP addresses which support a fast failover. The NEC cluster will not shut down a node which is no longer providing the service; it will fail over to the standby node.

More Resources

AWS Specific Configuration Details

Be aware that the NEC cluster will change the network topology. The privileges required for these operations allow changing the AWS network topology in an account. Verify and test all entries very carefully. Limit access for users working on the NEC Express Cluster nodes to the required minimum.

Required Routing Entries

The NEC Cluster will typically operate in a single VPC. The cluster nodes are typically located in different availability zones for increased availability. Therefore they will have their primary IP addresses in different subnets.

The AWS overlay IP addresses are based on a concept which allows creating routing entries that direct traffic for an IP address to an instance (NEC cluster node). The NEC Express Cluster will change these routing entries when needed. It will, however, not create the routing entries. The initial creation of the routing entries needs to happen manually. The same routing entry has to be created in all routing tables of the given VPC.

The AWS VPC console can be used to add this entry. The legacy EC2 API tools offer the following command as well:

ec2addrt ROUTE_TABLE -r CIDR -i INSTANCE

The user will have to pick an arbitrary AWS instance id from a cluster node as option -i. The NEC Express cluster will then update this entry as needed.
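
With the current AWS CLI the initial entry can be created roughly as follows (route table id, overlay IP address, and instance id are placeholders); repeat the call for every routing table of the VPC:

aws ec2 create-route --route-table-id rtb-xxxxxxxx \
   --destination-cidr-block 192.168.10.21/32 \
   --instance-id i-xxxxxxxxxxxxxxxxx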

The NEC cluster will only operate correctly if the routing entry has been created in all routing tables of the VPC!

AWS Instance Configuration for Cluster Nodes

The AWS cluster nodes will have to be able to communicate through a second IP address. The document IP Failover with Overlay IP Addresses on this site describes how to disable the source/destination check for AWS instances and how to host a second IP address on the same Linux system.

IAM Policies: NEC-HA-Policy

The cluster nodes will require the following privileges to operate:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1424870324000",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceAttribute",
        "ec2:DescribeTags",
        "ec2:DescribeVpcs",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeAvailabilityZones"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Stmt1424860166260",
      "Action": [
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:DescribeRouteTables",
        "ec2:ReplaceRoute"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

 

Red Hat Pacemaker for SAP Applications

Red Hat supports the protection of the SAP HANA DB starting with Red Hat 7.4 on AWS.

Access to documentation requires a Red Hat customer account with the appropriate entitlement. Please read:

 

Bad Hair Days (with Red Hat Pacemaker)

This page documents known problems with the Red Hat Pacemaker cluster. The problems typically arise from incorrect configurations...

Symptom: Virtual IP Service doesn't start

Problem: A manual start leads to the following problem:

[root@myNode1 ~]# pcs resource debug-start s4h_vip_ascs20 --full
... ...
> stderr: Unknown output type: test
> stderr: WARNING: command failed, rc: 255

Solution: Fix the AWS CLI configuration. The output format may be wrong; it has to be text.

[root@myNnode1 ~]# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [us-east-1]:
Default output format [test]: text
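
The configured value can be verified without re-running the dialog, for example (add --profile <profile-name> if the cluster agents use a dedicated profile):

aws configure get output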

 

SUSE SLES for SAP

Product: SLES for SAP 12 (Product landing page)

Failover Services: HANA Scale Up databases and Netweaver central systems

Licensing: Bring your own SUSE subscription or use the AWS Marketplace SUSE Linux Enterprise Server for SAP Applications 12 SP3 offering.

Status: Full support starting with SLES for SAP 12 SP1

This product relies on SAP HANA system replication. It will monitor the master and the slave node for health. The Linux cluster will fail over a service IP address to the previous slave node when needed. The fencing agents will then reboot the previous master node.

See:

More Resources:

 

Trouble Shooting the Configuration

Verification and debugging of the aws-vpc-move-ip Cluster Agent

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=cluster OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip monitor

Check the console output (DEBUG keyword) for error messages.

Stop the overlay IP Address to be hosted on a given Node

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=cluster OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip stop 

Check the DEBUG output for errors and verify that the virtual IP address is NOT active on the current node with the command ip a.

Start the overlay IP Address to be hosted on a given Node

As root user run the following command using the same parameters as in your cluster configuration:

# OCF_RESKEY_address=<virtual_IPv4_address> OCF_RESKEY_routing_table=<AWS_route_table> OCF_RESKEY_interface=eth0 OCF_RESKEY_profile=<AWS-profile> OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip start

Check DEBUG output for error messages and verify that the virtual IP address is active on the current node with the command ip a.

Testing the Stonith Agent

The STONITH agent will shut down the other node if it thinks that this node is no longer reachable. The agent can be called manually as super user on cluster node 1 to shut down cluster node 2. Use it with the same parameters as used in the STONITH agent configuration:

# stonith -t external/ec2 profile=<AWS-profile> port=<cluster-node2> tag=<aws_tag_containing_hostname> -T off <cluster-node2>

This command will shut down cluster node 2. Check the errors reported during execution of the command if it does not work as planned.
Restart cluster node 2 and test STONITH the other way around.

The parameters used here are:

  • AWS-profile : The profile which will be used by the AWS CLI. Check the file ~/.aws/config for the matching one. Using the AWS CLI command aws configure list will provide the same information
  • cluster-node2: The name or IP address of the other cluster node
  • aws_tag_containing_hostname: This is the name of the tag of the EC2 instances for the two cluster nodes. We used the name pacemaker in this documentation

Checking Cluster Log Files

Check the file: /var/log/cluster/corosync.log
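
A quick way to spot problems is to filter the log for errors and warnings, for example:

grep -iE "error|warn" /var/log/cluster/corosync.log | tail -50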

Useful Commands

As super user:

crm_resource -C Reset warnings showing up in the command crm status
crm configure edit Configure all agents in vi
crm configure property maintenance-mode=true Set Pacemaker into maintenance mode. This allows you to reconfigure, start, stop, and resync SAP HANA.
crm configure property maintenance-mode=false Bring Pacemaker from maintenance mode back into controlling, production mode. Allow Pacemaker to explore the current configuration. This can take a few seconds.

SAP HANA related commands (as <sid>adm user)

hdbcons -e hdbindexserver 'replication info' Check whether HANA is replicating, detailed
hdbnsutil -sr_state  Check whether HANA is replicating. Show the master, slave relationship
 SAPHanaSR-showAttr  Cluster tool which checks the current configuration. Run as super user

 

Bad Hair Days (with SLES for SAP)

Bugs I ran into:

Symptom: Virtual IP Address doesn't get hosted

Manual testing of virtual IP address agent (start option) creates the following output:

INFO: EC2: Moving IP address 192.168.10.22 to this host by adjusting routing table rtb-xxx 
INFO: monitor: check routing table (API call) 
DEBUG: executing command: /usr/bin/aws --profile cluster --output text ec2 describe-route-tables --route-table-ids rtb-xxx 
DEBUG: executing command: ping -W 1 -c 1 192.168.10.22 
WARNING: IP 192.168.10.22 not locally reachable via ping on this system 
INFO: EC2: Adjusting routing table and locally configuring IP address 
DEBUG: executing command: /usr/bin/aws --profile cluster ec2 replace-route --route-table-id rtb-xxx --destination-cidr-block 192.168.10.22/32 --instance-id i-1234567890 
DEBUG: executing command: ip addr delete 192.168.10.22/32 dev eth0 
RTNETLINK answers: Cannot assign requested address 
WARNING: command failed, rc 2
INFO: monitor: check routing table (API call)

The host can't add the IP address to eth0.

Problem: SUSE netconfig hasn't been disabled

Solution: Set CLOUD_NETCONFIG_MANAGE='no' in /etc/sysconfig/network/ifcfg-eth0

Symptom: Virtual IP Address gets removed after some minutes

corosync logs show a line like:

rsc_ip_XXX_XXXX_start_0:17147:stderr [ An error occurred (UnauthorizedOperation) when calling the ReplaceRoute operation: You are not authorized to"

Problem: The instance does not have the right to modify routing tables

Solution: The virtual IP address policy has a problem. It may be missing. It may have a typo. Another policy may disallow access to routing tables.

Symptom: Nodes fence each other

The log file shows lines like:

2018-10-11T11:14:06.597541-04:00 my-hostname pengine[1234]: error: Resource rsc_ip_ABC_DEF01 (ocf::aws-vpc-move-ip) is active on 2 nodes attempting recovery
2018-10-11T11:14:06.597766-04:00 my-hostname pengine[1234]: warning: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.

Problem: There is a bug in the aws-vpc-move-ip agent. The monitoring has a glitch. The cluster thinks that both sides host the IP address on eth0 and they fence each other.

Solution: Update the package in question. Contact SUSE if this doesn't work or...

Modify all aws-vpc-move-ip resources in your CIB by adding monapi=true to the parameters of each aws-vpc-move-ip resource.

Symptom: Nodes fence each other

Both nodes shut down. The corosync log looks like:

Jan 07 07:31:17 [4750] my-hostname corosync notice  [TOTEM ] A processor failed, forming new configuration.
Jan 07 07:31:25 [4750] my-hostname corosync notice [TOTEM ] A new membership (w.x.y.z:52) was formed. Members left: 2
Jan 07 07:31:25 [4750] my-hostname corosync notice [TOTEM ] Failed to receive the leave message. failed: 2

Problem: The corosync token didn't arrive 6 times in a row within 5 seconds. Check whether the communication between the two servers works as intended or...

Solution: Increase the following corosync parameter:

  • token: from 5000 to 30000
  • consensus: from 7500 to 32000
  • token_retransmits_before_loss_const: from 6 to 10

You can decrease these parameters later on as long as the cluster runs stably. These changes have the following impact:

  • The cluster will give up on corosync communication after (token) 30 seconds
  • The timeout for an individual token gets increased to token/retransmits: 30000 ms / 10 = 3 s
  • The cluster will attempt (token_retransmits_before_loss_const) 10 times to reestablish communication instead of 6 times
  • The consensus parameter has to be larger than the token parameter

This configuration will increase the time for a cluster to recognize the communication failure and take over!
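
The three values map to the totem section of /etc/corosync/corosync.conf. A sketch showing only the changed settings; keep your remaining configuration as it is:

totem {
        version: 2
        token: 30000
        token_retransmits_before_loss_const: 10
        consensus: 32000
        # ... keep all other existing totem settings unchanged ...
}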

Symptom: Both nodes shut down after a while

The log file shows lines like:

2018-10-12T08:33:10.477900-04:00 xxx stonith-ng[2199]: warning: fence_legacy[32274] stderr: [ An error occurred (UnauthorizedOperation) when calling the StopInstances operation: You are not authorized to perform this operation. Encoded authorization failure message: Q5Edo8F0xvippgHSKd11QKshu_Hhc3Z8Es_D9O4PYkrLrqY_o6ziaM0JkUrCwadpplJsJreOGxwCTEGd-f68XYc82Dz- HqBZmIrwacTFsYxa0fAQLOA6stHTc2OolBqD-X-HsKZ-bOMjAXs69RT04MRAgNVWJPXeAtq4PHZqN5nne8ocnsshgCt_5xkdjGnxp5VsfzE6o75OUtdHKtblq- 8MokX1ItkZKdohocthhQdQyhGlG8HT1loxdDSuG50LE-kHwGo1slNnZOa-Rw3rPKi0tNzpPvDvlMR3_OXwyC
2018-10-12T08:33:10.478589-04:00 xxx stonith-ng[2199]: error: Operation 'poweroff' [32274] (call 56 from crmd.2205) for host 'haawnulsmqaci' with device 'res_AWS_STONITH' returned: -62 (Timer expired)
2018-10-12T08:33:10.478793-04:00 xxx stonith-ng[2199]: warning: res_AWS_STONITH:32274 [ Performing: stonith -t external/ec2 -T off xxx ]
2018-10-12T08:33:10.478978-04:00 xxx stonith-ng[2199]: error: Operation poweroff of haawnulsmqaci by awnulsmqaci for crmd.2205@awnulsmqaci.98fa9afe: Timer expired
2018-10-12T08:33:10.479151-04:00 xxx crmd[2205]: notice: Stonith operation 56/53:87:0:c76c1861-5fd3-4132-a36c-8f22794a6f1b: Timer expired (-62)
2018-10-12T08:33:10.479340-04:00 xx crmd[2205]: notice: Stonith operation 56 for haawnulsmqaci failed (Timer expired): aborting transition.

Problem: A node can't shut down the other one since the STONITH policies are missing or not configured appropriately

Solution: Add the stonith policy as indicated in the installation manual. Make sure that the policy is using the appropriate AWS instance ids. Test them individually!

Symptom: Confusing messages after crm configure commands

Example:

host01:~ # crm configure property maintenance-mode=false
 WARNING: cib-bootstrap-options: unknown attribute 'have-watchdog'
 WARNING: cib-bootstrap-options: unknown attribute 'stonith-enabled'
 WARNING: cib-bootstrap-options: unknown attribute 'placement- strategy'
 WARNING: cib-bootstrap-options: unknown attribute 'maintenance- mode'

Problem:  This is a bug in crmsh. See:  https://github.com/ClusterLabs/crmsh/pull/386 . It shouldn't affect functionality.

Solution: Wait for a fix.

Symptom: Cluster loses quorum after one node leaves the cluster

Problem: A cluster starts but it breaks the quorum

The corosync-quorumtool command lists the following incorrect status:

# corosync-quorumtool
(...)
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2 --> Quorum
Flags: Quorate

A correctly configured cluster will show the following output:

# corosync-quorumtool
(...)
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1 --> Quorum
Flags: 2Node Quorate WaitForAll

Solution: Fix the typo in the corosync configuration.

One line is probably incorrect. It may look like

two_nodes: 1

Remove the plural s and change it to

two_node: 1

 

Checklist for the Installation of SAP Central Systems with SLES HAE

This checklist is supposed to help with the installation of SLES HAE for ASCS protection.

The various identifiers will be needed at different stages of the installation. This checklist should be complete before the SAP and the SLES HAE installation begins.

Tip: Click on "Generate printer friendly layout" at the bottom of the page before you print this file.

Item Status/Value

SLES subscription and update status

  • All systems have a SLES for SAP subscription
  • All systems have been updated to the latest patch level
 

AWS User Privileges for the installing person

  • Creation of EC2 instances and EBS volumes
  • Creation of security groups
  • Creation of EFS file systems
  • Modification of AWS routing tables
  • Creation of policies and attaching them to IAM roles
  • Optional for Route 53 agent installation
    • Create and modify A records in a private hosted zone
  • Potentially needed
    • Creation of subnets and routing tables
 

VPC

  • VPC Id
  • CIDR range of VPC
 
Subnet id A for systems in first AZ  
Subnet id B for systems in second AZ  
Routing table id for subnet A and B
  • Is this routing table in charge of routing both subnets?
    • Is it associated to both subnets?
    • Alternative: Is it associated to VPC?
      • The subnets do not have their own routing tables
 

Optional:

  • Name of hosted Route 53 zone
  • Name of DHCP option set
    • Verify options!
    • Is option set associated to VPC?
 
AWS Policies Creation
  • Name of Data Provider policy
  • Name of STONITH policy
  • Name of Move IP (Overlay IP) policy
  • Optionally: Name of Route53 policy
 

First cluster node (ASCS and ERS)

  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A?
  • instance has all 3 or 4 policies attached?
 
Second cluster node (ASCS and ERS)
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet B?
  • instance has all 3 or 4 policies attached?
 
PAS system
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A or B?
  • instance has data provider policy attached?
 
AAS system
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A or B
  • instance has data provider policy attached?
 
DB system (is potentially node 1 of a database failover cluster)
  • instance id
  • ENI id
  • IP address
  • hostname
  • instance is associated to subnet A
  • instance has data provider policy attached?
    • a cluster node has 2 to 3 more policies attached
 

 Overlay IP address: service ASCS

  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 
Overlay IP address: service ERS
  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 
Optional: Overlay IP address DB server
  • IP address
  • Has it been added to routing table?
  • Does it point to the ENI of the first node?
 

 Optional: Route 53 configuration

  • The Route 53 private hosted zone has an A record with
    • the name of the ASCS system
    • the IP address of the first cluster node
 

 Creation of EFS filesystem

  • DNS name of EFS filesystem
 

 All instances have Internet access

  • Check routing tables
  • Alternative: Add http proxies for data providers and cluster software
 

 

Open Source Agents being used by SLES-for-SAP

SUSE is a dedicated open source provider. SUSE tends to use agents published upstream in the ClusterLabs open source project.

The open source agents published via SLES for SAP are the only ones with SUSE support. Customers have ever-growing requirements. SUSE and AWS work on improving the agents.

This page lists the ClusterLabs agents as well as experimental agents without support.

Current ClusterLabs agent
Name Location in SLES file system GitHub sources As of GitHub commit Comment Shortcomings
STONITH agent /usr/lib64/stonith/plugins/external/ec2 ec2 34a217f on ~ Aug 6, 2018

Stops and monitors EC2 instances.

This version filters the EC2 API calls, which has the following advantages:

  • no problems with Unicode EC2 tags
  • smaller result sets, faster
  • fewer problems with EC2 CLI response syntax
  • doesn't contribute to the EC2 API call limit

Cosmetic:

The --output text option in the AWS CLI calls is missing. This would lower the risk of configuration errors with the AWS profile.

SUSE Bug 1106700: - AWS: ec2 agent has fixes implemented upstream

Move Overlay IP /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip aws-vpc-move-ip 7ac4653 on Sept. 4, 2018 Reassign an AWS overlay IP address in a routing table

Heads up:

This agent is not compatible with the proprietary agent from SUSE. SUSE uses a parameter with the name address. The upstream version uses the parameter name ip.

I haven't yet been able to make this agent work in a SUSE cluster :-(

Bug 1106707 - AWS: aws-vpc-move-ip agent needs maintenance

Pull request for multi routing table support

Route 53 /usr/lib/ocf/resource.d/heartbeat/aws-vpc-route53 aws-vpc-route53.in  7632a85 ~August 6, 2018 Update a record in an AWS Route 53 hosted zone (DNS server)

calls of ec2metadata will fail if the AWS user data contains strings like "local-ipv4". This can happen in specific AWS Quickstart implementations

Bug 1106706 - AWS: Route 53 agent has fixes implemented upstream

There is an ongoing discussion about updating the agents. Here are some experimental agents without any SUSE support.

Experimental ClusterLabs agent
Name Location in SLES file system GitHub sources As of GitHub commit Comment Shortcomings
Move Overlay IP /usr/lib/ocf/resource.d/suse/aws-vpc-move-ip ...soon here... . Reassign an AWS overlay IP address in a routing table. The new monitoring doesn't work when a cluster node rejoins a cluster; use the old monitoring mode by adding the parameter monapi="true" to the primitive. The monitoring function got updated, the new mode works, and no parameter is needed.
Route 53 /usr/lib/ocf/resource.d/heartbeat/aws-vpc-route53 aws-vpc-route53 319ba06 on 2 Jul, 2018 Update a record in an AWS Route 53 hosted zone (DNS server). Calls of ec2metadata will fail if the AWS user data contains strings like "local-ipv4"; this can happen in specific AWS Quickstart implementations. The implementation of ec2metadata has been replaced with a more specific implementation.

 

SLES HAE Cluster Tests with Netweaver on AWS

This is an example of tests to be performed with a SLES HAE HANA cluster.

You will want to execute these tests before going into production.

No. Topic Expected behavior
1.0 Set a node on standby/offline
Set a node on standby by means of Pacemaker Cluster Tools ("crm node standby").
 
The cluster stops all managed resources on the standby node (master resources will be migrated / slave resources will just stop)
1.1 Set <nodenameA> to standby.
 
Time until all managed resources were stopped / migrated to the other node: XX sec
1.2 Set <nodenameB> to standby Time until all managed resources were stopped / migrated to the other node: XX sec
2.0 Switch off cluster node A
Power-off the EC2 instance (hard / instant stop of the VM).
 
The cluster notices that a member node is down. The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.
2.1 Failover time of ASCS / HANA primary XXX sec.
3 Switch off cluster node B
Power-off the EC2 instance (hard / instant stop of the VM).
 
The cluster notices that a member node is down. The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.
3.1 Failover time of ASCS / HANA primary XXX sec.
4 un-plug network connection (Split Brain)
The cluster communication over the network is down.
 

Both nodes detect the split brain scenario and try to fence each other (using the AWS STONITH agent). One node shuts down – the other will take over all resources

Failovertime: XXX sec

5

Failure (crash) of ASCS instance
The processes of the SAP instance are killed via OS command:

ps -ef | grep ASCS | awk '{print $2}' | xargs kill -9

The cluster notices the problem and promotes the ERS instance to ASCS while keeping all locks from the ENQ replication table.

ASCS Failover time: XXX sec

6

Failure of ERS instance
The processes of the SAP instance are killed via OS command:

ps -ef | grep ERS | awk '{print $2}' | xargs kill -9
 

The cluster notices the problem and restarts the ERS instance.

 

Time until ERS got restarted on same node: XX sec
 

7 Failure of HANA primary
 
Time until HANA DB is available again: XXX sec
8 Failure of corosync
Kill the corosync cluster daemon ("kill -9") on one node.
 

The node without corosync is fenced by the remaining node (since it appears down). The remaining node makes a STONITH attempt to verify that the lost member is really offline. If STONITH is confirmed the remaining node takes over all resources.

Failover of all managed resources: xxx sec

Keep logfiles of all relevant resources to prove functionality. For instance, after an ASCS failover keep a copy of /usr/sap/<SID>/ASCS<nr>/work/dev_enqserver. This logfile should list that an ENQ replication table was found in memory and that all locks got copied into the new ENQ table. Customers may request to acquire ENQ locks before the failover test and then check the status of those locks after successful failover (please document with screenshots of SM12 on both nodes before and after failover).

Keep corosync / cluster log of all actions taken during failover tests.

Ask the customer for additional failover tests / requirements / scenarios they would like to cover.

Have the customer sign the protocol (!) acknowledging that all tested failover scenarios worked as expected.

Remind the customer to regularly re-test all failover scenarios if the SAP / OS / cluster configuration changed or patches were applied.
 

Testing SLES clusters with SAP HANA Database

The following three tests should be done before a HANA DB cluster is taken into production.
The tests will use all configured components.

Primary HANA server becomes unavailable

Simulated Failures

  • Instance failure: the primary HANA instance crashes or is no longer reachable through the network
  • Availability zone failure.

Components getting tested

  • EC2 STONITH agent
  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Shut down eth0 on the instance to isolate it
  • The cluster will shut down the node
  • The cluster will fail over the HANA database
  • The cluster will not restart the failed node

Initial Configuration

Check whether the overlay IP address gets hosted on the interface eth0 on the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

The AWS console shows that both nodes are running:

Screenshot two running nodes

Damage the Instance

There are two ways to "damage" an instance

Corrupt Kernel

Become super user on the master HANA node.

Issue the command:

echo 'b' > /proc/sysrq-trigger

Isolate Instance

Become super user on the master HANA node.

Issue the command:

$ ifdown eth0

The current session will now hang. The system will not be able to communicate with the network anymore.

SUSE has a recommendation to do the isolation with firewalls and iptables.

Monitor Fail Over

Expect the following in a correctly working cluster:

  • The second node will fence the first node. This means it will force a shutdown through AWS CLI commands
  • The first node will be stopped
  • The second node will take over the overlay IP address and it will host the HANA database.

The cluster will now switch the master node and the slave node. 

Monitor progress from the healthy node!

The first node gets reported being offline:

hana02:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Wed Sep 19 13:18:21 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 1537362888 offline logreplay hana02 WDF sync hana01
hana02 PROMOTED 1537363101 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

hana02:/home/ec2-user # crm_mon -1rfn

Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:18:52 2018
Last change: Wed Sep 19 13:18:21 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Node hana01: OFFLINE
Node hana02: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Slave
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started

Inactive resources:

res_AWS_STONITH (stonith:external/ec2): Stopped
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Slaves: [ hana02 ]
Stopped: [ hana01 ]

Migration Summary:
* Node hana02:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 13:18:00 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana02 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 13:18:00 2018', queued=0ms, exec=0ms

The AWS console will now show that the second node has fenced the first node. It gets shut down:

Screenshot: node gets shut down

The second node will wait until the first node is shut down. The AWS console will look like:

 First node being shut down

The cluster will now promote the instance on the second node to be the primary instance:

hana02:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Wed Sep 19 13:19:14 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 1537362888 offline logreplay hana02 WDF sync hana01
hana02 PROMOTED 1537363154 online logreplay hana01 4:P:master1:master:worker:master 100 ROT sync PRIM 2.00.030.00.1522209842 hana02

The cluster status will be the following:

hana02:/home/ec2-user #  crm_mon -1rfn
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:19:16 2018
Last change: Wed Sep 19 13:19:14 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Node hana01: OFFLINE
Node hana02: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started

Inactive resources:

Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Stopped: [ hana01 ]

Migration Summary:
* Node hana02:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 13:18:00 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana02 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 13:18:00 2018', queued=0ms, exec=0ms

Check whether the overlay IP address gets hosted on the eth0 interface of the second node. Example:

hana02:/tmp # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 06:4f:41:53:ff:76 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.129/24 brd 10.0.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::44f:41ff:fe53:ff76/64 scope link
valid_lft forever preferred_lft forever

Last step: Clean up the message on the second node:

hana02:/home/ec2-user # crm resource cleanup res_AWS_STONITH hana02
Cleaning up res_AWS_STONITH on hana02, removing fail-count-res_AWS_STONITH
Waiting for 1 replies from the CRMd. OK
hana02:/home/ec2-user # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 13:20:44 2018
Last change: Wed Sep 19 13:20:34 2018 by hacluster via crmd on hana02

2 nodes configured
6 resources configured

Online: [ hana02 ]
OFFLINE: [ hana01 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana02
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana02
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana02 ]
Stopped: [ hana01 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Stopped: [ hana01 ]

Recovering the Cluster

Restart your stopped node. See:

Starting first node

Check whether the cluster services get started

Check whether the first node becomes a replicating server

See:

hana02:/home/ec2-user # SAPHanaSR-showAttr;
Global cib-time
--------------------------------
global Wed Sep 19 13:57:41 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 DEMOTED 30 online logreplay hana02 4:S:master1:master:worker:master 100 WDF sync SOK 2.00.030.00.1522209842 hana01
hana02 PROMOTED 1537365461 online logreplay hana01 4:P:master1:master:worker:master 150 ROT sync PRIM 2.00.030.00.1522209842 hana02

 

Secondary HANA server becomes unavailable

Simulated Failures

  • Instance failures. The secondary HANA instance crashes or is no longer reachable through the network
  • Availability zone failure.

Components getting tested

  • EC2 STONITH agent
  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Shut down eth0 on the secondary server to isolate it
  • The cluster will shut down the secondary node
  • The cluster will keep the primary node running without replication
  • The cluster will not restart the failed node

Initial Configuration

Check whether the overlay IP address is hosted on the eth0 interface of the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

Status of HANA replication:

hana01:/home/ec2-user # SAPHanaSR-showAttr

Global cib-time
--------------------------------
global Wed Sep 19 14:23:11 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537366980 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

The AWS console shows that both nodes are running:

Screenshot two running nodes

Damage the Instance

There are two ways to "damage" an instance:

Corrupt Kernel

Become super user on the secondary HANA node.

Issue the command:

echo 'b' > /proc/sysrq-trigger

Isolate secondary Instance

Become super user on the secondary HANA node.

Issue the command:

$ ifdown eth0

The current session will hang, and the system will no longer be able to communicate over the network.

SUSE recommends performing the isolation with firewalls and iptables rather than taking the interface down; a sketch of this approach follows.
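The following is only a minimal sketch of such an isolation, not the exact procedure from the SUSE documentation. It assumes that the peer node hana01 uses the IP address 10.0.1.115 from the examples above; run the commands as super user on the secondary node:

hana02:~ # iptables -A INPUT -s 10.0.1.115 -j DROP
hana02:~ # iptables -A OUTPUT -d 10.0.1.115 -j DROP

This drops all traffic to and from the cluster peer and simulates a network partition. In contrast to ifdown, the session on the isolated node stays usable.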

Monitor Fail Over

Expect the following in a correctly working cluster:

  • The first node will fence the second node. This means it will force a shutdown through AWS CLI commands

  • The second node will be stopped

  • The first node will remain the master node of the HANA database.

  • There is no more replication!

Monitor progress from the master node!
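A convenient way to follow the failover is to refresh the cluster status every few seconds, for example:

hana01:/home/ec2-user # watch -n 5 "crm_mon -1rfn"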

The second node gets reported as offline:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:24:13 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537367044 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 offline logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

The cluster will figure out that the secondary node is in an unclean state:

hana01:/home/ec2-user # crm_mon -1rfn
Stack: corosync
Current DC: hana01 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 14:24:26 2018
Last change: Wed Sep 19 14:24:13 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Node hana01: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started
Node hana02: UNCLEAN (offline)
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Slave

Inactive resources:
Migration Summary:
* Node hana01:

The AWS console will now show that the master node has fenced the secondary node. It gets shut down:

Screenshot node gets shut down

The master node will wait until the secondary  node is shut down. The AWS console will look like:

 Secondary node being shut down

The cluster will now reconfigure its HANA configuration. The cluster knows that the node is offline and replication has been stopped:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:24:13 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537367044 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 30 offline logreplay hana01 ROT sync hana02

The cluster status is the following:

hana01:/home/ec2-user # crm_mon -1rfn
Stack: corosync
Current DC: hana01 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Wed Sep 19 14:27:05 2018
Last change: Wed Sep 19 14:24:13 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Node hana01: online
rsc_SAPHana_HDB_HDB00 (ocf::suse:SAPHana): Master
res_AWS_STONITH (stonith:external/ec2): Started
rsc_SAPHanaTopology_HDB_HDB00 (ocf::suse:SAPHanaTopology): Started
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started
Node hana02: OFFLINE

Inactive resources:

Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 ]
Stopped: [ hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Stopped: [ hana02 ]

Migration Summary:
* Node hana01:
res_AWS_STONITH: migration-threshold=5000 fail-count=1 last-failure='Wed Sep 19 14:26:17 2018'

Failed Actions:
* res_AWS_STONITH_monitor_120000 on hana01 'unknown error' (1): call=-1, status=Timed Out, exitreason='none',
last-rc-change='Wed Sep 19 14:26:17 2018', queued=0ms, exec=0ms

Check whether the overlay IP address is still hosted on the eth0 interface of the master node. Example:

hana01:/home/ec2-user # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.10.21/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ca:c9ff:feca:a652/64 scope link
       valid_lft forever preferred_lft forever

Recovering the Cluster

  • Restart your stopped node.

  • Check whether the cluster services get started

  • Check whether the second node becomes a replicating server again

See:

hana01:/home/ec2-user # SAPHanaSR-showAttr
Global cib-time                
--------------------------------
global Wed Sep 19 14:59:15 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1537369155 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

 

Take over a HANA DB by killing the Database

Simulated Failures

  • Database failures. The database is not working as expected

Components getting tested

  • HANA agent
  • Overlay IP agent
  • Optional: Route 53 agent if it is configured

Approach

  • Have a correctly working HANA DB cluster
  • Kill database
  • The cluster will fail over the database without fencing the node

Initial Configuration

Check whether the overlay IP address is hosted on the eth0 interface of the first node:

hana01:/var/log # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 02:ca:c9:ca:a6:52 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.115/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca:c9ff:feca:a652/64 scope link
valid_lft forever preferred_lft forever

Check the cluster status as super user with the command crm status:

hana01:/var/log # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 12:37:53 2018
Last change: Tue Sep 11 12:37:53 2018 by root via crm_attribute on hana01

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana01
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana01 ]
Slaves: [ hana02 ]

Kill Database

hana01 is the node with the leading HANA database.

The failover will only work if the re-syncing of the slave node is completed. Check this with the command SAPHanaSR-showAttr. Example:

hana02:/tmp # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Tue Sep 11 09:11:16 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
-----------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 PROMOTED 1536657075 online logreplay hana02 4:P:master1:master:worker:master 150 WDF sync PRIM 2.00.030.00.1522209842 hana01
hana02 DEMOTED 30 online logreplay hana01 4:S:master1:master:worker:master 100 ROT sync SOK 2.00.030.00.1522209842 hana02

 

The synchronisation state (column sync_state) of the slave node has to be SOK.

Become the HANA DB admin user and execute the following command:

hdbadm@hana01:/usr/sap/HDB/HDB00> HDB kill
killing HDB processes:
kill -9 462 /usr/sap/HDB/HDB00/hana01/trace/hdb.sapHDB_HDB00 -d -nw -f /usr/sap/HDB/HDB00/hana01/daemon.ini pf=/usr/sap/HDB/SYS/profile/HDB_HDB00_hana01
kill -9 599 hdbnameserver
kill -9 826 hdbcompileserver
kill -9 828 hdbpreprocessor
kill -9 1036 hdbindexserver -port 30003
kill -9 1038 hdbxsengine -port 30007
kill -9 1372 hdbwebdispatcher
kill orphan HDB processes:
kill -9 599 [hdbnameserver] <defunct>
kill -9 1036 [hdbindexserver] <defunct>

Monitoring Fail Over

The cluster will now switch the master node and the slave node. The failover is completed when the HANA database on the first node has been synchronized again as well:

hana02:/tmp # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Tue Sep 11 09:20:38 2018


Hosts clone_state lpa_hdb_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost
---------------------------------------------------------------------------------------------------------------------------------------------------------------
hana01 DEMOTED 30 online logreplay hana02 4:S:master1:master:worker:master -INFINITY WDF sync SOK 2.00.030.00.1522209842 hana01
hana02 PROMOTED 1536657638 online logreplay hana01 4:P:master1:master:worker:master 150 ROT sync PRIM 2.00.030.00.1522209842 hana02

Check the cluster status as super user with the command crm status. Example:

hana02:/tmp # crm status
Stack: corosync
Current DC: hana02 (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Tue Sep 11 09:28:10 2018
Last change: Tue Sep 11 09:28:06 2018 by root via crm_attribute on hana02

2 nodes configured
6 resources configured

Online: [ hana01 hana02 ]

Full list of resources:

res_AWS_STONITH (stonith:external/ec2): Started hana01
res_AWS_IP (ocf::heartbeat:aws-vpc-move-ip): Started hana02
Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]
Started: [ hana01 hana02 ]
Master/Slave Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00]
Masters: [ hana02 ]
Slaves: [ hana01 ]

Failed Actions:
* rsc_SAPHana_HDB_HDB00_monitor_61000 on hana01 'not running' (7): call=273, status=complete, exitreason='none',
last-rc-change='Tue Sep 11 09:18:47 2018', queued=0ms, exec=1867ms
* res_AWS_IP_monitor_60000 on hana01 'not running' (7): call=264, status=complete, exitreason='none',
last-rc-change='Tue Sep 11 08:57:15 2018', queued=0ms, exec=0ms

All resources are started. The overlay IP address is now hosted on the second node. Delete the failed actions with the commands:

hana02:/tmp # crm resource cleanup rsc_SAPHana_HDB_HDB00
Cleaning up rsc_SAPHana_HDB_HDB00:0 on hana01, removing fail-count-rsc_SAPHana_HDB_HDB00
Cleaning up rsc_SAPHana_HDB_HDB00:0 on hana02, removing fail-count-rsc_SAPHana_HDB_HDB00
Waiting for 2 replies from the CRMd.. OK
hana02:/tmp # crm resource cleanup res_AWS_IP
Cleaning up res_AWS_IP on hana01, removing fail-count-res_AWS_IP
Cleaning up res_AWS_IP on hana02, removing fail-count-res_AWS_IP
Waiting for 2 replies from the CRMd.. OK

The crm status command will no longer show the failures.

Check whether the overlay IP address is now hosted on the eth0 interface of the second node. Example:

hana02:/tmp # ip address list eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 06:4f:41:53:ff:76 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.129/24 brd 10.0.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.10.21/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::44f:41ff:fe53:ff76/64 scope link
valid_lft forever preferred_lft forever

SteelEye Protection Suite for Linux 8 & 9

Resources

 

RHEL related Topics for SAP Installations on AWS

Change Hostname on RHEL 7.x for SAP Installations on AWS

SAP systems require hostnames which are not longer than 13 characters. The default AWS naming schema uses the IP address separated with dashes to create hostnames. This naming schema can lead to host names which are too long for SAP installations.

The fix is based on the following assumptions:

  • The SAP system is being operated in a VPC with its own network interface
  • The IP address is a private one.
  • No DNS or NIS naming has to be used by the clients

The following procedure renames a system to node1

RHEL 7.x

  1. Change the content of the file /etc/hostname to node1. This entry will be used to set the host name for future reboots
  2. Edit the file /etc/cloud/cloud.cfg
    1. Add the line preserve_hostname: true at the beginning. This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  3. Edit the file /etc/hosts
    1. Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  4. Set the host name with the command "$ hostname node1". This command performs a dynamic change. Its effect will not last beyond a reboot. (A scripted sketch of all steps follows this list.)
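A minimal scripted sketch of these steps, to be executed with root privileges. The IP address 10.79.7.92 is only the example value from above and has to be replaced with the primary IP address of the instance:

echo node1 > /etc/hostname
sed -i '1i preserve_hostname: true' /etc/cloud/cloud.cfg
echo "10.79.7.92 ip-10-79-7-92 node1" >> /etc/hosts
hostname node1

The sed call simply inserts the preserve_hostname line at the top of cloud.cfg. If the file already contains a preserve_hostname entry, edit that entry instead.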

 

SAP Cloud Appliance Library

SAP has a rich collection of preconfigured SAP systems which can be run in the Amazon Web Services (AWS) cloud. This collection is called the SAP Cloud Appliance Library.

SAP Notes related to Amazon Web Services (AWS)

Readers will need the appropriate SAP authorizations to access the pages in the SAP support system

SAP Notes
SAP Note Title last known update Comment
500235 Network Diagnosis with NIPING April 08, 2014 Checking latencies in between AWS Als...
560499 Customer Interaction Center: Hotline Numbers & E-mail Addresses June 12, 2017 How to open SAP tickets...
1380654 SAP support in public cloud environments Dec. 6, 2012 Provides a general introduction to cloud and cloud service categories. Lists AWS as only supported IAAS provider (as of Apr. 9, 2014)
1588667 SAP on AWS: Overview of related SAP Notes and Web-Links July 30, 2015 How to pick the right Linux AMIs
1618572 Linux: Support Statement for RHEL on Amazon Web Services Jan. 10, 2014  
1618590 Support: Oracle database on Amazon Web Services Jan. 10, 2014 Oracle support for productive and non productive SAP systems on AWS platform
1656099 SAP Applications on AWS: Supported DB/OS and AWS EC2 products June 17, 2016 Supported EC2 instances, databases, SAP products
1656250 SAP on AWS: Support prerequisites Aug. 7, 2014 Explains license requirements, support contracts required, AWS specific data collector required .
1697114 Determining hardware ID in Amazon clouds Mar. 26, 2012  
1758890 SAP HANA: Information needed by Product/ Development Support June 20, 2014 Information needed to open an incident at SAP
1788665 SAP HANA Support for virtualized / partitioned (multi-tenant) environments May 5, 2015  
1798212 Support for SAP HANA One Dec. 12, 2012  Explains special peer community support mode for this product 
1838364 Performance and CPU Affinity Mar. 10, 2014  Explains how to map CPUs for best performance
1964437 SAP HANA on AWS: Supported AWS EC2 products Jun. 17, 2016  This note is basically retired. It points to the SAP HANA Hardware Directory
2058870 SAP Business One, version for SAP HANA on public Infrastructure-as-a-Service (IaaS) platforms Aug. 26, 2014  Explains that B1 is supported on AWS EC2
2198693 Key Monitoring Metrics for SAP on Amazon Web Services (AWS) July 29, 2015 Details the AWS specific metrics gathered for EC2 systems running SAP

2240028 SAP Host Agent Patches specific to Linux Jan. 19, 2018 Documents the SAP host agent which had a problem without AWS data provider
2288345 EIM Applications on Amazon Web Services (AWS) March 3, 2016 DS, Data Services Support
2302728 Supported scenarios with NEC Expresscluster on Amazon Web Services Aug. 24, 2016  
2309342 SUSE Linux Enterprise High Availability Extension on AWS June 29, 2016 All AWS specific information to set up a SUSE HAE cluster for HANA
2358420 Oracle Database Support for Amazon Web Services EC2 Aug. 23, 2016 All AWS specific information to set up the Oracle RDBMS on Oracle Linux
2449062 Error getting the hardware key on Amazon AWS server March 29, 2017  
2646715 SAP GUI Terminal Virtualization with Amazon AppStream 2.0 June 1, 2018  
2772496 AWS File Systems EFS and FSx for SAP Solutions March 2019  

HANA Sizing, Limits, Operations and Patches

SAP Notes
SAP Note Title last known update Comment
  SAP Quicksizer    
1514966 General HANA Sizing May 7, 2014 General HANA Sizing
2382421 Optimizing the Network Configuration on HANA- and OS-Level September 12, 2017 HANA Tuning
1514967 SAP HANA: Central Note Jan 22, 2016 Recommendation for 10GB interface etc.
1651055 Scheduling SAP HANA Database Backups in Linux Nov. 27, 2014  
1736976 Sizing Report for BW on HANA May 5, 2014 The note details the requirements for existing SAP BW users who want to migrate to HANA
1781986 Business Suite on SAP HANA Scale Out Dec. 12, 2013 .
1793345 Sizing for SAP Suite on HANA Apr. 7, 2015 .
1825774 SAP Business Suite Powered by SAP HANA - Multi Node Support Feb. 27, 2014 .
1840954 Alerts related to HANA memory consumption Feb 12, 2014 .
1872170 Suite on HANA Memory Sizing Report June 6, 2013 Determine your memory space requirements on a HANA system
1963779 HANA row store limits Aug. 14, 2014 Maximum limits depending on service pack
1984422 SAP HANA: Analysis of Out-of-memory (OOM) Dumps May 5, 2015 .
2057595 FAQ: SAP HANA High Availability January 2nd, 2017 .
2001528 Linux: SAP HANA Database SPS 08 revision 80 (or higher) on RHEL 6 or SLES 11 July 6, 2014 Details the glib C++ package update which is required
2205917 SAP HANA DB: Recommended OS settings for SLES 12 / SLES for SAP Applications 12 May 5, 2016 .
2235581 SAP HANA: Supported Operating Systems Oct. 23, 2017 .
2455582 Linux: Running SAP applications compiled with GCC 6.x Apr. 4, 2018 .

 

General Purpose SAP Notes

SAP Notes
SAP Note Title last known update Comment
212876 SAPCAR, The SAP archiving tools April 4, 2011 The note explains where to find the tool which allows you to decompress all SAP downloads
1275776 Linux: Preparing SLES for SAP environments Nov. 26, 2013 All SLES related system settings
1825774 SAP Business Suite Powered by SAP HANA - Multi-Node Support Feb. 28, 2013 The note explains the support status of scale out configurations for SAP HANA Business Suite Solutions

 

SAP related AWS technical White Papers

Amazon Web Services has an SAP micro site which also references the SAP related publications.

SAP Product White Paper Last Update Size Summary
HANA Setting up AWS Resources and the SLES Operating System for SAP HANA Installation March 2015 36 pages Setup guide for SAP HANA. The document discusses all aspects of a SAP HANA installation on SLES like security, network and disk related requirements.
HANA SAP HANA on the Amazon Web Services Cloud: Quick Start Reference Deployment July 2014 27 pages This document describes the fully automated installation of scale up or scale out HANA systems on AWS.
HANA SAP HANA on AWS Implementation and Operations Guide Feb. 2014 38 pages The document discusses all aspects of operating the SAP HANA database on AWS. It covers aspects like backup, support, security, administration, architecture and  high availability.
General Implementing SAP Solutions on Amazon Web Services April 2013 28 pages This document covers: planning of installations, licensing, AWS architecture, EC2 instance types for SAP, sizing and performance
General SAP on AWS Operations Guide Feb. 2013 19 pages Discussion of AWS specific SAP topics like image cloning, SAP patching, troubleshooting, on premises printing, system copies etc.
General SAP on Amazon Web Services High Availability Guide Dec. 2014 29 pages Discussion of Windows and Linux related AWS architectures and implementations for SAP applications
General SAP on Amazon Web Services Backup and Recovery Guide Dec. 2014 20 pages Discussion of backup and recovery for production and non production systems. Covers the relevant operating systems and database products
General AWS Data Provider for SAP March 2015 28 pages Setup and installation guide for the AWS SAP Data Provider which is required to gather AWS specific system information for the SAP monitoring utilities
General VMS: TCO Study for SAP on AWS Feb. 2013 27 pages AWS references this document. The document got published by the VM AG
B1 SAP Business One version for SAP HANA on AWS Cloud Reference Sheet April 2015 2 pages Documents the key benefits of using B1 on AWS including sizing information for AWS
B1 SAP Business One, version for SAP HANA, on the AWS Cloud: Deployment Guide Sept. 2014 15 pages Document outlines step by step the deployment steps of B1 on AWS

Non AWS Publications

SAP Product White Paper Last Update Summary
SAP HANA Developer Edition How to create a SAP HANA Developer Edition in the cloud June 2014 Covers setup information for AWS and other cloud services
HANA SAP HANA on AWS Certified Feb. 2014 SAP blog entry about the support of SAP HANA on AWS
General SAP on Amazon Web Services (AWS) March 2015 SAP SCN article with supported SAP products on AWS
Netweaver 7.3 SAP Netweaver 7.3 on Amazon Cloud (RedHat 6 Install) July 2013 31 pages Step by step installation guide from Thusjanthan Kubendranathan

 

SUSE SLES related Topics

A number of tidbits needed when working with SUSE SLES.

Disclaimer:

Consult the appropriate documentation before you apply them and understand the implications.

Other interesting topics

 

yast bug in SLES for SAP 12 SP1 with AWS Elastic File System (EFS)

There is a bug in the SLES command line installation tool yast which may affect SAP customers using SLES for SAP 12 SP1 (suse-sles-sap-12-sp1-byos-v20160308-hvm-ssd-x86_64, ami-4a8fb520) on AWS in conjunction with the Elastic File System (EFS).

The Architecture

A customer uses EFS for shared SAP file systems like /sapmnt or /usr/sap. Before the installation of the SAP software, the output of the command df -k on such an AWS system may look as follows:

nw11:~ # df -k
Filesystem    1K-blocks        Used    Available        Use% Mounted on
/dev/hda1     103078876        3286120 95477452         4% /
devtmpfs      8222944          8       8222936          1% /dev
tmpfs         12347764         0       12347764         0% /dev/shm
tmpfs         8231840          9716    8222124          1% /run
tmpfs         8231840          0       8231840          0% /sys/fs/cgroup
10.79.8.181:/ 9007199254740992 0       9007199254740992 0% /usr/sap/SI1
10.79.8.15:/  9007199254740992 0       9007199254740992 0% /sapmnt/SI1

SLES reports two file systems which have 8 Exabyte of free space whereas nothing is being used.

The Bug

I have been calling yast from the command line to install an X11 environment for the upcoming SAP Netweaver installation.

yast seems to be overwhelmed by the capacity of 8 Exabyte in these two additional file systems. It seems to have an integer overrun and thinks that there isn't enough disk space. It will report the following, irrelevant message:

yast error with EFS file systems, screenshot 1

 You will want to continue by pushing the button [Continue anyway]. The installation will happen on the root file system and not on the two NFS mounted EFS file systems.

Then yast will come up with the following dialog:

yast error with EFS file systems, screenshot 2

Activate the option Do not Show This Message Again and move on with the option [Yes].

Avoiding the Problem

SUSE is processing this bug as: 991090 (yast sw_single reports "error out of diskspace" while filesystems with 8 ExaByte are mounted)

The problem can be avoided in the meantime through the following three options:

  1. Install all SLES software through yast before you create the EFS file systems
  2. Unmount the EFS file systems before you use yast (see the sketch after this list).
  3. Overrule the warning and move ahead. You will risk a full file system somewhere else.
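A minimal sketch of option 2, assuming the two EFS mount points from the example above and that they are defined in /etc/fstab:

nw11:~ # umount /usr/sap/SI1 /sapmnt/SI1
nw11:~ # yast
nw11:~ # mount -a

The final mount -a remounts everything listed in /etc/fstab once the yast installation is done.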

 

Add swap space

Disclaimer: The following commands document how to add a raw device as swap volume. Selecting the wrong raw device will lead to data corruption in file systems!

All commands have to be executed with root privileges

  • Create a separate AWS volume with the required space. This ensures that there is no contention for maximum IOs with other volumes, and the solution is price neutral
  • I assume that the swap volume is /dev/xvdg. Format and add the volume to swap:
$ mkswap /dev/xvdg
$ swapon /dev/xvdg

Make it persistent through reboots by adding the following line to /etc/fstab

/dev/xvdg     swap     swap defaults
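To verify that the swap space is active, check for example:

$ swapon -s
$ free -m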

 

Allow User Access without Certificates (Password only)

AWS systems allow access with a key pair (certificate) only by default. This is a security measure.

Administrators who decide to lower the security standards by allowing ssh access through user/password credentials on SUSE SLES have to execute the following steps (a scripted sketch follows the list):

  • Edit the /etc/ssh/sshd_config file.
    • Change the entry "PasswordAuthentication no" to  "PasswordAuthentication yes"
    • Save the changes
  • Restart the sshd daemon with the command
    • $ service sshd restart
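A minimal scripted sketch of these steps, assuming sshd_config contains the entry PasswordAuthentication no as described above:

$ sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
$ service sshd restart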

 

Change Hostname on SUSE SLES for SAP Installations on AWS

SAP systems require hostnames which are not longer than 13 characters. The default AWS naming schema uses the IP address separated with dashes to create hostnames. This naming schema can lead to host names which are too long for SAP installations.

The fix is based on the following assumptions:

  • The SAP system is being operated in a VPC with its own network interface
  • The IP address is a private one.
  • No DNS or NIS naming has to be used by the clients

The following procedure renames a system to node1

SLES 11

  1. Change the content of file /etc/HOSTNAME to node1. This entry will be used to set the host name in future reboots
  2. Edit the file /etc/cloud/cloud.cfg
    1. Modify the line preserve_hostname: false to preserve_hostname: true . This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  3. Edit the file /etc/hosts
    1. Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  4. Set the host name with the command "# hostname node1". This command performs a dynamic change. Its effect will not last beyond a reboot.
  5. Configure the DHCP client not to configure the hostname
    1. Enter the command yast lan
    2. Move to the entry Hostname/DNS (Tab / arrow right) and select it
    3. Set the hostname to node1 in the host name field
    4. Deselect (remove the x from) the entry to set the hostname dynamically
    5. Save all settings and leave yast

SLES 12 & SLES 15

  1. Edit the file /etc/cloud/cloud.cfg
    • Modify the line preserve_hostname: false to preserve_hostname: true . This entry will be used in the next reboot to determine whether the hostname should be left as it is.
  2. Edit the file /etc/hosts
    • Add node1 to the primary IP address like in this example "10.79.7.92 ip-10-79-7-92 node1"
  3. Use the command (see also the sketch below):
    • $ hostnamectl set-hostname node1
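A minimal scripted sketch of the SLES 12 / SLES 15 steps, assuming cloud.cfg contains the line preserve_hostname: false and using the example IP address from above:

sed -i 's/^preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg
echo "10.79.7.92 ip-10-79-7-92 node1" >> /etc/hosts
hostnamectl set-hostname node1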

 

Enable root Access for Linux Instances

AWS doesn't grant root access by default to EC2 instances. This is an important security best practice. Users are supposed to open an ssh connection using the secure key pair to log in as ec2-user and to use the sudo command as ec2-user to obtain elevated privileges.

Problems arise with a number of software packages which require remote root access for installation and operation. The following cheat sheet explains how to enable root access. It hasn't been tested with all Linux distributions.

Disclaimer: Enabling direct root access to EC2 systems is a bad security practice which AWS doesn't recommend. It creates vulnerabilities, especially for systems which are facing the Internet (see AWS documentation).

Use these commands at your own risk. Understand the function of the commands and the related risks before you apply them.

All commands require root privileges which can be obtained through the sudo command.

Create a root Password

$ passwd root
Enter the new password twice when prompted.

Configure and Restart the ssh Service for root Access

Edit the configuration file /etc/ssh/sshd_config. Change the following two parameters to the values shown below:

PermitRootLogin yes
PasswordAuthentication yes

Restart the service with the command

$ service sshd reload

Patch the authorized Keys File for the root User

The simplest way is to reuse the ec2-user's authorized_keys file and certificate for the root user. Copy the ec2-user file over to the root user:

$ cp ~ec2-user/.ssh/authorized_keys ~root/.ssh/authorized_keys

This also allows logging in as root with the same key which is available for the ec2-user.

Update the AWS Cloud Configuration File

Edit the file /etc/cloud/cloud.cfg and change the following entry to this value:

disable_root: false
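If you prefer to script this change, a minimal sed sketch (assuming cloud.cfg already contains a disable_root entry) is:

$ sed -i 's/^disable_root:.*/disable_root: false/' /etc/cloud/cloud.cfg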

 

 

Installation of a Graphical Desktop with RDP Access for SUSE SLES 11, 12, 15

SAP installations may require graphical tools to be operated on the target server.

Important dependencies

  • xrdp uses vnc
  • vnc uses X11 and window managers

Software Installation

AWS SUSE installations come by default without a GNOME desktop environment. The following commands will install a GNOME desktop and an xrdp service to connect to the systems:

SLES 11 & 12

# zypper install -t pattern gnome-basic

SLES 15

Use yast and install the pattern "GNOME Desktop Environment (Basic)":

  • Start yast
  • Select "Software", press Tab
  • Select "Software Management", press Enter
  • Move the active field to "Filter Search" by pressing "Shift"+"Tab"
  • Use the down arrow key to unfold the selection list
  • Select "Patterns"
  • Select "GNOME Desktop Environment (Basic)"
  • Select "Accept"

Install xRDP

# zypper install xrdp

Enable VNC Remote Login

  • Start yast
  • Select " Network Services"
  • Select first entry "Remote Administration with VNC"
  • Enable service

Configure Window Manager to use Gnome

  • Edit the file /etc/sysconfig/windowmanager
  • Change the entry DEFAULT_WM="" to DEFAULT_WM="gnome" (see the one-liner below)
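A minimal one-liner for this change, assuming the DEFAULT_WM entry is currently empty as shown above:

# sed -i 's/^DEFAULT_WM=.*/DEFAULT_WM="gnome"/' /etc/sysconfig/windowmanager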

Start the RDP service and make it start automatically after reboot

These commands need to be executed with the sudo command from the ec2-user.

SLES 11

# service xrdp start
# chkconfig --set xrdp on

SLES 12 & 15

# systemctl start xrdp
# systemctl enable xrdp 
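To verify that xrdp is up and listening (assuming the default RDP port 3389):

# ss -tlnp | grep 3389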

Register a Subscription in SLES (and keep the AWS CLI working!)

SLES 12 & 15

Use this command to register your system at SUSE.

# SUSEConnect -r <YourActivationCode> -e <YourEmailAddress>

More details can be found in the SUSE documentation.

SUSE BYOS AMIs on AWS do not tend to update their cloud module. Execute the following commands as super user to get this done:

SLES 12

# SUSEConnect --list-extensions
# SUSEConnect -p sle-module-public-cloud/12/x86_64

SLES 15

# SUSEConnect --list-extensions
# SUSEConnect -p sle-module-public-cloud/15/x86_64

The AWS CLI is an important part of this module. Updating it will allow you to use the latest services and new regions. Don't forget to update your packages with

# zypper update
Important

The AWS CLI will not work by default on SLES 15! The required patch for boto will only be installed if this repository is configured.

See SUSE support document 7023686.

 

Registering Repositories for AWS SuSE AMIs

 SuSE SLES 11 and 12 AMIs use AWS specific repositories to install and update packages.

There are situations when SuSE systems aren't able to install new packages or update them because they have lost their AWS repository configuration.

This problem can be fixed by issuing the following command as super user:

/usr/sbin/registercloudguest --force-new

Disclaimer: This command will perform major changes to your system. Handle it with care and consult the SuSE documentation upfront!