28 Jun 2017

GlusterFS 3.8.13 update available, and 3.8 nearing End-Of-Life

The Gluster releases follow a 3-month cycle, with alternating Short-Term-Maintenance and Long-Term-Maintenance versions. GlusterFS 3.8 is currently the oldest Long-Term-Maintenance release, and will become End-Of-Life with the GlusterFS 3.12 version. If all goes according to plan, 3.12 will be released in August and will be the last 3.x version before Gluster 4.0 hits the disks.

There will be a few more releases in the GlusterFS 3.8 line, but users should start to plan an upgrade to a version that receives regular bugfix updates after August.

Release notes for Gluster 3.8.13

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10, 3.8.11 and 3.8.12 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 13 patches have been merged, addressing 8 bugs:
  • #1447523: Glusterd segmentation fault in '_Unwind_Backtrace' while running peer probe
  • #1449782: quota: limit-usage command failed with error "Failed to start aux mount"
  • #1449941: When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st_size value.
  • #1450055: [GANESHA] Adding a node to existing cluster failed to start pacemaker service on new node
  • #1450380: GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
  • #1450937: [New] - Replacing an arbiter brick while I/O happens causes vm pause
  • #1460650: posix-acl: Whitelist virtual ACL xattrs
  • #1460661: "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf

30 May 2017

Library of Ceph and Gluster reference architectures – Simplicity on the other side of complexity

The Storage Solution Architectures team at Red Hat develops reference architectures, performance and sizing guides, and test drives for Gluster- and Ceph-based solutions. We’re a group of architects who perform lab validation, tuning, and interoperability development for composable storage services with target workloads on optimized server and network configurations. We seek simplicity on the other side of complexity.

At the end of this blog entry is a full library of our current publications and test drives.

In our modern era, a top company asset is pivotability. Pivotability based on external market changes. Pivotability after unknowns become known. Pivotability after golden ideas become dark alleys. For most enterprises, pivotability requires a composable technology infrastructure for shifting resources to meet changing needs. Composable storage services, such as those provided by Ceph and Gluster, are part of many companies’ composable infrastructures.

Composable technology infrastructures are most frequently described by the following attributes:

  • Open source v. closed development.
  • On-demand architectures v. fixed architectures.
  • Commodity hardware v. proprietary appliances.
  • Cross-industry collaboration v. isolated single-vendor silos.

As noted in the following figure, a few companies with large staffs of in-house experts can create composable infrastructures from raw technologies. Their large investments in in-house expertise allow them to convert raw technologies into solutions with limited pre-integration by technology suppliers. AWS, Google, and Azure are all examples of DIY businesses. A larger number of other companies, also needing composable infrastructures, rely on technology suppliers and the community for solution pre-integration and guidance to reduce their in-house expertise costs. We’ll label them “Assisted DIY.” Finally, the majority of global enterprises lack the in-house expertise for deploying these composable infrastructures. They rely on public cloud providers and pre-packaged solutions for their infrastructure needs. We’ll call them “Pre-packaged.”


The reference architectures, performance and sizing guides, and test drives produced by our team are primarily focused on the “Assisted DIY” segment of companies. Additionally, we strive to make Gluster and Ceph composable storage services available to the “Pre-packaged” segment of companies by using what we learn to produce pre-packaged combinations of Red Hat software with partner hardware targeting specific workload use cases.

We enjoy our roles at Red Hat because of the many of you with whom we collaborate to produce value.  We hope you find these guides useful.

Team-produced with partner collaboration:

Partner-produced with team collaboration:

Pre-packaged solutions:

Hands-on test drives:

22 May 2017

Enjoy more bugfixes with GlusterFS 3.8.12

Like every month, there is an update for the GlusterFS 3.8 stable version. A few more bugfixes have been included in this release. Packages are already available for many distributions. Some distributions might still need to promote the update from their testing repository to release, so hold tight if there is no update for your favourite OS yet.

Release notes for Gluster 3.8.12

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10 and 3.8.11 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 13 patches have been merged, addressing 11 bugs:
  • #1440228: NFS Sub-directory mount not working on solaris10 client
  • #1440635: Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance
  • #1440810: Update rfc.sh to check Change-Id consistency for backports
  • #1441574: [geo-rep]: rsync should not try to sync internal xattrs
  • #1441930: [geo-rep]: Worker crashes with [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166 while renaming directories
  • #1441933: [Geo-rep] If for some reason MKDIR failed to sync, it should not proceed further.
  • #1442933: Segmentation fault when creating a qcow2 with qemu-img
  • #1443012: snapshot: snapshots appear to be failing with respect to secure geo-rep slave
  • #1443319: Don't wind post-op on a brick where the fop phase failed.
  • #1445213: Unable to take snapshot on a geo-replicated volume, even after stopping the session
  • #1449314: [whql][virtio-block+glusterfs]"Disk Stress" and "Disk Verification" job always failed on win7-32/win2012/win2k8R2 guest

4 May 2017

2 May 2017

Struggling to containerize stateful applications in the cloud? Here’s how.

The newest release of Red Hat’s Reference Architecture “OpenShift Container Platform 3.5 on Amazon Web Services” now incorporates container-native storage, a unique approach based on Red Hat Gluster Storage to avoid lock-in, enable stateful applications, and simplify those applications’ high availability.

In the beginning, everything was so simple. Instead of going through the bureaucracy and compliance-driven process of requesting compute, storage, and networking resources, I would pull out my corporate credit card and register at the cloud provider of my choice. Instead of spending weeks forecasting the resource needs and costs of my newest project, I would get started in less than 1 hour. Much lower risk, virtually no capital expenditure for my newest pet project. And seemingly endless capacity—well, as long as my credit card was covered. If my project didn’t turn out to be a thing, I didn’t end up with excess infrastructure, either.

Until I found out that basically what I was doing was building my newest piece of software against a cloud mainframe. Not directly, of course. I was still operating on top of my operating system with the libraries and tools of my choice, but essentially I spent significant effort getting to that point with regard to orchestration and application architecture. And these are not easily ported to another cloud provider.

I realize that cloud providers are vertically integrated stacks, just as mainframes were. Much more modern and scalable with an entirely different cost structure—but, still, eventually and ultimately, lock-in.

Avoid provider lock-in with OpenShift Container Platform

This is where OpenShift comes in. I take orchestration and development cycles to a whole new level when I stop worrying about operating system instances, storage capacity, network overlays, NAT gateways, firewalls—all the things I need to make my application accessible and provide value.

Instead, I deal with application instances, persistent volumes, services, replication controllers, and build configurations—things that make much more sense to me as an application developer as they are closer to what I am really interested in: deploying new functionality into production. Thus, OpenShift offers abstraction on top of classic IT infrastructure and instead provides application infrastructure. The key here is massive automation on top of the concept of immutable infrastructure, thereby greatly enhancing the capability to bring new code into production.

The benefit is clear: Once I have OpenShift in place, I don’t need to worry about any of the underlying infrastructure—I don’t need to be aware of whether I am actually running on OpenStack, VMware, Azure, Google Cloud, or Amazon Web Services (AWS). My new common denominator is the interface of OpenShift powered by Kubernetes, and I can forget about what’s underneath.

Well, not quite. While OpenShift provides a lot of drivers for various underlying infrastructure, for instance storage, they are all somewhat different. Their availability, performance, and feature set are tied to the underlying provider, for instance Elastic Block Storage (EBS) on AWS. I need to make sure that critical aspects of the infrastructure below OpenShift are reflected in the OpenShift topology. A good example is AWS availability zones (AZs): They are failure domains in a region across which an application instance should be distributed to avoid downtime in the event a single AZ is lost. So OpenShift nodes need to be deployed in multiple AZs.

This is where another caveat comes in: EBS volumes are present only inside a single AZ. Therefore, my application must replicate the data across other AZs if it uses EBS to store it.

So there are still dependencies and limitations a developer or operator must be aware of, even if OpenShift has drivers on board for EBS and will take care of provisioning.

Introducing container-native storage

With container-native storage (CNS), we now have a robust, scalable, and elastic storage service out-of-the-box for OpenShift Container Platform—based on Red Hat Gluster Storage. The trick: GlusterFS runs containerized on OpenShift itself. Thus, it runs on any platform that OpenShift is supported on—which is basically everything, from bare metal, to virtual, to private and public cloud.

With CNS, OpenShift gains a consistent storage feature set across, and independent of, all supported cloud providers. It’s deployed with native OpenShift / Kubernetes resources, and GlusterFS ends up running in pods as part of a DaemonSet:

[ec2-user@ip-10-20-4-55 ~]$ oc get pods
NAME              READY     STATUS    RESTARTS   AGE
glusterfs-0bkgr   1/1       Running   9          7d
glusterfs-4fmsm   1/1       Running   9          7d
glusterfs-bg0ls   1/1       Running   9          7d
glusterfs-j58vz   1/1       Running   9          7d
glusterfs-qpdf0   1/1       Running   9          7d
glusterfs-rkhpt   1/1       Running   9          7d
heketi-1-kml8v    1/1       Running   8          7d

The pods are running in privileged mode to access the nodes’ block device directly. Furthermore, for optimal performance, the pods are using host-networking mode. This way, OpenShift nodes are running a distributed, software-defined, scale-out file storage service, just as any distributed micro-service application.

There is an additional pod deployed that runs heketi—a RESTful API front end for GlusterFS. OpenShift natively integrates via a dynamic storage provisioner plug-in with this service to request and delete storage volumes on behalf of the user. In turn, heketi controls one or more GlusterFS Trusted Storage Pools.

Container-native storage on Amazon Web Services

The EBS provisioner has been available for OpenShift for some time. To understand what changes with CNS on AWS, a closer look at how EBS is accessible to OpenShift is in order.

  1. Dynamic provisioning
    EBS volumes are dynamically created and deleted as part of storage provisioning requests (PersistentVolumeClaims) in OpenShift.
  2. Local block storage
    EBS appears to the EC2 instances as a local block device. Once provisioned, it is attached to the EC2 instance, and a PCI interrupt is triggered to inform the operating system.
    NAME                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda                                  202:0    0   15G  0 disk
    ├─xvda1                               202:1    0    1M  0 part
    └─xvda2                               202:2    0   15G  0 part /
    xvdb                                  202:16   0   25G  0 disk
    └─xvdb1                               202:17   0   25G  0 part
      ├─docker_vol-docker--pool_tmeta     253:0    0   28M  0 lvm
      │ └─...                             253:2    0 23.8G  0 lvm
      │   ├─...                           253:8    0    3G  0 dm
      │   └─...                           253:9    0    3G  0 dm
      └─docker_vol-docker--pool_tdata     253:1    0 23.8G  0 lvm
        └─docker_vol-docker--pool         253:2    0 23.8G  0 lvm
          ├─...                           253:8    0    3G  0 dm
          └─...                           253:9    0    3G  0 dm
    xvdc                                  202:32   0   50G  0 disk 
    xvdd                                  202:48   0  100G  0 disk

    OpenShift on AWS also uses EBS to back local docker storage. EBS storage is formatted with a local filesystem like XFS.

  3. Not shared storage
    EBS volumes cannot be attached to more than one EC2 instance. Thus, all pods mounting an EBS-based PersistentVolume in OpenShift must run on the same node. The local filesystem on top of the EBS block device does not support clustering either.
  4. AZ-local storage
    EBS volumes cannot cross AZs. Thus, OpenShift cannot fail over pods mounting EBS storage into different AZs. Basically, an EBS volume is a failure domain.
  5. Performance characteristics
    The type of EBS storage, as well as capacity, must be selected up front. Specifically, for fast storage a certain minimum capacity must be requested to have a minimum performance level in terms of IOPS.

This is the lay of the land. While these characteristics may be acceptable for stateless applications that only need to have local storage, they become an obstacle for stateful applications.

People want to containerize databases, as well. Following a micro-service architecture where every service maintains its own state and data model, this request will become more common. The nature of these databases differs from the classic, often relational, database management system IT organizations have spent millions on: They are way smaller and store less data than their big brother from the monolithic world. Still, with the limitations of EBS, I would need to architect replication and database failover around those just to deal with a simple storage failure.

Here is what changes with CNS:

  1. Dynamic provisioning
    The user experience actually doesn’t change. CNS is represented like any storage provider in OpenShift, by a StorageClass. PersistentVolumeClaims (PVCs) are issued against it, and the dynamic provisioner for GlusterFS creates the volume and returns it as a PersistentVolume (PV). When the PVC is deleted, the GlusterFS volume is deleted, as well.
  2. Distributed file storage on top of EBS
    CNS volumes are basically GlusterFS volumes, managed by heketi. The volumes are built out of local block devices of the OpenShift nodes backed by EBS. These volumes provide shared storage and are mounted on the OpenShift nodes with the GlusterFS FUSE client.
    [ec2-user@ip-10-20-5-132 ~]$ mount
    ... on /var/lib/origin/openshift.local.volumes/pods/80e27364-2c60-11e7-80ec-0ad6adc2a87f/volumes/kubernetes.io~glusterfs/pvc-71472efe-2a06-11e7-bab8-02e062d20f83 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
  3. Container-shared storage
    Multiple pods can mount and write to the same volume. The corresponding access mode is known as “RWX” (ReadWriteMany). The containers can run on different OpenShift nodes, and the dynamic provisioner will mount the GlusterFS volume on the right nodes accordingly. Then, this local mount directory is bind-mounted to the container.
  4. Cross-availability zone
    CNS is deployed across AWS AZs. The integrated, synchronous replication of GlusterFS will mirror every write 3 times. GlusterFS is deployed on OpenShift nodes in different AZs, and thus the storage is available in all zones. The failure of a single GlusterFS pod, an OpenShift node running the pod, or a block device accessed by the pod will have no impact. Once the failed resources come back, the storage is automatically re-replicated. CNS is actually aware of the failure zones as part of the cluster topology and will schedule new volumes, as well as recovery, so that there is no single point of failure.
  5. Predictable performance
    CNS storage performance is not tied to the size of storage request by the user in OpenShift. It’s the same performance whether 1GB or 100GB PVs are requested.
  6. Storage performance tiers
    CNS allows for multiple GlusterFS Trusted Storage Pools to be managed at once. Each pool consists of at least 3 OpenShift nodes running GlusterFS pods. While the OpenShift nodes belong to a single OpenShift cluster, the various GlusterFS pods form their own Trusted Storage Pools. An administrator can use this to equip the nodes with different kinds of storage and offer their pools with CNS as distinct storage tiers in OpenShift, each via its own StorageClass. An administrator might, for example, run CNS on 3 OpenShift nodes with SSD (e.g., EBS gp2) storage and call it “fast,” whereas another set of OpenShift nodes with magnetic storage (e.g., EBS st1) runs a separate set of GlusterFS pods as an independent Trusted Storage Pool, represented with a StorageClass called “capacity.”
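
As an illustration of items 1, 3, and 6, the sketch below builds the kind of objects involved: two StorageClasses named “fast” and “capacity” and a claim against one of them. It is a minimal sketch under stated assumptions, not the reference architecture's own tooling: the heketi URLs are placeholders, the claim name is hypothetical, and the `storage.k8s.io/v1beta1` API version and the `volume.beta.kubernetes.io/storage-class` annotation are assumed to match the Kubernetes release underneath the OpenShift version in use. The only provisioner detail relied on is `kubernetes.io/glusterfs` with its `resturl` parameter.

```python
import json


def storage_class(name, heketi_url):
    """Build a StorageClass manifest for the GlusterFS dynamic provisioner."""
    return {
        "apiVersion": "storage.k8s.io/v1beta1",  # assumption: beta API version
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/glusterfs",
        "parameters": {"resturl": heketi_url},  # placeholder heketi endpoint
    }


def claim(name, class_name, size, access_mode="ReadWriteMany"):
    """Build a PersistentVolumeClaim that requests storage from a StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": name,
            # assumption: beta annotation used to select the StorageClass
            "annotations": {"volume.beta.kubernetes.io/storage-class": class_name},
        },
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Wrap everything in a v1 List so that it forms a single JSON document.
bundle = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        storage_class("fast", "http://heketi-fast.example.com:8080"),
        storage_class("capacity", "http://heketi-capacity.example.com:8080"),
        claim("mysql-data", "fast", "25Gi"),
    ],
}

print(json.dumps(bundle, indent=2))
```

The printed JSON could be passed to `oc create -f -`. Deleting the claim later would, as described in item 1, also delete the GlusterFS volume behind it.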

This is a significant step toward simplifying and abstracting provider infrastructure. For example, a MySQL database service running on top of OpenShift is now able to survive the failure of an AWS AZ, without needing to set up MySQL Master-Slave replication or change the micro-service to replicate data on its own.

Storage provided by CNS is efficiently allocated and provides performance with the first Gigabyte provisioned, thereby enabling storage consolidation. For example, consider six MySQL database instances, each in need of 25 GiB of storage capacity and up to 1500 IOPS at peak load. With EBS, I would create six EBS volumes, each with at least 500 GiB capacity out of the gp2 (General Purpose SSD) EBS tier, in order to get 1500 IOPS guaranteed. Guaranteed performance is tied to provisioned capacity with EBS.
With CNS, I can achieve the same using only 3 EBS volumes at 500 GiB capacity from the gp2 tier and run these with GlusterFS. I would create six 25 GiB volumes and provide storage to my databases with high IOPS performance, provided they don’t peak all at the same time.

Doing that, I would halve my EBS cost and still have capacity to spare for other services. My read IOPS performance is likely even higher because in CNS with 3-way replication I would read from data distributed across 3×1500 IOPS gp2 EBS volumes.
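
The arithmetic behind this comparison can be checked with a few lines of Python. This is a rough sketch that assumes the gp2 baseline of 3 IOPS per provisioned GiB and ignores burst credits and per-volume minimums; all variable names are illustrative.

```python
import math

GP2_IOPS_PER_GIB = 3  # assumption: gp2 baseline of 3 IOPS per provisioned GiB


def gp2_size_for_iops(iops):
    """Smallest gp2 volume size in GiB whose baseline reaches the given IOPS."""
    return math.ceil(iops / GP2_IOPS_PER_GIB)


databases = 6
db_capacity_gib = 25
db_peak_iops = 1500

# Size one EBS volume must have to guarantee the peak IOPS of one database.
volume_gib = gp2_size_for_iops(db_peak_iops)

# EBS only: one such volume per database.
ebs_only_gib = databases * volume_gib

# CNS: three such volumes, one per replica of the GlusterFS pool.
replicas = 3
cns_raw_gib = replicas * volume_gib
cns_usable_gib = cns_raw_gib // replicas  # every write is stored 3 times
cns_spare_gib = cns_usable_gib - databases * db_capacity_gib

print(f"gp2 size for {db_peak_iops} IOPS: {volume_gib} GiB")
print(f"EBS only: {ebs_only_gib} GiB provisioned")
print(f"CNS:      {cns_raw_gib} GiB provisioned, {cns_spare_gib} GiB usable capacity to spare")
print(f"CNS / EBS provisioned capacity: {cns_raw_gib / ebs_only_gib:.0%}")
```

With these inputs the script reports 3000 GiB provisioned for the EBS-only layout versus 1500 GiB for CNS, which is where the halved cost comes from, and 350 GiB of usable capacity left over after the six 25 GiB volumes.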

Finally, the setup for CNS is very simple and can run on any OpenShift installation based on version 3.4 or newer.

This way, no matter where I plan to run OpenShift (i.e., which cloud provider currently offers lowest prices), I can rely on the same storage features and performance. Furthermore, the storage service grows with the OpenShift cluster but still provides elasticity. Only a subset of OpenShift nodes must run CNS: at least 3, ideally across 3 AZs.

Deploying container-native storage on AWS

Installing OpenShift on AWS is dramatically simplified based on the OpenShift on Amazon Web Services Reference Architecture. A set of Ansible playbooks augments the existing openshift-ansible installation routine and creates all the required AWS infrastructure automatically.

A simple Python script provides a convenient wrapper to the playbooks found in the openshift-ansible-contrib repository on GitHub for deploying on AWS.

All the heavy lifting of setting up Red Hat OpenShift Container Platform on AWS is automated with best practices incorporated.

The deployment finishes with an OpenShift Cluster with 3 master nodes, 3 infrastructure nodes, and 2 application nodes deployed in a highly available fashion across AWS AZs. The external and internal traffic is load balanced, and all required network, firewall, and NAT resources are stood up.

Since version 3.5, the reference architecture playbooks ship with additional automation to make deployment of CNS just as easy. Through additional AWS CloudFormation templates and Ansible playbook tasks, the additional, required infrastructure is stood up. This mainly concerns provisioning of additional OpenShift nodes with an amended firewall configuration, additional EBS volumes, and then joining them to the existing OpenShift cluster.

In addition, compared to previous releases, the CloudFormation templates now emit more information as part of the output. These are picked up by the playbooks to further reduce the information needed from the administrator. They will simply get the right information from the existing CloudFormation stack to retrieve the proper integration points.

The result is AWS infrastructure ready for the administrator to deploy CNS. Most of the manual steps of this process can therefore be avoided. Three additional app nodes are deployed with configurable instance type and EBS volume type. Availability zones of the selected AWS region are taken into account.

Subsequent calls allow for provisioning of additional CNS pools. The reference architecture makes reasonable choices for the EBS volume type and the EC2 instance with a balance between running costs and initial performance. The only thing left for the administrator to do is to run the cns-deploy utility and create a StorageClass object to make the new storage service accessible to users.

At this point, the administrator can choose between labeling the nodes as regular application nodes or providing a storage-related label that would initially exclude them from the OpenShift scheduler for regular application pods.

Container-ready storage

The reference architecture also incorporates the concept of Container-Ready Storage (CRS). In this deployment flavor, GlusterFS runs on dedicated EC2 instances with a heketi-instance deployed separately, both running without containers as ordinary system services. The difference is that these instances are not part of the OpenShift cluster. The storage service is, however, made available to, and used by, OpenShift in the same way. If the user, for performance or cost reasons, wants the GlusterFS storage layer outside of OpenShift, this is made possible with CRS. For this purpose, the reference architecture ships add-crs-storage.py to automate the deployment in the same way as for CNS.


CNS provides further means of OpenShift Container Platform becoming an equalizer for application development. Consistent storage services, performance, and management are provided independently of the underlying provider platform. Deployment of data-driven applications is further simplified with CNS as the backend. This way, not only stateless but also stateful applications become easy to manage.

For developers, nothing changes: The details of provisioning and lifecycle of storage capacity for containerized applications are transparent to them, thanks to CNS’s integration with native OpenShift facilities.

For administrators, achieving cross-provider, hybrid-cloud deployments just became even easier with the recent release of the OpenShift Container Platform 3.5 on Amazon Web Services Reference Architecture. With just two basic commands, an elastic and fault-tolerant foundation for applications can be deployed. Once set up, growth becomes a matter of adding nodes.

It is now possible to choose the most suitable cloud provider platform without worrying about various tradeoffs between different storage feature sets or becoming too close to one provider’s implementation, thereby avoiding lock-in long term.

The reference architecture details the deployment and resulting topology. Access the document here.

18 Apr 2017

Bugfix release GlusterFS 3.8.11 has landed

Another month has passed, and more bugs have been squashed in the 3.8 release. Packages should be available or arrive soon at the usual repositories. The next 3.8 update is expected to be made available just after the 10th of May.

Release notes for Gluster 3.8.11

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9 and 3.8.10 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 15 patches have been merged, addressing 13 bugs:
  • #1422788: [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
  • #1427390: systemic testing: seeing lot of ping time outs which would lead to splitbrains
  • #1430845: build/packaging: Debian and Ubuntu don't have /usr/libexec/; results in bad packages
  • #1431592: memory leak in features/locks xlator
  • #1434298: [Disperse] Metadata version is not healing when a brick is down
  • #1434302: Move spit-brain msg in read txn to debug
  • #1435645: Disperse: Provide description of disperse.eager-lock option.
  • #1436231: Undo pending xattrs only on the up bricks
  • #1436412: Unrecognized filesystems (i.e. btrfs, zfs) log many errors about "getinode size"
  • #1437330: Sharding: Fix a performance bug
  • #1438424: [Ganesha + EC] : Input/Output Error while creating LOTS of smallfiles
  • #1439112: File-level WORM allows ftruncate() on read-only files
  • #1440635: Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance

11 Apr 2017

Script for creating EBS persistent volumes in OpenShift/Kubernetes

If you aren't using the automated dynamic volume provisioning (which you should!), here is a short bash script to help you automatically create both the EBS volume and the Kubernetes persistent volume:


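if [ $# -ne 2 ]; then
    echo "Usage: sh create-volumes.sh SIZE COUNT"
    exit 1
fi

size=$1    # volume size in GiB
count=$2   # number of volumes to create

for i in $(seq 1 "$count"); do
  # Create the EBS volume; the volume ID is the second field of the output
  vol=$(ec2-create-volume --size "$size" --region ap-southeast-2 --availability-zone ap-southeast-2a --type gp2 --encrypted | awk '{print $2}')

  # Register the new volume as a PersistentVolume of the same size
  echo "
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    labels:
      failure-domain.beta.kubernetes.io/region: ap-southeast-2
      failure-domain.beta.kubernetes.io/zone: ap-southeast-2a
    name: pv-$vol
  spec:
    capacity:
      storage: ${size}Gi
    accessModes:
      - ReadWriteOnce
    awsElasticBlockStore:
      fsType: ext4
      volumeID: aws://ap-southeast-2a/$vol
    persistentVolumeReclaimPolicy: Delete" | oc create -f -
done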

3 Apr 2017

Docker 4th B’day Celebration – Bangalore

In Bangalore we celebrated Docker’s 4th Birthday at the Microsoft office on 25th March’17. Over 300 participants signed up and around 100 turned up for the event. We had around 15 mentors. We started the event at 9:30 AM. After a quick round of introductions with the mentors, participants started doing the Docker Birthday Labs. The Docker community and team have done a great job in creating self-explanatory labs with the Play with Docker environment.
Docker Mentors
Most of the participants followed the instructions on their own, and wherever needed, mentors helped the participants. Around 11:30 AM our host Sudhir gave a quick demo of Azure Container Service and then we did the cake cutting. This time only one girl attended the meetup event, so we requested her to cut the cake.
After that we had light snacks and spent time networking. It was a great event and I am sure the participants learnt something new.
Docker 4th Birthday, Bangalore
Thanks to Usha and Sudhir from Microsoft for hosting the event. For the next meetup we are collaborating with 7 other meetup groups in Bangalore on an event about Microservices and Serverless.

Join us for next community event on Microservices and Serverless

If you are following the updates here then you would know that in Feb’17 the AWS, DevOps and Docker meetup groups of Bangalore did a combined event in the following proposed community-driven event format.



Instead of INR 200 we charged INR 100 and did not look for sponsors. With the money collected we gave gifts to the speakers and prizes to the participants. We received very good feedback about the event. At that event we decided to hold the next event around Microservices and Serverless. This time even more meetup groups are coming together. The following 8 meetup groups are joining hands this time:


This is going to be one great event, as we have already received a good number of talk proposals and are looking for more until 7th April. If you are interested then please submit your talk proposal here. And if you would like to attend the event then go to the respective meetup group and purchase the INR 100 ticket.

2 Apr 2017

WordPress editor missing when using CloudFront

We often put CloudFront in front of our WordPress sites to significantly improve the website's load times.

CloudFront and WordPress have a few quirks together; the main one is that the rich post/page editor suddenly goes missing from your wp-admin.

The issue comes down to the UA sniffing that WordPress does.

Adding this to your functions.php is a good quick fix:

/**
 * Ignore UA sniffing and override the user_can_richedit function;
 * just check the user preferences.
 *
 * @return bool
 */
function user_can_richedit_override() {
    global $wp_rich_edit;

    if (get_user_option('rich_editing') == 'true' || !is_user_logged_in()) {
        $wp_rich_edit = true;
        return true;
    }

    $wp_rich_edit = false;
    return false;
}

add_filter('user_can_richedit', 'user_can_richedit_override');

17 Mar 2017

GlusterFS 3.8.10 is available

The 10th update for GlusterFS 3.8 is available for users of the 3.8 Long-Term-Maintenance version. Packages for this minor update are in many of the repositories for different distributions already. It is recommended to update any 3.8 installation to this latest release.

Release notes for Gluster 3.8.10

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8 and 3.8.9 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Improved configuration with additional 'virt' options

This release includes 5 more options to group virt (for VM workloads) for optimal performance.
Updating to the glusterfs version containing this patch won't automatically set these newer options on already existing volumes that have group virt configured. The changes take effect only when, post-upgrade,
# gluster volume set <VOL> group virt
is performed.
For already existing volumes the users may execute the following commands, if not already set:
# gluster volume set <VOL> performance.low-prio-threads 32
# gluster volume set <VOL> cluster.locking-scheme granular
# gluster volume set <VOL> features.shard on
# gluster volume set <VOL> cluster.shd-max-threads 8
# gluster volume set <VOL> cluster.shd-wait-qlength 10000
# gluster volume set <VOL> user.cifs off
It is most likely that features.shard would already have been set on the volume even before the upgrade, in which case the third volume set command above may be skipped.
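As a rough illustration of "if not already set", the Python sketch below builds only the volume set commands that are still needed for a volume. The option names and values are the ones listed above; the function name, the example volume name and the `already_set` mapping are hypothetical, and how the current values are collected is left open.

```python
# Option values copied from the list of commands above.
VIRT_OPTIONS = {
    "performance.low-prio-threads": "32",
    "cluster.locking-scheme": "granular",
    "features.shard": "on",
    "cluster.shd-max-threads": "8",
    "cluster.shd-wait-qlength": "10000",
    "user.cifs": "off",
}


def missing_virt_commands(volume, already_set):
    """Return `gluster volume set` argument lists for options not yet set.

    `already_set` maps option name -> current value. It is a hypothetical
    input; gathering it from a live cluster is not shown here.
    """
    commands = []
    for option, value in VIRT_OPTIONS.items():
        if already_set.get(option) == value:
            continue  # option already has the wanted value, skip it
        commands.append(["gluster", "volume", "set", volume, option, value])
    return commands


if __name__ == "__main__":
    # Example: features.shard was enabled before the upgrade.
    for cmd in missing_virt_commands("myvol", {"features.shard": "on"}):
        print(" ".join(cmd))
```

The sketch only prints the commands, so they can be reviewed before anything is run against a volume.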

Bugs addressed

A total of 18 patches have been merged, addressing 16 bugs:
  • #1387878: Rebalance after add bricks corrupts files
  • #1412994: Memory leak on mount/fuse when setxattr fails
  • #1420993: Modified volume options not synced once offline nodes comes up.
  • #1422352: glustershd process crashed on systemic setup
  • #1422394: Gluster NFS server crashing in __mnt3svc_umountall
  • #1422811: [Geo-rep] Recreating geo-rep session with same slave after deleting with reset-sync-time fails to sync
  • #1424915: dht_setxattr returns EINVAL when a file is deleted during the FOP
  • #1424934: Include few more options in virt file
  • #1424974: remove-brick status shows 0 rebalanced files
  • #1425112: [Ganesha] : Unable to bring up a Ganesha HA cluster on RHEL 6.9.
  • #1425307: Fix statvfs for FreeBSD in Python
  • #1427390: systemic testing: seeing lot of ping time outs which would lead to splitbrains
  • #1427419: Warning messages throwing when EC volume offline brick comes up are difficult to understand for end user.
  • #1428743: Fix crash in dht resulting from tests/features/nuke.t
  • #1429312: Prevent reverse heal from happening
  • #1429405: Restore atime/mtime for symlinks and other non-regular files.

7 Mar 2017

Access a Gluster volume as object storage (via S3)

Building gluster-object in a Docker container:


This document is about accessing a Gluster volume through an object interface.

The object interface is provided by gluster-swift. (2)

Here, gluster-swift runs inside a Docker container. (1)

This object interface (the Docker container) accesses a Gluster volume that is mounted on the host.

The same Gluster volume is bind-mounted inside the Docker container, so it can be accessed using S3 GET/PUT requests.

Steps to build the gluster-swift container:

Clone the docker-gluster-swift repository, which contains the Dockerfile:

$ git clone https://github.com/prashanthpai/docker-gluster-swift.git

$ cd docker-gluster-swift

Start Docker service:
$ sudo systemctl start docker.service

Build a new image using the Dockerfile:
$ docker build --rm --tag prashanthpai/gluster-swift:dev .

Sending build context to Docker daemon 187.4 kB
Sending build context to Docker daemon
Step 0 : FROM centos:7
 ---> 97cad5e16cb6
Step 1 : MAINTAINER Prashanth Pai <ppai@redhat.com>
 ---> Using cache
 ---> ec6511e6ae93
Step 2 : RUN yum --setopt=tsflags=nodocs -y update &&     yum --setopt=tsflags=nodocs -y install         centos-release-openstack-kilo         epel-release &&     yum --setopt=tsflags=nodocs -y install         openstack-swift openstack-swift-{proxy,account,container,object,plugin-swift3}         supervisor         git memcached python-prettytable &&     yum -y clean all
 ---> Using cache
 ---> ea7faccc4ae9
Step 3 : RUN git clone git://review.gluster.org/gluster-swift /tmp/gluster-swift &&     cd /tmp/gluster-swift &&     python setup.py install &&     cd -
 ---> Using cache
 ---> 32f4d0e75b14
Step 4 : VOLUME /mnt/gluster-object
 ---> Using cache
 ---> a42bbdd3df9f
Step 5 : RUN mkdir -p /etc/supervisor /var/log/supervisor
 ---> Using cache
 ---> cf5c1c5ee364
Step 6 : COPY supervisord.conf /etc/supervisor/supervisord.conf
 ---> Using cache
 ---> 537fdf7d9c6f
Step 7 : COPY supervisor_suicide.py /usr/local/bin/supervisor_suicide.py
 ---> Using cache
 ---> b5a82aaf177c
Step 8 : RUN chmod +x /usr/local/bin/supervisor_suicide.py
 ---> Using cache
 ---> 5c9971b033e4
Step 9 : COPY swift-start.sh /usr/local/bin/swift-start.sh
 ---> Using cache
 ---> 014ed9a6ae03
Step 10 : RUN chmod +x /usr/local/bin/swift-start.sh
 ---> Using cache
 ---> 00d3ffb6ccb2
Step 11 : COPY etc/swift/* /etc/swift/
 ---> Using cache
 ---> ca3be2138fa0
Step 12 : EXPOSE 8080
 ---> Using cache
 ---> 677fe3fd2fb5
Step 13 : CMD /usr/local/bin/swift-start.sh
 ---> Using cache
 ---> 3014617977e0
Successfully built 3014617977e0

Setup Gluster volume:

Start the glusterd service, then create and mount the volume:

$  su
root@node1 docker-gluster-swift$ service glusterd start

Starting glusterd (via systemctl):                         [  OK  ]

Create gluster volume:

There are three nodes on which CentOS 7.0 is installed.

Ensure the glusterd service is started on all three nodes (node1, node2, node3) as below:
# systemctl start glusterd

root@node1 docker-gluster-swift$ sudo gluster volume create tv1  node1:/opt/volume_test/tv_1/b1 node2:/opt/volume_test/tv_1/b2  node3:/opt/volume_test/tv_1/b3 force

volume create: tv1: success: please start the volume to access data

- node1, node2 and node3 are the hostnames

- /opt/volume_test/tv_1/b1, /opt/volume_test/tv_1/b2 and /opt/volume_test/tv_1/b3 are the bricks

- tv1 is the volume name

root@node1 docker-gluster-swift$

Start gluster volume:
root@node1 docker-gluster-swift$ gluster vol start tv1

volume start: tv1: success

root@node1 docker-gluster-swift$ gluster vol status

Status of volume: tv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick node1:/opt/volume_test/tv_1/b1         49152     0          Y       5951
Brick node2:/opt/volume_test/tv_1/b2         49153     0          Y       5980
Brick node3:/opt/volume_test/tv_1/b3         49153     0          Y       5980

Task Status of Volume tv1
There are no active volume tasks
root@node1 docker-gluster-swift$

Create a directory to mount the volume:
root@node1 docker-gluster-swift$ mkdir -p /mnt/gluster-object/tv1

The path /mnt/gluster-object/ will be used while running Docker container.

Mount the volume:

root@node1 docker-gluster-swift$ mount -t glusterfs node1:/tv1 /mnt/gluster-object/tv1

root@node1 docker-gluster-swift$

Verify mount:
sarumuga@node1 test$ mount | grep mnt

node1:/tv1 on /mnt/gluster-object/tv1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


Run a new container with the Gluster mount path:

root@node1 test$ docker run -d -p 8080:8080 -v /mnt/gluster-object:/mnt/gluster-object -e GLUSTER_VOLUMES="tv1" prashanthpai/gluster-swift:dev


-p 8080:8080

publishes the container port to the host.

Format: hostport:containerport

                (a)                 (b)
Note: -v /mnt/gluster-object:/mnt/gluster-object
(a) location on the host where all Gluster volumes are mounted
(b) location inside the container to which that path is mapped

-e GLUSTER_VOLUMES="tv1" passes the tv1 volume name to the container as an environment variable.

Verify the container:
sarumuga@node1 test$ docker ps
CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS                    NAMES
feb8867e1fd9        prashanthpai/gluster-swift:dev   "/bin/sh -c /usr/loc   29 seconds ago      Up 28 seconds       0.0.0.0:8080->8080/tcp   sick_heisenberg

Inspect container and get the IP address:
sarumuga@node1 test$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' feb8867e1fd9


Verifying S3 access:

Now, verify S3 access requests to the Gluster volume.

We are going to use s3curl (3) to verify object access.

Create bucket:
# ./s3curl.pl --debug --id 'tv1' --key 'test' --put /dev/null  -- -k -v

Put object
# ./s3curl.pl --debug --id 'tv1' --key 'test' --put  ./README -- -k -v -s

Get object
# ./s3curl.pl --debug --id 'tv1' --key 'test'   -- -k -v -s

List objects in a bucket
# ./s3curl.pl --debug --id 'tv1' --key 'test'   -- -k -v -s

List all buckets
# ./s3curl.pl --debug --id 'tv1' --key 'test'   -- -k -v -s

Delete object
# ./s3curl.pl --debug --id 'tv1' --key 'test'   --del -- -k -v -s

Delete Bucket
# ./s3curl.pl --debug --id 'tv1' --key 'test'   --del -- -k -v -s


(1) GitHub - prashanthpai/docker-gluster-swift: Run gluster-swift inside a docker container.
(2) gluster-swift/quick_start_guide.md at master · gluster/gluster-swift · GitHub
(3) Amazon S3 Authentication Tool for Curl : Sample Code & Libraries : Amazon Web Services

27 Feb 2017

Bangalore Kubernetes Meetup #4 and update on next community event

Last weekend we had the 4th Kubernetes Meetup in Bangalore at VMware’s office. More than 100 people signed up and ~40 people showed up. The first session was from Akshay Mathur and Manu Dilip Shah of A10 Networks, who shared their experience of choosing and using Kubernetes for their product.

The next session was from Krishna Kumar and Dhilip Kumar from Huawei, who talked about managing stateful applications with Kubernetes. Krishna first gave us a good overview and then Dhilip showed us a demo. He also urged participants to join the Kubernetes Stateful SIG.

The last talk was from Magesh and Kumar Gaurav of VMware on using k8s to orchestrate VMware SaaS. It was an interesting talk in which Magesh talked about some real problems his team has faced.

Some participants asked about comparing Docker Swarm, Kubernetes, Mesos Marathon and other container orchestrators. We had a brief discussion about it, and I shared my workshop from last year’s LinuxCon/ContainerCon EU on comparing different container orchestrators.

We also talked about what we should do in the next Kubernetes meetup. We might do hands-on labs.

In the end I also shared the details of the community event we are doing in conjunction with other meetup groups, like the one we did a few weeks back with the AWS, DevOps and Docker meetup groups. In April we plan to do a similar event on Microservices and Serverless. Besides the previous participants, the Kubernetes and Mesos & CNCF meetup groups will also be joining it.

It was a good meetup. Thanks to the organisers and speakers.

21 Feb 2017

AWS, DevOps and Docker Meetup

Continuing our experiment with community-driven conferences, this time the AWS, DevOps and Docker meetup groups collaborated on a combined meetup. We charged each participant INR 100 to make sure they were really interested in coming. From the collected money we gave gifts to the speakers and some prizes to the participants. The meetup was hosted at LinkedIn’s Bangalore office. It was a really nice venue and everything went as expected.

AWS, DevOps and Docker Meetup – Feb’17

Out of 150 participants ~140 showed up, which is really nice. Some people came from Chennai and Cochin as well. So charging a nominal fee really works :). We started almost on time. The first talk was from Neeraj Shah of Minjar Cloud. He shared his experience of deploying an application on ECS. He explained the advantages of using ECS to deploy Docker-based applications in the cloud and why one should use it.

The next talk was from Mohit Sethi, who talked about managing storage in containerised environments.

Sreenivas Makam then talked about Docker 1.13 experimental features. He briefly introduced the new features of Docker 1.13 and then focussed on the experimental ones. Experimental features can now be enabled in the stable binary itself, which is a real help for anyone who wants to try out upcoming Docker features. In the end we also had a good discussion about Docker Stack.

We took a quick break after that and had a quick intro session with all the organisers. After the break Madan shared some great tips on saving costs with AWS.

After that Shamasis shared Postman’s scaling journey with Docker and AWS Beanstalk. It was really fascinating.

In the first flash talk Gourav Shah open-sourced a project for creating DevOps workspaces using Docker. Generally one needs more than one machine to try DevOps tools like Chef, Puppet, Ansible etc. With this tool one can build a multi-node setup using Docker and do different labs.

The last session was from Mitesh, who showed how to use Jenkins and CodeDeploy to deploy applications on AWS.

All of the sessions were very informative, the venue was a good host and the participants were eager to learn. I think it was a really good meetup.

Thanks to all the speakers and our hosts Bathri and Sethil at LinkedIn. Thanks to the organisers of the AWS and DevOps meetup groups, especially Mohit and Habeeb.

During the meetup some other meetup groups came forward to join us next time. We might do something around Microservices and Serverless in the last week of April in a similar mode. So stay tuned 🙂


16 Feb 2017

GlusterFS 3.8.9 is another Long-Term-Maintenance update

We are proud to announce the General Availability of the next update to the Long-Term-Stable releases for GlusterFS 3.8. Packages are being prepared and are expected to hit the repositories of distributions and the Gluster download server over the next few days. Details on which versions are part of which distributions can be found on the Community Packages page in the documentation.

The release notes are part of the git repository, the downloadable tarball and are included in this post for easy access.

Release notes for Gluster 3.8.9

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7 and 3.8.8 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 16 patches have been merged, addressing 14 bugs:
  • #1410852: glusterfs-server should depend on firewalld-filesystem
  • #1411899: DHT doesn't evenly balance files on FreeBSD with ZFS
  • #1412119: ganesha service crashed on all nodes of ganesha cluster on disperse volume when doing lookup while copying files remotely using scp
  • #1412888: Extra lookup/fstats are sent over the network when a brick is down.
  • #1412913: [ganesha + EC]posix compliance rename tests failed on EC volume with nfs-ganesha mount.
  • #1412915: Spurious split-brain error messages are seen in rebalance logs
  • #1412916: [ganesha+ec]: Contents of original file are not seen when hardlink is created
  • #1412922: ls and move hung on disperse volume
  • #1412941: Regression caused by enabling client-io-threads by default
  • #1414655: Upcall: Possible memleak if inode_ctx_set fails
  • #1415053: geo-rep session faulty with ChangelogException "No such file or directory"
  • #1415132: Improve output of "gluster volume status detail"
  • #1417802: debug/trace: Print iatts of individual entries in readdirp callback for better debugging experience
  • #1420184: [Remove-brick] Hardlink migration fails with "lookup failed (No such file or directory)" error messages in rebalance logs

10 Feb 2017

Avoiding a $10,000 AWS trap

If you’re a back end developer, DevOps or infrastructure person, or you’re interested in AWS (Amazon Web Services), check out this blog post from our developer Adrian Hindle.

It follows his lightning talk for ‘Dispatches from the tech front line’, Cogapp’s Wired Sussex Open Studios event, held as part of Brighton Digital Festival.

Adrian delivering his talk at Cogapp’s Wired Sussex Open Studios event

Cloud computing

At Cogapp, when we need to build large load-balanced and auto-scaled infrastructure, we use AWS. Here’s AWS’ offering in their own words — “Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services. Free to join, pay only for what you use”.

We also use GlusterFS on some of our sites to store large amounts of data (usually images). GlusterFS is a scalable network filesystem and we usually use a distributed and replicated setup over four servers.

I’m a back end and infrastructure developer, and part of my role is to be responsible for creating and maintaining this kind of setup.

We’ve found that this arrangement normally works well, having reliably used it across a couple of sites. I’m here to tell you about a rare occasion when it stopped working, with potentially expensive consequences! In the process of avoiding this perilous trap, we’re pleased (and relieved!) to say we also avoided any downtime to the site.

A curious problem

Everything had been running smoothly with this site until we started getting AWS Cloudwatch alerts about one of our Drupal instances dropping out of our load balancer. We looked into it, and noticed that the Cloudwatch monitoring had stopped on one of our Gluster servers (Gluster1).

It turned out that the Gluster1 instance had failed and was unavailable; in other words, it had just stopped working. The good thing with our GlusterFS setup is that even if one server stops working, the others carry on working and the data carries on being served. I’m still not sure why this caused one of our Drupal instances to drop out of the load balancer for a minute.

Cogapp investigates

Initially, I tried to replace the Gluster1 server and its ‘bricks’ with another one, but for some strange reason the cluster did not accept the new server. After spending hours reading the documentation and asking questions on IRC and on the mailing list, I gave up and we decided to replace the entire cluster. The old cluster never stopped working, but going down to three servers instead of four was dangerous because it would only take one more server going offline for the entire site to look broken or stop working properly. All our servers are provisioned with Ansible, so creating another cluster is quick and easy. But instead of getting the data from the old cluster, we thought we would use one of our backups and do a ‘fire drill’.


We back up every archive image to Glacier. We use Glacier instead of S3 because our GlusterFS setup is redundant and is, in a way, a backup of itself. The premise of Glacier is very cheap storage for infrequently accessed data, where a retrieval time of several hours is acceptable. The way to put data into Glacier is through S3. You put data in S3 and create a rule on the S3 bucket to move the data to Glacier. You cannot access Glacier directly; you have to go through S3. To get data from Glacier you need to request a ‘restore’ of your files; they will then be available in the S3 bucket after a couple of hours.

The trap

In keeping with its intended niche of long term data archiving, storage costs on Glacier are low; but retrieval fees can be high. Just before I was going to restore the 5TB of images, I thought I would check how much this restoration would cost. This was quite a surprise. The estimated cost was going to be $10,000+!

Length of time for retrieval 4 hours:
Retrieval cost: $9,900.00
Transfer cost: $449.91
Total cost: $10,349.91

After some time playing around with a couple of cost calculators I realised, thankfully, that I could throttle the restoration over 2–3 days, and the cost would be divided by 10.

Length of time for retrieval 72 hours:
Retrieval cost: $547.25
Transfer cost: $449.91
Total cost: $997.16
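
To make the arithmetic behind the two estimates explicit, here is a small Python sketch that uses only the figures quoted above. It is an illustration, not a pricing calculator: the function name is made up for this example and nothing here models Glacier's actual pricing rules.

```python
from decimal import Decimal


def total_cost(retrieval, transfer):
    """Add the retrieval and transfer fees of one estimate (in dollars)."""
    return Decimal(retrieval) + Decimal(transfer)


# Figures taken from the two estimates above.
fast = total_cost("9900.00", "449.91")  # restore spread over 4 hours
slow = total_cost("547.25", "449.91")   # restore spread over 72 hours

print("4 hours :", fast)
print("72 hours:", slow)
print("total cost ratio    :", round(fast / slow, 1))
print("retrieval fee ratio :", round(Decimal("9900.00") / Decimal("547.25"), 1))
```

The transfer cost is the same in both estimates, so all of the saving comes from the retrieval fee; that is why the retrieval fee alone shrinks by more than the total does.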

Problem solved…

So, instead of restoring the entire bucket, I wrote a small script that listed all the files in Glacier and wrote them to a text file. Then, looping on each file the script asked for the restoration of the file and then paused for a few seconds. The output was written to another file so that if the script stopped or if the request failed we could restart it and know where it stopped.
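
The loop described above can be sketched roughly as follows in Python. This is a minimal sketch under assumptions, not the script that was actually used: the file names, the pause length and the `request_restore` callback are all hypothetical, and the callback stands in for whatever issues the real restore request.

```python
import time
from pathlib import Path


def restore_throttled(keys_file, done_file, request_restore, pause_seconds=5.0):
    """Request a restore for every key in keys_file, one at a time.

    Keys already recorded in done_file are skipped, so the script can be
    restarted after a crash or a failed request and carry on where it
    stopped. `request_restore` is a hypothetical callable taking one key.
    """
    done_path = Path(done_file)
    done = set(done_path.read_text().splitlines()) if done_path.exists() else set()

    requested = 0
    with open(keys_file) as keys, open(done_path, "a") as log:
        for line in keys:
            key = line.strip()
            if not key or key in done:
                continue
            request_restore(key)      # raises on failure, leaving the log intact
            log.write(key + "\n")     # record progress only after success
            log.flush()
            requested += 1
            time.sleep(pause_seconds)  # throttle the restore requests
    return requested
```

With boto3, for example, `request_restore` could wrap a call to the S3 client's `restore_object` method; that choice is an assumption, since the post does not say which tool made the requests.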

In our case, to restore 540,000+ JPEG 2000 (5TB+), the restoration took 4 days. Once the files were restored, the download speed from S3 to Gluster was (depending on the size of the images) 150–200 images per minute (total download ~125 hours). I started the restore script on a Monday morning, but because I started downloading the images as soon as they were available, by Friday the new Gluster cluster had all the images.

Once the new Gluster cluster was provisioned and had all the data, I did a couple of checks and got ready to replace the old cluster with the new one the following Monday.

…or was it?

When I came back to the office on Monday morning I had an email from Amazon. It said that one of the instances from the new cluster was scheduled to be retired due to problems with its underlying architecture! Which basically means I had to start all over again…but this time I didn’t get the data from Glacier!

What does this mean for you?

If you have a complicated technical project with some complex infrastructure and you need someone to set it up as cost-efficiently as possible, get in touch with Adrian and the Cogapp team.

Avoiding a $10,000 AWS trap was originally published in Cogapp on Medium, where people are continuing the conversation by highlighting and responding to this story.

31 Jan 2017

GlusterFS 3.7.20

GlusterFS 3.7.20 released

GlusterFS-3.7.20 has been released. This is a regular bug fix release for GlusterFS-3.7, and is currently the last planned release of GlusterFS-3.7. GlusterFS-3.10 is expected next month, and GlusterFS-3.7 enters EOL once it is released. The community will be notified of any changes to the EOL schedule.

The release-notes for GlusterFS-3.7.20 can be read here.

The release tarball and community provided packages can be obtained from download.gluster.org. The CentOS Storage SIG packages have been built and should be available soon from the centos-gluster37 repository.

30 Jan 2017

Gerrit OS Upgrade

When I started working on Gluster, Gerrit was a large piece of technical debt. We were running quite an old version on CentOS 5. Both of these items needed fixing. The Gerrit upgrade happened in June causing me a good amount of stress for a whole week as I dealt with the fall out. The OS upgrade for Gerrit happened last weekend after a marathon working day that ended at 3 am. We ran into several hacks in the old setup and we worked on getting them working in a more acceptable manner. That took quite a bit of our time and energy. At the end of it, I’m happy to say, Gerrit now runs on a machine with CentOS 7. Now of course, it’s time to upgrade Gerrit again and start the whole cycle all over again.

There's light at the end of the tunnel, hopefully, it's not a train

Michael and I managed to coordinate well across timezones. We had a document going where we listed out the tasks to do. As we discovered more items, they went on the todo list. This document also listed all the hacks we discovered. We fixed some of them but did not move the fixes over to Ansible. We left some hacks in because fixing them will take some more time.

Things we learned the hard way:

  • Running the git protocol with xinetd was a trial and error process to configure. It took me hours to get it right. Here’s the right config file:
service git
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nobody
        server          = /usr/libexec/git-core/git-daemon
        server_args     = --export-all --reuseaddr --base-path=/path/to/git/folder --inetd --verbose --base-path-relaxed
        log_on_failure  += USERID
}
  • There was some SELinux magic we needed for cgit. The documentation had some notes on how to get it right, but that didn’t work for us. Here’s what was needed:
semanage fcontext -a -t git_user_content_t "/path/to/git/folder(/.*)?"
  • When you set up replication to GitHub for the first time, you need to add the GitHub host keys to known_hosts. The easiest way is to try to ssh into GitHub. That will fail with a friendly error message and prompt you to add the keys. You could also get them from GitHub.
  • Gerrit needs AllowEncodedSlashes On and ProxyPass nocanon. Without these two bits of configuration, Gerrit returns random 404s.

We’ve removed two big items out of our tech debt backlog and into successes over the past year or so. Next step is a tie between a Jenkins upgrade and a Gerrit upgrade :)

Image credit: Captain Tenneal Steam Train (license)

24 Jan 2017

How to configure linux vxlans with multiple unicast endpoints

Sometimes you just can't use multicast. Some cloud providers just do not provide it. In that scenario, you need to configure your vxlan layer using unicast addresses. This is done easily using iproute2.

3 Node Network

With the preceding layout, we need the docker instances to be able to communicate with each other. We cannot use L3 routes because the provider will not route anything that's not on the network, so we need to set up our own L2 network layer over which we can establish our L3 routes. For this we'll use a Virtual Extensible LAN (VXLAN).

Linux has all the tools for setting up these VXLANS and the most common method is to use multicasting. This network doesn't support multicast routing so it's not a possibility. We must use unicast addressing.

We'll start by creating a vxlan interface on the first node.

ip link add vxlan0 type vxlan id 42 dev enp1s0 dstport 0

This creates the vxlan0 device and attaches it to enp1s0, listening on the Linux kernel's default VXLAN port (UDP 8472; use dstport 4789 for the IANA-assigned port). This does not assign any endpoints, so we'll create connections to the other two nodes:

bridge fdb append to 00:00:00:00:00:00 dst <node2-ip> dev vxlan0
bridge fdb append to 00:00:00:00:00:00 dst <node3-ip> dev vxlan0

Assign an address and turn up the interface

ip addr add <node1-vxlan-ip>/<prefix> dev vxlan0
ip link set up dev vxlan0

Do the same on each of the other nodes.

ip link add vxlan0 type vxlan id 42 dev enp1s0 dstport 0
bridge fdb append to 00:00:00:00:00:00 dst <node1-ip> dev vxlan0
bridge fdb append to 00:00:00:00:00:00 dst <node3-ip> dev vxlan0
ip addr add <node2-vxlan-ip>/<prefix> dev vxlan0
ip link set up dev vxlan0

ip link add vxlan0 type vxlan id 42 dev enp1s0 dstport 0
bridge fdb append to 00:00:00:00:00:00 dst <node1-ip> dev vxlan0
bridge fdb append to 00:00:00:00:00:00 dst <node2-ip> dev vxlan0
ip addr add <node3-vxlan-ip>/<prefix> dev vxlan0
ip link set up dev vxlan0
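
Since every node repeats the same pattern, with one fdb entry per remote node, the per-node commands can be generated. The Python sketch below only builds and prints the commands shown above; it does not run them. All addresses in it are hypothetical examples (documentation and private ranges), as are the function name and the node table layout.

```python
def vxlan_commands(nodes, local, vni=42, underlay_dev="enp1s0"):
    """Build the iproute2 commands for one node of a unicast VXLAN mesh.

    nodes: dict of node name -> (underlay_ip, overlay_cidr), all hypothetical.
    local: name of the node the commands are meant for.
    """
    _, overlay_cidr = nodes[local]
    cmds = [f"ip link add vxlan0 type vxlan id {vni} dev {underlay_dev} dstport 0"]
    # One all-zero fdb entry per remote node, never one for the node itself.
    for name, (underlay_ip, _) in nodes.items():
        if name == local:
            continue
        cmds.append(
            f"bridge fdb append to 00:00:00:00:00:00 dst {underlay_ip} dev vxlan0"
        )
    cmds.append(f"ip addr add {overlay_cidr} dev vxlan0")
    cmds.append("ip link set up dev vxlan0")
    return cmds


# Hypothetical three-node layout: (underlay address, vxlan address/prefix).
NODES = {
    "node1": ("192.0.2.11", "10.200.0.1/24"),
    "node2": ("192.0.2.12", "10.200.0.2/24"),
    "node3": ("192.0.2.13", "10.200.0.3/24"),
}

if __name__ == "__main__":
    for node in NODES:
        print(f"# {node}")
        print("\n".join(vxlan_commands(NODES, node)))
```

Adding a fourth node then means adding one entry to the table and regenerating the commands for every node, since each existing node needs an fdb entry for the newcomer.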

Confirm you can ping via the vxlan.

ping -c4 <node2-vxlan-ip> ; ping -c4 <node3-vxlan-ip>


PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from icmp_seq=3 ttl=64 time=0.089 ms
64 bytes from icmp_seq=4 ttl=64 time=0.061 ms

--- ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.061/0.078/0.092/0.015 ms
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=2.01 ms
64 bytes from icmp_seq=2 ttl=64 time=1.64 ms
64 bytes from icmp_seq=3 ttl=64 time=1.02 ms
64 bytes from icmp_seq=4 ttl=64 time=1.79 ms

--- ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 1.027/1.619/2.015/0.367 ms

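Your new vxlan network is now ready for adding your L3 routes.

Add your docker L3 routes.

ip route add <node2-docker-subnet> via <node2-vxlan-ip>
ip route add <node3-docker-subnet> via <node3-vxlan-ip>

ip route add <node1-docker-subnet> via <node1-vxlan-ip>
ip route add <node3-docker-subnet> via <node3-vxlan-ip>

ip route add <node1-docker-subnet> via <node1-vxlan-ip>
ip route add <node2-docker-subnet> via <node2-vxlan-ip>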
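
The routes follow the same "everyone but me" pattern as the fdb entries. The Python sketch below assumes each route sends a remote node's docker subnet via that node's vxlan address; that reading, the function name and every address and subnet used are assumptions made for illustration.

```python
def docker_route_commands(nodes, local):
    """Build `ip route add` commands for one node.

    nodes: dict of node name -> (vxlan_ip, docker_subnet), all hypothetical.
    Each remote node's docker subnet is routed via that node's vxlan address.
    """
    return [
        f"ip route add {subnet} via {vxlan_ip}"
        for name, (vxlan_ip, subnet) in nodes.items()
        if name != local
    ]


# Hypothetical layout: (vxlan address, docker bridge subnet) per node.
ROUTE_NODES = {
    "node1": ("10.200.0.1", "172.18.1.0/24"),
    "node2": ("10.200.0.2", "172.18.2.0/24"),
    "node3": ("10.200.0.3", "172.18.3.0/24"),
}

if __name__ == "__main__":
    for node in ROUTE_NODES:
        print(f"# {node}")
        print("\n".join(docker_route_commands(ROUTE_NODES, node)))
```

This only works if each node's docker bridge uses its own non-overlapping subnet, which the sketch takes for granted.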

Now your docker containers can reach each other.

NOTE: This is not yet something that can be configured via systemd-networkd. https://github.com/systemd/systemd/issues/5145

15 Jan 2017

Another Gluster 3.8 Long-Term-Maintenance update with the 3.8.8 release

The Gluster team has been busy over the end-of-year holidays and this latest update to the 3.8 Long-Term-Maintenance release intends to fix quite a number of bugs. Packages have been built for many different distributions and are available from the download server. The release-notes for 3.8.8 have been included below for the ease of reference. All users on the 3.8 version are recommended to update to this current release.

Release notes for Gluster 3.8.8

This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6 and 3.8.7 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

Bugs addressed

A total of 38 patches have been merged, addressing 35 bugs:
  • #1375849: [RFE] enable sharding with virt profile - /var/lib/glusterd/groups/virt
  • #1378384: log level set in glfs_set_logging() does not work
  • #1378547: Asynchronous Unsplit-brain still causes Input/Output Error on system calls
  • #1389781: build: python on Debian-based dists use .../lib/python2.7/dist-packages instead of .../site-packages
  • #1394635: errors appear in brick and nfs logs and getting stale files on NFS clients
  • #1395510: Seeing error messages [snapview-client.c:283:gf_svc_lookup_cbk] and [dht-helper.c:1666:dht_inode_ctx_time_update] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x5d75c)
  • #1399423: GlusterFS client crashes during remove-brick operation
  • #1399432: A hard link is lost during rebalance+lookup
  • #1399468: Wrong value in Last Synced column during Hybrid Crawl
  • #1399915: [SAMBA-CIFS] : IO hungs in cifs mount while graph switch on & off
  • #1401029: OOM kill of nfs-ganesha on one node while fs-sanity test suite is executed.
  • #1401534: fuse mount point not accessible
  • #1402697: glusterfsd crashed while taking snapshot using scheduler
  • #1402728: Worker restarts on log-rsync-performance config update
  • #1403109: Crash of glusterd when using long username with geo-replication
  • #1404105: Incorrect incrementation of volinfo refcnt during volume start
  • #1404583: Upcall: Possible use after free when log level set to TRACE
  • #1405004: [Perf] : pcs cluster resources went into stopped state during Multithreaded perf tests on RHGS layered over RHEL 6
  • #1405130: `gluster volume heal split-brain' does not heal if data/metadata/entry self-heal options are turned off
  • #1405450: tests/bugs/snapshot/bug-1316437.t test is causing spurious failure
  • #1405577: [GANESHA] failed to create directory of hostname of new node in var/lib/nfs/ganesha/ in already existing cluster nodes
  • #1405886: Fix potential leaks in INODELK cbk in protocol/client
  • #1405890: Fix spurious failure in bug-1402841.t-mt-dir-scan-race.t
  • #1405951: NFS-Ganesha:Volume reset for any option causes reset of ganesha enable option and bring down the ganesha services
  • #1406740: Fix spurious failure in tests/bugs/replicate/bug-1402730.t
  • #1408414: Remove-brick rebalance failed while rm -rf is in progress
  • #1408772: [Arbiter] After Killing a brick writes drastically slow down
  • #1408786: with granular-entry-self-heal enabled i see that there is a gfid mismatch and vm goes to paused state after migrating to another host
  • #1410073: Fix failure of split-brain-favorite-child-policy.t in CentOS7
  • #1410369: Dict_t leak in dht_migration_complete_check_task and dht_rebalance_inprogress_task
  • #1410699: [geo-rep]: Config commands fail when the status is 'Created'
  • #1410708: glusterd/geo-rep: geo-rep config command leaks fd
  • #1410764: Remove-brick rebalance failed while rm -rf is in progress
  • #1411011: atime becomes zero when truncating file via ganesha (or gluster-NFS)
  • #1411613: Fix the place where graph switch event is logged