19 Aug 2019

Geo-Replication in Gluster

What is geo-replication?
It is as simple as having a copy of your data somewhere else on the globe!

Data from one location is asynchronously replicated to a secondary location, so that the same data exists in both locations.

Thus, there exists a backup at all times, even if the data in the primary location is completely destroyed. The same data can always be retrieved from the secondary location/locations!


But how does this work?
Let me explain with respect to GlusterFS.

It all happens this way:
1) Let's say you have a volume (I will call it the primary volume).
2) Now you want to have it replicated.
3) So, you will create a new volume in a different location (I will call it the secondary volume).
4) You want to replicate all the existing data from the primary volume, and also sync new changes to the secondary volume whenever they are made on the primary volume.
5) So, you will create a gluster geo-replication session which takes care of all these replications for you :)

The primary volume is called the master, from which the unidirectional sync happens to the secondary volume (the slave).

How is the connection established between the master and the slave?
Click here to know...

How are the changes on the master detected?
To learn this, you should know what xtime and stime are.

stime is an extended attribute stored on each master brick root which records the timestamp up to which the changelogs have been completely synced to the slave.

Wondering what a changelog is? Read about it here.
xtime is an extended attribute stored on all the file system objects, both files and directories. On each modification of a file or directory in the file system, an extended attribute carrying the modification time is stored on it.
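A toy model of how xtime behaves (my own Python sketch, not Gluster code; in Gluster the marker translator does this, and the attribute names/values here are purely illustrative): the modification time is stamped on the changed object and bubbled up, so a directory's xtime reflects the newest change anywhere beneath it.

```python
import os

xtime = {}  # path -> timestamp of the latest change in that subtree

def record_change(path: str, ts: int) -> None:
    """Stamp the changed object and propagate the timestamp to each
    ancestor directory, up to the root."""
    while True:
        xtime[path] = max(xtime.get(path, 0), ts)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent

record_change("/data/dir1/file1", 150)
print(xtime["/data/dir1/file1"])  # 150
print(xtime["/data"])             # 150, propagated upward
```

Because the timestamp propagates upward, a crawler can prune an entire untouched subtree just by looking at the directory's xtime.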

Gluster geo-replication detects the changes made on the master by crawling the master volume.
There are three types of crawls:

1) Hybrid crawl / Xsync crawl:
It only happens when there is already data in the master volume before geo-replication is set up.

If geo-rep is set up before any data is created on the master, it never goes into hybrid crawl.

On each directory, it compares stime with xtime and only descends into it if
stime < xtime (i.e. the last time the data was synced to the slave is earlier than the last time any modification was made on the master!)
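The descend-or-prune rule can be sketched as a toy crawl (illustrative Python with a made-up in-memory tree, not Gluster's code):

```python
STIME = 100  # point up to which the brick has been synced to the slave

tree = {  # directory -> (xtime, child directories); invented sample data
    "/": (150, ["/old", "/fresh"]),
    "/old": (90, []),     # untouched since the last sync
    "/fresh": (150, []),  # modified after the last sync
}

def crawl(path, visited):
    """Visit a directory only if something below it changed after STIME."""
    xtime, children = tree[path]
    if STIME >= xtime:   # nothing new below: prune the whole subtree
        return
    visited.append(path)
    for child in children:
        crawl(child, visited)

visited = []
crawl("/", visited)
print(visited)  # ['/', '/fresh'] -- '/old' is skipped entirely
```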

2) History crawl:
Let's assume geo-rep is in the stopped state and there is one month of data pending to be synced to the slave. stime will have been marked up to the point geo-rep had synced.
When geo-rep is started after a month, all the pending changelogs are processed; this phase is called the history crawl.

3) Changelog crawl:
When live changelogs, or close-to-live changelogs, are processed to sync the data to the slave, we call it the changelog crawl.
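Put together, the choice between the three crawls can be sketched roughly like this (my simplification; the real mode selection inside gsyncd is more involved):

```python
def pick_crawl(stime, pending_changelogs):
    """Rough decision sketch: which crawl mode applies."""
    if stime is None:        # never synced: data predates geo-rep setup
        return "hybrid"
    if pending_changelogs:   # a backlog of recorded changelogs exists
        return "history"
    return "changelog"       # consume (near-)live changelogs

print(pick_crawl(None, []))                # hybrid
print(pick_crawl(100, ["CHANGELOG.101"]))  # history
print(pick_crawl(100, []))                 # changelog
```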

How does the data sync from master to slave?
First we see entry syncing, where a zero-byte file is created on the slave with the same gfid as on the master.

Here, gfid is nothing but a unique identifier which identifies a file in a gluster volume, similar to an inode number in a file system.
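Entry syncing can be pictured with a toy model (the dictionaries, paths and sizes below are invented for illustration; Gluster stores gfids as extended attributes):

```python
import uuid

master = {}  # gfid -> (path, size), standing in for the master volume
slave = {}   # gfid -> (path, size), standing in for the slave volume

def create_on_master(path, data):
    gfid = str(uuid.uuid4())        # a unique gfid is assigned on create
    master[gfid] = (path, len(data))
    return gfid

def entry_sync(gfid):
    """Create a zero-byte placeholder on the slave under the SAME gfid,
    so the later data sync addresses the file identically on both sides."""
    path, _ = master[gfid]
    slave[gfid] = (path, 0)

gfid = create_on_master("/file1", b"hello")
entry_sync(gfid)
print(slave[gfid])  # ('/file1', 0): placeholder awaiting data syncing
```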

And then we see data syncing.
Data syncing can be done by:
  • Rsync over ssh (the default method)
  • Tar over ssh
(one of these can be opted for syncing the data)

Hmm.. how are rsync and tar over ssh different?
The first sync with rsync will be slow, similar to tar over ssh, because all files are copied. Subsequently, when new changes are made on the master volume, only those changes are copied to the slave, making the replication faster and more efficient.
Thus rsync is recommended for large files.

Tar is an archive utility: a tarball is created of all the files, unlike rsync, which only transfers deltas once the initial copy exists at the secondary location.
Thus tar is recommended for small files.
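The trade-off can be shown with a back-of-the-envelope cost model (every number here is invented purely for illustration; real rsync and tar behavior depends on protocol, compression and network details):

```python
PER_FILE_RSYNC_OVERHEAD = 400  # assumed per-file protocol cost, bytes

def rsync_cost(files, delta_per_file):
    """Rsync pays a per-file handshake but ships only the deltas."""
    return len(files) * (PER_FILE_RSYNC_OVERHEAD + delta_per_file)

def tar_cost(files, size_per_file):
    """Tar ships whole files, but as one stream with no per-file chatter."""
    return len(files) * size_per_file

# 10,000 tiny files of 100 bytes each: the per-file overhead dominates,
# so the tar stream is cheaper.
small = ["f%d" % i for i in range(10_000)]
print(rsync_cost(small, 100) > tar_cost(small, 100))  # True

# One 1 GB file where only 4 KB changed: the delta transfer wins by far.
print(rsync_cost(["big"], 4096) < tar_cost(["big"], 10**9))  # True
```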

How frequently does the sync occur?
As I said, it is asynchronous. Although the replication may occur in near-real-time, it is more common for replication to occur on a scheduled basis.

A few changelogs will be batched together and processed:
• First, the entry and metadata changes are synced serially from each changelog of the batch.
• Data changes get added to the rsync/tarssh queue as soon as the entry and metadata from the first changelog of the batch are synced.
• Once all the entry, metadata and data changes within the batch are synced, the stime is updated in the brick root.
• If something breaks in between, re-processing is involved.
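The batching rules above can be sketched as a toy model (the real pipeline in gsyncd is concurrent and far more elaborate; the changelog structure below is invented):

```python
def process_batch(changelogs, sync_data, brick):
    data_queue = []
    for log in changelogs:               # entry/metadata synced serially
        for op in log["entries"]:
            brick["slave"].append(op)
        data_queue.extend(log["data"])   # data changes queued for rsync/tarssh
    for blob in data_queue:              # the data channel drains the queue
        sync_data(blob)
    # stime advances only once the WHOLE batch has landed; a crash before
    # this point means the batch is re-processed on restart.
    brick["stime"] = changelogs[-1]["ts"]

brick = {"slave": [], "stime": 0}
synced = []
batch = [
    {"ts": 101, "entries": ["CREATE f1"], "data": ["f1-bytes"]},
    {"ts": 102, "entries": ["MKDIR d1"], "data": []},
]
process_batch(batch, synced.append, brick)
print(brick["stime"])  # 102, updated only after the full batch
```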
Now that you have come this far, I will tell you about the geo-rep processes. These processes efficiently take up the above-mentioned tasks.

There are three processes, namely:
1) Monitor process
2) Worker process
3) Agent process
There is a file called gsyncd.py inside the glusterfs repository.
It acts as a single entry point for geo-replication.

Based on the different arguments passed, it acts as a monitor, worker or agent.
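The single-entry-point idea can be sketched like this (the argument names and role descriptions below are my invention for illustration; see gsyncd.py for the real option handling):

```python
import argparse

def main(argv):
    """One script, three personalities, chosen by how it is invoked."""
    parser = argparse.ArgumentParser(prog="gsyncd")
    parser.add_argument("role", choices=["monitor", "worker", "agent"])
    args = parser.parse_args(argv)
    return {
        "monitor": "supervising worker/agent pairs",
        "worker": "syncing parsed changelogs to the slave",
        "agent": "consuming changelogs from the changelog xlator",
    }[args.role]

print(main(["monitor"]))  # supervising worker/agent pairs
```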

When we start the geo-rep session, using the command:
# gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start

the gluster management daemon (glusterd) will start the monitor process.

This process monitors the worker and agent processes. If the worker or agent crashes, the monitor restarts them. (Note: if the agent crashes, the worker automatically crashes, and vice versa.)

There will be one monitor process per geo-rep session on a node.

The agent process consumes the changelogs generated by the changelog xlator.
The worker process uses the parsed changelogs produced by the agent process to sync the data to the slave.

Each brick will have an agent and a worker process.
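The restart rule can be sketched as a toy supervision loop (my model of the behavior described above, not actual Gluster code):

```python
def supervise(pairs):
    """pairs: brick -> (worker_alive, agent_alive). Since the worker and
    agent live or die together, the monitor restarts the pair as a unit."""
    restarted = []
    for brick, (worker_ok, agent_ok) in pairs.items():
        if not (worker_ok and agent_ok):  # either death takes down both
            pairs[brick] = (True, True)   # restart the pair together
            restarted.append(brick)
    return restarted

pairs = {"brick1": (True, True), "brick2": (False, True)}
print(supervise(pairs))  # ['brick2']
```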

Let us try it out:
Let us create our own geo-rep session, following the instructions mentioned in this blog.

Let us see if data is really replicated from master to slave :p

How do we do that?
Mount the master and the slave to mount points,
using this command:

# mount -t glusterfs <hostname>:<volume-name> /mount_point
e.g.: # mount -t glusterfs virtual_machine1:master /dir1/mount_point_master
      # mount -t glusterfs virtual_machine2:slave /dir2/mount_point_slave

Create files inside the mount point of the master:
# cd /dir1/mount_point_master
# touch file{1..10}

      Check inside slave mount point now.
      #ls /dir2/mount_point_slave
      See that file1 to file10 are already replicated :)

Also check the log files to know more details.
All logs can be found at:

geo-rep specific logs are present at:

But is there any mechanism for data retrieval?
Yes! If the master node gets destroyed or goes offline, we can promote our slave to master! All data access can then be done from the promoted volume (the new master volume). We call this procedure a failover.

See how...

When the original master is back online, we can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.

So this was it. Here ends my narration of the geo-replication story :) Do explore more!


19 Jul 2019

How do I manage my time at work


Often people wonder how a technical people manager can handle the responsibilities of people management alongside other technical responsibilities such as bug scrubbing, patch reviews, root-causing critical issues, replying to community user queries and so on. My recipe for this problem is simple: respect the time and utilize it at its best.
When I start my day, the obvious thing I do is reply to the emails where things are waiting on me, and then scan through the new entries in my inbox; on an average this takes about 30 to 60 minutes.

The next 1 to 1.5 hours I dedicate to patch review and bug triaging. Being a Gluster co-maintainer managing the GlusterD & CLI components, I have a responsibility to ensure the review backlog doesn't grow and that all new patches which have passed regression get reviews within 2 days of submission; of course this is my own SLA, and other maintainers might be following a different one. The overall review time varies depending on the type and complexity of the patches: completing it could take me 10 minutes or even an hour. But my principle is not to overshoot it. If I need additional time, I will park the rest of the things and find time later in the day, or in the worst case on the next day.

After the review backlog, my focus shifts to bug triaging of the components I maintain (both for Gluster & the downstream product RHGS). Now one might debate why you would triage bugs so regularly, but I find this frequency the most effective one as (a) your time to respond to any issue stays quite low, (b) you never accumulate bugs sitting in the NEW state in Bugzilla that are never looked at, and (c) any critical bugs can be planned for well ahead of releases.

Many of us think that triaging a bug means identifying the problem, root-causing it and then deciding how to fix the problem and ship it for the customers/users. However, my understanding of bug triaging is that you need to assess the severity of the problem, see whether the explanation of the problem is valid, and based on that put up a plan for when you'd want to work on it and what (tentative) release this bug should be tagged to. Of course, sometimes you'd need to coordinate with your fellow colleagues if a bug needs assistance from different component maintainers/owners. Also, the triage should translate into a decision on the bug: should it be fixed or not? I see a lot of hesitance in choosing the latter, and that's where your backlog grows! It's absolutely fine to say: "No, we can't work on this at this moment given our roadmap and priority, and this bug isn't critical enough to impact your production; if you have any other justification, please comment."

So after 2 to 2.5 hours of work on emails, reviews and bug triaging, I get into my regular meetings, which could be team-wide or one-on-one conversations with my directs. I hate continuous meetings, so I prefer to have a half-hour gap after every (long) meeting where I can start working on my other agenda (release work, coordination/status checks, providing assistance to team members in debugging problems etc.). Of course, if ad-hoc meetings come up in the middle due to unplanned urgent situations, you'd need to accommodate them in your schedule.

Along with the evening meetings (program calls and other items where we have colleagues from different time zones), I try to spend 30 to 60 minutes picking up a bug (if not two, at least one), working on the root cause and, if possible, sending a fix. And at the end of the day comes another routine of scanning through the inbox, starring items for later, and finishing any important pending people-management items.

I must say that all the above applies mostly from Monday to Thursday. I keep Friday a bit more flexible and lighter w.r.t. meetings and do something which I don't do regularly; for example, I am writing this blog on a Friday itself!

So far, in the last 3 years of my career working as a people manager, following a time discipline and bucketing items into slots has helped me manage my time, despite a long commute (23 kilometers one way, and that too in Bangalore, not every day but twice or thrice a week). I hope this small post might be helpful for others who are struggling with time management.
