28 Oct 2019

Deploying a Highly Available Nextcloud with MySQL Galera Cluster and GlusterFS

Nextcloud is an open source file sync and share application that offers free, secure, and easily accessible cloud file storage, as well as a number of tools that extend its feature set. It's very similar to the popular Dropbox, iCloud and Google Drive but unlike Dropbox, Nextcloud does not offer off-premises file storage hosting. 


In this blog post, we are going to deploy a highly available setup for our private "Dropbox" infrastructure using Nextcloud, GlusterFS, Percona XtraDB Cluster (MySQL Galera Cluster), and ProxySQL, with ClusterControl as the automation tool to manage and monitor the database and load balancer tiers.

Note: You can also use MariaDB Cluster, which uses the same underlying replication library as Percona XtraDB Cluster. From a load balancer perspective, ProxySQL behaves similarly to MaxScale in that it can understand the SQL traffic and has fine-grained control over how traffic is routed.

Database Architecture for Nextcloud

In this blog post, we will use a total of 6 nodes:

  • 2 x proxy servers 
  • 3 x database + application servers
  • 1 x controller server (ClusterControl)

The following diagram illustrates our final setup:

Highly Available MySQL Nextcloud Database Architecture

For Percona XtraDB Cluster, a minimum of 3 nodes is required for solid multi-master replication. The Nextcloud application is co-located on the database servers, so GlusterFS has to be configured on those hosts as well.

The load balancer tier consists of 2 nodes for redundancy purposes. We will use ClusterControl to deploy both the database tier and the load balancer tier. All servers are running CentOS 7 with the following /etc/hosts definition on every node:

192.168.0.21 nextcloud1 db1

192.168.0.22 nextcloud2 db2

192.168.0.23 nextcloud3 db3

192.168.0.10 vip db

192.168.0.11 proxy1 lb1 proxysql1

192.168.0.12 proxy2 lb2 proxysql2

Note that GlusterFS and MySQL are highly intensive processes. If you are following this setup (GlusterFS and MySQL reside on a single server), ensure you have decent hardware specs for the servers.

Nextcloud Database Deployment

We will start with database deployment for our three-node Percona XtraDB Cluster using ClusterControl. Install ClusterControl, then set up passwordless SSH to all nodes that are going to be managed by ClusterControl (3 PXC nodes + 2 proxies). On the ClusterControl node, do:

$ whoami

root

$ ssh-copy-id 192.168.0.11

$ ssh-copy-id 192.168.0.12

$ ssh-copy-id 192.168.0.21

$ ssh-copy-id 192.168.0.22

$ ssh-copy-id 192.168.0.23

Note: Enter the root password for the respective host when prompted.

Open a web browser and go to https://{ClusterControl-IP-address}/clustercontrol and create a super user. Then go to Deploy -> MySQL Galera. Follow the deployment wizard accordingly. At the second stage 'Define MySQL Servers', pick Percona XtraDB 5.7 and specify the IP address for every database node. Make sure you get a green tick after entering the database node details, as shown below:

Deploy a Nextcloud Database Cluster

Click "Deploy" to start the deployment. The database cluster will be ready in 15~20 minutes. You can follow the deployment progress at Activity -> Jobs -> Create Cluster -> Full Job Details. The cluster will be listed under Database Cluster dashboard once deployed.

We can now proceed to database load balancer deployment.

Nextcloud Database Load Balancer Deployment

It is recommended to run Nextcloud on a single-writer setup, where writes are processed by one master at a time and reads can be distributed to the other nodes. We can use ProxySQL 2.0 to achieve this configuration since it can route write queries to a single master.

To deploy a ProxySQL, click on Cluster Actions > Add Load Balancer > ProxySQL > Deploy ProxySQL. Enter the required information as highlighted by the red arrows:

Deploy ProxySQL for Nextcloud

Fill in all necessary details as highlighted by the arrows above. The server address is the lb1 server, 192.168.0.11. Further down, we specify the ProxySQL admin and monitoring users' passwords. Then include all MySQL servers in the load balancing set and choose "No" in the Implicit Transactions section. Click "Deploy ProxySQL" to start the deployment.

Repeat the same steps as above for the secondary load balancer, lb2 (but change the "Server Address" to lb2's IP address). Otherwise, we would have no redundancy in this layer.

Our ProxySQL nodes are now installed and configured with two host groups for the Galera Cluster: a single-master group (hostgroup 10), where all connections are forwarded to one Galera node (this is useful to prevent multi-master deadlocks), and a multi-master group (hostgroup 20) for all read-only workloads, which are balanced across all backend MySQL servers.
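If you are curious how this routing is wired up, you can inspect it from the ProxySQL admin interface. The following is only a quick verification sketch; it assumes the default admin port 6032 and the admin credentials that you specified during the deployment:

(proxy1)$ mysql -uadmin -p -h127.0.0.1 -P6032

mysql> SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;

mysql> SELECT rule_id, active, match_digest, destination_hostgroup FROM runtime_mysql_query_rules;

The first query shows which backend server sits in which host group, while the second shows the query rules that split writes (hostgroup 10) from reads (hostgroup 20).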

Next, we need to deploy a virtual IP address to provide a single endpoint for our ProxySQL nodes, so that the application does not need to define two different ProxySQL hosts. This also provides automatic failover capabilities, because the virtual IP address will be taken over by the backup ProxySQL node if something goes wrong with the primary ProxySQL node.

Go to ClusterControl -> Manage -> Load Balancers -> Keepalived -> Deploy Keepalived. Pick "ProxySQL" as the load balancer type and choose two distinct ProxySQL servers from the dropdown. Then specify the virtual IP address as well as the network interface that it will listen on, as shown in the following example:

Deploy Keepalived & ProxySQL for Nextcloud

Once the deployment completes, you should see the following details on the cluster's summary bar:

Nextcloud Database Cluster in ClusterControl
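At this point, you can also verify from the command line which ProxySQL node currently holds the virtual IP address and that Keepalived is running. This is an optional sanity check; the exact interface name may differ on your servers:

(proxy1)$ ip addr show | grep 192.168.0.10

(proxy1)$ systemctl status keepalived

If the VIP is not listed on proxy1, check proxy2 instead; Keepalived moves the address to the backup node during a failover.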

Finally, create a new database for our application by going to ClusterControl -> Manage -> Schemas and Users -> Create Database and specify "nextcloud". ClusterControl will create this database on every Galera node. Our load balancer tier is now complete.
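The Nextcloud installation wizard later on will also need a database user with privileges on this schema. You can create it from the same Schemas and Users page, or manually from any Galera node. The sketch below is only an illustration; 'MyNextcloudPassword' is a placeholder and must match whatever password you enter in the wizard:

(db1)$ mysql -uroot -p

mysql> CREATE DATABASE IF NOT EXISTS nextcloud;

mysql> CREATE USER 'nextcloud'@'%' IDENTIFIED BY 'MyNextcloudPassword';

mysql> GRANT ALL PRIVILEGES ON nextcloud.* TO 'nextcloud'@'%';

mysql> FLUSH PRIVILEGES;

Since Galera replicates these statements cluster-wide, running them on one node is enough; the database and grants will appear on all three nodes.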

GlusterFS Deployment for Nextcloud

The following steps should be performed on nextcloud1, nextcloud2, nextcloud3 unless otherwise specified.

Step One

It's recommended to have a separate disk for GlusterFS storage, so we are going to add an additional disk as /dev/sdb and create a new partition:

$ fdisk /dev/sdb

Follow the fdisk partition creation wizard by pressing the following keys:

n > p > Enter > Enter > Enter > w
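If you are scripting this across the three nodes, the same partition can also be created non-interactively with parted instead of walking through the fdisk prompts. This is an optional alternative, not part of the original procedure:

$ parted -s /dev/sdb mklabel msdos

$ parted -s /dev/sdb mkpart primary 0% 100%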

Step Two

Verify if /dev/sdb1 has been created:

$ fdisk -l /dev/sdb1

Disk /dev/sdb1: 8588 MB, 8588886016 bytes, 16775168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Step Three

Format the partition with XFS:

$ mkfs.xfs /dev/sdb1

Step Four

Mount the partition at /glusterfs:

$ mkdir /glusterfs

$ mount /dev/sdb1 /glusterfs

Verify that all nodes have the following layout:

$ lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda      8:0 0 40G  0 disk

└─sda1   8:1 0 40G  0 part /

sdb      8:16 0   8G 0 disk

└─sdb1   8:17 0   8G 0 part /glusterfs

Step Five

Create a subdirectory called brick under /glusterfs:

$ mkdir /glusterfs/brick

Step Six

For application redundancy, we can use GlusterFS for file replication between the hosts. First, install the GlusterFS repository for CentOS:

$ yum install centos-release-gluster -y

$ yum install epel-release -y

Step Seven

Install the GlusterFS server:

$ yum install glusterfs-server -y

Step Eight

Enable and start gluster daemon:

$ systemctl enable glusterd

$ systemctl start glusterd

Step Nine

On nextcloud1, probe the other nodes:

(nextcloud1)$ gluster peer probe 192.168.0.22

(nextcloud1)$ gluster peer probe 192.168.0.23

You can verify the peer status with the following command:

(nextcloud1)$ gluster peer status

Number of Peers: 2



Hostname: 192.168.0.22

Uuid: f9d2928a-6b64-455a-9e0e-654a1ebbc320

State: Peer in Cluster (Connected)



Hostname: 192.168.0.23

Uuid: 100b7778-459d-4c48-9ea8-bb8fe33d9493

State: Peer in Cluster (Connected)

Step Ten

On nextcloud1, create a replicated volume on probed nodes:

(nextcloud1)$ gluster volume create rep-volume replica 3 192.168.0.21:/glusterfs/brick 192.168.0.22:/glusterfs/brick 192.168.0.23:/glusterfs/brick

volume create: rep-volume: success: please start the volume to access data

Step Eleven

Start the replicated volume on nextcloud1:

(nextcloud1)$ gluster volume start rep-volume

volume start: rep-volume: success

Verify the replicated volume and processes are online:

$ gluster volume status

Status of volume: rep-volume

Gluster process                             TCP Port  RDMA Port  Online  Pid

------------------------------------------------------------------------------

Brick 192.168.0.21:/glusterfs/brick         49152     0          Y       32570

Brick 192.168.0.22:/glusterfs/brick         49152     0          Y       27175

Brick 192.168.0.23:/glusterfs/brick         49152     0          Y       25799

Self-heal Daemon on localhost               N/A       N/A        Y       32591

Self-heal Daemon on 192.168.0.22            N/A       N/A        Y       27196

Self-heal Daemon on 192.168.0.23            N/A       N/A        Y       25820



Task Status of Volume rep-volume

------------------------------------------------------------------------------

There are no active volume tasks

Step Twelve

Mount the replicated volume on /var/www/html. Create the directory:

$ mkdir -p /var/www/html

Step Thirteen

Add the following lines to /etc/fstab to allow auto-mount:

/dev/sdb1 /glusterfs xfs defaults 0 0

localhost:/rep-volume /var/www/html   glusterfs defaults,_netdev 0 0

Step Fourteen

Mount the GlusterFS to /var/www/html:

$ mount -a

And verify with:

$ mount | grep gluster

/dev/sdb1 on /glusterfs type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

localhost:/rep-volume on /var/www/html type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The replicated volume is now ready and mounted in every node. We can now proceed to deploy the application.
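Before moving on, if you ever want to double-check that the bricks are in sync at a later point, the self-heal daemon can be queried from any node; a clean output with zero entries per brick means nothing is pending replication:

$ gluster volume heal rep-volume info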

Nextcloud Application Deployment

The following steps should be performed on nextcloud1, nextcloud2 and nextcloud3 unless otherwise specified.

Nextcloud requires PHP 7.2 or later, and for the CentOS distribution, we have to enable a number of repositories like EPEL and Remi to simplify the installation process.

Step One

If SELinux is enabled, disable it first:

$ setenforce 0

$ sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config

You can also run Nextcloud with SELinux enabled by following this guide.

Step Two

Install Nextcloud requirements and enable Remi repository for PHP 7.2:

$ yum install -y epel-release yum-utils unzip curl wget bash-completion policycoreutils-python mlocate bzip2

$ yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm

$ yum-config-manager --enable remi-php72

Step Three

Install Nextcloud dependencies, mostly Apache and PHP 7.2 related packages:

$ yum install -y httpd php72-php php72-php-gd php72-php-intl php72-php-mbstring php72-php-mysqlnd php72-php-opcache php72-php-pecl-redis php72-php-pecl-apcu php72-php-pecl-imagick php72-php-xml php72-php-pecl-zip

Step Four

Enable Apache and start it up:

$ systemctl enable httpd.service

$ systemctl start httpd.service

Step Five

Make a symbolic link for PHP to use PHP 7.2 binary:

$ ln -sf /bin/php72 /bin/php

Step Six

On nextcloud1, download Nextcloud Server from here and extract it:

$ wget https://download.nextcloud.com/server/releases/nextcloud-17.0.0.zip

$ unzip nextcloud*

Step Seven

On nextcloud1, copy the directory into /var/www/html and assign correct ownership:

$ cp -Rf nextcloud /var/www/html

$ chown -Rf apache:apache /var/www/html

Note: The copying process into /var/www/html is going to take some time due to GlusterFS volume replication.

Step Eight

Before we open the installation wizard, we have to set the pxc_strict_mode variable to something other than "ENFORCING" (the default value). This is because the Nextcloud database import creates a number of tables without a primary key defined, which is not recommended on Galera Cluster. This is explained in further detail in the Tuning section below.

To change the configuration with ClusterControl, simply go to Manage -> Configurations -> Change/Set Parameters:

Change Set Parameters - ClusterControl

Choose all database instances from the list, and enter:

  • Group: MYSQLD
  • Parameter: pxc_strict_mode
  • New Value: PERMISSIVE

ClusterControl will perform the necessary changes on every database node automatically. If the value can be changed at runtime, it will take effect immediately. ClusterControl also configures the value inside the MySQL configuration file for persistence. You should see the following result:

Change Set Parameter - ClusterControl
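For reference, the same change can be made manually on every Galera node without ClusterControl. pxc_strict_mode is a dynamic variable in Percona XtraDB Cluster 5.7, so it can be set at runtime; just remember to also persist it under the [mysqld] section of the MySQL configuration file if you go this route (ClusterControl does both steps for you):

$ mysql -uroot -p -e "SET GLOBAL pxc_strict_mode = 'PERMISSIVE'"

$ mysql -uroot -p -e "SHOW GLOBAL VARIABLES LIKE 'pxc_strict_mode'"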

Step Nine

Now we are ready to configure our Nextcloud installation. Open the browser and go to nextcloud1's HTTP server at http://192.168.0.21/nextcloud/ and you will be presented with the following configuration wizard:

Nextcloud Account Setup

Configure the "Storage & database" section with the following value:

  • Data folder: /var/www/html/nextcloud/data
  • Configure the database: MySQL/MariaDB
  • Username: nextcloud
  • Password: (the password for user nextcloud)
  • Database: nextcloud
  • Host: 192.168.0.10:6603 (The virtual IP address with ProxySQL port)

Click "Finish Setup" to start the configuration process. Wait until it finishes and you will be redirected to Nextcloud dashboard for user "admin". The installation is now complete. Next section provides some tuning tips to run efficiently with Galera Cluster.

Nextcloud Database Tuning

Primary Key

Having a primary key on every table is vital for Galera Cluster write-set replication. For a relatively big table without a primary key, a large update or delete transaction would completely block your cluster for a very long time. To avoid any quirks and edge cases, simply make sure that all tables use the InnoDB storage engine with an explicit primary key (a unique key does not count).

The default installation of Nextcloud will create a bunch of tables under the specified database and some of them do not comply with this rule. To check if the tables are compatible with Galera, we can run the following statement:

mysql> SELECT DISTINCT CONCAT(t.table_schema,'.',t.table_name) as tbl, t.engine, IF(ISNULL(c.constraint_name),'NOPK','') AS nopk, IF(s.index_type = 'FULLTEXT','FULLTEXT','') as ftidx, IF(s.index_type = 'SPATIAL','SPATIAL','') as gisidx FROM information_schema.tables AS t LEFT JOIN information_schema.key_column_usage AS c ON (t.table_schema = c.constraint_schema AND t.table_name = c.table_name AND c.constraint_name = 'PRIMARY') LEFT JOIN information_schema.statistics AS s ON (t.table_schema = s.table_schema AND t.table_name = s.table_name AND s.index_type IN ('FULLTEXT','SPATIAL'))   WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql') AND t.table_type = 'BASE TABLE' AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL OR s.index_type IN ('FULLTEXT','SPATIAL')) ORDER BY t.table_schema,t.table_name;

+---------------------------------------+--------+------+-------+--------+

| tbl                                   | engine | nopk | ftidx | gisidx |

+---------------------------------------+--------+------+-------+--------+

| nextcloud.oc_collres_accesscache      | InnoDB | NOPK |       |        |

| nextcloud.oc_collres_resources        | InnoDB | NOPK |       |        |

| nextcloud.oc_comments_read_markers    | InnoDB | NOPK |       |        |

| nextcloud.oc_federated_reshares       | InnoDB | NOPK |       |        |

| nextcloud.oc_filecache_extended       | InnoDB | NOPK |       |        |

| nextcloud.oc_notifications_pushtokens | InnoDB | NOPK |       |        |

| nextcloud.oc_systemtag_object_mapping | InnoDB | NOPK |       |        |

+---------------------------------------+--------+------+-------+--------+

The above output shows there are 7 tables that do not have a primary key defined. To fix this, simply add a primary key with an auto-increment column. Run the following commands on one of the database servers, for example nextcloud1:

(nextcloud1)$ mysql -uroot -p

mysql> ALTER TABLE nextcloud.oc_collres_accesscache ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_collres_resources ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_comments_read_markers ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_federated_reshares ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_filecache_extended ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_notifications_pushtokens ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_systemtag_object_mapping ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

Once the above modifications have been applied, we can set pxc_strict_mode back to the recommended value, "ENFORCING". Repeat Step Eight under the "Nextcloud Application Deployment" section with the corresponding value.

READ-COMMITTED Isolation Level

The transaction isolation level recommended by Nextcloud is READ-COMMITTED, while Galera Cluster defaults to the stricter REPEATABLE-READ isolation level. Using READ-COMMITTED can avoid data loss under high-load scenarios (e.g. when using the sync client with many clients/users and many parallel operations).

To modify the transaction level, go to ClusterControl -> Manage -> Configurations -> Change/Set Parameter and specify the following:

Nextcloud Change Set Parameter - ClusterControl

Click "Proceed" and ClusterControl will apply the configuration changes immediately. No database restart is required.

Multi-Instance Nextcloud

Since we performed the installation on nextcloud1 when accessing the URL, this IP address was automatically added to the 'trusted_domains' variable inside Nextcloud. If you try to access another server, for example the secondary server at http://192.168.0.22/nextcloud, you will see an error that the host is not authorized and must be added to the trusted_domains variable.

Therefore, add all the hosts' IP addresses to the "trusted_domains" array inside /var/www/html/nextcloud/config/config.php, as in the example below:

  'trusted_domains' =>

  array (

    0 => '192.168.0.21',

    1 => '192.168.0.22',

    2 => '192.168.0.23'

  ),

The above configuration allows users to access all three application servers via the following URLs:

  • http://192.168.0.21/nextcloud
  • http://192.168.0.22/nextcloud
  • http://192.168.0.23/nextcloud

Note: You can add a load balancer tier on top of these three Nextcloud instances to achieve high availability for the application tier by using HTTP reverse proxies available in the market like HAProxy or nginx. That is out of the scope of this blog post.

Using Redis for File Locking

Nextcloud’s Transactional File Locking mechanism locks files to avoid file corruption during normal operation. It's recommended to install Redis to take care of transactional file locking (which is enabled by default); this offloads the database cluster from handling this heavy job.

To install Redis, simply:

$ yum install -y redis

$ systemctl enable redis.service

$ systemctl start redis.service
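A quick way to confirm that Redis is up before wiring it into Nextcloud (it should answer with PONG):

$ redis-cli ping

PONG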

Append the following lines inside /var/www/html/nextcloud/config/config.php:

  'filelocking.enabled' => true,

  'memcache.locking' => '\OC\Memcache\Redis',

  'redis' => array(

     'host' => '192.168.0.21',

     'port' => 6379,

     'timeout' => 0.0,

   ),

For more details, check out this documentation, Transactional File Locking.

Conclusion

Nextcloud can be configured to be a scalable and highly available file-hosting service to cater to your private file sharing demands. In this blog post, we showed how you can bring redundancy to the Nextcloud, file system, and database layers.

 


19 Aug 2019

Geo-Replication in Gluster



What is geo-replication?
It is as simple as having a copy of your data somewhere else on the globe!

Data from one location is asynchronously replicated to a secondary location so that the same data exists in both locations.

Thus, there is a backup at all times; even if the data in the primary location is completely destroyed, the same data can always be retrieved from the secondary location(s)!

Cool!

But how does this work?
Let me explain with respect to GlusterFS.

It all happens this way:
1) Let's say you have a volume (I will call it the primary volume).
2) Now you want to have it replicated.
3) So, you create a new volume in a different location (I will call it the secondary volume).
4) You want to replicate all the data from the primary volume, and also sync data to the secondary volume whenever new changes are made to the primary volume.
5) So, you create a gluster geo-replication session which takes care of all these replications for you :)

The primary volume is called the master, from which the unidirectional sync happens to the secondary volume (the slave).
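For completeness, a typical geo-replication session is created and started with the gluster CLI roughly as follows. This is only a sketch: mastervol, slavehost, and slavevol are placeholders, and it assumes the slave volume already exists and passwordless root SSH is set up from one master node to the slave host:

# gluster volume geo-replication mastervol slavehost::slavevol create push-pem

# gluster volume geo-replication mastervol slavehost::slavevol start

# gluster volume geo-replication mastervol slavehost::slavevol status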

How is the connection established between the master and the slave?
Click here to know...

How are the changes on the master detected?
To learn this, you should know what xtime and stime are.

stime is an extended attribute stored on each master brick root which stores the timestamp until which the changelogs have been completely synced to the slave.

Wondering what a changelog is? Read about it here.
 
xtime is an extended attribute stored on all file system objects, both files and directories. On each modification of a file or directory, an extended attribute with the modification time is stored on it.
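You can actually see these attributes on a brick with getfattr (run as root on a brick path, not on the mount point). Note that the exact attribute names embed volume UUIDs, so they will look different on every setup, and /path/to/brick/some_file below is just a placeholder for a file inside one of the master bricks:

# getfattr -d -m . -e hex /path/to/brick/some_file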

Gluster geo-replication detects the changes made on the master by crawling the master volume.
There are three types of crawls:

1) Hybrid crawl / Xsync crawl:
This only happens when there is already data in the master volume before geo-replication is set up.

If geo-rep is set up before data is created on the master, it never goes into hybrid crawl.

On each directory, it compares stime with xtime and only descends into it if stime < xtime (i.e. the last time the data got synced to the slave is earlier than the last time any modification was made on the master!)

2) History crawl:
Let's assume geo-rep is in a stopped state and there is one month of data pending to be synced to the slave. stime would have been marked up to the point until which geo-rep had synced. When geo-rep is started after a month, all the pending changelogs are processed; this phase is called the history crawl.

3) Changelog crawl:
When live (or close to live) changelogs are processed to sync the data to the slave, we call it the changelog crawl.


How does the data sync from master to slave?
First we see entry syncing, where a zero-byte file is created on the slave with the same gfid as on the master.

Here, a gfid is simply a unique identifier that identifies a file in a gluster volume, similar to an inode number in a file system.

And then we see data syncing. Data syncing can use one of two methods:
  • Rsync over ssh (the default method)
  • Tar over ssh

Hmm.. how are rsync and tar over ssh different?
The first sync with rsync will be slow (similar to tar over ssh) because all files are copied. Subsequently, when new changes are made on the master volume, only the changes are copied to the slave, making the replication faster and more efficient. Thus it is recommended to use rsync for large files.

Tar is an archive utility. A tarball is created for all the files, unlike rsync, which only considers deltas once the initial copy exists in the secondary location. Thus it is recommended to use tar for small files.

How frequently does the sync occur?
As I said, it is asynchronous. Although the replication process may occur in near-real-time, it is more common for replication to occur on a scheduled basis.

A few changelogs are batched together and processed:
  • First, the entry and metadata changes are synced serially from each changelog of the batch.
  • Data changes get added to the rsync/tarssh queue as soon as the entry and metadata are synced from the first changelog of the batch.
  • Once all the entry, metadata, and data are synced within the batch, the stime is updated on the brick root.
  • If something breaks in between, re-processing is involved.

Now that you have come this far, I will tell you about the geo-rep processes. These processes efficiently take up the above-mentioned tasks.

There are three processes, namely:
1) Monitor process
2) Worker process
3) Agent process

There is a file called gsyncd.py inside the glusterfs repository. It acts as a single entry point for geo-replication.

Based on the different arguments passed, it acts as the monitor, worker, or agent.

When we start the geo-rep session using the command:
# gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start

the gluster management daemon will start the monitor process.

This process monitors the worker and agent processes. If a worker/agent crashes, the monitor restarts it. (Note: if the agent crashes, the worker automatically crashes and vice versa.)

There will be one monitor process per geo-rep session on a node.

The agent process consumes changelogs generated by the changelog xlator. The worker process uses the parsed changelogs generated by the agent process to sync the data to the slave.

Each brick will have an agent and a worker process.
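You can see all three of these in action on a running session. The placeholders below are the same as before (mastervol, slavehost, slavevol); the status output also shows which crawl each worker is currently using:

# gluster volume geo-replication mastervol slavehost::slavevol status detail

# ps aux | grep gsyncd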

Let us try it out:
Let us create our own geo-rep session, following the instructions mentioned in this blog.

Let us see if the data is really replicated from master to slave :p

How do we do that?
Mount the master and slave volumes on mount points using this command:

# mount -t glusterfs <hostname>:<volume-name> /mount_point
e.g.: # mount -t glusterfs virtual_machine1:master /dir1/mount_point_master
      # mount -t glusterfs virtual_machine2:slave /dir2/mount_point_slave

Create files inside the master mount point:
# cd /dir1/mount_point_master
# touch file{1..10}

Now check inside the slave mount point:
# ls /dir2/mount_point_slave

See that file1 to file10 are already replicated :)

Also check the log files to know more details. All logs can be found at:
/var/log/glusterfs/

Geo-rep specific logs are present at:
/var/log/glusterfs/geo-replication/

But is there any mechanism for data retrieval?
Yes! If the master node gets destroyed or goes offline, we can promote our slave as the master! All data access can then be done from the promoted volume (the new master volume). We call this procedure failover.

See how...

When the original master is back online, we can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.

So this was it. Here ends my narration of the geo-replication story :) Do explore more!



      19 Jul 2019

      How do I manage my time at work


Often people may wonder how a technical people manager can handle the responsibilities of people management along with other technical responsibilities such as bug scrubbing, patch reviews, root causing critical issues, replying to community user queries and so on. My recipe for this is simple: respect your time and utilize it at its best.
When I start my day, the obvious thing I do is reply to the emails where things are waiting on me and then scan through the new entries in my inbox; on average this takes about 30 to 60 minutes.

The next 1 to 1.5 hours I dedicate to patch review and bug triaging. Being a Gluster co-maintainer and managing the GlusterD & CLI components, it is my responsibility to ensure the review backlog doesn’t grow and that all new patches which have passed regression are reviewed within 2 days of submission; of course this is my own SLA and other maintainers might follow a different one. The overall review time varies depending on the type and complexity of the patches, and it could take me 10 minutes or even an hour to complete. But my principle is not to overshoot it. If I need additional time, I will park the rest of the things and find time later in the day or, at worst, on the next day.

After the review backlog, my focus shifts to bug triaging of the components I maintain (both for Gluster & the downstream product RHGS). Now one might debate why you would triage bugs regularly, but I find this frequency to be the most effective one as (a) your time to respond to any issue stays quite low, (b) you never accumulate bugs sitting in the NEW state in Bugzilla which are never looked at, and (c) the plan to address any critical bugs can be made well ahead of releases.

Many of us think that triaging a bug means identifying the problem, root causing it and then deciding how to fix the problem and ship it for the customers/users. However, my understanding of bug triaging is that you need to assess the severity of the problem, see whether the explanation of the problem is valid, and based on that put up a plan for when you’d want to work on it and what (tentative) release this bug should be tagged to. Of course, sometimes you’d need to coordinate with your colleagues if a bug needs assistance from different component maintainers/owners. Also, the triage results should translate into a decision on the bug: should it be fixed or not? I see a lot of hesitance in choosing the latter, and that’s where your backlog grows! It’s absolutely fine to say “no, we can’t work on this at the moment given our roadmap and priority, and this bug isn’t critical enough to impact your production; if you have any other justification, please comment”.

So after 2 to 2.5 hours of work on emails, reviews, and bug triaging, I get into my regular meetings, which could be team-wide or one-on-one conversations with my directs. I hate continuous meetings, so I prefer to have a half-hour gap after every (long) meeting where I can start working on my other agenda (release work, coordination/status checks, providing assistance to team members in debugging problems, etc.). Of course, if ad hoc meetings come up in the middle due to unplanned urgent situations, you need to accommodate them in your schedule.

Along with the evening meetings (program calls and other items where we have colleagues from different time zones), I try to spend 30 to 60 minutes picking up a bug (if not two, at least one), working on the root cause and, if possible, sending a fix. And at the end of the day, another routine of scanning through the inbox, starring emails for later, and finishing any important pending people management items.

I must say that all of the above applies mostly from Monday to Thursday. I keep Friday a bit more flexible and lighter w.r.t. meetings and do something which I don’t do regularly; for example, I am writing this blog on a Friday itself!

So far, in the last 3 years of my career working as a people manager, following a time discipline and bucketing items into slots has helped me manage my time irrespective of a long commute (23 kilometers one way, and that too in Bangalore, not every day but twice or thrice a week), and I hope this small post might be helpful for others who are struggling with time management.