11 Dec 2017

Want to Install Ceph, but afraid of Ansible?



There is no doubt that Ansible is a pretty cool automation engine for provisioning and configuration management. ceph-ansible builds on this versatility to deliver what is probably the most flexible Ceph deployment tool out there. However, some of you may not want to get to grips with Ansible before you install Ceph...weird right?

No, not really.


If you're short on time, or just want a cluster to try Ceph for the first time, a more guided installation approach may help. So I started a project called ceph-ansible-copilot.

The idea is simple enough: wrap the ceph-ansible playbook with a text GUI. Very 1990s, I know, but now instead of copying and editing various files you simply start the copilot tool, enter the details, and click 'deploy'. The playbook runs in the background within the GUI and any errors are shown there and then...no more drowning in an ocean of scary Ansible output :)

The features and workflows of the UI are described in the project page's README file.

Enough rambling, let's look at how you can test this stuff out. The process is fairly straightforward:
  1. configure some hosts for Ceph
  2. create the Ansible environment
  3. run copilot
The process below describes each of these steps using CentOS7 as the deployment target for Ansible and the Ceph cluster nodes.
    1. Configure Some Hosts for Ceph
    Call me lazy, but I'm not going to tell you how to build VMs or physical servers. To follow along, the bare minimum you need is a few virtual machines - as long as they have some disks on them for Ceph, you're all set!

    2. Create the Ansible environment
    Typically for a Ceph cluster you'll want to designate a host as the deployment or admin host. The admin host is just a deployment manager, so it can be a virtual machine, a container or even a real (gasp!) server. All that really matters is that your admin host has network connectivity to the hosts you'll be deploying ceph to.

    On the admin host, perform these tasks (copilot needs Ansible 2.4 or above):
    > yum install git ansible python-urwid -y
    Install ceph-ansible (full installation steps can be found here)
    > cd /usr/share
    > git clone https://github.com/ceph/ceph-ansible.git
    > cd ceph-ansible
    > git checkout master
    Set up passwordless SSH between the admin host and the candidate Ceph hosts
    > ssh-keygen
    > ssh-copy-id root@<ceph_node>
    On the admin host install copilot
    > cd ~
    > git clone https://github.com/pcuzner/ceph-ansible-copilot.git
    > cd ceph-ansible-copilot
    > python setup.py install 
    3. Run copilot
    The main playbook for ceph-ansible is in /usr/share/ceph-ansible - this is where you need to run copilot from (it will complain if you try to run it in some other place!)
    > cd /usr/share/ceph-ansible
    > copilot
    Then follow the UI..

    Example Run
    Here's a screen capture showing the whole process, so you can see what you get before you hit the command line.



    The video shows the deployment of a small 3-node Ceph cluster: 6 OSDs, a radosgw (for S3), and an MDS for CephFS testing. It covers the configuration of the admin host, the copilot UI and finally a quick look at the resulting Ceph cluster. The video is 9 minutes long, but for those of us with short attention spans, here's the timeline so you can jump to the areas that interest you.

    00:00 Pre-requisite rpm installs on the admin host
    01:12 Installing ceph-ansible from github
    01:52 Installing copilot
    02:58 Setting up passwordless ssh from the admin host to the candidate ceph hosts
    04:04 Ceph hosts before deployment
    05:04 Starting copilot
    08:10 Copilot complete, review the Ceph hosts



    What's next?
    More testing...on more and varied hardware...

    So far I've only tested 'simple' deployments using the packages from ceph.com (community deployments) against a CentOS target. So like I said, more testing is needed, a lot more...but for now there's enough of the core code there for me to claim a victory and write a blog post!

    Aside from the testing, these are the kinds of things that I'd like to see copilot handle
    • collocation rules (which daemons can safely run together)
    • resource warnings (if you have 10 HDDs but not enough RAM or CPU...issue a warning)
    • handle the passwordless SSH setup. copilot already checks for passwordless SSH, so instead of leaving it to the admin to resolve any issues, just add another page to the UI.
    That's my wishlist - what would you like copilot to do? Leave a comment, or drop by the project on github.

    Demo'd Versions
    • copilot 0.9.1
    • ceph-ansible MASTER as at December 11th 2017
    • ansible 2.4.1 on CentOS




    4 Dec 2017

    Static Analysis for Gluster

    Static analysis programs are quite useful, but also prone to false positives. It’s really hard to keep track of static analysis failures on a fairly large project. We’ve looked at several approaches in the past. The one that we used to do was to publish a report every day which people could look at if they wished. This guaranteed that nobody looked at it. Despite knowing where to look for it, even I barely looked at it.

    The second approach was to run the analysis twice: once without your patch and once with your patch applied. If the count goes up with your patch, the test fails. The problem with this is that it doesn't account for false positives. An argument could be made that you could go fix another static analysis failure in your patch. But that means your patch now does two things, which isn't fun when you want to do a backport, for instance, or even for history purposes. That's landing two unrelated changes in one patch.
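
    A minimal sketch (not the actual Gluster CI job) of that gating approach could look like this, comparing warning counts from an analysis run without and with the patch:

    base=$(grep -c 'warning:' analysis-before.log)    # run on the tree without the patch
    patched=$(grep -c 'warning:' analysis-after.log)  # run with the patch applied
    if [ "$patched" -gt "$base" ]; then
        echo "static analysis warnings went up: $base -> $patched"
        exit 1
    fi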

    The approach that we’ve now gone with is to have them run on a nightly basis with Jenkins. Deepshika did almost all the work for this and wrote about it on her blog. It has more details on the actual implementation. This puts all the results in one place for everyone to take a look at. Jenkins also gives us a visual view of what changed over the course of time, which wasn’t as easy in the past.

    She’s working on further improving the visual look by uniting all the jobs that are tied to static analysis. That way, we’ll have a nightly pipeline run for each branch that will put all the tests we care about for a particular branch in one place.

    1 Dec 2017

    Gluster Summit 2017

    Right after Open Source Europe, we had Gluster Summit. It was a 2-day event with talks and BoFs. I had two key things to do at the Gluster Summit. One was to build out the minnowboard setup to demo Tendrl. This didn't work out. I had also volunteered to help with the video work. According to my plans, the minnowboard setup would take about 1h and then I'd be free to help with camera work. I had a talk scheduled for the second day of the event. I'd have expected one of these to go wrong. I didn't expect all of them to go wrong :)

    The venue had a balcony, which made for great photos

    On the first day, Amar and I arrived early and did the camera setup. The venue staff were helpful. They gave us a line out from their audio setup for the camera. Our original plan was that speakers would have a lapel mic for the camera. That was prone to errors from speakers and also would have needed us to check batteries every few hours. When we first tried to work with the line in, we had interference. The camera power supply wasn't grounded (there wasn't even a ground output). The venue staff switched out the boxes they used for line out and it worked like a charm after that.

    We did not have a good start for the demo. Jim had pre-setup the networking on the boards from home and brought them to Prague. But whatever we did, we couldn't connect to its network the night before the event. That was the day we had kept free to do this. That night we gave up, because we needed a monitor, an HDMI cable, and a keyboard to debug it. At the venue, we borrowed a keyboard and hooked up the board to the monitor. There was no user for dnsmasq, so it wasn't assigning out IPs and that's why the networking didn't work. Once we got past that point, it was about getting the network to work with my laptop. That took a while. We decided to go with a server in the cloud as the Tendrl server. By evening, we got the playbook to run and got everything installed and configured. But I'd made a mistake: I used IPs instead of FQDNs, so the dashboard wouldn't work. This meant re-installing the whole setup. That's the point where I gave up on it.

    We even took the group picture from the balcony

    My original content for my talk was to look at our releases, especially to list out what we committed to at the start of a release and what we finished with. There is definitely a gap. This is common for software projects and how people estimate work. This topic was more or less covered on the first day. I instead focused on how we fail: how we fail our users, developers, and community. I followed the theme of my original talk a bit, pointing out that we can tackle large problems in smaller chunks.

    We’re running a marathon, not a sprint.

    16 Nov 2017

    Upgrading the Gluster Jenkins Server

    I've been wanting to work on upgrading the build.gluster.org setup for ages. There's a lot about that setup that isn't ideal for how people use Jenkins anymore.

    We used the Unix user accounts for access to Jenkins. This means Jenkins needs to read /etc/passwd and everyone has SSH access via passwords by default. Very often, the username wasn't tied to an actual email address. I had to guess the account owner based on their usernames elsewhere. This was also open to brute force attacks. The only way to change passwords was to log in to the server and run the passwd command. We fixed this problem a few months ago by switching our auth to GitHub. Now access control is a GitHub group which gives you more permissions. Logging in will not give you any more permissions than not logging in.

    Our todo list during the Jenkins upgrade

    The Jenkins community now recommends not running jobs on the master node at all. But our old setup depended on certain jobs always running on master. One by one, I've eliminated them so that they can now run on any node agent. The last job left is our release job. We make the tarball from every release available on an FTP-like server. In our old setup, this server and Jenkins were the same machine. The job ran on master and depended on them both being the same machine. We decided to split up the systems so we could take down Jenkins without any issue. We intend to fix this with an SCP command at the end of the release job to copy artifacts to the FTP-like server.
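
    A minimal sketch of what that final step could look like (the hostname and paths are purely illustrative, not the real Gluster infrastructure):

    $ scp glusterfs-*.tar.gz releng@download.example.org:/srv/releases/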

    One of the Red Hat buildings in Brno

    Now, we have a Jenkins setup that I’m happy with. At this point, we’ve fixed a vast majority of the annoying CI-related infra issues. In a few years, we’ll rip them all out and re-do them. For now, spending a week with my colleague in Brno working on an Infra sprint has been well worth our time and energy.

    5 Nov 2017

    Catching up with Infrastructure Debt

    If you run an infrastructure, there's a good chance you have some debt tucked into your system somewhere. There's also a good chance that you're not getting enough time to fix that debt. There will most likely be a good reason why something is done the way it is. This is just how things are in general. Since I joined Gluster, I've worked with my fellow sysadmin to tackle our large infrastructure technical debt over the course of time. It goes like this:

    • We run a pretty old version of Gerrit on CentOS 5.
    • We run a pretty old version of Jenkins on CentOS 6.
    • We run CentOS 6 for all our regressions machines.
    • We run CentOS 6 for all our build machines.
    • We run NetBSD on Rackspace in a setup that is not easy to automate nor is it currently part of our automation.
    • We have a bunch of physical machines in a DC, but we haven’t had time to move our VMs over and use Rackspace as burstable capacity.

    That is in no way an exhaustive list. But we've managed to tackle 2.5 items from the list. Here's what we did, in order:

    • Upgraded Gerrit to the then latest version.
    • Setup Gerrit staging to test newer versions regularly for scheduling migration.
    • Created new CentOS 7 VMs on our hardware and moved the builds in there.
    • Moved Gerrit over to a new CentOS 7 host.
    • Wrote ansible scripts to manage most of Gerrit, but deployed currently only to staging.
    • Upgraded Jenkins to the latest LTS.
    • Moved Jenkins to a CentOS 7 host (Done last week, more details coming up!)

    If I look at it, it almost looks like I've failed. But again, like dealing with most infrastructure debt, you touch one thing and you realize it's broken in some way and someone depended on that breakage. What I've done is pick and prioritize what things I would spend my time on. At the end of the day, I have to justify my time in terms of moving the project forward. Fixing the infrastructure debt for Gerrit was a great example. I could actually focus on it with everyone's support. Fixing Jenkins was a priority since we wanted to use some of the newer features; again, I had backing to do that. Moving things to our own hardware is where things get tricky. There are some financial goals we can hit if we make the move, but outside of that, we have no reason to move. But long-term, we want to be mostly on our own hardware, since we spent money on it. This is, understandably, going slowly. There's a subtle capacity difference, and the noisy neighbor problem affects us quite strongly when we try to do anything in this regard.

    4 Oct 2017

    Containers aren’t just for applications

    Containers have grabbed so much attention because they demonstrated a way to solve the software packaging problem that the IT industry had been poking and prodding at for a very long time. Linux package management, application virtualization (in all its myriad forms), and virtual machines had all taken cuts at making it easier to bundle and install software along with its dependencies. But it was the container image format and runtime that is now standardized under the Open Container Initiative (OCI) that made real headway toward making applications portable across different systems and environments.

    Containers have also both benefited from and helped reinforce the shift toward cloud-native application patterns such as microservices. However, because the most purist approaches to cloud-native architectures de-emphasized stateful applications, the benefits of containers for storage portability haven’t received as much attention. That’s an oversight. Because it turns out that the ability to persist storage and make it portable matters, especially in the hybrid cloud environments spanning public clouds, private clouds, and traditional IT that are increasingly the norm.

    Data gravity

    One important reason that data portability matters is “data gravity,” a term coined by Dave McCrory. He’s since fleshed it out in more detail, but the basic concept is pretty simple. Because of network bandwidth limits, latency, costs, and other considerations, data “wants” to be near the applications analyzing, transforming, or otherwise working on it. This is a familiar idea in computer science. Non-Uniform Memory Access (NUMA) architectures — which describe pretty much all computer systems today to a greater or lesser degree — have similarly had to manage the physical locality of memory relative to the processors accessing that memory.

    Likewise, especially for applications that need fast access to data or that need to operate on large data sets, you need to think about where the data is sitting relative to the application using that data. And, if you decide to move an application from on-premise to a public cloud for rapid scalability or other reasons, you may find you need to move the data as well.

    Software-defined storage

    But moving data runs into some roadblocks. Networking limits and costs were and are one limitation; they’re a big part of data gravity in the first place. However, traditional proprietary data storage imposed its own restrictions. You can’t just fire up a storage array at an arbitrary public cloud provider to match the one in your own datacenter.

    Enter software-defined storage.

    As the name implies, software-defined storage decouples storage software from hardware. It lets you abstract and pool storage capacity across on-premise and cloud environments to scale independently of specific hardware components. Fundamentally, traditional storage was built for applications developed in the 1970s and 1980s. Software-defined storage is geared to support the applications of today and tomorrow, applications that look and behave nothing like the applications of the past. Among their requirements is rapid scalability, especially for high-volume unstructured data that may need to expand rapidly.

    However, with respect to data portability specifically, one of the biggest benefits of software-defined storage like Gluster is that the storage software itself runs on generic industry standard hardware and virtualized infrastructure. This means that you can spin up storage wherever it makes the most sense for reasons of cost, performance, or flexibility.

    Containerizing the storage

    What remains is to simplify the deployment of persistent software-defined storage. It turns out that containers are the answer to this as well. In effect, storage can be treated just like a containerized application within a Kubernetes cluster — Kubernetes being the orchestration tool that groups containerized application components into a complete application.
    With this approach, storage containers are deployed alongside other containers within the Kubernetes nodes. Rather than simply accessing ephemeral storage from within the container, this model deploys storage in its own containers, alongside the containerized application. For example, storage containers can implement a Red Hat Gluster Storage brick to create a highly-available GlusterFS volume that handles the storage resources present on each server node.

    Depending on system configuration, some nodes might only run storage containers, some might only run containerized applications, and some nodes might run a mixture of both. Using Kubernetes with its support for persistent storage as the overall coordination tool, additional storage containers could be easily started to accommodate storage demand, or to recover from a failed node. For instance, Kubernetes might start additional containerized web servers in response to demand or load, but might restart both application and storage containers in the event of a hardware failure.

    Kubernetes manages this through Persistent Volumes (PV). PV is a resource in the cluster just like a node is a cluster resource. PVs are a plugin related to the Kubernetes Volumes abstraction, but have a lifecycle independent of any individual pod that uses the PV. This allows for clustered storage that doesn’t depend on the availability or health of any specific application container.
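
    As a small illustrative sketch (object name and size are made up, not taken from this article), an application requests such storage with a PersistentVolumeClaim; the claim has its own lifecycle and gets bound to a PV that outlives any individual container:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data
    spec:
      accessModes:
        - ReadWriteMany        # shared access, as a GlusterFS-backed PV allows
      resources:
        requests:
          storage: 10Gi

    Created with kubectl create -f, such a claim stays bound to its volume across container restarts and rescheduling.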

    Modular apps plus data

    The emerging model for cloud-native application designs is one in which components communicate through well-documented interfaces. Whether or not a given project adopts a pure “microservices” approach, applications are generally becoming more modular and service-oriented. Dependencies are explicitly declared and isolated. Scaling is horizontal.

    27 Sep 2017

    Scale your Gluster Cluster, 1 node at a time!!

    One of the growing pains of any cluster is scaling it out, and any tool / company which made the scaling part easy is doing pretty well for itself. One of the reasons the open source, software-defined storage community likes Gluster is that it can provide scale-out options for storage and, to a large extent, handle things smoothly!

    But regardless of all the improvements over the years, when we (I represent the Gluster developer community) have to migrate data as part of scale-out operations, it is never completely painless! User expectations and reality always end up having some mismatches.

    In this blog, I will try to explain a few steps an admin can take to make life easy for themselves when scaling the cluster is involved! From here on, this blog gets a little more technical about Gluster's design and a few of the recent features which would help people planning to scale their storage cluster, 1 (or N) node at a time.

    Step 1: Create volume with more bricks than the number of hosts.

    A general assumption about a Gluster volume is that it exports just 1 brick from each peer involved. Also, in most cases, we the developers recommend a homogeneous setup (i.e., machines with the same size and same config) across all the involved nodes for better performance and 'support-ability'.

    Starting with 3.12 we have the brick multiplexing feature and the statfs enhancement (which divides the reported capacity by the number of bricks on a backend mount), which together make it practical to create more bricks on a node. If an admin wants high availability, the minimum number of nodes required is 3. It is also recommended to enable 'brick-multiplexing' on the volume when using this approach!

    n1$ gluster peer probe n2
    n1$ gluster peer probe n3
    n1$ gluster volume create demo replica 3 n1:/br/b1 n2:/br/b1 n3:/br/b1 n1:/br/b2 n2:/br/b2 n3:/br/b2 n1:/br/b3 n2:/br/b3 n3:/br/b3 n1:/br/b4 n2:/br/b4 n3:/br/b4 n1:/br/b5 n2:/br/b5 n3:/br/b5 n1:/br/b6 n2:/br/b6 n3:/br/b6 n1:/br/b7 n2:/br/b7 n3:/br/b7 n1:/br/b8 n2:/br/b8 n3:/br/b8

    Notice that there are a total of 24 bricks from just 3 nodes.
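
    If you want the brick multiplexing recommended above, it is a cluster-wide setting; as an illustration (please verify the option name against your release):

    n1$ gluster volume set all cluster.brick-multiplexing on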

    Step 2: Start the volume, and consume the storage

    Nothing special here. Use the volume as you would any Gluster volume, no special treatment!
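
    For example, a client could mount it with the native FUSE client (the hostname and mount point here are just illustrative):

    client$ mount -t glusterfs n1:/demo /mnt/demo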

    Step 3: Add new node to cluster, and expand your storage!

    Now is the fun part! All the trouble you took to create the volume now pays you back!

    When you add the new node, all you have to do is run a number of replace-brick commands (or add-brick + remove-brick in the case of a plain distribute volume).

    n1$ gluster peer probe n4
    n1$ gluster volume replace-brick demo n1:/br/b2 n4:/br/b2 commit force
    n1$ gluster volume replace-brick demo n1:/br/b6 n4:/br/b6 commit force
    n1$ gluster volume replace-brick demo n2:/br/b3 n4:/br/b3 commit force
    n1$ gluster volume replace-brick demo n2:/br/b7 n4:/br/b7 commit force
    n1$ gluster volume replace-brick demo n3:/br/b4 n4:/br/b4 commit force
    n1$ gluster volume replace-brick demo n3:/br/b8 n4:/br/b8 commit force

    Notice that the above commands make sure that bricks from different replica pairs move to the new node. So, technically, each node goes from 8 bricks down to 6.

    Step 4: Happy Scaling!

    With the above steps, you will notice a significant difference in the way Gluster handles scale-out. This doesn't need any further rebalance operations, which solves the issue of more data than necessary being migrated during a Gluster scale-out operation!

    All the above steps will lead to migration of data with ‘self-heal’ instead of rebalance. Make sure that you complete the ‘healing’ process before adding any further nodes.
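
    You can watch the pending heal counts drain to zero with, for example:

    n1$ gluster volume heal demo info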

    Step 5: Grow!

    As and when you need to grow your storage, repeat the steps 3 and 4.

    Let's take the example of adding one more machine to this volume after some time.

    n1$ gluster peer probe n5
    n1$ gluster volume replace-brick demo n1:/br/b3 n5:/br/b3 commit force
    n1$ gluster volume replace-brick demo n2:/br/b4 n5:/br/b4 commit force
    n1$ gluster volume replace-brick demo n3:/br/b5 n5:/br/b5 commit force
    n1$ gluster volume replace-brick demo n4:/br/b6 n5:/br/b6 commit force

    Notice that at this point the bricks are balanced as 5,5,5,5,4 on each node respectively. Also, this can be performed only after all the pending self-heal counts from the previous set of operations in Step 3 are 0.

    NOTE:

    Limitations and Concerns in this approach:

    • Snapshots wouldn't work seamlessly, as there are brick movements here.
    • Quota: Technically, it should work fine, but not validated with tests.
    • Prone to manual errors while issuing the 'replace-brick' command
    • Choose the proper bricks to migrate, or else we would lose the good copy of the data.
    • With this approach, you can’t scale to any number of machines unless you run rebalance operation like earlier.
    • Notice that the ‘inode’ usage on the brick mount point will be very high in this model. Decide to use this model after you are clear about the inode usage.

    8 Sep 2017

    Sneak peek into Gluster's native subdir mount feature

    Gluster's recently announced glusterfs-3.12 release brings a sub-directory mount option to the native FUSE mount. In this post, I would like to give snippets of how the functionality works!

    Commit message

    Below is the snippet from commit message:

    glusterfsd: allow subdir mount
    Changes:
    1. Take subdir mount option in client (mount.gluster / glusterfsd)
    2. Pass the subdir mount to server-handshake (from client-handshake)
    3. Handle subdir-mount dir’s lookup in server-first-lookup and handle all fops resolution accordingly with proper gfid of subdir
    4. Change the auth/addr module to handle the multiple subdir entries in option, and valid parsing.
    How to use the feature:
    # mount -t glusterfs $hostname:/$volname/$subdir /$mount_point 
    Or
    # mount -t glusterfs $hostname:/$volname -osubdir_mount=$subdir /$mount_point
    Options can be set like:
    # gluster volume set <volname> auth.allow "/(192.168.10.*|192.168.11.*),/subdir1(192.168.1.*),/subdir2(192.168.8.*)"

    I am a sys-admin, why do I need this feature?

    This feature provides namespace isolation for separate clients: a single Gluster volume can be shared with many different clients, each mounting only a subset of the volume namespace.

    This can also be seen as the equivalent of NFS's subdirectory export feature, where one can export a subdirectory of an already exported volume. If you have a use case where you need to restrict full access to the volume (or to other users' data), this feature can be used.

    Almost all of Gluster's features work with a subdir mount. Snapshots work at the volume level, and hence we can't take a snapshot of just a single directory. Other than this, one can continue to use the features as usual.

    More things to know before using the feature

    For any user who is starting fresh with glusterfs-3.12 or later, this feature is available by default, and with the volume's default authentication being "*" (i.e., allow everyone), any given subdirectory in the volume can be mounted. If the admin sets the auth.allow option to control access, then only the directories present in the auth.allow string will be allowed to be mounted.

    If you have already set the auth.allow option, make sure to change its format to match the one described in the snippet above.
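
    Putting the two snippets together, a purely hypothetical flow (the volume name, directory, and addresses are made up) could look like this. On a server node, allow one client subnet to mount only the engineering sub-directory:

    # gluster volume set shared auth.allow "/(192.168.10.*),/engineering(192.168.1.*)"

    Then, on the client, mount just that sub-directory:

    # mount -t glusterfs server1:/shared/engineering /mnt/engineering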

    Try out the option, and write to us at gluster-users@gluster.org

    6 Sep 2017

    Where to buy bitcoin in Australia

    This past week the Bitcoin drop hit the headlines quite a few times. China's ICO regulation announcement caused quite the stir, opening the opportunity for many spectators to jump in.

    However, purchasing bitcoin in Australia is not as simple as that. This past week I went through three of the most popular options, jumped through the hoops of identity verification, and played the waiting game hoping the prices wouldn't shoot up again while my bank deposit took place.

    Independent Reserve

    Independent Reserve is a popular exchange. They allow trading in multiple AUD pairings as well as USD and NZD.

    The best thing about Independent Reserve is their verification and purchase time: I was able to verify and place an instant order with Poli in less than 1 hour. Unlike Cointree, Poli payments here allow you to withdraw the Bitcoin you purchase immediately.

    This was my biggest savior as it allowed me to withdraw my funds straight away into exchanges like Bittrex or Binance and get back to trading.

    Their support is definitely amazing; I was able to increase my account limits in less than 1 hour.

    If you want to buy in NOW then these guys are the best choice considering the speed of ID verification and payment. Kudos to them.

    If you decide to use Independent Reserve, please sign up using my referral link https://www.independentreserve.com?invite=WJPMJN

    BTCMarkets

    BTCMarkets is an exchange where you place buy and sell orders for the amount of bitcoin you want to buy. They currently have a number of pairings like Bitcoin, Ethereum, Litecoin and Monero.

    They charge a 0.85% sliding commission which means the more you trade the lower the commission. Depositing funds is done through Poli and Bpay.

    BTCMarkets seems to be the most popular with multiple trades happening every minute so you are not left there waiting for your buy order to be fulfilled (assuming you place a reasonable request).

    I saw a large volume of orders ranging from $200 all the way up to $100k+ so it's definitely not a small time exchange.

    Their support is, however, an absolute letdown. Prepare to wait days for an unhelpful response. My Poli deposit has taken almost a whole week and is still pending. There are lots of complaints on their Facebook page about slow payments too.

    If you wish to get in quickly, BTCMarkets is not the option, as their verification process can take up to 10 days since they opt to send you a letter in the mail for ID verification. Once you're in though, the trade volume definitely looks good.

    Cointree

    Unlike the other two options, Cointree only allows you to buy at the "market" price. You must put faith in their business model where they promise to find you the best possible price.

    They charge a 3% commission which can be quite high if you plan on purchasing a large amount.

    Cointree's payment model is not very comforting as you will be required to place an order without knowing exactly how much you are getting in return.

    Cointree has three payment methods:

    • Bank transfer payments (with a limit of $500?!) can take up to 2-3 days, so you must wait and pray that the price does not fluctuate too much.
    • Their Poli payment option is "instant" so you are able to purchase at their current rate, however your Bitcoin is locked in their account for 1-2 days until the funds clear.
    • Their over the counter deposit at NAB bank allows you to deposit up to $5,000 and they claim it takes about 30 minutes for the confirmation.

    If you simply want to buy Bitcoin without worrying about the price, then Cointree will be the best option for you.

    If you understand the bitcoin market then their business model may not be the best fit for you. For example, I put through 1 bank transfer order during the dip; however, by the time my funds cleared I ended up purchasing at a higher rate. My Poli transfer was "instant" but locked in the account for 2 days, not allowing me to withdraw the funds and trade on the exchanges. (This was depressing.) This is unlike Independent Reserve, who let you buy and withdraw instantly.

    Cointree support seems to be very responsive, with live chat always online; however, I found them to be rather rude and unhelpful.

    If you do decide to end up using Cointree please use my referral link https://www.cointree.com.au/?r=3300


    In summary, if you want the best price and just want to rely on someone, use the Cointree Poli instant payment. That way you can lock in a price which is generally slightly lower than other vendors'. Don't bother with their bank transfer or you will take the risk of a price hike like I did.

    Use Independent Reserve if you want quick bitcoin to take to the exchanges.

    As always, DYOR and good luck!

    21 Aug 2017

    Clang Analyze for Gluster

    Deepshika recently worked on getting a clang analyze job for Gluster set up with Jenkins. This job worked on both our laptops, but not on our build machines that run CentOS. It appears that the problem was that clang on CentOS is 3.4 vs 4.0 on Fedora 26. It fails because one of our dependencies needs -fno-stack-protector, which wasn't in clang until 3.8 or so. It's been on my list of things to fix. I realized that the right way would be to get a newer version of clang. I could have just compiled clang or built 4.0 packages, but I didn't want to end up having to maintain the package for our specific install. I decided to reduce complexity by doing the compilation inside a Fedora 26 chroot. This sounded like the option least likely to add maintenance burden. When I looked for documentation on how to go about this, I couldn't find much. The mock man page, however, is very well written and that's all I needed. This is the script I used, with comments about each step.

    #!/bin/bash
        # Create a new chroot
        sudo mock -r fedora-26-x86_64 --init
    
        # Install the build dependencies
        sudo mock -r fedora-26-x86_64 --install langpacks-en glibc-langpack-en automake autoconf libtool flex bison openssl-devel libxml2-devel python-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel fuse-devel redhat-rpm-config clang clang-analyzer git
    
        # Copy the Gluster source code inside the chroot at /src
        sudo mock -r fedora-26-x86_64 --copyin $WORKSPACE /src
    
        # Execute commands in the chroot to build with clang
        sudo mock -r fedora-26-x86_64 --chroot "cd /src && ./autogen.sh"
        sudo mock -r fedora-26-x86_64 --chroot "cd /src && ./configure CC=clang --enable-gnfs --enable-debug"
        sudo mock -r fedora-26-x86_64 --chroot "cd /src && scan-build -o /src/clangScanBuildReports -v -v --use-cc clang --use-analyzer=/usr/bin/clang make"
    
        # Copy the output back into the working directory
        sudo mock -r fedora-26-x86_64 --copyout /src/clangScanBuildReports $WORKSPACE/clangScanBuildReports
    
        # Clean up the chroot
        sudo mock -r fedora-26-x86_64 --clean

    16 Aug 2017

    GlusterFS 3.8.15 is available, likely the last 3.8 update

    The next Long-Term-Maintenance release for Gluster is around the corner. Once GlusterFS-3.12 is available, the oldest maintained version (3.8) will be retired and no maintenance updates are planned. With this last update to GlusterFS-3.8 a few more bugs have been fixed.

    Packages for this release will become available for the different distributions and their versions listed on the community packages page.

    Release notes for Gluster 3.8.15

    This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10, 3.8.11, 3.8.12, 3.8.13 and 3.8.14 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

    End Of Life Notice

    This is most likely the last bugfix release for the GlusterFS 3.8 Long-Term-Support version. GlusterFS 3.12 is planned to be released at the end of August 2017 and will be the next Long-Term-Support version. It is highly recommended to upgrade any Gluster 3.8 environment to either the 3.10 or 3.12 release. More details about the different Long-Term-Support versions can be found on the release schedule.

    Bugs addressed

    A total of 4 patches have been merged, addressing 4 bugs:
    • #1470495: gluster volume status --xml fails when there are 100 volumes
    • #1471613: metadata heal not happening despite having an active sink
    • #1480193: Running sysbench on vm disk from plain distribute gluster volume causes disk corruption
    • #1481398: libgfapi: memory leak in glfs_h_acl_get

    29 Jul 2017

    Hyper-converged GlusterFS + heketi on Kubernetes

    gluster-kubernetes is a project to provide Kubernetes administrators a mechanism to easily deploy a hyper-converged GlusterFS cluster along with heketi onto an existing Kubernetes cluster. This is a convenient way to unlock the power of dynamically provisioned, persistent GlusterFS volumes in Kubernetes.

    Link: https://github.com/gluster/gluster-kubernetes

    Component Projects

    • Kubernetes, the container management system.
    • GlusterFS, the scale-out storage system.
    • heketi, the RESTful volume management interface for GlusterFS.

    Presentations

    You can find slides and videos of community presentations here.

    >>> Video demo of the technology! <<<

    Documentation

    Quickstart

    You can start with your own Kubernetes installation ready to go, or you can use the vagrant setup in the vagrant/ directory to spin up a Kubernetes VM cluster for you. To run the vagrant setup, you'll need to have the following installed:

    • ansible
    • vagrant
    • libvirt or VirtualBox

    To spin up the cluster, simply run ./up.sh in the vagrant/ directory.

    Next, copy the deploy/ directory to the master node of the cluster.

    You will have to provide your own topology file. A sample topology file is included in the deploy/ directory (default location that gk-deploy expects) which can be used as the topology for the vagrant libvirt setup. When creating your own topology file:

    • Make sure the topology file only lists block devices intended for heketi’s use. heketi needs access to whole block devices (e.g. /dev/sdb, /dev/vdb) which it will partition and format.
    • The hostnames array is a bit misleading. manage should be a list of hostnames for the node, but storage should be a list of IP addresses on the node for backend storage communications (a minimal example follows after this list).
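
    A minimal, hypothetical fragment of such a topology file (the hostname, IP address, and device are made up; the real sample in deploy/ lists more nodes and devices) showing the two lists:

    {
      "clusters": [
        {
          "nodes": [
            {
              "node": {
                "hostnames": {
                  "manage": ["node0.example.com"],
                  "storage": ["192.168.10.100"]
                },
                "zone": 1
              },
              "devices": ["/dev/vdb"]
            }
          ]
        }
      ]
    }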

    If you used the provided vagrant libvirt setup, you can run:

    $ vagrant ssh-config > ssh-config
    $ scp -rF ssh-config ../deploy master:
    $ vagrant ssh master
    [vagrant@master]$ cd deploy
    [vagrant@master]$ mv topology.json.sample topology.json

    The following commands are meant to be run with administrative privileges (e.g. sudo su beforehand).

    At this point, verify the Kubernetes installation by making sure all nodes are Ready:

    $ kubectl get nodes
    NAME STATUS AGE
    master Ready 22h
    node0 Ready 22h
    node1 Ready 22h
    node2 Ready 22h

    NOTE: To see the version of Kubernetes (which will change based on latest official releases) simply do kubectl version. This will help in troubleshooting.

    Next, to deploy heketi and GlusterFS, run the following:

    $ ./gk-deploy -g

    If you already have a pre-existing GlusterFS cluster, you do not need the -g option.

    After this completes, GlusterFS and heketi should now be installed and ready to go. You can set the HEKETI_CLI_SERVER environment variable as follows so that it can be read directly by heketi-cli or sent to something like curl:

    $ export HEKETI_CLI_SERVER=$(kubectl get svc/heketi --template 'http://{{.spec.clusterIP}}:{{(index .spec.ports 0).port}}')
    $ echo $HEKETI_CLI_SERVER
    http://10.42.0.0:8080
    $ curl $HEKETI_CLI_SERVER/hello
    Hello from Heketi

    Your Kubernetes cluster should look something like this:

    $ kubectl get nodes,pods
    NAME STATUS AGE
    master Ready 22h
    node0 Ready 22h
    node1 Ready 22h
    node2 Ready 22h
    NAME READY STATUS RESTARTS AGE
    glusterfs-node0-2509304327-vpce1 1/1 Running 0 1d
    glusterfs-node1-3290690057-hhq92 1/1 Running 0 1d
    glusterfs-node2-4072075787-okzjv 1/1 Running 0 1d
    heketi-3017632314-yyngh 1/1 Running 0 1d

    You should now also be able to use heketi-cli or any other client of the heketi REST API (like the GlusterFS volume plugin) to create/manage volumes and then mount those volumes to verify they're working. To see an example of how to use this with a Kubernetes application, see the following:

    Hello World application using GlusterFS Dynamic Provisioning

    14 Jul 2017

    GlusterFS 3.8.14 is here, 3.8 even closer to End-Of-Life

    The 10th of the month has passed again, that means a 3.8.x update can't be far out. So, here it is, we're announcing the availability of glusterfs-3.8.14. Note that this is one of the last updates in the 3.8 Long-Term-Maintenance release stream. This schedule on the website shows what options you have for upgrading your environment. Remember that many distributions have packages included in their standard repositories, and other versions might be available from external locations. All the details about what packages to find where are on the Community Package page in the docs.

    Release notes for Gluster 3.8.14

    This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10, 3.8.11, 3.8.12 and 3.8.13 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

    Bugs addressed

    A total of 3 patches have been merged, addressing 2 bugs:
    • #1462447: brick maintenance - no client reconnect
    • #1467272: Heal info shows incorrect status

    28 Jun 2017

    GlusterFS 3.8.13 update available, and 3.8 nearing End-Of-Life

    The Gluster releases follow a 3-month cycle, with alternating Short-Term-Maintenance and Long-Term-Maintenance versions. GlusterFS 3.8 is currently the oldest Long-Term-Maintenance release, and will become End-Of-Life with the GlusterFS 3.12 version. If all goes according to plan, 3.12 will get released in August and is the last 3.x version before Gluster 4.0 hits the disks.

    There will be a few more releases in the GlusterFS 3.8 line, but users should start to plan an upgrade to a version that receives regular bugfix updates after August.

    Release notes for Gluster 3.8.13

    This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10, 3.8.11 and 3.8.12 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

    Bugs addressed

    A total of 13 patches have been merged, addressing 8 bugs:
    • #1447523: Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
    • #1449782: quota: limit-usage command failed with error " Failed to start aux mount"
    • #1449941: When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st_size value.
    • #1450055: [GANESHA] Adding a node to existing cluster failed to start pacemaker service on new node
    • #1450380: GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
    • #1450937: [New] - Replacing an arbiter brick while I/O happens causes vm pause
    • #1460650: posix-acl: Whitelist virtual ACL xattrs
    • #1460661: "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf

    22 May 2017

    Enjoy more bugfixes with GlusterFS 3.8.12

    Like every month, there is an update for the GlusterFS 3.8 stable version. A few more bugfixes have been included in this release. Packages are already available for many distributions, some distributions might still need to promote the update from their testing repository to release, so hold tight if there is no update for your favourite OS yet.

    Release notes for Gluster 3.8.12

    This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9, 3.8.10 and 3.8.11 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

    Bugs addressed

    A total of 13 patches have been merged, addressing 11 bugs:
    • #1440228: NFS Sub-directory mount not working on solaris10 client
    • #1440635: Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance
    • #1440810: Update rfc.sh to check Change-Id consistency for backports
    • #1441574: [geo-rep]: rsync should not try to sync internal xattrs
    • #1441930: [geo-rep]: Worker crashes with [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166 while renaming directories
    • #1441933: [Geo-rep] If for some reason MKDIR failed to sync, it should not proceed further.
    • #1442933: Segmentation fault when creating a qcow2 with qemu-img
    • #1443012: snapshot: snapshots appear to be failing with respect to secure geo-rep slave
    • #1443319: Don't wind post-op on a brick where the fop phase failed.
    • #1445213: Unable to take snapshot on a geo-replicated volume, even after stopping the session
    • #1449314: [whql][virtio-block+glusterfs]"Disk Stress" and "Disk Verification" job always failed on win7-32/win2012/win2k8R2 guest

    2 May 2017

    Struggling to containerize stateful applications in the cloud?

    Struggling to containerize stateful applications in the cloud? Here’s how with Red Hat Gluster Storage.

    The newest release of Red Hat’s Reference Architecture “OpenShift Container Platform 3.5 on Amazon Web Services” now incorporates container-native storage, a unique approach based on Red Hat Gluster Storage to avoid lock-in, enable stateful applications, and simplify those applications’ high availability.

    In the beginning, everything was so simple. Instead of going through the bureaucracy and compliance-driven process of requesting compute, storage, and networking resources, I would pull out my corporate credit card and register at the cloud provider of my choice. Instead of spending weeks forecasting the resource needs and costs of my newest project, I would get started in less than 1 hour. Much lower risk, virtually no capital expenditure for my newest pet project. And seemingly endless capacity — well, as long as my credit card was covered. If my project didn’t turn out to be a thing, I didn’t end up with excess infrastructure, either.

    Until I found out that basically what I was doing was building my newest piece of software against a cloud mainframe. Not directly, of course. I was still operating on top of my operating system with the libraries and tools of my choice, but essentially I spent significant effort getting to that point with regard to orchestration and application architecture. And these are not easily ported to another cloud provider.

    I realize that cloud providers are vertically integrated stacks, just as mainframes were. Much more modern and scalable with an entirely different cost structure — but, still, eventually and ultimately, lock-in.

    Avoid provider lock-in with OpenShift Container Platform

    This is where OpenShift comes in. I take orchestration and development cycles to a whole new level when I stop worrying about operating system instances, storage capacity, network overlays, NAT gateways, firewalls — all the things I need to make my application accessible and provide value.

    Instead, I deal with application instances, persistent volumes, services, replication controllers, and build configurations — things that make much more sense to me as an application developer as they are closer to what I am really interested in: deploying new functionality into production. Thus, OpenShift offers abstraction on top of classic IT infrastructure and instead provides application infrastructure. The key here is massive automation on top of the concept of immutable infrastructure, thereby greatly enhancing the capability to bring new code into production.

    The benefit is clear: Once I have OpenShift in place, I don’t need to worry about any of the underlying infrastructure — I don’t need to be aware of whether I am actually running on OpenStack, VMware, Azure, Google Cloud, or Amazon Web Services (AWS). My new common denominator is the interface of OpenShift powered by Kubernetes, and I can forget about what’s underneath.

    Well, not quite. While OpenShift provides a lot of drivers for various underlying infrastructure, for instance storage, they are all somewhat different. Their availability, performance, and feature set are tied to the underlying provider, for instance Elastic Block Storage (EBS) on AWS. I need to make sure that critical aspects of the infrastructure below OpenShift are reflected in the OpenShift topology. A good example is AWS availability zones (AZs): they are failure domains in a region across which an application instance should be distributed to avoid downtime in the event a single AZ is lost. So OpenShift nodes need to be deployed in multiple AZs.

    This is where another caveat comes in: EBS volumes are present only inside a single AZ. Therefore, my application must replicate the data across other AZs if it uses EBS to store it.

    So there are still dependencies and limitations a developer or operator must be aware of, even if OpenShift has drivers on board for EBS and will take care about provisioning.

    Introducing container-native storage

    With container-native storage (CNS), we now have a robust, scalable, and elastic storage service out-of-the-box for OpenShift Container Platform — based on Red Hat Gluster Storage. The trick: GlusterFS runs containerized on OpenShift itself. Thus, it runs on any platform that OpenShift is supported on — which is basically everything, from bare metal, to virtual, to private and public cloud.

    With CNS, OpenShift gains a consistent storage feature set across, and independent of, all supported cloud providers. It’s deployed with native OpenShift / Kubernetes resources, and GlusterFS ends up running in pods as part of a DaemonSet:

    [ec2-user@ip-10-20-4-55 ~]$ oc get pods
    NAME READY STATUS RESTARTS AGE
    glusterfs-0bkgr 1/1 Running 9 7d
    glusterfs-4fmsm 1/1 Running 9 7d
    glusterfs-bg0ls 1/1 Running 9 7d
    glusterfs-j58vz 1/1 Running 9 7d
    glusterfs-qpdf0 1/1 Running 9 7d
    glusterfs-rkhpt 1/1 Running 9 7d
    heketi-1-kml8v 1/1 Running 8 7d

    The pods are running in privileged mode to access the nodes’ block device directly. Furthermore, for optimal performance, the pods are using host-networking mode. This way, OpenShift nodes are running a distributed, software-defined, scale-out file storage service, just as any distributed micro-service application.

    There is an additional pod deployed that runs heketi — a RESTful API front end for GlusterFS. OpenShift natively integrates via a dynamic storage provisioner plug-in with this service to request and delete storage volumes on behalf of the user. In turn, heketi controls one or more GlusterFS Trusted Storage Pools.

    Container-native storage on Amazon Web Services

    The EBS provisioner has been available for OpenShift for some time. To understand what changes with CNS on AWS, a closer look at how EBS is accessible to OpenShift is in order.

    Dynamic provisioning
    EBS volumes are dynamically created and deleted as part of storage provisioning requests (PersistentVolumeClaims) in OpenShift.

    Local block storage
    EBS appears to the EC2 instances as a local block device. Once provisioned, it is attached to the EC2 instance, and a PCI interrupt is triggered to inform the operating system.

    NAME                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda 202:0 0 15G 0 disk
    ├─xvda1 202:1 0 1M 0 part
    └─xvda2 202:2 0 15G 0 part /
    xvdb 202:16 0 25G 0 disk
    └─xvdb1 202:17 0 25G 0 part
    ├─docker_vol-docker--pool_tmeta 253:0 0 28M 0 lvm
    │ └─... 253:2 0 23.8G 0 lvm
    │ ├─... 253:8 0 3G 0 dm
    │ └─... 253:9 0 3G 0 dm
    └─docker_vol-docker--pool_tdata 253:1 0 23.8G 0 lvm
    └─docker_vol-docker--pool 253:2 0 23.8G 0 lvm
    ├─... 253:8 0 3G 0 dm
    └─... 253:9 0 3G 0 dm
    xvdc 202:32 0 50G 0 disk
    xvdd 202:48 0 100G 0 disk

    OpenShift on AWS also uses EBS to back local docker storage. EBS storage is formatted with a local filesystem like XFS.

    Not shared storage
    EBS volumes cannot be attached to more than one EC2 instance. Thus, all pods mounting an EBS-based PersistentVolume in OpenShift must run on the same node. The local filesystem on top of the EBS block device does not support clustering either.

    AZ-local storage
    EBS volumes cannot cross AZs. Thus, OpenShift cannot fail over pods mounting EBS storage into different AZs. Basically, an EBS volume is a failure domain.

    Performance characteristics
    The type of EBS storage, as well as capacity, must be selected up front. Specifically, for fast storage a certain minimum capacity must be requested to have a minimum performance level in terms of IOPS.

    This is the lay of the land. While these characteristics may be acceptable for stateless applications that only need to have local storage, they become an obstacle for stateful applications.

    People want to containerize databases, as well. Following a micro-service architecture where every service maintains its own state and data model, this request will become more common. The nature of these databases differs from the classic, often relational, database management system IT organizations have spent millions on: They are way smaller and store less data than their big brother from the monolithic world. Still, with the limitations of EBS, I would need to architect replication and database failover around those just to deal with a simple storage failure.

    Here is what changes with CNS:

    Dynamic provisioning
    The user experience actually doesn't change. CNS is represented like any storage provider in OpenShift, by a StorageClass. PersistentVolumeClaims (PVCs) are issued against it, and the dynamic provisioner for GlusterFS creates the volume and returns it as a PersistentVolume (PV). When the PVC is deleted, the GlusterFS volume is deleted, as well.
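
    For illustration only (the class name, resturl, and API version are assumptions and may differ per OpenShift release), such a CNS-backed StorageClass might look like:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: glusterfs-cns
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://heketi-storage.example.com:8080"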

    Distributed file storage on top of EBS
    CNS volumes are basically GlusterFS volumes, managed by heketi. The volumes are built out of local block devices of the OpenShift nodes backed by EBS. These volumes provide shared storage and are mounted on the OpenShift nodes with the GlusterFS FUSE client.

    [ec2-user@ip-10-20-5-132 ~]$ mount
    ...
    10.20.4.115:vol_0b801c15b2965eb1e5e4973231d0c831 on /var/lib/origin/openshift.local.volumes/pods/80e27364-2c60-11e7-80ec-0ad6adc2a87f/volumes/kubernetes.io~glusterfs/pvc-71472efe-2a06-11e7-bab8-02e062d20f83 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
    ...

    Container-shared storage
    Multiple pods can mount and write to the same volume. The access mode for the corresponding PersistentVolume is known as "RWX" — read-write many. The containers can run on different OpenShift nodes, and the dynamic provisioner will mount the GlusterFS volume on the right nodes accordingly. Then, this local mount directory is bind-mounted to the container.

    Cross-availability zone
    CNS is deployed across AWS AZs. The integrated, synchronous replication of GlusterFS will mirror every write 3 times. GlusterFS is deployed across OpenShift nodes in at least different AZs, and thus the storage is available in all zones. The failure of a single GlusterFS pod, an OpenShift node running the pod, or a block device accessed by the pod will have no impact. Once the failed resources come back, the storage is automatically re-replicated. CNS is actually aware of the failure zones as part of the cluster topology and will schedule new volumes, as well as recovery, so that there is no single point of failure.

    Predictable performance
    CNS storage performance is not tied to the size of storage request by the user in OpenShift. It’s the same performance whether 1GB or 100GB PVs are requested.

    Storage performance tiers
    CNS allows for multiple GlusterFS Trusted Storage Pools to be managed at once. Each pool consists of at least 3 OpenShift nodes running GlusterFS pods. While the OpenShift nodes belong to a single OpenShift cluster, the various GlusterFS pods form their own Trusted Storage Pools. An administrator can use this to equip the nodes with different kinds of storage and offer their pools with CNS as distinct storage tiers in OpenShift, each via its own StorageClass. An administrator might, for example, run CNS on 3 OpenShift nodes with SSD (e.g., EBS gp2) storage and call it "fast," whereas another set of OpenShift nodes with magnetic storage (e.g., EBS st1) runs a separate set of GlusterFS pods as an independent Trusted Storage Pool, represented with a StorageClass called "capacity."

    This is a significant step toward simplifying and abstracting provider infrastructure. For example, a MySQL database service running on top of OpenShift is now able to survive the failure of an AWS AZ, without needing to set up MySQL Master-Slave replication or change the micro-service to replicate data on its own.

    Storage provided by CNS is efficiently allocated and delivers its performance from the first Gigabyte provisioned, thereby enabling storage consolidation. For example, consider six MySQL database instances, each in need of 25 GiB of storage capacity and up to 1500 IOPS at peak load. With EBS, I would create six EBS volumes, each with at least 500 GiB capacity out of the gp2 (General Purpose SSD) EBS tier, because gp2 provides a baseline of 3 IOPS per provisioned GiB: with EBS, guaranteed performance is tied to provisioned capacity.
    With CNS, I can achieve the same using only 3 EBS volumes at 500 GiB capacity from the gp2 tier and run these with GlusterFS. I would create six 25 GiB volumes and provide storage to my databases with high IOPS performance, provided they don’t peak all at the same time.

    Doing that, I would halve my EBS cost and still have capacity to spare for other services. My read IOPS performance is likely even higher because in CNS with 3-way replication I would read from data distributed across 3×1500 IOPS gp2 EBS volumes.
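
    The back-of-the-envelope numbers, using gp2's baseline of 3 IOPS per provisioned GiB:

    # EBS only:  6 databases x 500 GiB gp2  = 3000 GiB provisioned
    #            (500 GiB x 3 IOPS/GiB = 1500 IOPS per volume, but ~475 GiB per volume sit idle)
    # With CNS:  3 x 500 GiB gp2            = 1500 GiB provisioned, i.e. roughly half the EBS spend
    #            3-way replication leaves ~500 GiB usable; the six 25 GiB PVs consume 150 GiB,
    #            leaving ~350 GiB to spare for other claims
    #            reads are spread over all 3 bricks: up to 3 x 1500 = 4500 backend read IOPS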

    Finally, the setup for CNS is very simple and can run on any OpenShift installation based on version 3.4 or newer.

    This way, no matter where I plan to run OpenShift (i.e., whichever cloud provider currently offers the lowest prices), I can rely on the same storage features and performance. Furthermore, the storage service grows with the OpenShift cluster yet remains elastic: only a subset of the OpenShift nodes must run CNS, at least 3 of them, ideally spread across 3 AZs.

    Deploying container-native storage on AWS

    Installing OpenShift on AWS is dramatically simplified by the OpenShift on Amazon Web Services Reference Architecture. A set of Ansible playbooks augments the existing openshift-ansible installation routine and creates all the required AWS infrastructure automatically.

    A simple Python script provides a convenient wrapper around these playbooks, found in the openshift-ansible-contrib repository on GitHub, for deploying on AWS.

    All the heavy lifting of setting up Red Hat OpenShift Container Platform on AWS is automated with best practices incorporated.
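
    To give a feel for the workflow, here is a hypothetical invocation; the script names and flags are assumptions based on the openshift-ansible-contrib layout and can differ between releases, so check the repository's README for the exact options:

    git clone https://github.com/openshift/openshift-ansible-contrib.git
    cd openshift-ansible-contrib/reference-architecture/aws-ansible
    # deploy the base OpenShift cluster on AWS (flags are placeholders)
    ./ose-on-aws.py --keypair=<ec2-keypair> --public-hosted-zone=<dns-zone> ...
    # later, extend the same stack with CNS nodes and EBS volumes (see below)
    ./add-cns-storage.py ...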

    The deployment finishes with an OpenShift Cluster with 3 master nodes, 3 infrastructure nodes, and 2 application nodes deployed in a highly available fashion across AWS AZs. The external and internal traffic is load balanced, and all required network, firewall, and NAT resources are stood up.

    Since version 3.5, the reference architecture playbooks ship with additional automation that makes deploying CNS just as easy. Additional AWS CloudFormation templates and Ansible playbook tasks stand up the extra infrastructure that is required: they provision additional OpenShift nodes with an amended firewall configuration and additional EBS volumes, and then join them to the existing OpenShift cluster.

    In addition, compared to previous releases, the CloudFormation templates now emit more information as part of their output. The playbooks pick this up to further reduce the information needed from the administrator: they simply read the proper integration points from the existing CloudFormation stack.

    The result is AWS infrastructure ready for the administrator to deploy CNS. Most of the manual steps of this process can therefore be avoided. Three additional app nodes are deployed with configurable instance type and EBS volume type. Availability zones of the selected AWS region are taken into account.

    Subsequent calls allow additional CNS pools to be provisioned. The reference architecture makes reasonable choices for the EBS volume type and the EC2 instance type, balancing running costs against initial performance. The only thing left for the administrator to do is to run the cns-deploy utility and create a StorageClass object to make the new storage service accessible to users.
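
    Concretely, that last mile could look like the following sketch; the project name, topology file, and heketi URL are placeholders, and the exact cns-deploy options should be taken from its documentation:

    # deploy the GlusterFS pods and heketi into a dedicated project
    oc new-project storage-project
    cns-deploy -n storage-project -g topology.json

    # expose the new pool to users via a StorageClass
    oc create -f - <<EOF
    apiVersion: storage.k8s.io/v1beta1
    kind: StorageClass
    metadata:
      name: container-native-storage
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://heketi-storage-project.cloudapps.example.com"
    EOF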

    At this point, the administrator can choose between labeling the nodes as regular application nodes or applying a storage-related label that initially excludes them from the OpenShift scheduler for regular application pods.
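
    For example, the nodes could be set aside with a label along these lines (node name and label are purely illustrative); regular application workloads then stay away as long as the cluster's default node selector points at a different label:

    # mark the CNS nodes as storage-only nodes
    oc label node ip-10-20-6-42.ec2.internal role=storage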

    Container-ready storage

    The reference architecture also incorporates the concept of Container-Ready Storage (CRS). In this deployment flavor, GlusterFS runs on dedicated EC2 instances, with a heketi instance deployed separately, both running as ordinary system services without containers. The difference is that these instances are not part of the OpenShift cluster. The storage service is, however, made available to, and used by, OpenShift in the same way. If the user wants the GlusterFS storage layer outside of OpenShift, for performance or cost reasons, this is made possible with CRS. For this purpose, the reference architecture ships add-crs-storage.py to automate the deployment in the same way as for CNS.

    Verdict

    CNS is a further step toward OpenShift Container Platform becoming an equalizer for application development. Consistent storage services, performance, and management are provided independently of the underlying provider platform. Deployment of data-driven applications is further simplified with CNS as the backend. This way, not only stateless but also stateful applications become easy to manage.

    For developers, nothing changes: The details of provisioning and the lifecycle of storage capacity for containerized applications are transparent to them, thanks to CNS’s integration with native OpenShift facilities.

    For administrators, achieving cross-provider, hybrid-cloud deployments just became even easier with the recent release of the OpenShift Container Platform 3.5 on Amazon Web Services Reference Architecture. With just two basic commands, an elastic and fault-tolerant foundation for applications can be deployed. Once set up, growth becomes a matter of adding nodes.

    It is now possible to choose the most suitable cloud provider platform without having to trade off storage feature sets or tie oneself too closely to one provider’s implementation, thereby avoiding long-term lock-in.

    The reference architecture details the deployment and resulting topology. Access the document here.

    Originally published at redhatstorage.redhat.com on May 2, 2017 by Daniel Messer.

    18 Apr 2017

    Bugfix release GlusterFS 3.8.11 has landed

    Another month has passed, and more bugs have been squashed in the 3.8 release. Packages should be available at, or arrive soon in, the usual repositories. The next 3.8 update is expected to be made available just after the 10th of May.

    Release notes for Gluster 3.8.11

    This is a bugfix release. The Release Notes for 3.8.0, 3.8.1, 3.8.2, 3.8.3, 3.8.4, 3.8.5, 3.8.6, 3.8.7, 3.8.8, 3.8.9 and 3.8.10 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.8 stable release.

    Bugs addressed

    A total of 15 patches have been merged, addressing 13 bugs:
    • #1422788: [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
    • #1427390: systemic testing: seeing lot of ping time outs which would lead to splitbrains
    • #1430845: build/packaging: Debian and Ubuntu don't have /usr/libexec/; results in bad packages
    • #1431592: memory leak in features/locks xlator
    • #1434298: [Disperse] Metadata version is not healing when a brick is down
    • #1434302: Move spit-brain msg in read txn to debug
    • #1435645: Disperse: Provide description of disperse.eager-lock option.
    • #1436231: Undo pending xattrs only on the up bricks
    • #1436412: Unrecognized filesystems (i.e. btrfs, zfs) log many errors about "getinode size"
    • #1437330: Sharding: Fix a performance bug
    • #1438424: [Ganesha + EC] : Input/Output Error while creating LOTS of smallfiles
    • #1439112: File-level WORM allows ftruncate() on read-only files
    • #1440635: Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance