
Just under a year ago I started working on a side project, which I talked about here, called Gravity Sync.

The basic premise was to create a script that would synchronize two Pi-hole 5.0 instances. What started out as a few lines of bash code though has grown into something kind of crazy, and has been a fun experiment in (re)learning how to code.

echo 'Copying gravity.db from HA primary'
rsync -e 'ssh -p 22' ubuntu@192.168.7.5:/etc/pihole/gravity.db /home/pi/gravity-sync
echo 'Replacing gravity.db on HA secondary'
sudo cp /home/pi/gravity-sync/gravity.db /etc/pihole/
echo 'Reloading configuration of HA secondary FTLDNS from new gravity.db'
pihole restartdns reload-lists

That's it. The basic premise was simple. You have two Pi-holes; all of the blocklist configuration is stored in a single database called gravity.db, which needs to be the same at both sites. If you made a change to those settings on the primary, you'd log in to the secondary and run this code, and it would copy the file over from the primary and replace it.

It ran on my system just fine, and it met my needs just fine. I originally included most of this in a post last May, and shared it on a couple of Slack channels with people I knew.

After a little bit I decided I should make this easier to get going, so I started adding variables to the top of the script for things like the username and IP address of the other Pi-hole. I also added a few colors to the text.

Feature Requests

The first person to say “yeah that's great but it'd be better if…” was Jim Millard. He wanted the ability to send changes from the secondary back to the primary. From this the first arguments were introduced. Previously you'd just run ./gravity-sync.sh and then things would happen in one way. If I wanted the script to be bi-directional, I'd need a way to indicate that. So with 1.1 of the script, you could say ./gravity-sync.sh pull or ./gravity-sync.sh push and now the data went one way or the other.
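In bash that's just a matter of checking the first positional parameter. A minimal sketch of the idea (simplified, and not the actual Gravity Sync code):

case "$1" in
    pull)
        # copy gravity.db from the primary down to this Pi-hole
        ;;
    push)
        # copy gravity.db from this Pi-hole back up to the primary
        ;;
    *)
        echo "Usage: $0 {pull|push}"
        exit 1
        ;;
esac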

At this point I’d realized posting new copies of the raw code to Slack wasn’t very efficient, so I moved to a GitHub Gist. The script was sort of retroactively deemed 1.1, because there was really no version control or changelog; it was all mostly in my head.

Shortly after breaking out the push and pull functions, I decided to break out the configuration variables into their own file, so that you could just copy in a new version of the main script without having to reset your install each time.
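Conceptually that just means the script reads its settings from a separate file at startup. Something like this, where the file name and variable names are only illustrative:

# gravity-sync.conf -- survives upgrades of the main script
REMOTE_HOST='192.168.7.5'
REMOTE_USER='ubuntu'

# near the top of gravity-sync.sh
source /home/pi/gravity-sync/gravity-sync.conf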

At this point, since I had more than one file, using a Gist wasn't very practical, so I moved to a new GitHub repository. Having a repo made me think it might be pretty easy to update the script by just pulling the refreshed code down to my Pi-hole. I started doing this manually, and then realized I could incorporate it into the script by creating a new ./gravity-sync.sh update argument to do it for me.
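Because the install is just a clone of the repository, the update argument doesn't have to do much. Roughly this, as a sketch rather than the exact implementation:

function update_gs() {
    # refresh the local copy of the script from GitHub
    cd /home/pi/gravity-sync || exit 1
    git pull
}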

The ability to check if you had the latest version of the code came shortly after that, and then some basic logging.

Functions

The whole script was just one big file that processed through itself serially. That’s really all a bash script can do: run line by line. I think one of the smarter things I did early on, as the script started to grow, was figure out how to rip out various components and put them into functions. There were parts of the script that were repeatable, and having the same code physically written out twice in the same script is a waste and introduces the potential for errors.

Around this time I also started to experiment with different status update indicators. The script was really still designed to be run manually, although you could automate it with cron if you wanted. The various STAT, GOOD, FAIL, and WARN messages that the script would throw up were written out in full in each message, including the color. I had a variable for the color code, so I didn’t have to include that, but if I wanted to change the color, or the message format itself, I’d have to find and replace every instance.

echo -e "[${CYAN}STAT${NC}] Pulling ${GRAVITYFI} from ${REMOTEHOST}"

What if I put the message itself into a variable, and then had a function that assembled the output based on what type of indicator I wanted?

CYAN='\033[0;96m'
NC='\033[0m'
STAT="[${CYAN}STAT${NC}]"

function echo_stat() {
    echo -e "${STAT} ${MESSAGE}"
}

MESSAGE="Pulling ${GRAVITYFI} from ${REMOTEHOST}"
echo_stat

I now had a process that was repeatable: if I wanted to change any element of the output, I could do it once in the color variable or the STAT block, and every place echo_stat was referenced would get updated.

Hashing

One of the big problems I still had at this point is that every time the script was run, the database was replicated. This was fine for my environment, with a small database, but it generated a lot of write activity on the SD cards, and wasn’t ideal.

This started with a simple MD5 hashing operation on both databases. Gravity Sync would look at the local database and record the hash. It would query the remote database and record its hash. If they were the same, the script would exit, as no replication was required.

If they were different, then it would initiate a replication based on the direction you indicated when it was run (pull or push) and copy the database just like it did before.
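The comparison itself is only a few lines. Something along these lines, with the paths and variable names here just for illustration:

# hash the local copy and the copy on the primary
LOCAL_HASH=$(md5sum /etc/pihole/gravity.db | awk '{print $1}')
REMOTE_HASH=$(ssh ubuntu@192.168.7.5 "md5sum /etc/pihole/gravity.db" | awk '{print $1}')

if [ "$LOCAL_HASH" == "$REMOTE_HASH" ]; then
    echo 'No replication required'
    exit 0
fi
# hashes differ, so continue with the pull or push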

Simplifying Deployment

With each release I was starting to add things that required more work to be done at deployment. I wanted to get to the point where I could cut out as much human error as possible and make the script as easy as possible to implement.

With 1.4, I added the ability to run a configuration utility to automate the creation of SSH keys, and the required customization of the .conf file.
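Most of that is just wrapping standard OpenSSH tooling. A bare-bones sketch of the key portion (not the actual configuration routine):

# create a key pair with no passphrase, if one doesn't already exist
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -q -t rsa -b 4096 -N '' -f "$HOME/.ssh/id_rsa"
fi

# copy the public key to the other Pi-hole so ssh/rsync work unattended
ssh-copy-id ubuntu@192.168.7.5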

By this point I’d started to add some “optional” features to the script that were designed to work around a few edge cases. I wanted to validate that the system on the other side was reachable by the secondary Pi-hole, so I added a quick ping test. But not everyone allows ping replies, so this forced me to start letting folks opt out based on their network.
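The test itself is trivial; the new part was the switch to skip it. Something along these lines, with a made-up variable name standing in for the real setting:

# set SKIP_PING='1' in the .conf file if your network blocks ICMP replies
if [ "$SKIP_PING" != '1' ]; then
    if ! ping -c 1 192.168.7.5 > /dev/null 2>&1; then
        echo 'Primary Pi-hole is unreachable'
        exit 1
    fi
fi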

I also had to start deciding if that should be part of the configuration script, or be something you have to continue to configure manually yourself. Ironically, in my quest to simplify deployment, I made the script a lot more complicated for myself to maintain. The configuration option eats up almost as much time as the core script does, as with any new feature addition I now have to decide if I need to build it into the configuration workflow to get a functional deployment, and what the implications are for other features and functions.

Saying Yes

Around this time another friend of mine asked me if the script would sync the local DNS records in Pi-hole. I didn’t use the feature at the time, and didn’t know how it worked. It turned out to be a mostly simple flat file that could be replicated alongside the main gravity.db file.

I was able to reuse the bulk of the replication code to add this, while making it optional for folks who didn’t use the function and therefore didn’t have the file at the time. (Pi-hole has since changed it so the file is created by default even if it’s not used. Thanks!)
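Since the file isn't guaranteed to exist on every install, the replication just gets wrapped in a check. Conceptually it looks like this (custom.list is the Pi-hole flat file for local DNS records; the rest mirrors the earlier rsync example):

# only replicate local DNS records if the flat file exists on the primary
if ssh ubuntu@192.168.7.5 'test -f /etc/pihole/custom.list'; then
    rsync -e 'ssh -p 22' ubuntu@192.168.7.5:/etc/pihole/custom.list /home/pi/gravity-sync
    sudo cp /home/pi/gravity-sync/custom.list /etc/pihole/
fi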

Saying No

My same friend was using DietPi on his installs, and by default DietPi uses an alternative SSH server/client called Dropbear. Gravity Sync makes extensive use of both SSH and RSYNC to issue commands and move files from the secondary to the primary host. Many of those commands are specific to the OpenSSH implementation and don't work the same way with Dropbear.

I spent a lot of time getting Dropbear support worked into the script, and announced it as a new feature in version 1.7. Other people started using it.

But it never worked right, and it was just confusing for the vast majority of folks who had OpenSSH installed by default.

Up to this point I’d tried to work in any feature request that I could because they seemed like reasonable concessions and made the script work better. With this request I should have said “No, just switch DietPi to OpenSSH” and left it alone. Plenty of development cycles were wasted on this, and the ability to use Dropbear was later removed in version 2.1.

Unexpected Side Effects

The decision to support Dropbear wasn’t all bad, as it drove me to rewrite more and more of the existing script in ways that allowed them to be repeatable functions. I wanted to make it as simple as I could for each SSH command to execute how it needed to based on the client and server options, similar to the example with status messages I explained previously. This would go on to serve me well later with future code changes.

Getting Smarter

Version 2.0 was a really big turning point for Gravity Sync. As I went along breaking more and more of the script up into functions, becoming more modular, I figured out how I could use those modules differently. Up until now it was pull or push. With 2.0, I figured out how I could decide which files had been changed, and then send them in the direction they needed to go without any user intervention.
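Under the hood the smart mode is still built from the same pull and push pieces; the new bit is deciding which one to run. A simplified sketch of the idea, with placeholder file and function names rather than the actual logic:

# current hashes, using the same md5sum approach as before
LOCAL_HASH=$(md5sum /etc/pihole/gravity.db | awk '{print $1}')
REMOTE_HASH=$(ssh ubuntu@192.168.7.5 "md5sum /etc/pihole/gravity.db" | awk '{print $1}')

# hashes recorded after the last successful sync
LAST_LOCAL=$(cat /home/pi/gravity-sync/gravity.db.local.hash)
LAST_REMOTE=$(cat /home/pi/gravity-sync/gravity.db.remote.hash)

# pull_gs and push_gs stand in for the existing pull and push functions
if [ "$REMOTE_HASH" != "$LAST_REMOTE" ] && [ "$LOCAL_HASH" == "$LAST_LOCAL" ]; then
    pull_gs   # only the primary changed, bring its database over
elif [ "$LOCAL_HASH" != "$LAST_LOCAL" ] && [ "$REMOTE_HASH" == "$LAST_REMOTE" ]; then
    push_gs   # only this side changed, send it back to the primary
else
    echo 'No replication required'
fi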

It was around this time that the popularity of the script really started to take off. With more attention came a more critical eye. Someone on Reddit pointed out that my method of simply copying the running database out from under Pi-hole could lead to corruption, and that the right way to do this was with an SQLite3-based backup of the database.

I started by building out a backup routine with this new method, and incorporating that into the automation script. Once I was satisfied with a few installs, I started working on a way to use this new method as the actual replication method.
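The safer approach leans on SQLite's own online backup command instead of copying the raw file out from under a running FTLDNS. At its core it's a one-liner (the destination path here is just an example):

# create a consistent copy of the live database without interrupting it
sudo sqlite3 /etc/pihole/gravity.db ".backup '/home/pi/gravity-sync/gravity.db.backup'"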

Containers

Originally Gravity Sync had no support for containerized deployments of Pi-hole, because up until last fall I really did nothing with containers. This was a feature request from almost the beginning, and one that I ignored as I wasn’t sure how to properly support it. As I started playing around with them more I realized this would be both easier to support than I anticipated and also really useful for me personally.
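For a containerized Pi-hole most of the script works the same; the main difference is that the pihole commands need to be wrapped in the container runtime. Something like this, assuming a container named pihole:

# on a standard install
pihole restartdns reload-lists

# against a containerized Pi-hole
sudo docker exec pihole pihole restartdns reload-lists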

Documentation

It was pretty early on when someone pointed out that I didn’t have any changelog or proper release notes for Gravity Sync. I’m glad they did. Looking back on the last year, it’s nice to see all the work laid out, and it actually made writing this retrospective a lot easier.

I just got on the Pi-hole bandwagon a few weeks ago, and boy do I love it. Really, who doesn't love DNS? And what is better than a Pi-hole? Two Pi-hole!

With the release of Pi-hole 5.0, I wanted to rig up a quick and dirty way to keep my Pi-hole HA instances in sync, but it has quickly escalated to more than just dirty and has now become a little more elaborate.

Originally I posted the installation documentation on this blog, but as it gained more brain time, I have moved those over to the README file of the GitHub repo where the script now lives.

The script assumes you have one “primary” PH as the place you make all your configuration changes through the Web UI, doing things such as: manual whitelisting, adding blocklists, device/group management, and other list settings. The script will pull the configuration of the primary PH to the secondary.

After the script executes it will copy the gravity.db from the master to any secondary nodes you configure it to run on. In theory they should be “exact” replicas every 30 minutes (default timing of the cronjob).

If you ever make any blocklist changes to the secondary Pihole, they’ll just be overwritten when the synchronization script kicks off. However, it should not overwrite any settings specific to the configuration of the secondary Pihole, such as upstream resolvers, networking, query log, admin passwords, etc. Only the “Gravity” settings that FTLDNS (Pihole) uses to determine what to block and what to allow are carried over.

Generally speaking I don't foresee any issues with this unless your master Pihole is down for an extended period of time, in which case you can specify that you'd like to “push” the configuration from the secondary back over to the primary node. Only do this if you made list changes during a failover and want them back in production.

Disclaimer: I've tested this on two Piholes running 5.0 GA in HA configuration, not as DHCP servers, on my network. Your mileage may vary. It shouldn't do anything nasty but if it blows up your Pihole, sorry.

The actual method of overwriting is what the Pihole developers have suggested doing over at /r/pihole, and apparently is safe 🤞 It might be a little more aggressive than it needs to be about running every 30 minutes (defined by the crontab setting), but I figure the way I have mine set up, the second one isn’t really doing anything other than watching for the HA address to fail over, so it shouldn’t disrupt users during the reload. Plus, the database itself isn't that big, and according to the Pihole team the database file isn’t locked unless you’re making a change to it on the master (otherwise it’s just reading), so there shouldn’t be any disruption to the primary when making a remote copy.

I want to note that the initial release (1.0) had no error handling or real logic other than executing exactly what it's told to do. If you set it up exactly as described, it'll just work.

I've since posted 1.1 and higher with some additional arguments and features, if you deployed the script previously I suggest upgrading and adjusting your crontab to include the “pull” argument.
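If you're adjusting an existing deployment, the updated crontab entry would look something like this (the path and schedule are just examples):

*/30 * * * * /home/pi/gravity-sync/gravity-sync.sh pull > /dev/null 2>&1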

I've also moved the script to GitHub, which should allow you to keep an updated copy on your system more easily. The script can even update itself if you set it up for that.

Enjoy!

Greg Morris writes:

… I do suffer quite a lot with imposter syndrome. The great thing is, I have learnt not to check my blog stats, I’m not bothered about podcast downloads and I sure as hell don’t care how many people follow me on social media.

I too, have mastered the art of not checking blog stats, in part by not collecting them at all.

Yet every time I do stumble over the figures, I am always surprised because I don’t think I am interesting enough. … When I listen to other people on podcasts, and read others writing, they seem infinitely more interesting than I think I am. With more to say on topics that I find really interesting. Does everyone feel like this?

Yes.

This video from Atlassian was shared internally at VMware a couple of weeks ago, and my initial comment was that minus the few references to their company specifically, this video was a great representation of the role of Technical Account Managers, generally.

I was apparently not the only one who thought this; a little while later a post appeared from their corporate account on LinkedIn with positive comments from representatives of:

  • VMware
  • Zscaler
  • LivePerson
  • Adobe
  • Five9
  • Citrix
  • Nutanix
  • Symantec
  • Microsoft

Plus a half dozen or so other places I'd never even heard of. If you can get that many representatives of different places to agree you're probably onto something.

Kudos to Atlassian.

Introducing VMware Project Nautilus:

Project Nautilus brings OCI container support to Fusion, enabling users to build, run and test apps for nearly any OS or cloud right from the comfort of your own Mac.

With Project Nautilus, Fusion now has the ability to run Containers as well as VMs. Pull images from a remote repository like Docker Hub or Harbor, and run them in specially isolated 'Pod VMs'.

This is built into the latest Tech Preview of VMware Fusion, which we've changed how we're releasing.

As Mike Roy explains in New Decade, New Approach to “Beta”:

This year, in an ongoing way, we’ll be releasing multiple updates to our Tech Preview branches, similar to how we update things in the main generally available branch. The first release is available now, and we’re calling it '20H1'.

We’re also moving our documentation and other things over to GitHub. We’ll be continuing to add more to the org and repos there, maintain and curate it, as well as host code and code examples that we are able to open source.

I’ve already been playing with Project Nautilus, and it’s pretty slick. I had an nginx server up in a couple of minutes after installing, even pulling the image down from Docker Hub. Being able to spin up container workloads right on macOS, alongside Fusion virtual machines, without the Docker runtime installed, is great.

You can even run VMware Photon OS as a container inside the Pod VM. Project Nautilus should eventually make its way into VMware Workstation, but it is not currently available there.

You should also be able to do the same thing on ESXi later this year with Project Pacific.

There are some things that just aren’t worth putting on your resume. This was the reminder that came to mind during replies to Owen Williams on the tweet machine.

I once joined a company that announced on my first day that they had very little money left, and that all “unnecessary” spending (i.e, what I was hired to do) would be cut immediately.

Learned a hard lesson to ask specifics about company finances during the interview. https://t.co/oh5V2gqGJw — Owen Williams ⚡ (@ow) January 10, 2020

For a very short time I worked for a small family business that sold woodworking tools. Everything from glue and chisels to large computer controlled “put wood in this side and get a cabinet out the other side” machines. I was recommended to the position by a friend who was leaving to work for an ISP. The job I had at the time was part-IT/part-retail for a small grocery store chain, and I wanted to go all in on IT.

But on what I remember to be my first (or maybe second) day, I was asked by the President of the company to disable the accounts of two of his brothers, who were VPs (the four boys ran it). A few hours later one of them showed up at my desk trying to figure out why his email was locked, without a clue who I was.

This guy looked like he killed wild animals barehanded for fun, and at 21 years old I was a much scrawnier version of my current self, maybe 160 lbs. What a joy it was to tell him to go talk to his brother, and then have him return and demand that I reactivate it.

I left a couple of months later once I found something that was slightly more stable. The company is no longer in business.

The first time I used Veeam's backup software was in 2010. Up to that point I'd had experience with Symantec Backup Exec, Microsoft Data Protection Manager, and Commvault Simpana. The first time I used VBR to backup my vSphere infrastructure it was like the proverbial iced water to a man in hell.

As a consultant I'd deployed VBR for customers more times than I can count. Bringing iced water to the hot masses.

Today's news has me worried for their future:

Insight Partners is acquiring Veeam Software Group GmbH, a cloud-focused backup and disaster recovery services provider, in a deal valued at about $5 billion—one of the largest ever for the firm.

Veeam—first backed by Insight in 2013 with a minority investment—will move its headquarters to the U.S. from Baar, Switzerland, as a result of the acquisition. The deal is intended to help increase the company’s share of the American market.

Hopefully my worry is for nothing, but Insight Partners is a private equity firm. What that means, exactly, remains to be seen. But generally speaking:

  • A private equity firm restructures the acquired company and attempts to resell it at a higher value.
  • It makes extensive use of debt financing to purchase companies.

Also, as noted by Blocks & Files:

Co-founders Andrei Baronov and Ratmir Timashev will step down from the board. Baranov and Timashev founded Veeam in 2006 and took in no outside funding until the Insight $500m injection in January 2019.

I sincerely hope that I'm wrong in my gut reaction here, but wish the best of luck to all my friends at Veeam.

I don't know who Peter Drucker is, but the quote Matt attributes to him is sound:

“Culture eats strategy for breakfast”

This cannot be overstated. pic.twitter.com/pWwjEw8Z4j — Matt Haedo (@matthaedo) January 7, 2020

Apparently Peter is a kind of a big deal, at least according to Wikipedia:

Peter Ferdinand Drucker (/ˈdrʌkər/; German: [ˈdʀʊkɐ]; November 19, 1909 – November 11, 2005) was an Austrian-born American management consultant, educator, and author, whose writings contributed to the philosophical and practical foundations of the modern business corporation. He was also a leader in the development of management education, he invented the concept known as management by objectives and self-control, and he has been described as “the founder of modern management”.

Even the best ideas will fall flat if the culture of the organization refuses to adapt to service them. As I said last week:

The trick, I suppose, is knowing how much of the old ideas and processes are actually still required and why. ... In order to do that you need to understand more than just the business and the technical requirements. ... You have to understand the culture in which it will operate.

Idea: move everything to the cloud! Culture: we must control every aspect of the infrastructure.

🤔

It turns out that finding something to write about every day is really hard. Shocking, I know. You may have noticed (or maybe not) that January 1-4 there was a new post here every day. I skipped yesterday, but I blame my participation with this tweet from Jehad.

A decision I’m happy about in 2019 is writing more. I didn’t write as much as I’d like but it’s been really helpful to articulate my thoughts. It has helped me write better, connect with more folks, & more. If you’re thinking about writing, do it. https://t.co/15u7TZqgFX — Jehad Affoneh (@jaffoneh) January 5, 2020

Not really, I knew I wasn’t going to keep up posting every day. I had a lot of free time on my hands, especially after New Year's Day. Today was the first day back to work after being off since December 20. The first half of this time was spent participating in, and in preparation for, the various holiday celebrations our family was invited to.

Not having work things rolling around in my brain, and having ample downtime, gives me a chance to reflect on life. Which in turn prompted me to write those reflections down. Lucky you. Going forward I hope to get at least a couple of posts done every week, for my own benefit if nothing else. Three would probably be a stretch goal.

On Privilege

I take this time period off every year, or at least try to. When I worked for the university, starting in 2006, we just had this time period off as the campus was completely closed. Students didn't come back until around MLK Day, so even after returning to campus it was eerily quiet, but it gave us a couple of weeks to catch up, finish any small projects, and prepare for the spring semester.

Even at the VAR I worked for, it was expected that only a skeleton crew would be staffing the company the week of Christmas, and it was built into our company PTO schedule that we’d be off that week. It sort of set a trend that, with the exception of a couple of years before my children were born, I’ve tried to keep.

I realize that I’m in a very fortunate position because of the type of work that I do, who I’ve worked for, and especially who I currently work for, that I’m not someone working on Christmas Eve, and rushing back to the office on December 26. The same thing on Thanksgiving.

I’m incredibly privileged, even living and working among “classically privileged” individuals. Hearing friends and family over the holiday struggle with things like managing vacation days, lack of maternity leave, losing benefits, pay issues, etc, I often bite my tongue and don’t allow myself to reiterate how generous VMware is in many of these areas, for fear of being seen as a braggart.

Sometimes I even check myself when it comes to internal conversations about these topics, and remind myself that even the most generous and well intentioned efforts are usually faulted when you’re forced to deal with the US medical system.

My aunt did ask me on Christmas Day if I had to use PTO in order to be off for so long, and I was forced to explain that VMware doesn’t track PTO time. Also, that my manager doesn’t have intimate knowledge of my daily or weekly routine.

All of this combined usually blows people’s minds, but I try to stay grounded about it, while not pretending it will last forever.

For the last couple weeks I’ve been confused why Microsoft Outlook on my Mac would start consuming over 100% CPU while sitting idle, spinning up my fans, and generating a bunch of disk write activity.

At first I assumed it was because I am running the Fast Ring in order to run the new macOS design. However, the same build on my wife’s Mac, also running Catalina, never came anywhere near that even during what could be described only as “aggressive emailing.”

After messing around with adding and deleting accounts, hoping another beta update would fix it ... I finally got the idea to just drag Outlook to the Trash, and let Hazel detect this and offer to dump all of the associated files (cache, settings, etc) with it.

After I put Outlook back in Applications, and effectively set it up as new, everything is back to normal: 0.4% CPU.