Restore, Backup’s Ugly Duckling

By now it’s obvious you’re doing backups. You have your 3-2-1 backup scheme humming in the background and take comfort knowing that your data is safe by duplication.

You should probably sleep better than most knowing that your information is sheltered from faulty hardware, robberies and the like… but there’s something nagging at the back of your mind: when disaster strikes, will the restore actually work?

Content duplication comes in many shapes (photo taken a couple of years ago).

Truth be told, the vast majority of people who do backups only ever try a restore when they actually need it. And though an “all-hands-on-deck”, disaster-has-hit approach will work most of the time, it’s definitely not recommended.

Schedule a quarterly or bi-annual event and do a restore. You’ll thank me.

A myriad of scenarios where things may not be functioning exactly as expected is bound to surface. The most probable one is that no backup was ever being made! I’ve seen it happen first-hand. Somewhere along the way, over the years, something changed and all this time you were walking without a safety net.

But all of the above alludes to personal files, whose recovery is not time-sensitive. What about backups pertaining to operational deployments? Full pipelines that exist to allow your company to function? If a support server inside your firm dies, what’s the process? What would it halt? How long would it take to bring it back up? In Business Continuity we call this the RTO, Recovery Time Objective: the lower the time to have something back online, the better.

And that is one of the reasons why not only should you do scheduled restores, you should also have an automated recovery script. Luckily for us, what a couple of years ago was excruciating to set up can now, by way of Docker, be spun up on your own personal machine: almost anything you can dream of.

Thanks to Sean Kenney you can even spin a Docker LEGO whale onto your desk.

For my pet projects I have an Atlassian Bitbucket installation running on an HP ProLiant MicroServer. Though I’m the solo developer on this instance, let’s imagine it’s the deployment of a small-sized company and we must implement a restore pipeline that, should disaster strike, will have a provisional copy running as soon as possible on an employee’s machine.

On the ProLiant MicroServer, Bitbucket data is stored both in a MySQL database and on the filesystem under /home/bitbucket. For simplicity’s sake let’s forget about the zipping, unzipping and storage of such a backup. So after unpacking the backup we have:

  • a folder with a Desktop\restore\bitbucket.sql file dump.
  • a folder with the Desktop\restore\home\bitbucket file structure.

What we need to do is:

1) Launch a MySQL container with our bitbucket.sql database

2) Launch a BitBucket container with the filesystem mapped to the local Desktop\restore\home\bitbucket

3) We also need to download MySQL’s JDBC driver, which Bitbucket doesn’t natively provide

4) Your Bitbucket installation will probably be running under HTTPS or behind a reverse proxy, so we also have to strip all that from the config

The script below, prep_for_local_docker_restore.sh, does steps 3 (MySQL JDBC driver) and 4 (changing the properties file).
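A minimal sketch of what such a script can look like follows; the connector version, restore path and property names are assumptions you’ll want to adapt to your own setup:

    #!/usr/bin/env bash
    # prep_for_local_docker_restore.sh (illustrative sketch, not the exact file)
    set -euo pipefail

    RESTORE_DIR="$HOME/Desktop/restore"

    # Step 3: grab the MySQL JDBC driver that Bitbucket doesn't ship with.
    # The connector version is an assumption; use whatever your release supports.
    mkdir -p "$RESTORE_DIR/home/bitbucket/lib"
    curl -L -o "$RESTORE_DIR/home/bitbucket/lib/mysql-connector-java-5.1.46.jar" \
      "https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar"

    # Step 4: strip the https / reverse-proxy settings from the restored config
    # so the provisional copy answers plainly on http://localhost:7990.
    PROPS="$RESTORE_DIR/home/bitbucket/shared/bitbucket.properties"
    sed -i.bak \
      -e '/^server\.proxy-name=/d' \
      -e '/^server\.proxy-port=/d' \
      -e '/^server\.scheme=/d' \
      -e '/^server\.secure=/d' \
      "$PROPS"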

The bash commands are pretty self-explanatory: curl to download stuff from the web and sed to do regex replacements. If you have doubts about those, drop me a message.

We then have a docker-compose.yml file that takes care of spinning up both the MySQL and the Bitbucket containers.
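A sketch of what that compose file can look like, assuming it sits next to the unpacked backup in the restore folder; image tags, credentials and the container’s home path are assumptions to adjust to your own versions:

    # docker-compose.yml (illustrative sketch)
    version: "3"
    services:
      db:
        image: mysql:5.7
        environment:
          MYSQL_ROOT_PASSWORD: changeme          # assumption: pick your own
          MYSQL_DATABASE: bitbucket
        volumes:
          # Dumps in this folder are imported automatically on first start.
          - ./bitbucket.sql:/docker-entrypoint-initdb.d/bitbucket.sql

      bitbucket:
        image: atlassian/bitbucket-server:5.16   # assumption: match your version
        depends_on:
          - db
        ports:
          - "7990:7990"                          # web UI
          - "7999:7999"                          # SSH for git
        volumes:
          # Map the restored home (JDBC driver included) into the container.
          - ./home/bitbucket:/var/atlassian/application-data/bitbucket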

Given this, restoring the full Bitbucket server to a computer running Docker would be as simple as opening up your shell and writing:

./prep_for_local_docker_restore.sh
docker-compose up

And in a matter of minutes you’d be able to point your browser to localhost:7990 and keep on going while waiting for the ProLiant server’s new disk.

Backups Done Right

For all those who are still riding the “Crashplan Home” wagon, you know that time is running out.

The disk is the single component of your computer that will definitely fail.

I always had the feeling I was doing backups wrong. I’ve done them since I can remember, but I could never shake the feeling that there had to be a better way. From early on, we’re talking the 90’s, I mashed together a home-brewed set of scripts that did the job. It then evolved to an intricate solution involving downtime and Clonezilla, and eventually to the ever-present rsync, where I tried to hammer life into a differential backup. That is, up until 2010, when I smashed head-first into Crashplan and started doing three-tier backups.

Backup to three simultaneous zones, each progressively slower to restore

Crashplan obviously allowed you to back up to Code42’s cloud but excelled at doing backups to multiple zones. 3-2-1 backups are the form of backup recommended by US-CERT and they excel on several levels:

  • Local Backup – fastest to restore from but the first to disappear if instead of a broken hard drive you experience theft or a fire. I know a couple of people and at least one business who only had local backups and were left empty-handed after robberies.

  • Nearby Backup – a little slower, as you’ll have to drive somewhere, but fast enough to go live on the same day; still prone to area-wide disasters like floods. I don’t know a single person who experienced both local and nearby backup failure, but it may happen and if it does…

  • Cloud Backup – the safest of them all, also the slowest; depending on the amount of information it may take you a long time to download the full set of data (I’ve witnessed a friend’s cloud recovery take almost 9 days). Nothing feels safer, though, than knowing your data is ten thousand miles away, stored inside a former nuclear bunker by a team of dedicated experts.

Crashplan, against what the majority of the industry was doing in 2010, was unparalleled at this 3-2-1 backup with just a little configuration. While the only cloud you could back up to was theirs, it offered a persuasive product in which you could send your data almost everywhere else – another drive in your computer, a friend’s machine across town or your own server on the other side of the Atlantic. Cloud-wise, for obvious reasons, it’s a walled-garden product; there’s no compelling reason for Code42 to allow you to use the competition.

Sometimes you’re forced to take an alternative path…

Until I had to re-evaluate my options…

And it’s fortunate that sometimes you’re forced to take the other side of the pathway. You see, 9 years ago Stefan Reitshamer started what is now known as Arq Backup and it’s fundamentally the tool we were waiting for. It’s got everything the alternatives have, with the fundamental difference of freeing you from a specific cloud. Your software. Your data. Your choice of cloud.

Arq Backup has all the fundamentals correct. What a nice surprise it was when I dug up resource usage:

Compared to Crashplan, Arq Backup is minimal on resources.

The values above are at idle. During backups Arq will go up to 100 MB and take about 5% CPU, whereas Crashplan will go up to 800 MB and eat about 8%. CPU values will obviously differ depending on your processor.

What about price?
Crashplan for unlimited data charged $9/month.
Arq Backup has a one-time fee of $50 and from then on it depends on the storage you choose. It’s your choice, really.

Prices for Q3 2018:

Provider           $/GB/month    GB to reach Crashplan’s $9
Amazon S3          $0.021        428 GB
Backblaze B2       $0.005        1800 GB
Google Nearline    $0.010        900 GB
Google Coldline    $0.007        1285 GB
Microsoft Azure    $0.018        500 GB

Your mileage will vary, but my 200 GB of digital trash are currently costing me $1/month on Backblaze B2. Should I also go with Backblaze’s own backup client? Definitely not! While I do appreciate what Backblaze has been doing from the very start (their blog is mandatory reading), switching clouds using Arq is as simple as pointing to another location. For me, personally, the cost of Arq will be offset in 6 months. But it’s not even that which compels me to use it; it’s the freedom you gain. It’s knowing that if Amazon goes bananas tomorrow and drops prices to $0.001/GB you can switch clouds with a simple click, keeping the backup client you’re already familiar with. No alternate setup. No extra testing.

Don’t just take my word for it:

This post is not sponsored; I put my money on software I believe in.


Edit:
Mike from PhotoKaz brought to my attention the price difference between B2 and Backblaze’s personal backup once the amount of data starts piling up.

I have 5TB of data stored at Backblaze, if I put that on their B2 service it would cost >$300/year instead of $60. I’ll pass.

Backblaze regular gives you unlimited data. Storage has a direct physical correlation to drives, thus the idea of it being inexhaustible doesn’t scale. Like Backblaze says, BitCasa, Dell DataSafe, Xdrive, Mozy, Amazon, Microsoft and Crashplan all tried but eventually discontinued unlimited plans. The only two still in the game are Backblaze and Carbonite, which can only offer it by offsetting the returns from the majority against the few who effectively cost more than they pay.

As an end customer you should go with the deal that best fits you.

As such, the cost-effective sweet spot for B2 is everything below 1 TB.

Size     Backblaze    B2
100 GB   $60/year     $6/year
500 GB   $60/year     $30/year
1 TB     $60/year     $60/year
5 TB     $60/year     $300/year

But I like Arq Backup and its inherent flexibility so much that if I ever found myself on the 5 TB+ tier I’d probably still use the Arq client and set up a NAS at a remote location. Backups are completely encrypted and I’d keep the flexibility of choice. Mind you, an 8 TB NAS goes for $400, so Backblaze’s home plan, cost-wise, is probably still the better choice. Thanks for the input, Mike!

Massively Distributed Spear Phishing

The difference between phishing and spear phishing is precisely that the latter targets a specific individual, which makes the subject of this post – massively distributed spear phishing – particularly convoluted. The thing is, we are there. We are now at a time and place where spear phishing targets millions of people simultaneously.

From: Random Name [email protected]
Subject: your-hopefully-old-password – your username
To: [email protected]

I know, dragon is one of your Password and now I will cut to the chase. You don’t know anything about me but I know you very well and you must be thinking why you are receiving this email, correct? (…) scare tactics – you did something nasty and I’ve got you (…) BTC ADDRESS: 1K5xuXn573Uyh49qhgwuvfEMPvVtuMVkGJ Notice: You have one day in order to make the payment

Photo by Jeremy Bishop

This week alone I got more than half a dozen reports from friends and colleagues who were targeted by this very scam. Even for me, the volume was a novelty. I know Krebs has focused on this specific sort of attack for over 7 years, but only now do I believe we’ve gone mainstream.

Why now?
Scammers are clever. You’re probably bombarded by three or four scams weekly: the inheritance scam, the no-effort high-paying job, the your-computer-has-been-hacked scam. And there is something common to all of them; they’re awfully constructed. Poorly written. Imperfectly crafted. You would guess that an oil magnate from Nigeria would have a better-looking email address than [email protected] All those tell-tales? They are there for a reason. It’s on purpose. A filter to weed out the less gullible. They are looking for the easy targets. The highest return on investment, which is quite the opposite of spear phishing.

Oh, no! Pwned on 11 breached sites.

A New Breach?
Doesn’t look like it. MySpace was breached 10 years ago, leaking 360 million records. LinkedIn, 5 years ago. I think it’s simply a matter of cost versus opportunity. Three years ago Google claimed to catch more than 99.9% of spam, and I’m pretty sure the competition is also steering towards those numbers. Throwing stuff at the wall hoping some will stick is just not cutting it anymore, so scammers are simply stepping up.

SCAM 2.0?
We’re bound to see this sort of abuse increase spectacularly. As the cost of both AI and infrastructure plummets, it will become commonplace to track your email to your Facebook, LinkedIn, Instagram and so on, cross-match those with your leaked passwords, and craft this sort of abuse in a disruptive new fashion. The attach-your-old-password-to-your-email scam is just the beginning. Soon you’ll see decade-old pictures you forgot you’d posted online, names of previous companies you worked for and more waved in front of your eyes, presenting a million new opportunities for you to take the bait.

What can you do?
All companies I’ve worked for have had a top-notch defense team who continuously educate the workforce by staging this specific sort of attack against employees to keep them alert and vigilant. If you have any say in your company, start there. As an individual you have the responsibility of protecting yourself by:

  • using a password manager and letting it generate a long, unique password for every site;
  • turning on 2FA, at the very least for your email and the password manager itself;
  • checking your exposure on Have I Been Pwned and rotating anything that shows up in a breach.

If this very same scam started like this:

I know, PhrBjyS0QNk2h%Z4y^HwxGoqrv#UVS is one of your Password and now I will cut to the chase…

You would not have the intended jolt of adrenaline (in fact you’d have no recollection of such a password ever being yours). But you’d do your due diligence, search for it in LastPass’s vault, replace it if need be, and just forget about it.

Email and the password manager are your last lines of defense. The castle within the castle. Treat them as different beasts. 2FA for both is mandatory.

In the digital age, stay long and unique. Stay safe.
Obligatory xkcd.

Addendum:
Only now did I stumble on Krebs’s take on this phenomenon. Krebs being Krebs, it’s a definite must-read.

Cookie in the Jar

Privacy. Few words seem more ubiquitous in this day and age.

When we started the joyride the world was a different beast; nowadays the TV is tracking you, the refrigerator knows when you’re home and websites follow you like Dick Tracy.

While the angry mob points its finger at the ever so delicious cookie, how did it come to be?

Dick Tracy by Thomas Pitilli (http://www.thomaspitilli.com/)

Where does the name come from?

There are several explanations. Lou Montulli, an engineer at Netscape in ’94, says he came up with the concept after a brief meeting about producing a site with a shopping cart; no technology was available at the time to store a user session.

Montulli’s blog (archive.org): I had heard the term “magic cookie” from an operating systems course from college. The term has a somewhat similar meaning to the way Web Cookies worked and I liked the term “cookies” for aesthetic reasons. Cookies was the first thing I came up with and the name stuck.

Wikipedia refers to the man page of fseek from ’79, which states:

ftell returns the current value of the offset relative to the beginning of the file associated with the named stream. It is measured in bytes on UNIX; on some other systems it is a magic cookie, and the only foolproof way to obtain an offset for fseek.

Given that it precedes Netscape, where did it originally come from? There are some wild guesses. One of them points us to Odd Bodkins, a cartoon strip which ran from ’63 to ’70, where the term magic cookie was a euphemism for LSD.

Dan O’Neill on Magic Cookie Land, from the original strip

There are still others who point to the resemblance between a cookie storing information and the slip of paper inside a fortune cookie, or even between a cookie jar and the browser’s implementation of cookies.

Are cookies per se a privacy violation? No! While you’re surfing a particular website it’s to be expected that the providing party tracks and stores information about what you’re doing. It’s their site. Their server. It’s how your email remembers who you are. How your bank identifies you. Any site that offers a login needs to keep tabs on who you are.
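As a rough sketch (hostnames and values are made up), a session cookie is just a small token the server hands your browser once, and which the browser echoes back on every subsequent request:

    HTTP/1.1 200 OK
    Set-Cookie: session_id=8f2b9c17; Path=/; Secure; HttpOnly

    GET /inbox HTTP/1.1
    Host: mail.example.com
    Cookie: session_id=8f2b9c17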

Why the fuss then? Corporations jumped on the bandwagon and started tracking you across a myriad of sites. That Facebook “like button” you see on blog posts? It can ping back to Facebook and report what site you’re visiting, even if you never click it. Anything that’s served from a third party can aggregate information about your visits to every site where it is present.
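The ping-back is nothing exotic; it’s a plain request to the third party that happens to carry both the page you’re reading and the cookie that identifies you there (an illustrative example, not a real capture):

    GET /plugins/like.php?href=https://some-blog.example/post HTTP/1.1
    Host: www.facebook.com
    Referer: https://some-blog.example/post
    Cookie: c_user=100000123456789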

6% of all requests saved

What can you do? The fastest route to improve privacy while speeding up your browsing experience is to install something like uBlock Origin.

You’ll probably want to white-list a few advertisers. At the time of writing I’ve got 42 domains whitelisted (yes, even AdWords). Not all publicity is bad and websites depend on it.

If you’re feeling particularly inclined to deal with specific elements you can instead install uMatrix (from the same developer), where you can fine-tune your preferences.

The following sample config file for uMatrix is pretty self-descriptive and may help you start tuning your own rules.
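Something like the ruleset below works as a starting point; it’s a sketch rather than my exact file, and the whitelisted CDN hostnames are just examples. Each line reads source hostname, destination hostname, request type, verdict:

    * * * block
    * * css allow
    * * image allow
    * * frame block
    * 1st-party * allow
    * 1st-party frame allow
    * ajax.googleapis.com script allow
    * cdnjs.cloudflare.com script allow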

Message in a Bottle

While running on the beach I stumbled upon the bottle below. Inside was a memorandum to the writer’s future self detailing present events and pitfalls.

In a way that’s what commenting should be about; a message to thy future self about the whys of here and now. You’re most likely not the one who’s going to read it, but that rusty piece of code, weathered by years of sunshine and salt, that you left for the world will be much better with a message attached.

Stumbled upon this last afternoon

The message I found was emotionally sound and really everything you’d expect from a message sent to the future. Writing proper comments, though, even considering the poetic aspect of coding, should stick to some rules that are always handy to remember.

1) Remember that code is always better than comments.
While the following Java method comment may look ok, it probably has better alternatives.
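Something in this spirit, a made-up example where User is a hypothetical domain class:

    class User {
        private final String nickname;            // never set for most accounts
        User(String nickname) { this.nickname = nickname; }
        String getNickname() { return nickname; }
    }

    class UserProfiles {
        // Returns the user's nickname, or null when the user never picked one.
        String nickname(User user) {
            return user.getNickname();
        }
    }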

If you’re using an editor like IntelliJ you can use @Nullable or @NotNull to make your intentions explicit. If not, you always have the handy Optional<String>, which will at least fail loudly at run time when the empty case is ignored.
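Reusing the hypothetical User class from above, either alternative moves the contract out of prose and into something the compiler and the IDE can check:

    import java.util.Optional;
    import org.jetbrains.annotations.Nullable;

    class UserProfilesExplicit {
        // Alternative 1: the annotation makes the contract part of the signature
        // and lets IntelliJ warn callers that skip the null check.
        @Nullable
        String nicknameOrNull(User user) {
            return user.getNickname();
        }

        // Alternative 2: the return type itself forces every caller to handle absence.
        Optional<String> nickname(User user) {
            return Optional.ofNullable(user.getNickname());
        }
    }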

2) Document intention, not functionality
From the programmer’s bible, Code Complete, you get that IBM ran a study over several months in which maintenance programmers reported that the most difficult problem they faced was understanding the original author’s intent.
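Picture a snippet along these lines; the pattern and the remark are illustrative, not lifted from a real codebase:

    import java.util.regex.Pattern;

    class SignupValidator {
        // TLDs longer than 6 letters kept breaking the matcher, so cap it at 6.
        static final Pattern EMAIL =
                Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}$");
    }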

The above regex parses emails. The comment merely describes a problem the programmer had with top-level domains; it’s not descriptive of its intent but merely focuses on the functionality. The following comment not only specifies the intent but also provides the foundation for the chosen rules.
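For instance, a comment that records the why behind the same rule (again illustrative):

    // Intent: a cheap sanity check before we send the confirmation mail, not a
    // full RFC 5322 validator. TLDs are capped at 6 letters because that covers
    // every address currently in our user base; relax the cap if longer TLDs
    // start showing up in sign-ups.
    static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}$");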

While hard to do when your mind is focused on the problem at hand, comments like these provide the higher level of abstraction needed by your later self.

About that bottle? I did what anyone would do: tagged it with #mafaldasmessageinabottle and freed it back to the ocean.