ZFS Dataset Hierarchy | Data Hoarder Edition

ZFS is flexible and will let you name and organize datasets however you choose, but before you start building datasets there are some ways to make management easier in the long term.  I’ve found the following convention works well for me.  It’s not “the” way by any means, but I hope you find it helpful; I wish tips like this had been written down when I built my first storage system 4 years ago.

Here are my personal ZFS best practices and naming conventions to structure and manage ZFS data sets.

ZFS Pool Naming

I never give two zpools the same name, even if they’re in different servers, on the off-chance that sometime down the road I’ll need to import two pools into the same system.  I generally name my zpools tank[n], where n is an incremental number that’s unique across all my servers.

So if I have two servers, say stor1 and stor2, I might have two zpools:

stor1.b3n.org: tank1
stor2.b3n.org: tank2

Top Level ZFS Datasets for Simple Recursive Management

Create a top level dataset called ds[n], where n is a unique number across all your pools, just in case you ever have to bring two separate datasets onto the same zpool.  The reason I like to create one main top-level dataset is that it makes it easy to manage high level tasks recursively on all sub-datasets (such as snapshots, replication, backups, etc.).  If you have more than a handful of datasets you really don’t want to be configuring replication on every single one individually.  So on my first server I have:

tank1/ds1

I usually mount tank1/ds1 as read-only from my CrashPlan VM for backups.  You can configure snapshot tasks, replication tasks, and backups all at this top level and be done with it.
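On FreeNAS these are just recursive snapshot and replication tasks pointed at ds1, but the same idea from the command line looks roughly like this (the snapshot name and the receive target on stor2 are only placeholders):

    # snapshot ds1 and every dataset underneath it
    zfs snapshot -r tank1/ds1@2016-08-01
    # replicate the entire tree to the second server
    zfs send -R tank1/ds1@2016-08-01 | ssh stor2.b3n.org zfs receive -F tank2/ds1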

ZFS snaps and pruning recursively managed at the top level dataset

Name ZFS Datasets for Replication

One of the reasons to have a top level dataset is if you’ll ever have two servers…

stor1.b3n.org
   | - tank1/ds1

stor2.b3n.org
   | - tank2/ds2

I replicate them to each other for backup.  Having that top level ds[n] dataset lets me manage ds1 (the primary dataset on the server) completely separately from the replicated dataset (ds2) on stor1.

stor1.b3n.org
 | - tank1/ds1
 | - tank2/ds2 (replicated)

stor2.b3n.org
 | - tank2/ds2
 | - tank1/ds1 (replicated)

Advice for Data Hoarders.  Overkill for the Rest of Us


The ideal is to back up everything.  But in reality storage costs money and WAN bandwidth isn’t always available to back everything up remotely.  I like to structure my datasets so that I can manage them by importance.  So under the ds[n] dataset create sub-datasets:

stor1.b3n.org
 | - tank1/ds1/kirk – very important – family pictures, personal files
 | - tank1/ds1/spock – important – ripped media, ISO files, etc.
 | - tank1/ds1/redshirt – scratch data, tmp data, testing area
 | - tank1/ds1/archive – archived data
 | - tank1/ds1/backups – backups

Kirk – Very Important.  Family photos, home videos, journal, code, projects, scans, crypto-currency wallets, etc.  I like to keep four to five copies of this data using multiple backup methods and multiple locations.  It’s backed up to CrashPlan offsite, rsynced to a friend’s remote server, snapshots are replicated to a local ZFS server, plus an annual backup to a local hard drive for cold storage.  That’s 3 copies onsite, 2 copies offsite, 2 different file-system types (ZFS, XFS) and 3 different backup technologies (CrashPlan, rsync, and ZFS replication).  I do not want to lose this data.

Important data is backed up to multiple geographic locations

Spock – Important.  Important data that would be a pain to lose, might cost money to reproduce, but it isn’t catastrophic.  If I had to go a few weeks without it I’d be fine.  For example, rips of all my movies, downloaded Linux ISO files, Logos library and index, etc.  If I lost this data and the house burned down I might have to repurchase my movies and spend a few weeks ripping them again, but I can reproduce the data.  For this dataset I want at least 2 copies: everything is backed up offsite to CrashPlan, and if I have the space, local ZFS snapshots are replicated to a 2nd server, giving me 3 copies.


Redshirt – This is my expendable dataset.  This might be a staging area to store MakeMKV rips until they’re transcoded, or where I do video editing or test out VMs.  This data doesn’t get backed up… I may run snapshots with a short retention policy.  Losing this data would mean losing no more than a day’s worth of work.  I might also set sync=disabled to get maximum performance here.  And typically I don’t do ZFS snapshot replication to a 2nd server.  In many cases it will make sense to pull this out from under the top level ds[n] dataset and have it be by itself.

Backups – This dataset contains backups of workstations, servers, and cloud services.  I may back up the backups to CrashPlan or some online service, and usually that is sufficient since I already have multiple copies elsewhere.

Archive – This is data I no longer use regularly but don’t want to lose.  Old school papers that I’ll probably never need again, backup images of old computers, etc.  I set this dataset to compression=gzip-9, back it up to CrashPlan plus a local backup, and try to have at least 3 copies.
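From the command line those per-dataset properties are just zfs set calls; for example (dataset names as above):

    zfs set compression=gzip-9 tank1/ds1/archive
    zfs set sync=disabled tank1/ds1/redshirt
    zfs get compression,sync tank1/ds1/archive tank1/ds1/redshirt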

Now, you don’t have to name the datasets Kirk, Spock, and Redshirt… but the idea is to identify importance so that you’re only managing a few datasets when configuring ZFS snapshots, replication, etc.  If you have unlimited cheap storage and bandwidth it may not be worth it to do this, but it’s nice to have the option to prioritize.

Now… once I’ve established that hierarchy I start defining my datasets that actually store data which may look something like this:

stor1.b3n.org
| - tank1/ds1/kirk/photos
| - tank1/ds1/kirk/git
| - tank1/ds1/kirk/documents
| - tank1/ds1/kirk/vmware-kirk-nfs
| - tank1/ds1/spock/media
| - tank1/ds1/spock/vmware-spock-nfs
| - tank1/ds1/spock/vmware-iso
| - tank1/ds1/redshirt/raw-rips
| - tank1/ds1/redshirt/tmp
| - tank1/ds1/archive
| - tank1/ds1/archive/2000
| - tank1/ds1/archive/2001
| - tank1/ds1/archive/2002
| - tank1/ds1/backups
| - tank1/ds1/backups/incoming-rsync-backups
| - tank1/ds1/backups/windows
| - tank1/ds1/backups/windows-file-history

 

With this ZFS hierarchy I can manage everything at the top level of ds1 and just set up the same automatic snapshots, replication, and backups for everything.  Or if I need to be more precise I have the ability to handle Kirk, Spock, and Redshirt differently.

 

Intranet SSL Certificates Using Let’s Encrypt | DNS-01

Let’s Encrypt is a great service offering the ability to generate free SSL certs.  The way it normally works is the http-01 challenge: to respond to the Let’s Encrypt challenge the client (typically Certbot) puts an answer in the webroot.  Let’s Encrypt makes an HTTP request, and if it finds the response to the challenge it issues the cert.

Certbot

Certbot is great for public web-servers.

Generating Intranet SSL Certs Using DNS-01 Challenge

But what if you’re generating an SSL certificate for a mail server, or a Mumble server, or anything but a web-server?  You don’t want to spin up a web-server just for certificate verification.

Or what if you’re trying to generate an SSL certificate for an intranet server?  Many homelabs, organizations, and businesses need publicly signed SSL certs on internal servers.  You may not even want external A records for these services, much less a web-server for validation.

ACME DNS Challenge

Fortunately, Let’s Encrypt introduced the DNS-01 challenge in January of 2016.  Now you can respond to a challenge by creating a TXT record in DNS.
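For example, once the client has provisioned the record you can see the challenge response yourself (example.com and the token value below are placeholders):

    dig +short TXT _acme-challenge.example.com
    "gfj9Xq-placeholder-challenge-token"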

ACME Let's Encrypt DNS-01 Challenge Diagram

 

Lukas Schauer wrote dehydrated (formerly letsencrypt.sh) which can be used to automate the process.  Here’s a quick guide on Ubuntu 16.04, but it should work on any Linux distribution (or even FreeBSD).

Install dehydrated / letsencrypt.sh
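The original install commands aren’t reproduced here; a minimal sketch, run as root, assuming you clone the script from GitHub (the project now lives under the dehydrated-io organization) and put it on the PATH:

    cd /opt
    git clone https://github.com/dehydrated-io/dehydrated.git
    ln -s /opt/dehydrated/dehydrated /usr/local/bin/dehydrated
    mkdir -p /etc/dehydrated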

Hook for DNS-01 Challenge

At this point, you need to install a hook for your DNS provider.  If your DNS provider doesn’t have a hook available you can write one against their API, or switch to a provider that has one.

If you need to pick a new provider with a proper API my favorite DNS Providers are CloudFlare and Amazon Route53.  CloudFlare is what I use for b3n.org.  It gets consistently low latency lookup times according to SolveDNS, and it’s free (I only use CloudFlare for DNS, I don’t use their proxy caching service which can be annoying for visitors from some regions).  Route53 is one of the most advanced DNS providers.  It’s not free but usually ends up cheaper than most other options and is extremely robust.  The access control, APIs, and advanced routing work great.  I’m sure there are other great DNS providers but I haven’t tried them.

Here’s how to set up a CloudFlare hook as an example:
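One commonly used hook is kappataumu’s letsencrypt-cloudflare-hook (which matches the directory name referenced below); a sketch, assuming it’s cloned next to the dehydrated config and that Python 3 and pip are installed:

    cd /etc/dehydrated
    git clone https://github.com/kappataumu/letsencrypt-cloudflare-hook
    pip3 install -r letsencrypt-cloudflare-hook/requirements.txt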

In letsencrypt-cloudflare-hook/hook.py change the top line to point at python3:
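That is, the shebang on the first line becomes:

    #!/usr/bin/env python3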

Config File

Edit the “/etc/dehydrated/config” file… add or uncomment the following lines:
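The exact lines aren’t reproduced above; roughly, you want the challenge type, the hook path, and the CloudFlare credentials the hook reads from the environment (the email address and API key here are placeholders):

    CHALLENGETYPE="dns-01"
    HOOK="/etc/dehydrated/letsencrypt-cloudflare-hook/hook.py"
    CONTACT_EMAIL="you@example.com"
    # credentials the CloudFlare hook reads from the environment
    export CF_EMAIL="you@example.com"
    export CF_KEY="your-cloudflare-api-key"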

domains.txt

Create an /etc/dehydrated/domains.txt file, something like this:
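The hostnames here are only placeholders; one certificate per line, with additional names on a line becoming SANs on that certificate:

    mail.example.com
    mumble.example.com
    nas.example.com
    vpn.example.com
    www.example.com example.com blog.example.com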

The first four lines will each generate their respective certificates; the last line creates a multi-domain or SAN (Subject Alternative Name) cert with multiple entries in a single SSL certificate.

Finally, run
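Presumably something like the following; the --register step is only needed the first time, to accept the Let’s Encrypt terms:

    dehydrated --register --accept-terms
    dehydrated -c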

The first time you run it, it should get the challenge from Let’s Encrypt and provision a DNS TXT record with the response.  Once validated, the certs will be placed under the certs directory, and from there you can distribute them to the appropriate applications.  The certificates will be valid for 90 days.

For subsequent runs dehydrated will check whether the certificates have fewer than 30 days left and attempt to renew them.

Automate

It would be wise to run dehydrated -c from cron once or twice a day and let it renew certs as needed.
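For example, a root crontab entry (the times and log path are arbitrary):

    # m h dom mon dow  command
    30 3 * * *  /usr/local/bin/dehydrated -c >> /var/log/dehydrated.log 2>&1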

To deploy the certs to the respective servers I suggest using an IT automation tool like Ansible.  You can configure an Ansible playbook to run from a daily cron job to copy updated certificates to remote servers and automatically reload services if the certificates have been updated.  Here’s an example of an Ansible playbook which could be called daily to copy certs to all web-servers and reload nginx if the certs were updated or renewed:
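The original playbook isn’t reproduced here; a minimal sketch of the idea, assuming the certs live under /etc/dehydrated/certs on the machine running dehydrated, the web-servers are in an nginx_servers inventory group, and each host’s cert directory matches its inventory hostname:

    ---
    - hosts: nginx_servers
      become: true
      tasks:
        - name: Copy renewed certificate and key
          copy:
            src: "/etc/dehydrated/certs/{{ inventory_hostname }}/{{ item }}"
            dest: "/etc/ssl/private/{{ item }}"
            owner: root
            group: root
            mode: 0600
          with_items:
            - fullchain.pem
            - privkey.pem
          notify: reload nginx
      handlers:
        - name: reload nginx
          service:
            name: nginx
            state: reloaded

Because the copy task only notifies the handler when a file actually changes, nginx is reloaded only on days when a certificate was renewed.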

 

PSD is not my favourite file format

This programmer does not like the PSD File Format:

/*

At this point, I’d like to take a moment to speak to you about the Adobe PSD format.

PSD is not a good format. PSD is not even a bad format. Calling it such would be an insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having worked on this code for several weeks now, my hate for PSD has grown to a raging fire that burns with the fierce passion of a million suns.

If there are two different ways of doing something, PSD will do both, in different places. It will then make up three more ways no sane human would think of, and do those too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide that *these* particular chunks should be aligned to four bytes, and that this alignement should *not* be included in the size? Other chunks in other places are either unaligned, or aligned with the alignment included in the size. Here, though, it is not included. Either one of these three behaviours would be fine. A sane format would pick one. PSD, of course, uses all three, and more.

Trying to get data out of a PSD file is like trying to find something in the attic of your eccentric old uncle who died in a freak freshwater shark attack on his 58th birthday. That last detail may not be important for the purposes of the simile, but at this point I am spending a lot of time imagining amusing fates for the people responsible for this Rube Goldberg of a file format.

Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this, I had to apply to them for permission to apply to them to have them consider sending me this sacred tome. This would have involved faxing them a copy of some document or other, probably signed in blood. I can only imagine that they make this process so difficult because they are intensely ashamed of having created this abomination. I was naturally not gullible enough to go through with this procedure, but if I had done so, I would have printed out every single page of the spec, and set them all on fire.

Were it within my power, I would gather every single copy of those specs, and launch them on a spaceship directly into the sun.

PSD is not my favourite file format.

*/

— code comment from https://github.com/gco/xee/blob/master/XeePhotoshopLoader.m#L108

 

 

RHEL/CentOS, Debian, Fedora, Ubuntu & FreeBSD Comparison

Over the years I’ve used a number of Linux distributions (and FreeBSD).  These are my top 5 and how I rank them:

(Score comparison chart: CentOS, Debian, Fedora, Ubuntu, FreeBSD)

Desktop

I’m not a big fan of Ubuntu’s Unity, so Ubuntu-Gnome, Kubuntu, Debian and Fedora are my top distros for desktop choices.  If you want the latest Gnome features Fedora gets them first.  For KDE I think Kubuntu does a great job at reasonable default settings (like, say, having the Start button open the KDE menu; why do KDE programmers think that shouldn’t be default behavior?) where I have to do quite a bit more tweaking on other distros.  Ubuntu-Gnome also provides an optional PPA which tracks the latest version of Gnome, bringing it almost as up to date as Fedora.

Ugly fonts – for some reason, on FreeBSD, Fedora, CentOS, and Debian the fonts look ugly… I don’t know if they can’t detect my video card properly or if there’s something wrong with the fonts themselves but on every system I’ve tried the fonts look much better on Ubuntu based distributions.

If you’re interested in FreeBSD for a desktop, PC-BSD is worth a look, but in my experience Linux runs a lot better on the desktop than FreeBSD.

Server

FreeBSD is historically my favorite server OS, but it tends to lag behind on some things and I have trouble getting some software working on it, so for the most part I use Ubuntu for servers as it seems to have the best out of the box setup.  90% of the time I’m deploying in virtual environments, and open-vm-tools is now enabled by default in 16.04.

With perhaps the exception of Fedora all the distros make decent servers.

Packages

All the package management systems are pretty decent.  I do prefer apt just because I never have any problems with it and it’s faster.  Debian and Ubuntu have the most packages available, and Ubuntu has PPA support which makes it easy to manage 3rd party repositories.

One thing I don’t like about Debian: while it does have a lot of packages, a lot of them are out of date.  A few months ago I tried to install Redmine from the repository, and even though the repository listed it at version 3.0, the actual version that was installed was 2.6.  Someone needs to do some clean up.

CentOS hardly offers any packages, so you have to enable EPEL just to make it functional, and even then it’s limited.  My main issue with CentOS is that if you want to do anything other than a very basic install you’re dealing with not finding packages (like rdiff-backup; why isn’t that in the repos?) or needing packages from conflicting repositories and sometimes having to download them manually.  It’s a nightmare.

One other thing I like about apt is the philosophy of Debian and Ubuntu of setting up a sensible default configuration and enabling the service.  After installing packages on Fedora, CentOS, or FreeBSD I’m often left manually creating configuration files.  CentOS is the most annoying; maybe it’s just me, but if I install a service I want SELinux to not block me from running that service… and when I make a change in SELinux it should take effect immediately instead of arbitrarily taking a few minutes to come to its senses.

Free Software

Richard Stallman
By – Thesupermat – CC BY-SA 3.0

While Richard Stallman wouldn’t endorse any of the distributions I’m comparing, if he had to choose from these Debian would likely be his choice.

All the OSes include or provide ways of obtaining non-free software, but Debian is at the forefront of making it a goal to move to Free Software.  Fortunately I think they do this in a smart way where they still include ways to install non-free drivers so you can at least make a system usable.  I think Debian does the best job of making it clear what’s free and what isn’t, and allowing the user to make the choice.

 

Evilness

I used to be a big RedHat fan back in the RH 6 and 7 days.  Then one day my loyalty was rewarded when out of the blue RedHat decided to start charging for updates for their “Free” OS… RedHat’s new free alternative was Fedora, which was so unstable it was unusable.  I was suddenly going to need to buy lots of licenses… this left me scrambling for a solution and I eventually switched over to Ubuntu.  Since then I’m wary about anything related to RedHat.  CentOS is now the free version of RedHat while Fedora is where all the new features land, and it’s not so unstable these days.  And, yes, RedHat, I’m still bitter.

Ubuntu introduced Amazon ad-supported searches and, even worse, was by default sending search keywords from the Unity lens to Canonical.  I’d consider this an invasion of privacy, and it was really the first time I started looking for Ubuntu alternatives after I switched from RedHat.  Fortunately the feature was easy to disable, and Ubuntu has since disabled it by default.

Out of Box Hardware Support

Ubuntu has the best out of box hardware support.  Dell’s XPS 13 even comes in a developer edition that ships with Ubuntu 14.04 LTS.  It works out of the box on just about every laptop I’ve tried it on.  Also it was the first distro to support VMware’s VMXNET3 and SCSI Paravirtual driver in the default install, and now I believe it’s the only distro that has open-vm-tools pre-installed.  All this cuts down on the amount of time and effort it takes to deploy.

I wish Debian did better here.  Debian excludes some non-free drivers, which is good for the FSF philosophy, but it also means I had no WiFi on a fresh Debian install.  Apparently you’re supposed to download the drivers separately.  This is particularly bad when your laptop doesn’t have an Ethernet port, so you have no way to download the WiFi drivers.  I suppose I could have re-installed Ubuntu, downloaded the Debian WiFi drivers, saved them off to a USB drive, re-installed Debian and side-loaded the WiFi drivers… but what a hassle.

Automatic Security Updates

Ubuntu and Debian give the option of enabling automatic security updates at install time.  The other systems have ways of enabling automatic updates but there isn’t an option to enable it by default at install time.  My opinion is all operating systems should automatically install security updates by default.

Init System

FreeBSD avoids the nonsense for the win here.  I do not like systemd.  I’d rather spend time not fighting systemd.  Maybe I can figure it out someday.  Why didn’t we all switch to Upstart?  I liked Upstart.

Cutting Edge vs Stability

For cutting edge, Fedora or Ubuntu’s standard (every 6 months) releases keep you up to date, which is great for staying current on a Desktop Environment.

FreeBSD is the most stable OS I’ve ever used.  If I was told I was building a solution that would still be around in 30 years I’d probably choose FreeBSD.  Changes to the base system are rare and well thought out.  If you wrote a program or script on FreeBSD 10 years ago it would probably still work today on the latest version.  In the Linux world, Debian stable, Ubuntu’s LTS (after the first point release), and CentOS (also after the first point release) are great options.

Ubuntu provides the best of both worlds, staying cutting edge while offering LTS releases, which I find very beneficial for having a stable environment that still has relevant development tools and up to date server environments.  If you need something newer you have PPAs, but most of the time the standard packages are new enough.  Right now, for example, Ubuntu 16.04 LTS is the only distribution that ships with versions of OpenSSL and NGINX that support an http/2 implementation that works with Google Chrome.  To top it off, both the OpenSSL and NGINX packages fall under Ubuntu’s 5-year support.  You don’t have to add 3rd party repos or solve dependency issues.  Just one command: “apt install nginx” and you’re good for 5 years.

Ubuntu 16.04 LTS is the only distro that supports http/2

(above screenshot from: https://www.nginx.com/blog/supporting-http2-google-chrome-users/)

Upgrading

FreeBSD is the best OS I’ve ever used at upgrading to a newer release.  You could probably start at FreeBSD 4 and upgrade all the way to 11 with no issues.  Debian and Ubuntu also have pretty good upgrade support… in all cases I test upgrading before doing it on a production system.

Long Term Support (LTS)

CentOS has the longest support offering at 10 years!  Combined with the EPEL repository (which has the same goal), I’d say RedHat/CentOS is the best distribution for a “deploy and forget” application that gets thrown in /opt, if you don’t want to worry about changes or upgrades breaking the app for the next 10 years.  This is probably why enterprise applications like this distribution.

Debian is just starting a 5-year LTS program through a volunteer effort.  I’m looking forward to seeing how this goes.  I’m glad to see this change as lack of LTS was one of the main reasons I decided on Ubuntu over Debian.

Ubuntu offers a 5-year LTS.  Ubuntu’s LTS not only covers the base system; the Ubuntu team also supports many packages (use “apt-cache show packagename” and if you see 5y you’re good).
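For example, on an Ubuntu LTS system (nginx here is just an example package; output trimmed):

    apt-cache show nginx | grep -i ^supported
    Supported: 5y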

Predictable Release Cadence

(Ubuntu desktop release cadence chart)

Ubuntu has the most predictable release cadence.  They release every 6 months, with a 5-year LTS release every 2 years.  Having been a sysadmin and a developer, I like knowing exactly how long systems are supported.  I plan out development, deployments, and upgrades years in advance based on Ubuntu’s release cadence.

My Thoughts

When I was younger it was fun to build my entire system from scratch using Gentoo and compile FreeBSD packages from ports (I also compiled the kernel).  Linux wasn’t as easy back then.  I remember just trying to get my scroll wheel working in RedHat 7.

Screenshot of how to get the scroll wheel working
I found this old note.  I finally got the scroll wheel working in RedHat 7.1!

Linux distributions are tools.  At some point you have to stop trying to build the perfect hammer and start using it to put nails in things.

Nowadays I don’t have time to compile from scratch, solve RPM dependency issues, or find out why packages aren’t the right version.  In the year 2000 I could understand having to fix ugly font issues and mess around with WiFi drivers.  But we should be beyond that now.  That was the past.

Calvin and Hobbes Comic Strip
By Bill Watterson, 1995-08-27, Fair Use – 17 U.S.C. § 107

Onward

Ben wearing RedHat
I used to wear the official RedHat Fedora

Fonts, automatic updates, scroll wheel, touchpad, bluetooth, wifi, printers, and hardware in general should be working out of the box by now–if it isn’t I’m not going to put a lot of effort into getting the distro working.  It’s time to move forward and focus work on things beyond the distribution–while I love all sorts of distros, I don’t want to be like Calvin fighting the computer the whole way.  I actually do work on them and need something stable and up to date out of the box with sane default settings.  Having predictable release cycles also helps.  If I could combine the philosophy of Debian with the few extras that Ubuntu provides I’d have the perfect distro.  But for the time being Ubuntu is close enough to what I want–I’ve been using it probably since 5.04 (Hoary Hedgehog) and standardized on it when they started doing LTS releases.  That doesn’t mean it’s for everyone, not everyone likes it, some people prefer the more vanilla feel from Debian, others might want something easier like Mint.  If you prefer CentOS, Fedora, Arch, etc. and they work well for you, use them.

Actually I don’t use Ubuntu for everything.  For my production environment I’ve standardized on Windows 10 for desktops, ESXi for virtualization, FreeNAS for storage, pfSense for firewalls, and Ubuntu for servers.  Honestly, none of the above systems were my first choice… but I’m at where I am because my first choices let me down.  It will likely evolve in the future, but for the time being that’s my setup and it works pretty well.

The great thing about modern day Linux distributions (and FreeBSD) is they’re all pretty good.  I haven’t had to hack an Xorg file to get the scroll wheel working in a long time.

 

 

Journey to Facebook

Week 1:

Number of Friends: 6.  (That’s probably enough)
Number of Likes: 0.
Species: Kind of like the Borg.

Defender (Star Trek USS Enterprise) of Freedom vs Facebook (Borg ship)

I see my home, b3n.org, getting further into the distance.  My blog is in one of the most beautiful locations nestled in the mountains between the Tech and Conservative Blogs, definitely more on the Tech side and well away from the Bay of Flame.  I can see the tech blogging area I’m most familiar with getting smaller and smaller.  A few minutes later I see Lifehacker passing by and I’m flying over the Sea of Opinions.   And then it hit me.   I’ve left the Blogosphere.

After a long flight I stop for a layover at Reddit, then I was back in the air and landed just north of Data Mines, Facebook.  And I joined Facebook.  The reason for my travel?  I’m looking for information locked away in a closed Facebook group.

That was last week.

Map of Social Networks showing my travel from the Blogosphere to Facebook

Most of my friends left the Blogosphere for MySpace, and then moved further north to Facebook years ago (and I’ve re-united with six of them so far).   My impression of Facebook so far: It’s like a bunch of mindless drones all talking at once–well, let me start over.  It’s like a bunch of ads all talking at once and mindless drones trying to shout above them.

Facebook is a land I’ve always avoided–it’s basically what AOL or Geocities should have become–a step back from freedom and individuality.

It’s Not Social Networking That’s the Problem

When you join Facebook, you have to abide by their rules and subject yourself to their censorship.  If you disagree with Facebook, you either comply or you’re out.  There’s no alternative.

Websites, Blogging, and Email, on the other hand, are based on what the internet should be: open protocols.  If I run my own email server I can send an email to anybody else no matter what provider they use!  This blog is run on a server I control.  Currently it’s rented from DigitalOcean because I no longer have the bandwidth at my house to run it, but in the past I’ve run it from my dorm room, my bedroom closet, from right under my desk, and from Jeff’s house.  And the thing is, anybody can set up their own server–but they don’t have to.  They can use a provider like Blogger or Gmail if they prefer–but if you can get better service somewhere else you can migrate to a different provider at will and not lose anything.

But Facebook isn’t open and federated.  Facebook users can only talk to other Facebook users and as long as you want to talk to your Facebook friends the only way is to be on Facebook yourself.  The content is all stored on their servers so you are at their mercy for control and privacy of your content.  Or is it your content?  On Facebook, you are not your own individual, or your own community.  You are part of the Borg.

I’m not against social networking, but Facebook is designed in a very centralized manner, which isn’t consistent with how internet services should be: more distributed and federated.  Some social networks I might be more interested in are Friendica and Diaspora, but I don’t think they have much traction yet.

One More Thing

One particularly concerning thing about Facebook is that you don’t pay for it–which means you’re not Facebook’s customer.  No, indeed.  You, my liked Friend, are the product being sold.

 

Automatic Ripping Machine | Headless | Blu-Ray/DVD/CD

The A.R.M. (Automatic Ripping Machine) detects the insertion of an optical disc, identifies the type of media and autonomously performs the appropriate action:

  • DVD / Blu-ray -> Rip with MakeMKV and Transcode with Handbrake
  • Audio CD -> Rip and Encode to FLAC and Tag the files if possible.
  • Data Disc -> Make an ISO backup

It runs on Linux, it’s completely headless and fully automatic, requiring no interaction or manual input to complete its tasks (other than inserting the disc).  Once it completes a rip it ejects the disc for you and you can pop in another one.

Flowchart of Ripping Process

I uploaded the scripts to GitHub under the MIT license.  As of version 1.1.0 (which pulls in muckngrind4’s changes) the ARM can rip from multiple drives simultaneously, and send push notifications to your phone when it’s complete using Pushbullet or IFTTT.

Instructions to get it installed on Ubuntu 14.04 or 16.04 LTS follow.

Automatic Ripping Machine (Supermicro MicroServer) under my desk

ARM Equipment & Hardware

Blu-Ray Hardware and VMware Settings

The ARM is an Ubuntu Linux 16.04 LTS VM running under a VMware server.  At first I tried using an external USB Blu-Ray drive but the VM didn’t seem to be able to get direct access to it.  My server case has a slim-DVD slot on it, so I purchased the Panasonic UJ160 Blu-Ray Player Drive ($45) because it was one of the cheaper Blu-Ray drives.

I wasn’t sure if VMware would recognize the Blu-Ray functions on the drive, but it does!  Once physically installed, edit the VM properties so that it uses the host device as the CD/DVD drive and then select the optical drive.

VMware Machine Properties, select CD/DVD drive, set Device Type to Host Device and select the optical drive.

Regions…

I kept getting this error while trying to rip a movie:

MSG:3031,0,1,"Drive BD-ROM NECVMWar VMware IDE CDR10 1.00 has RPC protection that can not be bypassed. Change drive region or update drive firmware from http://tdb.rpc1.org. Errors likely to follow.","Drive %1 has RPC protection that can not be bypassed. Change drive region or update drive firmware from http://tdb.rpc1.org. Errors likely to follow.","BD-ROM NECVMWar VMware IDE CDR10 1.00"

Defective By Design Logo

After doing a little research I found out DVD and Blu-Ray players have region codes that only allow them to play movies from the region they were intended for–by default the Panasonic drive shipped with the region code set to 0.

World Map with DVD Region Codes
CC BY-SA 3.0 from https://en.wikipedia.org/wiki/DVD_region_code#/media/File:DVD-Regions_with_key-2.svg

Notice that North America is not 0.

Looking at http://tdb.rpc1.org/ it looks like it is possible to flash some drives so that they can play videos from all region codes.  Fortunately, before I got too far down the flash-the-drive path I discovered you can simply change the region code!  Since I’m only playing North American movies I set the region code to 1 using:
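The regionset utility can do this; a sketch assuming the drive shows up as /dev/sr0:

    sudo apt-get install regionset
    sudo regionset /dev/sr0    # prompts for the new region code (1 for North America)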

You can only change this setting 4 or 5 times before it gets stuck, so if you’re apt to watch movies from multiple regions you’ll want to look at getting a drive with firmware you can flash.

Install MakeMKV, Handbrake, ABCDE and At
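The original commands aren’t reproduced here; a sketch for Ubuntu 16.04, assuming the commonly used heyarje/makemkv-beta PPA for the MakeMKV packages (older releases may also need a PPA for HandBrake):

    sudo add-apt-repository ppa:heyarje/makemkv-beta
    sudo apt-get update
    sudo apt-get install makemkv-bin makemkv-oss handbrake-cli abcde flac at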

Mount Samba/CIFS Media Share

If you’re ripping to the local machine skip this section; if you’re ripping to a NAS like I am, do something like this…

In FreeNAS I created a media folder on my data share at \\zfs\data\media

Edit /etc/fstab
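Something like the following line (the mount point, credentials file, and options are assumptions; the cifs-utils package needs to be installed):

    //zfs/data/media  /mnt/media  cifs  credentials=/root/.smbcredentials,iocharset=utf8  0  0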

Once that’s in the file, mount the folder and create an ARM and an ARM/raw folder.
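Which is just:

    sudo mkdir -p /mnt/media
    sudo mount /mnt/media
    sudo mkdir -p /mnt/media/ARM/raw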

Install ARM Scripts

Create a folder to install the Automatic Ripping Scripts.  I suggest putting them in /opt/arm.
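A rough sketch (the clone URL is whatever the ARM repository on GitHub is, and the location of the udev rule inside the repo may differ):

    sudo mkdir -p /opt/arm
    sudo git clone <ARM GitHub repository URL> /opt/arm
    # install the udev rule that fires when a disc is inserted
    sudo cp /opt/arm/51-automedia.rules /lib/udev/rules.d/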

You should look over the config file to make sure it suits your needs; if you want to add Android or iOS push notifications, that’s where to do it.

Figure out how to restart udev, or reboot the VM (make sure your media folder gets mounted on reboot).  You should be set.
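On Ubuntu 16.04, reloading the rules without a reboot is usually just:

    sudo udevadm control --reload-rules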

Automatic Ripping Machine Usage

  1. Insert Disc.
  2. Wait until the A.R.M. ejects the disc.
  3. Repeat

Test out a movie, an audio CD, and a data CD and make sure it’s working as expected.  Check the output logs at /opt/arm/logs and also syslog if you run into any issues.

Install MakeMKV License

MakeMKV will run on a trial basis for 30 days.  Once it expires you’ll need to purchase a key, or while it’s in beta you can get a free key…  I would love to build this solution on 100% free open source software, but MakeMKV saves so much time and is more reliable than anything else I’ve tried.  I will most likely purchase a license when it’s out of beta.

Grab the latest license key from: http://www.makemkv.com/forum2/viewtopic.php?f=5&t=1053

Edit /root/.MakeMKV/settings.conf and add a line:
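The line is the MakeMKV registration key setting (the key value here is a placeholder):

    app_Key = "T-your-makemkv-key-here"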

How it Works

When udev detects a disc insert/eject as defined by /lib/udev/rules.d/51-automedia.rules, it runs the wrapper which in turn runs /opt/arm/identify.sh, which identifies the type of media inserted and then calls the appropriate scripts.  (If you ever need it, this is a great command to get info on a disc):
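A useful one is udevadm’s environment query (assuming the drive shows up as /dev/sr0):

    udevadm info -q env -n /dev/sr0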

Video Discs (Blu-Ray/DVD)

All tracks get ripped using MakeMKV and placed in the /mnt/media/ARM/raw folder.  As soon as ripping is complete the disc ejects and transcoding starts, with HandBrakeCLI transcoding every track into /mnt/media/ARM/timestamp_discname.  You don’t have to wait for transcoding to complete; you can immediately insert the next disc to get it started.

FileBot Screenshot Selecting files for rename

There is some video file renaming that needs to be done by hand.  The ARM will name the folder using the disc title, but this isn’t always accurate.  For a season of TV shows I’ll name the files using FileBot and then move them to one of the Movie or TV folders that my Emby Server looks at.  Fortunately this manual part of the process can be done at any time; it won’t hold up ripping more media.  The Emby Server then downloads artwork and metadata for the videos.

Screenshot of Emby's Movies Page

Audio CDs

If an audio disc is detected it is ripped to FLAC files using the abcde ripper.  I opted for the FLAC format because it’s lossless, well supported, and non-proprietary.  If you’d prefer a different format, abcde can be configured to rip to MP3, AAC, OGG, whatever you want.  I have it dropping the audio files in the same location as the video files, but I could probably just move them directly to the music folder where Emby is looking.

emby_beethovens_last_night

Data Disks (Software, Pictures, etc.)

If the data type is ISO9660 then a script is run to make a backup ISO image of the disc.

Screenshot of TurboTax ISO file

Morality of Ripping

Two Evils: Piracy vs. DRM

I am for neither piracy nor DRM.  Where I stand morally is that I make sure we own every CD, DVD, and Blu-Ray that we rip using the ARM.

I don’t advocate piracy.  It is immoral for people to make copies of movies and audio they don’t own.  On the other hand, there is a difference between piracy and copying for fair use, which publishers often wrongly lump together.

What really frustrates me is DRM.  It’s a waste of time.  I shouldn’t have to mess with region codes, or have to use software like MakeMKV to decrypt a movie that I bought!  And unfortunately the copy-protection methods in place do nothing to stop piracy and everything to hinder legitimate customers.

For me it doesn’t really even matter because I don’t really like watching movies anyway–there’s not much more painful than sitting for an hour to get through a movie.  I just like making automatic ripping machines.

Well, hope you enjoy the ARM.

War Games DVD in Tray

 

 

Gridcoin Mining for Science

A cryptocurrency that’s actually productive!  I came across Gridcoin (ticker: GRC) the other day.

Gridcoin helps with cancer and malaria research as well as studying astronomy and solving math problems.

It caught my eye because Bitcoin and most cryptocurrencies require miners to spend a lot of computing power mining hashes that aren’t really useful.  Some argue that this wastes gigawatts of energy each day.  Gridcoin mining actually pays miners to do useful computing by teaming up with BOINC (Berkeley Open Infrastructure for Network Computing).  BOINC is a way for people to donate the idle time on their computers to various research or grid computing projects.  Gridcoin uses a DPOR (Distributed Proof of Research) mechanism to reward miners by paying out Gridcoin based on the work they do on approved BOINC projects.


Miners can choose to work on a variety of projects: solving math problems ranging from computing primes to cracking Enigma messages; researching cures for diseases like cancer and malaria and simulating protein folding; astronomy projects researching asteroids, searching for pulsars, and mapping the Milky Way; and various other projects like monitoring wildlife.

Unlike Bitcoin, which requires specialized ASICs to mine efficiently, the variety of BOINC projects means general purpose computing hardware will most likely fare better.  Some projects are better suited for CPUs, some for AMD GPUs, some for NVIDIA, some for Android devices, etc.  You pick the projects based on the type of hardware you already have.

I don’t see Gridcoin as being very profitable (monetarily) for miners; however, this is a fantastic idea and I hope we see more projects that reward miners for doing actually useful computing!

If you want to get started mining Gridcoin start out by following the directions on the Gridcoin.co Pool.

I’ve been mining for three weeks with a VM (given 8 vCPUs) on the Xeon D-1540 and so far have around 430 coins.  At the current exchange rate that’s $2.67.  Enough to buy two cheeseburgers.

GRC: SDBycsHrreXXhFAm1nZiYa7SGobBzkVbSH
