Swift leaks

Apple released their new language Swift into the world earlier this year, and there are a lot of people who still cannot get their arms down about it (to use a great Danish expression), myself included. There is a great deal to enjoy about this new language. Swift is now at version 1.1, and it is by no means perfect yet.

Declaration leak

If you use the seemingly handy shorthand syntax for creating a constant property on your fresh new Swift UIViewController, you will find that the object it holds is leaked. The following example will leak a UITableView:

class LeakyViewController: UIViewController {
    let tableView = UITableView() // leaked
    
    override func viewDidLoad() {
        super.viewDidLoad()
        // your normal usage of the table view
    }
}

How? After spending hours of quality time with Instruments, I can say a few things about it. The UIViewController is created along with the constant property, then the property is created again, leaking the first creation.

The remedy is easy enough, but it does sadden me that we are effectively denied this shorthand syntax, since the alternative makes us duplicate code, as in the following non-leaking example:

class WaterProofViewController: UIViewController {
    let tableView: UITableView
    
    override init() {
        tableView = UITableView()
        super.init(nibName: nil, bundle: nil)
    }

    required init(coder aDecoder: NSCoder) {
        tableView = UITableView()
        super.init(coder: aDecoder)
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        // your normal usage of the table view
    }
}

For now we will just have to write some extra code until this issue is fixed, so keep enjoying Swift, folks.

The dream of multi-threaded rsync

As a former maintainer of a rather large datastore, I know the problems of keeping such a store backed up.

The common approach is to use rsync on a regular basis to keep your backup up to date. If you have a few million files to keep backups of, each rsync session will take much longer than when you only had a few thousand files.
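
As a baseline, that usually looks something like the following, run periodically from cron or another scheduler. The host and paths here are made up for the example:

# one big rsync of the whole datastore
rsync -a /data/ backup@backup.example.com:/data/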

Keeping the backup window as small as possible is key to limiting the loss of data when disaster strikes.

If you are like me, you will have found through trial and error that multiple rsync sessions, each taking a specific range of files, complete much faster. Rsync is not multithreaded, but for the longest time I sure wished it was.
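
Roughly like this, started by hand in separate terminals, each session covering its own slice of the directory tree (again, the paths are made up):

# terminal 1
rsync -a /data/sales/ backup@backup.example.com:/data/sales/
# terminal 2
rsync -a /data/support/ backup@backup.example.com:/data/support/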

An idea was born

I was reading about shell programming somewhere online and found the missing tool I needed to make rsync "threaded". The missing tool was wait, which waits for the shell's forked background processes to complete.
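
Here is a minimal sketch of the pattern, separate from megasync itself: fork a few background jobs with & and let wait block until every one of them has finished.

#!/bin/sh
# fork three background "threads" and wait for all of them to finish
for i in 1 2 3; do
        (echo "worker $i starting"; sleep 2; echo "worker $i done") &
done
wait    # returns once every forked process has completed
echo "all workers finished"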

The idea is to create a bunch of forked processes to act as threads for the backup process. There are a few prerequisites for the way I have chosen to implement what I dubbed megasync. They are:

  • The primary server and backup server must be running a Linux or Unix system.
  • A few commands and paths might need to be changed if your primary server is not running FreeBSD.
  • The user running megasync is set up with passwordless ssh access to the backup server.
  • The files should be divided into many directories with a similar number of files in each.
  • The directories containing the files must have a single shared parent directory.
  • There should only be files in directories that are a certain depth into the directory structure.

Putting theory into action

The lazy reader may skip the theory and explanations and go directly to the megasync.sh file.

To put the theory into actual code, I will make a few assumptions for the purposes of explanation:

  • The data is located in a directory named /data/ on both the primary and the backup server.
  • The data is divided using a pattern like /data/<department>/<client id>/<short hash>/<short hash>/.
  • You feel that 6 threads is the right number for you.

Given the prerequisites and the assumptions, the execution plan is as follows:

  1. List every directory under /data/ with a depth of 4.
  2. Divide the list into 6 equal parts.
  3. Fork and wait for 6 processes that create all the output directories on the backup server.
  4. Fork and wait for 6 processes that rsync each of the directories to the backup server.

Step 1 - listing

Use the find command with parameters specifying a maxdepth of 4 and a type of directory. This will give a list of directories, but it will include paths to directories that are only one, two or three levels deep. We can fix this by using a regex grep. So the commands that create the basis for megasync are:

depth=4
localpath=/data/
tmpdir=/tmp/rsync.$$

# temporary working directory for the directory lists
mkdir -p $tmpdir

# build a regex matching $depth path components, e.g. "[^/]*/[^/]*/[^/]*/[^/]*/"
regex=""
for i in $(jot - 1 $depth); do
        regex="$regex[^/]*/"
done

# list directories down to $depth levels and keep only the deepest ones
find $localpath -maxdepth $depth -type d | grep "$regex" > $tmpdir/dirlist

This is why it is important that there should only be files in directories that are a certain depth into the directory structure.

Step 2 - dividing

Now that we have the list of directories that need to be backed up, we need to divide it into six roughly equal parts. Naturally, we create a convoluted while loop that writes numbered files with a .dirlist extension.

rsyncs=6
tmpdir=/tmp/rsync.$$

# count the directories in the full list and compute the chunk size per "thread"
total=$(wc -l $tmpdir/dirlist|cut -d\/ -f1|tr -d ' ')
n=$(expr $total / $rsyncs)

if [ "$total" = "0" ]; then
        echo "No directories to sync you dumbass.";
        exit 1;
fi

# each chunk holds $n directories; $offset marks the end of the current chunk
offset=$n
# the chunk files are numbered from 0
i=0
while true; do
        # the final chunk may be shorter than $n
        tail=$(expr $total + $n - $offset)
        if [ $tail -gt $n ]; then
                tail=$n
        fi
        head -n $offset $tmpdir/dirlist | tail -n $tail > $tmpdir/$i.dirlist
        c=$(wc -l $tmpdir/$i.dirlist|cut -d\/ -f1|tr -d ' ')
        if [ "$c" = "0" ]; then
                # the list divided evenly, so this extra file is empty
                rm $tmpdir/$i.dirlist
                break
        elif [ $c -lt $n ]; then
                # a short chunk means we just wrote the remainder
                break
        fi
        i=$(expr $i + 1)
        offset=$(expr $offset + $n)
done

Yawn... Is the math over yet? Good. Moving on.

Steps 3 & 4 - forking

Before we can back up using rsync this way, we need to ensure the destination directories exist on the backup server. Really it is simply a mkdir command for each directory, but let us do it threaded anyway. We fork a while loop that creates the actual directories, from inside a for loop, and then use the wait command to make sure all the directories are created before continuing. The wait command is just awesome.

rsyncs=6
tmpdir=/tmp/rsync.$$
userandhost=backup@backup.example.com
remotepath=/data/
rsyncopts="-a"

# the dividing step numbers its chunk files from 0 and may produce one extra
# remainder file, so loop from 0 to $rsyncs and skip any file that does not exist
for i in $(jot - 0 $rsyncs); do
        [ -f $tmpdir/$i.dirlist ] || continue
        while read r; do ssh $userandhost "mkdir -p $remotepath$r" ; done < $tmpdir/$i.dirlist &
done
wait
for i in $(jot - 0 $rsyncs); do
        [ -f $tmpdir/$i.dirlist ] || continue
        while read r; do /usr/local/bin/rsync $rsyncopts $r $userandhost:$remotepath$r 2>&1 | tee -a $tmpdir/$i.dirlist.log ; done < $tmpdir/$i.dirlist &
done
wait

The actual rsync commands are forked in the same way as the directory creation, so I will spare you the same explanation twice.

Usage

Grab your copy of megasync.sh now and place it somewhere handy.

You can use it like so:

#!/bin/sh
sh megasync.sh /data/ 4 6 "-a" backup@backup.example.com /data/

If, by chance, anything goes wrong, here is a kill script:

#!/bin/sh
ps auxww | grep megasync.sh | grep -v grep | awk '{print $2}' | xargs kill
killall rsync

Asymmetric encryption in PHP

One of the use cases for asymmetric encryption is to allow others to send you encrypted data that only you can read. No one says the sender and receiver are both running PHP; in fact, examples in several other languages are available.

This type of encryption is also referred to as public key cryptography, because it requires you to use a private key and a public key.

Let us start with creating those, like so:

# create the private key private.key
openssl genrsa -out private.key 2048
# create the public key public.pem
openssl rsa -in private.key -outform PEM -pubout -out public.pem

Mind the pitfalls

You might be tempted to use the openssl_public_encrypt and openssl_private_decrypt functions, but be warned: they are not really useful here. They only support very small inputs; with a 2048-bit key like the one above, the maximum input size is 245 bytes.

The encryption and decryption functions you will want to use are openssl_seal and openssl_open, but they do require a little extra of you as a developer. You need to manage not just the encrypted bytes, but also an envelope: some extra bytes that match the public key used to encrypt the data. This is because the sealing function allows you to encrypt the same data for multiple recipients, producing one envelope per recipient.

There and back again

Below is an example of how to generate sealed data and an envelope and use them to recreate the original data:

<?php

$private_key = openssl_get_privatekey(file_get_contents('private.key'));
$public_key = openssl_get_publickey(file_get_contents('public.pem'));

$data = '{"data":"makes life worth living"}';

echo "data in:\n$data\n\n";

// seal the data: it is encrypted with a randomly generated key, and that
// key is encrypted once for every public key passed in the array
$encrypted = $e = NULL;
openssl_seal($data, $encrypted, $e, array($public_key));

// base64 encode the sealed bytes and the envelope for the first (and only) recipient
$sealed_data = base64_encode($encrypted);
$envelope = base64_encode($e[0]);

echo "sealed data:\n$sealed_data\n\n";
echo "envelope:\n$envelope\n\n";

$input = base64_decode($sealed_data);
$einput = base64_decode($envelope);

// open the sealed data again using the envelope and the private key
$plaintext = NULL;
openssl_open($input, $plaintext, $einput, $private_key);

echo "data out:\n$plaintext\n";

Below is some example output, but note that the output will be different each time you generate sealed data and an envelope.

data in:
{"data":"makes life worth living"}

sealed data:
ZDrH0um1qRyFiQMOivlS6taxLrR+KyXH3cDAcqgcxWOPCw==

envelope:
x9qSCAoyx6ueTFH5cyosPpUhye0hlBvWxF7DxniLNBv/EpIsebXqHhCh4zhTqnaNFS+48PewNZbGUwnkMCLr8MrpMr5mNxtrovcGmhHL5pwBovyUorHcGeiQHN3QXn9n4vDVPGZuEnPw3SZxqw8HqItYyjuXsrxtCdN4nHlwwRJ9s37kXYr+Y8UQ7gzMRbYoO4E188RnWt7HhvKg08emRJHCRzW5YJDOx1gxd0+qE1EMjXGpfw0WB9lacl09Sg4tdsrMDIvKu2Fi21c7HD9Er21dmGUaq465a0zRYqLaDz476RYlTim40BdjDPPHb1TJGBM4BD+ElkI8YbXJ7AjfAQ==

data out:
{"data":"makes life worth living"}

For examples in other languages, check out unsealed-secrets on GitHub.

Enjoy your new knowledge and your data being safer.