Unfair and pointless performance race: bash vs C

🐢

You know, when I was writing my Crispy Doom install script for Raspberry Pi, I was thinking: how slow is bash, really?

Bourne-again shell scripts are for gluing commands together, of course, not for crunching numbers. But I was curious, so I wanted to try it anyway.

If you want to know what the program does, see Project Euler problem 92. You might not want to read the source code if you want to solve it yourself. By the way, there are more efficient implementations that run two orders of magnitude faster, but this one is simpler and I didn't have the brains for the better one anyway.

First the C version (I cherry-picked the fastest run due to some run-to-run variance):

cc -O3 -o sqdigchains2 -Wall -march=native -mtune=native sqdigchains2.c
time ./sqdigchains2
8581146

real    0m1.639s
user    0m1.638s
sys     0m0.001s

Not bad. Now bash:

time ./sqdigchains.sh
8581146

real    238m10.498s
user    238m3.131s
sys     0m1.190s

Hmm. That seems slower. About 8719 times slower, but still within four orders of magnitude. I had already optimized the bash version, mainly by manually inlining things; it takes about 50% less time than the old one. Everything is running on a Raspberry Pi 3 Model A+. For comparison, this bash solution is about 88% slower than a C implementation on an Arduino Uno/ATmega328P.

Most of the time seems to go to interpreting the script. The bash version probably even has to calculate more, although it uses fewer variables. In the C version, the compiler does all sorts of clever stuff: reordering instructions so the pipeline doesn't stall, and since it's machine code, branch prediction works effectively, and so on.

#include <stdio.h>
#include <stdlib.h>

#define TEST_RANGE 10000000

static inline size_t digit_sum_square(size_t n)
{
	size_t sum = 0;
	size_t a = 0;
	size_t b = 0;
	while (n > 0)
	{
		a = (size_t)(n / 10);
		b = n % 10;
		b *= b;
		sum += b;
		n = a;
	}
	return sum;
}

char chain(size_t n)
{
	while (n != 1)
	{
		if (n == 89)
		{
			return 89;
		}
		
		n = digit_sum_square(n);
	}
	
	return 1;
}


int main(int argc, char **argv)
{
	size_t eightynine = 0;
	for (size_t i = 1; i < TEST_RANGE; i++)
	{
		if (chain(i) == 89)
		{
			eightynine++;
		}
	}
	printf("%zu\n", eightynine);
	
	
	return 0;
}

#!/usr/bin/env bash

i=1
en=0 # short for eightynine
while (( i < 10000000 ))
do
	n=$i
	
	while (( n != 1 ))
	do
		if (( n == 89 ))
		then
			((en++))
			break
		fi
		t=$n
		n=0
		
		while (( t > 0 ))
		do
		# you'd probably think (( b = t % 10 )) and then (( n = n + b * b )) would be faster, but the version below seems to be faster
			(( n = n + (t % 10) * (t % 10) ))
			(( t = t / 10 ))
		done
	done
	((i++))
done

echo $en

That's all for this pointless test. Have a nice one!

BONUS UPDATE (on the same day): strata sent me a Perl implementation of the program!

time ./sqdigchains.pl
8581146

real	7m41.425s
user	7m40.883s
sys	0m0.051s

Much faster than bash: ~282 times slower than the C version, so two orders of magnitude instead of four. Perl is also a scripting language, but it has a compile stage that builds a syntax tree and then walks it. During compilation it also does some optimizations, such as constant folding and peephole optimization (all according to Wikipedia), and that clearly shows in the runtimes. strata was surprised; he thought it would take much longer considering how slow bash was.

#!/usr/bin/env perl

use strict;
use warnings;
use 5;

my $TEST_RANGE = 10000000;

sub digit_sum_square {

  my $n = shift;

  my ($sum, $a, $b) = (0, 0, 0);
  while ($n > 0) {
    $a = int($n / 10);
    $b = $n % 10;
    $b *= $b;
    $sum += $b;
    $n = $a;
  }

  return $sum;
}

sub chain {

  my $n = shift;

  while ($n != 1) {
    return $n if $n == 89;
    $n = digit_sum_square($n);
  }

  return 1;
}

sub main {
  my $eightynine = 0;
  for (1..$TEST_RANGE-1) {
    if (chain($_) == 89) { $eightynine++ }
  }
  printf "%zu\n", $eightynine;
}

main;

Thank you, strata!