Gist of the Day: Inline::C in Perl

Posted: 4th September 2013 by manchicken in Free Software, Gist of the Day, Hacking
Tags: , , , , ,

I like Perl and I like C (most of the time), and sometimes I like to mix the two. The two main reasons I might want to mix the two is for performance, or because something is written in C which I would like to use from Perl. I’ve only really ever used C from Perl, I’ve never used Perl from C. Today’s demonstration is how to implement a simple binary search algorithm in C, but using Perl internals, and calling the algorithm from Perl. The technologies used in today’s demonstration are Perl, C, and a Perl module called Inline::C.

The Code!

So, here’s the flow of the code:

  1. At compile-time the BinSearch.pm module compiles and links in (via DynaLoader) the C code and any needed libraries
  2. The use_binsearch.pl script creates an array of items
  3. The use_binsearch.pl script calls the do_binsearch() function looking for an item in an array
  4. The do_binsearch() function sorts the array (using Perl’s built-in mergesort in Perl 5.7+)
  5. The do_binsearch() function uses the C function _c_bin_search() to find the item within the list
  6. The _c_bin_search() function performs the binary search using the function compare() defined in the __C__ block to perform comparisons, keeping in mind that a value in Perl can be a string or a numeric type
  7. The _c_bin_search() function returns &PL_sv_undef if nothing is found, otherwise it returns the value it found
  8. The do_binsearch() returns the value returned by the _c_bin_search() function

 The use_binsearch.pl script

Here you can see that I’m exercising my code a little bit. I have an array with mixed types, an array with only one element, an array with no elements, and an array with only numeric elements.

The BinSearch.pm module

Note: In the actual file, the BinSearch.pm.c code is appended to the bottom of the BinSearch.pm code. I’m treating it as two separate files in the Gist for syntax highlighting purposes.

The Use of Inline::C

So the first part of this that you’ll notice is now it’s really weird that the C code is in the same file as the Perl code. This is actually really cool, and there are ways you can remove the code from the Perl into its own file, but I’m not going to cover that here. The most common use – for me at least – of Inline::C involves having the code in the __DATA__ block, which is a special data segment in Perl code. I am not going to cover what the __DATA__ segment is, but if you would like more information about this special literal – and others – please See Here.

The Inline::C module has all sorts of really cool and useful options. I have used Inline::C for simple things (like this demonstration), but also for building data feeders for large C applications in Perl. Inline::C is as simple or as complex as you’re willing to use it.

The largest weakness in Inline::C is portability. Inline::C requires a working build environment with compiler and linker, and you have to give it all of the build options you’ll need. As anybody who has dealt with build processes for larger applications will tell you, this is a very platform-specific task. If you’re making a module you need portability with then you should probably pay special attention to that, or maybe find a Pure-Perl way of performing your task.

Now let’s talk about the code. Here you’re going to see a lot of weird symbols flying, here are the fun ones you’ll see:

  • SV – Scalar Value
  • AV – Array Value (really an array reference in Perl)
  • I32 – A 32-bit integer
  • SvNOK() – A macro to determine whether or not a scalar value is a numeric value (includes number types with and without precision)
  • SvOK() – A macro to determine whether or not a value is actually a valid scalar value to Perl (think defined() in Perl)
  • sv_cmp() – An API function which does a string compare in Perl, taking into account that it’s possible that your two values may include scalars holding numeric strings. It is important to note that this will not do a numeric comparison, only a string comparison.
  • av_len() – An API function which returns the last index of an array (e.g. scalar(@$ARY)-1)
  • av_fetch() – An API function which returns a pointer to a pointer of a scalar value. If the last argument to this function is non-zero then it should always return a valid and usable scalar value. I don’t do that since I want to be able to detect undefined values.
  • PL_sv_undef – This is undef. Keep in mind that my function returns an SV*, so I want to return the address of PL_sv_undef when it is time to return undef.
  • SvREFCNT_inc() – This is an API function which increments the reference counter of a scalar value. Since I am returning the scalar value from a C function, I want to increment this count so it doesn’t release it immediately after my function exits scope. The garbage collector will reap this scalar value later on when the thing that my function returns to releases (or never holds) a reference to the scalar.
  • warn() and die() – These are API implementations of the Perl functions with the same names that you have come to know and love (or not)

For more information on the guts and APIs in use here, see the perlguts POD page, and the perlapi POD page.

Warnings

Please remember that Perl is a reference-counting garbage-collected language. Messing with the reference count carelessly will certainly result in memory leaks or GC reaping which results in crashes. Additionally, great care must be taken when using this sort of code in threads. Thread safety is a tricky thing even in the best of cases, and this is rather far from the happy path. Over-optimization is very common for people who dabble in Inline::C.

It is important to note that the cross-over from Perl to C comes at a high performance cost, especially when done in looping situations. If you need to send large volumes of data to a C function you really should put it into chunks and then process it in chunks.

It should be remembered that both the first and second Rule of Optimization Club is that you do not talk about optimization. The other two relevant rules are that you only ever address one factor of an optimization problem at a time, and you should keep testing a long as you have to keep testing.

You should never use Inline::C to solve a performance problem which you have not measured. If you cannot measure a performance problem then you do not have a performance problem (or you do not have the ability to address it).

Conclusion

I find Inline::C to be very useful, and I think you will too. This is a very simple demonstration of a very large and feature-rich aspect of Perl programming. This is also quite possibly the largest Gist of the Day I have done since I began this effort. Please forgive me when most of the other articles are much smaller than this one.

Here is the link to the full Gist: https://gist.github.com/manchicken/6443144