Dr. MonkeyIQ: May 2013

Wednesday, May 29, 2013

FontForge: Rounding out the platforms for binary distribution

Earlier this year I made it simple to install FontForge on OSX. The process boiled down to expanding a zip file into /Applications. The libraries that FontForge uses have all been tweaked to work from inside the package, and the configuration files, theme, and other dynamically opened resources are looked up in the right place too.

Now after another stint I have FontForge running under 32bit Windows 7. So finally I had a use for that other OS sitting on my laptop for all this time ;) The first time I got it to run it looked like below. I created a silly glyph to make sure that bezier editing was responsive...

The plan is to have the theme in use so nice modern fonts are used in the menu, and other expected tweaks before making it a simple thing to install on Windows.

One, IMHO, very cool thing I did to get all this happening was to use the OpenSUSE Build System (OBS) to make the binaries. There are some DLL and header file drops for X floating around, but I tend to like to know where libraries that are being linked into the program have come from. Call me old fashioned. So in the process I cross compiled chunks of X Window for Windows on the OBS servers. My OBS win32 support repository contains these needed libraries, right through cairo and pango using the Xft backends to render.

There is a major schism there: if you are porting a native GTK+2 application over to win32, then you will naturally want to use the win32 backends to cairo et al and have a more native win32 outcome. FontForge, however, wants to use the native X Window APIs and the pango xft backend. So you need to be sure that you can render text to an X Window using pango's xft backend to make your life simpler. That is what the pangotest project I created does: it just puts "hello world" on an X Window using pango-xft.

A big thanks to Keith Packard who provided encouragement at LCA earlier this year that my crazy cross compile on OBS plan should work. I had a great moment when I got xeyes to run, thinking that things might turn out well after the hours and hours trying to cross compile the right collection of X libraries.

I should also mention that I'm looking for a bit of freelance hacking again. So if you have an app you want to also run on OSX/Windows then I might be the guy to make that happen! :) Or if you have cool C/C++ work and are looking to expand your team then feel free to email me.

Sunday, May 19, 2013

Some amateur electronics: hand made 8x8 LED matrix

So I made an 8x8 matrix of LEDs in a common cathode arrangement. Only one column is ever on at any time, but they cycle from left to right so quickly that you and your camera can't see that little artefact. This does save on power though, so the whole layer can be run directly from the arduino LeoStick in the top right of the picture. Thanks again to Freetronics for giving those little gems away at LCA!

8x8 LED matrix, two 595 shifties and a ULN2003 current sink from Ben Martin on Vimeo.

The LEDs to light on a row are selected by a 595 shift register providing power for each row. The resistors are on the far right of the grid leading to that shift register. The cathodes for each individual column are connected together leading to the top of the grid (as seen in the video). Those head over to a uln2003 current sink IC. The uln2003 only has seven channels, so in the future I'll use either two 2003 chips or one single 2803 (which can do all 8 columns at once) to get the first column to light up too.

The uln2003 is itself controlled by supplying power to the opposite side to select which column's cathodes will be grounded at any given moment. The control of the uln2003 is also done by a 595 shift register which is connected to the row shifty too. The joy of all this is you can pump in the new state and latch the shift registers at once to apply the new row pattern and select which column is lit.
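
To make the "pump in the new state and latch" step concrete, here is a minimal sketch of how the two bytes for one refresh step could be assembled. The Frame struct and the makeFrame helper are hypothetical names that are not part of the original sketch, and numbering the columns 0 to 7 with one bit each is an assumption about the wiring.

```cpp
#include <assert.h>
#include <stdint.h>

// Two bytes sent per refresh step, in the order the loop in this post
// passes them to shiftOut(): column select first, row pattern second.
struct Frame {
    uint8_t columnSelect;  // one bit set: the column whose cathodes get sunk
    uint8_t rowPattern;    // one bit per row: which LEDs in that column light
};

// Hypothetical helper (not in the original sketch): build the frame for
// one column. Which bit drives which physical column is an assumption.
Frame makeFrame(uint8_t column, uint8_t rowPattern)
{
    assert(column < 8);
    Frame f;
    f.columnSelect = (uint8_t)(1u << column);
    f.rowPattern = rowPattern;
    return f;
}
```

On the Arduino the two fields would be handed to shiftOut() in that order between pulling LATCH low and high. With cascaded 595s the first byte sent is pushed through to the downstream register, so the column select byte ends up in the second chip of the chain.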

The joy of this design is that I can add 8x8 layers on top at the cost of 8 resistors and one 595 to perform row select.

There are also some still images of the array if your interest is piqued.

The 595 chips can be had for around 40c a pop and the uln2003 for about 30c. LEDs in quantity 500+ go at around 5-7c a pop.

The code is fairly unimaginative, mainly to see how well the column select works and how detectable it is. In the future I should set up a "framebuffer" to run the show and have a timer refresh the array automatically...

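#define DATA   6  // serial data into the 74HC595 chain
#define LATCH  8  // latch: outputs update on the rising edge
#define CLOCK 10  // digital 10 to pin 11 on the 74HC595

void setup()
{
    pinMode(LATCH, OUTPUT);
    pinMode(CLOCK, OUTPUT);
    pinMode(DATA, OUTPUT);
}

void loop()
{
    // i is the row pattern: count through all 256 combinations.
    for ( int i = 0; i < 256; i++ )
    {
        // Walk a single set bit across the column select byte.
        for ( int col = 1; col < 256; col <<= 1 )
        {
            digitalWrite(LATCH, LOW);
            shiftOut(DATA, CLOCK, MSBFIRST, col );
            shiftOut(DATA, CLOCK, MSBFIRST, i   );
            digitalWrite(LATCH, HIGH);  // apply both bytes at once
        }

        delay(20);
    }
}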
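
The "framebuffer" idea mentioned above could look roughly like the following. This is only a sketch of one possible layout, not code I have run on the matrix: the FrameBuffer struct, setPixel and nextStep are hypothetical names, and storing one byte per column with one bit per row is an assumption. It deliberately avoids any Arduino calls so that the logic can be checked on a desktop compiler.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical framebuffer: one byte per column, one bit per row.
// The bit-to-row and index-to-column mapping is an assumption.
struct FrameBuffer {
    uint8_t columns[8];
    uint8_t current;   // column to show on the next refresh step
};

void clearFrameBuffer(FrameBuffer &fb)
{
    for (uint8_t c = 0; c < 8; c++) fb.columns[c] = 0;
    fb.current = 0;
}

void setPixel(FrameBuffer &fb, uint8_t column, uint8_t row, bool on)
{
    assert(column < 8 && row < 8);
    uint8_t mask = (uint8_t)(1u << row);
    if (on) fb.columns[column] |= mask;
    else    fb.columns[column] &= (uint8_t)~mask;
}

// Produce the two bytes for one refresh step and advance to the next
// column, wrapping around after the eighth.
void nextStep(FrameBuffer &fb, uint8_t &columnSelect, uint8_t &rowPattern)
{
    columnSelect = (uint8_t)(1u << fb.current);
    rowPattern = fb.columns[fb.current];
    fb.current = (uint8_t)((fb.current + 1) & 7);
}
```

A timer interrupt (or loop() itself) would call nextStep() and shift the two bytes out exactly as the loop above does, while the rest of the program only ever touches the buffer through setPixel().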

Wednesday, May 8, 2013

Save Ferris: Show some love for libferris...

Libferris has been gaining some KDE love in recent times. There is now a KIO slave to allow you to see libferris from KDE, and also the ability to get at libferris from plasma.

I've been meaning to update the mounting of some Web services like vimeo for quite some time. I'd also like to expand to allow mounting google+ as a filesystem and add other new Web services.

In order to manage time so that this can happen quicker, I thought I'd test the waters with a pledgie. I've left this open ended rather than sticking an exact "bounty" on things. I had the idea of trying a pledgie with my recent investigation into the libferris indexing plugins on a small form factor ARM machine. I'd like to be able to spend more time on libferris, and also pay the rent while doing that, so I thought I'd throw the idea out into the public.

If you've enjoyed the old tricks of mounting XML, Berkeley DB, SQLite, PostgreSQL and other relational databases, flickr, google docs, identica, and others and want to see more then please support the pledgie to speed up continued development. Enjoy libferris!

Thursday, May 2, 2013

Indexing on limited hardware... what to do

Libferris supports many indexing libraries and technologies through its plugin interface. Larger systems can use a PostgreSQL plugin which is tailored explicitly to get the most out of that RDBMS for larger file server indexes. For the smaller end, there are memory mapped files, clucene, soprano, or SQLite. I've been doing some tinkering trying to milk extra performance out of the indexing plugins for ARM machines lately. Note that if you are using debian, the CLucene you'll want is the 2.x series, currently only packaged for experimental.

For testing purposes I built a fairly tiny index of only 130k files. An interesting test case is looking for files whose paths match a regular expression, where the query returns a fairly small chunk of results: in this case about 115 files, using a four character substring search as the regex. This is a common kind of query when you don't recall the exact ordering of the directory names or where a directory was. Small number of results, regex to pick them.

The memory mapped index implementation (boostmmap) uses boost IPC and multi indexed collections created in memory mapped files to maintain the index. The index also has a digram index for each URL, allowing regular expressions to resolve through the index rather than needing evaluation against full URLs.
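
To give a feel for the digram prefilter idea, here is a toy in-memory version. It is a sketch only and not the boostmmap plugin's code or on-disk layout: the DigramIndex class and its methods are hypothetical, and it assumes the caller already knows a literal substring that every match of the regex must contain (a real implementation would have to derive that from the regex).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Toy digram index: maps each two-character sequence to the ids of the
// URLs containing it. Illustrative only, everything lives in plain RAM.
class DigramIndex {
public:
    void add(const std::string &url)
    {
        const std::size_t id = urls_.size();
        urls_.push_back(url);
        for (std::size_t i = 0; i + 1 < url.size(); ++i)
            postings_[url.substr(i, 2)].insert(id);
    }

    // Ids of URLs containing every digram of `literal`. This is a superset
    // of the URLs that actually contain `literal` as a substring.
    std::set<std::size_t> candidates(const std::string &literal) const
    {
        std::set<std::size_t> result;
        if (literal.size() < 2) {          // no digrams: cannot filter
            for (std::size_t id = 0; id < urls_.size(); ++id) result.insert(id);
            return result;
        }
        for (std::size_t i = 0; i + 1 < literal.size(); ++i) {
            auto it = postings_.find(literal.substr(i, 2));
            if (it == postings_.end()) return {};
            if (i == 0) { result = it->second; continue; }
            std::set<std::size_t> kept;    // intersect with this posting list
            for (std::size_t id : result)
                if (it->second.count(id)) kept.insert(id);
            result.swap(kept);
        }
        return result;
    }

    // Run the regex only on the prefiltered candidates. `evaluations`
    // reports how many regex_search calls were needed.
    std::vector<std::string> search(const std::regex &re,
                                    const std::string &requiredLiteral,
                                    std::size_t &evaluations) const
    {
        std::vector<std::string> hits;
        evaluations = 0;
        for (std::size_t id : candidates(requiredLiteral)) {
            ++evaluations;
            if (std::regex_search(urls_[id], re)) hits.push_back(urls_[id]);
        }
        return hits;
    }

private:
    std::vector<std::string> urls_;
    std::map<std::string, std::set<std::size_t>> postings_;
};
```

The prefilter can return false positives (a URL may contain every digram without containing the substring itself), which is why the regex still runs on each candidate. The win is that it runs on a handful of URLs instead of all of them.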

The SQLite index is fairly vanilla and doesn't include many customizations for sqlite, whereas the PostgreSQL index implementation does use many of the features specific to that database. Neither the SQLite nor the boostmmap index in the public libferris repo attempts to do any compression on URL strings or the like.

A fairly basic index on 130k files is about 80 MB using either memory mapped files or SQLite. Caches are cleared with echo 3 > /proc/sys/vm/drop_caches. Using an odroid-u2 with eMMC flash, on a cold cache the SQLite index comes out about 10% faster than the boostmmap for a query finding 115 files. Turning off the regex prefilter index in the boostmmap makes it 10% slower again. This is a trade off: a very fast CPU and a disk with great file location and single extents will show less or no difference with the prefilter, as reading 80 MB from disk will take less time and the CPU can run 130k regexes very quickly. The prefilter required only 124 regex evaluations; without the prefilter all 130611 URLs needed a regex evaluation.

The interesting part is that with a warm cache the boostmmap is about twice as fast overall as the SQLite index. This is a big difference, as the timing is for the complete run time from the command line and there is some overhead in starting up the index query itself. As usual, things vary depending on whether you are expecting frequent queries (warm cache), have a very fast CPU (regex eval is relatively less costly), or need multiple updaters (SQLite allows it, my memory mapped doesn't).

To experiment a little further, I brought the ferris clucene plugin into the mix. I disabled the explicit prefilter index on regex code for initial testing. The index became about 70 MB and could resolve the query on a cold cache in about 65% of the time of the SQLite plugin. On a warm cache the clucene was slowest, which is mainly due to the prefilter being disabled and the fallback code making the URL query a WildcardQuery with no prefix or postfix to anchor the query on.

Next time around I'll see how much the prefilter index helps query speed on clucene. I know it slows down adding documents (you are indexing more) and makes the index larger (I haven't optimized for index size), but it will be interesting to see the performance on the eMMC device with the prefilter.

Filesystem Indexing: Taking the reins

To index data using a small ARM CPU without much RAM you might like to break the indexing run down into many parts, and get more explicit control over what is happening. The script below will index all files on /DATA-PATH in batches of 5000 files at a time with libferris. This will use whatever index plugin you have set up for ~/.ferris/ea-index (the default metadata index), be that PostgreSQL, SQLite, boost memory mapped files, clucene or whatever.

I'm currently racing the boost memory mapped index against the SQLite backed index on simple URL queries against the filesystem. This is being done on roughly 2 GHz ARM machines with either 512 or 2048 MB of RAM. The boostmmap plugin is of my own design and contains some smarts while executing regular expression matching against unanchored strings (.*foo.*). Unfortunately the boostmmap plugin is not as smart as it could be regarding scattered updates, transactions, and journaling, which slows it down a bit in the index creation phase relative to the SQLite plugin.

The below is a skeleton bash script to get started adding files. Another option is to ssh into the remote host and run find(1) there which can be much faster over network filesystems. The whitelist environment variable is to override which metadata libferris will index. If your index indicates it wants sha and md5 digests, the act of calculating those can dominate indexing time. An explicit whitelist keeps index adding times down with the obvious side effect of limiting what you can use in your queries. Such a limited list of metadata as in the below brings the index closer to what locate provides.

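#!/bin/bash

# Work in a fresh scratch directory for the batch files.
rm -rf /tmp/fidxtmp
mkdir -p /tmp/fidxtmp
cd /tmp/fidxtmp || exit 1

# split(1) writes 5000-line batches named xaa, xab, ...
find /DATA-PATH | split -l 5000

# Only index these metadata attributes.
export LIBFERRIS_EAINDEX_EXPLICIT_WHITELIST="name,size,mtime"

for batch in x*
do
    echo "adding $batch..."
    feaindexadd -v -1 < "$batch" >>/tmp/ferris-index-progress.txt
done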

Then you can find all your PDF files for example using the following:

feaindexquery -Z '(url=~pdf)'

The -Z tells libferris not to try to lstat() or resolve URLs to see if they exist currently. Much faster results but at the cost of not weeding out things which might have moved since they were last indexed.

And all the files which have been written this year:

feaindexquery -Z '(mtime>=begin this year)'

The quotes are unfortunately needed, as bash otherwise wants to do things with the naked parentheses.

Save Ferris! Or just donate to an open source project or organization of your choice if you like the ferris posts.