[argyllcms] Re: Argyll CMS in Fedora (and Mandriva)

  • From: Graeme Gill <graeme@xxxxxxxxxxxxx>
  • To: argyllcms@xxxxxxxxxxxxx
  • Date: Sat, 22 Dec 2007 19:24:54 +1100

C wrote:

I wonder if there's any tool to implement
ICC aware transforms on spooled print jobs for
CUPS / Foomatic / Gutenprint.  It seems like one can
embed an ICC tag into files that are to be printed but
AFAIK there's nothing that actually does (as a print filter /
option) anything with that such as converting from source ICC
to printer colorspace.

Doing this with higher level PDLs is non-trivial (i.e.
PDF, PostScript). You need a RIP really, and commercial
products that do this sort of function sell for thousands
of dollars.

I notice that some of ArgyllCMS's calculations can be a bit
CPU intensive.  I wonder if the following compilation options
could be of help in performance:

Yup.

http://gcc.gnu.org/gcc-4.2/changes.html
 > New Targets and Target Specific Improvements
 > IA-32/x86-64
 >
 > * -mtune=native and -march=native will produce code optimized for the
 >   host architecture as detected using the cpuid instruction.

It may be worth a try, but I'd be surprised if it made much difference.
For instance, the difference between debug and optimized builds isn't
that great.

The only significant approach would be to recode some of the
core algorithms to run on multiple CPUs (yes, on the wish list,
but not likely soon).

The pixel conversion engine code (imdi) can be sped up
by a factor of 2 if it's run on a 64-bit machine,
but I would guess you are not talking about that
aspect.

OpenMP might be a relatively easy way to parallelize some of the
compute intensive tasks across multiple CPU cores without a lot of
code development overhead, and without breaking the compilation on
compilers / CPUs that don't support OpenMP or multi-cores.
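
As a rough illustration of that fallback behaviour (a sketch only, not
ArgyllCMS code): a loop whose iterations are independent can be marked
with an OpenMP pragma. A compiler without OpenMP support, or one run
without its OpenMP flag, ignores the pragma and builds the ordinary
serial loop. The names expensive_point() and process_points() are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expensive, independent per-point
 * calculation. This is NOT an ArgyllCMS function. */
static double expensive_point(double x)
{
    double acc = x;
    int i;
    for (i = 0; i < 1000; i++)
        acc = 0.5 * (acc + x / (acc + 1.0));
    return acc;
}

/* Each iteration reads only in[i] and writes only out[i], so the
 * iterations may run in any order or on any thread. */
void process_points(const double *in, double *out, long n)
{
    long i;  /* signed index, accepted by all OpenMP versions */

    assert(in != NULL && out != NULL);

    /* Ignored (possibly with a warning) when OpenMP is not enabled,
     * leaving a plain serial loop. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        out[i] = expensive_point(in[i]);
}
```

With gcc 4.2 or later this would be built with -fopenmp to get the
threaded version; without the flag the same source compiles serially.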

A lot of it is parallelizable, but it is not perfectly straightforward.
The curve optimization code would need some careful thought to thread,
and it tends to dominate the forward profile construction now (the
rspl code is relatively fast on modern processors, although it
could be threaded relatively easily I think).
The inverse lookup code is a massively parallel problem, so
there's lots of scope there, but it's complicated by the
presence of the intermediate calculation cache. If the
cache was to be retained, it would need locking etc.,
and could easily be a bottleneck.
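
To make the cache concern concrete, here is a minimal sketch, again
using OpenMP and again not ArgyllCMS code. The cache layout, the
compute() function and all other names are hypothetical; the real
intermediate cache is more involved. Every thread has to pass through
the same critical section on each lookup, which is where a bottleneck
could appear.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 64

/* Hypothetical direct-mapped cache shared by all threads. */
typedef struct {
    int valid;
    int key;
    double value;
} slot_t;

static slot_t cache[CACHE_SLOTS];

/* Hypothetical expensive calculation (Newton iteration for
 * sqrt(key + 1)); a stand-in for the real intermediate result. */
static double compute(int key)
{
    double a = (double)key + 1.0;
    double acc = 1.0;
    int i;
    for (i = 0; i < 1000; i++)
        acc = 0.5 * (acc + a / acc);
    return acc;
}

double cached_lookup(int key)
{
    slot_t *s;
    double v = 0.0;
    int hit = 0;

    assert(key >= 0);  /* assumption: keys are non-negative */
    s = &cache[key % CACHE_SLOTS];

    /* All threads serialize here, even on a cache hit. */
    #pragma omp critical(lookup_cache)
    {
        if (s->valid && s->key == key) {
            v = s->value;
            hit = 1;
        }
    }
    if (hit)
        return v;

    v = compute(key);  /* done outside the lock */

    #pragma omp critical(lookup_cache)
    {
        s->key = key;
        s->value = v;
        s->valid = 1;
    }
    return v;
}

/* The lookups themselves are independent, so the loop parallelizes. */
void lookup_all(const int *keys, double *out, long n)
{
    long i;

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        out[i] = cached_lookup(keys[i]);
}
```

Two threads that miss on the same key will both compute it, which is
wasteful but harmless here. Finer-grained locking (one lock per slot)
or a per-thread cache would reduce contention at the cost of more
code or more memory.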

Graeme Gill.

