[cad-linux] Re: Design methodology for an open geometric data management standard

  • From: Janek Kozicki <janek_listy@xxxxx>
  • To: cad-linux@xxxxxxxxxxxxx
  • Date: Sun, 7 Sep 2003 19:47:37 +0200

As a general reply to your post, I'd like to repost what Bruno said
about a data management standard. I think it's important and should be
said here.

Bruno said:

> Here are my requirements for a properly open and RCS-able CAD file
> format:

> 1. Text only, <cr><lf> delimited, human readable.

> 2. Blocks/references have full inheritance and polymorphism.

> 3. Resources (images, multiline text blocks) exist as separately
>    editable files, not embedded in larger files.

> 4. Changes to drawing objects can be made in-place without
>    regenerating/reordering the whole dataset.
>
>I was going to do something else this afternoon :-) Anyway, here is
>Bruno's Ideal CAD File Format; I really think that something this
>hare-brained is needed to bring Computer Aided Design up to speed
>with developments in Free Software.
>
>(Apologies if this is a bit technical, it assumes some experience of
>CAD as well as file-formats in general)
>
>
>Features
>--------
>
>o  A Drawing is a directory full of files.
>
>   This is the one radical idea: instead of encapsulating all the
>   data in a big structured file format like XML, simply use the
>   file-system as a structured data store.
>   
>   The idea comes from the Maildir format for storing mail; this has
>   lots of advantages over the traditional mbox single-file format
>   (random-access, easy searches, simple deletes, simple appends, no
>   locking).  With normal file-systems, Maildir performs just as
>   well as a single-file format until there are tens of thousands of
>   items in one directory - With modern file-systems like ReiserFS,
>   there are no performance issues with millions of items.
>
>
>o  One file per object.
>
>   Every basic object (line, circle, spline, text, reference etc.)
>   is stored in a single file, one object per file.  This means that
>   if your drawing consists of 1000 circles, then there will be
>   1000+ files in the directory representing that drawing.
>
>
>o  Packaging via zip.
>
>   Obviously thousands of files will seem extremely strange to
>   people who are accustomed to one-file-per-drawing.
>
>   For packaging purposes, all the drawing data can be simply
>   zipped-up; this is the strategy adopted by OpenOffice.org - where
>   each word-processing file is actually a zip file containing
>   several other files.
>
>   (tar.gz is better for this job, but zip has wider availability)
>
>
>o  Persistent names for all objects.
>
>   Each object needs a key/name/filename, this should be unique and
>   persistent through the lifetime of the object.  Something like
>   this:
>
>       <HOSTNAME>-<UNIX_DATE>-<PID>-<INODE>.<OBJECT_TYPE>
>
>   Here's an example:
>
>       celery-1045317918-1278-225423.circle
>
>   Some objects will need to be renamed so they can be remembered
>   later, this name will then be accessible from within the CAD
>   program:
>
>       enough-room-to-swing-a-cat.circle
>
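(Inline note: a minimal Python sketch of how such a name could be
generated. The function name new_object_path and the create-then-rename
approach are assumptions made for illustration; the proposal only fixes
the name layout.)

```python
import os
import socket
import tempfile
import time


def new_object_path(drawing_dir, object_type):
    """Create an empty object file and return its path.

    The name follows HOSTNAME-UNIX_DATE-PID-INODE.OBJECT_TYPE.
    """
    # Create a placeholder first so the file-system assigns an inode.
    fd, tmp_path = tempfile.mkstemp(dir=drawing_dir)
    try:
        inode = os.fstat(fd).st_ino
    finally:
        os.close(fd)

    name = "{}-{}-{}-{}.{}".format(
        socket.gethostname(), int(time.time()), os.getpid(), inode, object_type
    )
    final_path = os.path.join(drawing_dir, name)
    # Renaming within one directory keeps the inode, so the name stays true.
    os.rename(tmp_path, final_path)
    return final_path
```

Calling new_object_path("plan.drawing", "circle") would leave one empty
.circle file in that directory, ready to receive its data lines.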
>
>o  One line per data item.
>
>   The entire contents of a file might look something like this:
>
>       Centre-X: 345.678
>       Centre-Y: 9876.543
>       Centre-Z: 0.0
>       Radius: 246.8
>       Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
>       Units: millimetres
>
>   Each line is <CR><LF> separated so that it can be reliably edited
>   on various platforms, all data would be UTF-8.
>
>   The diff for this after simply changing the radius of the circle
>   would be human readable and look like this:
>
>       diff .celery-1045317918-1278-225423.circle \
>            celery-1045317918-1278-225423.circle
>       4c4
>       < Radius: 246.8
>       ---
>       > Radius: 250.0
>
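(Inline note: a small Python sketch of reading and writing this
one-line-per-item layout. The helper names parse_object and
format_object are hypothetical, and values are kept as plain strings
because the proposal does not define any typing rules.)

```python
def parse_object(text):
    """Parse 'Key: value' lines into a dict, keeping the original order."""
    fields = {}
    for line in text.splitlines():  # splitlines() also handles <CR><LF>
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def format_object(fields):
    """Serialise a dict back to <CR><LF>-terminated 'Key: value' lines."""
    return "".join("{}: {}\r\n".format(k, v) for k, v in fields.items())
```

Because only the edited line changes when the dict is written back, a
change of radius produces the one-line diff shown above.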
>
>o  Object properties stored in lookup tables.
>
>   Geometrical data, like a circle's radius, needs to be stored in
>   the object itself; other types of property such as: layer, color,
>   linetype, visibility etc.. need to be stored in lookup tables.
>
>   This is one of the features of existing file formats that needs
>   to be preserved.  For example, in AutoCAD, "layer 0" doesn't
>   indicate that a circle is "in layer zero", it means that the
>   circle "doesn't have a layer defined".
>
>   So there may be a lookup table for "layer" with a name like this:
>
>       celery-1045317959-1278-3652.layer
>
>   ..it might have content like this:
>
>       celery-1045317918-1278-225423.circle:  walls
>       celery-1045317921-1278-3565.circle:  walls
>       celery-1045318034-1278-36467.circle:  windows
>       celery-1045318056-1278-466875.circle:  windows
>       enough-room-to-swing-a-cat.circle:  construction
>
>   Any items not listed in a layer table would simply not have an
>   assigned layer.
>
>
>o  Blocks, Xrefs, groups and symbols are all the same thing.
>
>   As soon as you start treating a drawing as "a directory full of
>   files", other things start becoming very obvious; for instance, a
>   block/symbol within a drawing is simply a sub-directory inside
>   the drawing - This sub-directory is then editable as a drawing in
>   its own right.
>
>   Of course this needs to be referenced by the parent drawing in
>   order to be used, so we have another "reference" object type
>   similar to line, circle etc..
>   
>   A reference to a block might contain data like this:
>
>       Centre-X: 567.890
>       Centre-Y: 7654.321
>       Centre-Z: 0.0
>       Location: celery-1045318078-4358-6447.drawing/
>       Transform: 1, 0, 0, 0, 1, 0, 0, 0, 1
>       Units: millimetres
>
>   An Xref, which is normally an external embedded drawing, would
>   have an almost identical format, except with a qualified path:
>
>       Location: ../windows/casement-full-height.drawing/
>
>   Normal, non-drawing objects can be referenced:
>
>       Location: celery-1045317918-1278-225423.circle
>
>   Embedded data can be referenced too:
>
>       Location: logo.png
>
>       Location: ../specification/windows.txt
>       
>
>o  Polymorphism via diff/patch files.
>
>   Any aspect of an object can be overridden, by referencing an
>   associated diff file at the same time.
>   
>   So if you want a circle that is exactly the same as another
>   circle in every way, except for the radius; then your reference
>   file would have this line in it:
>
>       Patch: celery-1045318234-1278-67457.patch
>   
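(Inline note: the proposal does not say which diff format a .patch file
would use. Purely as an illustration, the Python sketch below writes a
unified diff with the standard difflib module; the function name
make_patch is hypothetical, and applying the result is left to the
ordinary patch tool.)

```python
import difflib


def make_patch(base_text, variant_text, name):
    """Return a unified diff that turns base_text into variant_text."""
    base_lines = base_text.splitlines(keepends=True)
    variant_lines = variant_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        base_lines, variant_lines, fromfile=name, tofile=name
    )
    return "".join(diff)
```

For two circles that differ only in radius, the resulting patch holds a
single removed Radius line and a single added one, plus context.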
>
>Advantages
>----------
>
>o  Easy access to data using standard non-CAD tools.
>
>   Since the data is extremely parse-able, difficult things like
>   database and report generation become easy.
>
>   For example, if you need to generate a door schedule from a
>   drawing set, simply use the grep tool to find all references to
>   doors - Then search the results to calculate number of
>   left-handed doors etc..
>
>   Mass manipulation of data is also practicable; need to find all
>   lines that are on layer "walls" and that have Z coordinates
>   greater than zero? that's a simple Perl one-liner; want to delete
>   them? that's easy, just `|xargs rm`; want to move them to another
>   drawing entirely? that's easy too.
>
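(Inline note: the same query sketched in Python rather than Perl. The
proposal never shows a line object, so the ".line" extension and the
idea that its Z fields end in "-Z" are assumptions, as are the function
names.)

```python
import os


def parse_fields(text):
    """Parse 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def read_fields(path):
    with open(path, encoding="utf-8") as handle:
        return parse_fields(handle.read())


def lines_on_layer_above_zero(drawing_dir, layer_name):
    """Names of .line objects on layer_name with any Z value above zero."""
    # Gather object-to-layer assignments from every layer lookup table.
    layers = {}
    for fname in os.listdir(drawing_dir):
        if fname.endswith(".layer"):
            layers.update(read_fields(os.path.join(drawing_dir, fname)))

    matches = []
    for fname in sorted(os.listdir(drawing_dir)):
        if not fname.endswith(".line") or layers.get(fname) != layer_name:
            continue
        fields = read_fields(os.path.join(drawing_dir, fname))
        z_values = [float(v) for k, v in fields.items() if k.endswith("-Z")]
        if any(z > 0 for z in z_values):
            matches.append(fname)
    return matches
```

The returned names can be passed straight to os.remove or shutil.move,
which corresponds to the "xargs rm" step mentioned above.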
>
>o  Diff files are clean and descriptive.
>
>   Most diff files should be (almost) human readable, inspection of
>   revised drawings becomes an examination of diff files rather than
>   hunting around for a "revision-cloud".
>
>   CAD software should help by allowing users to visually browse all
>   the differences.
>   
>
>o  RCS-able data, revisions are managed by CVS or Subversion.
>
>   By making all data Revision Control System friendly, drawing
>   management becomes much easier.  Big teams can have access to
>   current data via CVS even when geographically separated.
>
>   Rolling back to specific release dates is simple and reliable,
>   changes can even be browsed with standard free web-interfaces.
>
>   Systems like CVS and Subversion can be properly secured with
>   access permissions and data encryption; plus many drawing
>   repositories will want to be publicly readable, this is easy too.
>
>   With Subversion, each drawing would have a public permanent URL
>   accessible via the web.
>
>
>o  Multiple users can edit the same drawing at the same time.
>
>   All files are small and contain small amounts of atomic data,
>   this means that there are no file-locking issues whatsoever.
>   
>   It would be a little crazy, but there is no reason why two people
>   couldn't be editing different ends of a floor plan at the same
>   time - The CAD program could even update the display dynamically.
>
>
>o  Fast saving.
>
>   Opening one of these drawings is going to be a slow operation,
>   but saving will be fast and efficient.  You save more often than
>   you open, so this may lead to better performance overall.
>
>
>o  Merging two drawings is achieved by collapsing directory trees.
>
>   Since all objects within a drawing have unique names, two or more
>   drawings can simply be merged together by dropping all the files
>   into the same place.
>
>   Exploding a block is the same as moving all the files from a
>   subdirectory into the parent.
>
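(Inline note: a Python sketch of merging by moving files. The function
name merge_drawings and the refusal to overwrite on a name clash are
assumptions; the proposal relies on names being unique and does not say
what should happen if they are not.)

```python
import os
import shutil


def merge_drawings(source_dir, target_dir):
    """Move every entry of source_dir into target_dir without overwriting."""
    names = os.listdir(source_dir)
    clashes = [n for n in names if os.path.exists(os.path.join(target_dir, n))]
    if clashes:
        raise FileExistsError("name clash: " + ", ".join(sorted(clashes)))
    for name in names:
        shutil.move(os.path.join(source_dir, name),
                    os.path.join(target_dir, name))
    return sorted(names)
```

Exploding a block would be the same call with the block's sub-directory
as source_dir and its parent drawing as target_dir.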
>
>o  Extensible file format.
>
>   Since any one CAD program would only ever manipulate the object
>   types that it knows about, other suppliers could add extensions
>   without breaking existing programs.
>
>   For example, CAD program A might only know about simple objects
>   like line, circle, arc and text; if CAD program B starts creating
>   complex objects like mesh, sphere and door, then A should simply
>   be able to ignore them entirely without even touching the files.
>   
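(Inline note: a Python sketch of "program A" picking out only the
object types it understands. The KNOWN_TYPES set and the function name
editable_objects are hypothetical.)

```python
import os

# Object types this hypothetical program understands.
KNOWN_TYPES = {"line", "circle", "arc", "text"}


def editable_objects(drawing_dir):
    """List object files whose extension is a known type; skip the rest."""
    selected = []
    for fname in sorted(os.listdir(drawing_dir)):
        extension = os.path.splitext(fname)[1].lstrip(".")
        if extension in KNOWN_TYPES:
            selected.append(fname)
    return selected
```

Files of unknown types are never opened or rewritten, so whatever
another program stored there is left exactly as it was.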
>
>-- 
>Bruno


I just copied this from the freearchitecture mailing list (which
unfortunately now seems quite dead). I hope Bruno will not be angry
with me for this :)

janek kozicki
