[Marble-devel] Re: Natural Earth / Vector Files [Was: Marble Sprint]

Sun Oct 10 02:08:48 CEST 2010

On Saturday 09 October 2010 14:45:07 you wrote:
> Yes, Shapefile support has been requested by a few people already. And you
> might want to get in touch with Thibault Gridel to get the interfacing done
> properly.
> 
> However there are a few catches with this approach of adding vector data:
> 
> The current mwdbii data with its super-simplistic format has several
> important advantages (which you're probably still aware of since your
> investigations during Kartographer time):
> 
> * It already comes with a detail property for each node. This allows us to
>   render the polylines quickly since high-detail nodes can be skipped.
> * It uses up minimal space in the file since it only stores two
>   coordinates and the detail attribute, each of the three values in a short.
>   This allows us to ship a quite highly detailed vector map without
> 
> Of course we don't want to make marble slower and we still want to ship
> some vector data with Marble.
> So we need to investigate whether there's a way to use shapefiles and
> still keep the same advantages.
> 
> The following solutions come to my mind:
> 
> 1.) We convert the shapefiles to PNT (and do the detail levels via usage of
> the Douglas-Peucker Polygon algorithm). We'd basically end up with the same
> approach but updated country borders.
> 
> 2.) We package shapefiles with Marble which have a good ratio between
> detail and space (I'd still like to keep the Marble install package around
> 15-25 MB, so you do the math ;-) :
> 
> Looking at the Natural Earth data I see that basically there are three
> levels offered: 10, 50 and 110. Maybe we can ship the  coarse level (110
> or 50) prepackaged with Marble and automatically download the higher
> levels in the background.
> 
> There's another catch with the second solution: One of Marble's benefits is
> that it doesn't have any dependencies apart from Qt. Now as long as shape
> file support would stay optional I wouldn't mind linking against e.g.
> libshape. However if we rely on it to display basic vector data like
> countries, etc. then this dependency would not be optional anymore. And
> therefore a hard dependency on libshape wouldn't be an option.
> 
> From what I recall at least the polygon parsing (without the database
> aspect) for shape files is quite "easy" to reimplement. So somebody would
> need to reimplement shapefile parsing (or at least find some properly
> licensed code (ideally PD, BSD or at least LGPL) that we can integrate
> with Marble) in order to parse the files.
> 
> Another catch with the second solution is tackling the amount of detail: I
> don't know whether shapefiles provide detail levels for nodes. If they
> don't then we might consider enhancing our GeoDataLineString class to
> internally apply the Douglas-Peucker algorithm automatically and cache the
> result (there is PD licensed code for it available somewhere, so adding it
> shouldn't be too hard).
> 
> We also had the idea of having vector files laid out as quad tiles (like
> the raster data). But this would be even more complicated to add (since
> we'd rely on even more stuff from the GeoGraphicsView that hasn't been
> implemented yet). So I'll leave this out for the moment.

Cool.

There are clearly two needs here that I don't see a single solution 
meeting:
1) A lightweight and fast vector layer to ship as default with no need to 
identify, select or highlight a vector feature
2) A heavier vector layer with the ability to identify, select and highlight a 
vector feature via linked attribute data, that would be used for things like 
educational apps or plugins.

For the lightweight requirement, the main benefits of switching to Natural 
Earth (NE) would be updated borders, a separate internal border dataset, and a 
few more physical features. That's a modest gain, but it should be worth it.

The main benefit for the second case is the embedded data attributes such as 
country code, groupings, relative magnitude, etc, which will allow easy 
linking and manipulation.  We could do something similar using PNT, but we 
would have to create our own attribute storage to link things together, and 
then maintain the attribute data, so sticking with the shapefiles would seem a 
better solution there.

As you say, there are two approaches that could work for the lightweight 
default layer:
a) Convert just the required NE datasets to pnt format, either merging the 3 
scale levels into a single file with just 3 detail levels, or use Douglas-
Peucker on the 1:10m files to create the required detail levels
b) Implement an internal lightweight shapefile parser without dbf support and 
ship only the required NE datasets.

Some pros/cons to consider:

Shapefiles don't support node attributes at all, and only support feature and 
file attributes via dbf files, so different detail levels require separate  
files, or multiple features in the same file with the dbf attributes to 
identify the detail level.  Either way this would obviously be slower and take 
more disk space and memory than the PNT files.

Comparing them visually, it's clear that the 1:50m data is less detailed than 
the MWDBII data, but the 1:10m data is more detailed.  That implies we would 
have to use the 1:10m data, but matching the features currently displayed 
from MWDBII using NE shapefiles would take 14 MB, which is clearly 
unacceptable (see table of sizes below).  By comparison the PNT files only 
take 2.6 MB.

The 1:10m country file is 6.55 MB and contains 533,202 points = 12.28 
bytes/point, compared to the PNT file which is 745 KB and contains 127,246 
points = 5.85 bytes/point.  That suggests the NE data in PNT format would be 
roughly half the size, so around 6 MB in total.  This could probably be 
reduced further by a light application of Douglas-Peucker.
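
As a quick sanity check, the arithmetic behind that estimate can be written 
out.  This uses only the figures quoted above and assumes decimal units 
(1 MB = 1,000,000 bytes), which is what reproduces the 12.28 figure.  The 
variable names are just for illustration.

```python
# Sizes and point counts quoted above, in decimal units (assumption).
NE_COUNTRY_BYTES = 6.55e6    # 1:10m country shapefile
NE_COUNTRY_POINTS = 533_202
PNT_BYTES = 745e3            # current PNT country file
PNT_POINTS = 127_246
NE_TOTAL_MB = 14.0           # NE shapefiles needed to match MWDBII features


def bytes_per_point(size_bytes, points):
    """Average storage cost of one point."""
    return size_bytes / points


shp_bpp = bytes_per_point(NE_COUNTRY_BYTES, NE_COUNTRY_POINTS)
pnt_bpp = bytes_per_point(PNT_BYTES, PNT_POINTS)

# Scale the 14 MB shapefile total by the ratio of the two densities.
est_total_mb = NE_TOTAL_MB * pnt_bpp / shp_bpp

print(f"shapefile: {shp_bpp:.2f} bytes/point")
print(f"PNT:       {pnt_bpp:.2f} bytes/point")
print(f"NE data stored as PNT: roughly {est_total_mb:.1f} MB")
```

With these inputs the scaled total lands between 6 and 7 MB, before any 
point reduction.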

Comparing them visually, it's also clear that the 110, 50, and 10 datasets do 
not match each other or share common vertices, so they can't be merged 
together as the detail levels.  They would have to be used separately.

The NE shapefiles have been carefully processed so that shared borders and 
overlapping features like rivers match exactly, along with other such 
niceties.  Applying the Douglas-Peucker algorithm might break that.

A lightweight shapefile parser would allow users/apps to load other 
shapefiles.
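
To give an idea of how small such a parser could be, here is a minimal 
sketch that reads only the geometry from a .shp file and ignores the .dbf 
and .shx completely.  It follows the published ESRI shapefile layout 
(100-byte header, big-endian record headers, little-endian record contents) 
and handles only PolyLine (type 3) and Polygon (type 5) records.  The 
function name is made up for illustration, and a real parser in Marble would 
of course be C++/Qt rather than Python.

```python
import struct


def read_shp_polylines(data: bytes):
    """Parse PolyLine (3) and Polygon (5) records from raw .shp bytes.

    Returns a list of features. Each feature is a list of parts, and each
    part is a list of (x, y) tuples. No attribute data is read.
    """
    (file_code,) = struct.unpack_from(">i", data, 0)
    if file_code != 9994:
        raise ValueError("not a shapefile")
    # File length is stored in 16-bit words at byte offset 24.
    (file_len_words,) = struct.unpack_from(">i", data, 24)
    file_len = min(len(data), file_len_words * 2)

    features = []
    pos = 100  # the main header has a fixed size of 100 bytes
    while pos + 8 <= file_len:
        # Record header: record number and content length (in words).
        _rec_no, content_words = struct.unpack_from(">ii", data, pos)
        pos += 8
        end = pos + content_words * 2
        (shape_type,) = struct.unpack_from("<i", data, pos)
        if shape_type in (3, 5):
            # Skip shape type (4 bytes) and bounding box (32 bytes).
            num_parts, num_points = struct.unpack_from("<ii", data, pos + 36)
            parts_off = pos + 44
            starts = struct.unpack_from(f"<{num_parts}i", data, parts_off)
            pts_off = parts_off + 4 * num_parts
            flat = struct.unpack_from(f"<{2 * num_points}d", data, pts_off)
            points = list(zip(flat[0::2], flat[1::2]))
            # Each part runs from its start index to the next part's start.
            bounds = list(starts) + [num_points]
            features.append([points[a:b] for a, b in zip(bounds, bounds[1:])])
        pos = end  # other shape types are skipped
    return features
```

Usage would be along the lines of 
`read_shp_polylines(open("countries.shp", "rb").read())`, where the file 
name is only a placeholder.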

We would have to reconvert and check the data every time there's a new NE 
release, which could be a lot of effort.  An automated shp2pnt script would 
reduce that, and could also let apps/users display their own shapefiles in a 
simple way.
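
The output half of such a script could look something like the sketch 
below.  The only thing taken from the thread is that each node is stored as 
three shorts (two coordinates plus the detail attribute).  The field order, 
the big-endian byte order and the use of whole arc minutes as the coordinate 
unit are my assumptions here and would need checking against the actual PNT 
reader.  The function names are hypothetical.

```python
import struct


def pack_pnt_nodes(nodes):
    """Pack (detail, lat_deg, lon_deg) triples as three int16 values each.

    Assumed layout (not checked against the Marble sources): big-endian
    shorts in the order detail, latitude, longitude, with the coordinates
    stored as whole arc minutes.
    """
    out = bytearray()
    for detail, lat, lon in nodes:
        out += struct.pack(">hhh", detail, round(lat * 60), round(lon * 60))
    return bytes(out)


def unpack_pnt_nodes(data):
    """Inverse of pack_pnt_nodes, returning coordinates in degrees."""
    return [
        (detail, lat / 60, lon / 60)
        for detail, lat, lon in struct.iter_unpack(">hhh", data)
    ]
```

Each node costs exactly 6 bytes this way.  Under the arc-minute assumption 
the resolution is limited to about 1.85 km at the equator, so the scaling 
has to match whatever the PNT reader really expects.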

Overall, it seems the best approach for updating the lightweight layer is 
indeed to convert the shapefiles to PNT format, provided the Douglas-Peucker 
algorithm can be applied in a way that marks each point with a detail level 
rather than just throwing the points away.
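
One way this could work, as a rough sketch: instead of running 
Douglas-Peucker with a single tolerance and discarding points, run the same 
subdivision all the way down and record for each point the distance at which 
it would have been dropped.  Detail levels then fall out by comparing that 
value against a list of thresholds.  This variant treats lon/lat as planar 
coordinates, and the level numbering (0 = coarsest) and threshold values are 
assumptions, not what PNT actually uses.

```python
import math


def _line_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / seg_len


def point_significance(points):
    """Return one Douglas-Peucker significance value per point.

    End points get infinity (always kept). An interior point gets the
    tolerance above which plain Douglas-Peucker would drop it.
    """
    sig = [0.0] * len(points)
    if not points:
        return sig
    sig[0] = sig[-1] = math.inf
    stack = [(0, len(points) - 1, math.inf)]
    while stack:
        first, last, cap = stack.pop()
        if last - first < 2:
            continue
        best_i, best_d = first + 1, -1.0
        for i in range(first + 1, last):
            d = _line_distance(points[i], points[first], points[last])
            if d > best_d:
                best_i, best_d = i, d
        # A point is never more significant than the split that exposed it.
        value = min(best_d, cap)
        sig[best_i] = value
        stack.append((first, best_i, value))
        stack.append((best_i, last, value))
    return sig


def detail_levels(points, thresholds):
    """Level of a point = number of thresholds its significance is below."""
    return [
        sum(1 for t in thresholds if s < t)
        for s in point_significance(points)
    ]
```

For example, `detail_levels(line, [0.5, 0.1])` would give level 0 to the 
points that survive a 0.5 tolerance, level 1 to those that only survive 0.1, 
and level 2 to the rest, with no point thrown away.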

For the heavier vector layer with attributes, rather than just implementing a 
shapefile importer based on shapelib (a C library), it might be better to 
implement a GDAL/OGR importer, which would import any vector format and is a 
C++ library.  That would depend on packaging weight and difficulty, so needs 
more investigation, but that's for another day.

Cheers!

John.

A lot of the NE data is duplicated / re-mixed; the core shapefile sizes are:

                           1:110m     1:50m      1:10m
                           ------     -----      -----
Admin level 0 countries     172 KB    1.36 MB     6.55 MB
Admin level 0 land borders   39 KB     301 KB      896 KB
Admin level 0 sea borders    12 KB      40 KB       79 KB
Admin level 0 disputed                  40 KB      157 KB
Admin level 1 regions        39 KB     339 KB     13.9 MB *
Admin level 1 land borders   16 KB      60 KB     4.82 MB
Coastlines                   79 KB     883 KB     2.15 MB
Rivers                       19 KB     420 KB     3.29 MB
Lakes                        10 KB     286 KB      786 KB
Glaciers                     13 KB     208 KB     1.23 MB
Dateline                     18 KB      18 KB       18 KB
Playas                                  18 KB      106 KB
Ice Shelves                            105 KB      211 KB
Minor Islands                                      449 KB
Reefs                                              171 KB
                           -------    -------   ---------
                            417 KB    4.08 MB    34.03 MB

* level 1 regions are USA/Canada only at 110m and 50m, but whole world at 10m, 
perfect for KGeography use :-)

Other useful files:
Physical Features Land      146 KB    1.50 MB      692 KB
Physical Features Sea       348 KB     836 KB      836 KB
Populated Places                       347 KB     1.48 MB
Urban Areas                            439 KB     3.48 MB
Bathymetry                                       11.64 MB

Not sure how useful the 1:110m data really is, but perhaps it could be reduced 
for use in a country picker in kdelibs?