[Marble-devel] Re: Natural Earth / Vector Files [Was: Marble Sprint]
John Layt
johnlayt at googlemail.com
Sun Oct 10 02:08:48 CEST 2010
On Saturday 09 October 2010 14:45:07 you wrote:
> Yes, Shapefile support has been requested by a few people already. And you
> might want to get in touch with Thibault Gridel to get the interfacing done
> properly.
>
> However there are a few catches with this approach of adding vector data:
>
> The current mwdbii data with its super-simplistic format has several
> important advantages (which you're probably still aware of since your
> investigations during Kartographer time):
>
> * It already comes with a detail property for each node. This allows us to
> quickly render the polylines since high-detail-nodes can be skipped
> quickly.
> * It uses up minimal space in the file since it only stores two
> coordinates and the detail attribute, each of the three values in a short.
> This allows us to ship a quite highly detailed vector map without
>
> Of course we don't want to make marble slower and we still want to ship
> some vector data with Marble.
> So we need to investigate whether there's a way to use shapefiles and
> keep the same advantages.
>
> The following solutions come to my mind:
>
> 1.) We convert the shapefiles to PNT (and do the detail levels via usage of
> the Douglas-Peucker Polygon algorithm). We'd basically end up with the same
> approach but updated country borders.
>
> 2.) We package shapefiles with Marble which have a good ratio between
> detail and space (I'd still like to keep the Marble install package around
> 15-25 MB, so you do the math ;-) :
>
> Looking at the Natural Earth data I see that basically there are three
> levels offered: 10, 50 and 110. Maybe we can ship the coarse level (110
> or 50) prepackaged with Marble and automatically download the higher
> levels in the background.
>
> There's another catch with the second solution: One of Marble's benefits is
> that it doesn't have any dependencies apart from Qt. Now as long as shape
> file support would stay optional I wouldn't mind linking against e.g.
> libshape. However if we rely on it to display basic vector data like
> countries, etc. then this dependency would not be optional anymore. And
> therefore a hard dependency on libshape wouldn't be an option.
>
> From what I recall at least the polygon parsing (without the database
> aspect) for shape files is quite "easy" to reimplement. So somebody would
> need to reimplement shapefile parsing (or at least find some properly
> licensed code (ideally PD, BSD or at least LGPL) that we can integrate
> with Marble) in order to parse the files.
>
> Another catch with the second solution is tackling the amount of detail: I
> don't know whether shapefiles provide detail levels for nodes. If they
> don't then we might consider enhancing our GeoDataLineString class to
> internally apply the Douglas-Peucker algorithm automatically and cache the
> result (there is PD licensed code for it available somewhere, so adding it
> shouldn't be too hard).
>
> We also had the idea of having Quad-Tile layouted vector files (like with
> the raster data). But this would be even more complicated to add (Since
> we'd rely on even more stuff from the GeoGraphicsView that hasn't been
> implemented yet). So I leave this out for a moment.
Cool.
There are clearly two needs here that I don't see a single solution
meeting:
1) A lightweight and fast vector layer to ship as default with no need to
identify, select or highlight a vector feature
2) A heavier vector layer with the ability to identify, select and highlight a
vector feature via linked attribute data, that would be used for things like
educational apps or plugins.
For the lightweight requirement, the main benefit of switching to Natural
Earth (NE) would be updated borders, a separate internal border dataset, and a
few more physical features, which should be worth it.
The main benefit for the second case is the embedded data attributes such as
country code, groupings, relative magnitude, etc, which will allow easy
linking and manipulation. We could do something similar using PNT, but we
would have to create our own attribute storage to link things together, and
then maintain the attribute data, so sticking with the shapefiles would seem a
better solution there.
As you say, there are two approaches that could work for the lightweight
default layer:
a) Convert just the required NE datasets to PNT format, either merging the 3
scale levels into a single file with just 3 detail levels, or using
Douglas-Peucker on the 1:10m files to create the required detail levels
b) Implement an internal lightweight shapefile parser without dbf support and
ship only the required NE datasets.
Some pros/cons to consider:
Shapefiles don't support node attributes at all, and only support feature and
file attributes via dbf files, so different detail levels require separate
files, or multiple features in the same file with the dbf attributes to
identify the detail level. Either way this would obviously be slower and take
more disk space and memory than the PNT files.
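To make the per-node detail idea concrete, here is a minimal sketch of a PNT-style record reader. Only the "three shorts per node" part comes from the description quoted above; the field order (detail, lat, lon), the big-endian byte order, the rule that a smaller detail value means a coarser, always-drawn node, and the names pack_nodes and read_nodes are all assumptions made for illustration, not the actual Marble format.

```python
import struct

# Assumed layout: three signed 16-bit values per node (detail, lat, lon),
# big-endian. Field order, byte order and units are illustrative only.
NODE = struct.Struct(">hhh")


def pack_nodes(nodes):
    """Serialise (detail, lat, lon) tuples into 6-byte records."""
    return b"".join(NODE.pack(*n) for n in nodes)


def read_nodes(data, max_detail):
    """Return (lat, lon) for nodes whose detail value is <= max_detail.

    Nodes with a higher detail value are skipped without further work,
    which is the cheap filtering a per-node attribute makes possible.
    """
    return [(lat, lon)
            for detail, lat, lon in NODE.iter_unpack(data)
            if detail <= max_detail]


nodes = [(0, 100, 200), (2, 110, 210), (1, 120, 220), (0, 130, 230)]
blob = pack_nodes(nodes)
print(len(blob))            # 4 nodes * 6 bytes
print(read_nodes(blob, 0))  # coarsest view
print(read_nodes(blob, 1))  # one level finer
```

A shapefile has nowhere to put the per-node value, which is why the same effect needs separate files or separate features per detail level.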
Comparing visually, it's clear that the 1:50m data is less detailed than the
MWDBII, but the 1:10m data is more detailed. This implies to me that we would
have to use the 1:10m data, but matching the features currently displayed
from MWDBII using NE shapefiles would take 14 MB, which is clearly
unacceptable (see table of sizes below). By comparison the PNT files only
take 2.6 MB.
The 1:10m country file is 6.55 MB and contains 533,202 points = 12.28
bytes/point, compared to the PNT file which is 745 KB and contains 127,246
points = 5.85 bytes/point. That suggests the NE data in PNT format would be
roughly half the size, so about 6 MB in total. This could probably be further
reduced by a light application of Douglas-Peucker.
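The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch: it assumes MB and KB are decimal units (the reading that reproduces the quoted bytes/point figures) and that the whole 14 MB of shapefiles would shrink by the same ratio as the country file. The variable names are illustrative.

```python
# Figures quoted in the text; MB and KB are taken as 10**6 and 10**3 bytes.
shp_bytes, shp_points = 6.55e6, 533_202   # NE 1:10m country shapefile
pnt_bytes, pnt_points = 745e3, 127_246    # existing PNT country file

shp_bpp = shp_bytes / shp_points          # bytes per point, shapefile
pnt_bpp = pnt_bytes / pnt_points          # bytes per point, PNT
ratio = pnt_bpp / shp_bpp                 # PNT size relative to shapefile

# Assumption: the 14 MB of NE shapefiles scales by the same ratio.
estimate_mb = 14 * ratio

print(f"{shp_bpp:.2f} bytes/point (shp), {pnt_bpp:.2f} bytes/point (pnt)")
print(f"ratio {ratio:.2f}, estimated PNT total {estimate_mb:.1f} MB")
```

Under those assumptions the scaled total comes out a little under 7 MB, in the same range as the rough figure given above.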
Comparing visually, it's also clear that the 110, 50, and 10 datasets do not
match each other or share common vertices, so they can't be merged together
as the detail levels; they would have to be used separately.
The NE shapefiles have been carefully processed so that shared borders and
overlapping features like rivers match exactly, along with other such
niceties; applying the Douglas-Peucker algorithm might break that.
A lightweight shapefile parser would allow users/apps to load other
shapefiles.
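For a sense of how small a geometry-only parser can be, here is a minimal sketch that reads PolyLine and Polygon records from the .shp file alone, ignoring the .dbf and .shx files. The byte offsets follow the published ESRI shapefile layout; the function names read_polylines and make_shp are hypothetical, and make_shp exists only to build a tiny in-memory file for the demonstration. It is not a complete or validated parser.

```python
import struct


def read_polylines(shp):
    """Parse PolyLine (3) and Polygon (5) records from .shp bytes.

    Returns a list of shapes; each shape is a list of parts; each part
    is a list of (x, y) tuples.
    """
    if struct.unpack_from(">i", shp, 0)[0] != 9994:
        raise ValueError("not a shapefile")
    file_len = struct.unpack_from(">i", shp, 24)[0] * 2  # stored in 16-bit words
    shapes, pos = [], 100                                 # main header is 100 bytes
    while pos < file_len:
        _recno, words = struct.unpack_from(">ii", shp, pos)
        body, pos = pos + 8, pos + 8 + words * 2
        shape_type = struct.unpack_from("<i", shp, body)[0]
        if shape_type not in (3, 5):
            continue  # skip null shapes and unsupported types
        # Bytes 4..35 of the record hold the bounding box, not needed here.
        num_parts, num_points = struct.unpack_from("<ii", shp, body + 36)
        starts = struct.unpack_from(f"<{num_parts}i", shp, body + 44)
        flat = struct.unpack_from(f"<{2 * num_points}d", shp,
                                  body + 44 + 4 * num_parts)
        pts = list(zip(flat[0::2], flat[1::2]))
        ends = list(starts[1:]) + [num_points]
        shapes.append([pts[s:e] for s, e in zip(starts, ends)])
    return shapes


def make_shp(parts):
    """Build a one-record Polygon .shp in memory (demonstration only)."""
    pts = [p for part in parts for p in part]
    starts, n = [], 0
    for part in parts:
        starts.append(n)
        n += len(part)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    box = (min(xs), min(ys), max(xs), max(ys))
    content = struct.pack("<i4dii", 5, *box, len(parts), len(pts))
    content += struct.pack(f"<{len(parts)}i", *starts)
    content += struct.pack(f"<{2 * len(pts)}d", *[c for p in pts for c in p])
    record = struct.pack(">ii", 1, len(content) // 2) + content
    header = struct.pack(">i5ii", 9994, 0, 0, 0, 0, 0, (100 + len(record)) // 2)
    header += struct.pack("<ii4d4d", 1000, 5, *box, 0, 0, 0, 0)
    return header + record


ring_a = [(0, 0), (0, 1), (1, 1), (0, 0)]
ring_b = [(5, 5), (5, 6), (6, 6), (5, 5)]
shapes = read_polylines(make_shp([ring_a, ring_b]))
print(len(shapes), [len(part) for part in shapes[0]])
```

Without the .dbf there are no attributes, so a parser like this only covers the lightweight case; identifying or highlighting a feature still needs the attribute table.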
We would have to reconvert and check the data every time there's a new NE
release which could be a lot of effort, but an automated shp2pnt script could
prove useful to allow apps/users to display their own shapefiles in a simple
way.
Overall, it seems the best approach for updating the lightweight layer is
indeed to convert the shapefiles to PNT format, provided the Douglas-Peucker
algorithm can be deployed in a way that marks each point with a detail level
rather than just throwing the points away.
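One way to mark points instead of discarding them is sketched below. It runs the usual Douglas-Peucker splitting once, records for each point the largest tolerance at which it would still be kept, and then turns that into a level using a list of tolerances. The function names, the planar distance on raw coordinates, and the convention that level 0 is the coarsest are assumptions for illustration, not an existing Marble implementation.

```python
import math


def _dist(p, a, b):
    """Distance from point p to segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def detail_levels(points, tolerances):
    """Tag every point with a detail level; no point is removed.

    A point is kept by Douglas-Peucker at tolerance t exactly when its
    recorded significance exceeds t. The returned level is the number of
    tolerances that would drop the point, so with tolerances sorted from
    coarse to fine, drawing all points with level <= k matches the
    simplification at tolerances[k]. Endpoints always get level 0.
    """
    if not points:
        return []
    sig = [0.0] * len(points)
    sig[0] = sig[-1] = math.inf
    stack = [(0, len(points) - 1, math.inf)]
    while stack:
        lo, hi, parent = stack.pop()
        if hi - lo < 2:
            continue
        d, i = max((_dist(points[k], points[lo], points[hi]), k)
                   for k in range(lo + 1, hi))
        # A point only survives if every split above it also happened.
        sig[i] = min(parent, d)
        stack += [(lo, i, sig[i]), (i, hi, sig[i])]
    return [sum(s <= t for t in tolerances) for s in sig]


line = [(0, 0), (1, 0.1), (2, 2), (3, 0.1), (4, 0)]
print(detail_levels(line, [1.0, 0.05]))
```

Because each polyline is processed on its own, this does nothing to keep shared borders identical between neighbouring features, so the concern about NE's matched borders still applies.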
For the heavier vector layer with attributes, rather than just implementing a
shapefile importer based on shapelib (a C library), it might be better to
implement a GDAL/OGR importer, which would import most vector formats and is
a C++ library. That would depend on packaging weight and difficulty, so needs
more investigation, but that's for another day.
Cheers!
John.
A lot of the NE data is duplicated / re-mixed; the core shapefile sizes are:
                                1:110m      1:50m      1:10m
                                ------      -----      -----
Admin level 0 countries         172 KB    1.36 MB    6.55 MB
Admin level 0 land borders       39 KB     301 KB     896 KB
Admin level 0 sea borders        12 KB      40 KB      79 KB
Admin level 0 disputed                      40 KB     157 KB
Admin level 1 regions            39 KB     339 KB    13.9 MB *
Admin level 1 land borders       16 KB      60 KB    4.82 MB
Coastlines                       79 KB     883 KB    2.15 MB
Rivers                           19 KB     420 KB    3.29 MB
Lakes                            10 KB     286 KB     786 KB
Glaciers                         13 KB     208 KB    1.23 MB
Dateline                         18 KB      18 KB      18 KB
Playas                                      18 KB     106 KB
Ice Shelves                                105 KB     211 KB
Minor Islands                                         449 KB
Reefs                                                 171 KB
                               -------    -------  ---------
                                417 KB    4.08 MB   34.03 MB
* level 1 regions are USA/Canada only at 110m and 50m, but whole world at 10m,
perfect for KGeography use :-)
Other useful files:
Physical Features Land 146 KB 1.50 MB 692 KB
Physical Features Sea 348 KB 836 KB 836 MB
Populated Places 347 KB 1.48 MB
Urban Areas 439 KB 3.48 MB
Bathymetry 11.64 MB
Not sure how useful the 1:110m data really is, but could it be reduced for
use in a country picker in kdelibs?