[Kstars-devel] Replacing file-system by database in KStars

Wed Jan 29 05:52:23 UTC 2014

Hi Henry,

Thank you for explaining the overall situation in KStars.

From your explanation, I understood two things:

1) We need to do a brief study of the structure of the data we handle in
KStars and see whether we can improve the current file-based system or
replace it with a better option, such as a database system.

2) Find a way to use facilities of the OS to render sky objects more
efficiently.

If I have misunderstood, please correct me :)

I will study the data structures of KStars and try to come up with a
diagram giving an overview of the entire data structure, which can then go
through an improvement cycle. Do you think that would be worthwhile, or
should I try some other methodology?

Regards,
Vijay

On Wed, Jan 29, 2014 at 9:44 AM, Henry de Valence
<hdevalence at hdevalence.ca> wrote:

> Hi Vijay,
>
> On January 28, 2014 08:22:08 AM Vijay Dhameliya wrote:
> > Hi guys,
> >
> > Currently, when KStars is launched, it reads the data for the different
> > SkyObject types from their respective files in the loaddata() methods. I
> > have tracked down all the classes where we load data by reading files.
>
> Indeed, the code KStars uses to load data from the disk is messy and (IMO)
> not as efficient as it could be.
>
> > I researched the topic a bit and found that loading data from a database
> > is always a much better option than doing the same from a file.
>
> The database's data is stored in a file on disk, so loading data from the
> database is loading from a file. It might be faster if the database we use
> serves KStars' data-loading patterns well and we can use the database's
> optimized code instead of writing our own.
>
> The problem is that most databases are not actually suited to the kind of
> data we have or to our usage patterns. The data we deal with is primarily
> spatial: we have points on the sphere, with extra metadata describing the
> properties of the objects. Currently, KStars has a somewhat complicated
> system for spatially indexing the data with a hierarchical triangular mesh
> (HTM), and for loading the data from files as needed.
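
To illustrate what a spatial index buys us, here is a minimal C++ sketch. It is *not* the HTM that KStars uses: as a deliberately simpler stand-in, it bins points into an equal-angle RA/Dec grid and answers a cone query ("everything within r degrees of this direction") by visiting only the grid cells that can overlap the cone, then filtering exactly by angular distance. GridIndex, coneQuery and cellDeg are hypothetical names chosen for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an HTM: an equal-angle RA/Dec grid.
// cellDeg must divide 180 evenly (e.g. 1, 5, 10).
constexpr double kDegToRad = 3.14159265358979323846 / 180.0;

struct SkyPoint { double raDeg, decDeg; };

// Great-circle separation in degrees (spherical law of cosines).
double angularSepDeg(const SkyPoint& a, const SkyPoint& b) {
    double c = std::sin(a.decDeg * kDegToRad) * std::sin(b.decDeg * kDegToRad) +
               std::cos(a.decDeg * kDegToRad) * std::cos(b.decDeg * kDegToRad) *
               std::cos((a.raDeg - b.raDeg) * kDegToRad);
    return std::acos(std::clamp(c, -1.0, 1.0)) / kDegToRad;
}

class GridIndex {
public:
    explicit GridIndex(int cellDeg)
        : cellDeg_(cellDeg), nRa_(360 / cellDeg), nDec_(180 / cellDeg),
          cells_(static_cast<std::size_t>(nRa_) * nDec_) {}

    // Stores the point in the cell containing it and returns its id.
    std::size_t insert(const SkyPoint& p) {
        points_.push_back(p);
        cells_[cellIndex(raIndex(p.raDeg), decIndex(p.decDeg))].push_back(points_.size() - 1);
        return points_.size() - 1;
    }

    // Ids of all points within radiusDeg of center; only overlapping cells are scanned.
    std::vector<std::size_t> coneQuery(const SkyPoint& center, double radiusDeg) const {
        std::vector<std::size_t> out;
        int dLo = decIndex(std::max(-90.0, center.decDeg - radiusDeg));
        int dHi = decIndex(std::min(90.0, center.decDeg + radiusDeg));
        // If the cone reaches a pole, every RA column may be involved.
        bool touchesPole = std::abs(center.decDeg) + radiusDeg >= 90.0;
        int rLo = 0, rHi = nRa_ - 1;
        if (!touchesPole) {
            // Maximum RA offset of a small circle that does not contain a pole.
            double halfWidth = std::asin(std::min(1.0, std::sin(radiusDeg * kDegToRad) /
                                                  std::cos(center.decDeg * kDegToRad))) / kDegToRad;
            rLo = static_cast<int>(std::floor((center.raDeg - halfWidth) / cellDeg_));
            rHi = static_cast<int>(std::floor((center.raDeg + halfWidth) / cellDeg_));
        }
        for (int d = dLo; d <= dHi; ++d)
            for (int r = rLo; r <= rHi; ++r)
                for (std::size_t id : cells_[cellIndex(wrapRa(r), d)])
                    if (angularSepDeg(points_[id], center) <= radiusDeg) out.push_back(id);
        return out;
    }

private:
    int wrapRa(int r) const { return ((r % nRa_) + nRa_) % nRa_; }
    int raIndex(double raDeg) const { return wrapRa(static_cast<int>(std::floor(raDeg / cellDeg_))); }
    int decIndex(double decDeg) const {
        return std::clamp(static_cast<int>(std::floor((decDeg + 90.0) / cellDeg_)), 0, nDec_ - 1);
    }
    std::size_t cellIndex(int r, int d) const { return static_cast<std::size_t>(d) * nRa_ + r; }

    int cellDeg_, nRa_, nDec_;
    std::vector<std::vector<std::size_t>> cells_;
    std::vector<SkyPoint> points_;
};
```

An equal-angle grid has very uneven cell areas (cells shrink towards the poles), which is one reason to prefer a mesh like HTM; the point here is only that a small query touches a handful of cells instead of the whole catalog.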
>
> In order to replace this with an SQL-based system, we'd need to use a
> database that has support for spatial queries. To the best of my
> knowledge, SQLite does not have such support. It would probably be
> possible to do something with PostgreSQL's PostGIS extension for dealing
> with geographic data, but KStars should not require the user to run and
> maintain a standalone database server, so SQLite is the only SQL option
> (and we do use it for some data).
>
> > If we replace the file system with QSql, these are the pros:
> >
> > 1) We will not have to ship so many files with KStars
>
> File count is less important than file size; if we're shipping the same
> data, it's unclear that we would see a big reduction in size. Also, it
> makes it harder to keep track of the data we have.
>
> > 2) Loading from a database is quicker than doing the same from a file
>
> (See discussion above)
>
> > 3) The code for the load methods will be smaller
>
> Yes, this would be really nice, but I think that there may be other
> avenues to do this.
>
> > Cons:
> > 1) I will have to move all the data from files into the database using
> > temporary methods
>
> I'm not quite sure what you mean. We already have to do this for the data
> we have: there's a collection of (as I recall, quite hacky) scripts for
> the purpose of building the catalog files we use now. If we change our
> data representation, we have to change these, too.
>
> There's also:
>
> 2. We lose spatial indexing, meaning that we may need to load an entire
> 2GB catalog for one small region of the sky.
>
> 3. The only SQL database we can use is SQLite, which is designed to be
> small, not high-performance.
>
> > So I am planning to start coding on my local branch to replace the file
> > system with a database.
> >
> > Can you please give your views and suggestions on this? I am sure that
> > it will be very helpful to me. :)
>
> I agree that we should rethink the data handling in KStars, but I think
> that it would be best to take a few steps back first, to see the bigger
> picture.
>
> The first task, in my opinion, is to clearly set out *what data we have*.
> For instance, it would be good to have scripts that fetch the raw datasets
> we use completely automatically and process them into our catalog format,
> so that the entire process of creating the files is written down
> programmatically. Even though we don't need to regenerate the catalogs
> very often, the benefit is that exactly what we do to the source data is
> documented in working, runnable, unit-tested code. Some datasets (AFAIR)
> were assembled by us or by the Stellarium people, in which case those
> files should be treated as the 'raw data'.
>
> The question of how we should store our data is something I've been
> giving some thought to recently, but as I've been busy with school I
> haven't had time to implement a prototype yet. Since it's come up, though,
> I might as well share what I was thinking.
>
> It's possible to run all of our astrocalculations at much higher speed
> (using, e.g., code from my GSoC project), but actually doing this in
> practice is hard, since it requires reworking the data handling of the sky
> components.
>
> Currently, each component manages its own data handling, indexing, etc.,
> usually using the HTM library to compute spatial queries. Different
> components handle things differently; for instance, the deep star
> component does lazy-loading of stars in blocks to avoid having to load
> huge catalogs all at once.
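
To make concrete what "lazy-loading in blocks" means when a component does it by hand, here is a minimal C++ sketch of a block cache: a block is read on first access, and the least-recently-used block is evicted once a capacity is exceeded. LazyBlockCache, StarRecord and the loader callback are hypothetical; this is not the actual deep star component code.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

struct StarRecord { float raDeg, decDeg, mag; };  // hypothetical record
using Block = std::vector<StarRecord>;

// Hypothetical hand-written lazy loading: blocks are loaded on first access
// and the least-recently-used block is evicted when over capacity.
class LazyBlockCache {
public:
    LazyBlockCache(std::size_t capacity, std::function<Block(int)> loader)
        : capacity_(capacity == 0 ? 1 : capacity), loader_(std::move(loader)) {}

    // The returned reference stays valid until a later get() evicts the block.
    const Block& get(int blockId) {
        auto it = index_.find(blockId);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
            return lru_.front().second;
        }
        lru_.emplace_front(blockId, loader_(blockId));  // cache miss: read the block
        ++loads_;
        index_[blockId] = lru_.begin();
        if (lru_.size() > capacity_) {  // evict the least recently used block
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        return lru_.front().second;
    }

    std::size_t loadCount() const { return loads_; }

private:
    using Entry = std::pair<int, Block>;
    std::size_t capacity_;
    std::function<Block(int)> loader_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> index_;
    std::size_t loads_ = 0;
};
```

Every component that lazy-loads has to carry some bookkeeping of this kind, which is exactly the sort of code that handing paging over to the OS could remove.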
>
> One nice thing about most of our data is that it generally doesn't change,
> so our problem should be well-suited to an immutable data structure, which
> gives us thread-safety and bug-avoidance for free. In addition, I think we
> should explore using facilities of the operating system to do the work for
> us. For instance, we could try to use mmap (in the form of QFile::map() for
> portability) to map the contents of a binary catalog file directly into
> the virtual address space. The OS then loads data in pages as needed (and
> unloads the pages according to, AFAIR, a least-recently-used policy *when
> needed* [^1]). If we arrange the data in our catalog file(s) to have
> spatial locality (i.e., data near each other in the file are nearby points
> in the sky), then we can have the kernel do the work of resource
> management / loading and unloading for us, greatly simplifying our code.
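
A minimal POSIX sketch of the mmap approach, assuming a hypothetical catalog of fixed-size 16-byte records sorted by a spatial cell id, so that the records of one cell form one contiguous run in the file. In KStars, QFile::map() would be the portable Qt counterpart of the raw mmap() call used here; the record layout, the cellId field and MappedCatalog are made up for illustration, and the file is assumed to be in native byte order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size record; the file is assumed sorted by cellId so that
// records close together on the sky are also close together in the file.
struct CatalogRecord {
    float raDeg;
    float decDeg;
    float mag;
    std::uint32_t cellId;
};
static_assert(sizeof(CatalogRecord) == 16, "no padding expected");

class MappedCatalog {
public:
    explicit MappedCatalog(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st{};
        if (::fstat(fd, &st) != 0) { ::close(fd); throw std::runtime_error("fstat failed"); }
        bytes_ = static_cast<std::size_t>(st.st_size);
        if (bytes_ > 0) {
            void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) { ::close(fd); throw std::runtime_error("mmap failed"); }
            data_ = p;
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedCatalog() { if (data_) ::munmap(data_, bytes_); }
    MappedCatalog(const MappedCatalog&) = delete;
    MappedCatalog& operator=(const MappedCatalog&) = delete;

    std::size_t size() const { return bytes_ / sizeof(CatalogRecord); }
    const CatalogRecord* begin() const { return static_cast<const CatalogRecord*>(data_); }
    const CatalogRecord* end() const { return begin() + size(); }

    // Contiguous run of records in one cell, found by binary search on cellId.
    // The kernel faults in only the pages actually touched (search probes + run).
    std::pair<const CatalogRecord*, const CatalogRecord*> cell(std::uint32_t id) const {
        auto lo = std::lower_bound(begin(), end(), id,
            [](const CatalogRecord& r, std::uint32_t v) { return r.cellId < v; });
        auto hi = std::upper_bound(lo, end(), id,
            [](std::uint32_t v, const CatalogRecord& r) { return v < r.cellId; });
        return {lo, hi};
    }

private:
    void* data_ = nullptr;
    std::size_t bytes_ = 0;
};
```

Nothing is copied up front: reading a record through the returned pointers is what makes the kernel fault its page in, and because the read-only pages are backed by the file, they can be dropped again under memory pressure.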
>
> Another issue we have is with proper motion. Technically, most of the
> points on the sphere that we have aren't points at all, but are actually
> "dual points" that carry the data of both a point and the first-order
> differential near the point (i.e., the proper motion), which we have to
> take into account when we do queries in the far future. In effect, for
> each point we have a differential equation with initial conditions (the
> J2000 positions) and the equation of motion given by the proper motion,
> and we want to be able to do queries like:
>
> "What are all the points within angle alpha of this direction at time t?"
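
The query above can be made concrete with a brute-force C++ sketch. It assumes the simplest first-order model, in which positions change linearly with time from J2000, with the proper motion in RA given as mu_alpha * cos(delta) in mas/yr (a Hipparcos-style convention assumed here, not necessarily how KStars stores it). It ignores precession, radial velocity and the breakdown of the 1/cos(delta) term near the poles; MovingStar, positionAt and coneAtTime are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kDegToRad = 3.14159265358979323846 / 180.0;
constexpr double kMasToDeg = 1.0 / 3.6e6;  // 1 degree = 3.6e6 milliarcseconds

// A "dual point": J2000 position plus its first-order rate of change.
struct MovingStar {
    double ra0Deg, dec0Deg;   // position at J2000
    double pmRaCosDecMasYr;   // mu_alpha * cos(delta), mas/yr (assumed convention)
    double pmDecMasYr;        // mu_delta, mas/yr
};

struct SkyPos { double raDeg, decDeg; };

// Linear (first-order) propagation to tYears after J2000.
SkyPos positionAt(const MovingStar& s, double tYears) {
    double dDec = s.pmDecMasYr * kMasToDeg * tYears;
    double dRa = s.pmRaCosDecMasYr * kMasToDeg * tYears / std::cos(s.dec0Deg * kDegToRad);
    return {s.ra0Deg + dRa, s.dec0Deg + dDec};
}

// Great-circle separation in degrees (spherical law of cosines).
double angularSepDeg(const SkyPos& a, const SkyPos& b) {
    double c = std::sin(a.decDeg * kDegToRad) * std::sin(b.decDeg * kDegToRad) +
               std::cos(a.decDeg * kDegToRad) * std::cos(b.decDeg * kDegToRad) *
               std::cos((a.raDeg - b.raDeg) * kDegToRad);
    return std::acos(std::clamp(c, -1.0, 1.0)) / kDegToRad;
}

// "Which points are within alphaDeg of dir at time tYears?", answered by a full scan.
std::vector<std::size_t> coneAtTime(const std::vector<MovingStar>& stars, const SkyPos& dir,
                                    double alphaDeg, double tYears) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < stars.size(); ++i)
        if (angularSepDeg(positionAt(stars[i], tYears), dir) <= alphaDeg) hits.push_back(i);
    return hits;
}
```

Scanning every star is exactly what a spatial index is meant to avoid, which is why an index that only understands fixed points becomes a problem here.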
>
> The HTM library we use is not equipped to answer this question; it only
> deals with points that don't move. So what we do now is go through and
> trash our index, reindexing all the points as we run our simulation. But
> then there are all kinds of problems: how fine should the reindexing
> interval be, what about stars in multiple trixels, and so on. It's a real
> mess.
>
> This got kind of long since it's sort of a brain dump, but hopefully it
> will stir some discussion.
>
> Cheers,
> Henry
>
> P.S. I'm really sorry I haven't been able to put as much time into KStars
> as I'd like recently.
>
> [^1]: I don't know how Windows decides to unload mmap'd files; I assume
> it's not totally insane, but I don't really care too much about how it
> performs as long as it runs. The more important portability issue, I
> think, is dealing with endianness, but I don't think that this is a huge
> problem. Worst case, stick a BOM (byte-order mark) at the beginning of the
> file, and if the endianness is wrong, swizzle all the bytes and write the
> new catalog. Or tell packagers to ship compatible files, or something.
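
The byte-order-mark fallback can be sketched as follows, assuming (hypothetically) a catalog made entirely of 32-bit words whose first word is a magic number written in native byte order. Reading the magic back byte-swapped means the file came from a machine of the other endianness, so every word is swizzled; floats stored as 32-bit words are recovered from their bit pattern with memcpy. The magic value and the function names are made up for this sketch.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical magic number, written in native byte order as the first word.
constexpr std::uint32_t kCatalogMagic = 0x4B535443u;  // arbitrary value for this sketch

constexpr std::uint32_t byteSwap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}
static_assert(byteSwap32(0x11223344u) == 0x44332211u, "byte swap check");

// Brings a catalog of 32-bit words into native order.
// Returns true if the words had to be swizzled, false if already native.
bool normalizeEndianness(std::vector<std::uint32_t>& words) {
    if (words.empty()) throw std::runtime_error("empty catalog");
    if (words[0] == kCatalogMagic) return false;
    if (words[0] != byteSwap32(kCatalogMagic))
        throw std::runtime_error("unrecognized catalog header");
    for (std::uint32_t& w : words) w = byteSwap32(w);
    return true;
}

static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");

// Reinterprets a native-order 32-bit word as a float (and back) without aliasing issues.
float floatFromWord(std::uint32_t w) {
    float f;
    std::memcpy(&f, &w, sizeof f);
    return f;
}

std::uint32_t wordFromFloat(float f) {
    std::uint32_t w;
    std::memcpy(&w, &f, sizeof w);
    return w;
}
```

Swizzling once on load and writing the corrected catalog back means the cost is paid once per machine rather than on every start.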
> _______________________________________________
> Kstars-devel mailing list
> Kstars-devel at kde.org
> https://mail.kde.org/mailman/listinfo/kstars-devel
>