[Kalzium] Fwd: Re: [Blue-obelisk] Merging my data-file with your repository

Carsten Niehaus cniehaus at gmx.de
Thu Sep 15 14:21:14 CEST 2005



----------  Forwarded Message  ----------

Subject: Re: [Blue-obelisk] Merging my data-file with your repository
Date: Thursday 15 September 2005 14:18
From: Egon Willighagen <e.willighagen at science.ru.nl>
To: blue-obelisk at hardly.cubic.uni-koeln.de

On Tuesday 13 September 2005 10:57 am, Carsten Niehaus wrote:
> Kalzium has a pretty big XML-based data file. It has almost 5000 lines of
> XML, which means about 4500 data entities. You can see the file here [3].
> This file is good; I really took care that the data is correct. It has far
> more data than you have [4].

This weekend I will explain how CDK uses XML (CML specifically) to store
information. In particular, it makes heavy use of dictionaries, which allow
data sources, terminology, etc. to be specified precisely.

The Blue Obelisk (BO) has updated this methodology, and I would like to ask the
Kalzium people to have a look at it and see whether it makes sense to them.

An example of what the Kalzium data could look like in this schema:

Isotope data (CDK CVS -> src/org/openscience/cdk/config/data/isotope.xml):

    <isotopeList id="H">
        <isotope id="H1" isotopeNumber="1" elementType="H">
            <abundance dictRef="cdk:relativeAbundance">100.0</abundance>
            <scalar dictRef="cdk:exactMass">1.00782504</scalar>
            <scalar dictRef="cdk:atomicNumber">1</scalar>
        </isotope>
        <isotope id="H2" isotopeNumber="2" elementType="H">
            <abundance dictRef="cdk:relativeAbundance">0.015</abundance>
            <scalar dictRef="cdk:exactMass">2.01410179</scalar>
            <scalar dictRef="cdk:atomicNumber">1</scalar>
        </isotope>
    </isotopeList>

One can see how the @dictRef attribute is used: it makes the source and the
concept of each specific field explicit and unambiguous.
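To make that concrete, here is a minimal Python sketch (an illustration only, not code from CDK) that reads the isotope fragment above with the standard library's xml.etree.ElementTree and keys every value by its dictRef instead of by its element name. The function names read_isotopes and split_dict_ref are hypothetical. Treating the part before the colon as the dictionary prefix is an assumption, suggested by the 'cdk:' and 'cas:' prefixes in the files.

```python
import xml.etree.ElementTree as ET

# The isotope fragment quoted above, embedded as a string for the sketch.
ISOTOPE_XML = """
<isotopeList id="H">
    <isotope id="H1" isotopeNumber="1" elementType="H">
        <abundance dictRef="cdk:relativeAbundance">100.0</abundance>
        <scalar dictRef="cdk:exactMass">1.00782504</scalar>
        <scalar dictRef="cdk:atomicNumber">1</scalar>
    </isotope>
    <isotope id="H2" isotopeNumber="2" elementType="H">
        <abundance dictRef="cdk:relativeAbundance">0.015</abundance>
        <scalar dictRef="cdk:exactMass">2.01410179</scalar>
        <scalar dictRef="cdk:atomicNumber">1</scalar>
    </isotope>
</isotopeList>
"""

def read_isotopes(xml_text):
    """Map each isotope id to a {dictRef: text} dictionary (hypothetical helper).

    Values are keyed by their dictRef (e.g. "cdk:exactMass"), not by the
    element name, so a reader never has to guess what a number means.
    """
    root = ET.fromstring(xml_text.strip())
    isotopes = {}
    for iso in root.findall("isotope"):
        fields = {}
        for child in iso:
            ref = child.get("dictRef")
            if ref is not None:
                fields[ref] = child.text.strip()
        isotopes[iso.get("id")] = fields
    return isotopes

def split_dict_ref(ref):
    """Split "cdk:exactMass" into ("cdk", "exactMass"): assumed dictionary prefix and term."""
    prefix, _, term = ref.partition(":")
    return prefix, term

isotopes = read_isotopes(ISOTOPE_XML)
print(isotopes["H2"]["cdk:exactMass"])  # 2.01410179
```

Because the key carries the dictionary prefix, two sources that both have an "exactMass" field cannot be silently mixed up.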

Elsewhere CDK has (CDK CVS -> src/org/openscience/cdk/config/data/chemicalElements.xml):

  <elementType id="H">
      <label dictRef="cas:id">1333-74-0</label>
      <scalar dataType="xsd:Integer" dictRef="cdk:group">1</scalar>
      <scalar dataType="xsd:Integer" dictRef="cdk:period">1</scalar>
      <scalar dataType="xsd:String" dictRef="cdk:name">Hydrogen</scalar>
      <scalar dataType="xsd:Integer" dictRef="cdk:atomicNumber">1</scalar>
      <scalar dataType="xsd:String" dictRef="cdk:chemicalSerie">Nonmetals</scalar>
      <scalar dataType="xsd:Integer" dictRef="cdk:periode">1</scalar>
      <scalar dataType="xsd:String" dictRef="cdk:phase">Gas</scalar>
  </elementType>
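The dataType attribute also lets a reader convert values without guessing. Here is a minimal Python sketch (again an illustration, not CDK code) that reads the chemicalElements.xml snippet above and converts each scalar according to its dataType. The CONVERTERS table and the read_element function are hypothetical. The sketch assumes that only the two dataType values seen in the snippet need handling and that a field without a dataType, like the CAS <label>, is a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the dataType values used in the snippet to
# Python types. Keys are lower-cased so "xsd:Integer" and "xsd:integer"
# both match; any other type falls back to str.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:string": str,
}

ELEMENT_XML = """
<elementType id="H">
    <label dictRef="cas:id">1333-74-0</label>
    <scalar dataType="xsd:Integer" dictRef="cdk:group">1</scalar>
    <scalar dataType="xsd:Integer" dictRef="cdk:period">1</scalar>
    <scalar dataType="xsd:String" dictRef="cdk:name">Hydrogen</scalar>
    <scalar dataType="xsd:Integer" dictRef="cdk:atomicNumber">1</scalar>
    <scalar dataType="xsd:String" dictRef="cdk:chemicalSerie">Nonmetals</scalar>
    <scalar dataType="xsd:Integer" dictRef="cdk:periode">1</scalar>
    <scalar dataType="xsd:String" dictRef="cdk:phase">Gas</scalar>
</elementType>
"""

def read_element(xml_text):
    """Return the element id plus {dictRef: typed value} for one <elementType>."""
    root = ET.fromstring(xml_text.strip())
    record = {"id": root.get("id")}
    for child in root:
        ref = child.get("dictRef")
        if ref is None:
            continue  # without a dictRef the value has no defined meaning
        text = (child.text or "").strip()
        # The <label> carries no dataType, so it is treated as a string.
        dtype = (child.get("dataType") or "xsd:string").lower()
        record[ref] = CONVERTERS.get(dtype, str)(text)
    return record

hydrogen = read_element(ELEMENT_XML)
print(hydrogen["cdk:name"], hydrogen["cdk:group"])  # Hydrogen 1
```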

For example, the following values from the Kalzium file are problematic:

                <radius>
                        <covalent>37.3</covalent>
                        <atomic>25</atomic>
                        <vdw>120</vdw>
                        <bondxx>74.130</bondxx>
                        <ionic charge="-1">208</ionic>

Defined by which methods or definitions?

                </radius>
                <orbits>1s1</orbits>
                <oxydation>+1.(-1)</oxydation>
                <family>Non-Metal</family>
                <abundance>1520</abundance>

On what scale?

        </element>

The CML-based equivalent would be something like:

<radius>
  <vdw dictRef="emboss:vdw">120</vdw>
</radius>

where the 'emboss' dictionary would explain how it defines the van der Waals
(vdw) radius.
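The following is a minimal Python sketch of such a migration, with loudly stated assumptions: the DICT_REFS mapping and the annotate_radius function are hypothetical, and only the pairing of 'vdw' with 'emboss:vdw' comes from the example above. Radius types that have no agreed dictionary term are reported instead of guessed, which is exactly the gap the questions about methods and scales point at.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Kalzium's bare tag names to dictionary terms.
# Only "vdw" -> "emboss:vdw" comes from the example; the other radius types
# still lack an agreed definition, so they are deliberately left unmapped.
DICT_REFS = {
    "vdw": "emboss:vdw",
}

KALZIUM_RADIUS = """
<radius>
    <covalent>37.3</covalent>
    <atomic>25</atomic>
    <vdw>120</vdw>
    <bondxx>74.130</bondxx>
    <ionic charge="-1">208</ionic>
</radius>
"""

def annotate_radius(xml_text, dict_refs):
    """Return (new <radius> with dictRef attributes, tags that had no mapping)."""
    source = ET.fromstring(xml_text.strip())
    target = ET.Element("radius")
    unmapped = []
    for child in source:
        ref = dict_refs.get(child.tag)
        if ref is None:
            unmapped.append(child.tag)  # report it rather than invent a meaning
            continue
        attrs = dict(child.attrib)  # keep existing attributes such as charge
        attrs["dictRef"] = ref
        out = ET.SubElement(target, child.tag, attrs)
        out.text = child.text
    return target, unmapped

radius, missing = annotate_radius(KALZIUM_RADIUS, DICT_REFS)
print(ET.tostring(radius, encoding="unicode"))
# <radius><vdw dictRef="emboss:vdw">120</vdw></radius>
print(missing)  # ['covalent', 'atomic', 'bondxx', 'ionic']
```

Listing the unmapped tags, rather than assigning them a dictRef, keeps a converted file from claiming definitions that nobody has written down yet.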

Egon
_______________________________________________
Blue-obelisk mailing list
Blue-obelisk at hardly.cubic.uni-koeln.de
http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk

-------------------------------------------------------

