Generalized problem with extensionless headers.

Fri Jun 3 09:36:05 UTC 2005

I sent this to Trolltech because I see that the new Qt-4 is using headers with 
no extension on the filenames.  I'm not sure the exercise of providing a 
means for handling these files would be bad for KDevelop.  I will suggest 
that an internal "web server" be set up to act as a service for all KDevelop 
components which read and write files.  This server would act as a proxy 
forwarding requests to the actual storage and retrieval devices such as file 
systems, svn repositories, databases, etc.  When there is a need for a 
particular part to handle a new type of file or filename, that part would 
provide a rule to the service either by instantiating an existing rule class, 
or by deriving from an abstract rule class.  The part will also be 
responsible for providing any new functionality required to implement the 
rule.  That functionality will be implemented by using a protocol and API 
defined by the abstract file service.
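
A minimal sketch of the rule idea, written in Python only to keep it short. Every name here (Rule, ExtensionRule, ListLookupRule, FileService) is hypothetical and not an existing KDevelop API, and the proxying to real storage back ends is left out entirely.

```python
from abc import ABC, abstractmethod


class Rule(ABC):
    """Hypothetical abstract rule: decides whether a name belongs to a resource type."""

    @abstractmethod
    def matches(self, name: str) -> bool:
        ...


class ExtensionRule(Rule):
    """An 'existing rule class' that a part could simply instantiate."""

    def __init__(self, *extensions: str):
        self.extensions = extensions

    def matches(self, name: str) -> bool:
        return name.endswith(self.extensions)


class ListLookupRule(Rule):
    """A rule a part derives itself, e.g. for extensionless headers."""

    def __init__(self, known_names):
        self.known_names = set(known_names)

    def matches(self, name: str) -> bool:
        return name in self.known_names


class FileService:
    """Hypothetical service; a real one would also forward reads and writes."""

    def __init__(self):
        self._rules = {}  # resource identifier -> list of rules

    def register(self, identifier: str, rule: Rule) -> None:
        self._rules.setdefault(identifier, []).append(rule)

    def classify(self, name: str):
        """Return the first identifier whose rules accept the name, else None."""
        for identifier, rules in self._rules.items():
            if any(rule.matches(name) for rule in rules):
                return identifier
        return None


# The C++ part registers its rules once; the names in the list are illustrative.
service = FileService()
service.register("header", ExtensionRule(".h", ".hh", ".hpp"))
service.register("header", ListLookupRule(["QString", "QWidget"]))
service.register("source", ExtensionRule(".cpp"))
```

With this, service.classify("QString") and service.classify("qstring.h") both yield "header", while parts that never register a rule are unaffected.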

For example, if I want the C++ part to understand extensionless headers, I 
need to provide a mechanism by which the abstract file server can access 
these files using a generic query, which my part translates into the 
specialized query.  There's no reason, as far as I can see, why this should 
negatively impact the existing parts that already receive their necessary 
file services.  KDevelop already uses a mechanism similar to this in 
the .kdevelop file.

<kdevfilecreate>
  <useglobaltypes>
    <type ext="ui" />
    <type ext="cpp" />
    <type ext="h" />
  </useglobaltypes>
</kdevfilecreate>

This example also exposes one of the problematic areas I am trying to address. 
Currently when I'm working with a C++ project, I cannot effectively control 
the types of filename extensions being used.  There are some settings that 
suggest I can do that, but the reality of the matter is they do not behave as 
expected.  Instead of specifying simply the filename extension to be 
used, I should also specify the function of files mapped to that extension 
type, and decouple the use from the extension.  IOW, I tell the abstract file 
service that I want:

<resource name="C++ Header File" identifier="header">
  <readRules>
    <!-- KDevelop provided mechanism -->
    <filenameExtensionMapping>
      <rule ext=".h"/>
      <rule ext=".hh"/>
      <rule ext=".hpp"/>
    </filenameExtensionMapping>
    <!-- KPart provided -->
    <listLookup/>
    <!-- given a name, KPart's listLookup rule provides a url for 
         the file to the abstract file service -->
  </readRules>
  <writeRules>
    <asOriginal/> <!-- existing files are saved as they were -->
    <listLookup/> <!-- new files stored to a url provided by listLookup -->
  </writeRules>
  <listingRules>
    <listLookup/> <!-- listLookup provides a list for display or search -->
  </listingRules>
</resource>

Now, within my KPart the only thing I deal with are <resource/>s.  Resources 
with the identifier "header" are treated as C++ header files. 
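
To show how a part might consume such a description, here is a rough Python sketch that reads only the read rules from a trimmed, well-formed copy of the XML. The element names are the ones proposed above; the EXTENSIONLESS_HEADERS set and both function names are made up for the example, and the write and listing rules are ignored.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the <resource> description (read rules only).
RESOURCE_XML = """
<resource name="C++ Header File" identifier="header">
  <readRules>
    <filenameExtensionMapping>
      <rule ext=".h"/>
      <rule ext=".hh"/>
      <rule ext=".hpp"/>
    </filenameExtensionMapping>
    <listLookup/>
  </readRules>
</resource>
"""

# Hypothetical list that a part might hand to its listLookup rule.
EXTENSIONLESS_HEADERS = {"QString", "QWidget"}


def load_read_rules(xml_text):
    """Return (identifier, extensions, uses_list_lookup) for one <resource>."""
    resource = ET.fromstring(xml_text.strip())
    read_rules = resource.find("readRules")
    extensions = tuple(rule.get("ext") for rule in read_rules.iter("rule"))
    uses_list_lookup = read_rules.find("listLookup") is not None
    return resource.get("identifier"), extensions, uses_list_lookup


def identify(filename, xml_text=RESOURCE_XML):
    """Return the resource identifier for filename, or None if no rule matches."""
    identifier, extensions, uses_list_lookup = load_read_rules(xml_text)
    if filename.endswith(extensions):
        return identifier
    if uses_list_lookup and filename in EXTENSIONLESS_HEADERS:
        return identifier
    return None
```

The point of the design is that the caller only ever sees the identifier "header"; whether it came from an extension or from the lookup list is hidden behind the rules.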
______________________________________________________________________________

Original message:

Briefly, the problem is that many tools don't know how to handle files that 
have no extension, or an unfamiliar extension, on their names.  I believe 
there are actually two (parts to the) problem(s).  The first is how to know 
what type of file you get from a file listing.  For example, we want to filter 
out everything that is not a C++ source interface (AKA header file).  
Traditionally (and I see little wrong with the tradition) we have identified 
content type by using filename extensions.  When there is no extension on the 
filename, that is not an option.

The other part of the problem is knowing what kind of file you are currently 
processing.  This is a different issue because, if you can look inside the 
file, you might find some intentional or unintentional indication of what 
type of file it is.  For example, Emacs honours the mode specification for 
known types of documents, e.g.:

#          -*- Autoconf -*-

The current Emacs development source does not, however, use that method to 
identify file types.  It uses filename extensions.  For example the 
auto-mode-alist:

(setq auto-mode-alist
      (append '(("\\.h\\'" . c++-mode)
                ("\\.cpp\\'" . c++-mode)
                ("\\.moc\\'" . c++-mode))
              auto-mode-alist))

See:
http://www.gnu.org/software/emacs/elisp-manual/html_chapter/elisp_23.html#SEC351
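
For a tool other than Emacs, reading the mode specification from the first line could look roughly like the following Python sketch. It covers only the short form ("-*- Autoconf -*-") and the "mode: name" long form; the function name is hypothetical, and the real Emacs rules have more cases than this.

```python
import re

# Text between the two "-*-" markers on a line.
MODE_LINE = re.compile(r"-\*-\s*(.*?)\s*-\*-")


def mode_from_first_line(line):
    """Return the lower-cased mode named in a '-*- ... -*-' line, or None."""
    match = MODE_LINE.search(line)
    if match is None:
        return None
    spec = match.group(1)
    if ":" not in spec:
        # Short form: "-*- Autoconf -*-"
        return spec.lower() or None
    # Long form: "-*- mode: c++; tab-width: 4 -*-"
    for part in spec.split(";"):
        key, _, value = part.partition(":")
        if key.strip().lower() == "mode":
            return value.strip().lower()
    return None
```

An extensionless header that starts with a comment such as "// -*- C++ -*-" would then be recognised from the inside, without relying on its name.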

Similarly, XML and its older sister SGML use document type declarations:

<!DOCTYPE greeting SYSTEM "hello.dtd">

XML 1.1 also requires an XML declaration (in XML 1.0 it is optional):

 <?xml version="1.1"?>

See:
http://www.w3.org/TR/2004/REC-xml11-20040204/#NT-XMLDecl
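
A consumer can use these two markers to recognise XML from the inside.  The Python sketch below is deliberately naive: it assumes the text is already decoded, ignores byte order marks and the encoding pseudo-attribute, and the function name is my own.

```python
import re

# The XML declaration, when present, must be the very first thing in the text.
XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*["\']([0-9.]+)["\']')
# The name that follows <!DOCTYPE is the declared root element type.
DOCTYPE = re.compile(r"<!DOCTYPE\s+([^\s>\[]+)")


def sniff_xml(text):
    """Return (xml_version, doctype_name); either is None when absent."""
    decl = XML_DECL.match(text)
    doctype = DOCTYPE.search(text)
    return (
        decl.group(1) if decl else None,
        doctype.group(1) if doctype else None,
    )
```

A C++ header carries neither marker, so the same check returns (None, None) for it.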

On Unix, and Unix-like operating systems, the file(1) command attempts to 
solve the first problem - knowing the content from the outside - by 
examining a little bit of the inside.  

" ...

The magic number tests are used to check for files with data in particular 
fixed formats. The canonical example of this is a binary executable (compiled 
program) a.out file, whose format is defined in a.out.h and possibly exec.h 
in the standard include directory. These files have a `magic number' stored 
in a particular place near the beginning of the file that tells the UNIX 
operating system that the file is a binary executable, and which of several 
types thereof. The concept of `magic number' has been applied by extension to 
data files. Any file with some invariant identifier at a small fixed offset 
into the file can usually be described in this way. The information 
identifying these files is read from the compiled magic 
file /usr/share/misc/magic.mgc , or /usr/share/misc/magic if the compile file 
does not exist. 

..."

See:
http://www.die.net/doc/linux/man/man1/file.1.html
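
The magic number test itself is simple to sketch.  The Python below uses a tiny hand-written table in place of the compiled magic file; the table layout and function names are mine, and the four signatures are only a sample.

```python
# (offset, signature bytes, description): a tiny stand-in for the magic database.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable or object"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%PDF-", "PDF document"),
]


def identify_by_magic(data):
    """Return the description of the first matching signature, or None."""
    for offset, signature, description in MAGIC_TABLE:
        if data[offset:offset + len(signature)] == signature:
            return description
    return None


def identify_file(path):
    """Apply the magic test to the first 512 bytes of the file at path."""
    with open(path, "rb") as handle:
        return identify_by_magic(handle.read(512))
```

Note that plain source text such as a C++ header has no invariant bytes at a fixed offset, so this test returns None for it.  That is exactly why it does not solve the extensionless header problem on its own.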

The reason I'm posting this message here is because, in my quest for a 
solution to the problem of working with variant filenaming and identification 
conventions, I was looking at the API documentation for KURL:

http://developer.kde.org/documentation/library/cvs-api/kdecore/html/classKURL.html

and saw QDataStream mentioned:

http://doc.trolltech.com/3.3/qdatastream.html

That led me to investigate whether QDataStream provides attributes to 
communicate to the consumer what type of data is being provided, or to 
communicate to the provider what type of data is desired.  The short answer is 
"no".  It does not hold that kind of information.  That is probably the 
correct design for a class intended to transfer data.  Communicating content 
type is a higher level activity which is addressed by standards such as MIME.

"
   Since its publication in 1982, STD 11, RFC 822 [RFC-822] has defined
   the standard format of textual mail messages on the Internet.  Its
   success has been such that the RFC 822 format has been adopted,
   wholly or partially, well beyond the confines of the Internet and the
   Internet SMTP transport defined by STD 10, RFC 821 [RFC-821].  As the
   format has seen wider use, a number of limitations have proven
   increasingly restrictive for the user community.
"
See:
http://www.ietf.org/rfc/rfc1521.txt

I believe this approach is taken in parts of KDE to communicate content 
type to other components.  It is also used in HTTP, and even within XML.

If Trolltech is intending to introduce header files without filename 
extensions, it would seem reasonable that it provide a recommendation as to 
how the information content type (as opposed to encoding, etc.) of such files 
can be discerned by the various components that need to know it.  It would 
also seem reasonable that Trolltech provide the basic means of implementing 
the recommended solution.

One last observation.  On my apache2 server, I specify MIME types for 
particular types of file.  This is how it is done:

Alias /dir1/ "/opt/www/dir1/"
<Directory /opt/www/dir1>
        AllowOverride All
        Options Indexes 
        deny from all
        Allow from all
        AddType application/mathematica .nb
        AddType text/xml .xul
        AddType video/x-ms-asf .asf
</Directory>
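
The same extension-to-type mapping can be reproduced in a few lines with Python's standard mimetypes module, which also shows where the scheme stops working.  The variable and function names are my own; the three mappings are copied from the AddType lines above.

```python
import mimetypes

# A private table: types added here do not affect the module-level functions.
mime_table = mimetypes.MimeTypes()

# Counterparts of the AddType lines in the Apache configuration.
mime_table.add_type("application/mathematica", ".nb")
mime_table.add_type("text/xml", ".xul")
mime_table.add_type("video/x-ms-asf", ".asf")


def content_type(filename):
    """Return the MIME type guessed from the filename's extension, or None."""
    guessed, _encoding = mime_table.guess_type(filename)
    return guessed
```

Like AddType, this works only from the extension, so a name such as "QString" gets no type at all.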

-- 
Regards,
Steven