Generalized problem with extensionless headers.
Steven T. Hatton
hattons at globalsymmetry.com
Fri Jun 3 09:36:05 UTC 2005
I sent this to Trolltech because I see that the new Qt-4 is using headers with
no extension on the filenames. I don't think the exercise of providing a
means of handling these files would be bad for KDevelop. I will suggest
that an internal "web server" be set up to act as a service for all KDevelop
components which read and write files. This server would act as a proxy
forwarding requests to the actual storage and retrieval devices such as file
systems, svn repositories, databases, etc. When there is a need for a
particular part to handle a new type of file or filename, that part would
provide a rule to the service either by instantiating an existing rule class,
or by deriving from an abstract rule class. The part will also be
responsible for providing any new functionality required to implement the
rule. That functionality will be implemented by using a protocol and API
defined by the abstract file service.
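To make the proposal a little more concrete, here is a minimal sketch of such a service in standard C++. All of the names (FileRule, ExtensionRule, FileService, locate) are hypothetical and chosen only for illustration; nothing here is existing KDevelop or Qt API, and a real service would also have to deal with URLs, remote storage and writing.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract rule: a part derives from this to teach the
// service how one kind of resource is located.
class FileRule {
public:
    virtual ~FileRule() = default;
    // True if this rule applies to the given generic resource identifier.
    virtual bool handles(const std::string& identifier) const = 0;
    // Translate a logical name into a concrete location.
    virtual std::string resolve(const std::string& name) const = 0;
};

// An "existing rule class" that a part can simply instantiate:
// directory + name + extension, where the extension may be empty.
class ExtensionRule : public FileRule {
public:
    ExtensionRule(std::string identifier, std::string directory,
                  std::string extension)
        : identifier_(std::move(identifier)),
          directory_(std::move(directory)),
          extension_(std::move(extension)) {}

    bool handles(const std::string& identifier) const override {
        return identifier == identifier_;
    }
    std::string resolve(const std::string& name) const override {
        return directory_ + "/" + name + extension_;
    }

private:
    std::string identifier_;
    std::string directory_;
    std::string extension_;
};

// The service itself: parts register rules, consumers ask by identifier.
class FileService {
public:
    void addRule(std::unique_ptr<FileRule> rule) {
        rules_.push_back(std::move(rule));
    }
    // Returns the location from the first matching rule, or an empty
    // string when no registered rule handles the identifier.
    std::string locate(const std::string& identifier,
                       const std::string& name) const {
        for (const auto& rule : rules_) {
            if (rule->handles(identifier)) return rule->resolve(name);
        }
        return std::string();
    }

private:
    std::vector<std::unique_ptr<FileRule>> rules_;
};
```

A part that wants extensionless headers would register an ExtensionRule for "header" with an empty extension. Other parts keep asking the service for their own identifiers and are not affected.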
For example, if I want the C++ part to understand extensionless headers, I
need to provide a mechanism by which the abstract file server can access
these files using a generic query, which my part translates into the
specialized query. There's no reason, as far as I can see, why this should
negatively impact the existing parts that already receive their necessary
file services. KDevelop already uses a mechanism similar to this in
the .kdevelop file:
<type ext="ui" />
<type ext="cpp" />
<type ext="h" />
This example also exposes one of the problematic areas I am trying to address.
Currently when I'm working with a C++ project, I cannot effectively control
the types of filename extensions being used. There are some settings that
suggest I can do that, but in reality they do not behave as
expected. Instead of specifying simply the filename extension to be
used, I should also specify the function of files mapped to that extension
type, and decouple the use from the extension. IOW, I tell the abstract file
service that I want
<resource name="C++ Header File" identifier="header">
<!-- KDevelop provided mechanism -->
<!-- KPart provided -->
<!-- given a name, the KPart's listLookup rule provides a url for
the file to the abstract file service -->
<asOriginal/> <!-- existing files are saved as they were -->
<listLookup/> <!-- new files stored to a url provided by listLookup -->
<listLookup/> <!-- listLookup provides a list for display or search -->
</resource>
Now, within my KPart the only thing I deal with are <resource/>s. Resources
with the identifier "header" are treated as C++ header files.
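As a rough illustration of decoupling the function of a file from its extension, the sketch below classifies a path by resource identifier. The ResourceType structure and the classify function are hypothetical names of my own; the listedNames member stands in for whatever a listLookup-style rule would supply.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one resource type known to the service.
struct ResourceType {
    std::string identifier;             // e.g. "header"
    std::set<std::string> extensions;   // e.g. "h", "hpp" (without the dot)
    std::set<std::string> listedNames;  // names supplied by a list lookup
};

// Everything after the last '/', or the whole path if there is none.
static std::string baseNameOf(const std::string& path) {
    std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Text after the last '.', or "" when the name has no extension.
// A leading dot (hidden file) is not treated as an extension.
static std::string extensionOf(const std::string& baseName) {
    std::string::size_type dot = baseName.find_last_of('.');
    if (dot == std::string::npos || dot == 0) return "";
    return baseName.substr(dot + 1);
}

// Returns the identifier of the first resource type that claims the
// path, either by explicit listing or by extension, or "" if none does.
std::string classify(const std::vector<ResourceType>& types,
                     const std::string& path) {
    const std::string base = baseNameOf(path);
    const std::string ext = extensionOf(base);
    for (const ResourceType& t : types) {
        if (t.listedNames.count(base)) return t.identifier;
        if (!ext.empty() && t.extensions.count(ext)) return t.identifier;
    }
    return "";
}
```

With "QString" in the list for "header", both include/QString and foo.h come back as "header", and the part never has to look at the extension itself.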
Briefly, the problem is that many tools don't know how to handle files that
have no extension, or an unfamiliar one, on their names. I believe there are
actually two parts to the problem. The first is how to know what type
of file you get from a file listing. For example, we want to filter out
everything that is not a C++ source interface (AKA header file).
Traditionally (and I see little wrong with the tradition) we have identified
content type by using filename extensions. When there is no extension on the
filename, that is not an option.
The other part of the problem is knowing what kind of file you are currently
processing. This is a different issue because, if you can look
inside the file, you might find some intentional or unintentional indication
of what type of file it is. For example, Emacs honours the mode
specification for known types of documents, e.g.:
# -*- Autoconf -*-
The current Emacs development source does not, however, use that method to
identify file types. It uses filename extensions. For example:
(setq auto-mode-alist
      (append '(("\\.h$" . c++-mode)
                ("\\.cpp\\'" . c++-mode)
                ("\\.moc\\'" . c++-mode))
              auto-mode-alist))
Similarly, XML and its older sister SGML use document type declarations:
<!DOCTYPE greeting SYSTEM "hello.dtd">
XML documents should also begin with an XML declaration, for example:
<?xml version="1.0"?>
On Unix, and Unix-like operating systems, there is also a method that attempts
to solve the first problem - knowing the content from the outside - by
examining a little bit of the inside.
The magic number tests are used to check for files with data in particular
fixed formats. The canonical example of this is a binary executable (compiled
program) a.out file, whose format is defined in a.out.h and possibly exec.h
in the standard include directory. These files have a `magic number' stored
in a particular place near the beginning of the file that tells the UNIX
operating system that the file is a binary executable, and which of several
types thereof. The concept of `magic number' has been applied by extension to
data files. Any file with some invariant identifier at a small fixed offset
into the file can usually be described in this way. The information
identifying these files is read from the compiled magic
file /usr/share/misc/magic.mgc , or /usr/share/misc/magic if the compile file
does not exist.
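The magic number test described above can be sketched in a few lines of standard C++. The table below is a tiny hand-written sample, not the contents of the magic file, and the names MagicEntry, identify and readHead are my own.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// One magic test: these bytes are expected at this offset.
struct MagicEntry {
    std::string::size_type offset;
    std::string bytes;
    std::string description;
};

// A few well-known signatures, all at offset 0.
static const std::vector<MagicEntry>& magicTable() {
    static const std::vector<MagicEntry> table = {
        {0, std::string("\x7f" "ELF"), "ELF executable or object"},
        {0, std::string("\x89" "PNG\r\n" "\x1a" "\n"), "PNG image"},
        {0, std::string("%PDF-"), "PDF document"},
        {0, std::string("\x1f\x8b"), "gzip compressed data"},
    };
    return table;
}

// Returns a description for the first matching entry, or "" if the
// leading bytes match nothing in the table.
std::string identify(const std::string& head) {
    for (const MagicEntry& e : magicTable()) {
        if (head.size() >= e.offset + e.bytes.size() &&
            head.compare(e.offset, e.bytes.size(), e.bytes) == 0) {
            return e.description;
        }
    }
    return "";
}

// Reads up to `count` bytes from the start of a file. An unreadable
// file simply produces an empty string.
std::string readHead(const std::string& path, std::size_t count = 16) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string head(count, '\0');
    in.read(&head[0], static_cast<std::streamsize>(count));
    head.resize(static_cast<std::size_t>(in.gcount()));
    return head;
}
```

Calling identify(readHead(path)) works for formats with a fixed signature. A C++ header is plain text with no invariant bytes at a fixed offset, so a fixed-offset test like this returns nothing for it.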
The reason I'm posting this message here is because, in my quest for a
solution to the problem of working with variant filenaming and identification
conventions, I was looking at the API documentation for KURL:
and saw QDataStream mentioned:
That led me to investigate whether QDataStream provides attributes to
communicate to the consumer what type of data is being provided, or to
communicate to the provider what type of data is desired. The short answer is
"no". It does not hold that kind of information. That is probably the
correct design for a class intended to transfer data. Communicating content
type is a higher level activity which is addressed by standards such as MIME.
Since its publication in 1982, STD 11, RFC 822 [RFC-822] has defined
the standard format of textual mail messages on the Internet. Its
success has been such that the RFC 822 format has been adopted,
wholly or partially, well beyond the confines of the Internet and the
Internet SMTP transport defined by STD 10, RFC 821 [RFC-821]. As the
format has seen wider use, a number of limitations have proven
increasingly restrictive for the user community.
I believe this approach is taken in parts of KDE to communicate content
type to other components. It is also used in HTTP, and even within XML.
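To show what a consumer does with such a label, here is a small sketch that splits a MIME-style content type value into its type and subtype. It ignores parameters and does not validate the full grammar from the RFCs. The ContentType structure and parseContentType function are hypothetical names, and "text/x-c++hdr" is used in the note below only as an example value.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

struct ContentType {
    std::string type;     // e.g. "text"
    std::string subtype;  // e.g. "x-c++hdr"
    bool valid;           // false when the value could not be parsed
};

// Remove leading and trailing spaces and tabs.
static std::string strip(const std::string& s) {
    const char* blanks = " \t";
    std::string::size_type first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

ContentType parseContentType(const std::string& value) {
    ContentType result;
    result.valid = false;

    // Parameters such as "; charset=utf-8" follow the first ';'
    // and are ignored in this sketch.
    std::string media = strip(value.substr(0, value.find(';')));

    // Media type names are case-insensitive, so normalise them.
    std::transform(media.begin(), media.end(), media.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    // Require a non-empty type and a non-empty subtype around '/'.
    std::string::size_type slash = media.find('/');
    if (slash == std::string::npos || slash == 0 ||
        slash + 1 == media.size()) {
        return result;
    }
    result.type = media.substr(0, slash);
    result.subtype = media.substr(slash + 1);
    result.valid = true;
    return result;
}
```

A component receiving "text/x-c++hdr; charset=utf-8" alongside the data can then decide how to treat it without ever seeing a filename.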
If Trolltech is intending to introduce header files without filename
extensions, it would seem reasonable that it provide a recommendation as to
how the information content type (as opposed to encoding, etc.) of such files
can be discerned by the various components that need to know it. It would
also seem reasonable that Trolltech provide the basic means
of implementing the recommended solution.
One last observation. On my apache2 server, I specify MIME types for
particular types of file. This is how it is done:
Alias /dir1/ "/opt/www/dir1/"
deny from all
Allow from all
AddType application/mathematica .nb
AddType text/xml .xul
AddType video/x-ms-asf .asf
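The AddType lines amount to a lookup from extension to content type. A minimal C++ equivalent, assuming a hypothetical mimeTypeFor function and a caller-supplied fallback, shows where that lookup stops helping. Unlike Apache, this sketch compares extensions case-sensitively.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> TypeMap;

// Looks up the content type for a path by its final extension.
// Keys include the leading dot, as in the AddType lines above.
// Returns `fallback` when there is no extension or no mapping.
std::string mimeTypeFor(const TypeMap& addTypes,
                        const std::string& path,
                        const std::string& fallback) {
    // Only the last path component may contribute an extension.
    std::string::size_type slash = path.find_last_of('/');
    const std::string base =
        (slash == std::string::npos) ? path : path.substr(slash + 1);

    std::string::size_type dot = base.find_last_of('.');
    if (dot == std::string::npos) return fallback;

    TypeMap::const_iterator it = addTypes.find(base.substr(dot));
    return it == addTypes.end() ? fallback : it->second;
}
```

With the three mappings above loaded into the map, notebook.nb resolves to application/mathematica, while an extensionless name such as QString can only ever receive the fallback.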