[Kdenlive-devel] Cutting list file format specification, Version 0.02

Sun May 26 16:27:36 UTC 2002

Hi,

I haven't been able to code anything for the last week, partly for exams and 
partly because kde3-devel rpms seem to have become broken in latest mandrake 
cooker. Grrr!

On Sunday 19 May 2002 7:57 am, Christian Berger wrote:
> Jason Wood schrieb:
> > On Saturday 18 May 2002 9:05 pm, Christian Berger wrote:
> > > > > Well I don't know if we'll ever need it, but it sure is an
> > > > > interresting piece of information. Maybe we should also define a
> > > > > "creator" command, telling us which programm created the cutlist.
> > > > > Maybe also a date and comment field. Those fields definitely are
> > > > > good for recovering broken harddisks :)
> > > >
> > > > Of course, if you hard disk has died, recovering the cutting list is
> > > > probably the least of your worries -
> > >
> > > Or think of undeleting files.
> > >
> > >From experience it's normally the video files which get accidentally
> > > deleted
> >
> > if the uninitiated are editing (what, you mean I still actually need the
> > video clips? I thought I'd put it all in the project... :-)), and there
> > isn't a whole lot that can be done about that except to make sure that
> > you've got backups!
>
> Oh well, then maybe we don't need file versions :)

File versions is there more for making sure that the cutter and cutting list 
generator are compatable with each other. Considering that there is no 
particular reason why the cutter and cutting list generator could not reside 
on different computers/operatiing systems (if we look far enough into the 
future...) I think it's for the best that it is added now.

> > > Well I think here my version is best, since it's a real programm we can
> > > do "anything" in forth.
> >
> > We can also do anything in c/c++... when it comes down to it though, the
> > cutting list merely describes what is being done, it should not have
> > anything to do with carrying it out.
>
> True, but especially for the interpolitations I guess forth would make a
> lot of sense. It greatly simplifies the syntax. Maybe we can define
> standard programms used for certain interpolitations which can be
> recogniced by the parser and then "executed" without beeing interpreted.

Which means that there is no real difference between writing as a stack, or 
writing it as some other form of syntax. If we do have programmable 
interpolations, etc. then they would be better off in the user interface, and 
translated down to a simple form for the cutting list. This means that :

a) the cutter isn't as complicated to write, 
b) that the choice of language is irrelevant - the UI could implement 
interpreters for forth, fortran, scheme, perl (as seperate modules) and the 
cutter need not know the difference between them.

Another problem is that the forth interpreter would not be able to "run" 
anyway - the values it outputs depends on what it is asked of it by the scene 
renderer - it's possible that a scene might not be rendered frame-by-frame. 
Perhaps a particular cutter running on a 2 processor machine could make 
clever optimisations by processing two frames at once. This differs from the 
idea of using multiple cutters on a single machine, but we have yet to 
determine which way is a more viable option.

If we assume random access, then we either need a program which takes a value 
and outputs an expression, which is what the cutter would basically be doing 
without an interpreter anyway, or the program would need to be run from the 
start to the required time every time a value was required, which would be 
incredibly slow.

Also remember that this is not strictly required at all - everything to do 
with interpolations could be handled with linear interpolations and splitting 
scenes up to a finer grain (1 for each frame is necessary) so I do not think 
that we should spend to much time trying to be clever on interpolations.

> > I only expect to ever support three types of interpolation : linear
> > (moving between two points), freehand (where the values move randomly
> > about, and so we simply generate a list of values in the order they occur
> > ), and possibly bezier - even this is unlikely though, as it will be just
> > as easy to generate a list of freehand points as it would be to describe
> > the interpolation as a language.
>
> Well the point is, we would simply list all the points, then the number
> of points and then a "freehand" command. It would do everything
> properly.

Consider how you would check that the syntax is correct. How about if you have 
a list of 30 points, and it then says "linear" at the end - but linear only 
accepts two points? Or how about if we read in one point, and then get 
"linear". If a syntax is chosen which is the other way around, then you know 
at any point in time, you know what you should be parsing next - if you don't 
find what you are looking for you throw an error. Since that is how the rest 
of the syntax in this file format works, it would be a good idea to make it 
all work like that.

But either way it would not be just a list of points but a list of points with 
the relationship between them i.e. This point at 1.051 seconds, this point at 
1.321 seconds, this point at 1.521 seconds... etc, but if the points are all 
equally spaced then you wouldn't need to specify 

I'm leaning towards the following sort of syntax :

Each "point" has a value XXXX

A value may have a time associated with it. The time will be a value between 0 
and 1, and is relative to the length of the scene. e.g. 1 is at the end of 
the scene, 0 is at the beginning. If there is a time, it follows the value, 
with a : in between. If no time is specified, then it is assumed to be of an 
equal interval with the 

XXXX:TTTT 		means 		value XXXX at time TTTT

The various transitions between points are represented with various symbols.

If the first value has a time > 0 then that value is used from the beginning 
of the scene anyway. If the last value has a time < 1, then that value is 
used up to the end of the scene. If a value or several values in a row do not 
have a time associated, then it is assumed that they are evenly distributed 
between the outer points which do have times. In the case of a value being 
first or last in the list, then it becomes 0 or 1. (I know what I mean here, 
but I'm having a difficult time expressing it in words - I'll try and write 
it better when I feel more up to it.)

E.g. a comma might mean "step between points" a "greater than sign" (>) might 
mean linear interpolation between these points.

An example interpolation then would be something like this :

[0.0:0.0 > 1.0:1.0]

meaning start at value 0.0 at time 0.0, and linear interpolate to value 1.0 at 
time 1.0 (where times are relative to the length of the scene).

Instead of symbols, XML-style tags might be better, as they would allow 
parameters and better extensibility in the future. The parser would look like 
this :

1. look for a number
2. if(there is a ":" next) {
	3. look for another number. If we don't find one, throw an error.
}
4. if (we are at the end of the list, denoted by finding a ']') {
	5 we have parsed the entire list
}
6. Look for a command symbol/tag. If we don't find one, throw an error.
7 goto 1.

> > > > > Scenes also can have a length of less than a frame. This might be
> > > > > important if you want to "fill up" scenes.
> > > >
> > > > ??? I don't understand this point.
> > > > I would have assumed that a scene would be at least a frame,
> > > > otherwise it would never get rendered by the cutter.
> > >
> > > Well, but it would create a space in our file format, besides it
> > > creates audio.
> >
> > Why would we want to create space in the file format of less than a
> > frame? It would either be ignored (as the next scene writes to the same
> > frame) or would lead to a gap of a frame, thus meaning that the next
> > scene did not start at the time that it was supposed to (and suggesting
> > an error on the UI or cutting generators part).
>
> Imagine there beeing a short "preroll video" showing a countdown from 5
> to 1, which is, for some aakward reason 3.9867 seconds long. (maybe the
> video has a wiered framerate). Then we have to add filling scenes.

Umm... if a video is supposed to be 5 seconds long (I assume this is what you 
mean) then it should be 5 seconds long - otherwise it is a problem with 
either the decoder, or the video.

> > And how does it create audio? Audio will still be measured using the same
> > timings as video. If you are considering a gap where there is audio and
> > no picture, then it will still be limited to a minimum of 1 frame in size
> > - otherwise a gap in video will not be seen or we miss the beginning of
> > the next scene.
>
> Well the point is, we don't really have fixed length frames. So how long
> is a frame?

Well if we are editing at 25 fps then a frame is 1/25 seconds long, if we are 
editing at 30fps then a frame is 1/30 seconds long, if we are editing at 60 
fps then a frame is 1/60 seconds long...

But framerate is only going to be important to the encoder (and decoder) - it 
will say "Tell me what I should encode for the frame at time 10027.25 
seconds" and the effect that it is linked to will calculate and image for 
that frame. When that effect is based on other effects, it will ask them what 
image they should be displaying at that time, which in turn might need to ask 
other effects... until we get to the top off the effect tree and ask the 
decoders for a frame at a particular time. Whilst the frames asked for are in 
order, there may be multiple requests which use the same frame (e.g., if one 
of our videos is only at 15 fps and we are encoding at 30 then the decoder 
will only need to decode a new frame every other request).

As we move from one frame to another, there will need to be a scene monitor 
which determines whether the previously used scene is still relevant, and if 
we have moved passed it on the timeline then we will move into the next 
scene. From this, any scene which is not at least a frame in length would 
never be displayed, as it would merely be skipped over.

>
> > Something that needs considering is whether audio should be generated in
> > the same cutting list as the video. Audio will obviously have to use
> > seperate connections to the video, as the effects that it passes through
> > will be completely different. A lot of the time, the audio will also be
> > coming from completely different files to the video - and depending on
> > the codec, may very well end up in a different "audio" output.
>
> Still in many cases you need both from the same file, and in those cases
> it's important those stay in sync. I would put that into the cuttig
> list, too, it doesn't create to much overhead.

It's not the overhead, it's more a problem of recognition - have I rendered 
this scene before? Consider where we use the exact same video clip 4 times in 
a row, but have different sound underneath. If we are treating sound and 
visual seperately.

As an aside, on a project I was working on recently in Premier 6.0, I found 
out that if you mute a sound file, all of the visuals above it need to be 
re-rendered - and this is using a file format which treats video and audio 
seperately. After much swearing whilst waiting for the 10 minutes it took to 
render everything again, I decided that this is a bug, no matter how you look 
at it.

> > Incidentally, that's a very good point. The codec used when editing with
> > the Matrox RT2000 card uses a seperate file to store video and audio -
> > audio being stored in .wav files. This is a really useful feature,
> > because it means you can use any sound editor going to fiddle with the
> > sound, normalise it, filter out noise, etc. etc. etc. to a much greater
> > degree than is possible in a program not dedicated to sound editing (i.e.
> > kdenlive). Following the same concept with the editing codec will be a
> > very good idea.
>
> Well this is to unsave, after those files have been copied to another
> machine playback might be more difficult to do. Besides clock
> frequencies of sound-cards vary. If the file is interleaved this isn't a
> problem, and the worst which could happen is a click, but everything
> would still be in sync. But maybe we can do both things seperately. We
> just couldn't directly save to MPEG or AVI then.

Clock frequency problems are a problem for playback, but not for the cutter. 
It is still a problem if the file is interleaved - a pop ur a click _is_ the 
worst thing that can happen, as it means that the video we have created is 
effectively broken. 

Anyway, I thought the idea wasn't to save directly to mpeg or Avi anyway? It 
is very easy to render to them though - it just means using a different 
encoder at the end of the process, and passing the audio and video together 
in the correct amounts.

> > > > > Maybe we should somehow distinguish between file IDs and connection
> > > > > IDs since I have to have a "list" of all the files being used in a
> > > > > scene
> > > >
> > > > The trouble is that once we get inside of a scene, we would then need
> > > > a way to distinguish between them whilst parsing them.
> > >
> > > Actually we have to.
> >
> > No you don't. In one case you have a file input which... generates frame
> > images when asked for them and in the other case you have a connection
> > input which... generates frame images when it's asked for them. From an
> > effects point of view, the two should look identical. From the cutter
> > encoders point of view, again, the two should look identical.
>
> Hmm, well maybe it might work, still sounds kinda difficult.
>
> > > > I think a better solution here is using C++ and making use of
> > > > inheritance and polymorphism.
> > >
> > > Ohh well I've worked with those technices in Pascal, they might be
> > > usefull, especially for the Forth interpreter. However C++ still is not
> > > much more than a smart assembler.
> >
> > That's what any programming language is. The whole idea of using a high
> > level programming language is because it makes the programmers job
> > easier.
>
> You've obviously never seen Prolog :) Prolog or Lisp are languages which
> are far more advanced than C. You don't even have to specify the order
> of the commands to be executed.

I have seen prolog and lisp, and no they are not more advanced than c++. They 
are mostly used in the AI community, as they are good for writing highly 
recursive programs quickly. (Oh, and lisp/scheme is used to teach the 
principles of programming languages due to it's simple syntax). Consider that 
neither language has a loop construct though - you have to build them out of 
recursion.

Remember how recursion is really hideously inefficient if done badly? right... 
That's why when your writing prolog/scheme/lisp programs, you need to 
understand the way that the language works to write efficient code, often 
through the use of extra parameters carrying around accumulators, and making 
sure that the recursion happens right at the very end of the function so that 
the compiler doesn't use up all it's memory trying to do something simple 
like add up all the numbers between 1 and 1000000.

And 9 time out of 10, the error message that you have to work with is "stack 
overflow"

Also, when you realise that every program language is equivalent - there is 
nothing more powerful than C/assembler/lisp/prolog/lambda calculus/pascal/ or 
a turing machine. At the end of the day, it's a case of use the right 
language for the right job.

> Well actually Prolog is rather simplistic and you can get it down to
> maybe 2-3 pages and still have enought space for long explainations.
> And just look at it, C is a simple machine-near language, it doesn't
> have functions for parsing, no functional programming, it doesn't even
> have posibilities to parallelice it easily.

Prolog grammar is simplistic, but the implentation is not - there's lots of 
hashing techniques and other optimizations required to make it actually run 
at a decent speed. C is near machine-language but to describe the parsing 
grammar is still pretty complex - consider all of the special cases and 
stuff. Oh, and you can do functional programming in C through the use of 
function pointers, but it would not be the easiest thing in the world to 
do... 

And if you want to consider the simplicity of C++... well consider that 10 
years or so after standardisation, there is only one compiler which fully and 
correctly implements C++.  (I've forgotten it's name, it was on slashdot a 
couple of weeks ago).

Cheers,
Jason

-- 
Jason Wood
I don't give up, I lose interest
Homepage : www.uchian.pwp.blueyonder.co.uk