[Kdenlive-devel] Cutting list file format specification, Version 0.02

Sun May 26 18:09:30 UTC 2002

Jason Wood schrieb:
> 
> Hi,
> 
> I haven't been able to code anything for the last week, partly for exams and
> partly because kde3-devel rpms seem to have become broken in latest mandrake
> cooker. Grrr!

Well I couldn't yet get kde3 to install on any of my machines yet, and
soon my exams will also some soon. I really should do more learning.

> >
> > Oh well, then maybe we don't need file versions :)
> 
> File versions is there more for making sure that the cutter and cutting list
> generator are compatable with each other. Considering that there is no
> particular reason why the cutter and cutting list generator could not reside
> on different computers/operatiing systems (if we look far enough into the
> future...) I think it's for the best that it is added now.

Well the cutter still can't know wether he can read a later version or
not :) Anyhow I'd put it in together with a text indication of what
programm coded the cutlist, and how the movie is called and stuff. Maybe
also a "project number" or something for accounting.

> > > > Well I think here my version is best, since it's a real programm we can
> > > > do "anything" in forth.
> > >
> > > We can also do anything in c/c++... when it comes down to it though, the
> > > cutting list merely describes what is being done, it should not have
> > > anything to do with carrying it out.
> >
> > True, but especially for the interpolitations I guess forth would make a
> > lot of sense. It greatly simplifies the syntax. Maybe we can define
> > standard programms used for certain interpolitations which can be
> > recogniced by the parser and then "executed" without beeing interpreted.
> 
> Which means that there is no real difference between writing as a stack, or
> writing it as some other form of syntax. If we do have programmable
> interpolations, etc. then they would be better off in the user interface, and
> translated down to a simple form for the cutting list. This means that :
> 
> a) the cutter isn't as complicated to write,
> b) that the choice of language is irrelevant - the UI could implement
> interpreters for forth, fortran, scheme, perl (as seperate modules) and the
> cutter need not know the difference between them.

Well we still can do anything in forth. Forth is kinda like assembler,
you can translate any language to Forth efficiently. Forth is easy to
implement, easier than implementing different parsers for lots of
different interpolitaions. And it would keep the cutter small. Putting
in algorithms for lots of different interpolitaions would mean a lot of
overhead in each one of it. Lots of if or case clauses. A forth
interpreter could be done a lot more elegant in OOP.

> Another problem is that the forth interpreter would not be able to "run"
> anyway - the values it outputs depends on what it is asked of it by the scene
> renderer - it's possible that a scene might not be rendered frame-by-frame.
> Perhaps a particular cutter running on a 2 processor machine could make
> clever optimisations by processing two frames at once. This differs from the
> idea of using multiple cutters on a single machine, but we have yet to
> determine which way is a more viable option.
> 
> If we assume random access, then we either need a program which takes a value
> and outputs an expression, which is what the cutter would basically be doing
> without an interpreter anyway, or the program would need to be run from the
> start to the required time every time a value was required, which would be
> incredibly slow.

No, first of all I think I can code a forth interpreter almoust as fast
as direct C, and second it only needs to be called once per frame.
Processing a frame probably takes at least 20 ms (if we are VERRY fast).
Interpreting a small Forth programm probably takes less than a single
microsecond.

I'd like to do the interpreter the following way. When parsing the
programm (only done once) I create an object for each command. Every
command is derived from a basic class containing an execute method. Each
command has a connection to the next command it's got to execute, or 2
in case of "conditional jumps" (loops, etc). Now you simply call the
execute method of the first object (command), it contains rather fast
code to do it's job and then calls the execute method of the next
object, and so on. The overhead is rather small.

> Also remember that this is not strictly required at all - everything to do
> with interpolations could be handled with linear interpolations and splitting
> scenes up to a finer grain (1 for each frame is necessary) so I do not think
> that we should spend to much time trying to be clever on interpolations.

Yes, but I think my way would be the most flexiblest. And the Forth
language can be used in so many places. Just think of the effects.
Besides interpolations are a basic part of our file-syntax so we should
care about.

> > > I only expect to ever support three types of interpolation : linear
> > > (moving between two points), freehand (where the values move randomly
> > > about, and so we simply generate a list of values in the order they occur
> > > ), and possibly bezier - even this is unlikely though, as it will be just
> > > as easy to generate a list of freehand points as it would be to describe
> > > the interpolation as a language.
> >
> > Well the point is, we would simply list all the points, then the number
> > of points and then a "freehand" command. It would do everything
> > properly.
> 
> Consider how you would check that the syntax is correct. How about if you have
> a list of 30 points, and it then says "linear" at the end - but linear only
> accepts two points? Or how about if we read in one point, and then get
> "linear". If a syntax is chosen which is the other way around, then you know
> at any point in time, you know what you should be parsing next - if you don't
> find what you are looking for you throw an error. Since that is how the rest
> of the syntax in this file format works, it would be a good idea to make it
> all work like that.

Well first of all the whole Forth programm would be parsed as a string,
in the first stage of parsing. Then it would be parsed to the forth
structure.
Well second, if we have to many points and linear something like this
would happen:

1 2 3 4 5 6 7 8 linear
and linear is defined to be    position length start stop   linear  
degree
Then the output will be 7.833333333 (constant).
Althought this might not make sense in any case, it gives a well
defined, reproducible result.
The statement is as sensible as 5+4 (without an assignment) in most
programming languages. It's just possible for the beauty of it, it
doesn't have much practical use.

As for missing parameters, that shurely will give a little problem.
There would be a stack underrun, such an error should be reported then,
probably with a linenumber or something. But since such an error is
almoust impossible, since our cutlists get generated by a programm, I
think it's OK when the error is only found during the cutting progress.

> But either way it would not be just a list of points but a list of points with
> the relationship between them i.e. This point at 1.051 seconds, this point at
> 1.321 seconds, this point at 1.521 seconds... etc, but if the points are all
> equally spaced then you wouldn't need to specify
> 
> I'm leaning towards the following sort of syntax :
> 
> Each "point" has a value XXXX
> 
> A value may have a time associated with it. The time will be a value between 0
> and 1, and is relative to the length of the scene. e.g. 1 is at the end of
> the scene, 0 is at the beginning. If there is a time, it follows the value,
> with a : in between. If no time is specified, then it is assumed to be of an
> equal interval with the
> 
> XXXX:TTTT               means           value XXXX at time TTTT
> 
> The various transitions between points are represented with various symbols.
> 
> If the first value has a time > 0 then that value is used from the beginning
> of the scene anyway. If the last value has a time < 1, then that value is
> used up to the end of the scene. If a value or several values in a row do not
> have a time associated, then it is assumed that they are evenly distributed
> between the outer points which do have times. In the case of a value being
> first or last in the list, then it becomes 0 or 1. (I know what I mean here,
> but I'm having a difficult time expressing it in words - I'll try and write
> it better when I feel more up to it.)
> 
> E.g. a comma might mean "step between points" a "greater than sign" (>) might
> mean linear interpolation between these points.
> 
> An example interpolation then would be something like this :
> 
> [0.0:0.0 > 1.0:1.0]
> 
> meaning start at value 0.0 at time 0.0, and linear interpolate to value 1.0 at
> time 1.0 (where times are relative to the length of the scene).
> 
> Instead of symbols, XML-style tags might be better, as they would allow
> parameters and better extensibility in the future. The parser would look like
> this :
> 
> 1. look for a number
> 2. if(there is a ":" next) {
>         3. look for another number. If we don't find one, throw an error.
> }
> 4. if (we are at the end of the list, denoted by finding a ']') {
>         5 we have parsed the entire list
> }
> 6. Look for a command symbol/tag. If we don't find one, throw an error.
> 7 goto 1.

Well to be honest, that all can easily done in forth without the
overhead of an even more complex parser. With a little forth interpreter
(which might only take a few hours to program) we can do all that, and
much much more.

> > > > > > Scenes also can have a length of less than a frame. This might be
> > > > > > important if you want to "fill up" scenes.
> > > > >
> > > > > ??? I don't understand this point.
> > > > > I would have assumed that a scene would be at least a frame,
> > > > > otherwise it would never get rendered by the cutter.
> > > >
> > > > Well, but it would create a space in our file format, besides it
> > > > creates audio.
> > >
> > > Why would we want to create space in the file format of less than a
> > > frame? It would either be ignored (as the next scene writes to the same
> > > frame) or would lead to a gap of a frame, thus meaning that the next
> > > scene did not start at the time that it was supposed to (and suggesting
> > > an error on the UI or cutting generators part).
> >
> > Imagine there beeing a short "preroll video" showing a countdown from 5
> > to 1, which is, for some aakward reason 3.9867 seconds long. (maybe the
> > video has a wiered framerate). Then we have to add filling scenes.
> 
> Umm... if a video is supposed to be 5 seconds long (I assume this is what you
> mean) then it should be 5 seconds long - otherwise it is a problem with
> either the decoder, or the video.

No, if the person sitting on the computer thinks this clip should be 5
seconds long, but in reality it's shorter, "short scenes" could give him
the possibility to padd a scene. Besides how long is a frame? We are
working with non-fixed-rate video? What if a scene contains a file with
1 fps and another one with 50 fps, but the scene is 100 ms long, should
we give a warning?

> > > And how does it create audio? Audio will still be measured using the same
> > > timings as video. If you are considering a gap where there is audio and
> > > no picture, then it will still be limited to a minimum of 1 frame in size
> > > - otherwise a gap in video will not be seen or we miss the beginning of
> > > the next scene.
> >
> > Well the point is, we don't really have fixed length frames. So how long
> > is a frame?
> 
> Well if we are editing at 25 fps then a frame is 1/25 seconds long, if we are
> editing at 30fps then a frame is 1/30 seconds long, if we are editing at 60
> fps then a frame is 1/60 seconds long...

We do not edit at certain framerates. The framerate can, and will be
different in every stream we use at a scene.

> But framerate is only going to be important to the encoder (and decoder) - it
> will say "Tell me what I should encode for the frame at time 10027.25
> seconds" and the effect that it is linked to will calculate and image for
> that frame. When that effect is based on other effects, it will ask them what
> image they should be displaying at that time, which in turn might need to ask
> other effects... until we get to the top off the effect tree and ask the
> decoders for a frame at a particular time. Whilst the frames asked for are in
> order, there may be multiple requests which use the same frame (e.g., if one
> of our videos is only at 15 fps and we are encoding at 30 then the decoder
> will only need to decode a new frame every other request).
>
> As we move from one frame to another, there will need to be a scene monitor
> which determines whether the previously used scene is still relevant, and if
> we have moved passed it on the timeline then we will move into the next
> scene. From this, any scene which is not at least a frame in length would
> never be displayed, as it would merely be skipped over.

I would do it the other way. We go throught the scenes. Every scene
finds out what the timecode of the next frame is:

File 1     File 2     File 3                                Encoder
(0.20 per frame)
0.000       2.000     1.000       Encoder needs frame        0.000
0.010                             
0.020       2.020                 Encoder needs frame        0.020
		      1.025  	
0.030       2.030     
0.040       2.040		  Encoder needs frame 	     0.040

We try to find the next frame in any of the streams, then we'll ask the
encoder wether it wants a new frame with that timecode. If it says yes,
we render that frame, if no, we move on to the next frame or any of the
input files. Out input files could easily produce hundreds of frames per
second (since every file might advance at another time), but the encoder
simply "picks" the frames it wants. Maybe we have an encoder which can
manage non constant framerate video, it may be happy with any frame it
can get.

> >
> > > Something that needs considering is whether audio should be generated in
> > > the same cutting list as the video. Audio will obviously have to use
> > > seperate connections to the video, as the effects that it passes through
> > > will be completely different. A lot of the time, the audio will also be
> > > coming from completely different files to the video - and depending on
> > > the codec, may very well end up in a different "audio" output.
> >
> > Still in many cases you need both from the same file, and in those cases
> > it's important those stay in sync. I would put that into the cuttig
> > list, too, it doesn't create to much overhead.
> 
> It's not the overhead, it's more a problem of recognition - have I rendered
> this scene before? Consider where we use the exact same video clip 4 times in
> a row, but have different sound underneath. If we are treating sound and
> visual seperately.
> 
> As an aside, on a project I was working on recently in Premier 6.0, I found
> out that if you mute a sound file, all of the visuals above it need to be
> re-rendered - and this is using a file format which treats video and audio
> seperately. After much swearing whilst waiting for the 10 minutes it took to
> render everything again, I decided that this is a bug, no matter how you look
> at it.

OK then let's write another cutter only for the sound. With C(++) we
might use Macros or something to use mostly the came code for both. It
actually might make some things a lot easier, so I agree.

> > > Incidentally, that's a very good point. The codec used when editing with
> > > the Matrox RT2000 card uses a seperate file to store video and audio -
> > > audio being stored in .wav files. This is a really useful feature,
> > > because it means you can use any sound editor going to fiddle with the
> > > sound, normalise it, filter out noise, etc. etc. etc. to a much greater
> > > degree than is possible in a program not dedicated to sound editing (i.e.
> > > kdenlive). Following the same concept with the editing codec will be a
> > > very good idea.
> >
> > Well this is to unsave, after those files have been copied to another
> > machine playback might be more difficult to do. Besides clock
> > frequencies of sound-cards vary. If the file is interleaved this isn't a
> > problem, and the worst which could happen is a click, but everything
> > would still be in sync. But maybe we can do both things seperately. We
> > just couldn't directly save to MPEG or AVI then.
> 
> Clock frequency problems are a problem for playback, but not for the cutter.
> It is still a problem if the file is interleaved - a pop ur a click _is_ the
> worst thing that can happen, as it means that the video we have created is
> effectively broken.

Well when cutting seperately we wouldn't have to worry about it, we just
insert timecodes into the audio file and just interpolate the audio rate
if it shouldn't match to what we want.

> Anyway, I thought the idea wasn't to save directly to mpeg or Avi anyway? It
> is very easy to render to them though - it just means using a different
> encoder at the end of the process, and passing the audio and video together
> in the correct amounts.

Well I think we can use seperate files and cutters. I agree it makes
things easier, at least if we add timecodes or something simmilar.

> > You've obviously never seen Prolog :) Prolog or Lisp are languages which
> > are far more advanced than C. You don't even have to specify the order
> > of the commands to be executed.
> 
> I have seen prolog and lisp, and no they are not more advanced than c++. They
> are mostly used in the AI community, as they are good for writing highly
> recursive programs quickly. (Oh, and lisp/scheme is used to teach the
> principles of programming languages due to it's simple syntax). Consider that
> neither language has a loop construct though - you have to build them out of
> recursion.

Yes, but they are specialiced and little hardware orientated just like
our language. 

> Remember how recursion is really hideously inefficient if done badly? right...
> That's why when your writing prolog/scheme/lisp programs, you need to
> understand the way that the language works to write efficient code, often
> through the use of extra parameters carrying around accumulators, and making
> sure that the recursion happens right at the very end of the function so that
> the compiler doesn't use up all it's memory trying to do something simple
> like add up all the numbers between 1 and 1000000.
> 
> And 9 time out of 10, the error message that you have to work with is "stack
> overflow"

Well Prolog or Lisp were not designed to run on primitive computers
efficiently. I mean why should I call a programming language advanced if
the difference between it and machine code is really tiny.

> Also, when you realise that every program language is equivalent - there is
> nothing more powerful than C/assembler/lisp/prolog/lambda calculus/pascal/ or
> a turing machine. At the end of the day, it's a case of use the right
> language for the right job.

Well I mean our language is so advanced because it doesn't care about
how computers work, you don't have to know how the cutter works to do
write a cutlist. The cutter can optimize the cutlist itself to a greater
degree than any C compiler can do (C compilers can't even split up a
single threadthed C programm on an SMP machine, can they?)

> > Well actually Prolog is rather simplistic and you can get it down to
> > maybe 2-3 pages and still have enought space for long explainations.
> > And just look at it, C is a simple machine-near language, it doesn't
> > have functions for parsing, no functional programming, it doesn't even
> > have posibilities to parallelice it easily.
> 
> Prolog grammar is simplistic, but the implentation is not - there's lots of
> hashing techniques and other optimizations required to make it actually run
> at a decent speed. C is near machine-language but to describe the parsing
> grammar is still pretty complex - consider all of the special cases and
> stuff. Oh, and you can do functional programming in C through the use of
> function pointers, but it would not be the easiest thing in the world to
> do...
>
> And if you want to consider the simplicity of C++... well consider that 10
> years or so after standardisation, there is only one compiler which fully and
> correctly implements C++.  (I've forgotten it's name, it was on slashdot a
> couple of weeks ago).

Well see, C++ has grown far to complex. And still C++ is nice for
machine near computing (drivers, kernels). But, like Java, it's to
reliant on things like the binary system or accurate integers. Our
language could proably even run on analog computers.

> Cheers,
> Jason

Servus
  Casandro

> --
> Jason Wood
> I don't give up, I lose interest
> Homepage : www.uchian.pwp.blueyonder.co.uk
> 
> _______________________________________________________________
> 
> Don't miss the 2002 Sprint PCS Application Developer's Conference
> August 25-28 in Las Vegas -- http://devcon.sprintpcs.com/adp/index.cfm
> 
> _______________________________________________
> Kdenlive-devel mailing list
> Kdenlive-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kdenlive-devel