[linux-audio-dev] Fwd: CSL Motivation (fwd)

Stefan Westerfeld stefan at space.twc.de
Sun Mar 2 23:25:43 GMT 2003


   Hi!

On Thu, Feb 27, 2003 at 12:39:29PM -0500, Paul Davis wrote:
> >No. PortAudio makes a lot of choices for the software developer, and thus
> >provides an easy abstraction. 
> 
> The point is that PortAudio follows the same basic abstraction that
> the audio APIs on the overwhelmingly dominant platforms for audio
> software development follow (callback-driven). Those APIs have emerged
> from years of practical experience writing serious audio applications,
> and I strongly believe (as you know) that they should be taken very
> seriously. The fact that a couple of early linux hackers felt that the
> open/read/write/close/ioctl model was the right thing seems to me of
> much less significance than the amount of working, useful,
> sophisticated software that has been built around a set of
> abstractions that are more or less identical to the ones PortAudio
> provides.

I take callback APIs very seriously, because they provide an excellent base
for developing music applications. In fact, artsd is entirely based around
the concept of callback APIs (you have calculateBlock functions with float
data blocks which are called in a graph of connected synthesis modules).
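
To illustrate the idea (with made-up names, not the actual aRts API), such
a callback-driven module could look like this in C; each module fills a
block of float samples when the engine asks for it:

    #define BLOCK_SIZE 256

    typedef struct Module Module;
    struct Module {
        Module *input;                     /* upstream module, or NULL */
        void (*calculateBlock)(Module *self,
                               float *out,
                               unsigned long frames);
    };

    /* Example module: pulls a block from its input and halves the
     * volume (assumes frames <= BLOCK_SIZE). */
    static void gain_calculateBlock(Module *self, float *out,
                                    unsigned long frames)
    {
        float in[BLOCK_SIZE];
        self->input->calculateBlock(self->input, in, frames);
        for (unsigned long i = 0; i < frames; i++)
            out[i] = 0.5f * in[i];
    }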

I do have years of practical experience writing such software, and I'd say
Tim Janik is no less experienced than me on this front (www.arts-project.org
and beast.gtk.org). Of course, that's together only 16 man-years of learning
audio development, and not even full-time. While discussing our experience
on the list might give people an idea of whether or not they should even
bother listening to our opinion, I would ultimately prefer to discuss the
technical aspects of CSL ;).

The open/read/write/close/ioctl model, as you call it, is just an _interface_
between the kernel and the application; on Linux, we do things this way rather
than having any other means of doing callbacks from kernel space to user
space. I do agree that it makes a _poor_ API for most low-latency realtime
music software, because it is extremely hard to use.

However, you must also admit that it _is_ capable of mapping the features that
the hardware has to the software without additional buffering. If you combine
it with memory mapping (which Linux supports), you can write to the sound
card's buffer directly, and get at the hardware interrupts the card generates
via ioctl. Quake, for instance, uses this to prerender a lot of audio and
rerender it once a new sound needs to be mixed into the sound stream.
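
For reference, here is a rough sketch of that mmap technique using the OSS
API (error handling omitted, and the rendering itself only hinted at):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/soundcard.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/dsp", O_RDWR);  /* mmap needs O_RDWR on OSS */
        int fmt = AFMT_S16_LE, stereo = 1, rate = 44100;
        ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);
        ioctl(fd, SNDCTL_DSP_STEREO, &stereo);
        ioctl(fd, SNDCTL_DSP_SPEED, &rate);

        /* map the card's whole DMA buffer into our address space */
        audio_buf_info info;
        ioctl(fd, SNDCTL_DSP_GETOSPACE, &info);
        size_t len = (size_t)info.fragstotal * info.fragsize;
        short *dma = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);

        /* with mmap, DMA is started by toggling the output trigger */
        int trig = 0;
        ioctl(fd, SNDCTL_DSP_SETTRIGGER, &trig);
        trig = PCM_ENABLE_OUTPUT;
        ioctl(fd, SNDCTL_DSP_SETTRIGGER, &trig);

        for (;;) {
            struct count_info ci;
            ioctl(fd, SNDCTL_DSP_GETOPTR, &ci); /* ci.ptr: DMA position */
            /* render ahead of ci.ptr into dma[], and rerender the
             * not-yet-played region when a new sound comes in; real
             * code would sleep or select() instead of busy-polling */
        }
    }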

It can thereby combine robustness (which is needed on a timesharing system
such as Unix) with extremely high responsiveness. This is not something you
can do with an easy callback API.

All that said, note that CSL _has_ a callback API in addition to the normal
one. It's just that CSL is somewhat more low-level than PortAudio, in the
sense that it allows you to access the open/read/write/close/ioctl stuff more
directly, which has two effects:

(a) you can get all the low-level features (such as fragments) Linux provides
  and deal with them yourself if you want to
(b) you can implement the callback API you need on top of it if you want to;
  that is: you can implement PortAudio on top of CSL without any loss of
  performance (through extra copies and such) - you can however _not_
  implement CSL on top of PortAudio without any loss of performance
  (see the sketch below)
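
To make (b) concrete, here is a minimal sketch of a callback layer over a
plain blocking descriptor; the names are illustrative, not actual CSL or
PortAudio code. The callback renders straight into the buffer that is
written out, so no extra copy is introduced:

    #include <unistd.h>

    typedef void (*audio_callback)(float *out, unsigned long frames,
                                   void *userdata);

    void run_stream(int stream_fd, audio_callback cb, void *userdata)
    {
        float buf[256 * 2];                   /* one block, stereo      */
        for (;;) {
            cb(buf, 256, userdata);           /* user fills the block   */
            write(stream_fd, buf, sizeof buf);/* blocks until space     */
        }
    }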

> >			       This will mean, however, that actually porting
> >software to PortAudio will probably be hard (compared to CSL), 
> 
> that depends on where you started from. if you started from ASIO, it
> won't be hard. if your app started as a VSTi, it won't. but sure, if
> you started as your typical linux audio app that reads a stereo file
> from the disk and shoves it into the audio interface, it will be
> fairly easy to use CSL and a bit harder to use PortAudio.
> 
> this porting thing bothers me. we have only a handful of really great
> apps for audio under linux right now (though we have lots of rather
> interesting ones) and most of them already can use JACK. i would much
> rather see people forced (just like apple have done) to work with the
> "correct" abstraction than continue with multiple wrappers and
> multiple different abstractions as we move into a period where i hope
> to see an explosion in linux audio applications.

You cannot force people to do the right thing in the free software community.
They will continue doing whatever they feel like doing, and I like that
aspect of free software development. If you provide something that is
convenient for them to use, they will use it. Thus, I'd rather provide them
with CSL instead of having them use OSS all the time. I hate the OSS API,
and the fact that 99% of all free software supports it seems to me an
indication that if you provide something "like" the OSS API (and CSL can do
this), then 99% of all free software developers would at least be willing
to also support it, provided it adds no overhead over the OSS API; or maybe
they'd even switch.

> >								whereas
> >writing new software for PortAudio might be convenient, _if_ the software
> >falls in the scope of what the abstraction was made for.
> 
> well, there are at least two sets of evidence to consider there. i
> think there is plenty of evidence in the world of windows that a large
> amount of interesting audio software works with the ASIO and DirectX
> models (which are semantically similar, if syntactically worlds
> apart). its not limited to audio synthesis. but at the same time, its
> worth noting that "most" apps on windows that emit audio for some
> reason (i.e the ones that are not actually audio applications) do not
> use ASIO or DirectX. so there is evidence to support the idea that at
> least a couple of abstractions are necessary.
> 
> in contrast, CoreAudio offers only 1 abstraction model as i understand
> it. so apple at least appear to have been willing to bet that all
> software can use a single model.

I think the point is similar to that of programming languages: all programming
languages that are Turing-complete are equivalent in the sense that you can
implement everything in them. Thus, you can choose to write KDE or Ardour in
Scheme, C++, C, assembler or Java.

However, there are differences in convenience and performance for a given
task; thus, if you have extreme performance requirements, you will need to
accept some inconvenience, and if you don't, you can choose whatever you like.

When it comes to audio abstractions, music applications (the low-latency
kind of thing) do have extreme performance requirements; thus you want
immediate notification of the buffer state, which you can get through a
callback API (under Linux/Unix, this pretty much means an extra,
high-priority thread using select/write with careful fragment size/count
settings).
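
Roughly, such a thread could look like this; render_fragment is a made-up
stand-in for the actual synthesis code:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/select.h>
    #include <sys/soundcard.h>
    #include <unistd.h>

    static void render_fragment(short *buf, int samples)
    {
        memset(buf, 0, samples * sizeof *buf); /* silence, for the sketch */
    }

    void audio_thread(int fd)
    {
        /* request 16 fragments of 2^8 = 256 bytes each, so select()
         * wakes us as soon as one small fragment is free */
        int frag = (16 << 16) | 8;
        ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);

        short buf[128];                       /* one 256-byte fragment */
        for (;;) {
            fd_set w;
            FD_ZERO(&w);
            FD_SET(fd, &w);
            select(fd + 1, NULL, &w, NULL, NULL); /* wake when writable */
            render_fragment(buf, 128);
            write(fd, buf, sizeof buf);           /* won't block long   */
        }
    }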

Other applications can use whatever is convenient for them; thus some use
this and some use that. As long as CSL can provide what is required to
support all applications in an efficient manner, I see no reason why we
should force everybody to use just one abstraction (we can't anyway).

And even if CoreAudio offers only one model, around which all other apps are
based, I would say that there will be wrappers which also provide new
abstractions on top of it.

Even PortAudio has a layer (pablio) to emulate blocking read/write, probably
because somebody felt it was needed.
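
If I remember the PortAudio distribution correctly, using pablio looks
roughly like this (check pablio.h for the exact signatures):

    #include "pablio.h"   /* ships in the PortAudio source tree */

    int main(void)
    {
        PABLIO_Stream *stream;
        float block[256 * 2] = { 0.0f };  /* one stereo block of silence */

        OpenAudioStream(&stream, 44100.0, paFloat32,
                        PABLIO_WRITE | PABLIO_STEREO);
        for (int i = 0; i < 100; i++)
            WriteAudioStream(stream, block, 256);  /* blocks as needed */
        CloseAudioStream(stream);
        return 0;
    }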

> >[... JACK ...]
> 	
> i don't think its about liking it so much as the fact that it does
> some things that are extremely important to many of us and that
> nothing else can do. 

Yes, I think there are few people who like something that is not useful to
them.

> >			 If some people will like CSL, why not?
> 
> well, i am *really* not trying to be argumentative just for the sake
> of it, but ... the reason not to is that CSL doesn't offer any new
> functionality. its just another wrapper, and we already have at least one
> of them, with some evidence that its capable of supporting all apps.

PortAudio can support all applications only at extra performance cost,
because emulating a blocking API on top of PortAudio requires extra
buffering. Especially for low-latency stuff, this is problematic.

For instance, I would say that artsd could not use PortAudio without extra
latency costs, because it needs to integrate its network-aware mainloop
with the audio mainloop. I doubt that's possible with PortAudio, whereas aRts
does already support CSL.
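
What I mean by integrating the two mainloops is, schematically, one select()
serving both kinds of file descriptors; the two handlers here are
hypothetical stand-ins for the real artsd code:

    #include <sys/select.h>

    extern void handle_request(int net_fd);   /* hypothetical */
    extern void produce_audio(int audio_fd);  /* hypothetical */

    void mainloop(int net_fd, int audio_fd)
    {
        for (;;) {
            fd_set r, w;
            FD_ZERO(&r);
            FD_ZERO(&w);
            FD_SET(net_fd, &r);     /* client request pending?    */
            FD_SET(audio_fd, &w);   /* room for another fragment? */
            int maxfd = net_fd > audio_fd ? net_fd : audio_fd;
            select(maxfd + 1, &r, &w, NULL, NULL);
            if (FD_ISSET(audio_fd, &w))
                produce_audio(audio_fd);
            if (FD_ISSET(net_fd, &r))
                handle_request(net_fd);
        }
    }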

> >On the other hand, if you added JACK support to CSL, you could also mix the
> >output of all of these "sound servers" into JACK, without endangering your
> >latency properties.
> 
> since JACK has no desire to be a general purpose API for all apps,
> this would be more appropriate than the other way around, i
> think. there is no reason to run a JACK system that does audio i/o via
> a sound server - it just isn't going to work with the performance
> promises that JACK is supposed to provide.

Well, you might want to run a professional audio app with suboptimal latency
if you're a home user who is just interested in composing a song. You don't
need extremely low latency for that, but you might still want to use a
"professional app". But I agree with you: a JACK backend for CSL is higher
priority than a CSL backend for JACK.

> i think that the fundamental problem here is the division between:
> 
>  * the kinds of apps that emerge from the repeated questions on gtk-list
>    (and i'm sure the KDE equivalent): "what's the best way to play a
>    sound?"
> 
>  * "serious" audio applications (and, i suppose, video apps too)
> 
> CSL seems to be about providing something to the first group that
> PortAudio *could* provide, even though it may not be the best or
> easiest method (although it might be the most appropriate). but i
> really don't see how it offers anything to the second group that want
> features (and to some extent, *lack* of features) beyond those CSL or
> even PortAudio provide.
> 
> i see it this way because CSL doesn't appear to me to have emerged
> from efforts of audio developers. its come mostly from the desktop
> world as way to meet the need for a "cross-desktop" way to say "you've
> got mail" or play the KDE startup theme, and other "singular" audio
> events.

As I said, I consider myself an audio developer. But regardless of what I
consider myself, or what you consider me, I can't improve CSL except on
technical facts. It's irrelevant where it came from or who wrote it. I just
want to make it perfect for the purpose it has. If you say it isn't right
now, I would be more than happy if you let me know what needs to be improved.

> as i said above, the windows experience suggests that it may be
> useful, perhaps even necessary, to have two different abstractions to
> support these two rather different classes of applications. but the
> jury is out on that.

Well, it's important to clarify that we need to distinguish two things:

 (a) the interface the operating system provides
 (b) the interface the application is using

For the scope of CSL, (b) is not too important. We only want to wrap
the operating system interface with little overhead, so that you can provide
any abstraction the application will need on top of that, if it differs from
what CSL provides.

The operating system (Linux) currently provides these:
 - a callback API (select/write)
 - a blocking API (write)
 - a non-blocking, non-copying, non-linear, callback-aware API (mmap)

Depending on how you handle concurrency in your application (event-based
single-threaded, not at all, multi-threaded), and on what your audio needs
are, any of these might be optimal for you. CSL recognizes this and
currently wraps the first two; there have already been some discussions on
whether we also need to provide a pwrite-like call in CSL to handle the
third abstraction as well.
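
Purely as illustration (this is not an actual CSL function), such a
pwrite-like call might look like:

    #include <sys/types.h>

    /* hypothetical: write n_bytes at an absolute offset in the card's
     * ring buffer instead of at the driver's write pointer, which is
     * the capability the mmap abstraction really provides */
    typedef struct CslPcmStream CslPcmStream;   /* opaque, illustrative */

    int csl_pcm_pwrite(CslPcmStream *stream,
                       const void   *data,
                       size_t        n_bytes,
                       off_t         ring_offset);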

   Cu... Stefan
-- 
  -* Stefan Westerfeld, stefan at space.twc.de (PGP!), Hamburg/Germany
     KDE Developer, project infos at http://space.twc.de/~stefan/kde *-         


