KDE4 and video acquisition

Laurent Pinchart laurent.pinchart at skynet.be
Wed Jan 25 12:50:08 GMT 2006


Hi Matthias,

Sorry for the late reply; I've been pretty busy with my new job. Things should 
hopefully be a bit quieter now.

> > As KDE developers are busy creating an architecture for the multimedia
> > framework in KDE4, I was wondering if any of you have thought about
> > webcam support. Everything I read about Phonon states that sound will be
> > supported through various backends (GStreamer, NMM or MAS), but the video
> > input has not been publicly discussed so far. With all the controls that
> > a webcam can expose (brightness/contrast/hue/saturation, but also
> > exposure time, gain, power line frequency setting, pan/tilt/zoom, iris,
> > ...), the job is not trivial.
>
> You are right, that currently there hasn't been any work on a VideoCapture
> interface (only audio). While it should be possible to add this interface
> in a later version of KDE, I'd like to have it done for KDE 4.0, if
> possible.

That's good news. Users put great hopes in KDE4, and it would probably 
disappoint them if there were no video capture support.

> As I have never done any video capture work on the developer side (as a
> user, capturing from a DV camera over FireWire on Windoze and Linux, I have
> a little experience), I'd be very grateful for helping hands on defining
> that interface.

I can only comment on V4L2 and IP-based devices; I'll rely on your experience 
for FireWire video capture. We also need someone with DVB experience.

> Here are a few thoughts from the top of my head:
> 1. it can be a capture source that delivers both audio and video (like a DV
> stream)

I can think of three different kinds of capture devices:

1. Video only: this is probably the simplest case. The device delivers video 
in a given format, either frame-based (uncompressed, MJPEG, DV) or 
stream-based (MPEG).
2. Audio/video in a single stream: the device delivers a data stream which 
contains interleaved audio and video data. This applies to MPEG and DV 
devices.
3. Audio/video in separate streams: the device delivers two streams of data, 
one for audio, the other for video. The audio clock can be locked 
(synchronized) to the video clock or not. A common example is a USB device 
with separate audio and video interfaces.

Cases 1 and 3 are easy to handle with either a single VideoCapture instance 
(case 1) or separate VideoCapture and AudioCapture instances (case 3). Case 2 
is a bit more difficult, and we will need input from experienced users.

> 2. if it's a video only capture you often want to have the audio signal
> from the soundcard to be in sync with the video signal. So either there
> needs to be an interface for defining the sync between a SoundcardCapture
> and a VideoCapture or there should be only one interface "Capture" where
> the audio and video source can be independently set, and if both are set
> they're implicitly kept in sync.

Audio/video sync will probably be one of the major problems. Even if the 
device can capture both audio and video, the two streams can be unlocked, and 
we must provide a way to resynchronize them.

> 3. There's a class CaptureSource in SVN that describes the available
> sources to the user. (It's a really simple class, like a struct containing
> an id, name and description, but with users of the API only being able to
> read, of course). This class could just as well be used to describe the
> available video capture sources to the user, or is there any information
> missing?

Where in SVN? Is there an SVN repository for Phonon? In the main KDE 
repository?

Maybe we could add some kind of capability information. What exactly is the 
CaptureSource class used for?

> 4. What properties does the VideoCapture interface need? You already
> mentioned:
> - brightness
> - contrast
> - hue
> - saturation
> - exposure time
> - gain (I only know gain from audio signal processing, is that what you
> meant or is there actually a gain for video signals?)

There is a gain for video signals as well.

> - power line freq (Is that setting used for filtering? Is it available in
> the hardware interface? Is it available in the mediaframework interfaces?)

It's available in the hardware interface for some devices (the newest Logitech 
webcams, for instance), and will be accessible through V4L2. The setting tells 
the device which mains frequency (50 or 60 Hz) its anti-flicker filter should 
compensate for.

> - pan
> - tilt
> - zoom
> - iris

There can be numerous other controls. Devices can define vendor-specific 
controls, so we probably need to deal with controls in an extensible way.

> As I, like I already said, have no experience with the developer side of
> video capture, I don't know whether these properties are all settings the
> hardware exposes or that need to be done in software. If the latter is
> needed these things should be done in the VideoPath which is meant for
> processing of the video signal.

I think we need to deal with properties exported by the hardware first, and 
then implement video processing in the VideoPath (whatever that is; can you 
tell me where I can find more information?).

Here's how things work for V4L2 devices; please compare that to DV capture 
over FireWire and tell me if there are differences worth noting.

A V4L2 device can capture or output video, text or radio (don't ask me why 
Video4Linux supports radio capture). Let's focus on video capture for now.
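
For the record, here is roughly what the initial handshake with a V4L2 device
looks like (a minimal sketch in C-style C++; error handling is reduced to the
bare minimum and /dev/video0 is just an example node):

/* Minimal sketch: open a V4L2 device and check that it can capture
 * video. The device node name is just an example. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <linux/videodev2.h>

int main()
{
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct v4l2_capability cap;
    memset(&cap, 0, sizeof cap);
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) {
        perror("VIDIOC_QUERYCAP");
        close(fd);
        return 1;
    }

    /* card is the human-readable device name, bus_info identifies the
     * device location (the USB port, for instance). */
    printf("card: %s, driver: %s, bus: %s\n",
           (char *)cap.card, (char *)cap.driver, (char *)cap.bus_info);
    if (cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)
        printf("the device supports video capture\n");

    close(fd);
    return 0;
}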

The device exposes controls, which are booleans or 32-bit integers (though I 
would like to change that to other data types for more advanced controls). 
Those controls map directly to hardware features, or at least features that 
are partially implemented in hardware. Controls can be related to image 
processing (brightness, contrast, power line frequency, ...), image capture 
(exposure time, scan mode, ...), physical motion of the sensor-related 
components (focus, iris, pan/tilt/zoom, ...) or any other hardware feature 
that the device supports. Controls can be enumerated, queried (to find the 
lower and upper bounds), read and written. A control has an ID and a 
driver-supplied name.
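
To make this concrete, here is a sketch of how controls are enumerated and
accessed through the current V4L2 API (fd is an open device descriptor;
vendor-specific controls live in a separate ID range and are enumerated the
same way starting from V4L2_CID_PRIVATE_BASE):

/* Sketch: walk the standard control range, then read and write one
 * control. Error handling is minimal. */
#include <sys/ioctl.h>
#include <cstdio>
#include <cstring>
#include <linux/videodev2.h>

static void list_controls(int fd)
{
    struct v4l2_queryctrl qc;

    for (unsigned int id = V4L2_CID_BASE; id < V4L2_CID_LASTP1; id++) {
        memset(&qc, 0, sizeof qc);
        qc.id = id;
        if (ioctl(fd, VIDIOC_QUERYCTRL, &qc) < 0)
            continue;       /* this driver doesn't implement the control */
        if (qc.flags & V4L2_CTRL_FLAG_DISABLED)
            continue;
        printf("%s: type %d, min %d, max %d, default %d\n",
               (char *)qc.name, qc.type, qc.minimum, qc.maximum,
               qc.default_value);
    }
}

static void set_brightness(int fd, int value)
{
    struct v4l2_control ctrl;

    memset(&ctrl, 0, sizeof ctrl);
    ctrl.id = V4L2_CID_BRIGHTNESS;
    if (ioctl(fd, VIDIOC_G_CTRL, &ctrl) == 0)
        printf("brightness was %d\n", ctrl.value);

    ctrl.value = value;
    if (ioctl(fd, VIDIOC_S_CTRL, &ctrl) < 0)
        perror("VIDIOC_S_CTRL");
}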

The device also exposes a video capture interface. The user has to set up the 
interface before starting video streaming (setting the frame format, frame 
size, frame rate and compression quality). When the interface has been set up 
properly, video can be captured using one of three different methods 
(read/write, streaming I/O or asynchronous I/O; not all methods are supported 
by all drivers).
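
The format negotiation step deserves an illustration, because the driver is
allowed to adjust the requested values to the closest configuration the
hardware supports, and VIDIOC_TRY_FMT performs the same negotiation without
touching the hardware (a sketch, error handling mostly omitted):

/* Sketch: negotiate a capture format on an open device fd. The driver
 * may adjust the requested values, so the application must check what
 * it got back. */
#include <sys/ioctl.h>
#include <cstdio>
#include <cstring>
#include <linux/videodev2.h>

static int setup_capture(int fd)
{
    struct v4l2_format fmt;

    memset(&fmt, 0, sizeof fmt);
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    fmt.fmt.pix.field = V4L2_FIELD_ANY;

    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
        return -1;

    /* The driver may have picked a different size or format. */
    printf("got %ux%u\n", fmt.fmt.pix.width, fmt.fmt.pix.height);

    /* The frame rate is negotiated separately, through the streaming
     * parameters (also adjusted by the driver if unsupported). */
    struct v4l2_streamparm parm;
    memset(&parm, 0, sizeof parm);
    parm.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    parm.parm.capture.timeperframe.numerator = 1;
    parm.parm.capture.timeperframe.denominator = 30;
    if (ioctl(fd, VIDIOC_S_PARM, &parm) < 0)
        perror("VIDIOC_S_PARM");

    return 0;
}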

The video capture API should IMHO expose the following functions:

A. Queries

A.1. Enumeration. That would include the device name, serial number and 
location if available (to be able to tell two identical USB webcams apart 
based on the USB port number, for instance).

A.2. Capabilities query. What can the device do? Internal information such as 
the supported capture methods should probably not be reported to the user. 
What else would be needed?

A.3. Image format query. What are the image formats (YUV, MJPEG, DV, 
MPEG, ...) supported by the device? Even if we decompress MJPEG to supply 
YUV or RGB data to the user, knowing which formats are supported by the 
hardware might be useful for efficiency (see the sketch after this list).

A.4. Image size query. What image sizes can we request? For devices which 
support multiple image formats, this information probably depends on the 
format. Some devices support a number of predefined sizes, others include 
a hardware scale/crop engine. Scaling/cropping can be done in software, but 
once again it would be useful to know what the hardware supports.

A.5. Frame rate query. What are the supported frame rates? This can depend on 
the image format and size.

A.6. Control enumeration. This would include, for each control, the control 
name and type (boolean, integer, ...).

A.7. Control query. What are the minimum, maximum and default values for a 
given control?
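
For reference, query A.3 maps directly to an existing V4L2 ioctl (a sketch;
note that V4L2 has no equivalent enumeration ioctls for A.4 and A.5 at the
moment, although image sizes can at least be probed with VIDIOC_TRY_FMT):

/* Sketch: enumerate the pixel formats a capture device supports. */
#include <sys/ioctl.h>
#include <cstdio>
#include <cstring>
#include <linux/videodev2.h>

static void list_formats(int fd)
{
    struct v4l2_fmtdesc fmt;

    for (unsigned int i = 0; ; i++) {
        memset(&fmt, 0, sizeof fmt);
        fmt.index = i;
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) < 0)
            break;          /* no more formats */
        /* pixelformat is a FourCC code, e.g. 'YUYV' or 'MJPG'. */
        printf("%u: %c%c%c%c (%s)\n", i,
               fmt.pixelformat & 0xff,
               (fmt.pixelformat >> 8) & 0xff,
               (fmt.pixelformat >> 16) & 0xff,
               (fmt.pixelformat >> 24) & 0xff,
               (char *)fmt.description);
    }
}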

B. Controls

B.1. Read. Return the control's current value.

B.2. Write. Set the control's value.

We can implement a number of wrappers for common controls (brightness, ...), 
but we need to access the driver for device-specific controls. How can we do 
that in a portable way?
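
Here is a hypothetical sketch of what an extensible control interface could
look like on the KDE side. None of these names exist in Phonon; they are only
meant to illustrate the idea of combining convenience wrappers for common
controls with generic, enumerable controls for everything else:

// Hypothetical sketch; none of these names exist in Phonon, they only
// illustrate the idea discussed above.
#include <QList>
#include <QString>
#include <QVariant>

struct VideoControlInfo
{
    int id;               // backend-specific control identifier
    QString name;         // driver-supplied, human-readable name
    QVariant minimum;     // bounds and default queried from the driver
    QVariant maximum;
    QVariant defaultValue;
};

class VideoCaptureControls
{
public:
    virtual ~VideoCaptureControls() {}

    // Convenience wrappers for the common, well-known controls.
    virtual int brightness() const = 0;
    virtual void setBrightness(int value) = 0;

    // Generic access for everything else, including vendor-specific
    // controls: applications enumerate them at run time instead of
    // relying on a fixed list.
    virtual QList<VideoControlInfo> controls() const = 0;
    virtual QVariant controlValue(int id) const = 0;
    virtual void setControlValue(int id, const QVariant &value) = 0;
};

Using QVariant (or something similar) would let a control carry a boolean, an
integer or a more complex data type later on without changing the interface.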

C. Image capture

C.1. Image format and size setting. We need a way to set the image format and 
size. If the hardware doesn't support the requested size, V4L2 allows the 
driver to return the closest supported size. It would be quite useful to 
expose that to the user. V4L2 also has a "try size" function, which acts 
exactly like a "set size" except that no change is applied to the hardware.

C.2. Frame rate and quality. As with the image size, it would be nice to have 
the video capture class return the closest supported values if the requested 
values are not valid. A "try frame rate/quality" function would be 
interesting as well.

C.3. Enable/disable streaming. This is quite easy: start or stop video 
streaming. The acquisition method could be fixed or chosen dynamically. 
Letting the kernel allocate memory buffers and mapping them to userspace is 
the most efficient method in V4L2 (see the sketch after this list). The 
number of buffers could have a default value which the user could override. 
Do we want to let the user pick another capture method? I'm not sure how much 
we need to expose to the user. Do we need a blocking read, a signal when a 
frame is ready, or something else? Emitting a signal when a frame is ready is 
quite easy if the device can be select()ed, which is the case for V4L2.

C.4. Single-shot capture and various helpers. Those functions would use the 
streaming API internally and perform simple tasks such as capturing a single 
frame.
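
To make C.3 concrete, here is a rough sketch of the memory-mapped streaming
method (error handling and unmapping omitted; a real backend would hook the
file descriptor into the Qt event loop, through QSocketNotifier for instance,
and emit a signal per frame instead of looping):

/* Sketch of the mmap streaming I/O capture method: the driver
 * allocates the buffers, the application maps them and cycles them
 * through a queue. */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/select.h>
#include <cstring>
#include <linux/videodev2.h>

static void capture_frames(int fd, unsigned int nframes)
{
    struct v4l2_requestbuffers req;
    memset(&req, 0, sizeof req);
    req.count = 4;                      /* default; could be user-set */
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);
    if (req.count > 4)                  /* the driver may adjust count */
        req.count = 4;

    void *mem[4];
    for (unsigned int i = 0; i < req.count; i++) {
        struct v4l2_buffer buf;
        memset(&buf, 0, sizeof buf);
        buf.index = i;
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        mem[i] = mmap(0, buf.length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);   /* hand the buffer to the driver */
    }

    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);

    for (unsigned int n = 0; n < nframes; n++) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        select(fd + 1, &fds, 0, 0, 0);  /* wait for a frame, no busy loop */

        struct v4l2_buffer buf;
        memset(&buf, 0, sizeof buf);
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_DQBUF, &buf);
        /* frame data: buf.bytesused bytes at mem[buf.index] */
        ioctl(fd, VIDIOC_QBUF, &buf);   /* give the buffer back */
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);
}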

I've probably forgotten a few important things. One of them is trigger 
support: how do we want to support the capture button that several webcams 
feature? Another is software image format conversion: we definitely need to 
implement a few decompressors for common video formats used in webcams 
(MJPEG, PWX, ...). There's also the issue of image scaling/cropping in 
software. I just realised that I haven't tackled audio/video synchronisation 
in this mail, but I don't have enough experience with that and need input 
from someone else.

The V4L2 API lacks some important features, such as complex controls, a frame 
rate and quality "try and set" method, and trigger support. I sent a mail to 
the video4linux mailing list a week or so ago, and the developers are not 
opposed to making the API evolve as long as backward compatibility is kept. I 
haven't pushed the matter further, as I think the two issues (V4L2 evolution 
and the KDE video capture API) are related to each other, and I wanted to 
discuss the matter here first. People come from time to time with a request 
for a V4L user-space library. I'm not sure whether that is needed or whether 
all the user-space code should be implemented in KDE. I'm also not sure how 
we will interact with backends which already support video capture 
(GStreamer).

I know this sounds quite complex, but if the API is done right, I think I 
could implement a V4L2 backend quite easily.

> I'd be thankful if you want to try to write such an interface. We can
> discuss this here on this list.

I can at least try. As for discussing this on the list, it's your turn to 
answer :-)

Laurent Pinchart


