KDE4 and video acquisition
Laurent Pinchart
laurent.pinchart at skynet.be
Wed Jan 25 12:50:08 GMT 2006
Hi Matthias,
Sorry for the late reply; I've been pretty busy with my new job. Things should
hopefully be a bit quieter now.
> > As KDE developers are busy creating an architecture for the multimedia
> > framework in KDE4, I was wondering if any of you have thought about
> > webcam support. Everything I read about Phonon states that sound will be
> > supported through various backends (GStreamer, NMM or MAS), but the video
> > input has not been publicly discussed so far. With all the controls that
> > a webcam can expose (brightness/contrast/hue/saturation, but also
> > exposure time, gain, power line frequency setting, pan/tilt/zoom, iris,
> > ...), the job is not trivial.
>
> You are right, that currently there hasn't been any work on a VideoCapture
> interface (only audio). While it should be possible to add this interface
> in a later version of KDE, I'd like to have it done for KDE 4.0, if
> possible.
That's good news. Users have put great hopes in KDE4, and it would probably
disappoint them if there were no video capture support.
> As I have never done any video capture work (on the developers
> side, as a user (on Windoze and Linux) capturing from a DV camera over
> Firewire I have a little experience) I'd be very grateful for helping hands
> on defining that interface.
I can only comment on V4L2 and IP-based devices; I'll rely on your experience
for Firewire video capture. We also need someone with DVB experience.
> Here are a few thoughts from the top of my head:
> 1. it can be a capture source that delivers both audio and video (like a DV
> stream)
I can think of three different kinds of capture devices:
1. video only: this is probably the simplest case. The device delivers video
in a given format, either frame-based (uncompressed, MJPEG, DV) or
stream-based (MPEG).
2. audio/video in a single stream: the device delivers a data stream which
contains interleaved audio and video data. This applies to MPEG and DV
devices.
3. audio/video in different streams: the device delivers two streams of data,
one for audio, the other for video. The audio clock can be locked
(synchronized) to the video clock or not. A common example is a USB device
with separate audio and video interfaces.
Cases 1 and 3 are easy to handle with either a single VideoCapture instance
(1) or separate VideoCapture and AudioCapture instances (3). Case 2 is a bit
more difficult, and we will need input from experienced users.
> 2. if it's a video only capture you often want to have the audio signal
> from the soundcard to be in sync with the video signal. So either there
> needs to be an interface for defining the sync between a SoundcardCapture
> and a VideoCapture or there should be only one interface "Capture" where
> the audio and video source can be independently set, and if both are set
> they're implicitly kept in sync.
Audio/video sync will probably be one of the major problems. Even if the
device can capture both audio and video, the two streams can be unlocked, and
we must provide a way to resynchronize them.
> 3. There's a class CaptureSource in SVN that describes the available
> sources to the user. (It's a really simple class, like a struct containing
> an id, name and description, but with users of the API only being able to
> read, of course). This class could just as well be used to describe the
> available video capture sources to the user, or is there any information
> missing?
Where in SVN? Is there an SVN repository for Phonon? In the main KDE
repository?
Maybe we could add some kind of capability information. What is the
CaptureSource used for exactly?
> 4. What properties does the VideoCapture interface need? You
> already mentioned - brightness
> - contrast
> - hue
> - saturation
> - exposure time
> - gain (I only know gain from audio signal processing, is that what you
> meant or is there actually a gain for video signals?)
There is a gain for video signals as well.
> - power line freq (Is that setting used for filtering? Is it available in
> the hardware interface? Is it available in the mediaframework interfaces?)
It's available in the hardware interface for some devices (newest Logitech
webcams for instance), and will be accessible through V4L2.
> - pan
> - tilt
> - zoom
> - iris
There can be numerous other controls. Devices can define vendor-specific
controls, so we probably need to deal with controls in an extensible way.
> As I, like I already said, have no experience with the developer side of
> video capture, I don't know whether these properties are all settings the
> hardware exposes or that need to be done in software. If the latter is
> needed these things should be done in the VideoPath which is meant for
> processing of the video signal.
I think we need to deal with properties exported by the hardware first, and
then implement video processing in the VideoPath (whatever that is; can you
tell me where I can find more information?).
Here's how things work for V4L2 devices, please compare that to DV capture
over firewire and tell me if there are differences which are worth being
noted.
A V4L2 device can capture or output video, text or radio (don't ask me why
Video4Linux supports radio capture). Let's focus on video capture for now.
The device exposes controls, which are booleans or 32-bit unsigned integers
(though I would like to change that to other data types for more advanced
controls). Those controls map directly to hardware features, or at least
features that are partially implemented in hardware. Controls can be related
to image processing (brightness, contrast, power line frequency, ...), image
capture (exposure time, scan mode, ...), physical motion of the
sensor-related components (focus, iris, pan/tilt/zoom, ...) or any other
hardware feature that the device supports. Controls can be enumerated,
queried (to find the lower and upper bounds), read and written. A control
has an ID and a driver-supplied name.
The device also exposes a video capture interface. The user has to set up the
interface before starting video streaming (setting the frame format, frame
size, frame rate and compression quality). When the interface has been set up
properly, video can be captured using one of three different methods (read/write,
streaming I/O, async I/O - not all methods are supported by all drivers).
The video capture API should IMHO expose the following functions:
A. Queries
A.1. Enumeration. That would include the device name, serial number and
location if available (to be able to tell two identical USB webcams apart
based on the USB port number for instance).
A.2. Capabilities query. What can the device do? Internal information such as
the supported capture methods should probably not be reported to the user.
What else would be needed?
A.3. Image format query. What are the image formats (YUV, MJPEG, DV,
MPEG, ...) supported by the device? Even if we decompress MJPEG to supply
YUV or RGB data to the user, knowing which formats are supported by the
hardware might be useful for efficiency.
A.4. Image size query. What image sizes can we request? For devices which
support multiple image formats, this information probably depends on the
format. Some devices support a number of predefined sizes, some others
include a hardware scale/crop engine. Scaling/cropping can be done in
software, but once again it would be useful to know what the hardware
supports.
A.5. Frame rate query. What are the supported frame rates? This can depend on
the image format and size.
A.6. Controls enumeration. This would include, for each control, the control
name and type (boolean, integer, ...).
A.7. Control query. What are the minimum, maximum and default values for a
given control?
B. Controls
B.1. Read. Return the control's current value.
B.2. Write. Set the control value.
We can implement a number of wrappers for common controls (brightness, ...),
but we need to access the driver for device-specific controls. How can we do
that in a portable way?
C. Image capture
C.1 Image format and size setting. We need a way to set the image format and
size. If the hardware doesn't support the requested size, V4L2 allows the
driver to return the closest supported size. It would be quite useful to
expose that to the user. V4L2 also has a "try size" function, which acts
exactly like a "set size" except that no change is applied to the hardware.
C.2 Frame rate and quality. As for image size, it would be nice to have the
video capture class return the closest supported values if the requested
values are not valid. A "try frame rate/quality" would be interesting as
well.
C.3 Enable/disable streaming. This is quite easy: start or stop video
streaming. The acquisition method could be fixed or chosen dynamically.
Letting the kernel allocate memory buffers and mapping them to userspace is
the most efficient method in V4L2. The number of buffers could have a default
value which the user could override. Do we want to let the user use another
capture method? I'm not sure how much we need to expose to the user. Do we
need a blocking read, a signal when a frame is ready, or something else?
Emitting a signal when a frame is ready is quite easy if the device can be
select()ed, which is the case for V4L2.
C.4 Single shot capture and various helpers. Those functions would use the
streaming API internally and perform simple tasks such as capturing a single
frame.
I've probably forgotten a few important things. One of them is trigger
support. How do we want to support the capture button that several webcams
feature? Another is software image format conversion. We definitely need
to implement a few decompressors for common video formats used in webcams
(MJPEG, PWX, ...). There's also the issue of image scaling/cropping in
software. I just realised that I haven't tackled audio/video synchronisation
in this mail, but I don't have enough experience with that and need input
from someone else.
The V4L2 API lacks some important features, such as complex controls, a frame
rate and quality "try and set" method, and trigger support. I sent a mail to
the video4linux mailing list a week or so ago, and the developers are not too
reluctant to make the API evolve as long as backward compatibility is kept. I
haven't pushed the matter further, as I think the two issues (V4L2 evolution
and the KDE video capture API) are related to each other, and I wanted to
discuss the matter here. People come from time to time with a request for a
v4l user space library. I'm not sure whether that is needed or whether all
the user space part should be implemented in KDE. I'm also not sure how we
will interact with backends which support video capture (GStreamer).
I know this sounds quite complex, but if the API is done right, I think I
could implement a V4L2 backend quite easily.
> I'd be thankful if you want to try to write such an interface. We can
> discuss this here on this list.
I can at least try. As for discussing this on the list, it's your turn to
answer :-)
Laurent Pinchart