Windows API – Fooling Around

Little known DirectShow VMR-7 snapshot problem


There is so little information about this problem (not really a bug, rather a miscalculation) out there because it only comes up with a customized Video Mixing Renderer Filter 7; there is no problem with straightforward use.

In windowless mode the renderer accepts media samples and displays them as configured. The IVMRWindowlessControl::GetCurrentImage method is available to grab the currently presented image and obtain a copy of what is displayed at the moment – the snapshot. The renderer does a favor and converts it to RGB, and the interface method is widely misused as a way to access an uncompressed video frame, esp. in a format compatible with other APIs or for saving to a bitmap (a related earlier post: How To: Save image to BMP file from IBasicVideo or VMR windowless interface).

One of the problems with the method is that it reads back from video memory, which is – in some configurations – an extremely expensive operation and is simply unacceptable because of its overall impact.

This time, however, I am posting about another issue. By default VMR-7 offers a memory allocator of one media sample. It accepts a new frame and then blits it into the video device. Simple. With higher resolutions, higher frame rates, and VMR-7 at the same time being a legacy API working through compatibility layers, we get into a situation where this presentation method becomes a bottleneck. We cannot pre-load the next video frame before getting back from the presentation call. For 60 frames/second video this means that any congestion 17 milliseconds long might make us miss the chance to present the next video frame of the stream. Visual artifacts and such things are perceptible.

An efficient solution to address this problem is to increase the number of buffers in the video renderer’s memory allocator, and then fill the buffers asynchronously. This does work well: we fill the buffers well in advance, and the costly operation does not have to complete within the frame presentation window. The pushing media pipeline pre-loads video buffers efficiently, and the video renderer simply grabs a prepared frame out of the queue and presents it. Terrific!
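A minimal sketch of what the deeper allocator request could look like from the upstream side, assuming a CBaseOutputPin-derived pushing pin (the class and member names are hypothetical):

HRESULT CVideoOutputPin::DecideBufferSize(IMemAllocator* pMemAllocator, ALLOCATOR_PROPERTIES* pProperties)
{
    // Request a queue of buffers instead of the default single media sample
    pProperties->cBuffers = 8;
    pProperties->cbBuffer = m_nFrameDataSize; // hypothetical: size of one uncompressed frame
    if(pProperties->cbAlign == 0)
        pProperties->cbAlign = 1;
    ALLOCATOR_PROPERTIES ActualProperties;
    const HRESULT nResult = pMemAllocator->SetProperties(pProperties, &ActualProperties);
    if(FAILED(nResult))
        return nResult;
    // The allocator is free to adjust the request, so verify the outcome
    return (ActualProperties.cBuffers >= 2) ? S_OK : E_FAIL;
}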

The video renderer’s input is thus a queue of media samples. It keeps popping and presenting them, matching their time stamps against the presentation clock and waiting the respective time. Now let us have a look at the snapshot method signature:

HRESULT GetCurrentImage(
  [out] BYTE **lpDib
);

We have an image, which is good, and now the problem is that it is not clear which sample from the queue this image corresponds to. VMR-7 does not report the associated time stamp even though it has this information. The problem is that it could have accepted a frame already and returned control, but presentation is only scheduled, and the caller cannot derive the time stamp even from the fact that the renderer filter completed the delivery call.

Video Mixing Renderer 9 is presumably subject to the same problem.

In contrast, the EVR’s IMFVideoDisplayControl::GetCurrentImage method already is:

HRESULT GetCurrentImage(
  [in, out] BITMAPINFOHEADER *pBih,
  [out]     BYTE             **pDib,
  [out]     DWORD            *pcbDib,
  [in, out] LONGLONG         *pTimeStamp
);

That is, at some point someone asked the right question: “So we have the image, but where is the time stamp?”.

Presumably, a VMR-7 custom allocator/presenter can work around this problem, as the presenter processes the time stamp information and can report what standard VMR-7 does not.
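A minimal sketch of the workaround, assuming a custom allocator/presenter class implementing IVMRImagePresenter (member names are hypothetical): the presenter receives VMRPRESENTATIONINFO carrying the time stamps and can record them, which is exactly the information standard VMR-7 withholds.

STDMETHODIMP CPresenter::PresentImage(DWORD_PTR dwUserId, VMRPRESENTATIONINFO* pPresentationInfo)
{
    {
        CComCritSecLock<CComAutoCriticalSection> Lock(m_CriticalSection);
        // Remember the time stamp of the frame being presented right now,
        // so that a later snapshot can be associated with it
        m_nPresentationTime = pPresentationInfo->rtStart;
    }
    // ... blit pPresentationInfo->lpSurf to the device as usual
    return S_OK;
}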


Anniversary Webcam Saga: It’s clear who’s guilty, now what to do? (Updated)


As more and more people discover the Windows 10 Anniversary Update breaking changes (and expectedly get mad), let’s reiterate the possible solutions:

  1. You don’t like the idea that a video sharing service adds latency, and adds man-in-the-middle access/spying over a video feed.
    See #6 below.
  2. You are consuming raw video from the camera using one of the uncompressed modes within USB 2.0 bandwidth.
    You are likely not affected by the changes.
  3. You are consuming raw video from the camera, but the resolution/rate combination makes raw capture impossible, so you captured M-JPEG instead and decoded it, via DirectShow API.
    It is no longer possible, but you can use Media Foundation API instead. Or someone will develop a wrapper that re-exposes Media Foundation captured video via DirectShow.
  4. Same as #3 above, but via Media Foundation API.
    You have the option to consume already decoded video; the new subsystem will automatically capture M-JPEG and decode it into NV12.
  5. You take advantage of the compressed format of captured video (DirectShow or Media Foundation) so that you don’t need to compress it for storage or network transmission purposes.
    Compressed captured video is no longer available, see #6 below.
  6. You take advantage of H.264 video capture offered by a UVC 1.5 device, including fine tuning of hardware H.264 compression.
    Just as in #1 and #5 above, you are in trouble. Windows Camera Frame Server no longer offers access to such a video feed. You need a hack (yes, it’s confirmed to be possible) that restores the original behavior of the video capture hardware.

These and other reasons, related to the fact that applications no longer talk to the real capture device but rather to a Frame Server Client that proxies the web camera, will possibly require that video capture applications be updated in order to work well in the new version of the operating system.

It is unclear if and how Microsoft and the Media Foundation team will respond to the voices of customer pain. At first it looked like a bug, and one could expect a response and a fix. But with the information from the Windows Camera Team it looks completely different. No, they did not break it accidentally – it was a planned change. Then they tied the new behavior to new Microsoft products – new products rely on the new behavior. Then they did a few nasty things, not just one: added a proxy service, killed UVC-related compression control over the device, reduced the range of operation modes for DirectShow (which they look for ways to deprecate), and conceptually removed compressed video capture modes. I think there is no way back – Windows Camera Frame Server is the new reality. The best to expect is that some of the mentioned problems are relaxed by the platform offering greater flexibility. Maybe they will add some sort of exclusive mode for video capture, or “professional” hardware which offers more through the API. In any event these changes are unlikely to appear soon, as they will go through the full cycle of development and take months to get delivered. Public pressure might force them to appear earlier rather than later, but I don’t think that is what is going to happen.

16 Aug update: Windows Camera Team reported that they see the customer pain and will do something to ease it shortly. As I see it, they will address scenarios #3, #4, #5 above, for MJPG video, allowing compressed formats to pass the Frame Server so that users could consume them from their applications, and so that the Frame Server would be able to release frames not just after the shared decoder, but also before it. Also, as use of H.264 is limited, it might not be included in the hotfix at all, or will be included much later after more serious consideration (which might end up as dropped support in DirectShow and something new introduced for Media Foundation).

19 Aug update: Someone took time to locate a registry value. User WithinRafael on MSDN Forums:

Try opening up

HKLM\SOFTWARE\Microsoft\Windows Media Foundation\Platform (32- and 64-bit OS)
HKLM\SOFTWARE\WOW6432Node\Microsoft\Windows Media Foundation\Platform (64-bit OS)

and add a DWORD value with name EnableFrameServerMode. Set its value to 0 and try again.

Put a sticky note on your monitor to revisit this if/when Microsoft issues a fix.
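For convenience, the same instructions in registry script form (both keys apply on 64-bit Windows; only the first on a 32-bit OS):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Media Foundation\Platform]
"EnableFrameServerMode"=dword:00000000

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Windows Media Foundation\Platform]
"EnableFrameServerMode"=dword:00000000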


Number of streams served by IMFSourceReader interface


It looks confusing that the IMFSourceReader interface does not offer a dedicated method to find out the number of streams behind it. There is an IMFMediaSource instance behind the reader, and its streams are available through the IMFMediaSource::CreatePresentationDescriptor and IMFPresentationDescriptor::GetStreamDescriptorCount method calls.

I am under the impression that such a source reader method just has to be there, even though I am not seeing it in the list of methods. Okay, there are other methods, namely IMFSourceReader::GetStreamSelection, which takes either an ordinal stream index or an alias as the first argument, and returns MF_E_INVALIDSTREAMNUMBER once you run out of streams (see the sketch below). However, the problem is that this is associated with an internal exception, and I consider exceptions to be exceptional conditions that code should not normally hit. I would expect a legal, exception-free way to find out the number of streams. I use a debugger that breaks on exceptions, or at least pollutes the output log for no reason; I use other tools that intercept and log exceptions as something that needs attention – getting the number of streams is nowhere near that category.
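A minimal sketch of this counting approach (pSourceReader being an initialized IMFSourceReader); note that the final iteration is the one that triggers the internal exception:

DWORD nStreamCount = 0;
for(; ; nStreamCount++)
{
    BOOL bSelected;
    const HRESULT nResult = pSourceReader->GetStreamSelection(nStreamCount, &bSelected);
    if(nResult == MF_E_INVALIDSTREAMNUMBER)
        break; // ran out of streams
    ATLVERIFY(SUCCEEDED(nResult));
}
// nStreamCount now holds the number of streams behind the reader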

Internal MF_E_INVALIDSTREAMNUMBER Exception

Even though it is not a real drawback of the API, since it is still possible to get the data and the API acts as documented, I still think someone overlooked this, and an API like this should have a normal method or argument to request the number of streams explicitly. Or I am just not seeing it even though I am trying thoroughly.

DirectShowCaptureCapabilities and MediaFoundationCaptureCapabilities: API version of EnableFrameServerMode state


Both tools now include the exact version of the API and also report the export or registry key related to the frame server.

Capture Capabilities: API Version and State

mfcore.dll version 10.0.14393.105 corresponds to Cumulative Update for Windows 10 Version 1607: August 31, 2016, also known as KB3176938, with the DirectShow and Media Foundation improvements for Windows 10 Anniversary Update that restore the availability of compressed media types.

See:

Enumeration of DirectShow Capture Capabilities (Video and Audio)

Media Foundation Video/Audio Capture Capabilities

KB3176938’s Frame Server update visually


  1. M-JPEG and H.264 media types are available again (good)
  2. Even though it connects, H.264 video is not processed correctly; a new bug or an old one? Not clear. Even though it sort of works, in DirectShow it looks broken in another new way, perhaps collateral damage, and maybe never ever to be fixed…
  3. There is no camera sharing between applications, even though it was the justification for the changes in the first place. For now Frame Server is just useless overhead, which adds bad stuff, is polished a bit to do not so much harm, and maybe turns out to be good some time later.
    • for the record, the camera works in Skype when it is not consumed elsewhere concurrently

BTW the hack that bypasses the Frame Server survived the update and remains in good standing.

Applicability of Virtual DirectShow Sources


Virtual DirectShow sources have long been a synonym of software-only camera implementations exposed to applications along with physical cameras, in such a way that applications consume the sources without making a difference whether the camera is real or virtual. Vivek’s template was a starting point for many:

Capture Source Filter filter (version 0.1) 86 KB zipped, includes binaries.  A sample source filter that emulates a video capture device contributed by Vivek (rep movsd from the public newsgroups).  Thanks Vivek!  TMH has not tested this filter yet.  Ask questions about this on microsoft.public.win32.programmer.directx.video.

With API changes over the years, the sample and the concept are still understood as the method of adding a virtual camera, however new scenarios exist where the concept no longer works. Typical problems:

  1. 64-bit applications cannot consume 32-bit virtual sources
  2. Virtual sources are not visible and accessible to applications consuming video using Media Foundation API

The diagram below explains the applicability of virtual cameras:

Applicability of Virtual DirectShow Sources

The important point is that virtual sources can only be consumed by DirectShow-based applications of the same bitness.

If the source developer needs to synchronize a virtual source across multiple applications (e.g. video is synthesized by another application and needs to be deliverable to multiple clients), he needs to add interprocess synchronization behind the scenes of the virtual source.

If the developer needs to support both 32- and 64-bit apps, he needs both variants of the virtual source registered (as sketched below), and possibly synchronization of the kind described in the paragraph above.
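A minimal sketch of what the dual registration amounts to, assuming hypothetical 32- and 64-bit builds of the source DLL:

rem 32-bit build, visible to 32-bit DirectShow applications
%SystemRoot%\SysWOW64\regsvr32.exe MyVirtualSource32.dll
rem 64-bit build, visible to 64-bit DirectShow applications
%SystemRoot%\System32\regsvr32.exe MyVirtualSource64.dll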

The only virtual device visible to all video capture applications is one implemented as a kernel-level driver (implementations are rare but exist).

Bug in Media Foundation MPEG-4 File Source related to timestamping video frames of a fragmented MP4 file


Some recent update in Media Foundation platform introduced a new bug related to fragmented MP4 files and H.264 video. The bug shows up consistently with file versions:

  • mfplat.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:48
  • mfmp4srcsnk.dll – 10.0.14393.351 (rs1_release_inmarket.161014-1755)    15-Oct-16 05:45

The nature of the problem is that MPEG-4 File Source is incorrectly time stamping the data: frame time stamps are incorrect, they seem to be getting wrong durations and increments, then quickly jump into the future… and on playback this leads to unobvious playback freezes. As Media Foundation is used by Windows Media Player and the Windows 10 Movies & TV Player, the bug is present there as well.

The original report is on MSDN Forums.

Presumably it is possible to roll the respective Windows Update package back, or alternatively one has to wait for Microsoft to fix the problem and deliver a new update deploying the fix.

Intel Quick Sync Video Consumption by Applications


I wrote a few posts on hardware H.264 encoding (e.g. this and the latest one, Applying Hardsubs to H.264 Video with GPU). A blog reader asked a question regarding the availability of the mentioned Intel Quick Sync Video support with a low-end Intel x5-Z8300 CPU.

[…] Intel has advertised that the Cherry Trail CPUs support H264 encoding and / or QSV, but nowhere have I seen a demo of this being used […].
What did you use to encode the video? Is the QSV codec available in the x5-z8300 for possible 720p realtime encoding? I’d like to see this checked in regards to using software like FFmpeg with qsv_h264 -codec and OBS. […]

The picture below explains how applications consume Intel’s hardware video compression offering in Windows.

Intel QSV includes a hardware implementation of the encoder and corresponding drivers which provide a frontend API to software. This includes a component which integrates the codec with Microsoft’s Media Foundation API. Applications choose between interfacing with the codec using the Windows API – this is the way stock Microsoft applications work, and this is the way I used for the video encoding development mentioned on the blog – and interfacing through Intel Media SDK, which is an alternate route ending up at the same hardware-backed services.

The Intel x5-Z8300 system in question has H.264 video encoding support integrated into the Windows API, and the services can be consumed without an additional Intel runtime and/or development kit. The codec, according to the benchmarks made earlier, is fast enough to handle real-time 720p video encoding even though the device is a budget thing. A quick availability check is sketched below.
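A minimal sketch of such a check (my assumption of a reasonable probe, not code from the earlier posts): ask Media Foundation for hardware H.264 encoder MFTs; on systems with Intel Quick Sync Video the enumeration comes back non-empty without any Intel SDK installed.

#include <atlbase.h>
#include <mfapi.h>
#include <mftransform.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    static const MFT_REGISTER_TYPE_INFO OutputTypeInformation = { MFMediaType_Video, MFVideoFormat_H264 };
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    // Hardware-only enumeration: a non-zero count indicates a hardware H.264
    // encoder (such as Intel QSV) is available through the stock Windows API
    ATLVERIFY(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER, NULL, &OutputTypeInformation, &ppActivates, &nActivateCount)));
    ATLASSERT(nActivateCount > 0);
    return 0;
}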


Media Foundation’s MFT_MESSAGE_SET_D3D_MANAGER with Frame Rate Converter DSP


It might look weird why someone would try Direct3D mode with a DSP which is not supposed to be Direct3D aware, but still. I am omitting the part about why I even got to such a scenario. The documentation says a few things about MFT_MESSAGE_SET_D3D_MANAGER:

  • This message applies only to video transforms. The client should not send this message unless the MFT returns TRUE for the MF_SA_D3D_AWARE attribute (MF_SA_D3D11_AWARE for Direct3D 11).
  • Do not send this message to an MFT with multiple outputs.
  • An MFT should support this message only if the MFT uses DirectX Video Acceleration for video processing or decoding.
  • If an MFT supports this message, it should also implement the IMFTransform::GetAttributes method and return the value TRUE…
  • If an MFT does not support this message, it should return E_NOTIMPL from ProcessMessage. This is an exception to the general rule that an MFT can return S_OK from any message it ignores.

Frame Rate Converter DSP is a hybrid DMO/MFT, which basically means that a “legacy” DMO is upgraded to an MFT using a specialized wrapper. It is not supposed to be Direct3D aware, and is not documented as such.

However, it could presumably normalize the frame rate of Direct3D aware samples by dropping/duplicating samples respectively. It could easily be Direct3D aware since, in its simplest implementation, it does not need to change the data. It is easy to see that the MFT satisfies the other conditions: it is a single-output video transform.

The MFT correctly and expectedly does not advertise itself as Direct3D aware. It does not have transform attributes.

However, it fails to comply with the documented behavior of returning E_NOTIMPL for the MFT_MESSAGE_SET_D3D_MANAGER message. The message is defined to be an exception to the general rule, but the DSP implementation seems to be ignoring that. The wrapper could possibly have been created even before the exception was introduced in the first place.

The DSP does not make an exception: it returns a success code as if it did handle the message, and does not act as documented, as the sketch below demonstrates.
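A minimal sketch reproducing the behavior (assuming the DSP’s CLSID_CFrameRateConvertDmo from wmcodecdsp.h):

#include <atlbase.h>
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    CComPtr<IMFTransform> pTransform;
    ATLVERIFY(SUCCEEDED(pTransform.CoCreateInstance(CLSID_CFrameRateConvertDmo)));
    const HRESULT nResult = pTransform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, 0);
    // Documented behavior calls for E_NOTIMPL here; a success code comes back instead
    ATLASSERT(nResult == E_NOTIMPL); // fails in practice
    return 0;
}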

No conversion with MF_CONNECT_ALLOW_CONVERTER


Microsoft Media Foundation Media Session API topology resolution is way less clear compared to DirectShow. The API takes away a part of the component connection process and makes it less transparent to the API consumer. Then, while DirectShow Intelligent Connect is use-scenario agnostic, Media Foundation Media Session apparently targets playback scenarios, and its topology resolution process is tuned respectively.

MF_CONNECT_ALLOW_DECODER

Add a decoder transform upstream upstream from this node, if needed to complete the connection. The numeric value of this flag includes the MF_CONNECT_ALLOW_CONVERTER flag. Therefore, setting the MF_CONNECT_ALLOW_DECODER flag sets the MF_CONNECT_ALLOW_CONVERTER flag as well.

[…] If this attribute is not set, the default value is MF_CONNECT_ALLOW_DECODER.

Well, that’s double upstream, which suggests that the thing is impressively reliable. However, it is not.

In a non-playback topology, if a direct connection is impossible and the required conversion is not typical for playback, the MF_CONNECT_ALLOW_CONVERTER flag seems to be unhelpful for topology resolution.
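For reference, a minimal sketch of how the flag in question is applied (pNode being an IMFTopologyNode participating in the connection):

ATLVERIFY(SUCCEEDED(pNode->SetUINT32(MF_TOPONODE_CONNECT_METHOD, MF_CONNECT_ALLOW_CONVERTER)));
// ... and yet, outside of playback-typical conversions, the topology may still fail to resolve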

Apparently, Microsoft does have suitable code, esp. as used in the Sink Writer API, however it does not seem to be available in any form other than a package deal with the Sink Writer object and its own limitations. Media Session API does not implement this (non-playback, that is) style of topology resolution and node connection. The Transcode API also has the necessary topology resolution code, but again it comes with its own constraints, making it useless unless you want to do something really simple.

 

Effects of IMFVideoProcessorControl2::EnableHardwareEffects


IMFVideoProcessorControl2::EnableHardwareEffects method:

Enables effects that were implemented with IDirectXVideoProcessor::VideoProcessorBlt.

[…] Specifies whether effects are to be enabled. TRUE specifies to enable effects. FALSE specifies to disable effects.

All right, it is apparently not IDirectXVideoProcessor, and the MSDN link behind the identifier takes one to the correct Direct3D 11 API method: ID3D11VideoContext::VideoProcessorBlt.

The worse news is that with the effects “enabled”, the transform (the whole thing belongs to Media Foundation’s Swiss knife of conversion [with just one blade and no scissors though] – the Video Processing MFT) fails to deliver proper output and produces a green/black fill instead of the proper image.

Or, perhaps, this counts as a hardware effect.
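For the record, a minimal sketch of how the effects get “enabled” (assuming the stock Video Processing MFT, CLSID_VideoProcessorMFT from mfidl.h, and ATL smart pointers):

CComPtr<IMFTransform> pTransform;
ATLVERIFY(SUCCEEDED(pTransform.CoCreateInstance(CLSID_VideoProcessorMFT)));
CComQIPtr<IMFVideoProcessorControl2> const pVideoProcessorControl = pTransform;
ATLASSERT(pVideoProcessorControl);
ATLVERIFY(SUCCEEDED(pVideoProcessorControl->EnableHardwareEffects(TRUE)));
// ... further processing produces the green/black fill mentioned above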

Microsoft Media Foundation code samples online


Media Foundation Team Blog (2009-2011) lost connection with the community some time ago, and its sample code hosted at http://code.msdn.microsoft.com/mfblog passed away too.

Four of the five samples were saved and made available again by user mofo77, and the remaining one, MFManagedEncode, is still wanted.

I put the sample code online at GitHub here. If someone happens to have the missing project, please post there or email me to have it pushed to the repository. Feel free to use the samples if, for whatever reason, Media Foundation is what you decided to mess with.

Also, be aware that older Windows SDK Media Foundation samples can be found in:

As this becomes a collection of Media Foundation related links, here we go with bonus reading:

Now you are well set, GOOD LUCK!

Microsoft AAC Encoder’s MF_E_TRANSFORM_NEED_MORE_INPUT after MFT_OUTPUT_STATUS_SAMPLE_READY


Media Foundation AAC Encoder is a pure MFT, as opposed to legacy DSPs, which are made dual DMO/MFT-interfaced and presumably have higher chances of small artifacts.

The transform is synchronous and is supposed to be simpler inside: fully passive, driven by input/output calls.

Nevertheless, when advertising MFT_OUTPUT_STATUS_SAMPLE_READY via an IMFTransform::GetOutputStatus call, it might falsely indicate availability of data: a subsequent ProcessOutput call returns MF_E_TRANSFORM_NEED_MORE_INPUT… Documented behavior:

If the method returns the MFT_OUTPUT_STATUS_SAMPLE_READY flag, it means you can generate one or more output samples by calling IMFTransform::ProcessOutput.

MFTs are not required to implement this method. If the method returns E_NOTIMPL, you must call ProcessOutput to determine whether the transform has output data.

The method is optional, but it is implemented in this particular MFT. Also, this MFT is one of the stock transforms that are documented for public use. Microsoft could apparently have done a better job implementing it cleanly; the sketch below shows the inconsistent sequence.
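A minimal sketch of the sequence in question (assumptions: pTransform is the AAC encoder MFT with media types already negotiated and some input delivered; pOutputSample is a caller-allocated sample with a buffer attached):

DWORD nOutputStatusFlags;
ATLVERIFY(SUCCEEDED(pTransform->GetOutputStatus(&nOutputStatusFlags)));
if(nOutputStatusFlags & MFT_OUTPUT_STATUS_SAMPLE_READY)
{
    MFT_OUTPUT_DATA_BUFFER OutputDataBuffer = { 0, pOutputSample };
    DWORD nStatus = 0;
    const HRESULT nResult = pTransform->ProcessOutput(0, 1, &OutputDataBuffer, &nStatus);
    // Contrary to the advertised flag, the call might come back
    // with MF_E_TRANSFORM_NEED_MORE_INPUT here
}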

See also:

Greetings from H.265 / HEVC Video Decoder Media Foundation Transform


H.265 / HEVC Video Decoder Media Foundation Transform has been around for a while, but using Media Foundation overall, one step off the straightforward basic paths is like walking a minefield.

The twenty-liner below hits a memory access violation inside IMFTransform::GetInputStreamInfo: “Exception thrown at 0x6D1E71E7 (hevcdecoder.dll) in MfHevcDecoder01.exe: 0xC0000005: Access violation reading location 0x00000000.”

#include "stdafx.h"
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>
#include <atlbase.h>
#include <atlcom.h>

#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    CComPtr<IMFTransform> pTransform;
#if 1
    static const MFT_REGISTER_TYPE_INFO InputTypeInformation = { MFMediaType_Video, MFVideoFormat_HEVC };
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    ATLVERIFY(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, 0, &InputTypeInformation, NULL, &ppActivates, &nActivateCount)));
    ATLASSERT(nActivateCount > 0);
    ATLVERIFY(SUCCEEDED(ppActivates[0]->ActivateObject(__uuidof(IMFTransform), (VOID**) &pTransform)));
#else
    ATLVERIFY(SUCCEEDED(pTransform.CoCreateInstance(CLSID_CMSH265EncoderMFT)));
#endif
    MFT_INPUT_STREAM_INFO InputInformation;
    ATLVERIFY(SUCCEEDED(pTransform->GetInputStreamInfo(0, &InputInformation)));
    return 0;
}

Interestingly, the alternative code path that avoids IMFActivate (see #if above) seems to be working fine.

 

H.265/HEVC Video Decoder issues in Windows 10 Fall Creators Update


Microsoft released a new Windows 10 update and again there is a new tsunami approaching.

Media Foundation H.265 decoder: there is no longer a “preliminary documentation” notice in the MSDN article. The decoder is a substandard quality software item, but it covers more or less what it has to. In particular, it indeed backs H.265 video playback, including with the use of DXVA2 where applicable. One could have been under the impression that the technology is maturing and, who knows, maybe the decoder would even be updated to at least the quality and feature set of the H.264 decoder.

To make life more interesting and challenging, Fall Creators Update makes a breaking change. The changes are not yet documented, and they are important: the H.265 decoder is taken away. Luckily, not completely: the decoder (along with the encoder) is moved to a separate Windows Store download: HEVC Video Extension.

Even though it looks like the same decoder, it is packaged differently, and consumers might need to update code to continue using the decoder.

Even though the above makes at least some sense, there is also an obvious bug on top of all this: the 32-bit version of the decoder is BROKEN.

The released variant of the decoder is sealed by a digital signature and enforces integrity checks. This works with the 64-bit version of the software, and in particular the stock Movies and TV Player application can indeed play H.265 video in 64-bit Windows, because the 64-bit version of the application is used, which in turn pulls in the 64-bit version of the decoder. However, the 32-bit version of the DLL is broken and does not work, hence 32-bit applications relying on H.265 video decoding capabilities are going to stop working with Fall Creators Update.

The problem is apparently the integrity check, because if you manage to remove that, the 32-bit version of the H.265/HEVC decoder is operational.

It will take some time for Microsoft to identify and fix the problem. It will be fixed though, so if it is important to keep 32-bit apps functional in the part of H.265 video decoding/playback, one should rather postpone Fall Creators Update.

Further reading:


NVIDIA H.264 Encoder Media Foundation Transform’s REGDB_E_CLASSNOTREG


For years already, Nvidia’s H.264 video encoder Media Foundation Transform has been broken, giving a REGDB_E_CLASSNOTREG “Class not registered” failure in certain circumstances, for example when the main display is not the one connected to the Nvidia video adapter.

As simple as this:

CoInitialize(NULL);
MFStartup(MF_VERSION);
class __declspec(uuid("{60F44560-5A20-4857-BFEF-D29773CB8040}")) CFoo; // NVIDIA H.264 Encoder MFT
CComPtr<IMFTransform> pTransform;
const HRESULT nCoCreateInstanceResult = pTransform.CoCreateInstance(__uuidof(CFoo));
// NOTE: nCoCreateInstanceResult is 0x80040154 REGDB_E_CLASSNOTREG

The COM server itself is, of course, present and registered properly via their nvEncMFTH264.dll, it’s just broken inside. Called straightforwardly via IClassFactory::CreateInstance, it gives 0x8000FFFF E_UNEXPECTED “Catastrophic Failure”, and so it also fails in Media Foundation API scenarios.

This is absolutely not fun, Nvidia!

Microsoft Media Foundation webcam video capture in one screen of code


Being complicated for many things, Media Foundation is still quite simple for the basics. To capture video with Media Foundation, the API offers the Source Reader API, which uses Media Foundation primitives to build a pipeline that manages the origin of the data (not necessarily a live source as in this example; it can also be a file or a remote resource) and offers on-request reading of the data by the application, without consumption by Media Foundation managed primitives (in this respect Source Reader is the opposite of the Media Session API).

The simplest use of Source Reader to read frames from a web camera is simple enough to fit into a few tens of lines of C++ code. The sample VideoCaptureSynchronous project captures video frames in the form of IMFSample samples in less than 100 lines of code.
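Not the actual VideoCaptureSynchronous source, but a minimal sketch in the same spirit (error handling reduced to ATL macros):

#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
    ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
    ATLVERIFY(SUCCEEDED(MFStartup(MF_VERSION)));
    // Enumerate video capture devices and take the first one
    CComPtr<IMFAttributes> pAttributes;
    ATLVERIFY(SUCCEEDED(MFCreateAttributes(&pAttributes, 1)));
    ATLVERIFY(SUCCEEDED(pAttributes->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID)));
    IMFActivate** ppActivates;
    UINT32 nActivateCount = 0;
    ATLVERIFY(SUCCEEDED(MFEnumDeviceSources(pAttributes, &ppActivates, &nActivateCount)));
    ATLASSERT(nActivateCount > 0);
    CComPtr<IMFMediaSource> pMediaSource;
    ATLVERIFY(SUCCEEDED(ppActivates[0]->ActivateObject(__uuidof(IMFMediaSource), (VOID**) &pMediaSource)));
    // Wrap the source into a source reader and read frames with blocking calls
    CComPtr<IMFSourceReader> pSourceReader;
    ATLVERIFY(SUCCEEDED(MFCreateSourceReaderFromMediaSource(pMediaSource, NULL, &pSourceReader)));
    for(UINT32 nIndex = 0; nIndex < 16; nIndex++)
    {
        DWORD nStreamIndex, nStreamFlags;
        LONGLONG nTime;
        CComPtr<IMFSample> pSample;
        ATLVERIFY(SUCCEEDED(pSourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &nStreamIndex, &nStreamFlags, &nTime, &pSample)));
        // pSample holds a captured video frame here (it can be NULL when
        // nStreamFlags signals a stream event rather than data)
    }
    return 0;
}

The actual project produces console output along these lines: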

Friendly Name: Logitech Webcam C930e
nStreamIndex 0, nStreamFlags 0x100, nTime 1215.074, pSample 0x0000000000000000
nStreamIndex 0, nStreamFlags 0x0, nTime 0.068, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.196, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.324, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.436, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.564, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.676, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.804, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 0.916, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.044, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.156, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.284, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.396, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.524, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.636, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.764, pSample 0x000002CAF3805D10
nStreamIndex 0, nStreamFlags 0x0, nTime 1.956, pSample 0x000002CAF3805D10
...

VideoCaptureSynchronous project does not show what to do with the samples or how to request a specific format of the samples, it just shows the ease of video capture per se. The capture takes place synchronously, with blocking calls requesting and obtaining the samples.

Media Foundation API is asynchronous by design, and Source Reader API in synchronous mode hides the complexity. The blocking call issues a request for a frame internally, waits until the frame arrives and makes it available.

Source Reader API does offer an asynchronous model too, making it available again in as simple a way as possible.

VideoCaptureAsynchronous project is doing the same video capture, but asynchronously: the controlling thread just starts the capture, and frames are delivered via a callback once they are available, on a worker thread.

So when does one use the synchronous model and when the asynchronous one?

Even though the synchronous model results in cleaner and more reliable code with fewer chances for a mistake, and the gains of the asynchronous model can in most cases be neglected (especially for those interested in beginner material like this post), video capture is a real-time process where one doesn’t want to block the controlling thread and instead wants to receive the frames out of nowhere as soon as they are ready. Hence the asynchronous version. Asynchronous can still be simple: VideoCaptureAsynchronous is already more than 100 lines of code, but 120 lines might also be okay.

Download links

  • Source code:
    • VideoCaptureSynchronous: SVN, Trac
    • VideoCaptureAsynchronous: SVN, Trac
  • License: This software is free to use

About Microsoft FLAC Audio Encoder MFT


This is basically a cross-post of a StackOverflow answer:

How do I encode raw 48khz/32bits PCM to FLAC using Microsoft Media Foundation?

So, Microsoft introduced a FLAC Media Foundation Transform (MFT) Encoder CLSID_CMSFLACEncMFT in Windows 10, but the codec remains undocumented at the moment.

Supported Media Formats in Media Foundation is similarly out of date and does not reflect the presence of recent additions.

I am not aware of any comment on this, and my opinion is that the codec was added for internal use, but the implementation is merely a standard Media Foundation component without licensing restrictions, so the codec is unrestricted too, without, for example, field-of-use limitations.

This stock codec seems to be limited to 8-, 16- and 24-bit PCM input options (that is, not 32 bits/sample – you need to resample respectively). The codec is capable of accepting up to 8 channels and a flexible samples-per-second rate (48828 Hz is okay).

Even though the codec (transform) seems to be working, if you want to produce a file, you also need a suitable container format (multiplexer) which is compatible with MFAudioFormat_FLAC (the identifier has 7 results on Google Search at the moment of the post, which basically means no one is even aware of the codec). Outdated documentation does not reflect the actual support for FLAC in stock media sinks.

I borrowed a custom media sink that writes a raw MFT output payload into a file, and such FLAC output is playable, as the FLAC frames contain the information necessary to parse the bitstream for playback.


For the reference, the file itself is: 20180224-175524.flac.

An obvious candidate among the stock media sinks, the WAVE Media Sink, is unable to accept FLAC input. Even though it potentially could, the implementation is presumably limited to simpler audio formats.

AVI media sink might possibly take FLAC audio, but it seems to be impossible to create an audio only AVI.

Among the other media sinks there is, however, one which can process FLAC: MPEG-4 File Sink. Again, despite the outdated documentation, the media sink takes FLAC input, so you should be able to create .MP4 files with a FLAC audio track.

Sample file: 20180224-184012.mp4. “FLAC (framed)”

 

To sum it up:

  • FLAC encoder MFT is present in Windows 10 and is available for use; lacks proper documentation though
  • One needs to take care of conversion of input to compatible format (no direct support for 32-bit PCM)
  • It is possible to manage MFT directly and consume MFT output, then obtain FLAC bitstream
  • Alternatively, it is possible to use stock MP4 media sink to produce output with FLAC audio track
  • Alternatively, it is possible to develop a custom media sink and consume FLAC bitstream from upstream encoder connection

Potentially, the codec is compatible with Transcode API, however the restrictions above apply. The container type needs to be MFTranscodeContainerType_MPEG4 in particular.

The codec is apparently compatible with the Media Session API, and presumably it is good for use with the Sink Writer API as well.

In your code, as you attempt to use the Sink Writer API, you should similarly have MP4 output, with input possibly converted to a compatible format in your code (compatible PCM, or compatible FLAC with the encoder MFT managed on your side). Knowing that the MP4 media sink overall is capable of creating a FLAC audio track, you should be able to debug the fine details in your code and fit the components to work together.
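A minimal sketch of the Sink Writer path under these assumptions (16-bit PCM input, the MFAudioFormat_FLAC constant from a recent Windows 10 SDK, the stock MP4 media sink inferred from the .mp4 extension; error handling reduced to ATL macros):

CComPtr<IMFSinkWriter> pSinkWriter;
ATLVERIFY(SUCCEEDED(MFCreateSinkWriterFromURL(L"output.mp4", NULL, NULL, &pSinkWriter)));
CComPtr<IMFMediaType> pOutputMediaType;
ATLVERIFY(SUCCEEDED(MFCreateMediaType(&pOutputMediaType)));
ATLVERIFY(SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio)));
ATLVERIFY(SUCCEEDED(pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_FLAC)));
ATLVERIFY(SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 48000)));
ATLVERIFY(SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2)));
ATLVERIFY(SUCCEEDED(pOutputMediaType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16)));
DWORD nStreamIndex;
ATLVERIFY(SUCCEEDED(pSinkWriter->AddStream(pOutputMediaType, &nStreamIndex)));
// ... SetInputMediaType with a matching PCM media type, then
// BeginWriting / WriteSample / Finalize as usual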

Bonus reading:

DirectShowSpy: Who sent EC_ERRORABORT once again


A few years ago the question was already asked: DirectShow Spy: Who Sent EC_ERRORABORT?. The spy already attempted to log a call stack in order to identify the sender of the message. However, over time the tracing capability stopped functioning. There were a few reasons, and limitations of the internal stack unwinder specifically resulted in an inability to capture the information of interest.

It is still important once in a while to trace the event back to its sender, so now it is time to improve the logging.

Updated DirectShow Spy received an option to produce a minidump at the moment of the Filter Graph Manager’s IMediaEventSink::Notify call. The advantage of a minidump is that it can be examined retroactively, and depending on the minidump type it is possible to capture a sufficient amount of detail to trace the event back to a certain module and/or filter.

The image below displays an example of a call stack captured via the minidump. For the purpose of demonstration I used EC_COMPLETE instead, without waiting for an actual EC_ERRORABORT.

The call stack shows that the event is sent by quartz.dll’s Video Mixing Renderer filter’s input pin as a part of end-of-stream notification handling, with traces of the video decoder and media file splitter higher on the call stack.

The minidump generation is available in both 32 and 64 bit versions of the spy.

To enable the minidump generation, one needs to add/set the following registry value (that is, set ErrorAbort MiniDump Mode = 2):

  • Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Alax.Info\Utility\DirectShowSpy
  • Key Name for 32-bit application in 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy
  • Value Name: ErrorAbort MiniDump Mode
  • Value Type: REG_DWORD
  • Value: 0 Default (equal to 1), 1 Disabled, 2 Enabled

Then, to control the type of generated minidump file the following registry value can be used:

  • Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Alax.Info\Utility\DirectShowSpy\Debug
  • Key Name for 32-bit application in 64-bit Windows: HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy\Debug
  • Value Name: MiniDump Type
  • Value Type: REG_DWORD
  • Value: respectively to MINIDUMP_TYPE enumeration; default value of zero produces MiniDumpNormal minidump

Some useful minidump type flag combinations can be looked up here: What combination of MINIDUMP_TYPE enumeration values will give me the most ‘complete’ mini dump?. Specifically, the value 0x1826 expands to MiniDumpWithFullMemory | MiniDumpWithFullMemoryInfo | MiniDumpWithHandleData | MiniDumpWithThreadInfo | MiniDumpWithUnloadedModules, which gives a “complete” output.

That is, to put it short, to enable .DMP creation on EC_ERRORABORT, the following registry script needs to be merged in addition to spy registration:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy]
"ErrorAbort MiniDump Mode"=dword:00000002

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Alax.Info\Utility\DirectShowSpy\Debug]
"MiniDump Type"=dword:00001826

DirectShowSpy.log file will also mention the minidump file name once it is generated.

Download links

Media Foundation API primitive styling of WinRT windows.mediaCodec


An interesting Media Foundation question on StackOverflow: Weird connection between CoInitializeSecurity() and Media Foundation Encoders

A developer has reasons to discard standard COM security and go RPC_C_IMP_LEVEL_ANONYMOUS. Surprisingly, this has an effect on the Windows Media Foundation API in the part of available codecs (Media Foundation Transforms).

Sample code:

#include <iostream>

#include <winrt\base.h>
#include <mfapi.h>
#include <mftransform.h>

#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

int main()
{
	// https://stackoverflow.com/questions/59368680/weird-connection-between-coinitializesecurity-and-media-foundation-encoders
	WINRT_VERIFY(SUCCEEDED(MFStartup(MF_VERSION, MFSTARTUP_FULL)));
	WINRT_VERIFY(SUCCEEDED(CoInitializeSecurity(nullptr, -1, nullptr, nullptr, RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_ANONYMOUS, nullptr, EOAC_NONE, nullptr)));
	//WINRT_VERIFY(SUCCEEDED(CoInitializeEx(nullptr, COINITBASE_MULTITHREADED)));
	IMFActivate** Activates = nullptr;
	UINT32 ActivateCount = 0;
	WINRT_VERIFY(SUCCEEDED(MFTEnumEx(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_ALL, nullptr, nullptr, &Activates, &ActivateCount)));
	WINRT_ASSERT(Activates || !ActivateCount);
	BOOL Found = FALSE;
	for(UINT32 Index = 0; Index < ActivateCount; Index++)
	{
		IMFActivate*& Activate = Activates[Index];
		WINRT_ASSERT(Activate);
		WCHAR* FriendlyName = nullptr;
		UINT32 FriendlyNameLength;
		WINRT_VERIFY(SUCCEEDED(Activate->GetAllocatedString(MFT_FRIENDLY_NAME_Attribute, &FriendlyName, &FriendlyNameLength)));
		std::wcout << FriendlyName << std::endl;
		if(wcscmp(FriendlyName, L"HEVCVideoExtensionEncoder") == 0)
			Found = TRUE;
		CoTaskMemFree(FriendlyName);
	}
	WINRT_ASSERT(Found);
	CoTaskMemFree(Activates);
	//CoUninitialize();
	WINRT_VERIFY(SUCCEEDED(MFShutdown()));
}

CoInitializeSecurity line in the code above removes the following from enumeration:

  • HEIFImageExtension
  • VP9VideoExtensionEncoder
  • HEVCVideoExtensionEncoder

See the pattern? Why?

Similarly to other, older APIs like DirectShow, the enumeration function combines uniformly interfaced primitives coming from multiple sources. In this case of Media Foundation, the enumeration combines (and filters/orders when requested) traditional transforms registered with the API through information in the system registry, limited-scope transforms, and also transforms coming from Windows Store.

For example, the H.265/HEVC video decoder and encoder are added to the system as the HEVC Video Extension Windows Store item.

<Identity ProcessorArchitecture="x64" Version="1.0.23254.0" Publisher="CN=Microsoft Corporation, O=Microsoft Corporation, L=Redmond, S=Washington, C=US" Name="Microsoft.HEVCVideoExtension"/>

Its video encoder, in turn, is:

        <uap4:Extension Category="windows.mediaCodec">
          <uap4:MediaCodec DisplayName="HEVCVideoExtensionEncoder" Description="This is a video codec" Category="videoEncoder" wincap3:ActivatableClassId="H265Encoder.CH265EncoderTransform" wincap3:Path="x86\mfh265enc.dll" wincap3:ProcessorArchitecture="x86">
            <uap4:MediaEncodingProperties>
              <uap4:InputTypes>
                <!-- NV12 -->
                <uap4:InputType SubType="3231564e-0000-0010-8000-00aa00389b71" />
                <!-- IYUV -->
                <uap4:InputType SubType="56555949-0000-0010-8000-00AA00389B71" />
                <!-- YV12 -->
                <uap4:InputType SubType="32315659-0000-0010-8000-00AA00389B71" />
                <!-- YUY2 -->
                <uap4:InputType SubType="32595559-0000-0010-8000-00AA00389B71" />
              </uap4:InputTypes>
              <uap4:OutputTypes>
                <!-- HEVC -->
                <uap4:OutputType SubType="43564548-0000-0010-8000-00AA00389B71" />
              </uap4:OutputTypes>
            </uap4:MediaEncodingProperties>
          </uap4:MediaCodec>
        </uap4:Extension>

Media Foundation API is capable of picking such a mediaCodec up from Windows Store applications and making it available to desktop applications. However, there is COM security in the middle of the glue layer. Media Foundation creates a special activation object (CMFWinrtInprocActivate) to implement IMFActivate for such a WinRT codec, and once activated, the transform acts as usual and as documented. These codecs are also dual-interfaced: in addition to traditional Media Foundation interfaces like IMFTransform, they expose UWP interfaces like IMediaExtension, and so they are consumable by UWP applications as well.
