Channel: Windows API – Fooling Around

Is my system a hybrid (switchable) graphics system?


There are a number of systems out there equipped with multiple GPUs which work cooperatively. The technology started with integrated graphics processing units, which brought a “free” GPU into the system. A system equipped with an additional discrete card thus obtained two GPUs at once, and from a certain point it became a challenge not just to choose between the two, but also to run them concurrently and utilize the capacity of both.

These GPUs are typically quite different, and there is a rational reason to prefer one to the other in certain scenarios. Integrated graphics (iGPU) is typically slower but power-efficient, while discrete graphics (dGPU) is a powerful, fully featured unit offering “performance over power saving” capabilities.

At a certain point of development, the seamless operation of two GPUs received the name of hybrid graphics.

By the original definition, “The discrete GPU is a render-only device, and no display outputs are connected to it.” And so it was the case for quite some time, when systems like laptops were given two GPUs with an option to choose the GPU for an application to run on. The cooperative operation of the GPUs was as follows: “when the discrete GPU is handling all the rendering duties, the final image output to the display is still handled by the Intel integrated graphics processor (IGP). In effect, the IGP is only being used as a simple display controller, resulting in a seamless, flicker-free experience with no need to reboot.”

That is, the principal feature of hybrid graphics technology is the ability to transfer data between GPUs in an efficient way, so that computationally intensive rendering can happen on the performance GPU with the results transferred to the other GPU, which has the physical wiring to a monitor.

We leverage this hardware capability in Rainway game streaming to offer a seamless, low latency game streaming experience using any hardware encoder present in the system, not necessarily belonging to the piece of hardware where the video originates.

The Microsoft Windows operating system, and its DirectX Graphics Infrastructure (DXGI) in particular, stepped in to hide the details of switchable graphics from applications. Depending on settings, which can be defined per application, an application sees a different enumeration order of adapters, and the operating system either indicates the “true” adapter as the host of the connected monitor, or it indicates a different GPU and transfers the rendering results between the GPUs behind the scenes, such as during the desktop composition process.
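
To make this concrete, here is a minimal sketch of the standard enumeration an application performs and whose results the OS adjusts per the preference settings; this is not the code of the tool below, just the plain DXGI calls:

#include <cstdio>
#include <dxgi1_6.h>
#include <atlbase.h> // CComPtr
#pragma comment(lib, "dxgi.lib")

void EnumerateAdaptersAndOutputs()
{
    CComPtr<IDXGIFactory1> Factory;
    if(FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&Factory))))
        return;
    for(UINT AdapterIndex = 0; ; AdapterIndex++)
    {
        CComPtr<IDXGIAdapter1> Adapter;
        if(Factory->EnumAdapters1(AdapterIndex, &Adapter) == DXGI_ERROR_NOT_FOUND)
            break; // no more adapters
        DXGI_ADAPTER_DESC1 AdapterDesc;
        Adapter->GetDesc1(&AdapterDesc);
        wprintf(L"%s\n", AdapterDesc.Description);
        for(UINT OutputIndex = 0; ; OutputIndex++)
        {
            CComPtr<IDXGIOutput> Output;
            if(Adapter->EnumOutputs(OutputIndex, &Output) == DXGI_ERROR_NOT_FOUND)
                break; // this adapter exposes no (more) outputs
            DXGI_OUTPUT_DESC OutputDesc;
            Output->GetDesc(&OutputDesc);
            wprintf(L"  %s\n", OutputDesc.DeviceName); // e.g. \\.\DISPLAY1
        }
    }
}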

A side effect of the seamless operation of GPUs and the misreporting of the GPU which has the monitor connection is that the Desktop Duplication API fails with nondescriptive error codes in certain cases (see Error generated when Desktop Duplication API-capable application is run against discrete GPU), or the Output Protection Manager API communication reports wrong security certificates.
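
For illustration, a sketch of the Desktop Duplication side of the problem; on a hybrid system the call below may fail when the Direct3D 11 device belongs to the GPU which does not really own the output (the exact failure code varies by system):

#include <d3d11.h>
#include <dxgi1_2.h> // IDXGIOutput1, IDXGIOutputDuplication
#include <atlbase.h>

HRESULT TryDuplicateOutput(IDXGIOutput* Output, ID3D11Device* Device)
{
    CComQIPtr<IDXGIOutput1> const Output1 = Output;
    if(!Output1)
        return E_NOINTERFACE;
    CComPtr<IDXGIOutputDuplication> OutputDuplication;
    HRESULT const Result = Output1->DuplicateOutput(Device, &OutputDuplication);
    // S_OK when Device was created on the adapter really wired to the output,
    // an unhelpful error otherwise
    return Result;
}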

Recent updates of Windows introduced a GPU preference setting right in the OS settings:

Starting with Windows 10 build 17093, Microsoft is introducing a new Graphics settings page for Multi-GPU systems that allows you to manage the graphics performance preference of your apps. You may be familiar with similar graphics control panels from AMD and Nvidia, and you can continue to use those control panels. When you set an application preference in the Windows Graphics settings, that will take precedence over the other control panel settings.

However, these ongoing updates actually extended the boundaries of the hybrid system itself. While the original hybrid system was defined as a system with a primary iGPU with a monitor connected, plus an additional render-only secondary powerful dGPU, recent versions of Microsoft Windows can run multiple GPU systems with a full featured discrete graphics adapter with the monitor connected to it, and a secondary iGPU still being a part of the heterogeneous setup. In a certain sense this update invalidated previous technical information and definitions which assumed that it is the iGPU which has the physical wiring to the monitor in hybrid systems.

Even though the operation of multiple GPUs is seemingly seamless, the GPUs still remain in a master/slave relation: the operating system is responsible for the composition of the final image on the GPU with the monitor (DXGI output) connection.

I developed a simple application (see download link at the bottom of the post) that discovers the properties of hybrid systems and identifies the “main” GPU with the output connection. The application displays details on whether the operating system can trick applications and report another GPU following the GPU preference settings, and indicates the “main” GPUs with an asterisk.
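
The “Minimum power” and “High performance” lines in the results below presumably map to DXGI 1.6 adapter enumeration by GPU preference; a minimal sketch of that query:

void QueryGpuPreferenceAdapters()
{
    CComPtr<IDXGIFactory6> Factory;
    if(FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&Factory))))
        return;
    CComPtr<IDXGIAdapter1> MinimumPowerAdapter, HighPerformanceAdapter;
    // The first adapter returned for each preference is the one the OS offers
    // to an application configured respectively in the Graphics settings page
    Factory->EnumAdapterByGpuPreference(0, DXGI_GPU_PREFERENCE_MINIMUM_POWER,
        IID_PPV_ARGS(&MinimumPowerAdapter));
    Factory->EnumAdapterByGpuPreference(0, DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE,
        IID_PPV_ARGS(&HighPerformanceAdapter));
}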

Here are some of the results:

NVIDIA Optimus laptop:

DXGI adapters:
   Intel(R) HD Graphics 520 (0.0x00010765)
     \\.\DISPLAY1 [*]
   NVIDIA GeForce 940MX (0.0x00010A9D)
   Microsoft Basic Render Driver (0.0x00010A66)

Minimum power: Intel(R) HD Graphics 520 (0.0x00010765)
High performance: NVIDIA GeForce 940MX (0.0x00010A9D)

Hybrid: Output \\.\DISPLAY1 is shared by DXGI adapters:
 - Minimal power adapter: Intel(R) HD Graphics 520 (0.0x00010765) [*]
 - High performance adapter: NVIDIA GeForce 940MX (0.0x00010A9D)  

AMD PowerXpress laptop:

DXGI adapters:
   Intel(R) HD Graphics Family (0.0x00009121)
     \\.\DISPLAY1 [*]
   AMD Radeon HD 8850M (0.0x0000A57A)
   Microsoft Basic Render Driver (0.0x0000A4E6)

Minimum power: Intel(R) HD Graphics Family (0.0x00009121)
High performance: AMD Radeon HD 8850M (0.0x0000A57A)

Hybrid: Output \\.\DISPLAY1 is shared by DXGI adapters:
 - Minimal power adapter: Intel(R) HD Graphics Family (0.0x00009121) [*]
 - High performance adapter: AMD Radeon HD 8850M (0.0x0000A57A)  

The two systems above are hybrid systems in the original definition.

Desktop system with a discrete video adapter and two connected monitors, with an Intel CPU and a “free” additional headless GPU. This acts similarly to traditional hybrid systems, with the exception that the GPUs have swapped roles:

DXGI adapters:
   Radeon RX 570 Series (0.0x0000D18A)
     \\.\DISPLAY4 [*]
     \\.\DISPLAY5 [*]
   Intel(R) UHD Graphics 630 (0.0x8E94827B)
   Microsoft Basic Render Driver (0.0x0000D163)

Minimum power: Intel(R) UHD Graphics 630 (0.0x8E94827B)
High performance: Radeon RX 570 Series (0.0x0000D18A)

Hybrid: Output \\.\DISPLAY4 is shared by DXGI adapters:
 - Minimal power adapter: Intel(R) UHD Graphics 630 (0.0x8E94827B) 
 - High performance adapter: Radeon RX 570 Series (0.0x0000D18A) [*]

Hybrid: Output \\.\DISPLAY5 is shared by DXGI adapters:
 - Minimal power adapter: Intel(R) UHD Graphics 630 (0.0x8E94827B) 
 - High performance adapter: Radeon RX 570 Series (0.0x0000D18A) [*] 

The Intel Hades Canyon NUC with Radeon Vega graphics is closer to the desktop setup above than to the hybrid mobile configurations:

DXGI adapters:
   Radeon RX Vega M GH Graphics (0.0x0000D41B)
     \\.\DISPLAY1 [*]
   Intel(R) HD Graphics 630 (0.0x0000E000)
   Microsoft Basic Render Driver (0.0x0000DFDB)

Minimum power: Intel(R) HD Graphics 630 (0.0x0000E000)
High performance: Radeon RX Vega M GH Graphics (0.0x0000D41B)

Hybrid: Output \\.\DISPLAY1 is shareable by DXGI adapters:
 - Minimal power adapter: Intel(R) HD Graphics 630 (0.0x0000E000) 
 - High performance adapter: Radeon RX Vega M GH Graphics (0.0x0000D41B) [*] 

Download links

Binaries:

  • 64-bit: DxgiHybrid.exe (in .7z archive)
  • License: This software is free to use


Hardware accelerated JPEG video decoder MFT from AMD


Video GPU vendors (AMD, Intel, NVIDIA) ship their hardware with drivers, which in turn provide a hardware-assisted decoder for JPEG (also known as MJPG, MJPEG, and Motion JPEG) video in the form factor of a Media Foundation Transform (MFT).

JPEG is not included in the DirectX Video Acceleration (DXVA) 2.0 specification; however, the hardware carries an implementation of the decoder. A separate additional MFT is a natural way to provide OS integration.
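
A minimal sketch of how such a decoder is discovered from an application, the same way the Media Foundation pipeline resolves it (MFStartup and error handling omitted):

#include <cstdio>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

void EnumerateHardwareJpegDecoders()
{
    MFT_REGISTER_TYPE_INFO InputTypeInformation { MFMediaType_Video, MFVideoFormat_MJPG };
    IMFActivate** Activates = nullptr;
    UINT32 ActivateCount = 0;
    if(FAILED(MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
        &InputTypeInformation, nullptr, &Activates, &ActivateCount)))
        return;
    for(UINT32 Index = 0; Index < ActivateCount; Index++)
    {
        WCHAR FriendlyName[256] { };
        Activates[Index]->GetString(MFT_FRIENDLY_NAME_Attribute, FriendlyName, _countof(FriendlyName), nullptr);
        wprintf(L"%s\n", FriendlyName); // on AMD systems: "AMD MFT MJPEG Decoder"
        Activates[Index]->Release();
    }
    CoTaskMemFree(Activates);
}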

AMD’s decoder is named “AMD MFT MJPEG Decoder” and looks weird from the start. It is marked as MFT_ENUM_FLAG_HARDWARE, which is good, but this normally assumes that the MFT is also MFT_ENUM_FLAG_ASYNCMFT, and the MFT lacks that markup. Another AMD decoder MFT, “AMD D3D11 Hardware MFT Playback Decoder”, has the same problem though.

Hardware MFTs must use the new asynchronous processing model…

Presumably the MFT behaves as a normal asynchronous MFT; however, as long as this markup has no side effects with Microsoft’s own software, AMD does not care about the confusion it causes others.

Furthermore, the registration information for this decoder suggests that it can decode into the MFVideoFormat_NV12 video format, and sadly this is again an inaccurate promise. Despite the claim, the capability is missing, and Microsoft’s Video Processor MFT jumps in as needed to satisfy such a format conversion.

These were just minor things, more or less easy to tolerate. However, a rule of thumb is that the Media Foundation glue layer provided by technology partners such as GPU vendors only satisfies minimal certification requirements, and beyond that it causes suffering and pain to anyone who wants to use it in real world scenarios.

AMD’s take on making developers feel miserable is the way hardware-assisted JPEG decoding actually takes place:

The thread 0xc880 has exited with code 0 (0x0).
The thread 0x593c has exited with code 0 (0x0).
The thread 0xa10 has exited with code 0 (0x0).
The thread 0x92c4 has exited with code 0 (0x0).
The thread 0x9c14 has exited with code 0 (0x0).
The thread 0xa094 has exited with code 0 (0x0).
The thread 0x609c has exited with code 0 (0x0).
The thread 0x47f8 has exited with code 0 (0x0).
The thread 0xe1ec has exited with code 0 (0x0).
The thread 0x6cd4 has exited with code 0 (0x0).
The thread 0x21f4 has exited with code 0 (0x0).
The thread 0xd8f8 has exited with code 0 (0x0).
The thread 0xf80 has exited with code 0 (0x0).
The thread 0x8a90 has exited with code 0 (0x0).
The thread 0x103a4 has exited with code 0 (0x0).
The thread 0xa16c has exited with code 0 (0x0).
The thread 0x6754 has exited with code 0 (0x0).
The thread 0x9054 has exited with code 0 (0x0).
The thread 0x9fe4 has exited with code 0 (0x0).
The thread 0x12360 has exited with code 0 (0x0).
The thread 0x31f8 has exited with code 0 (0x0).
The thread 0x3214 has exited with code 0 (0x0).
The thread 0x7968 has exited with code 0 (0x0).
The thread 0xbe84 has exited with code 0 (0x0).
The thread 0x11720 has exited with code 0 (0x0).
The thread 0xde10 has exited with code 0 (0x0).
The thread 0x5848 has exited with code 0 (0x0).
The thread 0x107fc has exited with code 0 (0x0).
The thread 0x6e04 has exited with code 0 (0x0).
The thread 0x6e90 has exited with code 0 (0x0).
The thread 0x2b18 has exited with code 0 (0x0).
The thread 0xa8c0 has exited with code 0 (0x0).
The thread 0xbd08 has exited with code 0 (0x0).
The thread 0x1262c has exited with code 0 (0x0).
The thread 0x12140 has exited with code 0 (0x0).
The thread 0x8044 has exited with code 0 (0x0).
The thread 0x6208 has exited with code 0 (0x0).
The thread 0x83f8 has exited with code 0 (0x0).
The thread 0x10734 has exited with code 0 (0x0).

For whatever reason they create a thread for every processed video frame, or close to that… Resource utilization and performance are affected accordingly. Imagine you are processing a video feed from a high frame rate camera. The decoder itself, including its AMF runtime overhead, decodes images in a millisecond or less, but they spoiled it with absurd threading topped with other bugs.

However, AMD video cards still have the hardware implementation of the codec, and this capability is also exposed via their AMF SDK.

 AMFVideoDecoderUVD_MJPEG

 Acceleration Type: AMF_ACCEL_HARDWARE
 AMF_VIDEO_DECODER_CAP_NUM_OF_STREAMS: 16 

 CodecId    AMF_VARIANT_INT64   7
 DPBSize    AMF_VARIANT_INT64   1

 NumOfStreams    AMF_VARIANT_INT64   16

 Input
 Width Range: 32 - 7,680
 Height Range: 32 - 4,320
 Vertical Alignment: 32
 Format Count: 0
 Memory Type Count: 1
 Memory Type: AMF_MEMORY_HOST Native
 Interlace Support: 1 

 Output
 Width Range: 32 - 7,680
 Height Range: 32 - 4,320
 Vertical Alignment: 32
 Format Count: 4
 Format: AMF_SURFACE_YUY2 
 Format: AMF_SURFACE_NV12 Native
 Format: AMF_SURFACE_BGRA 
 Format: AMF_SURFACE_RGBA 
 Memory Type Count: 1
 Memory Type: AMF_MEMORY_DX11 Native
 Interlace Support: 1 

I guess they stop harassing developers once those switch from the out-of-the-box MFT to the SDK interface into their decoder. “AMD MFT MJPEG Decoder” is highly likely just a wrapper over the AMF interface; my guess is that the problematic part is exactly the abandoned wrapper and not the core functionality.
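
For reference, a minimal sketch of going to the decoder through the AMF SDK directly, bypassing the MFT wrapper; this is based on the public AMF headers, with error handling reduced and the 1920×1080 initialization being just an example:

#include "public/common/AMFFactory.h"
#include "public/include/components/VideoDecoderUVD.h"

AMF_RESULT CreateMjpegDecoder(amf::AMFContextPtr& Context, amf::AMFComponentPtr& Decoder)
{
    AMF_RESULT const Result = g_AMFFactory.Init(); // loads the AMF runtime and obtains the factory
    if(Result != AMF_OK)
        return Result;
    g_AMFFactory.GetFactory()->CreateContext(&Context);
    Context->InitDX11(nullptr); // matches the native AMF_MEMORY_DX11 output listed above
    g_AMFFactory.GetFactory()->CreateComponent(Context, AMFVideoDecoderUVD_MJPEG, &Decoder);
    return Decoder->Init(amf::AMF_SURFACE_NV12, 1920, 1080);
}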

Microsoft HEVCVideoExtension software H.265/HEVC encoder


The engineering quality of Microsoft’s most recent work around Media Foundation is terrible. It surely passes some internal tests to make sure that software items meet the requirements of the use cases needed by internal products, but the published work gives the impression that there is no one left to care about the API offerings to the wide audience.

One new example of this is how the H.265/HEVC video encoder, implemented by the respective Windows Store extension in mfh265enc.dll, works.

I have been putting the component into an existing code base in order to extend it with reference software video encoding, now in H.265/HEVC format. Hence the stock software encoder, regardless of its performance and quality metrics.

The encoder started giving nonsensical exceptions and errors, in particular rejecting obviously valid input. Sorting out a few things, I started seeing the MFT producing E_FAIL on the very first video frame it receives.

The suspected problem was (and there were not so many other things left) that the output media type was set twice. Both calls were valid, with good arguments, and happened before any payload processing. The second call supplied the same media type, with all the same attributes EXACTLY. Both media type setting calls were successful. The whole media type setting story did not produce any errors at the stage of handling streaming start messages.

Still, the second call apparently ruined the internal state because – and there can be no other explanation – of the shitty quality of the MFT itself.

A code fragment that discards the second media type setting call at the wrapping level gets the MFT back to processing. What can I say…
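
The fragment is essentially as below: a sketch at the wrapping level, where Transform and OutputType are assumed members of a hypothetical wrapper holding the real MFT and the last media type applied:

HRESULT SetOutputType(DWORD OutputStreamIdentifier, IMFMediaType* MediaType, DWORD Flags)
{
    if(OutputType && MediaType && !(Flags & MFT_SET_TYPE_TEST_ONLY))
    {
        DWORD EqualityFlags = 0;
        if(MediaType->IsEqual(OutputType, &EqualityFlags) == S_OK)
            return S_OK; // the same type set again: do not let it reach the fragile MFT
    }
    HRESULT const Result = Transform->SetOutputType(OutputStreamIdentifier, MediaType, Flags);
    if(SUCCEEDED(Result) && !(Flags & MFT_SET_TYPE_TEST_ONLY))
        OutputType = MediaType; // remember what was applied
    return Result;
}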

Virtual Camera API in Windows 11 (Build 22000)


There is a new API coming with Windows 11. Finally we will get a well-defined way to register virtual cameras (perhaps for applications built against the Windows Media Foundation API, not DirectShow): MFCreateVirtualCamera.

Creates a virtual camera object which can be used by the caller to register, unregister, or remove the virtual camera from the system.

The frame server reference is a good sign and suggests that an application might be able to register its own implementation; the system-wide service would then act as a proxy and expose the implementation to video capture applications built to work with cameras.

MFVirtualCameraType_SoftwareCameraSource “The virtual camera is a software camera source.”
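
From the declarations, the registration flow might look as sketched below; VirtualCameraMediaSourceClsidString is a hypothetical CLSID string of your own IMFMediaSource implementation which the frame server would activate:

#include <atlbase.h>
#include <mfvirtualcamera.h>
#pragma comment(lib, "mfsensorgroup.lib")

HRESULT RegisterDemoVirtualCamera(LPCWSTR VirtualCameraMediaSourceClsidString,
    CComPtr<IMFVirtualCamera>& VirtualCamera)
{
    HRESULT Result = MFCreateVirtualCamera(MFVirtualCameraType_SoftwareCameraSource,
        MFVirtualCameraLifetime_Session, MFVirtualCameraAccess_CurrentUser,
        L"Demo Virtual Camera", VirtualCameraMediaSourceClsidString,
        nullptr, 0, &VirtualCamera);
    if(SUCCEEDED(Result))
        Result = VirtualCamera->Start(nullptr); // the camera becomes visible to capture applications
    return Result;
}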

There already is a sample on GitHub for this API: Windows-Camera/Samples/VirtualCamera at master · microsoft/Windows-Camera (github.com)

See also:

Some other interesting things are also coming, e.g. a “virtual audio device that supports audio loopback based on a process ID instead of the device interface path of a physical audio device” (AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK and friends). We will be able to re-capture the audio of an individual process, which is a cool new one, but be patient: this new stuff is scheduled for Windows 10 Build 20348.
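
A sketch of the expected activation path, assuming TargetProcessId identifies the process of interest and CompletionHandler implements IActivateAudioInterfaceCompletionHandler:

#include <atlbase.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <audioclientactivationparams.h>

HRESULT StartProcessLoopbackActivation(DWORD TargetProcessId,
    IActivateAudioInterfaceCompletionHandler* CompletionHandler)
{
    AUDIOCLIENT_ACTIVATION_PARAMS ActivationParams { };
    ActivationParams.ActivationType = AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK;
    ActivationParams.ProcessLoopbackParams.TargetProcessId = TargetProcessId;
    ActivationParams.ProcessLoopbackParams.ProcessLoopbackMode = PROCESS_LOOPBACK_MODE_INCLUDE_TARGET_PROCESS_TREE;
    PROPVARIANT ActivateParams { };
    ActivateParams.vt = VT_BLOB;
    ActivateParams.blob.cbSize = sizeof ActivationParams;
    ActivateParams.blob.pBlobData = reinterpret_cast<BYTE*>(&ActivationParams);
    CComPtr<IActivateAudioInterfaceAsyncOperation> AsyncOperation;
    // The IAudioClient obtained in the completion handler captures the process tree audio
    return ActivateAudioInterfaceAsync(VIRTUAL_AUDIO_DEVICE_PROCESS_LOOPBACK,
        __uuidof(IAudioClient), &ActivateParams, CompletionHandler, &AsyncOperation);
}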

DxgiTakeWindowSnapshot & Window Recording w/ Audio


I sometimes use a rework of the earlier DxgiTakeSnapshot application for one specific purpose mentioned below. In addition to the Desktop Duplication API, recent versions of Windows offer a similar (in the sense of acquisition of external video content) API: Windows.Graphics.Capture (hereinafter “WGC”), and the new rework uses this API to capture visual content as a snapshot or, now, as a video stream.

I will skip the technical details and just link to Robert Mikhayelyan’s Win32CaptureSample project on GitHub, which is perhaps the best place to ask questions about this new API.
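
For a taste of it, the WGC entry point for a window boils down to the interop sketch below (C++/WinRT; see the sample above for the full frame pool and session setup):

#include <winrt/Windows.Graphics.Capture.h>
#include <windows.graphics.capture.interop.h>

winrt::Windows::Graphics::Capture::GraphicsCaptureItem CreateItemForWindow(HWND Window)
{
    using winrt::Windows::Graphics::Capture::GraphicsCaptureItem;
    auto const Factory = winrt::get_activation_factory<GraphicsCaptureItem, IGraphicsCaptureItemInterop>();
    GraphicsCaptureItem Item { nullptr };
    winrt::check_hresult(Factory->CreateForWindow(Window,
        winrt::guid_of<GraphicsCaptureItem>(), winrt::put_abi(Item)));
    return Item; // goes into Direct3D11CaptureFramePool / GraphicsCaptureSession
}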

Today’s application takes a snapshot of a given monitor, window, or process windows (each window separately), similar to the original DxgiTakeSnapshot application. This might be worth a separate post, but at this time I just wanted to mention a quick hack with the -Video command line argument.

The hack uses the API along with capturing the audio currently played, and records the audiovisual stream into an MP4 file until you stop it with Control+Break.

For example, this is the way I recorded the video below, by recording a browser window with YouTube playback. I started the app first to quickly identify the HWND of interest.

C:\>DxgiTakeWindowSnapshot.exe -Process msedge.exe
DxgiTakeWindowSnapshot.exe 20210717.1-16-g34e6c97 (Release)
34e6c972dad5568347d44e58ab0338b7daa1dba7
HEAD -> video, origin/video
2021-11-20 21:07:46 +0200
--
Found process 023228 "C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
Found 2 windows for process 023228
Trying to capture window 0x0000000000A608E0, Filatov & Karas - ????? ? ?????? (Live @????????? ) - YouTube - Personal - Microsoft Edge
Trying to capture window 0x00000000016D066E, Filatov & Karas - Don't Be So Shy (RuSongTV - Turkey) - YouTube and 3 more pages - Personal - Microsoft Edge
Found process 013944 "C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
Found 0 windows for process 013944

C:\>DxgiTakeWindowSnapshot.exe -Window 0x0000000000A608E0 -Video
DxgiTakeWindowSnapshot.exe 20210717.1-16-g34e6c97 (Release)
34e6c972dad5568347d44e58ab0338b7daa1dba7
HEAD -> video, origin/video
2021-11-20 21:07:46 +0200
--
Trying to capture window 0x0000000000A608E0, Filatov & Karas - ????? ? ?????? (Live @????????? ) - YouTube - Personal - Microsoft Edge
Stopping: CTRL_C_EVENT

(see next post on why video below is blocked here)

Original content: Filatov & Karas – ????? ? ?????? (Live @????????? )

The application currently hardcodes the produced video as 1920×1080@60 with a 10 Mbps bitrate, and is subject to certain limitations because of the quick and hacky implementation:

  • as mentioned, 1920×1080@60 with a 10 Mbps video bitrate (and the maximum documented bitrate for the stock AAC audio encoder)
  • video might be letterboxed to preserve aspect ratio (default behavior of the Media Foundation XVP)
  • a compatible Media Foundation video encoder is expected to be discoverable as default (hardware encoding is used as designed)
  • a pretty much regular audio device is expected; certain not so usual settings, like 5.1 audio as the shared format, are likely to cause exceptions
  • the application picks the default multimedia audio device for recording (the actual code line is auto const Device = DefaultAudioEndpoint(eRender, eMultimedia); see the sketch after this list)
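
The audio side, under the same assumptions, is a straightforward WASAPI loopback on that endpoint; a minimal sketch with error handling reduced:

#include <atlbase.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

HRESULT OpenDefaultEndpointLoopback(CComPtr<IAudioClient>& AudioClient)
{
    CComPtr<IMMDeviceEnumerator> Enumerator;
    HRESULT Result = Enumerator.CoCreateInstance(__uuidof(MMDeviceEnumerator));
    if(FAILED(Result))
        return Result;
    CComPtr<IMMDevice> Device;
    Result = Enumerator->GetDefaultAudioEndpoint(eRender, eMultimedia, &Device);
    if(FAILED(Result))
        return Result;
    Result = Device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
        reinterpret_cast<void**>(&AudioClient));
    if(FAILED(Result))
        return Result;
    WAVEFORMATEX* MixFormat = nullptr;
    AudioClient->GetMixFormat(&MixFormat); // the shared format; unusual ones like 5.1 are the fragile case
    Result = AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK,
        0, 0, MixFormat, nullptr);
    CoTaskMemFree(MixFormat);
    return Result;
}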

So, the one specific purpose the application is good for is meeting recording: record a window with a Google Meet session, Microsoft Teams, or the like, and you have a convenient copy of the content for review or sharing.

As of now the application records only the audio playback device, but not the microphone – I will get to that some time later.

If you want something more convenient, there is also Robert’s Simple Screen Recorder in the Windows Store; however, it is video only, yet with source code on GitHub.

DxgiTakeWindowSnapshot uses Media Foundation in the old-school way: it creates a Microsoft Media Foundation Media Session pipeline from a few custom primitives and some stock ones (esp. the video encoder, video processor, and audio encoder) and takes all this stuff off to record media…

Download links

Binaries:

MFCreateFMPEG4MediaSink has no H.265/HEVC support


MFCreateFMPEG4MediaSink (CMPEG4MediaSink class) has no support for H.265/HEVC… MF_E_INVALIDMEDIATYPE

It should have been there. We’re not expecting Dolby AC-4 to be supported [yet], but H.265?
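
For the record, a minimal repro sketch; ByteStream stands for an IMFByteStream created elsewhere (e.g. with MFCreateFile), and the frame size/rate values are illustrative:

HRESULT TryCreateHevcFragmentedSink(IMFByteStream* ByteStream)
{
    CComPtr<IMFMediaType> VideoMediaType;
    MFCreateMediaType(&VideoMediaType);
    VideoMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    VideoMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_HEVC);
    MFSetAttributeSize(VideoMediaType, MF_MT_FRAME_SIZE, 1920, 1080);
    MFSetAttributeRatio(VideoMediaType, MF_MT_FRAME_RATE, 60, 1);
    CComPtr<IMFMediaSink> MediaSink;
    // Fails with MF_E_INVALIDMEDIATYPE; the same call succeeds with MFVideoFormat_H264
    return MFCreateFMPEG4MediaSink(ByteStream, VideoMediaType, nullptr, &MediaSink);
}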

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework


A variant of the previous CaptureEngineVideoCapture demo application which features the AMD Advanced Media Framework SuperResolution scaler for video.

It is basically a live video camera application started in a low resolution mode, and it enables you to switch between GPU-implemented (OpenCL, probably?) realtime upscaling modes.

The AMD scaler is wrapped into a Media Foundation Transform and is applied to the Microsoft Media Foundation Capture Engine as a video effect.
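
Attaching it is essentially a one-liner against the Capture Engine; a sketch, assuming ScalerTransform is the IMFTransform wrapping the AMF scaler and CaptureEngine is an initialized IMFCaptureEngine:

#include <mfcaptureengine.h>

HRESULT AddScalerEffect(IMFCaptureEngine* CaptureEngine, IMFTransform* ScalerTransform)
{
    CComPtr<IMFCaptureSource> CaptureSource;
    HRESULT const Result = CaptureEngine->GetSource(&CaptureSource);
    if(FAILED(Result))
        return Result;
    // Insert the MFT into the preview stream as a video effect
    return CaptureSource->AddEffect(
        static_cast<DWORD>(MF_CAPTURE_ENGINE_PREFERRED_SOURCE_STREAM_FOR_VIDEO_PREVIEW),
        ScalerTransform);
}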

Note that the SuperResolution 1.1 mode assumes upscaling in the range 1.1 to 2.0 only and might not work if you select a video resolution and target scaling resolution outside of that range.


The scaler is fast and fully GPU backed, perfect for real time; however, the effect is not that obvious. Still, it is easy to see for yourself: just run it and that’s it. Maybe next time I will do a side by side comparison, and then also a DNN backed Media Foundation Transform to possibly produce more expressive video output.

Also, it would obviously help to dynamically change the output resolution to be 1:1 with the window size… That too is for the next experiment.

Demo: Direct3D 11 aware SuperResolution scaler based on AMD framework (updated)


While at it, a quick update to the previous post/application:

  • added dynamic media type change support to upscaler effect MFT
  • the video is automatically upscaled to map 1:1 to effective pixels & window size, as you resize the application window
  • added high DPI awareness markup/support to the application
  • removed FSR 1.1 as it would fail if upscaling falls outside the supported range
  • default resolution is changed to 1280×720 for better UX out of the box

Demo: Webcam with YOLOv4 object detection via Microsoft Media Foundation and DirectML


As we see announcements of more and more powerful NPUs and AI support in newer consumer hardware, here is a new demo with Artificial Intelligence attached to a good old sample camera application based on the Microsoft Media Foundation Capture Engine API.

This demo is a blend of several technologies: Media Foundation, Direct3D 11, Direct3D 12, DirectML (of Windows AI), and Direct2D, all at once!

The video data never leaves the video hardware. We have the video captured via Media Foundation, internally processed with the Windows Camera Frame Server service, and then shared through the Capture Engine API (this perhaps still does not really work as “shared access”, but we are used to that). The data is then converted from Direct3D 11 to Direct3D 12 and, further, to a DirectML tensor. From there, we run the YOLOv4 model (see the DirectML YOLOv4 sample) on the data while the video smoothly goes to preview presentation. As soon as the Machine Learning model produces predictions, we pass the data to a Direct2D overlay, which attaches a mask onto the video flow going to the presentation.
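
The Direct3D 11 to Direct3D 12 hop is done with NT handle sharing; a sketch, assuming Texture11 was created with D3D11_RESOURCE_MISC_SHARED_NTHANDLE (real code additionally needs keyed mutex or fence synchronization) and Device12 is the D3D12 device:

#include <d3d11.h>
#include <d3d12.h>
#include <dxgi1_2.h>
#include <atlbase.h>

HRESULT ShareTextureWithD3D12(ID3D11Texture2D* Texture11, ID3D12Device* Device12,
    CComPtr<ID3D12Resource>& Texture12)
{
    CComPtr<IDXGIResource1> Resource;
    HRESULT Result = Texture11->QueryInterface(&Resource);
    if(FAILED(Result))
        return Result;
    HANDLE SharedHandle = nullptr;
    Result = Resource->CreateSharedHandle(nullptr, GENERIC_ALL, nullptr, &SharedHandle);
    if(FAILED(Result))
        return Result;
    Result = Device12->OpenSharedHandle(SharedHandle, IID_PPV_ARGS(&Texture12));
    CloseHandle(SharedHandle);
    return Result; // Texture12 contents then feed the DirectML input tensor
}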

The compiled application’s only dependency is the yolov4.weights file (you need to download it separately and place it in the same directory as the executable; there is an alternative download link); the rest is the Windows API and the application itself. As the work is handled by the GPU, you will see Task Manager showing GPU load close to 100% while the CPU load is minimal.

The model is trained on 608 by 608 pixel images, and so are the model input and rescaling. That is, the resolution of the camera video does not matter much, except that the overlay mask is more accurate with higher resolution video (the overlay is burnt into the video stream itself). To show the recent progress in hardware capabilities, here are some new numbers:

  1. Intel(R) UHD Graphics 770 integrated in the Intel® Core™ i9-13900K CPU achieves around 2.5 fps for real-time video processing.
  2. AMD Radeon 780M Graphics integrated in the AMD Ryzen 9 7940HS CPU runs the processing up to four times faster, achieving around 8.0 fps.
  3. [2025-04-04 Update] NVIDIA GeForce RTX 2060 SUPER is capable of processing video at around 24 fps.

Intel’s CPU is still a good one, but its integrated video is not so strong, and it is the GPU at work now. The AMD Ryzen 9 7940HS includes AMD’s dedicated XDNA AI technology. The NPU performance is rated at up to 10 TOPS (Tera Operations Per Second), and upcoming Copilot+ PCs are expected to have 40+ TOPS, so the new hardware should be a good fit for this class of applications.

Have fun!

Minimal Requirements:

  1. Windows 11 x64 (DirectML might run on Windows 10 as well, but for the simplicity of the demo it requires Windows 11; Windows 11 ARM64 is of certain interest as well, but I don’t have hardware at hand to check)
  2. DirectX 12 compatible video adapter, which is used across the APIs; for the simplicity of the demo there is no option to offload to another GPU or NPU
  3. The .weights file downloaded and placed, as mentioned, beside the application from the archive above