Update "How ALVR works" wiki page (#1703)

Riccardo Zaglia 2023-07-10 18:12:15 +02:00 committed by GitHub
parent e8422abd15
commit fc2c729bb7
1 changed files with 258 additions and 146 deletions

@@ -4,228 +4,340 @@ This document details some technologies used by ALVR.
If you have any doubt about what is (or isn't) written in here you can contact @zarik5, preferably on Discord.
**Note: At the time of writing, not all features listed here are implemented**
This document was last updated on June 27th 2023 and refers to the master branch.
## Table of contents
* Architecture
* The packaged application
* Programming languages
* Source code organization
* Logging and error management
* The event system
* Session and settings
* Procedural generation of code and UI
* The dashboard
* The user interface
* Driver communication
* Driver lifecycle
* The streaming pipeline: Overview
* Client-driver communication
* Discovery
* Streaming
* SteamVR driver
* Client and driver compositors
* Foveated rendering
* Color correction
* Video transcoding
* Audio
* Tracking and display timing
* Other streams
* Upcoming
* Phase sync
* Sliced encoding
## Architecture
### The built application
### The packaged application
ALVR is made of two applications: the server and client. The server is installed on the PC and the client is installed on the headset. While the client is a single APK, the server is made of three parts: the launcher, the driver and the dashboard. The launcher (`ALVR Launcher.exe`) is the single executable found at the root of the server app installation. The driver is located in `bin/win64/` and named `driver_alvr_server.dll`. The dashboard is a collection of files located in `dashboard/`.
ALVR is made of two applications: the streamer and client. The streamer can be installed on Windows and Linux, while the client is installed on Android VR headsets. The client communicates with the driver through TCP or UDP sockets.
The launcher sets up the PC environment and then opens SteamVR, which loads the ALVR driver. The driver is responsible for loading the dashboard and connecting to the client.
The client is a single unified APK, named `alvr_client_android.apk`. It is powered by OpenXR and it is compatible with Quest headsets, recent Pico headsets and HTC Focus 3 and XR Elite.
The streamer is made of two main parts: the dashboard and the driver (also known as server). The driver is dynamically loaded by SteamVR. This is the file structure on Windows:
* `bin/win64/`
  * `driver_alvr_server.dll`: The main binary, responsible for client discovery and streaming. Loaded by SteamVR.
  * `driver_alvr_server.pdb`: Debugging symbols.
  * `openvr_api.dll`: OpenVR SDK used for updating the chaperone.
  * `vcruntime140_1.dll`: Windows SDK used by C++ code in the driver.
* `ALVR Dashboard.exe`: Dashboard binary used to change settings, manage clients, monitor statistics and perform installation actions. It can launch SteamVR.
* `driver.vrdrivermanifest`: Auxiliary config file used by the driver.
At runtime, some other files are created:
* `session.json`: This contains unified configuration data used by ALVR, such as settings and client records.
* `session_log.txt`: Main log file. Each line is a json structure and represents an event generated by the driver. This gets cleared each time a client connects.
* `crash_log.txt`: Auxiliary log file. Same as `session_log.txt`, except only error logs are saved, and does not get cleared.
### Programming languages
ALVR is written in multiple languages: Rust, C, C++, Java, HTML, Javascript, HLSL, GLSL. C++ is the most present language in the codebase but Rust is the language that plays the most important role, as it is used as glue and more and more code is getting rewritten in Rust.
Rust is a system programming language focused on memory safety and ease of use. It is as performant as C++ but code written in it is less likely to be affected by bugs at runtime. A feature of Rust that is extensively used by ALVR is enums, which correspond to tagged unions in C++. Rust's enums are a data type that can store different kinds of data, but only one type can be accessed at a time. For example the type `Result` can contain either an `Ok` value or an `Err` value but not both. Together with pattern matching, this is the foundation of error management in Rust applications.
C++ and Java code in ALVR is legacy code inherited by the developer @polygraphene; it is almost unmaintained and it is getting replaced by Rust. HTML and Javascript are used to write the dashboard.
ALVR is written in multiple languages: Rust, C, C++, HLSL, GLSL. The main language used in the codebase is Rust, which is used for the dashboard, networking, video decoding and audio code. C and C++ are used for graphics, video encoding and SteamVR integration. HLSL is used for graphics shaders on the Windows driver, GLSL is used on the Linux driver and the client. Moving forward, more code will be rewritten from C/C++ to Rust and HLSL code will be moved to GLSL or WGSL.
Rust is a system programming language focused on memory safety and ease of use. It is as performant as C++ but Rust code is less likely to be affected by runtime bugs. The prime Rust feature used by ALVR is enums, which correspond to tagged unions in C++. A Rust enum is a data type that can store different kinds of data, but only one kind can be accessed at a time. For example, the type `Result` can contain either an `Ok` value or an `Err` value but not both. Together with pattern matching, this is the foundation of error management in Rust applications.
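As a minimal illustration of `Result` combined with pattern matching, the sketch below parses a bitrate from text. The functions `parse_bitrate` and `describe` are hypothetical names chosen for this example and are not part of ALVR.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses a bitrate expressed in Mbps.
/// The returned `Result` holds either the parsed number or the parse error.
fn parse_bitrate(text: &str) -> Result<u32, ParseIntError> {
    text.trim().parse::<u32>()
}

/// Pattern matching forces the caller to handle both the `Ok` and `Err` case.
fn describe(text: &str) -> String {
    match parse_bitrate(text) {
        Ok(mbps) => format!("bitrate: {mbps} Mbps"),
        Err(e) => format!("invalid bitrate: {e}"),
    }
}
```

Calling `describe(" 30 ")` takes the `Ok` branch, while `describe("fast")` takes the `Err` branch; the compiler rejects a `match` that forgets either one.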
### Source code organization
* `alvr/`: This is where most of the code resides. Each subfolder is a Rust crate ("crate" means a code library or executable).
* `alvr/client/`: Crate that builds the client application. `alvr/client/android/` is the Android Studio project that builds the final APK.
* `alvr/common/`: Code shared by both client and server. It contains code for settings generation, networking, audio and logging.
* `alvr/launcher/`: This crate builds the launcher executable.
* `alvr/server/`: This crate builds the driver DLL. `alvr/server/cpp/` contains the legacy code.
* `alvr/settings-schema/` and `alvr/settings-schema-derive/`: Utilities for settings code generation.
* `alvr/xtask/`: Build utilities. The code contained in this crate does not actually end up in the final ALVR applications.
* `server_release_template/`: Contains every file for ALVR server that does not require a build pass. This includes the dashboard.
* `wix/`: WIX project used to create the ALVR installer on Windows.
ALVR code is hosted in a monorepo. This is an overview of the git tree:
* `.github/`: Contains scripts used by the GitHub CI.
* `alvr/`: Each subfolder is a Rust crate ("crate" means a code library or executable).
  * `audio/`: Utility crate hosting audio related code shared by client and driver.
  * `client_core/`: Platform agnostic code for the client. It is used as a Rust library for `alvr_client_openxr` and can also be compiled to a C ABI shared library with a .h header for integration with other projects.
  * `client_mock/`: Client mock implemented as a thin wrapper around `alvr_client_core`.
  * `client_openxr/`: Client implementation using OpenXR, compiled to an APK binary.
  * `common/`: Some common code shared by other crates. It contains code for versioning, logging, struct primitives, and OpenXR paths.
  * `dashboard/`: The dashboard application.
  * `events/`: Utility crate hosting code related to events.
  * `filesystem/`: Utility crate hosting code for filesystem abstraction between Windows and Linux.
  * `packets/`: Utility crate containing packet definitions for communication between client, driver and dashboard.
  * `server/`: The driver shared library loaded by SteamVR.
  * `server_io/`: Common functionality shared by dashboard and driver, for interaction with the host system. This allows dashboard and driver to work independently from each other.
  * `session/`: Utility crate related to session file and data management.
  * `sockets/`: Utility crate shared by client and driver with socket and protocol implementation.
  * `vrcompositor_wrapper/`: Small script used on Linux so that SteamVR correctly loads the ALVR Vulkan layer.
  * `vulkan_layer/`: Vulkan WSI layer used on Linux to work around limitations of the OpenVR API on Linux. This is mostly patchwork and hopefully will be removed in the future.
  * `xtask/`: Utility CLI hosting a variety of scripts for environment setup, building, and packaging ALVR. Should be called with `cargo xtask`.
* `resources/`: Resources for the README.
* `wiki/`: Contains the source for the GitHub ALVR wiki. Changes are mirrored to the actual wiki once committed.
* `about.toml`: Controls what dependency licenses are allowed in the codebase, and helps with generating the licenses file in the packaged ALVR streamer.
* `Cargo.lock`: Contains versioning information about Rust dependencies used by ALVR.
* `Cargo.toml`: Defines the list of Rust crates contained in the repository, and hosts some other workspace-level Rust configuration.
## Logging and error management
In ALVR codebase, logging is split into interface and implementation. The interface is defined in `alvr/common/src/logging.rs`, the implementations are defined in `alvr/server/src/logging_backend.rs` and `alvr/client/src/logging_backend.rs`.
Logging is split into interface and implementation. The interface is defined in `alvr/common/src/logging.rs`, the implementations are defined in each binary crate as `logging_backend.rs`.
ALVR logging system is based on the crate [log](https://crates.io/crates/log). `log` is already very powerful on its own, since the macros `error!`, `warn!`, `info!`, `debug!` and `trace!` can collect messages, file and line number of the invocation. But I needed something more that can reduce boilerplate when doing error management (*Disclaimer: I know that there are tens of already established error management crates but I wanted to have something even more opinionated and custom fitted*).
The ALVR logging system is based on the crate [log](https://crates.io/crates/log). `log` is already very powerful on its own, since its macros can collect the message, file and line number of the invocation.
ALVR defines some macros and functions to ease error management. The base type used for error management is `StrResult<T>` that is an alias for `Result<T, String>`. Read more about Rust's Result type [here](https://doc.rust-lang.org/std/result/).
`trace_err!` is a macro that takes a generic result as input and converts it into a `StrResult`. It does not support custom error messages and it should be used only to wrap `Result` types when the result is not actually likely to be an error. This way we avoid calling `.unwrap()`, which makes the program crash directly. In case of error, the `Err` value is converted to a string and prefixed with the current source code path and line number.
`trace_none!` works similarly to `trace_err!` but it accepts an `Option` as argument. `None` is mapped to `StrResult::Err()` with no converted error message (because there is none).
`fmt_e!` is a macro to create a `StrResult<T>` from a hand specified error message. The result will be always `Err`.
ALVR defines some structures, macros and functions to ease error management. The base type used for error management is `StrResult<T>` that is an alias for `Result<T, String>`. Read more about Rust's Result type [here](https://doc.rust-lang.org/std/result/).
When chaining `trace_err!` from one function to the other, a stack trace is formed. Unlike other error management crates, I can decide in which point in the stack to insert trace information to make error messages more concise.
There are many ways of logging in ALVR, each one for different use-cases. To make use of them you should add `use alvr_common::prelude::*` at the top of the Rust source file.
To show an error (if present) the function `show_err` is defined. It shows an error popup if supported by the OS (currently only on Windows) and the message is also forwarded to `error!`.
Other similar functions are defined: `show_e` shows an error unconditionally, `show_err_blocking` blocks the current thread until the popup is closed, `show_warn` opens a warning popup. More similar functions are in `alvr/common/src/logging.rs`.
* `error!()`, `warn!()`, `info!()`, `debug!()` (reexported macros from the `log` crate). Log is processed depending on the logging backend.
* `show_e()` and `show_w()` are used to log a string message, additionally showing a popup.
* `show_err()`, `show_warn()` work similarly to `show_e()` and `show_w()`, but they accept a `Result<>` and log only if the result is `Err()`.
* `fmt_e!()` adds tracing information to a message and produces a `Err()`, that can be returned.
* `err!()` and `enone!()` are used with `.map_err()` and `.ok_or_else()` respectively, to map a `Result` or `Option` to a `StrResult`, adding tracing information.
* Some other similarly named functions and macros with similar functionality
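The idea behind `StrResult` and the tracing helpers can be sketched as follows. This is a simplified approximation rather than ALVR's actual code: the macro `trace_str_err!` and the functions `parse_port` and `socket_address` are hypothetical stand-ins invented for this example; only the `StrResult<T>` alias comes from the text above.

```rust
/// Same alias as described above: the error side is always a `String`.
type StrResult<T> = Result<T, String>;

/// Hypothetical stand-in for the tracing helpers: converts any displayable
/// error into a `String` prefixed with the call site's file and line.
macro_rules! trace_str_err {
    ($res:expr) => {
        $res.map_err(|e| format!("At {}:{}: {}", file!(), line!(), e))
    };
}

/// Hypothetical function that wraps a standard library error.
fn parse_port(text: &str) -> StrResult<u16> {
    trace_str_err!(text.parse::<u16>())
}

/// The `?` operator propagates the already traced error to the caller.
fn socket_address(ip: &str, port: &str) -> StrResult<String> {
    let port = parse_port(port)?;
    Ok(format!("{ip}:{port}"))
}
```

Because every error is already a `String`, functions with different underlying error types can be chained with `?` without defining conversion traits.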
### The messaging system
### The event system
The communication between the driver and the dashboard uses two methods. The dashboard can interrogate the server through an HTTP API. The server can notify the dashboard through logging. The server uses the function `log_id` to log a `LogId` instance (as JSON text). All log lines are sent to the dashboard through a websocket. The dashboard registers all log lines and searches for the log ID structures they contain; the dashboard then reacts accordingly.
While log IDs can contain any (serializable) type of data, it is preferred to use them only as notifications. Any type of data needed by the dashboard that should be persistent is stored in the session structure (more on this later), and the dashboard can request it any time.
Events are messages used internally in the driver and sent to dashboard instances. Events are generated with `send_event()` and are implemented on top of the logging system.
## The launcher
This is the layout of `Event`, in JSON form
The launcher is the entry point for the server application. It first checks that SteamVR is installed and setup properly and then launches it.
The launcher requires `%LOCALAPPDATA%/openvr/` to contain a valid UTF-8 formatted file `openvrpaths.vrpath`. This file is crucial because it contains the path of the installation folder of SteamVR, the paths of the current registered drivers and the path of the Steam `config/` folder.
```json
{
"timestamp": "<timestamp>",
"event_type": {
"id": "<EventType>",
"content": { <depends on id> }
}
}
```
### The bootstrap lifecycle
Log is a special kind of event:
1. The launcher is opened. First `openvrpaths.vrpath` is checked to exist and to be valid.
2. From `openvrpaths.vrpath`, the list of registered drivers is obtained. If the current instance of ALVR is registered do nothing. Otherwise stash all driver paths to a file `alvr_drivers_paths_backup.txt` in `%TEMP%` and register the current ALVR path.
3. SteamVR is killed and then launched using the URI `steam://rungameid/250820`.
4. The launcher tries to GET `http://127.0.0.1:8082` until success.
5. The launcher closes itself.
6. Once the driver loads, `alvr_drivers_paths_backup.txt` is restored into `openvrpaths.vrpath`.
```json
{
"timestamp": "<timestamp>",
"event_type": {
"id": "Log",
"content": {
"severity": "Error or Warn or Info or Debug",
"content": "<the message>"
}
}
}
```
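To make the `Log` layout above concrete, here is a small sketch that produces one such JSON line using only the standard library. The types `Severity` and `LogEvent`, the method `to_json_line` and the hand-written escaping are assumptions made for this example; a real implementation would more likely rely on a serialization library.

```rust
/// Severity levels listed in the `Log` layout above.
enum Severity {
    Error,
    Warn,
    Info,
    Debug,
}

/// Hypothetical in-memory form of a `Log` event.
struct LogEvent {
    timestamp: String,
    severity: Severity,
    content: String,
}

/// Minimal escaping for this sketch; it does not cover every control character.
fn escape_json(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out
}

impl LogEvent {
    /// Serializes the event as a single JSON line.
    fn to_json_line(&self) -> String {
        let severity = match self.severity {
            Severity::Error => "Error",
            Severity::Warn => "Warn",
            Severity::Info => "Info",
            Severity::Debug => "Debug",
        };
        format!(
            r#"{{"timestamp":"{}","event_type":{{"id":"Log","content":{{"severity":"{}","content":"{}"}}}}}}"#,
            escape_json(&self.timestamp),
            severity,
            escape_json(&self.content)
        )
    }
}
```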
### Other launcher functions
The driver logs events in JSON form to `session_log.txt`, one per line.
The launcher has the button `Reset drivers and retry` that attempts to fix the current ALVR installation. It works as follows:
Currently its use is limited, but eventually this will replace the current logging system, and logging will be built on top of the event system. The goal is to create a unified star-shaped network where each client and dashboard instance sends events to the server and the server broadcasts events to all other clients and dashboard instances. This should also unify the way the server communicates with clients and dashboards, making the dashboard just another client.
1. SteamVR is killed.
2. `openvrpaths.vrpath` is deleted and ALVR add-on is unblocked (in `steam/config/steamvr.vrsettings`).
3. SteamVR is launched and then killed again after a timeout. This is done to recreate the file `openvrpaths.vrpath`.
4. The current ALVR path is registered and SteamVR is launched again.
## Session and settings
The launcher can also be launched in "restart" mode, that is headless (no window is visible). This is invoked by the driver to bootstrap a SteamVR restart (since the driver cannot restart itself since it is a DLL loaded by SteamVR).
ALVR uses a unified configuration file, that is `session.json`. It is generated the first time ALVR is launched. This file contains the following top-level fields:
## Settings generation and data storage
* `"server_version"`: the current version of the streamer. It helps during a version upgrade.
* `"drivers_backup"`: temporary storage for SteamVR driver paths. Used by the dashboard.
* `"openvr_config"`: contains a list of settings that have been checked for a diff. It is used by C++ code inside the driver.
* `"client_connections"`: contains entries corresponding to known clients.
* `"session_settings"`: all ALVR settings, laid in a tree structure.
A common programming paradigm is to have a strict separation between UI and background logic. This generally helps with maintainability, but for settings management this becomes a burden, because for each change of the settings structure on the backend the UI must be manually updated. ALVR solves this by relying heavily on code generation.
### Procedural generation of code and UI
### Code generation on the backend (Rust)
ALVR lays out settings in a tree-like structure, in a way that the code itself can efficiently make use of. Settings can contain variants (specified in PascalCase in `session.json`), which represent mutually exclusive options.
On ALVR, settings are defined in one and only place, that is `alvr/common/src/data/settings.rs`. Rust structures and enums are used to construct a tree-like representation of the settings. Structs and enums are decorated with the derive macro `SettingsSchema` that deals with the backend side of the code generation.
While the hand-defined structs and enums represent the concrete realization of a particular settings configuration, `SettingsSchema` generates two other settings representations, namely the schema and the "default" representation (aka session settings).
The schema representation defines the structure and metadata of the settings (not the concrete values). While arrangement and position of the fields is inferred by the definition itself of the structures, the fields can also be decorated with metadata like `advanced`, `min`/`max`/`step`, `gui` type, etc. that is needed by the user interface.
The second generated representation is the "default" representation. This representation has a dual purpose: it is used to define the default values of the settings (used in turn by the schema generation step) and to store the settings values on disk (`session.json`).
But why not use the original hand-defined structures to store the settings on disk? This is because enums (which are tagged unions) create branching.
The branching is a desired behavior. Take the `Controllers` setting in the Headset tab as an example. If you uncheck it it means you *now* don't care about any other settings related to controllers. If we store this on disk using the original settings representation, all modifications to the settings related to the controllers are lost, but *then* you may want to recover these settings.
To solve this problem, the default/session representation transforms every enum into a struct, where every branch becomes a field, so all branches coexist at once, even unused ones.
ALVR uses the macro `SettingsSchema` in the `settings-schema` crate to generate auxiliary code, i.e. a schema and the "default representation" of the settings. This crate was created specifically for ALVR but can be used for other projects too.
### Code generation on the frontend (Javascript)
The schema is made of nested `SchemaNode`s that contain metadata. Some of the metadata is specified directly inside inline attributes in structures and enums.
One of the main jobs of the dashboard is to let the user interact with settings. The dashboard gets the schema from the driver and uses it to generate the user interface. The schema has every kind of data that the UI needs except for translations which are defined in `server_release_template/dashboard/js/app/nls`. This is because this type of metadata would obscure the original settings definition if it was defined inline, due to the large amount of text. The schema is also used to interpret the session data loaded from the server.
The "default representation" types (whose names are generated by concatenating the structure/enum name with `Default`) are structures that hold settings in a way that does not lose information about unselected variants; enums are converted to structs and variants that hold a value are converted to fields. The main goal of this is to meet the user expectation of not losing nested configuration when changing some options. The default representation is exactly what is saved inside `session.json` in `"session_settings"`.
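The following hand-written sketch shows what such a pair of types could look like for one setting. The setting `FrameSize`, its variants and all field names are hypothetical, and the output of the real `SettingsSchema` macro may be shaped differently; the point is only that the `Default` struct keeps the data of every variant at once.

```rust
/// Hypothetical setting with two mutually exclusive variants.
#[derive(Debug, PartialEq, Clone)]
enum FrameSize {
    Scale(f32),
    Absolute { width: u32, height: u32 },
}

/// Selector for the active variant (hypothetical name).
#[derive(Debug, PartialEq, Clone)]
enum FrameSizeVariant {
    Scale,
    Absolute,
}

/// Data of the `Absolute` variant, kept even when it is not selected.
#[derive(Debug, PartialEq, Clone)]
struct FrameSizeAbsoluteDefault {
    width: u32,
    height: u32,
}

/// Hand-written analogue of a generated "default representation":
/// the enum became a struct where every variant's data is a field.
#[derive(Debug, PartialEq, Clone)]
struct FrameSizeDefault {
    variant: FrameSizeVariant,
    scale: f32,
    absolute: FrameSizeAbsoluteDefault,
}

impl FrameSizeDefault {
    /// Collapses the stored representation into the concrete setting.
    fn to_settings(&self) -> FrameSize {
        match self.variant {
            FrameSizeVariant::Scale => FrameSize::Scale(self.scale),
            FrameSizeVariant::Absolute => FrameSize::Absolute {
                width: self.absolute.width,
                height: self.absolute.height,
            },
        }
    }
}
```

Switching `variant` from `Absolute` to `Scale` and back leaves `absolute.width` and `absolute.height` untouched, which is the behavior users expect from the settings UI.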
### The schema representation
Info about the various types of schema nodes can be found [here](https://github.com/zarik5/settings-schema-rs).
While the original structs and enums that define settings are named, the schema representation loses the type names; it is based on a single base enum `SchemaNode` that can be nested. `SchemaNode` defines the following variants:
The dashboard makes use of schema metadata and the default representation to generate the settings UI. The end result is that the settings UI layout closely matches the structures used internally in the code, and this helps understanding the inner workings of the code.
* `Section`: This is translated from `struct`s and struct-like `enum` variants data. It contains a list of named fields, that can be set to `advanced`. In the UI it is represented by a collapsible group of settings controls. The top level section is treated specially and it generates the tabs (Video, Audio, etc).
* `Choice`: This is translated from `enum`s. Each variant can have one or zero children. In the UI this is represented by a stateful button group. Only the active branch content is displayed.
* `Switch`: This is generated by the special struct `Switch`. This node type is used when it makes sense for a setting to be "turned off", and the setting has some associated specialized settings only when in the "on" state. In the UI this is similar to `Section` but it also has a checkbox. In the future this should be graphically changed to a switch.
* `Boolean`: Translated from `bool`.
* `Integer`/`Float`: Translated from integer and floating point types. They accept the metadata `min`, `max`, `step`, `gui`. `gui` can be `textBox`, `upDown` or `slider`. Only certain combinations of `min`/`max`/`step`/`gui` are valid.
* `Text`: Translated from `String`. In the UI this is a simple textbox.
* `Array`: Translated from Rust arrays. In the UI this is represented similarly to `Section`s, with the index as the field name. In the future this should be changed to look more like a table.
When upgrading ALVR, the session might have a slightly different layout: usually some settings have been added, removed, moved or renamed. ALVR handles this through an extrapolation process: it starts from the default session and replaces values with those taken from the old session file, with the help of the settings schema.
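The extrapolation idea can be approximated with a recursive merge over a generic tree of values. The sketch below is a simplification under stated assumptions: the `Value` type and the functions `extrapolate` and `object` are hypothetical, and the default tree itself plays the role of the schema, whereas ALVR uses its generated settings schema.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for JSON session data.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
    Object(BTreeMap<String, Value>),
}

/// Starts from `default` and takes a value from `old` only where the same
/// key exists with the same kind of value. Keys that exist only in `old`
/// are dropped; keys that exist only in `default` keep their default.
fn extrapolate(default: &Value, old: &Value) -> Value {
    match (default, old) {
        (Value::Object(def_fields), Value::Object(old_fields)) => Value::Object(
            def_fields
                .iter()
                .map(|(key, def_value)| {
                    let merged = match old_fields.get(key) {
                        Some(old_value) => extrapolate(def_value, old_value),
                        None => def_value.clone(),
                    };
                    (key.clone(), merged)
                })
                .collect(),
        ),
        (Value::Bool(_), Value::Bool(_))
        | (Value::Number(_), Value::Number(_))
        | (Value::Text(_), Value::Text(_)) => old.clone(),
        // Mismatching kinds: the old value cannot be reused.
        _ => default.clone(),
    }
}

/// Convenience constructor used to build example trees.
fn object(fields: &[(&str, Value)]) -> Value {
    Value::Object(
        fields
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect(),
    )
}
```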
There are also currently unused node types:
## The dashboard
* `Optional`: This is translated from `Option`. Similarly to `Switch`, this is generated from an enum that has one variant with data and one that doesn't. The reason behind the distinction is about the intention/meaning of the setting. Optional settings can either be "set" or "default". "Default" does not mean that the setting is set to a fixed default value, it means that ALVR can dynamically decide the value or let some other independent source decide the value, that ALVR might not even be aware of.
* `Vector` and `Dictionary`: Translated from `Vec<T>` and `Vec<(String, T)>` respectively. These types are unimplemented in the UI. They should represent a variable-sized collection of values.
The dashboard is the main way of interacting with ALVR. Functionality is organized in tabs.
### The session
### The User Interface
Settings (in the session settings representation) are stored inside `session.json`, together with other session data. The session structure is defined in `alvr/common/src/data/session.rs`. The session supports extrapolation, that is the recovery of data when the structure of `session.json` does not match the schema. This often happens during a server version update. The extrapolation is also used when the dashboard requests saving the settings, where the payload can be a preset, that is a deliberately truncated session file.
These are the main components:
## The connection lifecycle
TODO: Add screenshots
The code responsible for the connection lifecycle is located in `alvr/client/src/connection.rs` and `alvr/server/src/connection.rs`.
* Sidebar: is used to select the tab for the main content page.
* Connections tab: used to trust clients or add them manually specifying the IP
* Statistics tab: shows graphs for latency and FPS and a summary page
* Settings tab: settings page split between `Presets` and `All Settings`. `All Settings` are procedurally generated from a schema. `Presets` are controls that modify other settings.
* Installation tab: utilities for installation: setting firewall rules, registering the driver, launching the setup wizard.
* Logs tab: shows logs and events in a table.
* Debug tab: debugging actions.
* About tab: information about ALVR.
* Lower sidebar button: can be either "Launch SteamVR" or "Restart SteamVR", depending on the driver connection status
* Notification bar: shows log in a non-obstructive way.
The connection lifecycle can be divided into 3 steps: discovery, connection handshake and streaming.
### Driver communication
During multiple connection steps, the client behaves like a server and the server behaves like a client. This is because of the balance in responsibility of the two peers. The client becomes a portal to a PC, which can contain sensitive data. For this reason the server has to trust the client before initiating the connection.
The dashboard communicates with the driver in order to update its information and save configuration. This is done through a HTTP API, with base URL `http://localhost:8082`. These are the endpoints:
* `/api/dashboard-request`: This is the main URL used by the dashboard to send messages and data to the server. The body contains the specific type and body of the request.
* `/api/events`: This endpoint is upgraded to a websocket and is used for listening to events from the driver
* `/api/ping`: returns code 200 when the driver is alive.
The dashboard retains some functionality when the driver is not launched. It can manage settings and clients and perform installation actions, but clients cannot be discovered. Once the driver is launched, all these actions are performed by the server, requested through the HTTP API. This mechanism ensures that there are no data races.
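As a rough sketch of how a tool could check whether the driver is alive, the code below sends a plain HTTP request to the `/api/ping` endpoint using only the standard library. The function names, the one second timeout and the hand-written HTTP request are assumptions made for this example; they do not describe how the dashboard itself is implemented.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Builds a raw HTTP/1.1 request for the ping endpoint described above.
fn ping_request() -> String {
    "GET /api/ping HTTP/1.1\r\nHost: localhost:8082\r\nConnection: close\r\n\r\n".to_string()
}

/// Returns true if the status line of the response carries code 200.
fn is_success(response: &str) -> bool {
    response
        .lines()
        .next()
        .and_then(|status_line| status_line.split_whitespace().nth(1))
        == Some("200")
}

/// Returns true if something answering on port 8082 replies with code 200.
/// Any connection or I/O failure is treated as "driver not running".
fn driver_is_alive() -> bool {
    let mut stream = match TcpStream::connect("127.0.0.1:8082") {
        Ok(stream) => stream,
        Err(_) => return false,
    };
    // Timeout value chosen arbitrarily for this sketch.
    let _ = stream.set_read_timeout(Some(Duration::from_secs(1)));
    if stream.write_all(ping_request().as_bytes()).is_err() {
        return false;
    }
    let mut response = String::new();
    if stream.read_to_string(&mut response).is_err() {
        return false;
    }
    is_success(&response)
}
```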
### Driver lifecycle
The dashboard is able to launch and restart SteamVR, in order to manage the driver's lifecycle.
The driver launch procedure is as follows:
* The driver is registered according to the "Driver launch action" setting, if needed. By default, current SteamVR drivers are unregistered and backed up inside `session.json`.
* On Linux, the vrcompositor wrapper is installed if needed
* SteamVR is launched.
Once the driver shuts down, if there are backed up drivers, these are restored.
The driver restart procedure is as follows:
* The dashboard notifies the driver that it should be restarted.
* The driver sends a request for restart to the dashboard.
* The driver asks SteamVR to shutdown, never unregistering drivers.
* The dashboard waits for SteamVR to shutdown, otherwise killing it after a timeout.
* The dashboard relaunches SteamVR.
This might seem unnecessarily complicated. The reason for the first message round trip is to plug into the existing restart system used by settings invalidation, which is invoked from the driver itself. The reason why the driver cannot restart autonomously is that any auxiliary process spawned by the driver will block SteamVR shutdown or leave it in a zombie state.
## The streaming pipeline: Overview
The goal of ALVR is to bridge input and output of a PCVR application to a remote headset. In order to do this ALVR implements pipelines to handle input, video and audio. The tracking-video pipeline (also known as the motion-to-photon pipeline) is the most complex one and it can be summarized in the following steps:
* Poll tracking data on the client
* Send tracking to the driver
* Execute the PCVR game logic and render layers
* Compose layers into a frame
* Encode the video frame
* Send the encoded video frame to the client through the network
* Decode the video frame on the client
* Perform more compositor transformations
* Submit the frame to the VR runtime
* The runtime renders the frame during a vsync.
## Client-driver communication
ALVR uses a custom protocol for client-driver communication. ALVR supports UDP and TCP transports. USB connection is supported although not as a first class feature; you can read more about it [here](https://github.com/alvr-org/ALVR/wiki/ALVR-wired-setup-(ALVR-over-USB)).
### Discovery
ALVR discovery protocol has initial support for a cryptographic handshake but it is currently unused.
Usually the first step to establish a connection is discovery. When the server discovers a client it shows it in the "New clients" section in the Connection tab. The user can then trust the client and the connection is established.
When ALVR is launched for the first time on the headset, a hostname, certificate and secret are generated. The client then broadcasts its hostname, certificate and ALVR version (`ClientHandshakePacket`). The server has a looping task that listens for these packets and registers the client entry, saving hostname and certificate, if the client version is compatible.
If the client is visible and trusted on the server side, the connection handshake begins.
ALVR uses a UDP socket on port 9943 for discovery. The client broadcasts a packet and waits for the driver to respond. It's the client that broadcasts and it's the driver that then asks for a connection: this is because of the balance in responsibility of the two peers. The client becomes a portal to a PC, which can contain sensitive data. For this reason the server has to trust the client before initiating the connection.
### Connection handshake
This is the layout of the discovery packet
The client listens for incoming TCP connections with the `ControlSocket` from the server. Once connected the client sends its headset specifications (`HeadsetInfoPacket`). The server then combines this data with the settings to create the configuration used for streaming (`ClientConfigPacket`) that is sent to the client. In particular, this last packet contains the dashboard URL, so the client can access the server dashboard. If this streaming configuration is found to invalidate the current ALVR OpenVR driver initialization settings (`OpenvrConfig` inside the session), SteamVR is restarted.
After this, if everything went right, the client discovery task is terminated, and after the server sends the control message `StartStream` the two peers are considered connected, but the procedure is not concluded. The next step is the setup of streams with `StreamSocket`.
| Prefix | Protocol ID | Hostname |
| :---------------: | :---------: | :------: |
| "ALVR" + 0x0 x 12 | 8 bytes | 32 bytes |
* The prefix is used to filter packets and ensure a packet is really sent by an ALVR client
* The protocol ID is a unique version identifier calculated from the semver version of the client. If the client version is *semver-compatible* with the streamer, the protocol ID will match.
* Hostname: the hostname is a unique identifier for a client. When a client is launched for the first time, a hostname is chosen and it persists across successive launches. It is reset when the app is upgraded or downgraded.
The format of the packet can change between major versions, but the prefix must remain unchanged, and the protocol ID must be 8 bytes.
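The layout in the table above can be sketched as follows. This is a minimal illustration and not ALVR's actual code: the function names are hypothetical, and the byte order of the protocol ID (little-endian here) and the zero padding of the hostname are assumptions that the text does not confirm.

```rust
// Sketch of the discovery packet layout: prefix, protocol ID, hostname.
// Assumptions: little-endian protocol ID, UTF-8 hostname zero-padded to 32 bytes.
const PREFIX_LEN: usize = 16; // "ALVR" followed by 12 zero bytes
const ID_LEN: usize = 8;
const HOSTNAME_LEN: usize = 32;
const HOSTNAME_START: usize = PREFIX_LEN + ID_LEN;
const PACKET_LEN: usize = HOSTNAME_START + HOSTNAME_LEN;

fn prefix() -> [u8; PREFIX_LEN] {
    let mut p = [0u8; PREFIX_LEN];
    p[..4].copy_from_slice(b"ALVR");
    p
}

/// Build a discovery packet; returns None if the hostname does not fit.
fn encode_discovery(protocol_id: u64, hostname: &str) -> Option<[u8; PACKET_LEN]> {
    let name = hostname.as_bytes();
    if name.len() > HOSTNAME_LEN {
        return None;
    }
    let mut packet = [0u8; PACKET_LEN];
    packet[..PREFIX_LEN].copy_from_slice(&prefix());
    packet[PREFIX_LEN..HOSTNAME_START].copy_from_slice(&protocol_id.to_le_bytes());
    packet[HOSTNAME_START..HOSTNAME_START + name.len()].copy_from_slice(name);
    Some(packet)
}

/// Parse a datagram; returns (protocol ID, hostname), or None if the size
/// or prefix does not match an ALVR discovery packet.
fn decode_discovery(data: &[u8]) -> Option<(u64, String)> {
    if data.len() != PACKET_LEN || !data.starts_with(&prefix()) {
        return None;
    }
    let mut id = [0u8; ID_LEN];
    id.copy_from_slice(&data[PREFIX_LEN..HOSTNAME_START]);
    let raw = &data[HOSTNAME_START..];
    // Drop the zero padding after the hostname.
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    let hostname = String::from_utf8(raw[..end].to_vec()).ok()?;
    Some((u64::from_le_bytes(id), hostname))
}
```

A receiver would compare the decoded protocol ID with its own and ignore packets where the two differ, since a mismatch indicates an incompatible client version.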
### Streaming
The streams created from `StreamSocket` (audio, video, tracking, etc) are encapsulated in async loops that are all awaited concurrently. One of these loops is the receiving end of the `ControlSocket`.
While streaming, the server only sends the control message `KeepAlive` periodically. The client can send `PlayspaceSync` (when the view is recentered), `RequestIDR` (in case of packet loss), and `KeepAlive`.
ALVR uses two sockets for streaming: the control socket and stream socket. Currently these are implemented with async code; there's a plan to move this back to sync code.
### Disconnection
The control socket uses the TCP transport; it is used to exchange small messages between client and server. ALVR requires TCP here to ensure reliability.
When the control socket encounters an error while sending or receiving a packet (for example with `KeepAlive`) the connection pipeline is interrupted and all looping tasks are canceled. A destructor callback (guard) is then run for objects or tasks that do not directly support canceling.
The stream socket can use UDP or TCP; it is used to send large packets and/or packets that do not require reliability. ALVR is robust to packet losses and packet reordering.
## The streaming socket
The specific packet format used over the network is not clearly defined since ALVR uses multiple abstraction layers to manipulate the data (bincode, tokio Length Delimited Coding). Furthermore, packets are broken up into shards to ensure they can support the MTU when using UDP.
`StreamSocket` is an abstraction layer over multiple network protocols. It currently supports UDP and TCP, but it is designed to also support QUIC without a big API refactoring. The `StreamSocket` API is inspired by the QUIC protocol, where multiple streams can be multiplexed on the same socket.
Since the amount of data streamed is large, the socket buffer size is increased both on the driver side and on the client.
Why not use one socket per stream? Regarding UDP, this does not have any particular advantage. The maximum transmission speed is still determined by the physical network controller and router. Regarding TCP, having multiple concurrent open sockets is even disadvantageous. TCP is a protocol that makes adjustments to the transmission speed depending on periodic network tests. Multiple TCP sockets can compete with each other for the available bandwidth, potentially resulting in unbalanced and unpredictable bandwidth between the sockets. Having one single multiplexed socket solves this by moving the bandwidth allocation problem to the application side.
## SteamVR driver
### Packet layout
The driver is the component responsible for most of the streamer functionality. It is implemented as a shared library loaded by SteamVR. It implements the [OpenVR API](https://github.com/ValveSoftware/openvr) in order to interface with SteamVR.
A packet is laid out as follows:
Using the OpenVR API, ALVR pushes tracking and button data to SteamVR using `vr::VRServerDriverHost()->TrackedDevicePoseUpdated()`. SteamVR then returns a rendered game frame with the associated pose used for rendering. On Windows, frames are retrieved by implementing the `IVRDriverDirectModeComponent` interface: SteamVR calls `IVRDriverDirectModeComponent::Present()`. On Linux this API doesn't work, and so ALVR uses a WSI Vulkan layer to intercept display driver calls done by vrcompositor. The pose associated with the frame is obtained from the vrcompositor execution stack with the help of libunwind.
| Stream ID | Packet index | Header | Raw buffer |
| :-------: | :----------: | :------: | :--------: |
| 1 byte | 8 bytes | variable | variable |
## Client and driver compositors
The packet index is relative to a single stream. It is used to detect packet loss.
Both header and raw buffer can have variable size, even from one packet to the other in the same stream. The header is serialized and deserialized using [bincode](https://github.com/servo/bincode) and so the header size can be obtained deterministically.
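As a rough sketch of how the fixed part of this layout and the per-stream packet index could be used, the snippet below splits a packet and counts skipped indices. The names `split_packet` and `LossDetector` are hypothetical, and the little-endian packet index is an assumption; the bincode header and the raw buffer are left unparsed.

```rust
use std::collections::HashMap;

const STREAM_ID_LEN: usize = 1;
const PACKET_INDEX_LEN: usize = 8;
const FIXED_LEN: usize = STREAM_ID_LEN + PACKET_INDEX_LEN;

/// Split a packet into (stream ID, packet index, header + raw buffer bytes).
/// Assumption: the packet index is stored little-endian.
fn split_packet(data: &[u8]) -> Option<(u8, u64, &[u8])> {
    if data.len() < FIXED_LEN {
        return None;
    }
    let mut index = [0u8; PACKET_INDEX_LEN];
    index.copy_from_slice(&data[STREAM_ID_LEN..FIXED_LEN]);
    Some((data[0], u64::from_le_bytes(index), &data[FIXED_LEN..]))
}

/// Tracks the next expected packet index independently for every stream.
#[derive(Default)]
struct LossDetector {
    next_index: HashMap<u8, u64>,
}

impl LossDetector {
    /// Returns how many indices were skipped on this stream before `index`.
    /// Late (reordered) packets report zero and do not move the counter back.
    fn register(&mut self, stream_id: u8, index: u64) -> u64 {
        let expected = self.next_index.entry(stream_id).or_insert(index);
        let lost = index.saturating_sub(*expected);
        *expected = (*expected).max(index + 1);
        lost
    }
}
```

Because the index is relative to a single stream, a gap on the video stream says nothing about the audio stream, which is why the detector keeps one counter per stream ID.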
ALVR is essentially a bridge between PC and headset that transmits tracking, audio and video. But it also implements some additional features to improve image quality and streaming performance. To this goal, ALVR implements Fixed Foveated Rendering (FFR) and color correction.
### Throttling buffer
The client compositor is implemented in OpenGL, while on the server it's either implemented with DirectX 11 on Windows or Vulkan on Linux. There are plans to move all compositor code to the graphics abstraction layer [wgpu](https://github.com/gfx-rs/wgpu), mainly for unifying the codebase.
A throttling buffer is a traffic shaping tool to avoid packet bursts, that often lead to packet loss.
It's important to note that ALVR's compositors are separate from the headset runtime compositor and SteamVR compositors. The headset runtime compositor is part of the headset operating system and controls compositing between different apps and overlays, and prepares the image for display (with lens distortion correction, chromatic aberration correction, mura and ghosting correction). On the driver side, on Windows ALVR takes responsibility for compositing layers returned by SteamVR. The only responsibility of SteamVR is converting the frame into a valid DXGI texture if the game uses OpenGL or Vulkan graphics. On Linux ALVR grabs Vulkan frames that are already composited by vrcompositor. This introduces additional challenges since vrcompositor implements async reprojection, which disrupts our head tracking mechanism.
If the throttling buffer is enabled, the packets are fragmented/recombined into buffers of a predefined size. The size should be set according to the supported MTU of the current network configuration, to avoid undetected packet fragmentation at the IP layer.
### Foveated encoding
The current implementation is similar to the leaky bucket algorithm, but it uses some statistical machinery (`EventTiming` in fixed latency mode set to 0) to dynamically determine the optimal time interval between packets, such that the "bucket" does not overflow and the latency remains minimal.
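To illustrate the basic idea, here is a simplified constant-rate pacer. It is not the `EventTiming`-driven implementation described above: the interval is derived from a fixed target bitrate, and the function names and parameters are hypothetical. `shard_size` must be greater than zero.

```rust
use std::time::Duration;

/// Split a payload into shards of at most `shard_size` bytes.
fn fragment(payload: &[u8], shard_size: usize) -> Vec<&[u8]> {
    payload.chunks(shard_size).collect()
}

/// Time between two shards so that the link drains at `bitrate_bps`.
fn shard_interval(shard_size: usize, bitrate_bps: u64) -> Duration {
    // bits per shard divided by bits per second, in nanoseconds
    Duration::from_nanos(shard_size as u64 * 8 * 1_000_000_000 / bitrate_bps)
}

/// Send offsets, relative to the first shard, for every shard of a payload.
fn schedule(payload_len: usize, shard_size: usize, bitrate_bps: u64) -> Vec<Duration> {
    let count = (payload_len + shard_size - 1) / shard_size;
    let interval = shard_interval(shard_size, bitrate_bps);
    (0..count as u32).map(|i| interval * i).collect()
}
```

Spreading the shards of a frame over time like this trades a small amount of added latency for fewer bursts; a dynamically estimated interval, as in the text, aims to keep that added latency as low as the network allows.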
Foveated rendering is a technique where frame images are individually compressed in a way that the human eye barely detects the compression. In particular, the center of the image is kept at original resolution, and the rest is compressed. ALVR refers to foveated rendering as "Foveated encoding" to clarify its scope. In native standalone or PCVR apps, foveated rendering reduces the load on the GPU by rendering parts of the image at lower resolution. In ALVR's case frames are still rendered at full resolution, but are then "encoded" (compressing the outskirts of the image) before actually encoding and transmitting them. The image is then re-expanded on the client side after decoding and before display.
## Event timing
Currently ALVR supports only fixed foveation, but support for tracked eye foveation is planned.
`EventTiming` is a general purpose mathematical tool used to manage timing for cyclical processes. Some "enqueue" and "dequeue" events are registered and `EventTiming` outputs some timing hints to minimize the queuing time for the next events.
In its history, ALVR implemented different algorithms for foveated encoding. The first one is "Warp", where the image is compressed in an elliptical pattern using the tangent function to define the compression ratio radially. A problem with this algorithm is that it causes the image to become blurry. [Here](https://www.shadertoy.com/view/3l2GRR) is a demo of this algorithm in action. The second algorithm used was "Slices", where the image is sliced up into 9 sections (center, edges, corners), compressed to different degrees and then re-packed together into a single rectangular frame. The main issue with this algorithm was its complexity. You can find a demo [here](https://www.shadertoy.com/view/WddGz8). The current algorithm in use is a reimplementation of Oculus AADT (Axis-Aligned Distorted Transfer), which simply compresses the lateral edges of the image horizontally and the top and bottom edges vertically. This algorithm has less compression power but it's much simpler and less taxing on the Quest's GPU.
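The axis-aligned idea can be illustrated along a single axis with a piecewise-linear mapping. This is a simplified sketch and not ALVR's shader code: the struct, its fields and the constant edge ratio are assumptions made for illustration. Applying the same mapping independently to the horizontal and vertical axes gives the 9-region layout.

```rust
/// One axis of a simplified axis-aligned foveation mapping.
/// Pixels inside `center_start..center_end` keep full resolution;
/// pixels outside are shrunk by `edge_ratio` (assumed > 1.0).
#[derive(Clone, Copy)]
struct AxisFoveation {
    size: f64,
    center_start: f64,
    center_end: f64,
    edge_ratio: f64,
}

impl AxisFoveation {
    fn left(&self) -> f64 {
        self.center_start / self.edge_ratio
    }

    fn center(&self) -> f64 {
        self.center_end - self.center_start
    }

    /// Size of the compressed image along this axis.
    fn compressed_size(&self) -> f64 {
        self.left() + self.center() + (self.size - self.center_end) / self.edge_ratio
    }

    /// Source coordinate -> compressed coordinate (streamer side).
    fn compress(&self, x: f64) -> f64 {
        if x < self.center_start {
            x / self.edge_ratio
        } else if x < self.center_end {
            self.left() + (x - self.center_start)
        } else {
            self.left() + self.center() + (x - self.center_end) / self.edge_ratio
        }
    }

    /// Compressed coordinate -> source coordinate (client side).
    fn expand(&self, u: f64) -> f64 {
        if u < self.left() {
            u * self.edge_ratio
        } else if u < self.left() + self.center() {
            self.center_start + (u - self.left())
        } else {
            self.center_end + (u - self.left() - self.center()) * self.edge_ratio
        }
    }
}
```

With a 2048 pixel axis, a center region from 512 to 1536 and an edge ratio of 2, the compressed axis is 1536 pixels long, so the encoder processes 25% fewer pixels along that axis.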
Currently, `EventTiming` is used for the stream socket throttling buffer and audio implementations, but it will be also used for video frame timing (to reduce latency and jitter), total video latency estimation (to reduce the black pull and positional lag), controller timing and maybe also controller jitter.
### Color correction
`EventTiming` supports two operation modes: fixed latency and automatic latency.
Color correction is implemented on the server and adds simple brightness, contrast, saturation, gamma and sharpening controls. It's implemented on the server for performance reasons and to avoid amplifying image artifacts caused by transcoding.
### Fixed latency mode
## Video transcoding
In fixed latency mode, `EventTiming` calculates the average latency between corresponding enqueue and dequeue events.
To be able to send frames from driver to client through the network, they have to be compressed, since current WiFi technology doesn't allow sending the amount of data of raw frames. Doing a quick conservative calculation, let's say we have 2 x 2048x2048 eye images, 3 color channels, 8 bits per channel, sent 72 times per second: that would amount to about 14.5 Gbps.
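The estimate above is a plain product of the listed quantities. A small sketch (the function names are made up for this example):

```rust
/// Raw (uncompressed) video bitrate in bits per second.
fn raw_bitrate_bps(
    width: u64,
    height: u64,
    eyes: u64,
    channels: u64,
    bits_per_channel: u64,
    fps: u64,
) -> u64 {
    width * height * eyes * channels * bits_per_channel * fps
}

/// Same value expressed in gigabits per second (10^9 bits).
fn raw_bitrate_gbps(bps: u64) -> f64 {
    bps as f64 / 1e9
}
```

For the numbers in the text, `raw_bitrate_bps(2048, 2048, 2, 3, 8, 72)` gives 14,495,514,624 bits per second, which is roughly 14.5 Gbps.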
Todo
ALVR uses the h264 and HEVC video codecs for compression. These codecs were chosen since they have hardware decoding support on Android and generally hardware encoding support on the PC side. On Windows, the driver uses NvEnc for Nvidia GPUs and AMF for AMD GPUs; on Linux ALVR supports VAAPI, NvEnc and AMF through FFmpeg. In case the GPU doesn't support hardware encoding, on both Windows and Linux ALVR supports software encoding with x264 (through FFmpeg), although the performance is often insufficient for a smooth experience. The client supports only MediaCodec, which is the API to access hardware video codecs on Android.
### Automatic latency mode
Todo
## Motion-to-photon pipeline
Todo
## Foveated encoding
Foveated encoding is a technique where frame images are individually compressed in a way that the human eye barely detects the compression. Particularly, the center of the image is kept at original resolution, and the rest is compressed. In practice, first the frames are re-rendered on the server with the outskirts of the frame "squished". The image is then transmitted to the client and then it gets re-expanded by using an inverse procedure.
But why does this work? The human eye has increased acuity in the center of the field of vision (the fovea) with respect to the periphery.
Foveated encoding should not be confused with foveated rendering, where the image is rendered to begin with at a lower resolution in certain spots. Foveated encoding will NOT lower your GPU usage, only the network usage.
Currently ALVR does not directly support foveated encoding in the strict sense; instead it uses *fixed* foveated encoding. In a traditional foveated encoding application, the eyes are tracked, so that only what is directly looked at is rendered at higher resolution. But currently none of the headsets supported by ALVR support eye tracking. For this reason, ALVR does foveated encoding by pretending the user is looking straight at the center of the image, which most of the time is true.
Here are explained three foveated encoding algorithms.
### Warp
Developed by @zarik5. This algorithm applies an image compression that best matches the actual acuity graph of the human eye. It compresses the image radially (with an ellipse as the base) from a chosen spot in the image, with a chosen monotonic function. This algorithm makes heavy use of derivatives and inverse functions. It is implemented using a chain of shaders (a shader is a small piece of code that runs on the GPU for performance reasons). You can explore an interactive demo at [this link](https://www.shadertoy.com/view/3l2GRR).
This algorithm is actually NOT used by ALVR. It used to be, but it got replaced by the "slices" method. The warp method has a fatal flaw: the pixel alignment is not respected. This causes resampling that makes the image look blurry.
### Slices
Developed by @zarik5. This is the current algorithm used by ALVR for foveated encoding. The frame is cut into 9 rectangles (with 2 vertical and 2 horizontal cuts). Each rectangle is rendered at a different compression level. The center rectangle is uncompressed, the top/bottom/left/right rectangles are compressed 2x, and the corner rectangles are compressed 4x. These cuts are actually virtual (mathematical) cuts that are executed all at once in a single shader pass. All slices are neatly packed to form a new rectangular image. You can explore an interactive demo at [this link](https://www.shadertoy.com/view/WddGz8).
This algorithm is much simpler than the warp method but it is still quite complex. The implementation takes into account pixel alignment and uses some margins in the rectangles to avoid color bleeding. Like the warp algorithm, the slices method was designed to support eye tracking once it becomes available in consumer hardware.
### Axis-Aligned Distorted Transfer (AADT)
This algorithm was developed by Oculus for the Oculus Link implementation. It is simpler than the other two methods and the end result looks better, but it has less compression power. Like the slices algorithm, the image is cut into 9 rectangles where each rectangle is compressed independently. However, the top and bottom rectangles are compressed only vertically, and the left and right only horizontally. This type of compression lends itself well to images rendered in VR headsets, since it works in the same direction as (and not against) the image distortion needed for lens distortion correction.
It is planned to replace the slices method with AADT in the future.
h264 and HEVC compression works on the assumption that consecutive frames are similar to each other. Each frame is reconstructed from past frames plus some small additional data. For this reason, packet losses may cause glitches that persist many frames after the missing frame. When ALVR detects packet losses, it requests a new IDR frame from the encoder. An IDR frame is a packet that contains all the information to build a whole frame by itself; the encoder will ensure that successive frames will not rely on frames older than the last requested IDR.
## Audio
Todo
Game audio is captured on the PC and sent to the client, and microphone audio is captured on the client and sent to the PC. The Windows and Linux implementations once again differ. On Windows, game audio is captured from a loopback device; microphone audio is sent to virtual audio cable software to expose audio data from a (virtual) input device. On Linux the microphone does not work out-of-the-box, but there is a bash script available for creating and plugging into pipewire audio devices.
---------------------------
Document written by @zarik5
Unlike for video, audio is sent as a raw PCM waveform and new packets do not rely on old packets. But packet losses may still cause popping, which happens when there is a sudden jump in the waveform. To mitigate this, when ALVR detects a packet loss (or a buffer overflow or underflow) it will render a fadeout or cross-fade.
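The fade-out and cross-fade mentioned above can be sketched with linear gain ramps on blocks of `f32` PCM samples. The sample format, the block-based interface and the linear ramp shape are assumptions made for this illustration, not a description of ALVR's audio code.

```rust
/// Linearly fade a block of samples to silence, so the last sample is 0.
fn fade_out(samples: &mut [f32]) {
    let n = samples.len();
    for (i, s) in samples.iter_mut().enumerate() {
        *s *= 1.0 - (i + 1) as f32 / n as f32;
    }
}

/// Cross-fade from `old` to `new`: out[i] = old[i] * (1 - t) + new[i] * t,
/// with t ramping up to 1 at the end of the shorter block.
fn cross_fade(old: &[f32], new: &[f32]) -> Vec<f32> {
    let n = old.len().min(new.len());
    (0..n)
        .map(|i| {
            let t = (i + 1) as f32 / n as f32;
            old[i] * (1.0 - t) + new[i] * t
        })
        .collect()
}
```

Both helpers remove the sudden jump in the waveform: the fade-out ends the old signal at zero before a gap, and the cross-fade blends the last known samples into the newly received ones.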
## Tracking and display timing
Handling head and controller tracking is tricky for VR applications, and even more for VR streaming applications.
In a normal native VR application, tracking is polled at the beginning of the rendering cycle and is used to render the eye views from a certain perspective and to render the controller or hand models. When the game finishes rendering the frame, it submits it to the VR runtime, which displays it on screen. Between the time tracking is polled and the time the frame is displayed on screen, one or more frame durations may have passed (for example, at 72 fps the frame duration is about 13.9 ms). Our eyes are very sensitive to latency, especially for orientation, so VR runtimes implement image reprojection (Oculus calls it Asynchronous Time Warp). Reprojection works by rendering the frame rotated in 3D to compensate for the difference in orientation between the tracking pose polled at the beginning of the rendering cycle and the actual pose of the headset at the time of vsync, when the image should be pushed to the display. To be able to correctly rotate the image, the runtime also needs to know the timestamp used for polling tracking, which can be the time of the poll or, better, the predicted time of the vsync. If a time in the future is used for the tracking poll, the polled tracking will be extrapolated.
For VR streaming applications, the pipeline is similar, except that tracking is polled for a more distant point in the future, to compensate for the whole transcoding pipeline, and it's not trivial to decide on how much to predict in the future. ALVR calculates the prediction offset by reading how much time passes between the tracking poll time and the time a frame rendered with the same tracking is submitted. These interval samples are averaged and then used for future tracking polls. (To calculate the correct total latency you also need to add the VR runtime compositor latency, which in the dashboard latency graph is shown as "Client VSync").
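The averaging step described above could look like the following sketch. The type name `PredictionOffset` and the sliding window of the last `capacity` samples are assumptions; the text only says that the interval samples are averaged.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Averages the most recent poll-to-submit intervals.
struct PredictionOffset {
    samples: VecDeque<Duration>,
    capacity: usize, // assumed > 0
}

impl PredictionOffset {
    fn new(capacity: usize) -> Self {
        Self {
            samples: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Register the time between a tracking poll and the submission
    /// of the frame rendered with that tracking sample.
    fn add_sample(&mut self, poll_to_submit: Duration) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(poll_to_submit);
    }

    /// How far into the future the next tracking poll should predict.
    /// Returns zero until at least one sample has been registered.
    fn offset(&self) -> Duration {
        if self.samples.is_empty() {
            return Duration::from_millis(0);
        }
        let total: Duration = self.samples.iter().sum();
        total / self.samples.len() as u32
    }
}
```

A bounded window lets the offset follow changes in encoder or network load; a longer window gives a steadier but slower-reacting prediction.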
On the streamer side, ALVR needs to work around an OpenVR API limitation. SteamVR returns frames with their pose, but ALVR is then responsible for matching the pose with one of the poses submitted previously and recovering its timestamp.
## Other streams
There are some other kinds of data which can be streamed without requiring any special timing. These are button presses and haptics, respectively sent from client to driver and from driver to client.
## Upcoming
### Phase sync
Phase sync is not a single algorithm but many that share similar objectives: reducing latency or jitter in the rendering/streaming pipeline. The term "phase sync" comes from Oculus, which uses it to describe its algorithm for reducing latency in its OpenXR runtime by starting the rendering cycle as late as possible to reduce waiting time before the vsync.
In general, a phase sync algorithm is composed of two parts: a queue that holds data resources or pointers, and a statistical model to predict event times. The statistical model is fed with duration or other kinds of timing samples and as output it returns a refined time prediction for a recurring event. The statistical model could be simple and just aim for an average-controlled event, or more complex and aim for submitting before a deadline; the second case needs to take into account the variance of the timing samples. Unlike the Oculus implementation, these statistical models can be highly configurable to tune the target mean or target variance.
There are a few phase sync algorithms planned to be implemented: frame submission timing (to reduce frame queueing on the client, controlled by shifting the phase of the driver rendering cycle), SteamVR tracking submission timing (to make sure SteamVR is using exactly the tracking sample we want) and tracking poll timing (to reduce queuing on the server side).
## Sliced encoding
Sliced encoding is another algorithm showcased by Oculus and it's about reducing latency by parallelizing work. In a simple streaming pipeline, frames are processed sequentially: rendering, then encoding, then transmission, then decoding. There is already some degree of parallelism, as rendering, encoding, transmission, and decoding of different frames can happen at the same time. Sliced encoding can help in reducing encoding and decoding time, as the frames are split into "slices". This allows for more efficient utilization of hardware encoders/decoders, or even the use of hardware and software codecs in parallel. It's crucial to note that network latency cannot be optimized. Given this network constraint, sliced encoding can reduce waiting times between encoder/transmission and transmission/decoder, as each encoded slice can be transmitted immediately and doesn't have to wait for the rest of the frame to be encoded (and a similar reasoning applies to the decoding side).