Progress Report 2017 September (#61)
* Add 2017september boilerplate * Add Candidate PRs * Add header template * Add GitHub links to PRs * Finish adding candidate PRs * Removed duplicate candidate PR * Write intro paragraph * Write outtro paragraphs * Quickfix: reorganize * Write #2951 * Write #2858 * Write #2858 * Quickfix: Add NOTE to #2858 * Write #2952 * Resort PRs * Add #2912 * Write #2952 * Write #2952 * Write #2952 * Write #2952 * Add morton-koopa.png * Write #2927 * Write #2927 * Write #2927 * Write #2951 * Write #2842 * Write #2958 * Quickfix: Morton Koopa image in #2951 * Shorten PR Names * Write #2915 * Write #2865 * Write #2900 * Write #2912 * Fix some FIXMEs * Write #2921 * Fix more FIXMEs * Apply fixes suggested by @Subv * Add fix suggested by @wwylele * Add fixes suggested by @acnleditor2 * Add Recordings citra-emu/citra#2517 * Added audio embeds for citra-emu/citra#2858 * Float morton-koopa.png to the right * Add fixes suggested by @Schplee * Add final paragraph by @Schplee * Add src according to citra-emu/citra-web#62 * Add callout from citra-emu/citra-web#63 * Add fixes suggested by @bunnei * Add banner by @bunnei * Set metadata to prepare for publish
This commit is contained in:
parent
50bae0443b
commit
a06121e1e5
|
@ -0,0 +1,361 @@
|
|||
+++
date = "2017-11-09T09:46:00-04:00"
title = "Citra Progress Report - 2017 September"
tags = [ "progress-report" ]
author = "anodium"
forum = 5005
+++
|
||||
|
||||
Winter arrives once more, and as I mentioned in [August's progress report](/entry/citra-progress-report-2017-august),
I am extremely excited for what's in store. In fact, I've decided to separate many
of the really big goodies into their own articles, which should be coming up in the
next few weeks.

There have also been many changes this month that improve the speed of emulation across
the board, on top of the usual improvements in accuracy and features. Because
of that, I've dubbed this month #Speedtember. Let's dive right in.

{{% callout %}}
Hello everyone! We're all terribly sorry for the delay in getting this progress
report out the door, but our main technical writer, [anodium](https://github.com/anodium),
was just a bit busy surviving both Hurricane Irma and Hurricane Maria. Although she's a
trooper and claims it's not an excuse for the delay, we find that her personal
safety is a tad more important. We're all glad that she's safe and sound, and in
a state where she can keep pumping out quality articles for Citra!
{{% /callout %}}
|
||||
|
||||
## [Switchable Page Tables](https://github.com/citra-emu/citra/pull/2952) by [MerryMage](https://github.com/MerryMage)

Citra has a component called [dynarmic](https://github.com/MerryMage/dynarmic),
which recompiles ARM11 code to x86-64 code at run time, and then executes that
generated code, rather than interpreting the ARM11 instructions directly.

Because the 3DS has a 32-bit address bus, it can address 2^32 unique memory locations.
And because the 3DS can address data down to a byte, it can address up to 2^32
unique bytes, or about 4 gigabytes of memory. When considering that no 3DS has ever
been released with more than 256 *mega*bytes of memory, this sounds absurd! And
it is... unless you consider that a 3DS uses chunks of that huge address space to
address peripherals, among other things. This is called memory-mapped input/output
(MMIO), and it is a great use of millions of addresses that would otherwise have
been ignored. It also allows IO to be handled in exactly the same way as memory,
so the design can be a bit simpler as it doesn't need special circuitry to handle IO.

Herein lies our problem. Because that code is now being run on a PC, those MMIO
devices don't actually exist anymore, so Citra needs to handle those reads and
writes itself. There are a few ways to go about it, but the simplest and most naïve
is to replace every memory read or write with a call to a function that checks whether
that address is mapped to memory or IO. Unfortunately, this is extremely slow, and we can't
afford to have extremely slow address translation when games can access memory
upwards of a few hundred thousand times per second.
|
||||
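To make the naïve approach concrete, here is a minimal sketch of what "a function that checks the address" could look like. The memory map, `ReadIO`, and `Read8` are made-up names and values for illustration only; this is not Citra's actual code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical memory map, for illustration only: one small RAM region
// and one small MMIO region.
constexpr std::uint32_t RAM_BASE = 0x08000000;
constexpr std::uint32_t RAM_SIZE = 0x1000;
constexpr std::uint32_t IO_BASE = 0x1EC00000;
constexpr std::uint32_t IO_SIZE = 0x1000;

std::array<std::uint8_t, RAM_SIZE> ram{};

// Stand-in for reading a register of an emulated peripheral.
std::uint8_t ReadIO(std::uint32_t offset) {
    return static_cast<std::uint8_t>(offset & 0xFF);
}

// Every guest read goes through this function and pays for the range
// checks, even though almost all accesses hit plain memory.
std::uint8_t Read8(std::uint32_t vaddr) {
    if (vaddr >= RAM_BASE && vaddr - RAM_BASE < RAM_SIZE)
        return ram[vaddr - RAM_BASE];
    if (vaddr >= IO_BASE && vaddr - IO_BASE < IO_SIZE)
        return ReadIO(vaddr - IO_BASE);
    return 0; // unmapped address
}
```

The cost is not any single check, but the fact that a function call and a chain of comparisons sit in front of every single memory access the game makes.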
|
||||
With this PR, [MerryMage](https://github.com/MerryMage) has changed this behaviour so
that rather than replacing a read/write with a function call, the recompiled code instead translates the
address using a page table, and then tries to access that address directly. In the
page table, all addresses that map to memory simply have a memory address written down.
But addresses that map to IO have address 0 written down. Trying to read or
write to memory address 0 on x86 is illegal for every process except the
operating system... and Citra tries to do it anyway!

When an invalid memory address (or a memory address that the process doesn't have
permission to access) is read from or written to, x86 CPUs raise a page fault exception.
Citra takes advantage of this behaviour by registering an exception handler
for page faults. If a page fault is raised, Citra knows the game tried to access IO,
and thus recompiles the memory read/write into a direct call to Citra's IO functions.
This makes the usual case (memory access) extremely fast, and the less usual case
slow, but only the first time it happens. Subsequent IO accesses use the recompiled
code, which is faster.
|
||||
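The page table idea can be sketched in portable C++ as follows. This is a simplification under stated assumptions: a real fastmem implementation lets the CPU fault and recovers in an exception handler, whereas this sketch uses an explicit null check to stand in for the fault. `ReadIOSlow`, `MapPage`, and the page size are hypothetical choices for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t PAGE_BITS = 12; // assume 4 KiB pages
constexpr std::uint32_t PAGE_MASK = (1u << PAGE_BITS) - 1;
constexpr std::size_t NUM_PAGES = std::size_t{1} << (32 - PAGE_BITS);

// One entry per guest page: a host pointer for ordinary memory, or
// nullptr (address 0) for pages that need special handling such as IO.
std::array<std::uint8_t*, NUM_PAGES> page_table{};

// Hypothetical slow path standing in for the IO handlers.
std::uint8_t ReadIOSlow(std::uint32_t vaddr) {
    return static_cast<std::uint8_t>(vaddr & 0xFF);
}

// Map one page of host memory at a page-aligned guest address.
void MapPage(std::uint32_t vaddr, std::uint8_t* host) {
    page_table[vaddr >> PAGE_BITS] = host;
}

std::uint8_t Read8(std::uint32_t vaddr) {
    std::uint8_t* page = page_table[vaddr >> PAGE_BITS];
    if (page != nullptr) // fast path: plain memory
        return page[vaddr & PAGE_MASK];
    // Real fastmem has no branch here: the access itself faults, and the
    // fault handler patches in a call to the slow path instead.
    return ReadIOSlow(vaddr);
}
```

The point of the design is that the common case is a table lookup plus one memory access, with no per-region comparisons at all.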
|
||||
This technique is called fastmem, and is not new at all. In fact, Dolphin uses
it extensively in its JIT recompiler to speed up memory access as well. And thanks
to [MerryMage](https://github.com/MerryMage)'s hard work, this same technique is
now used extensively by Citra.

## [Give each process its own page table](https://github.com/citra-emu/citra/pull/2842) by [Subv](https://github.com/Subv)

In order to support running multiple processes at the same time, like your computer does,
Citra implements virtual memory, in which each process has its own page table.
The page table represents a translation from the process' virtual addresses to
the 3DS' physical (or "real") addresses.

Before this, because Citra did not support multiple page tables, it also didn't
support running multiple processes at once, such as a game and the software keyboard
applet. Now, thanks to [Subv](https://github.com/Subv), Citra has an important
building block in place.
|
||||
|
||||
## [Add support for loading application updates](https://github.com/citra-emu/citra/pull/2927) by [shinyquagsire23](https://github.com/shinyquagsire23)

Nintendo 3DS titles are contained within `*.app` files on the SD card or on the
game cartridge, in the [NCCH container format](https://www.3dbrew.org/wiki/NCCH).
This format is further divided into two formats, CXI and CFA, which stand for
__C__TR e__X__ecutable __I__mage and __C__TR __F__ile __A__rchive, respectively.
CXIs can contain executable code, whereas CFAs cannot. CFAs usually accompany a CXI
to provide other features such as the digital instruction manual, the Download Play
child application, or in the case of game cartridges, system updates.

Both types of NCCH start with a header, which is then followed by either an ExeFS
image, a RomFS image, or both. The entire structure of an NCCH header may be best
explained by a diagram:

{{< figure src="/images/entry/citra-progress-report-2017-september/ncch.png"
title="Solid lines are required sections, dashed lines cannot be used in some cases, and dotted lines are optional sections." >}}

Now, games and applications need updates from time to time, and 3DSes handle these
by installing the update as a separate title from the base game. From that point
on, whenever the user launches the game, the system loads the
extended header (or [ExHeader](https://www.3dbrew.org/wiki/ExHeader) for short)
and ExeFS image from the update instead of the ones in the base game's NCCH.
As for RomFS, the 3DS System Software will actually
load both the base game's and the update's RomFS image, rather than replacing one
with the other. Games are left to their own devices on how to handle these, and
so the methods used per game can vary, though they usually just replace changed
files, picking files from the base game RomFS if they haven't been modified.

Citra, before this PR, had the code for loading games and reading NCCH files all
mixed into one big piece that fit in with everything else. With this patch,
[shinyquagsire23](https://github.com/shinyquagsire23) has separated the loader
from the NCCH reader, allowing the loader to read multiple NCCHs at once. Additionally,
whenever a game is loaded, the loader now also checks whether there is an update title
installed on Citra's [virtual SD card](/wiki/user-directory/). If there is, it
replaces the base game's ExHeader and ExeFS with the update's, and loads the update RomFS as well.
Just like a real console!
|
||||
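The selection logic described above can be sketched roughly like this. The `Ncch` and `LoadedTitle` structs and the `LoadTitle` function are hypothetical, heavily simplified stand-ins (sections are plain strings here), not the loader's real types.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of an NCCH container.
struct Ncch {
    std::string exheader;
    std::string exefs;
    std::string romfs;
};

// What ends up being handed to the emulated process.
struct LoadedTitle {
    std::string exheader;
    std::string exefs;
    std::vector<std::string> romfs_images; // base first, then update
};

LoadedTitle LoadTitle(const Ncch& base, const std::optional<Ncch>& update) {
    LoadedTitle title;
    // ExHeader and ExeFS come from the update when one is installed.
    title.exheader = update ? update->exheader : base.exheader;
    title.exefs = update ? update->exefs : base.exefs;
    // RomFS is not replaced: both images are made available to the game.
    title.romfs_images.push_back(base.romfs);
    if (update)
        title.romfs_images.push_back(update->romfs);
    return title;
}
```

Passing `std::nullopt` as the update yields the base game untouched, which is the behaviour for titles without an installed update.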
|
||||
Most games worked out of the box with updates, and because they wrote the code
with accuracy in mind, this very same PR has also laid part of the foundation
needed to handle other features such as DLC support or even using real 3DS SD cards!
Do note, though, that we don't have any estimates for those or any other
features, as no one is actively working on them.

## [Implement geometry shader](https://github.com/citra-emu/citra/pull/2865) by [wwylele](https://github.com/wwylele)

The PICA200 GPU has a pipeline similar to [OpenGL's pipeline](https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview)
for rendering 3D objects onto a 2D display. I won't go through all of its steps here,
only the optional geometry shader step. Just after the vertex shader step, if
enabled, all the vertices are processed by a shader kernel (which is a small
program that runs directly on a GPU), taking as many vertices as the kernel
wants as input, and outputting as many vertices as the kernel wants.

Because the kernel in the geometry shader is allowed as many inputs and outputs
as it wants, it is significantly more powerful and flexible than the vertex shader,
whose kernel is restricted to only one vertex at a time, both for input and output.
But for that same reason, geometry shaders are much more complex to program, and
so many games simply leave them disabled. The games that do use them, though, tend to
use them very extensively, to the point of completely breaking graphics if they're not
implemented.

Multiple uses have been found in the wild for geometry shaders, including but absolutely
not limited to:

- Taking one vertex as input, and outputting a rectangle of vertices which can
  be textured with a sprite. Pokémon uses this extensively to render particles
  whenever a move is used. Monster Hunter takes it a step further and renders
  *all of its HUD and GUI* with this kernel.

- Taking a handful of vertices as input, and outputting even more vertices which
  are interpolations between the inputs, thus making the resulting mesh look smoother
  and less jagged when rendered.
|
||||
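The first use in the list above (one vertex in, a textured rectangle out) can be illustrated with a few lines of plain CPU-side C++. This is only a sketch of the idea, not PICA200 shader code; the `Vertex` layout and `ExpandPointToQuad` are hypothetical names.

```cpp
#include <array>

struct Vertex {
    float x, y; // position
    float u, v; // texture coordinates
};

// Takes one input vertex (the centre of the sprite) and emits the four
// corners of a textured rectangle, in triangle-strip order.
std::array<Vertex, 4> ExpandPointToQuad(float cx, float cy, float half_w, float half_h) {
    return {{
        {cx - half_w, cy - half_h, 0.0f, 0.0f},
        {cx + half_w, cy - half_h, 1.0f, 0.0f},
        {cx - half_w, cy + half_h, 0.0f, 1.0f},
        {cx + half_w, cy + half_h, 1.0f, 1.0f},
    }};
}
```

A vertex shader could not do this on its own, since it must output exactly one vertex for every vertex it receives.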
|
||||
At first glance, geometry shaders looked like an easy problem, since they use the
same instruction set and format as vertex shaders, so a lot of the same code could
be reused. At *second* glance, it turned out that configuring inputs and outputs
for geometry shaders is much more complex than it is for vertex shaders.

There were actually three attempts to implement geometry shaders in Citra. The first
was written by [ds84182](https://github.com/ds84182) about two years ago, only to
be abandoned because nobody knew how they were configured. The second
attempt was written by [JayFoxRox](https://github.com/JayFoxRox), but was also
abandoned for the same reason.

But, after extensive research on geometry shaders was done by [fincs](https://github.com/fincs),
the API was implemented in [ctrulib](https://github.com/smealum/ctrulib) and
[citro3d](https://github.com/fincs/citro3d), and examples were written to demonstrate
how to use it. Now that the community knew exactly how they worked, [wwylele](https://github.com/wwylele)
picked up where [JayFoxRox](https://github.com/JayFoxRox) left off, cleaned up
the code he wrote, and added the missing pieces.

After almost three years, and three different attempts to make it work, Citra now
has a full, complete, and correct implementation of geometry shaders!

## [Implement custom clip plane](https://github.com/citra-emu/citra/pull/2900) by [wwylele](https://github.com/wwylele)

After the geometry shader (or the vertex shader, if the geometry shader isn't enabled), the vertices
are "assembled" into a collection of triangles. After *that*, to make rendering
more efficient, the triangles are then compared to the 6 planes that make up the cube
in which objects are actually visible to the camera. Any triangles outside of that
cube are deleted, and any triangles that are partially inside the cube are split
by the sides of the cube, and the resulting pieces outside of the cube are also
deleted.

But the 3DS allows games to add a 7th plane whose position is fully customizable.
Although no games are known to use this feature right now, it is indeed a feature
of the 3DS' GPU. Because implementing it was fairly straightforward,
[wwylele](https://github.com/wwylele) decided to just go ahead and implement it,
in case someone decides to use it in the future.
|
||||
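As a rough illustration of how a vertex can be tested against a plane, here is a minimal sketch. It assumes the common convention that a plane is given by four coefficients and that a vertex is "inside" when the dot product of its position with those coefficients is not negative; `Vec4`, `Dot`, and `IsInside` are hypothetical names, and the exact convention used by the hardware is not stated in this article.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

float Dot(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// A vertex counts as inside the plane when the dot product of its
// position with the plane coefficients is not negative (assumed convention).
bool IsInside(const Vec4& position, const Vec4& plane) {
    return Dot(position, plane) >= 0.0f;
}
```

A triangle with all three vertices outside can be dropped, while one with only some vertices outside has to be split along the plane.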
|
||||
## [Optimized Morton](https://github.com/citra-emu/citra/pull/2951) by [huwpascoe](https://github.com/huwpascoe)

Morton coding is a function that interleaves the bits of multi-dimensional coordinates into a one-dimensional
number. Although it may seem like a very esoteric function, it's actually extremely
useful in fields like linear algebra, databases, and what the 3DS uses it for:
texture mapping.

Computers have an intermediate chunk of memory between RAM and the CPU called a
cache. Caches are separated into lines, each of which holds a small, contiguous block of memory. GPUs
also have a cache, which is separated into lines as well. Because caches are separated like this,
a texture that is loaded into the cache has to span multiple cache lines,
and may not even fit into the cache completely, making transformations on it slow,
as pieces of it have to be loaded from and stored to RAM multiple times.

To avoid this, GPUs can Morton encode textures so that two-dimensional manipulations
are more likely to only need data already in the cache. Textures that have been
Morton coded are usually referred to as swizzled or twiddled textures.

{{< figure src="/images/entry/citra-progress-report-2017-september/morton-koopa.png#floatright"
title="Not this Morton!" >}}

In the function where Morton coding is implemented, there was a lookup table of Morton
codes in the comments, and [huwpascoe](https://github.com/huwpascoe) thought it'd
be best if we just used the lookup table directly. It worked just as well as before,
but required less than a third of the math. Because this function is called so
often during emulation (a rough estimate from them is about "millions of times a
second"), this change, although small, made a very big difference in CPU performance.
|
||||
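To show the difference between computing a Morton code and looking it up, here is a minimal sketch for coordinates inside an 8x8 tile. The tile size, function names, and table names are assumptions made for illustration; the tables simply hold the 3-bit values with their bits spread apart.

```cpp
#include <array>
#include <cstdint>

// Reference version: interleave the low 3 bits of x and y one bit at a
// time, with x in the even bit positions and y in the odd ones.
std::uint32_t MortonInterleaveBits(std::uint32_t x, std::uint32_t y) {
    std::uint32_t result = 0;
    for (std::uint32_t bit = 0; bit < 3; ++bit) {
        result |= ((x >> bit) & 1u) << (2 * bit);
        result |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return result;
}

// Lookup-table version: the spread-out bits for every 3-bit value are
// precomputed, so encoding is two table reads and an add.
constexpr std::array<std::uint32_t, 8> X_LUT = {0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15};
constexpr std::array<std::uint32_t, 8> Y_LUT = {0x00, 0x02, 0x08, 0x0A, 0x20, 0x22, 0x28, 0x2A};

std::uint32_t MortonInterleaveLut(std::uint32_t x, std::uint32_t y) {
    return X_LUT[x % 8] + Y_LUT[y % 8];
}
```

Both functions return the same value for every coordinate in the tile; the table version just replaces the shifting and masking with two array reads.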
|
||||
## [Add draw for immediate and batch modes](https://github.com/citra-emu/citra/pull/2921) by [jroweboy](https://github.com/jroweboy)

The 3DS' GPU has two main modes for drawing to the screen, immediate and batch
mode. In the former, the GPU takes and immediately draws every vertex as it is
handed to it. In the latter, the GPU accepts vertices given to it, but doesn't
actually bother drawing them until absolutely necessary, saving a bit of time
by not having to go through the drawing procedure for every individual vertex.
Although most games don't use immediate mode at all due to it being extremely
slow, a few, like New Super Mario Bros. 2, do use it for a handful of visual effects.

About a year ago, when the GPU code in Citra was rewritten, a handful of calls to
the drawing routine were removed, as it was believed they were unnecessary. It turns
out that one of the calls was actually needed for some effects in games, as it handled
immediate mode drawing. This wasn't noticed for a very long time, as most games
appeared to carry on with no side effects at all from the rewrite, but it was eventually
found after some research courtesy of [ds84182](https://github.com/ds84182).

## [Interpolate audio samples on a frame-by-frame basis](https://github.com/citra-emu/citra/pull/2858) by [MerryMage](https://github.com/MerryMage)

When a 3DS game needs some sort of audio processing, it can access the 3DS' DSP,
or __D__igital __S__ignal __P__rocessor. It's another processor, alongside the ARM9
and ARM11, that is given a firmware to run, which in turn is given a bunch of audio
samples and parameters by the game. The DSP then plays back the buffer in chunks
of about 5 milliseconds. Each one of these chunks is called an audio frame.

As of today, we don't know exactly how the DSP works, and we don't know exactly how any
of the firmwares work. (Did I forget to mention earlier that there are multiple
versions of the firmware?) But we do know how to use it, and from there we can
reimplement its behaviour directly in Citra. That is exactly what [MerryMage](https://github.com/MerryMage)
did back in June of 2016, which in turn brought [audio support for the first time](/entry/hle-audio-comes-to-citra/)
to Citra.

Although this approach has the advantages of being easier to implement, easier
to understand in code, and potentially faster, it has the
disadvantage that accuracy suffers significantly, especially when shortcuts are
taken for the sake of speed. One of these shortcuts was in the audio interpolation,
which is a way of inferring more audio samples from relatively few existing
samples.

On a real 3DS, games are allowed to interpolate different audio frames with
different functions, even when they're in the same buffer. Citra, on the other hand,
interpolated the entire buffer with one function as soon as it was loaded.
This led to various effects and music in games sounding strange or inaccurate in
some way.
|
||||
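The difference between "one function for the whole buffer" and "one function per frame" can be sketched as below. This is a deliberately simplified model: it always doubles the sample rate and only knows two modes, and `Interp`, `UpsampleFrame`, and `UpsampleBuffer` are hypothetical names rather than Citra's real interfaces.

```cpp
#include <cstddef>
#include <vector>

enum class Interp { None, Linear };

// Doubles the sample rate of one frame. `next` is the sample that follows
// the frame, needed to interpolate past the frame's last sample.
std::vector<float> UpsampleFrame(const std::vector<float>& frame, float next, Interp mode) {
    std::vector<float> out;
    for (std::size_t i = 0; i < frame.size(); ++i) {
        const float a = frame[i];
        const float b = (i + 1 < frame.size()) ? frame[i + 1] : next;
        out.push_back(a);
        out.push_back(mode == Interp::Linear ? (a + b) / 2.0f : a);
    }
    return out;
}

// Each frame is processed with its own mode, instead of one mode being
// applied to the whole buffer up front. Expects one mode per frame.
std::vector<float> UpsampleBuffer(const std::vector<std::vector<float>>& frames,
                                  const std::vector<Interp>& modes) {
    std::vector<float> out;
    for (std::size_t f = 0; f < frames.size(); ++f) {
        const float next = (f + 1 < frames.size() && !frames[f + 1].empty())
                               ? frames[f + 1].front()
                               : (frames[f].empty() ? 0.0f : frames[f].back());
        const std::vector<float> up = UpsampleFrame(frames[f], next, modes[f]);
        out.insert(out.end(), up.begin(), up.end());
    }
    return out;
}
```

With two frames using different modes, the first frame gains in-between samples while the second simply repeats each sample, which a single whole-buffer pass could not reproduce.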
|
||||
One example of this is Deku Link's footsteps in *The Legend of Zelda: Majora's Mask 3D*.

Here's the output of a real 3DS console, for reference:

{{< audio src="/images/entry/citra-progress-report-2017-september/deku-hardware.ogg" >}}

And here's the output of Citra, before this was fixed:

{{< audio src="/images/entry/citra-progress-report-2017-september/deku-pre2858.ogg" >}}

Now that it's been fixed, his footsteps sound a lot better:

{{< audio src="/images/entry/citra-progress-report-2017-september/deku-post2858.ogg" >}}

Audio emulation in Citra is still somewhat inaccurate for now, though
[MerryMage](https://github.com/MerryMage) is gradually working on fixing and
improving it. Perhaps some day we may even be able to emulate the DSP firmware
directly, which will be much more accurate than merely emulating its behaviour.

## [Use deque instead of vector for the audio buffer](https://github.com/citra-emu/citra/pull/2958) by [Subv](https://github.com/Subv)

Whenever the DSP consumes some frames from the audio buffer, Citra deletes them
from the front of it. This normally wouldn't pose any problems, but because the buffer was
being stored as a vector, this led to some unnecessary operations. Namely, the
C++ standard requires that all the data of a standard vector be in one contiguous
block of memory. Deleting frames from the front of the buffer leaves a gap at the
start of that block, so every remaining sample has to be moved forward to close it,
which means nearly the entire buffer is copied every time frames are consumed.

All that copying is a huge waste of time, as Citra doesn't need to guarantee that the
audio buffer is in one contiguous block. So [Subv](https://github.com/Subv) changed
the type of the buffer from a vector to a deque, which is essentially a queue that
allows data to be removed from both its beginning and its end. Because the contiguity
requirement doesn't exist in deques, Citra doesn't do the unnecessary copying,
leading to huge speed boosts in audio-bound titles like Super Mario 3D Land, and
even the Home Menu. Now, both run significantly faster!
|
||||
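A minimal sketch of consuming samples from the front of a `std::deque` is shown below. `ConsumeFront` and the 16-bit sample type are assumptions for illustration; the real buffer handling in Citra is more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Removes up to `count` samples from the front of the buffer, as happens
// when the DSP consumes them. With a deque, none of the samples behind
// them have to be moved.
std::size_t ConsumeFront(std::deque<std::int16_t>& buffer, std::size_t count) {
    std::size_t consumed = 0;
    while (consumed < count && !buffer.empty()) {
        buffer.pop_front();
        ++consumed;
    }
    return consumed;
}
```

The equivalent operation on a `std::vector` would be an `erase` at the beginning, which has to shift every remaining element.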
|
||||
## [Add mingw64 compile support to appveyor](https://github.com/citra-emu/citra/pull/2912) by [jroweboy](https://github.com/jroweboy)

When a program is written in a high-level programming language, such as C++, Rust,
or Go, before the program can be run on a machine, it must be translated or "compiled"
to machine code. Although it is possible to do this translation by hand, it is
usually extremely difficult and very time consuming. So instead, we have
a program called a compiler that can automatically do this translation for us.

This is also why a program compiled for an ARM machine cannot be run directly on
an x86 machine, even when the source can work on either machine without issues.
Instead, this program must be translated, interpreted, or recompiled from source
to x86. (In fact, this translation is exactly what [dynarmic](https://github.com/MerryMage/dynarmic)
does to run code from a 3DS.)

Every statement in a program must have an exact, unambiguous definition of what
it does (its semantics). But, in the same way that a statement that means one
thing can be written many different ways, different compilers can translate the
same statement many different ways.

On Windows, there are two popular C++ compilers available as of today: MSVC++, which
is the compiler Microsoft has written for Windows, and MinGW GCC, which is actually
a port of the `gcc` compiler commonly used on Linux. For better or worse, MinGW GCC
optimizes Citra a little better than MSVC++, and so [jroweboy](https://github.com/jroweboy)
has changed the Citra AppVeyor build script to add support for MinGW GCC as well
as MSVC++. Do note that the MSVC++ builds are only available through GitHub, since
they're only useful for debugging, and MinGW GCC builds are faster in most, if
not all, cases, which is why the installer will only install those. This change
has also closed the gap in performance the new Nightly builds had compared to
the old Bleeding Edge builds.

## [Load different shared font depending on the region](https://github.com/citra-emu/citra/pull/2915) by [wwylele](https://github.com/wwylele)

Remember that last month [wwylele](https://github.com/wwylele) changed Citra so
that instead of loading the shared font from a separate file, it would
[load it from the system archive](https://citra-emu.org/entry/citra-progress-report-2017-august/)?
This builds on top of that behaviour. You see, a 3DS doesn't have a shared font,
it has *four*. One contains glyphs for Latin script (for English, Spanish, Italian,
French, etc.) and Japanese scripts, another contains glyphs for Traditional Chinese,
the third font contains those for Simplified Chinese, and the last font contains
the ones for Korean.

Before this PR, Citra would simply load the first shared font regardless of game
or region. This made games that use neither Latin nor Japanese script display completely
incorrect characters at best, or crash at worst. Now Citra will load the appropriate
shared font from the system archive depending on the region selected, just like
a real console! Note, though, that this will not work on machines that only have the
`shared_font.bin` file, because it only contains the shared font for the region
of the console it was dumped from (e.g., if you dump a Korean console, it'll
only contain the Korean font). If you want to use this feature, you must dump
the system archive using the latest version of [`3dsutils`](https://github.com/citra-emu/3dsutils).
|
||||
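The region-to-font selection described above boils down to a small mapping, sketched here. The `Region` enum, its members, and the returned font labels are hypothetical names chosen for illustration, not identifiers from Citra or the system archive.

```cpp
#include <string>

// Hypothetical region list for illustration.
enum class Region { Japan, NorthAmerica, Europe, Australia, China, Korea, Taiwan };

// Picks which of the four shared fonts a given region should use.
std::string SharedFontFor(Region region) {
    switch (region) {
    case Region::China:
        return "Simplified Chinese";
    case Region::Korea:
        return "Korean";
    case Region::Taiwan:
        return "Traditional Chinese";
    default:
        return "Latin and Japanese"; // covers all remaining regions
    }
}
```

Before the change, the behaviour was equivalent to always taking the `default` branch, whatever the region.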
|
||||
## Et al.

And of course, big thanks to [everyone who's contributed](https://github.com/citra-emu/citra/graphs/contributors?from=2017-08-31&to=2017-09-30&type=c)
this September, because Citra as a whole would not be the same without everyone
involved having placed their pieces, big or small.
|