* Fix stencil dirty flags tracking when stencil is disabled
* Attach stencil on clears (previously it only attached depth)
* Attach stencil on drawing regardless of stencil testing being enabled
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
GLSL decompiler type system was broken. We converted all return values
to float except for some cases where returning we couldn't and
implicitly broke the rule of returning floats (e.g. for bools or bool
pairs).
Instead of doing this introduce class Expression that knows what type a
return value has and when a consumer wants to use the string it asks for
it with a required type, emitting a runtime error if types are
incompatible.
This has the disadvantage that there's more C++ code, but we can emit
better GLSL code that's easier to read.
Volume is a f32 value. (SwIPC describes it as a u32, but it is actually f32 as corroborated by switchbrew docs and SetAudioDeviceOutputVolume)
```cpp
const f32 volume = rp.Pop<f32>();
```
If an unmapping operation fails, we shouldn't be decrementing the amount
of memory mapped and returning that the operation was successful. We
should actually be returning the error code in this case.
Avoids potentially expensive (depending on the size of the memory block)
allocations by reserving the necessary memory before performing both
insertions. This avoids scenarios where the second insert may cause a
reallocation to occur.
Avoids needing to read the same long sequence of code in both code
paths. Also makes it slightly nicer to read and debug, as the locals
will be able to be shown in the debugger.
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
We can simply enable CMAKE_AUTOUIC and let CMake take care of handling
the UI code generation for targets.
As part of letting CMake automatically handle the header file parsing,
we must not name includes with "ui_*" unless they're related to the
output of the Qt UIC compiler. Because of this, we need to rename
ui_settings, given it would conflict with this restriction.
This commit ensures that the host gpu is constantly fed with commands to
work with, while the guest gpu keeps producing the rest of the commands.
This reduces syncing time between host and guest gpu.
This commit fixes offsets on Linear -> Tiled copies, corrects z pos
fortiled->linear copies, corrects bytes_per_pixel calculation in tiled
-> linear copies and relaxes some limitations set by latest dma fixes
refactors.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
Audio devices use the supplied revision information in order to
determine if USB audio output is able to be supported. In this case, we
can only really handle using this revision information in
ListAudioDeviceName(), where it checks if USB audio output is supported
before supplying it as a device name.
A few other scenarios exist where the revision info is checked, such as:
- Early exiting from SetAudioDeviceOutputVolume if USB audio is
attempted to be set when that device is unsupported.
- Early exiting and returning 0.0f in GetAudioDeviceOutputVolume when
USB output volume is queried and it's an unsupported device.
- Falling back to AHUB headphones in GetActiveAudioDeviceName when the
device type is USB output, but is unsupported based off the revision
info.
In order for these changes to also be implemented, a few other changes
to the interface need to be made.
Given we now properly handle everything about ListAudioDeviceName(), we
no longer need to describe it as a stubbed function.
The revision querying facilities are used by more than just audren. e.g.
audio devices can use this to test whether or not USB audio output is
supported.
This will be used within the following change.
AudioDevice and AudioInterface aren't valid device names on the Switch.
We should also be returning consistent names in
GetActiveAudioDeviceName().
While we're at it, we can also handle proper name output in
ListAudioDeviceName, by returning all the available devices on the
Switch.
This is the default behavior of the copy constructor, so it doesn't need
to be specified.
While we're at it we can make the other non-default constructor
explicit.
Prevents a truncation warning from occurring with MSVC. Also the
internal data structures already treat it as a size_t, so this is just a
discrepancy in the interface.
Textures can have different components types in different orders. This
assert was completely inprecise and the effectiveness of such is better
handled by case and within the texture cache.
Conditional Rendering takes care of conditionaly clearing or drawing
depending on a set of queries. This PR implements the query checks to
stablish if things can be rendered or not.
These are std::shared_ptr instances underneath the hood, which means
copying them isn't as cheap as a regular pointer. Particularly so on
weakly-ordered systems.
This avoids atomic reference count increments and decrements where they
aren't necessary for the core set of operations.
While changing this code, simplify tracking code to allow returning
the base address node, this way callers don't have to manually rebuild
it on each invocation.
Creating multiple "AudioRenderer" threads cause the previous thread to be overwritten. The thread will name be renamed to AudioRenderer-InstanceX, where X is the current instance number.
Provides a basic implementation of SetAutoSleepDisabled. Until idle
handling is implemented, this is about the best we can do.
In the meantime, provide a rough documenting of specifics that occur
when this function is called on actual hardware.
The JIT is mature enough that this setting can be removed, falling back
to Unicorn only on unsupported architectures. Any missing features from
Unicorn (of which there are extremely few), are mostly
developer-oriented, which most users don't care about.
Features should be coordinated with the JIT, not the interpreter,
anyhow.
This was initially necessary when AArch64 JIT emulation was in its
infancy and all memory-related instructions weren't implemented.
Given the JIT now has all of these facilities implemented, we can remove
these functions from the CPU interface.
Prior to PR, Yuzu did not restore memory to RW-
on unmap of mirrored memory or unloading of NRO.
(In fact, in the NRO case, the memory was unmapped
instead of reprotected to --- on Load, so it was
actually lost entirely...)
This PR addresses that, and restores memory to RW-
as it should.
This fixes a crash in Super Smash Bros when creating
a World of Light save for the first time, and possibly
other games/circumstances.
We don't have any friends implemented in Yuzu yet so it doesn't make sense to return any friends. For now we'll be returning 0 friends however the information provided will allow a proper implementation of this cmd when needed.
must_reconfigure isn't a parameter for this function any more, so it can
be replaced with current_state.
While we're at it, we can make the parameters of the declaration match
the same name as the ones in the definition.
This sets the DeviceMapped attribute for GPU-mapped memory blocks,
and prevents merging device mapped blocks. This prevents memory
mapped from the gpu from having its backing address changed by
block coalesce.
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
This implements svcMapPhysicalMemory/svcUnmapPhysicalMemory for Yuzu,
which can be used to map memory at a desired address by games since
3.0.0.
It also properly parses SystemResourceSize from NPDM, and makes
information available via svcGetInfo.
This is needed for games like Super Smash Bros. and Diablo 3 -- this
PR's implementation does not run into the "ASCII reads" issue mentioned
in the comments of #2626, which was caused by the following bugs in
Yuzu's memory management that this PR also addresses:
* Yuzu's memory coalescing does not properly merge blocks. This results
in a polluted address space/svcQueryMemory results that would be
impossible to replicate on hardware, which can lead to game code making
the wrong assumptions about memory layout.
* This implements better merging for AllocatedMemoryBlocks.
* Yuzu's implementation of svcMirrorMemory unprotected the entire
virtual memory range containing the range being mirrored. This could
lead to games attempting to map data at that unprotected
range/attempting to access that range after yuzu improperly unmapped
it.
* This PR fixes it by simply calling ReprotectRange instead of
Reprotect.
Prior to execution within a process beginning, the process establishes
its own TLS region for uses (as far as I can tell) related to exception
handling.
Now that TLS creation was decoupled from threads themselves, we can add
this behavior to our Process class. This is also good, as it allows us
to remove a stub within svcGetInfo, namely querying the address of that
region.
Previously, a translated string was being appended onto another string
in a manner that doesn't allow the translator to control where the
appended text is placed. This can be a nuisance for languages where
grammar and text ordering differs from English.
We now append the strings via the format strings themselves, which
allows translators to reorder where the text will be placed.
Instead of passing by copy an execution context through out the whole
Vulkan call hierarchy, use a command buffer view and fence view
approach.
This internally dereferences the command buffer or fence forcing the
user to be unable to use an outdated version of it on normal usage.
It is still possible to keep store an outdated if it is casted to
VKFence& or vk::CommandBuffer.
While changing this file, add an extra parameter for Flush and Finish to
allow releasing the fence from this calls.
Provides a more accurate name for the memory region and also
disambiguates between the map and new map regions of memory, making it
easier to understand.
Handles the placement of the stack a little nicer compared to the
previous code, which was off in a few ways. e.g.
The stack (new map) region, shouldn't be the width of the entire address
space if the size of the region calculation ends up being zero. It
should be placed at the same location as the TLS IO region and also have
the same size.
In the event the TLS IO region contains a size of zero, we should also
be doing the same thing. This fixes our memory layout a little bit and
also resolves some cases where assertions can trigger due to the memory
layout being incorrect.
Taking the json instance as a constant reference, makes all moves into
the parameter non-functional, resulting in copies. Taking it by value
allows moves to function.
A normal user shouldn't change this, as it will slow down the emulation and can lead to bugs or crashes. The renaming is done in order to prevent users from leaving this on without a way to turn it off from the UI.
Extracts out all of the thread local storage management from thread
instances themselves and makes the owning process handle the management
of the memory. This brings the memory management slightly more in line
with how the kernel handles these allocations.
Furthermore, this also makes the TLS page management a little more
readable compared to the lingering implementation that was carried over
from Citra.
This will be necessary for making our TLS slot management slightly more
straightforward. This can also be utilized for other purposes in the
future.
We can implement the existing simpler overload in terms of this one
anyways, we just pass the beginning and end of the ASLR region as the
boundaries.
We shouldn't be incrementing if wave buffers are empty. They are considered invalid/unused wave buffers.
This fixes the issue of certain sounds looping when they shouldn't
The event should only be signaled when an output audio device gets changed. Example, Speaker to USB headset. We don't identify different devices internally yet so there's no need to signal the event yet.
DeltaFragments are only used to download and apply partial patches on a real console, and are not useful to us at all. Most patch NSPs do not include them, and when they do, it's a waste of space to install them.
StartLrAssignmentMode and StopLrAssignmentMode don't require any implementation as it's just used for showing the screen of changing the controller orientation if the user wishes to do so. Ever since #1634 this has not been needed as users can specify the controller orientation from the config and swap at any time. We store a private member just in case this gets used for anything extra in the future
InitializeApplicationInfoRestricted will need further implementation as it's checking for other user requirements about the game. As we're emulating, we're assuming the user owns the game so we skip these checks currently, implementation will need to be added further on
This PR attempts to implement the shared memory provided by GetSharedMemoryNativeHandle. There is still more work to be done however that requires a rehaul of the current time module to handle clock contexts. This PR is mainly to get the basic functionality of the SharedMemory working and allow the use of addition to it whilst things get improved on.
Things to note:
Memory Barriers are used in the SharedMemory and a better solution would need to be done to implement this. Currently in this PR I’m faking the memory barriers as everything is sync and single threaded. They work by incrementing the counter and just populate the two data slots. On data reading, it will read the last added data.
Specific values in the shared memory would need to be updated periodically. This isn't included in this PR since we don't actively do this yet. In a later PR when time is refactored this should be done.
Finally, as we don't handle clock contexts. When time is refactored, we will need to update the shared memory for specific contexts. This PR does this already however since the contexts are all identical and not separated. We're just updating the same values for each context which in this case is empty.
Tiime:SetStandardUserSystemClockAutomaticCorrectionEnabled, Time:IsStandardUserSystemClockAutomaticCorrectionEnabled are also partially implemented in this PR. The reason the implementation is partial is because once again, a lack of clock contexts. This will be improved on in a future PR.
This PR closes issue #2556
This is more representative of what actually occurs, as web does support remote URLs which wouldn't need a romfs callback. This paves for easy future support of this with a call like 'OpenPageRemote' or similar.
This makes conflicts between non compress and compress textures to be
auto recycled. It also limits the amount of mipmaps a texture can have
if it goes above it's limit.
Instead of storing all block width, height and depths in their shifted
form:
block_width = 1U << block_shift;
Store them like they are provided by the emulated hardware (their
block_shift form). This way we can avoid doing the costly
Common::AlignUp operation to align texture sizes and drop CPU integer
divisions with bitwise logic (defined in Common::AlignBits).
Due to our current infrastructure, it is possible for a mipmap to be set
on as a render target before a texception of that mipmap's superset be
set afterwards. This is problematic as we rely on texture views to set
up texceptions and protecting render targets targets for 3D texture
rendering.
One simple solution is to configure framebuffers after texture setup but
this brings other problems. This solution, forces a reconfiguration of
the framebuffers after such event happens.
Even though it has been proven that IAudioRenderer:SystemEvent is
actually an automatic event. The current implementation of such event is
all thought to be manual. Thus it's implementation needs to be corrected
when doing such change. As it is right now this PR introduced a series
of regressions on softlocks on multiple games. Therefore, this pr
reverts such change until a correct implementation is made.
These source files have been unused for the entire lifecycle of the
project. They're a hold-over from Citra and only add to the build time
of the project, so they can be removed.
There's also likely no way this would ever work in yuzu in its current
form without revamping quite a bit of it, given how different the GPU on
the Switch is compared to the 3DS.
The old implementation had faulty Threadsafe methods where events could
be missing. This implementation unifies unsafe/safe methods and makes
core timing thread safe overall.
IPC-100 was changed to InitializeApplicationInfoOld instead of InitializeApplicationInfo. IPC-150 makes an indentical call to IPC-100 however does extra processing. They should not have the same name as it's quite confusing to debug.
Avoids potentially performing multiple reallocations (depending on the
size of the input data) by reserving the necessary amount of memory
ahead of time.
This is trivially doable, so there's no harm in it.
These can be generified together by using a concept type to designate
them. This also has the benefit of not making copies of potentially very
large arrays.
This is performing more work than would otherwise be necessary during
VMManager's destruction. All we actually want to occur in this scenario
is for any allocated memory to be freed, which will happen automatically
as the VMManager instance goes out of scope.
Anything else being done is simply unnecessary work.
This test is intended to be invalid GLSL, but it was being invalid in
two points instead of one. The intention is to use a non-immediate
parameter in a textureOffset like function.
The problem is that this shader was being compiled as a separable
shader object and the text was writting to gl_Position without a
redeclaration, being invalid GLSL.
Address that issue by using a user-defined output attribute.
Given we don't currently implement the personal heap yet, the existing
memory querying functions are essentially doing what the memory querying
types introduced in 6.0.0 do.
So, we can build the necessary machinery over the top of those and just
use them as part of info types.
Hardware testing revealed that SSY and PBK push to a different stack,
allowing code like this:
SSY label1;
PBK label2;
SYNC;
label1: PBK;
label2: EXIT;
Analysis passes do not have a good reason to depend on shader_ir.h to
work on top of nodes. This splits node-related declarations to their own
file and leaves the IR in shader_ir.h
To prepare for translation support, this makes all of the widgets
cognizant of the language change event that occurs whenever
installTranslator() is called and automatically retranslates their text
where necessary.
This is important as calling the backing UI's retranslateUi() is often
not enough, particularly in cases where we add our own strings that
aren't controlled by it. In that case we need to manually refresh the
strings ourselves.
Instead of having a vector of unique_ptr stored in a vector and
returning star pointers to this, use shared_ptr. While changing
initialization code, move it to a separate file when possible.
This is a first step to allow code analysis and node generation beyond
the ShaderIR class.
Enforces the use of the proper URL resolution functions. e.g.
url = some_local_path_string;
should actually be:
url = QUrl::fromLocalPath(some_local_path_string);
etc.
This makes it harder to cause bugs when operating with both strings and
URLs at the same time.
Other overloads of start() are considerably much safer to use if we ever
need this in the future and need to pass arguments to the program, given
it contains separate parameters for the program path and the arguments
themselves, whereas this unsafe overload contains both as a single
string.
Given the alternatives are much safer, we can disable this.
If this path was ever taken, a runtime exception would occur due to the
lack of a formatting specifier to insert the error code into the format
string.
Its prototype declared at the top of the translation unit contains the
static qualifier, so the function itself should also contain it to make
it a proper internally linked function.
The deleter can just be set in the constructor and maintained throughout
the lifetime of the object.
If a contained pointer is null, then the deleter won't execute, so this
is safe to do. We don't need to swap it out with a version of a deleter
that does nothing.
We can make this message more meaningful by indicating the location the
screenshot has been saved to. We can also log out whenever a screenshot
could not be saved (e.g. due to filesystem permissions or some other
reason).
Treating it as a u16 can result in a sign-conversion warning when
performing arithmetic with it, as u16 promotes to an int when aritmetic
is performed on it, not unsigned int.
This also makes the interface more uniform, as the layout interface now
operates on u32 across the board.
We can just pass a pointer to GMainWindow directly and make it a
requirement of the interface. This makes the interface a little safer,
since this would technically otherwise allow any random QWidget to be
the parent of a render window, downcasting it to GMainWindow (which is
undefined behavior).
"position" was being written but not read anywhere besides geometry
shaders, where it had the same value as gl_Position.
This commit replaces "position" with gl_Position, reducing the
complexity of our code and the emitted GLSL code.
Allows for things such as:
auto rect = Common::Rectangle{0, 0, 0, 0};
as opposed to being required to explicitly write out the underlying
type, such as:
auto rect = Common::Rectangle<int>{0, 0, 0, 0};
The only requirement for the deduction is that all constructor arguments
be the same type.
Stays consistent in our code with using Qt's provided mechanisms, and
also properly handles Unicode paths (which file streams on Windows don't
do very well).
Qt uses a signed value to represent indices. We should follow this
convention where applicable to avoid unnecessary sign-conversion
warnings, as well as making it easier to interoperate with other aspects
of Qt.
While we're at it, we can also make a sign-conversion explicit.
Makes the dependency explicit in the TelemetrySession's interface
instead of making it a hidden dependency.
This also revealed a hidden issue with the way the telemetry session was
being initialized. It was attempting to retrieve the app loader and log
out title-specific information. However, this isn't always guaranteed to
be possible.
During the initialization phase, everything is being constructed. It
doesn't mean an actual title has been selected. This is what the Load()
function is for. This potentially results in dead code paths involving
the app loader. Instead, we explicitly add this information when we know
the app loader instance is available.
Fix missing OpSelectionMerge instruction. This caused devices loses on
most hardware, Intel didn't care.
Fix [-1;1] -> [0;1] depth conversions.
Conditionally use VK_EXT_scalar_block_layout. This allows us to use
non-std140 layouts on UBOs.
Update external Vulkan headers.
Keeps track of native ASTC support, VK_EXT_scalar_block_layout
availability and SSBO range.
Check for independentBlend and vertexPipelineStorageAndAtomics as a
required feature. Always enable it.
Use vk::to_string format to log Vulkan enums.
Style changes.
critical() is intended for critical/fatal errors that threaten the
overall stability of an application. A user entering a conflicting key
sequence is neither of those.
1. This is something that should be solely emitted by the hotkey dialog
itself
2. This is functionally unused, given there's nothing listening for the
signal.
The previous code was all "smushed" together wasn't really grouped
together that well.
This spaces things out and separates them by relation to one another,
making it easier to visually parse the individual sections of code that
make up the constructor.
Uses a std::string_view instead of a std::string, given the pointed to
string isn't modified and is only used in a formatting operation.
This is nice because a few usages directly supply a string literal to
the function, allowing these usages to otherwise not heap allocate,
unlike the std::string overloads.
While we're at it, we can combine the address formatting into a single
formatting call.
A checkbox is able to be tri-state, giving it three possible activity
types, so in the connect call here, it would actually be truncating an
int into a bool.
Instead, we can just listen on the toggled() signal, which passes along
a bool, not an int.
The following code is broken on AMD's proprietary GLSL compiler:
```glsl
uint idx = ...;
vec4 values = ...;
float some_value = values[idx & 3];
```
It index the wrong components, to fix this the following pessimized code
is emitted when that bug is present:
```glsl
uint idx = ...;
vec4 values = ...;
float some_value;
if ((idx & 3) == 0) some_value = values.x;
if ((idx & 3) == 1) some_value = values.y;
if ((idx & 3) == 2) some_value = values.z;
if ((idx & 3) == 3) some_value = values.w;
```
Component indexing on AMD's proprietary driver is broken. This commit adds
a test to detect when we are on a driver that can't successfully manage
component indexing.
It dispatches a dummy draw with just one vertex shader that writes to an
indexed SSBO from the GPU with data sent through uniforms, it then reads
that data from the CPU and compares the expected output.
nullptr was being returned in the error case, which, at a glance may
seem perfectly OK... until you realize that std::string has the
invariant that it may not be constructed from a null pointer. This
means that if this error case was ever hit, then the application would
most likely crash from a thrown exception in std::string's constructor.
Instead, we can change the function to return an optional value,
indicating if a failure occurred.
Makes the parameter ordering consistent, and also makes the filename
parameter a std::string. A std::string would be constructed anyways with
the previous code, as IOFile's only constructor with a filepath is one
taking a std::string.
We can also make WriteStringToFile's string parameter utilize a
std::string_view for the string, making use of our previous changes to
IOFile.
We don't need to force the usage of a std::string here, and can instead
use a std::string_view, which allows writing out other forms of strings
(e.g. C-style strings) without any unnecessary heap allocations.
This allows for forming comment nodes without making unnecessary copies
of the std::string instance.
e.g. previously:
Comment(fmt::format("Base address is c[0x{:x}][0x{:x}]",
cbuf->GetIndex(), cbuf_offset));
Would result in a copy of the string being created, as CommentNode()
takes a std::string by value (a const ref passed to a value parameter
results in a copy).
Now, only one instance of the string is ever moved around. (fmt::format
returns a std::string, and since it's returned from a function by value,
this is a prvalue (which can be treated like an rvalue), so it's moved
into Comment's string parameter), we then move it into the CommentNode
constructor, which then moves the string into its member variable).
Amends cases where we were using things that were indirectly being
satisfied through other headers. This way, if those headers change and
eliminate dependencies on other headers in the future, we don't have
cascading compilation errors.
Previously, the code was accumulating data into a std::vector and then
tossing all of it away if a setting was disabled.
Instead, we can just check if it's disabled and do no work at all if
possible. If it's enabled, then we can append to the vector and
allocate.
Unlikely to impact usage much, but it is slightly less sloppy with
resources.
A few of the aoc service stubs/implementations weren't fully popping all
of the parameters passed to them. This ensures that all parameters are
popped and, at minimum, logged out.
Given the array is a private static array, we can just make it
internally linked to hide it from external code. This also allows us to
remove an inclusion within the header.
SMDH is a metadata format used in some executable formats for the
Nintendo 3DS. Switch executables don't utilize this metadata format, so
this just a holdover from Citra and can be corrected.
Allows the loading screen code to compile with implicit string
conversions disabled.
While we're at it remove unnecessary const usages, and add it to nearby
variables where appropriate.
Gets rid of the need to special-case brace handling depending on the
overload used, and makes it consistent across the board with how fmt
handles them.
Strings with compile-time deducible strings are directly forwarded to
std::string's constructor, so we don't need to worry about the
performance difference here, as it'll be identical.
In a lot of places throughout the decompiler, string concatenation via
operator+ is used quite heavily. This is usually fine, when not heavily
used, but when used extensively, can be a problem. operator+ creates an
entirely new heap allocated temporary string and given we perform
expressions like:
std::string thing = a + b + c + d;
this ends up with a lot of unnecessary temporary strings being created
and discarded, which kind of thrashes the heap more than we need to.
Given we utilize fmt in some AddLine calls, we can make this a part of
the ShaderWriter's API. We can make an overload that simply acts as a
passthrough to fmt.
This way, whenever things need to be appended to a string, the operation
can be done via a single string formatting operation instead of
discarding numerous temporary strings. This also has the benefit of
making the strings themselves look nicer and makes it easier to spot
errors in them.
Many of these constructors don't even need to be templated. The only
ones that need to be templated are the ones that actually make use of
the parameter pack.
Even then, since std::vector accepts an initializer list, we can supply
the parameter pack directly to it instead of creating our own copy of
the list, then copying it again into the std::vector.
Given the class contains quite a lot of non-trivial types, place the
constructor and destructor within the cpp file to avoid inlining
construction and destruction code everywhere the class is used.
Avoids performing copies into the pair being returned. Instead, we can
just move the resources into the pair, avoiding the need to make copies
of both the std::string and ShaderEntries struct.
Given the offset is assigned a fixed value in the constructor, we can
just assign it directly and get rid of the need to write the name of the
variable again in the constructor initializer list.
Given the disk shader cache contains non-trivial types, we should
default it in the cpp file in order to prevent inlining of the
complex destruction logic.
The standard library expects hash specializations that don't throw
exceptions. Make this explicit in the type to allow selection of better
code paths if possible in implementations.
We don't need to load the code into a vector and then construct a string
over the data. We can just create a string with the necessary size ahead
of time, and read the data directly into it, getting rid of an
unnecessary heap allocation.
std::move does nothing when applied to a const variable. Resources can't
be moved if the object is immutable. With this change, we don't end up
making several unnecessary heap allocations and copies.
Booleans don't have a guaranteed size, but we still want to have them
integrate into the disk cache system without needing to actually use a
different type. We can do this by supplying non-template overloads for
the bool type.
Non-template overloads always have precedence during function
resolution, so this is safe to provide.
This gets rid of the need to smatter ternary conditionals, as well as
the need to use u8 types to store the value in.
These are only used from within this translation unit, so they don't
need to have external linkage. They were intended to be marked with this
anyways to be consistent with the other service functions.
Renames the members to more accurately indicate what they signify.
"OneShot" and "Sticky" are kind of ambiguous identifiers for the reset
types, and can be kind of misleading. Automatic and Manual communicate
the kind of reset type in a clearer manner. Either the event is
automatically reset, or it isn't and must be manually cleared.
The "OneShot" and "Sticky" terminology is just a hold-over from Citra
where the kernel had a third type of event reset type known as "Pulse".
Given the Switch kernel only has two forms of event reset types, we
don't need to keep the old terminology around anymore.
This reduces the boilerplate that services have to write out the current thread explicitly. Using current thread instead of client thread is also semantically incorrect, and will be a problem when we implement multicore (at which time there will be multiple current threads)
Nvidia's proprietary driver creates a real OpenGL compatibility profile
without this option, meanwhile Intel (and probably AMD, I haven't tested
it) require that QSurfaceFormat::FormatOption::DeprecatedFunctions is
explicitly enabled.
This was reduced due to happening on most games and at such constant
rate that it affected performance heavily for the end user. In general,
we are well aware of the assert and an implementation is already
planned.
Avoids inlining destruction logic where applicable, and also makes
forward declarations not cause unexpected compilation errors depending
on where the State class is used.
Lessens the amount of code that needs to be read, and gets rid of the
need to introduce an indexing variable. Instead, we just operate on the
objects directly.
std::memset is used to clear the entire register structure, which
requires that the Regs struct be trivially copyable (otherwise undefined
behavior is invoked). This prevents the case where a non-trivial type is
potentially added to the struct.
std::move within a copy constructor (on a data member that isn't
mutable) will always result in a copy. Because of that, the behavior of
this copy constructor is identical to the one that would be generated
automatically by the compiler, so we can remove it.
This corrects cases where it was possible to write more entries into the
write buffer than were requested. Now, we check the size of the buffer
before actually writing into them.
We were also returning the wrong value for
GetAvailableLanguageCodeCount2(). This was previously returning 64, but
only 17 should have been returned. 64 entries is the size of the static
array used in MakeLanguageCode() within the service binary itself, but
isn't the actual total number of language codes present.
Makes the class less surprising when it comes to forward declaring the
type, and also prevents inlining the destruction code of the class,
given it contains non-trivial types.
These are able to be omitted from the declaration of functions, since
they don't do anything at the type system level. The definitions of the
functions can retain the use of const though, since they make the
variables immutable in the implementation of the function where they're
used.
Instead of retrieving the data from the std::variant instance, we can
just check if the variant contains that type of data.
This is essentially the same behavior, only it returns a bool indicating
whether or not the type in the variant is currently active, instead of
actually retrieving the data.
By default, MSVC doesn't use standards-compliant volatile semantics.
This makes it behave in a standards-compliant manner, making
expectations more uniform across compilers.
For similar reasons to the previous change, we move this to a single
function, so we don't need to duplicate the conversion logic in several
places within main.cpp.
Specifies the conversions explicitly to avoid implicit conversions from
const char* to QString. This makes it easier to disable implicit QString
conversions in the future.
In this case, the implicit conversion was technically wrong as well. The
implicit conversion treats the input strings as ASCII characters. This
would result in an incorrect conversion being performed in the rare case
a branch name was created with a non-ASCII Unicode character, likely
resulting in junk being displayed.
Over time our config values have grown quite numerous in size.
Unfortunately it also makes the single functions we have for loading and
saving values more error prone.
For example, we were loading the core settings twice when they only
should have been loaded once. In another section, a variable was
shadowing another variable used to load settings from a completely
different section.
Finally, in one other case, there was an extraneous endGroup() call used
that didn't need to be done. This was essentially dead code and also a
bug waiting to happen.
This separates the section loading code into its own separate functions.
This keeps variables only visible to the code that actually needs it,
and makes it much easier to visually see the end of each individual
configuration group. It also makes it much easier to visually catch bugs
during code review.
While we're at it, this also uses QStringLiteral instead of raw string
literals, which both avoids constructing a lot of QString instances, but
also makes it much easier to disable implicit ASCII to QString and
vice-versa in the future via setting QT_NO_CAST_FROM_ASCII and
QT_NO_CAST_TO_ASCII as compilation flags.
The C++ standard allows constexpr variables declared with the extern
keyword to have external linkage. Previously MSVC wasn't abiding by
this. This just makes the compiler more standards compliant during
builds.
Given we currently don't make use of anything that would break by this,
this is safe to enable.
The backend is not used until we decide to submit the testcase/telemetry, and creating it early prevents users from updating the credentials properly while the games are running.
Also introduced in REV5 was a variable-size audio command buffer. This
also affects how the size of the work buffer should be determined, so we
can add handling for this as well.
Thankfully, no other alterations were made to how the work buffer size
is calculated in 7.0.0-8.0.0. There were indeed changes made to to how
some of the actual audio commands are generated though (particularly in
REV7), however they don't apply here.
Introduced in REV5. This is trivial to add support for, now that
everything isn't a mess of random magic constant values.
All this is, is a change in data type sizes as far as this function
cares.
"Unmagics" quite a few magic constants within this code, making it much
easier to understand. Particularly given this factors out specific
sections into their own self-contained lambda functions.
Instead of asserting on already stored shader variants, silently skip them.
This shouldn't be happening but when a shader is invalidated and it is
not stored in the shader cache, this assert would hit and save that
shader anyways when the asserts are disabled.
These are actually quite important indicators of thread lifetimes, so
they should be going into the debug log, rather than being treated as
misc info and delegated to the trace log.
Makes the code much nicer to follow in terms of behavior and control
flow. It also fixes a few bugs in the implementation.
Notably, the thread's owner process shouldn't be accessed in order to
retrieve the core mask or ideal core. This should be done through the
current running process. The only reason this bug wasn't encountered yet
is because we currently only support running one process, and thus every
owner process will be the current process.
We also weren't checking against the process' CPU core mask to see if an
allowed core is specified or not.
With this out of the way, it'll be less noisy to implement proper
handling of the affinity flags internally within the kernel thread
instances.
Provides serialization/deserialization to the database in system save files, accessors for database state and proper handling of both major Mii formats (MiiInfo and MiiStoreData)
This option allows picking the compatibility profile since a lot of bugs
are fixed in it. We devs will use this option to easierly debug current
problems in our Core implementation.:wq
flushing is now responsability of children caches instead of the cache
object. This change will allow the specific cache to pass extra
parameters on flushing and will allow more flexibility.
This is a holdover from Citra, where the 3DS has both
WaitSynchronization1 and WaitSynchronizationN. The switch only has one
form of wait synchronizing (literally WaitSynchonization). This allows
us to throw out code that doesn't apply at all to the Switch kernel.
Because of this unnecessary dichotomy within the wait synchronization
utilities, we were also neglecting to properly handle waiting on
multiple objects.
While we're at it, we can also scrub out any lingering references to
WaitSynchronization1/WaitSynchronizationN in comments, and change them
to WaitSynchronization (or remove them if the mention no longer
applies).
The actual behavior of this function is slightly more complex than what
we're currently doing within the supervisor call. To avoid dumping most
of this behavior in the supervisor call itself, we can migrate this to
another function.
The default constructor will always run, even when not specified, so
this is redundant.
However, the context member can indeed be initialized in the constructor
initializer list.
Previously we were building with MBCS, which is pretty undesirable. We
want the application to be Unicode-aware in general.
Currently, we make the command line variant of yuzu use ANSI variants of
the non-standard getopt functions that we link in for Windows, given we
only have an ANSI option-set.
We should really replace getopt with a library that we make all build
types of yuzu link in, but this will have to do for the time being.
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
This is a compile definition introduced in Qt 4.8 for reducing the total
potential number of strings created when performing string
concatenation. This allows for less memory churn.
This can be read about here:
https://blog.qt.io/blog/2011/06/13/string-concatenation-with-qstringbuilder/
For a change that isn't source-compatible, we only had one occurrence
that actually need to have its type clarified, which is pretty good, as
far as transitioning goes.
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.
What this entails, is:
- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and
then swapping and combining them, we can swap them in place.
e.g. for the original swap32 code on x86-64, clang 8.0 would generate:
mov ecx, edi
rol cx, 8
shl ecx, 16
shr edi, 16
rol di, 8
movzx eax, di
or eax, ecx
ret
while GCC 8.3 would generate the ideal:
mov eax, edi
bswap eax
ret
now both generate the same optimal output.
MSVC used to generate the following with the old code:
mov eax, ecx
rol cx, 8
shr eax, 16
rol ax, 8
movzx ecx, cx
movzx eax, ax
shl ecx, 16
or eax, ecx
ret 0
Now MSVC also generates a similar, but equally optimal result as clang/GCC:
bswap ecx
mov eax, ecx
ret 0
====
In the swap64 case, for the original code, clang 8.0 would generate:
mov eax, edi
bswap eax
shl rax, 32
shr rdi, 32
bswap edi
or rax, rdi
ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
mov rax, rdi
bswap rax
ret
now clang also generates the optimal sequence for this fallback as well.
This is a case where MSVC unfortunately falls short, despite the new
code, this one still generates a doozy of an output.
mov r8, rcx
mov r9, rcx
mov rax, 71776119061217280
mov rdx, r8
and r9, rax
and edx, 65280
mov rax, rcx
shr rax, 16
or r9, rax
mov rax, rcx
shr r9, 16
mov rcx, 280375465082880
and rax, rcx
mov rcx, 1095216660480
or r9, rax
mov rax, r8
and rax, rcx
shr r9, 16
or r9, rax
mov rcx, r8
mov rax, r8
shr r9, 8
shl rax, 16
and ecx, 16711680
or rdx, rax
mov eax, -16777216
and rax, r8
shl rdx, 16
or rdx, rcx
shl rdx, 16
or rax, rdx
shl rax, 8
or rax, r9
ret 0
which is pretty unfortunate.
This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:
Init() {
main_process = Process::Create(...);
kernel.MakeCurrentProcess(main_process.get());
}
Load() {
const auto load_result = loader.Load(*kernel.GetCurrentProcess());
if (load_result != Loader::ResultStatus::Success) {
// Handle error here.
}
...
}
which presented a problem.
Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part that the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.
Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.
We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
Now that we have dependencies on the initialization order, we can move
the creation of the main process to a more sensible area: where we
actually load in the executable data.
This allows localizing the creation and loading of the process in one
location, making the initialization of the process much nicer to trace.
Like with CPU emulation, we generally don't want to fire off the threads
immediately after the relevant classes are initialized, we want to do
this after all necessary data is done loading first.
This splits the thread creation into its own interface member function
to allow controlling when these threads in particular get created.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.
Allows the compiler to inform when the result of a swap function is
being ignored (which is 100% a bug in all usage scenarios). We also mark
them noexcept to allow other functions using them to be able to be
marked as noexcept and play nicely with things that potentially inspect
"nothrowability".
Including every OS' own built-in byte swapping functions is kind of
undesirable, since it adds yet another build path to ensure compilation
succeeds on.
Given we only support clang, GCC, and MSVC for the time being, we can
utilize their built-in functions directly instead of going through the
OS's API functions.
This shrinks the overall code down to just
if (msvc)
use msvc's functions
else if (clang or gcc)
use clang/gcc's builtins
else
use the slow path
The template type here is actually a forwarding reference, not an rvalue
reference in this case, so it's more appropriate to use std::forward to
preserve the value category of the type being moved.
Some objects declare their handle type as const, while others declare it
as constexpr. This makes the const ones constexpr for consistency, and
prevent unexpected compilation errors if these happen to be attempted to be
used within a constexpr context.
These indicate options that alter how a read/write is performed.
Currently we don't need to handle these, as the only one that seems to
be used is for writes, but all the custom options ever seem to do is
immediate flushing, which we already do by default.
Without passing in a parent, this can result in focus being stolen from
the dialog in certain cases.
Example:
On Windows, if the logging window is left open, the logging Window will
potentially get focus over the hotkey dialog itself, since it brings all
open windows for the application into view. By specifying a parent, we
only bring windows for the parent into view (of which there are none,
aside from the hotkey dialog).
Avoids dumping all of the core settings machinery into whatever files
include this header. Nothing inside the header itself actually made use
of anything in settings.h anyways.