Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
This is a compile definition introduced in Qt 4.8 for reducing the total
potential number of strings created when performing string
concatenation. This allows for less memory churn.
This can be read about here:
https://blog.qt.io/blog/2011/06/13/string-concatenation-with-qstringbuilder/
For a change that isn't source-compatible, we only had one occurrence
that actually need to have its type clarified, which is pretty good, as
far as transitioning goes.
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.
What this entails, is:
- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and
then swapping and combining them, we can swap them in place.
e.g. for the original swap32 code on x86-64, clang 8.0 would generate:
mov ecx, edi
rol cx, 8
shl ecx, 16
shr edi, 16
rol di, 8
movzx eax, di
or eax, ecx
ret
while GCC 8.3 would generate the ideal:
mov eax, edi
bswap eax
ret
now both generate the same optimal output.
MSVC used to generate the following with the old code:
mov eax, ecx
rol cx, 8
shr eax, 16
rol ax, 8
movzx ecx, cx
movzx eax, ax
shl ecx, 16
or eax, ecx
ret 0
Now MSVC also generates a similar, but equally optimal result as clang/GCC:
bswap ecx
mov eax, ecx
ret 0
====
In the swap64 case, for the original code, clang 8.0 would generate:
mov eax, edi
bswap eax
shl rax, 32
shr rdi, 32
bswap edi
or rax, rdi
ret
(almost there, but still missing the mark)
while, again, GCC 8.3 would generate the more ideal:
mov rax, rdi
bswap rax
ret
now clang also generates the optimal sequence for this fallback as well.
This is a case where MSVC unfortunately falls short, despite the new
code, this one still generates a doozy of an output.
mov r8, rcx
mov r9, rcx
mov rax, 71776119061217280
mov rdx, r8
and r9, rax
and edx, 65280
mov rax, rcx
shr rax, 16
or r9, rax
mov rax, rcx
shr r9, 16
mov rcx, 280375465082880
and rax, rcx
mov rcx, 1095216660480
or r9, rax
mov rax, r8
and rax, rcx
shr r9, 16
or r9, rax
mov rcx, r8
mov rax, r8
shr r9, 8
shl rax, 16
and ecx, 16711680
or rdx, rax
mov eax, -16777216
and rax, r8
shl rdx, 16
or rdx, rcx
shl rdx, 16
or rax, rdx
shl rax, 8
or rax, r9
ret 0
which is pretty unfortunate.
This gives us significantly more control over where in the
initialization process we start execution of the main process.
Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:
Init() {
main_process = Process::Create(...);
kernel.MakeCurrentProcess(main_process.get());
}
Load() {
const auto load_result = loader.Load(*kernel.GetCurrentProcess());
if (load_result != Loader::ResultStatus::Success) {
// Handle error here.
}
...
}
which presented a problem.
Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part that the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.
Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.
We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
Now that we have dependencies on the initialization order, we can move
the creation of the main process to a more sensible area: where we
actually load in the executable data.
This allows localizing the creation and loading of the process in one
location, making the initialization of the process much nicer to trace.
Like with CPU emulation, we generally don't want to fire off the threads
immediately after the relevant classes are initialized, we want to do
this after all necessary data is done loading first.
This splits the thread creation into its own interface member function
to allow controlling when these threads in particular get created.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.
Allows the compiler to inform when the result of a swap function is
being ignored (which is 100% a bug in all usage scenarios). We also mark
them noexcept to allow other functions using them to be able to be
marked as noexcept and play nicely with things that potentially inspect
"nothrowability".
Including every OS' own built-in byte swapping functions is kind of
undesirable, since it adds yet another build path to ensure compilation
succeeds on.
Given we only support clang, GCC, and MSVC for the time being, we can
utilize their built-in functions directly instead of going through the
OS's API functions.
This shrinks the overall code down to just
if (msvc)
use msvc's functions
else if (clang or gcc)
use clang/gcc's builtins
else
use the slow path
The template type here is actually a forwarding reference, not an rvalue
reference in this case, so it's more appropriate to use std::forward to
preserve the value category of the type being moved.
Some objects declare their handle type as const, while others declare it
as constexpr. This makes the const ones constexpr for consistency, and
prevent unexpected compilation errors if these happen to be attempted to be
used within a constexpr context.
These indicate options that alter how a read/write is performed.
Currently we don't need to handle these, as the only one that seems to
be used is for writes, but all the custom options ever seem to do is
immediate flushing, which we already do by default.
Without passing in a parent, this can result in focus being stolen from
the dialog in certain cases.
Example:
On Windows, if the logging window is left open, the logging Window will
potentially get focus over the hotkey dialog itself, since it brings all
open windows for the application into view. By specifying a parent, we
only bring windows for the parent into view (of which there are none,
aside from the hotkey dialog).
Avoids dumping all of the core settings machinery into whatever files
include this header. Nothing inside the header itself actually made use
of anything in settings.h anyways.
We need to ensure dynarmic gets a valid pointer if the page table is
resized (the relevant pointers would be invalidated in this scenario).
In this scenario, the page table can be resized depending on what kind
of address space is specified within the NPDM metadata (if it's
present).
In our error console, when loading a game, the strings:
QString::arg: Argument missing: "Loading...", 0
QString::arg: Argument missing: "Launching...", 0
would occasionally pop up when the loading screen was running. This was
due to the strings being assumed to have formatting indicators in them,
however only two out of the four strings actually have them.
This only applies the arguments to the strings that have formatting
specifiers provided, which avoids these warnings from occurring.
Adjusts the interface of the wrappers to take a system reference, which
allows accessing a system instance without using the global accessors.
This also allows getting rid of all global accessors within the
supervisor call handling code. While this does make the wrappers
themselves slightly more noisy, this will be further cleaned up in a
follow-up. This eliminates the global system accessors in the current
code while preserving the existing interface.
Keeps the return type consistent with the function name. While we're at
it, we can also reduce the amount of boilerplate involved with handling
these by using structured bindings.
This doesn't actually work anymore, and given how long it's been left in
that state, it's unlikely anyone actually seriously used it.
Generally it's preferable to use RenderDoc or Nsight to view surfaces.
Given we already ensure nothing can set the zeroth register in
SetRegister(), we don't need to check if the index is zero and special
case it. We can just access the register normally, since it's already
going to be zero.
We can also replace the assertion with .at() to perform the equivalent
behavior inline as part of the API.
Now, since we have a const qualified variant of GetPointer(), we can put
it to use in ReadBlock() to retrieve the source pointer that is passed
into memcpy.
Now block reading may be done from a const context.
- Use QStringLiteral where applicable.
- Use const where applicable
- Remove unnecessary precondition check (we already assert the pixbuf
being non null)
Fills in the missing surface types that were marked as unknown. The
order corresponds with the TextureFormat enum within
video_core/texture.h.
We also don't need to all of these strings as translatable (only the
first string, as it's an English word).
Since c5d41fd812 callback parameters were
changed to use an s64 to represent late cycles instead of an int, so
this was causing a truncation warning to occur here. Changing it to s64
is sufficient to silence the warning.
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
BitField has been trivially copyable since
e99a148628, so we can eliminate these
TODO comments and use ReadObject() directly instead of memcpying the
data.
Makes the return type consistently uniform (like the intrinsics we're
wrapping). This also conveniently silences a truncation warning within
the kernel multi_level_queue.
Rather than make a full copy of the path, we can just use a string view
and truncate the viewed portion of the string instead of creating a totally
new truncated string.
Temporal generally indicates a relation to time, but this is just
creating a temporary, so this isn't really an accurate name for what the
function is actually doing.
TXQ returns integer types. Shaders usually do:
R0 = TXQ(); // => int
R0 = static_cast<float>(R0);
If we don't treat it as an integer, it will cast a binary float value as
float - resulting in a corrupted number.
In several places, we have request parsers where there's nothing to
really parse, simply because the HLE function in question operates on
buffers. In these cases we can just remove these instances altogether.
In the other cases, we can retrieve the relevant members from the parser
and at least log them out, giving them some use.
Applies the override specifier where applicable. In the case of
destructors that are defaulted in their definition, they can
simply be removed.
This also removes the unnecessary inclusions being done in audin_u and
audrec_u, given their close proximity.
Quite a few unused includes have built up over time, particularly on
core/memory.h. Removing these includes means the source files including
those files will no longer need to be rebuilt if they're changed, making
compilation slightly faster in this scenario.
Rather than scream that the file doesn't exist, we can clearly state
what specifically doesn't exist, to avoid ambiguity, and make it easier
to understand for non-primary English speakers/readers.
Quite a bit of these were out of sync with Switchbrew (and in some cases
entirely wrong). While we're at it, also expand the section of named
members. A segment within the control metadata is used to specify
maximum values for the user, device, and cache storage max sizes and
journal sizes.
These appear to be generally used by the am service (e.g. in
CreateCacheStorage, etc).
We need to be checking whether or not the given address is within the
kernel address space or if the given address isn't word-aligned and bail
in these scenarios instead of trashing any kernel state.
For whatever reason, shared memory was being used here instead of
transfer memory, which (quite clearly) will not work based off the name
of the function.
This corrects this wonky usage of shared memory.
Given server sessions can be given a name, we should allow retrieving
it instead of using the default implementation of GetName(), which would
just return "[UNKNOWN KERNEL OBJECT]".
The AddressArbiter type isn't actually used, given the arbiter itself
isn't a direct kernel object (or object that implements the wait object
facilities).
Given this, we can remove the enum entry entirely.
Moves includes into the cpp file where necessary. This way,
microprofile-related stuff isn't dumped into other UI-related code when
the dialog header gets included.
Similarly like svcGetProcessList, this retrieves the list of threads
from the current process. In the kernel itself, a process instance
maintains a list of threads, which are used within this function.
Threads are registered to a process' thread list at thread
initialization, and unregistered from the list upon thread destruction
(if said thread has a non-null owning process).
We assert on the debug event case, as we currently don't implement
kernel debug objects.
Now that ShouldWait() is a const qualified member function, this one can
be made const qualified as well, since it can handle passing a const
qualified this pointer to ShouldWait().
Previously this was performing a u64 + int sign conversion. When dealing
with addresses, we should generally be keeping the arithmetic in the
same signedness type.
This also gets rid of the static lifetime of the constant, as there's no
need to make a trivial type like this potentially live for the entire
duration of the program.
This doesn't really provide any benefit to the resource limit interface.
There's no way for callers to any of the service functions for resource
limits to provide a custom name, so all created instances of resource
limits other than the system resource limit would have a name of
"Unknown".
The system resource limit itself is already trivially identifiable from
its limit values, so there's no real need to take up space in the object to
identify one object meaningfully out of N total objects.
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Since C++17, we no longer need to explicitly specify the type of the
mutex within the lock_guard. The type system can now deduce these with
deduction guides.
Based off RE, most of these structure members are register values, which
makes, sense given this service is used to convey fatal errors.
One member indicates the program entry point address, one is a set of
bit flags used to determine which registers to print, and one member
indicates the architecture type.
The only member that still isn't determined is the final member within
the data structure.
The kernel makes sure that the given size to unmap is always the same
size as the entire region managed by the shared memory instance,
otherwise it returns an error code signifying an invalid size.
This is similarly done for transfer memory (which we already check for).
Many of these functions are carried over from Dolphin (where they aren't
used anymore). Given these have no use (and we really shouldn't be
screwing around with OS-specific thread scheduler handling from the
emulator, these can be removed.
The function for setting the thread name is left, however, since it can
have debugging utility usages.
This was initially added to prevent problems from stubbed/not implemented NFC services, but as we never encountered such and as it's only used in a deprecated function anyway, I guess we can just remove it to prevent more clutter of the settings.
Reports the (mostly) correct size through svcGetInfo now for queries to
total used physical memory. This still doesn't correctly handle memory
allocated via svcMapPhysicalMemory, however, we don't currently handle
that case anyways.
This will make operating with the process-related SVC commands much
nicer in the future (the parameter representing the stack size in
svcStartProcess is a 64-bit value).
This isn't used at all in the OpenGL shader cache, so we can remove it's
include here, meaning one less file needs to be recompiled if any
changes ever occur within that header.
core/memory.h is also not used within this file at all, so we can remove
it as well.
We can just pass in the Maxwell3D instance instead of going through the
system class to get at it.
This also lets us simplify the interface a little bit. Since we pass in
the Maxwell3D context now, we only really need to pass the shader stage
index value in.
The pusher instance is only ever used in the constructor of the
ThreadManager for creating the thread that the ThreadManager instance
contains. Aside from that, the member is unused, so it can be removed.
These functions act in tandem similar to how a lock or mutex require a
balanced lock()/unlock() sequence.
EnterFatalSection simply increments a counter for how many times it has
been called, while LeaveFatalSection ensures that a previous call to
EnterFatalSection has occured. If a previous call has occurred (the
counter is not zero), then the counter gets decremented as one would
expect. If a previous call has not occurred (the counter is zero), then
an error code is returned.
In some cases, our callbacks were using s64 as a parameter, and in other
cases, they were using an int, which is inconsistent.
To make all callbacks consistent, we can just use an s64 as the type for
late cycles, given it gets rid of the need to cast internally.
While we're at it, also resolve some signed/unsigned conversions that
were occurring related to the callback registration.
One behavior that we weren't handling properly in our heap allocation
process was the ability for the heap to be shrunk down in size if a
larger size was previously requested.
This adds the basic behavior to do so and also gets rid of HeapFree, as
it's no longer necessary now that we have allocations and deallocations
going through the same API function.
While we're at it, fully document the behavior that this function
performs.
Makes it more obvious that this function is intending to stand in for
the actual supervisor call itself, and not acting as a general heap
allocation function.
Also the following change will merge the freeing behavior of HeapFree
into this function, so leaving it as HeapAllocate would be misleading.
In cases where HeapAllocate is called with the same size of the current
heap, we can simply do nothing and return successfully.
This avoids doing work where we otherwise don't have to. This is also
what the kernel itself does in this scenario.
Another holdover from citra that can be tossed out is the notion of the
heap needing to be allocated in different addresses. On the switch, the
base address of the heap will always be managed by the memory allocator
in the kernel, so this doesn't need to be specified in the function's
interface itself.
The heap on the switch is always allocated with read/write permissions,
so we don't need to add specifying the memory permissions as part of the
heap allocation itself either.
This also corrects the error code returned from within the function.
If the size of the heap is larger than the entire heap region, then the
kernel will report an out of memory condition.
The use of a shared_ptr is an implementation detail of the VMManager
itself when mapping memory. Because of that, we shouldn't require all
users of the CodeSet to have to allocate the shared_ptr ahead of time.
It's intended that CodeSet simply pass in the required direct data, and
that the memory manager takes care of it from that point on.
This means we just do the shared pointer allocation in a single place,
when loading modules, as opposed to in each loader.
This source file was utilizing its own version of the NSO header.
Instead of keeping this around, we can have the patch manager also use
the version of the header that we have defined in loader/nso.h
The total struct itself is 0x100 (256) bytes in size, so we should be
providing that amount of data.
Without the data, this can result in omitted data from the final loaded
NSO file.
Implements an API agnostic texture view based texture cache. Classes
defined here are intended to be inherited by the API implementation and
used in API-specific code.
This implementation exposes protected virtual functions to be called
from the implementer.
Before executing any surface copies methods (defined in API-specific code)
it tries to detect if the overlapping surface is a superset and if it
is, it creates a view. Views are references of a subset of a surface, it
can be a superset view (the same as referencing the whole texture).
Current code manages 1D, 1D array, 2D, 2D array, cube maps and cube map
arrays with layer and mipmap level views. Texture 3D slices views are
not implemented.
If the view attempt fails, the fast path is invoked with the overlapping
textures (defined in the implementer). If that one fails (returning
nullptr) it will flush and reload the texture.
Makes it more evident that one is for actual code and one is for actual
data. Mutable and static are less than ideal terms here, because
read-only data is technically not mutable, but we were mapping it with
that label.
In 93da8e0abf, the page table construct
was moved to the common library (which utilized these inclusions). Since
the move, nothing requires these headers to be included within the
memory header.
- GPU will be released on shutdown, before pages are unmapped.
- On subsequent runs, current_page_table will be not nullptr, but GPU might not be valid yet.
When #2247 was created, thread_queue_list.h was the only user of
boost-related code, however #2252 moved the page table struct into
common, which makes use of Boost.ICL, so we need to add the dependency
to the common library's link interface again.
Given this is utilized by the loaders, this allows avoiding inclusion of
the kernel process definitions where avoidable.
This also keeps the loading format for all executable data separate from
the kernel objects.
Neither the NRO or NSO loaders actually make use of the functions or
members provided by the Linker interface, so we can just remove the
inheritance altogether.
This function passes in the desired main applet and library applet
volume levels. We can then just pass those values back within the
relevant volume getter functions, allowing us to unstub those as well.
The initial values for the library and main applet volumes differ. The
main applet volume is 0.25 by default, while the library applet volume
is initialized to 1.0 by default in the services themselves.
Modifying CMAKE_* related flags directly applies those changes to every
single CMake target. This includes even the targets we have in the
externals directory.
So, if we ever increased our warning levels, or enabled particular ones,
or enabled any other compilation setting, then this would apply to
externals as well, which is often not desirable.
This makes our compilation flag setup less error prone by only applying
our settings to our targets and leaving the externals alone entirely.
This also means we don't end up clobbering any provided flags on the
command line either, allowing users to specifically use the flags they
want.
We generally shouldn't be hijacking CMAKE_CXX_FLAGS, etc as a means to
append flags to the targets, since this adds the compilation flags to
everything, including our externals, which can result in weird issues
and makes the build hierarchy fragile.
Instead, we want to just apply these compilation flags to our targets,
and let those managing external libraries to properly specify their
compilation flags.
This also results in us not getting as many warnings, as we don't raise
the warning level on every external target.
We really don't need to pull in several headers of boost related
machinery just to perform the erase-remove idiom (particularly with
C++20 around the corner, which adds universal container std::erase and
std::erase_if, which we can just use instead).
With this, we don't need to link in anything boost-related into common.
Rather than make a global accessor for this sort of thing. We can make
it a part of the thread interface itself. This allows getting rid of a
hidden global accessor in the kernel code.
This condition was checking against the nominal thread priority, whereas
the kernel itself checks against the current priority instead. We were
also assigning the nominal priority, when we should be assigning
current_priority, which takes priority inheritance into account.
This can lead to the incorrect priority being assigned to a thread.
Given we recursively update the relevant threads, we don't need to go
through the whole mutex waiter list. This matches what the kernel does
as well (only accessing the first entry within the waiting list).
* gdbstub: fix IsMemoryBreak() returning false while connected to client
As a result, the only existing codepath for a memory watchpoint hit to break into GDB (InterpeterMainLoop, GDB_BP_CHECK, ARMul_State::RecordBreak) is finally taken,
which exposes incorrect logic* in both RecordBreak and ServeBreak.
* a blank BreakpointAddress structure is passed, which sets r15 (PC) to NULL
* gdbstub: DynCom: default-initialize two members/vars used in conditionals
* gdbstub: DynCom: don't record memory watchpoint hits via RecordBreak()
For now, instead check for GDBStub::IsMemoryBreak() in InterpreterMainLoop and ServeBreak.
Fixes PC being set to a stale/unhit breakpoint address (often zero) when a memory watchpoint (rwatch, watch, awatch) is handled in ServeBreak() and generates a GDB trap.
Reasons for removing a call to RecordBreak() for memory watchpoints:
* The``breakpoint_data`` we pass is typed Execute or None. It describes the predicted next code breakpoint hit relative to PC;
* GDBStub::IsMemoryBreak() returns true if a recent Read/Write operation hit a watchpoint. It doesn't specify which in return, nor does it trace it anywhere. Thus, the only data we could give RecordBreak() is a placeholder BreakpointAddress at offset NULL and type Access. I found the idea silly, compared to simply relying on GDBStub::IsMemoryBreak().
There is currently no measure in the code that remembers the addresses (and types) of any watchpoints that were hit by an instruction, in order to send them to GDB as "extended stop information."
I'm considering an implementation for this.
* gdbstub: Change an ASSERT to DEBUG_ASSERT
I have never seen the (Reg[15] == last_bkpt.address) assert fail in practice, even after several weeks of (locally) developping various branches around GDB. Only leave it inside Debug builds.
Makes it an instantiable class like it is in the actual kernel. This
will also allow removing reliance on global accessors in a following
change, now that we can encapsulate a reference to the system instance
in the class.
Within the kernel, shared memory and transfer memory facilities exist as
completely different kernel objects. They also have different validity
checking as well. Therefore, we shouldn't be treating the two as the
same kind of memory.
They also differ in terms of their behavioral aspect as well. Shared
memory is intended for sharing memory between processes, while transfer
memory is intended to be for transferring memory to other processes.
This breaks out the handling for transfer memory into its own class and
treats it as its own kernel object. This is also important when we
consider resource limits as well. Particularly because transfer memory
is limited by the resource limit value set for it.
While we currently don't handle resource limit testing against objects
yet (but we do allow setting them), this will make implementing that
behavior much easier in the future, as we don't need to distinguish
between shared memory and transfer memory allocations in the same place.
The previous code had some minor issues with it, really not a big deal,
but amending it is basically 'free', so I figured, "why not?".
With the standard container maps, when:
map[key] = thing;
is done, this can cause potentially undesirable behavior in certain
scenarios. In particular, if there's no value associated with the key,
then the map constructs a default initialized instance of the value
type.
In this case, since it's a std::shared_ptr (as a type alias) that is
the value type, this will construct a std::shared_pointer, and then
assign over it (with objects that are quite large, or actively heap
allocate this can be extremely undesirable).
We also make the function take the region by value, as we can avoid a
copy (and by extension with std::shared_ptr, a copy causes an atomic
reference count increment), in certain scenarios when ownership isn't a
concern (i.e. when ReserveGlobalRegion is called with an rvalue
reference, then no copy at all occurs). So, it's more-or-less a "free"
gain without many downsides.
With this, all kernel objects finally have all of their data members
behind an interface, making it nicer to reason about interactions with
other code (as external code no longer has the freedom to totally alter
internals and potentially messing up invariants).
After doing a little more reading up on the Opus codec, it turns out
that the multistream API that is part of libopus can handle regular
packets. Regular packets are just a degenerate case of multistream Opus
packets, and all that's necessary is to pass the number of streams as 1
and provide a basic channel mapping, then everything works fine for
that case.
This allows us to get rid of the need to use both APIs in the future
when implementing multistream variants in a follow-up PR, greatly
simplifying the code that needs to be written.
Previously this was required, as BitField wasn't trivially copyable.
BitField has since been made trivially copyable, so now this isn't
required anymore.
Relocates the error code to where it's most related, similar to how all
the other error codes are. Previously we were including a non-generic
error in the main result code header.
These can just be passed regularly, now that we use fmt instead of our
old logging system.
While we're at it, make the parameters to MakeFunctionString
std::string_views.
Instead of holding a reference that will get invalidated by
dma_pushbuffer.pop(), hold it as a copy. This doesn't have any
performance cost since CommandListHeader is 8 bytes long.
There's no real need to use a shared lifetime here, since we don't
actually expose them to anything else. This is also kind of an
unnecessary use of the heap given the objects themselves are so small;
small enough, in fact that changing over to optionals actually reduces
the overall size of the HLERequestContext struct (818 bytes to 808
bytes).
Now that we have the address arbiter extracted to its own class, we can
fix an innaccuracy with the kernel. Said inaccuracy being that there
isn't only one address arbiter. Each process instance contains its own
AddressArbiter instance in the actual kernel.
This fixes that and gets rid of another long-standing issue that could
arise when attempting to create more than one process.
Similar to how WaitForAddress was isolated to its own function, we can
also move the necessary conditional checking into the address arbiter
class itself, allowing us to hide the implementation details of it from
public use.
Rather than let the service call itself work out which function is the
proper one to call, we can make that a behavior of the arbiter itself,
so we don't need to directly expose those implementation details.
This makes the class much more flexible and doesn't make performing
copies with classes that contain a bitfield member a pain.
Given BitField instances are only intended to be used within unions, the
fact the full storage value would be copied isn't a big concern (only
sizeof(union_type) would be copied anyways).
While we're at it, provide defaulted move constructors for consistency.
Because of the recent separation of GPU functionality into sync/async
variants, we need to mark the destructor virtual to provide proper
destruction behavior, given we use the base class within the System
class.
Prior to this, it was undefined behavior whether or not the destructor
in the derived classes would ever execute.
This will be utilized by more than just that class in the future. This
also renames it from OpusHeader to OpusPacketHeader to be more specific
about what kind of header it is.
We already have the thread instance that was created under the current
process, so we can just pass the handle table of it along to retrieve
the owner of the mutex.
Removes a few unnecessary dependencies on core-related machinery, such
as the core.h and memory.h, which reduces the amount of rebuilding
necessary if those files change.
This also uncovered some indirect dependencies within other source
files. This also fixes those.
Places all error codes in an easily includable header.
This also corrects the unsupported error code (I accidentally used the
hex value when I meant to use the decimal one).
Places all of the functions for address arbiter operation into a class.
This will be necessary for future deglobalizing efforts related to both
the memory and system itself.
Removes a few inclusion dependencies from the headers or replaces
existing ones with ones that don't indirectly include the required
headers.
This allows removing an inclusion of core/memory.h, meaning that if the
memory header is ever changed in the future, it won't result in
rebuilding the entirety of the HLE services (as the IPC headers are used
quite ubiquitously throughout the HLE service implementations).
Avoids directly relying on the global system instance and instead makes
an arbitrary system instance an explicit dependency on construction.
This also allows removing dependencies on some global accessor functions
as well.
Given we already pass in a reference to the kernel that the shared
memory instance is created under, we can just use that to check the
current process, rather than using the global accessor functions.
This allows removing direct dependency on the system instance entirely.
In these cases the system object is nearby, and in the other, the
long-form of accessing the telemetry instance is already used, so we can
get rid of the use of the global accessor.
We already pass a reference to the system object to the constructor of the renderer,
so we can just use that instead of using the global accessor functions.
Reduces the potential amount of rebuilding necessary if any headers
change. In particular, we were including a header from the core library
when we don't even link the core library to the web_service library, so
this also gets rid of an indirect dependency.
Moves local global state into the Impl class itself and initializes it
at the creation of the instance instead of in the function.
This makes it nicer for weakly-ordered architectures, given the
CreateEntry() class won't need to have atomic loads executed for each
individual call to the CreateEntry class.
Any SDL invocation can call the even callback on the same thread, which can call GetSDLJoystickBySDLID and eventually cause double lock on joystick_map_mutex. To avoid this, lock guard should be placed as closer as possible to the object accessing code, so that any SDL invocation is with the mutex unlocked
Changes the interface as well to remove any unique methods that
frontends needed to call such as StartJoystickEventHandler by
conditionally starting the polling thread only if the frontend hasn't
started it already. Additionally, moves all global state into a single
SDLState class in order to guarantee that the destructors are called in
the proper order
MSVC does not seem to like using constexpr values in a lambda that were declared outside of it.
Previously on MSVC build the hotkeys to inc-/decrease the speed limit were not working correctly because in the lambda the SPEED_LIMIT_STEP had garbage values.
After googling around a bit I found: https://github.com/codeplaysoftware/computecpp-sdk/issues/95 which seems to be a similar issue.
Trying the suggested fix to make the variable static constexpr also fixes the bug here.
The comment already invalidates itself: neither MMIO nor rasterizer cache belongsHLE kernel state. This mutex has a too large scope if MMIO or cache is included, which is prone to dead lock when multiple thread acquires these resource at the same time. If necessary, each MMIO component or rasterizer should have their own lock.
This currently has the same behavior as the regular
OpenAudioRenderer API function, so we can just move the code within
OpenAudioRenderer to an internal function that both service functions
call.
This service function appears to do nothing noteworthy on the switch.
All it does at the moment is either return an error code or abort the
system. Given we obviously don't want to kill the system, we just opt
for always returning the error code.
Provides names for previously unknown entries (aside from the two u8
that appear to be padding bytes, and a single word that also appears
to be reserved or padding).
This will be useful in subsequent changes when unstubbing behavior related
to the audio renderer services.
This function is also supposed to check its given policy type with the
permission of the service itself. This implements the necessary
machinery to unstub these functions.
Policy::User seems to just be basic access (which is probably why vi:u
is restricted to that policy), while the other policy seems to be for
extended abilities regarding which displays can be managed and queried,
so this is assumed to be for a background compositor (which I've named,
appropriately, Policy::Compositor).
There's no real reason this shouldn't be allowed, given some values sent
via a request can be signed. This also makes it less annoying to work
with popping enum values, given an enum class with no type specifier
will work out of the box now.
It's also kind of an oversight to allow popping s64 values, but nothing
else.
This didn't really provide much benefit here, especially since the
subsequent change requires that the behavior for each service's
GetDisplayService differs in a minor detail.
This also arguably makes the services nicer to read, since it gets rid
of an indirection in the class hierarchy.
The kernel allows restricting the total size of the handle table through
the process capability descriptors. Until now, this functionality wasn't
hooked up. With this, the process handle tables become properly restricted.
In the case of metadata-less executables, the handle table will assume
the maximum size is requested, preserving the behavior that existed
before these changes.
This manages two kinds of streaming buffers: one for unified memory
models and one for dedicated GPUs. The first one skips the copy from the
staging buffer to the real buffer, since it creates an unified buffer.
This implementation waits for all fences to finish their operation
before "invalidating". This is suboptimal since it should allocate
another buffer or start searching from the beginning. There is room for
improvement here.
This could also handle AMD's "pinned" memory (a heap with 256 MiB) that
seems to be designed for buffer streaming.
The scheduler abstracts command buffer and fence management with an
interface that's able to do OpenGL-like operations on Vulkan command
buffers.
It returns by value a command buffer and fence that have to be used for
subsequent operations until Flush or Finish is executed, after that the
current execution context (the pair of command buffers and fences) gets
invalidated a new one must be fetched. Thankfully validation layers will
quickly detect if this is skipped throwing an error due to modifications
to a sent command buffer.
The NVFlinger service is already passed into services that need to
guarantee its lifetime, so the BufferQueue instances will already live
as long as they're needed. Making them std::shared_ptr instances in this
case is unnecessary.
Like the previous changes made to the Display struct, this prepares the
Layer struct for changes to its interface. Given Layer will be given
more invariants in the future, we convert it into a class to better
signify that.
With the display and layer structures relocated to the vi service, we
can begin giving these a proper interface before beginning to properly
support the display types.
This converts the display struct into a class and provides it with the
necessary functions to preserve behavior within the NVFlinger class.
* Fixes Unicode Key File Directories
Adds code so that when loading a file it converts to UTF16 first, to
ensure the files can be opened. Code borrowed from FileUtil::Exists.
* Update src/core/crypto/key_manager.cpp
Co-Authored-By: Jungorend <Jungorend@users.noreply.github.com>
* Update src/core/crypto/key_manager.cpp
Co-Authored-By: Jungorend <Jungorend@users.noreply.github.com>
* Using FileUtil instead to be cleaner.
* Update src/core/crypto/key_manager.cpp
Co-Authored-By: Jungorend <Jungorend@users.noreply.github.com>
These are more closely related to the vi service as opposed to the
intermediary nvflinger.
This also places them in their relevant subfolder, as future changes to
these will likely result in subclassing to represent various displays
and services, as they're done within the service itself on hardware.
The reasoning for prefixing the display and layer source files is to
avoid potential clashing if two files with the same name are compiled
(e.g. if 'display.cpp/.h' or 'layer.cpp/.h' is added to another service
at any point), which MSVC will actually warn against. This prevents that
case from occurring.
This also presently coverts the std::array introduced within
f45c25aaba back to a std::vector to allow
the forward declaration of the Display type. Forward declaring a type
within a std::vector is allowed since the introduction of N4510
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4510.html) by
Zhihao Yuan.
As fetching command list headers and and the list of command headers is a fixed 1:1 relation now, they can be implemented within a single call.
This cleans up the Step() logic quite a bit.
Fetching every u32 from memory leads to a big overhead. So let's fetch all of them as a block if possible.
This reduces the Memory::* calls by the dma_pusher by a factor of 10.
A fairly trivial change. Other sections of the codebase use nested
namespaces instead of separate namespaces here. This one must have just
been overlooked.
Gets rid of the largest set of mutable global state within the core.
This also paves a way for eliminating usages of GetInstance() on the
System class as a follow-up.
Note that no behavioral changes have been made, and this simply extracts
the functionality into a class. This also has the benefit of making
dependencies on the core timing functionality explicit within the
relevant interfaces.
Previously, we were completely ignoring for screenshots whether the game uses RGB or sRGB.
This resulted in screenshot colors that looked off for some titles.
There are some potential edge cases where gl_state may fail to track the
state if a related state changes while the toggle is disabled or it
didn't change. This addresses that.
Handles a pool of resources protected by fences. Manages resource
overflow allocating more resources.
This class is intended to be used through inheritance.
Fences take ownership of objects, protecting them from GPU-side or
driver-side concurrent access. They must be commited from the resource
manager. Their usage flow is: commit the fence from the resource
manager, protect resources with it and use them, send the fence to an
execution queue and Wait for it if needed and then call Release. Used
resources will automatically be signaled when they are free to be
reused.
Makes it consistent with the regular standard containers in terms of
size representation. This also gets rid of dependence on our own
type aliases, removing the need for an include.
The necessity of this parameter is dubious at best, and in 2019 probably
offers completely negligible savings as opposed to just leaving this
enabled. This removes it and simplifies the overall interface.
VKDevice contains all the data required to manage and initialize a
physical device. Its intention is to be passed across Vulkan objects to
query device-specific data (for example the logical device and the
dispatch loader).
We already store a reference to the system instance that the renderer is
created with, so we don't need to refer to the system instance via
Core::System::GetInstance()
This file is intended to be included instead of vulkan/vulkan.hpp. It
includes declarations of unique handlers using a dynamic dispatcher
instead of a static one (which would require linking to a Vulkan
library).
Places all of the timing-related functionality under the existing Core
namespace to keep things consistent, rather than having the timing
utilities sitting in its own completely separate namespace.
When I originally added the compute assert I used the wrong
documentation. This addresses that.
The dispatch register was tested with homebrew against hardware and is
triggered by some games (e.g. Super Mario Odyssey). What exactly is
missing to get a valid program bound by this engine requires more
investigation.
This was originally included because texture operations returned a vec4.
These operations now return a single float and the F4 prefix doesn't
mean anything.
Previous code relied on GLSL parameter order (something that's always
ill-formed on an IR design). This approach passes spatial coordiantes
through operation nodes and array and depth compare values in the the
texture metadata. It still contains an "extra" vector containing generic
nodes for bias and component index (for example) which is still a bit
ill-formed but it should be better than the previous approach.
i965 (and probably all mesa drivers) require GL_PROGRAM_SEPARABLE when using
glProgramBinary. This is probably required by the standard but it's ignored by
permisive proprietary drivers.
This commit it automatically generated by command in zsh:
sed -i -- 's/BitField<\(.*\)_le>/BitField<\1>/g' **/*(D.)
BitField is now aware to endianness and default to little endian. It expects a value representation type without storage specification for its template parameter.
This is compromise for swap type being used in union. A union has deleted default constructor if it has at least one variant member with non-trivial default constructor, and no variant member of T has a default member initializer. In the use case of Bitfield, all variant members will be the swap type on endianness mismatch, which would all have non-trivial default constructor if default value is specified, and non of them can have member initializer
Converts many of the Find* functions to return a std::optional<T> as
opposed to returning the raw return values directly. This allows
removing a few assertions and handles error cases like the service
itself does.
Some games search conditionally use global memory instructions. This
allows the heuristic to search inside conditional nodes for the source
constant buffer.
Some games call LDG at the top of a basic block, making the tracking
heuristic to fail. This commit lets the heuristic the decoded nodes as a
whole instead of per basic blocks.
This may lead to some false positives but allows it the heuristic to
track cases it previously couldn't.
A holdover from citra, the Horizon kernel on the switch has no
prominent kernel object that functions as a timer. At least not
to the degree of sophistication that this class provided.
As such, this can be removed entirely. This class also wasn't used at
all in any meaningful way within the core, so this was just code sitting
around doing nothing. This also allows removing a few things from the
main KernelCore class that allows it to use slightly less resources
overall (though very minor and not anything really noticeable).
No inheritors of the WaitObject class actually make use of their own
implementations of these functions, so they can be made non-virtual.
It's also kind of sketchy to allow overriding how the threads get added
to the list anyways, given the kernel itself on the actual hardware
doesn't seem to customize based off this.
This was previously causing:
warning C4828: The file contains a character starting at offset 0xa33
that is illegal in the current source character set (codepage 65001).
warnings on Windows when compiling yuzu.
This functions almost identically to DecodeInterleavedWithPerfOld,
however this function also has the ability to reset the decoder context.
This is documented as a potentially desirable thing in the libopus
manual in some circumstances as it says for the OPUS_RESET_STATE ctl:
"This should be called when switching streams in order to prevent the
back to back decoding from giving different result from one at a time
decoding."
Constant buffer values on the shader IR were using different offsets if
the access direct or indirect. cbuf34 has a non-multiplied offset while
cbuf36 does. On shader decoding this commit multiplies it by four on
cbuf34 queries.
* Implemented the puller semaphore operations.
* Nit: Fix 2 style issues
* Nit: Add Break to default case.
* Fix style.
* Update for comments. Added ReferenceCount method
* Forgot to remove GpuSmaphoreAddress union.
* Fix the clang-format issues.
* More clang formatting.
* two more white spaces for the Clang formatting.
* Move puller members into the regs union
* Updated to use Memory::WriteBlock instead of Memory::Write*
* Fix clang style issues
* White space clang error
* Removing unused funcitons and other pr comment
* Removing unused funcitons and other pr comment
* More union magic for setting regs value.
* union magic refcnt as well
* Remove local var
* Set up the regs and regs_assert_positions up properly
* Fix clang error
Cubemaps are considered layered and to create a texture view the texture
mustn't be a layered texture, resulting in cubemaps being bound as
cubemap arrays. To fix this issue this commit introduces an extra
surface parameter called "is_array" and uses this to query for texture
view creation.
Now that texture views for cubemaps are actually being created, this
also fixes the number of layers created for the texture view (since they
have to be 6 to create a texture view of cubemaps).
Some games (like Xenoblade Chronicles 2) clear both depth and stencil
buffers while there's a depth-only texture attached (e.g. D16 Unorm).
This commit reads the zeta format of the bound surface on
ConfigureFramebuffers and returns if depth and/or stencil attachments
were set. This is ignored on DrawArrays but on Clear it's used to just
clear those attachments, bypassing an OpenGL error.
In addition to the default, external, EDID, and internal displays,
there's also a null display provided as well, which as the name
suggests, does nothing but discard all commands given to it. This is
provided for completeness.
Opening a display isn't really a thing to warn about. It's an expected
thing, so this can be a debug log. This also alters the string to
indicate the display name better.
Opening "Default" display reads a little nicer compared to Opening
display Default.
This quite literally functions as a basic setter. No other error
checking or anything (since there's nothing to really check against).
With this, it completes the pm:bm interface in terms of functionality.
This appears to be a vestigial API function that's only kept around for
compatibility's sake, given the function only returns a success error
code and exits.
Since that's the case, we can remove the stubbed notification from the
log, since doing nothing is technically the correct behavior in this
case.
std::moveing a local variable in a return statement has the potential to
prevent copy elision from occurring, so this can just be converted into
a regular return.
Looking into the implementation of the C++ standard facilities that seem
to be within all modules, it appears that they use 7 as a break reason
to indicate an uncaught C++ exception.
This was primarily found via the third last function called within
Horizon's equivalent of libcxxabi's demangling_terminate_handler(),
which passes the value 0x80000007 to svcBreak.
According to documentation, if the argument of std::exp is zero, one is returned.
However we want the return value to be also zero in this case so no audio is played.
Commercial games assume that this value is 1 but they never set it. On
the other hand nouveau manually sets this register. On
ConfigureFramebuffers we were asserting for what we are actually
implementing (according to envytools).
With the loading screen merged, we don't want to actually show at this
point, but it still needs to be shown to actually create the context.
Turns out you can just show and hide it immediately and it'll work.
With shader caches on the horizon, one requirement is to provide visible
feedback for the progress. The shader cache reportedly takes several
minutes to load for large caches that were invalidated, and as such we
should provide a loading screen with progress.
Adds a loading screen widget that will be shown until the first frame of
the game is swapped. This was chosen in case shader caches are not being
used, several games still take more than a few seconds to launch and
could benefit from a loading screen.
This is a function that definitely doesn't always have a non-modifying
behavior across all implementations, so this should be made non-const.
This gets rid of the need to mark data members as mutable to work around
the fact mutating data members needs to occur.
There is a bug on Intel's blob driver where it fails to properly build a
vertex array object if it's not bound even after creating it with
glCreateVertexArrays. This workaround binds it after creating it to
bypass the issue.
Since the data is doing the path CPU -> GPU -> GPU copy is the most
approximate hint. Using GL_STREAM_DRAW generated a performance warning
on Nvidia's stack. Changing this hint removed the warning.
These values are not equivalent, based off RE. The internal value is put
into a lookup table with the following values:
[3, 0, 1, 2, 4]
So the values absolutely do not map 1:1 like the comment was indicating.
Avoids entangling the IPC buffer appending with the actual operation of
converting the scaling values over. This also inserts the proper error
handling for invalid scaling values.
This appears to only check if the scaling mode can actually be
handled, rather than actually setting the scaling mode for the layer.
This implements the same error handling performed on the passed in
values.
Within the actual service, it makes no distinguishing between docked and
undocked modes. This will always return the constants values reporting
1280x720 as the dimensions.
This IPC command is simply a stub inside the actual service itself, and
just returns a successful error code regardless of input. This is likely
only retained in the service interface to not break older code that relied
upon it succeeding in some way.
In many cases, we didn't bother to log out any of the popped data
members. This logs them out to the console within the logging call to
provide more contextual information.
Internally within the vi services, this is essentially all that
OpenDefaultDisplay does, so it's trivial to just do the same, and
forward the default display string into the function.
It appears that the two members indicate whether a display has a bounded
number of layers (and if set, the second member indicates the total
number of layers).
This is a bounds check to ensure that the thread priority is within the
valid range of 0-64. If it exceeds 64, that doesn't necessarily mean
that an actual priority of 64 was expected (it actually means whoever
called the function screwed up their math).
Instead clarify the message to indicate the allowed range of thread
priorities.
Now that we handle the kernel capability descriptors we can correct
CreateThread to properly check against the core and priority masks
like the actual kernel does.
When a shader samples a texture array but that texture in OpenGL is
created without layers, use a texture view to increase the texture
hierarchy. For example, instead of binding a GL_TEXTURE_2D bind a
GL_TEXTURE_2D_ARRAY view.
These two macros being used in tandem were used prior to the
introduction of UNIMPLEMENTED and UNIMPLEMENTED_MSG. This provides
equivalent behavior, just with less typing/reading involved.
This makes the naming more closely match its meaning. It's just a
preferred core, not a required default core. This also makes the usages
of this term consistent across the thread and process implementations.
This function isn't a general purpose function that should be exposed to
everything, given it's specific to initializing the main thread for a
Process instance.
Given that, it's a tad bit more sensible to place this within
process.cpp, which keeps it visible only to the code that actually needs
it.
Provides extra information that makes it easier to tell if an executable
being run is using a 36-bit address space or a 39-bit address space.
While we don't support AArch32 executables yet, this also puts in
distinguishing information for the 32-bit address space types as well.
In all cases that these functions are needed, the VMManager can just be
retrieved and used instead of providing the same functions in Process'
interface.
This also makes it a little nicer dependency-wise, since it gets rid of
cases where the VMManager interface was being used, and then switched
over to using the interface for a Process instance. Instead, it makes
all accesses uniform and uses the VMManager instance for all necessary
tasks.
All the basic memory mapping functions did was forward to the Process'
VMManager instance anyways.
This stores a file in the save directory called '.yuzu_save_size' which stores the two save sizes (normal area and journaled area) sequentially as u64s.
Calling tr() from a file-scope array isn't advisable, since it can be
executed before the Qt libraries are even fully initialized, which can
lead to crashes.
Instead, the translatable strings should be annotated, and the tr()
function should be called at the string's usage site.
This allows us to present a much nicer UI to the user over a simple combo box and is made easy with the modular nature of the profile-selection applet frontend.
Using the QtProfileSelectorDialog, this implementation is trivial. This mimics the real switch behavior of asking which user on every game boot, but it is default disabled as that might get inconvenient.
Similar to the service capability flags, however, we currently don't
emulate the GIC, so this currently handles all interrupts as being valid
for the time being.
Handles the priority mask and core mask flags to allow building up the
masks to determine the usable thread priorities and cores for a kernel
process instance.
We've had the old kernel capability parser from Citra, however, this is
unused code and doesn't actually map to how the kernel on the Switch
does it. This introduces the basic functional skeleton for parsing
process capabilities.
If a thread handle is passed to svcGetProcessId, the kernel attempts to
access the process ID via the thread's instance's owning process.
Technically, this function should also be handling the kernel debug
objects as well, however we currently don't handle those kernel objects
yet, so I've left a note via a comment about it to remind myself when
implementing it in the future.
Starts the process ID counter off at 81, which is what the kernel itself
checks against internally when creating processes. It's actually
supposed to panic if the PID is less than 81 for a userland process.
Now it also indicates the name and max session count. This also gives a
name to the unknown bool. This indicates if the created port is supposed
to be using light handles or regular handles internally. This is passed
to the respective svcCreatePort parameter internally.
Allows capturing screenshot at the current internal resolution (native for software renderer), but a setting is available to capture it in other resolutions. The screenshot is saved to a single PNG in the current layout.
Adds the barebones enumeration constants and functions in place to
handle memory attributes, while also essentially leaving the attribute
itself non-functional.
We can hide the direct array from external view and instead provide
functions to retrieve the necessary info. This has the benefit of
completely hiding the makeup of the SinkDetails structure from the rest
of the code.
Given that this makes the array hidden, we can also make the array
constexpr by altering the members slightly. This gets rid of several
static constructor calls related to std::vector and std::function.
Now we don't have heap allocations here that need to occur before the
program can even enter main(). It also has the benefit of saving a
little bit of heap space, but this doesn't matter too much, since the
savings in that regard are pretty tiny.
Services created with the ServiceFramework base class install themselves as HleHandlers with an owning shared_ptr in the ServerPort ServiceFrameworkBase::port member variable, creating a cyclic ownership between ServiceFrameworkBase and the ServerPort, preventing deletion of the service objects.
Fix that by removing the ServiceFrameworkBase::port member because that was only used to detect multiple attempts at installing a port. Instead store a flag if the port was already installed to achieve the same functionality.
In the previous change, the memory writing was moved into the service
function itself, however it still had a problem, in that the entire
MemoryInfo structure wasn't being written out, only the first 32 bytes
of it were being written out. We still need to write out the trailing
two reference count members and zero out the padding bits.
Not doing this can result in wrong behavior in userland code in the following
scenario:
MemoryInfo info; // Put on the stack, not quaranteed to be zeroed out.
svcQueryMemory(&info, ...);
if (info.device_refcount == ...) // Whoops, uninitialized read.
This can also cause the wrong thing to happen if the user code uses
std::memcmp to compare the struct, with another one (questionable, but
allowed), as the padding bits are not guaranteed to be a deterministic
value. Note that the kernel itself also fully zeroes out the structure
before writing it out including the padding bits.
Moves the memory writes directly into QueryProcessMemory instead of
letting the wrapper function do it. It would be inaccurate to allow the
handler to do it because there's cases where memory shouldn't even be
written to. For example, if the given process handle is invalid.
HOWEVER, if the memory writing is within the wrapper, then we have no
control over if these memory writes occur, meaning in an error case, 68
bytes of memory randomly get trashed with zeroes, 64 of those being
written to wherever the memory info address points to, and the remaining
4 being written wherever the page info address points to.
One solution in this case would be to just conditionally check within
the handler itself, but this is kind of smelly, given the handler
shouldn't be performing conditional behavior itself, it's a behavior of
the managed function. In other words, if you remove the handler from the
equation entirely, does the function still retain its proper behavior?
In this case, no.
Now, we don't potentially trash memory from this function if an invalid
query is performed.
This would result in svcSetMemoryAttribute getting the wrong value for
its third parameter. This is currently fine, given the service function
is stubbed, however this will be unstubbed in a future change, so this
needs to change.
The kernel returns a memory info instance with the base address set to
the end of the address space, and the size of said block as
0 - address_space_end, it doesn't set both of said members to zero.
Gets the two structures out of an unrelated header and places them with
the rest of the memory management code.
This also corrects the structures. PageInfo appears to only contain a
32-bit flags member, and the extra padding word in MemoryInfo isn't
necessary.
Amends the MemoryState enum to use the same values like the actual
kernel does. Also provides the necessary operators to operate on them.
This will be necessary in the future for implementing
svcSetMemoryAttribute, as memory block state is checked before applying
the attribute.
The Process object kept itself alive indefinitely because its handle_table
contains a SharedMemory object which owns a reference to the same Process object,
creating a circular ownership scenario.
Break that up by storing only a non-owning pointer in the SharedMemory object.
fmt::format() returns a std::string instance by value, so calling
.c_str() on it here is equivalent to doing:
auto* ptr = std::string{}.c_str();
The data being pointed to isn't guaranteed to actually be valid anymore
after that expression ends. Instead, we can just take the string as is,
and provide the necessary formatting parameters.
Based off RE, the backing code only ever seems to use 0-2 as the range
of values 1 being a generic log enable, with 2 indicating logging should
go to the SD card. These are used as a set of flags internally.
Given we only care about receiving the log in general, we can just
always signify that we want logging in general.
This was causing some games (most notably Pokemon Quest) to softlock due to an event being fired when not supposed to. This also removes a hack wherein we were firing the state changed event when the game retrieves it, which is incorrect.
Amends it with missing values deduced from RE (ProperSystem being from
SwitchBrew for naming)
(SdCardUser wasn't that difficult to discern given it's used alongside
SdCardSystem when creating the save data indexer, based off the usage of
the string "saveDataIxrDbSd" nearby).
Original reason:
As Windows multi-byte character codec is unspecified while we always assume std::string uses UTF-8 in our code base, this can output gibberish when the string contains non-ASCII characters. ::OutputDebugStringW combined with Common::UTF8ToUTF16W is preferred here.
This was only ever public so that code could check whether or not a
handle was valid or not. Instead of exposing the object directly and
allowing external code to potentially mess with the map contents, we
just provide a member function that allows checking whether or not a
handle is valid.
This makes all member variables of the VMManager class private except
for the page table.
These auto-deduce the result based off its arguments, so there's no need
to do that work for the compiler, plus, the function return value itself
already indicates what we're returning.
Similarly, here we can avoid doing unnecessary work twice by retrieving
the file type only once and comparing it against relevant operands,
avoiding potential unnecessary object construction/destruction.
While GetFileType() is indeed a getter function, that doesn't mean it's
a trivial function, given some case require reading from the data or
constructing other objects in the background. Instead, only do necessary
work once.
No implementations actually modify instance state (and it would be
questionable to do that in the first place given the name), so we can
make this a const member function.
Greatly simplifies the current input UI, while still allowing power users to tweak advanced settings. Adds 'input profiles', which are easy autoconfigurations to make getting started easy and fast. Also has a custom option which brings up the current, full UI.
This allows the array to be constexpr. std::function is also allowed to
allocate memory, which makes its constructor non-trivial, we definitely
don't want to have all of these execute at runtime, taking up time
before the application can actually load.
While partially correct, this service call allows the retrieved event to
be null, as it also uses the same handle to check if it was referring to
a Process instance. The previous two changes put the necessary machinery
in place to allow for this, so we can simply call those member functions
here and be done with it.
Process instances can be waited upon for state changes. This is also
utilized by svcResetSignal, which will be modified in an upcoming
change. This simply puts all of the WaitObject related machinery in
place.
svcResetSignal relies on the event instance to have already been
signaled before attempting to reset it. If this isn't the case, then an
error code has to be returned.
In some constexpr functions, msvc is building the LUT at runtime
(pushing each element onto the stack) out of an abundance of caution. Moving the
arrays into be file-scoped constexpr's avoids this and turns the functions into
simple look-ups as intended.
This function simply does a handle table lookup for a writable event
instance identified by the given handle value. If a writable event
cannot be found for the given handle, then an invalid handle error is
returned. If a writable event is found, then it simply signals the
event, as one would expect.
svcCreateEvent operates by creating both a readable and writable event
and then attempts to add both to the current process' handle table.
If adding either of the events to the handle table fails, then the
relevant error from the handle table is returned.
If adding the readable event after the writable event to the table
fails, then the writable event is removed from the handle table and the
relevant error from the handle table is returned.
Note that since we do not currently test resource limits, we don't check
the resource limit table yet.
Two kernel object should absolutely never have the same handle ID type.
This can cause incorrect behavior when it comes to retrieving object
types from the handle table. In this case it allows converting a
WritableEvent into a ReadableEvent and vice-versa, which is undefined
behavior, since the object types are not the same.
This also corrects ClearEvent() to check both kernel types like the
kernel itself does.
Previously, ILibraryAppletAccessor would signal upon creation of any applet, but this is incorrect. A flag inside of the applet code determines whether or not creation should signal state change and swkbd happens to be one of these applets.
Load() is already given the process instance as a parameter, so instead
of coupling the class to the System class, we can just forward that
parameter to LoadNro()
These slots are only ever attached to event handling mechanisms within
the class itself, they're never used externally. Because of this, we can
make the functions private.
This also removes redundant usages of the private access specifier.
The previous code could potentially be a compilation issue waiting to
occur, given we forward declare the type for a std::unique_ptr. If the
complete definition of the forward declared type isn't visible in a
translation unit that the class is used in, then it would fail to
compile.
Defaulting the destructor in a cpp file ensures the std::unique_ptr's
destructor is only invoked where its complete type is known.