93 Commits

Author SHA1 Message Date
Wunk e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaknut submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on macOS. Memory must be fully unprotected before writing and
re-protected afterwards, otherwise macOS raises memory access errors.

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SanitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
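
As a rough illustration of what `UBFX` does here: it extracts an unsigned bitfield in one instruction, which is equivalent to a shift followed by a mask. The sketch below is a scalar C++ model, not the emitted code. The helper names `Ubfx` and `BiasedExponent` are hypothetical, and the field position (bit 23, width 8) assumes a standard IEEE-754 single-precision float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of UBFX: extract `width` bits starting at bit `lsb`,
// zero-extended. Assumes lsb < 32 and lsb + width <= 32.
constexpr std::uint32_t Ubfx(std::uint32_t value, unsigned lsb, unsigned width) {
    const std::uint32_t mask = (width >= 32 ? 0u : (1u << width)) - 1u;
    return (value >> lsb) & mask;
}

// Hypothetical helper: biased exponent of an IEEE-754 binary32 value,
// which occupies bits 23..30 of the encoding.
inline std::uint32_t BiasedExponent(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits)); // well-defined way to read the encoding
    return Ubfx(bits, 23, 8);
}
```

With this model, `BiasedExponent(1.0f)` yields the bias itself, 127.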

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
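
The four cases above can be checked with a small scalar model. This is only a sketch: `FmaxA64` and `ShaderMax` are hypothetical names, the model works on single floats rather than vectors, and it ignores details such as signed zeros and signalling NaNs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Simplified model of A64 FMAX: if either input is NaN, the result is NaN.
inline float FmaxA64(float a, float b) {
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
    return a > b ? a : b;
}

// Hypothetical model of the MAX sequence after this change:
// only SRC1 is checked for NaN.
inline float ShaderMax(float src1, float src2) {
    if (std::isnan(src1)) {
        return src2; // special handling: SRC1 is NaN, so SRC2 is picked
    }
    return FmaxA64(src1, src2); // a NaN in SRC2 is picked by FMAX itself
}
```

In every case where a NaN is involved the result is SRC2, so a separate SRC2 check adds nothing.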

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.

* shader_jit_a64: Use `MOVI` for DestEnable mask

Accelerate certain cases of masking with MOVI as well

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add source-swizzle unit test

This is not exhaustive. Generating all `4^4` cases seems to make Catch2
crash, so I've added some component-masking (non-reordering) tests based
on the Dest-Mask unit test, plus some additional ones to test
broadcasts/splats and component re-ordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.
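
To illustrate the difference between the two index formats: `SHUFPS` packs one 2-bit lane selector per destination lane into an 8-bit immediate, while `TBL` takes one byte index per destination byte. The sketch below is a hedged model of that conversion for a 4 x 32-bit vector. `ShufpsImm` and `TblIndices` are hypothetical helpers, and how the selectors are derived from the shader's swizzle encoding is not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// x86 SHUFPS-style immediate: 2 bits per destination lane.
inline std::uint8_t ShufpsImm(const std::array<std::uint8_t, 4>& sel) {
    return static_cast<std::uint8_t>(sel[0] | (sel[1] << 2) | (sel[2] << 4) | (sel[3] << 6));
}

// A64 TBL-style indices: one byte index per destination byte.
// Destination lane `lane` copies the 4 bytes of source lane `sel[lane]`.
inline std::array<std::uint8_t, 16> TblIndices(const std::array<std::uint8_t, 4>& sel) {
    std::array<std::uint8_t, 16> idx{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t byte = 0; byte < 4; ++byte) {
            idx[lane * 4 + byte] = static_cast<std::uint8_t>(sel[lane] * 4 + byte);
        }
    }
    return idx;
}
```

For a full reversal (selectors 3, 2, 1, 0) the `SHUFPS` immediate is a single byte, whereas the `TBL` index vector starts with bytes 12 to 15.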

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than using the direct output of `CompileShaderSetup`, allow a
`ShaderSetup` object to be passed in directly. This makes it possible to
emit assembly that is not directly supported by nihstro.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was getting overwritten by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each usage of `BL` is now protected with a stack push/pop to preserve
and restore the `lr` register, allowing nested subroutines to work
properly.

* shader_jit/tests: Allocate generated tests on heap

Each of these generated shader-test objects were causing the stack to
overflow.  Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.
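
A minimal sketch of the pattern, assuming a large test object: `BigShaderTest` and `RunOnHeap` are hypothetical stand-ins, not the types used in the actual tests. The temporary `unique_ptr` is destroyed at the end of the full expression, so the object lives only as long as the statement that uses it.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a generated shader-test object with a large
// inline buffer. Many of these as locals could exhaust the stack.
struct BigShaderTest {
    std::array<unsigned char, 1 << 20> program{}; // 1 MiB of inline storage
    int Run(int input) const { return input * 2; }
};

inline int RunOnHeap(int input) {
    // Heap-allocate the object; the temporary unique_ptr frees it as soon
    // as this full expression has been evaluated.
    return std::make_unique<BigShaderTest>()->Run(input);
}
```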

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call, and should be preserving `lr`

* shader_jit/tests: Add `MAD` unit-test

The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-type instruction-types were uninitialized, causing tests
to fail intermittently.
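
One common way to avoid this class of bug is to value-initialize the union. The sketch below uses a hypothetical `Instruction` union, not nihstro's actual type, and the real fix may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a `union`-typed instruction word.
union Instruction {
    std::uint32_t hex;
    std::uint8_t bytes[4];
};

inline Instruction MakeZeroedInstruction() {
    // Empty braces initialize the first member (`hex`) to zero.
    // A plain `Instruction instr;` would leave it indeterminate.
    Instruction instr{};
    return instr;
}
```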

* shader_jit_a64: Remove unneeded `MOV`

Residue from the direct-port of x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` holds the indirect result location when the return value is larger
than 128 bits. `X0` is the general-case return register.

* common/aarch64: Add veneer register note

`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
Steveice10 27bad3a699
audio_core: Replace AAC decoders with single FAAD2-based decoder. (#7098) 2023-11-04 14:56:13 -07:00
Castor215 89d5d4a2b6
externals: allow user to use system cubeb (#7107) 2023-11-02 17:33:40 -07:00
Castor215 8d811913a5
externals: allow user to use system cryptopp (#7105) 2023-11-01 17:57:10 -07:00
Castor215 d3ce43782d
externals: allow users to use system libenet (#7100) 2023-10-31 14:01:50 -07:00
Castor215 4ac10c4a9d
externals: allow users to use system Zstandard (#7083) 2023-10-21 16:10:02 -07:00
Castor215 2416258117
externals: add overarching USE_SYSTEM_LIBS variable (#7078) 2023-10-20 17:02:20 -07:00
Steveice10 1caf569f16
externals: Update cryptopp-cmake and cryptopp. (#7041) 2023-10-17 11:03:03 -07:00
Castor215 2d83fff581
externals: allow user to use system glslang (#7075) 2023-10-17 11:02:50 -07:00
Steveice10 e49b3c75bd
externals: Make system dynamic library headers flags instead of auto-detect. (#7065) 2023-10-16 19:32:58 -07:00
Castor215 956b0868fd
externals: allow user to use system inih (#7073) 2023-10-16 19:31:56 -07:00
Castor215 3d55270de6
externals: allow users to use system xbyak (#7068) 2023-10-13 15:03:50 -07:00
Castor215 775a25b492
externals: allow system cpp-httplib to be used with both meson and cpp-httplib builds (#7062)
Co-authored-by: Violet Purcell <vimproved@inventati.org>
2023-10-11 14:43:36 -07:00
PabloMK7 897d1fa957
Implement more HTTP:C functionality (#7035)
* Implement missing http:c functionality.

* More implementation details and cleanup.

* Organize code

* Disable treating warnings as errors for httplib

* Fix defines

* Remove pragmas that do nothing and mark as SYSTEM

* Make httplib system

* Try to fix issue from httplib

* Apply suggestions

* Fix header ordering

* Fix compilation issue

* Create and use ctx.CommandID()

* Add and use Common::TruncateString

* Apply more suggestions

* Apply suggestions

* Fix compilation

* Apply suggestions

* Fix format

* Revert SplitURL to previous version

* Apply suggestions
2023-10-11 10:09:16 -07:00
Steveice10 6244f9e3fd
ci: Support Android x86_64 and optimize build caching. (#7045)
* android: Support x86_64 devices.

* ci: Improve ccache hits and stats.

* ci: Compress Android artifacts.

* ci: Re-enable PCH and set ccache sloppiness appropriately.
2023-10-08 23:56:01 -07:00
Castor215 f5b8888686
externals: allow user to use system fmt (#7052) 2023-10-07 16:00:02 -07:00
Castor215 492aa3cb10
externals: allow user to use system dynarmic (#7044) 2023-10-06 21:49:56 -07:00
Castor215 7931aac3b7
externals: require cpp-httplib >= 0.14.1 (#7043) 2023-10-05 16:41:07 -07:00
Castor215 483e877001
externals: allow users to use system JSON headers (nlohmann-json3) (#7042) 2023-10-04 14:32:43 -07:00
Castor215 0ce956ba00
externals: allow users to use system cpp-httplib (#7034) 2023-10-04 15:41:13 +02:00
Castor215 b28ade1ee8
externals: mark cpp-jwt headers as SYSTEM (#7033) 2023-10-03 01:19:18 -07:00
Castor215 38f310f716
externals: allow users to use system cpp-jwt libraries (#6976) 2023-09-28 16:31:14 -07:00
SachinVin 8aee625a14
externals: Add option to use system SoundTouch (#6971) 2023-09-16 14:46:32 -07:00
GPUCode dfa2fd0e0d
Add vulkan backend (#6512)
* code: Prepare frontend for vulkan support

* citra_qt: Add vulkan options to the GUI

* vk_instance: Collect tooling info

* renderer_vulkan: Add vulkan backend

* qt: Fix fullscreen and resize issues on macOS. (#47)

* qt: Fix bugged macOS full screen transition.

* renderer/vulkan: Fix swapchain recreation destroying in-use semaphore.

* renderer/vulkan: Make gl_Position invariant. (#48)

This fixes an issue with black artifacts in Pokemon games on Apple GPUs.
If the vertex calculations differ slightly between render passes, it can
cause parts of model faces to fail depth test.

* vk_renderpass_cache: Bump pixel format count

* android: Custom driver code

* vk_instance: Set moltenvk configuration

* rasterizer_cache: Proper surface unregister

* citra_qt: Fix invalid characters

* vk_rasterizer: Correct special unbind

* android: Allow async presentation toggle

* vk_graphics_pipeline: Fix async shader compilation

* We were actually waiting for the pipelines regardless of the setting, oops

* vk_rasterizer: More robust attribute loading

* android: Move PollEvents to OpenGL window

* Vulkan does not need this and it causes problems

* vk_instance: Enable robust buffer access

* Improves stability on mali devices

* vk_renderpass_cache: Bring back renderpass flushing

* externals: Update vulkan-headers

* gl_rasterizer: Separable shaders for everyone

* vk_blit_helper: Correct depth to color conversion

* renderer_vulkan: Implement reinterpretation with copy

* Allows reinterpretation with a simple copy on AMD

* vk_graphics_pipeline: Only fast compile if no shaders are pending

* With this shaders weren't being compiled in parallel

* vk_swapchain: Ensure vsync doesn't lock framerate

* vk_present_window: Match guest swapchain size to vulkan image count

* Less latency and fixes crashes that were caused by images being deleted before free

* vk_instance: Blacklist VK_EXT_pipeline_creation_cache_control with nvidia gpus

* Resolves crashes when async shader compilation is enabled

* vk_rasterizer: Bump async threshold to 6

* Many games have fullscreen quads with 6 vertices. Fixes pokemon textures missing with async shaders

* android: More robust surface recreation

* renderer_vulkan: Fix dynamic state being lost

* vk_pipeline_cache: Skip cache save when no pipeline cache exists

* This is the case when loading a save state

* sdl: Fix surface initialization on macOS. (#49)

* sdl: Fix surface initialization on macOS.

* sdl: Fix render window events not being handled under Vulkan.

* renderer/vulkan: Fix binding/unbinding of shadow rendering buffer.

* vk_stream_buffer: Respect non coherent access alignment

* Required by nvidia GPUs on MacOS

* renderer/vulkan: Support VK_EXT_fragment_shader_interlock for shadow rendering. (#51)

* renderer_vulkan: Port some recent shader fixes

* vk_pipeline_cache: Improve shadow detection

* vk_swapchain: Add missing check

* renderer_vulkan: Fix hybrid screen

* Revert "gl_rasterizer: Separable shaders for everyone"

Causes crashes on mali GPUs, will need separate PR

This reverts commit d22d556d30ff641b62dfece85738c96b7fbf7061.

* renderer_vulkan: Fix flipped screenshot

---------

Co-authored-by: Steveice10 <1269164+Steveice10@users.noreply.github.com>
2023-09-13 01:28:50 +03:00
Steveice10 66404a669f
build: Fixes for a few minor issues (#6886) 2023-08-14 09:47:17 -07:00
Steveice10 6d0cd5b00e
build: Expose ENABLE_SCRIPTING and ENABLE_WEB_SERVICE flags as public. (#6872) 2023-08-07 03:12:49 -07:00
GPUCode 0048e61fc7
Fix compilation without ENABLE_WEB_SERVICE (#6856) 2023-08-06 12:23:53 -07:00
liushuyu 7e6a761f07
cmake: fix USE_SYSTEM_BOOST behavior ... (#6837) 2023-08-02 12:20:35 -07:00
Steveice10 13a8969824
build: Clear out remaining compile warnings. (#6662) 2023-07-04 21:00:24 -07:00
Steveice10 2d6aca4563
build: Rework CI and move all bundling into new build target. (#6556)
* build: Rework CI and move all bundling into new build target.

* ci: Use "mingw" in msys2 release names for compatibility.

* ci: Use "osx" in macOS release names for compatibility.

* ci: Disable macOS upload.

Will be moved to a separate PR for canary merge.
2023-06-26 17:42:00 -07:00
SachinVin c66594caf8 Enable warnings as errors
cpp-jwt: suppress OpenSSL deprecation warnings
2023-06-17 21:23:58 +05:30
Steveice10 38435e9b3e
Dynamically load FFmpeg and libfdk-aac if available. (#6570) 2023-06-17 02:06:18 +03:00
Steveice10 238a574645
qt: Add support for building for iOS. (#6594) 2023-06-07 20:40:53 -07:00
Steveice10 54c499ed5b
Prepare for Vulkan backend (#6595)
* externals: Add libraries required for vulkan

* build: Add support for downloading bundled MoltenVK.

* ci: Install tools needed for Vulkan.

* citra_qt: Add API status indicator

---------

Co-authored-by: GPUCode <geoster3d@gmail.com>
2023-06-05 07:29:05 -07:00
Steveice10 52f88f8fb4
chore: Fix GCC 13 compilation and SoundTouch libraries being installed. (#6593) 2023-06-02 23:11:17 -07:00
SachinVin 41f13456c0
Chore: Enable warnings as errors on MSVC (#6456)
* tests: add Sanity test for SplitFilename83

fix test

fix test

* disable `C4715:not all control paths return a value` for nihstro includes

nihstro: no warn

* Chore: Enable warnings as errors on msvc + fix warnings

fixes

some more warnings

clang-format

* more fixes

* Externals: Add target_compile_options `/W0` nihstro-headers and ...

Revert "disable `C4715:not all control paths return a value` for nihstro includes"
This reverts commit 606d79b55d3044b744fb835025b8eb0f4ea5b757.

* src\citra\config.cpp: ReadSetting: simplify type casting

* settings.cpp: Get*Name: remove superfluous logs
2023-05-01 22:38:58 +03:00
Steveice10 055a58f01e
audio_core: Implement OpenAL backend (#6450) 2023-05-01 21:17:45 +02:00
GPUCode 06f3c90cfb
Custom textures rewrite (#6452)
* common: Add thread pool from yuzu

* Is really useful for asynchronous operations like shader compilation and custom textures, will be used in following PRs

* core: Improve ImageInterface

* Provide a default implementation so frontends don't have to duplicate code registering the lodepng version

* Add a dds version too which we will use in the next commit

* rasterizer_cache: Rewrite custom textures

* There's just too much to talk about here, look at the PR description for more details

* rasterizer_cache: Implement basic pack configuration file

* custom_tex_manager: Flip dumped textures

* custom_tex_manager: Optimize custom texture hashing

* If no conversions are needed then we can hash the decoded data directly, removing the need for a duplicate decode

* custom_tex_manager: Implement asynchronous texture loading

* The file loading and decoding is offloaded into worker threads, while the upload itself still occurs in the main thread to avoid having to manage shared contexts

* Address review comments

* custom_tex_manager: Introduce custom material support

* video_core: Move custom textures to separate directory

* Also split the files to make the code cleaner

* gl_texture_runtime: Generate mipmaps for material

* custom_tex_manager: Prevent memory overflow when preloading

* externals: Add dds-ktx as submodule

* string_util: Return vector from SplitString

* No code benefits from passing it as an argument
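
A sketch of the return-by-value shape described here. The signature and the splitting behavior (empty fields are kept) are assumptions made for illustration, not the actual `Common::SplitString` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical signature: the result is returned instead of being written
// into a caller-supplied vector. Empty fields are preserved.
inline std::vector<std::string> SplitString(std::string_view str, char delim) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = str.find(delim, start);
        if (pos == std::string_view::npos) {
            out.emplace_back(str.substr(start)); // last (or only) field
            break;
        }
        out.emplace_back(str.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}
```

Callers can then write `const auto parts = SplitString(line, ',');` without declaring an output vector first.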

* custom_textures: Use json config file

* gl_rasterizer: Only bind material for unit 0

* Address review comments
2023-04-27 07:38:28 +03:00
Steveice10 d16dce6d99
externals: Update SoundTouch to upstream. (#6451) 2023-04-26 00:25:02 +02:00
SachinVin 9c81dc0dd8
externals\CMakeLists.txt: dynarmic, fmt, xbyak: add `EXCLUDE_FROM_ALL` property (#6398) 2023-04-06 14:31:28 +02:00
hank121314 8d563d37b4
citra_android: Storage Access Framework implementation (#6313) 2023-03-23 14:30:52 +01:00
Steveice10 8b116aaa04
externals: Fix mismatched CryptoPP definitions between compile time and header use. (#6314) 2023-02-25 12:58:38 +02:00
Steveice10 3a6a17c708
externals: Bundle cryptopp as submodule. (#6272)
fix https://github.com/citra-emu/citra/issues/6271
2023-02-02 16:26:21 +01:00
Steveice10 a298e4969b
externals: Switch to newer cryptopp-cmake. (#6242) 2023-01-15 21:45:42 +05:30
Steveice10 a8848cce43 build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
SachinVin 0e325255f3 externals: point to upstream dynarmic 2023-01-06 06:41:51 -08:00
SachinVin 21fe65c29c externals: bump xbyak to v6.68 2023-01-06 06:41:51 -08:00
SachinVin fbe06234b1 Core: Port Exclusive memory impl from yuzu
core\arm\dynarmic\arm_dynarmic.cpp: fix build

core\arm\dynarmic\arm_dynarmic.cpp: Fixes

CPP 20
2022-10-23 13:19:33 +05:30
SachinVin 98d3b9c776 externals\CMakeLists.txt: add fmt before dynarmic 2022-10-23 13:19:32 +05:30
SachinVin 4a590d1fcb xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we desire not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead
Co-authored-by: merry <git@mary.rs>
2022-10-23 13:19:32 +05:30