#fluidx3d


I made this #FluidX3D #CFD simulation run on a frankenstein zoo of 🟥AMD + 🟩Nvidia + 🟦Intel #GPU​s! 🖖🤪
youtube.com/watch?v=_8Ed8ET9gB

The ultimate SLI abomination setup:
- 1x Nvidia A100 40GB
- 1x Nvidia Tesla P100 16GB
- 2x Nvidia A2 15GB
- 3x AMD Instinct MI50
- 1x Intel Arc A770 16GB

I split the 2.5B cells into 9 domains of 15GB each: the A100 takes 2 domains, the other GPUs 1 domain each. The GPUs communicate over PCIe via #OpenCL.

Huge thanks to Tobias Ribizel from TUM for the hardware!
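
The domain split described above can be checked with a little arithmetic. This is only a rough sketch: the GPU list and the "2 domains on the A100, 1 on every other GPU" layout come from the post, while the names `GPUS`, `total_domains` and `implied_bytes_per_cell` are made up for illustration, and "15GB" is assumed to mean 15 * 10^9 bytes of fully used domain memory.

```python
# (name, number of cards, domains per card) as listed in the post
GPUS = [
    ("Nvidia A100 40GB", 1, 2),
    ("Nvidia Tesla P100 16GB", 1, 1),
    ("Nvidia A2 15GB", 2, 1),
    ("AMD Instinct MI50", 3, 1),
    ("Intel Arc A770 16GB", 1, 1),
]

TOTAL_CELLS = 2.5e9  # grid size from the post
DOMAIN_GB = 15       # per-domain memory from the post (assumed decimal GB)


def total_domains(gpus):
    """Sum the domains over all cards."""
    return sum(count * per_card for _, count, per_card in gpus)


def implied_bytes_per_cell(total_cells, n_domains, domain_gb):
    """Upper bound on memory per cell if every domain fills its 15GB."""
    return n_domains * domain_gb * 1e9 / total_cells


n = total_domains(GPUS)
print("domains:", n)
print("cells per domain:", TOTAL_CELLS / n)
print("bytes per cell (upper bound):", implied_bytes_per_cell(TOTAL_CELLS, n, DOMAIN_GB))
```

The domain count adds up to the 9 mentioned in the post. The bytes-per-cell figure is only an estimate derived from the rounded numbers above, not a value reported by FluidX3D.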

I got access to @LRZ_DE's new coma-cluster for #OpenCL benchmarking and experimentation 🖖😋💻🥨🍻
I've added a ton of new #FluidX3D #CFD #GPU​/​#CPU benchmarks:
github.com/ProjectPhysX/FluidX

Notable hardware configurations include:
- 4x H100 NVL 94GB
- 2x Nvidia L40S 48GB
- 2x Nvidia A2 15GB datacenter toaster
- 2x Intel Arc A770 16GB
- AMD+Nvidia SLI abomination consisting of 3x Instinct MI50 32GB + 1x A100 40GB
- AMD Radeon 8060S (chonky Ryzen AI Max+ 395 iGPU with quad-channel RAM) thanks to @cheese

GitHub - ProjectPhysX/FluidX3D: The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

#FluidX3D #CFD v3.2 is out! I've implemented the much requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading. 🖖😋
Horizontal sum in #OpenCL was a nice exercise - first local memory reduction and then hardware-supported atomic floating-point add in VRAM, in a single-stage kernel. Hammering atomics isn't too bad as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric #raytracing!
github.com/ProjectPhysX/FluidX
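
The reduction pattern from the v3.2 post (sum within each workgroup in local memory, then one atomic add per workgroup into a single value in VRAM) can be emulated on the CPU. The sketch below is plain sequential Python, not the FluidX3D OpenCL kernel; the workgroup size of 256 and all function names are assumptions for illustration.

```python
WORKGROUP_SIZE = 256  # assumed; must be a power of two for the tree reduction


def workgroup_reduce(local):
    """Tree reduction over one workgroup's 'local memory' array."""
    local = list(local)
    stride = len(local) // 2
    while stride > 0:
        # On a GPU, each i is one work-item and all run in parallel.
        for i in range(stride):
            local[i] += local[i + stride]
        # A local-memory barrier would sit here before the next step.
        stride //= 2
    return local[0]


def gpu_style_sum(values, workgroup_size=WORKGROUP_SIZE):
    """Sum values the way the post describes: per-workgroup partial sums,
    then a single add per workgroup into one global accumulator."""
    total = 0.0  # stands in for the float accumulator in VRAM
    for start in range(0, len(values), workgroup_size):
        chunk = list(values[start:start + workgroup_size])
        chunk += [0.0] * (workgroup_size - len(chunk))  # pad the last group
        total += workgroup_reduce(chunk)  # emulates the one atomic add
    return total


forces = [float(i) for i in range(1000)]
print(gpu_style_sum(forces), sum(forces))
```

With 1000 values and 256-wide workgroups, only 4 adds hit the shared accumulator instead of 1000, which is why contention on the atomic stays low. On real hardware the order of those atomic adds is not fixed, so floating-point results can differ slightly between runs.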

Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s, and a combined VRAM bandwidth of 23 TB/s. 🖖🤯
The #RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a very fascinating inflection point in #GPGPU: #CUDA is not the performance leader anymore. 🖖😛
You need a cross-vendor language like #OpenCL to leverage its power.

FluidX3D on #GitHub: github.com/ProjectPhysX/FluidX

#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPU​s, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...)
Have fun! 🖖😉
github.com/ProjectPhysX/FluidX

GitHub: Release FluidX3D v3.1 (more bug fixes) · ProjectPhysX/FluidX3D
Thank you for using FluidX3D! Update v3.1 brings two critical bug fixes/workarounds and various small improvements under the hood. Improvements: faster enqueueReadBuffer() on modern CPUs with 64-B...
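
The device-name fix mentioned in the v3.1 post amounts to a fallback: ask for the AMD board name first, and use the standard device name if that query is unsupported or empty. Here is a minimal sketch of that logic, assuming a hypothetical `query(param)` callable that wraps `clGetDeviceInfo` and returns a string. The two constants are the values I believe the Khronos headers use, so double-check them against `cl.h` and `cl_ext.h`; the device strings are placeholders, and this is not how FluidX3D itself implements it.

```python
CL_DEVICE_NAME = 0x102B            # core OpenCL device query
CL_DEVICE_BOARD_NAME_AMD = 0x4038  # AMD extension query


def display_name(query):
    """Prefer the AMD board name; fall back to the standard device name.

    `query` is a hypothetical callable: query(param) -> str, raising an
    exception if the device does not support the parameter.
    """
    try:
        board = query(CL_DEVICE_BOARD_NAME_AMD).strip()
    except Exception:
        board = ""  # non-AMD device or extension not available
    return board if board else query(CL_DEVICE_NAME).strip()


# Fake devices with placeholder strings, just to exercise the fallback.
amd_like = {CL_DEVICE_NAME: "internal-codename", CL_DEVICE_BOARD_NAME_AMD: "Marketing Board Name"}
other = {CL_DEVICE_NAME: "Some Other GPU"}

print(display_name(lambda p: amd_like[p]))
print(display_name(lambda p: other[p]))
```

Passing the query as a callable keeps the sketch independent of any particular OpenCL binding.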