
**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Apr 29

My #IWOCL 2025 Keynote presentation is online!
Scaling up #FluidX3D #CFD beyond 100 Billion cells on a single computer - a story about the true cross-compatibility of #OpenCL
https://www.youtube.com/watch?v=Sb3ibfoOi0c&list=PLA-vfTt7YHI2HEFrpzPhhQ8PhiztKhHU8&index=1
Slides: https://www.iwocl.org/wp-content/uploads/iwocl-2025-moritz-lehmann-keynote.pdf

YouTube: Scaling Up FluidX3D CFD Beyond 100 Billion Cells, by IWOCL

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Apr 10

What an honor to start the #IWOCL conference with my keynote talk! Nowhere else do you get to talk to so many #OpenCL and #SYCL experts in one room! I shared some updates on my #FluidX3D #CFD solver: how I optimized it down at the level of a single grid cell, to scale it up on the largest #Intel #Xeon6 #HPC systems, which provide more memory capacity than any #GPU server.

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Mar 25

I made this #FluidX3D #CFD simulation run on a Frankenstein zoo of AMD + Nvidia + Intel #GPUs!
https://www.youtube.com/watch?v=_8Ed8ET9gBU

The ultimate SLI abomination setup:
- 1x Nvidia A100 40GB
- 1x Nvidia Tesla P100 16GB
- 2x Nvidia A2 15GB
- 3x AMD Instinct MI50
- 1x Intel Arc A770 16GB

I split the 2.5B cells into 9 domains of 15GB each: the A100 takes 2 domains, the other GPUs 1 domain each. The GPUs communicate over PCIe via #OpenCL.

Huge thanks to Tobias Ribizel from TUM for the hardware!

YouTube: FluidX3D running AMD + Nvidia + Intel GPUs in "SLI" to pool together 132GB VRAM, by Dr. Moritz Lehmann
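The numbers in the 9-domain split can be divided out to see what each domain and each cell costs. This is a back-of-the-envelope sketch, not FluidX3D code: it assumes "GB" means 10^9 bytes, and the MI50 VRAM is left as unknown (`None`) because this post does not state it.

```python
# Domain-decomposition arithmetic for the 9-domain "SLI" run.
# Treating "GB" as 10^9 bytes is an assumption of this sketch.
TOTAL_CELLS = 2.5e9   # 2.5B lattice cells
DOMAIN_BYTES = 15e9   # 15 GB per domain

# (GPU, VRAM in GB or None if not stated, domains assigned as in the post)
GPUS = [
    ("Nvidia A100", 40, 2),
    ("Nvidia Tesla P100", 16, 1),
    ("Nvidia A2", 15, 1),
    ("Nvidia A2", 15, 1),
    ("AMD Instinct MI50", None, 1),
    ("AMD Instinct MI50", None, 1),
    ("AMD Instinct MI50", None, 1),
    ("Intel Arc A770", 16, 1),
]


def decomposition_summary(gpus, total_cells, domain_bytes):
    domains = sum(n for _, _, n in gpus)
    cells_per_domain = total_cells / domains
    bytes_per_cell = domain_bytes / cells_per_domain
    # Every GPU with a known capacity must hold all of its domains
    fits = all(vram is None or n * domain_bytes <= vram * 1e9
               for _, vram, n in gpus)
    return domains, cells_per_domain, bytes_per_cell, fits


domains, cells, bpc, fits = decomposition_summary(GPUS, TOTAL_CELLS, DOMAIN_BYTES)
print(f"{domains} domains, {cells / 1e6:.0f}M cells each, "
      f"~{bpc:.0f} bytes/cell, fits: {fits}")
```

This prints roughly 278M cells per domain and about 54 bytes per cell. That figure is only the post's totals divided out; the actual per-cell memory layout of FluidX3D is not modeled here.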

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Mar 22

I got access to @LRZ_DE's new coma-cluster for #OpenCL benchmarking and experimentation.
I've added a ton of new #FluidX3D #CFD #GPU/#CPU benchmarks:
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#single-gpucpu-benchmarks

Notable hardware configurations include:
- 4x H100 NVL 94GB
- 2x Nvidia L40S 48GB
- 2x Nvidia A2 15GB datacenter toaster
- 2x Intel Arc A770 16GB
- AMD+Nvidia SLI abomination consisting of 3x Instinct MI50 32GB + 1x A100 40GB
- AMD Radeon 8060S (chonky Ryzen AI Max+ 395 iGPU with quad-channel RAM) thanks to @cheese

GitHub: ProjectPhysX/FluidX3D: The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Mar 9

#FluidX3D #CFD v3.2 is out! I've implemented the much-requested #GPU summation for object force/torque; it's ~20x faster than #CPU #multithreading.
The horizontal sum in #OpenCL was a nice exercise: first a reduction in local memory, then a hardware-supported atomic floating-point add in VRAM, all in a single-stage kernel. Hammering atomics isn't too bad, as each of the ~10-340 workgroups dispatched at a time does only a single atomic add.
Also improved volumetric #raytracing!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2

FluidX3D simulation of the X-wing with velocity raytracing visualization

FluidX3D simulation of the X-wing with density raytracing visualization
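The single-stage reduction pattern from the v3.2 post (reduce in local memory, then one atomic add per workgroup) can be modeled on the CPU. This is a hedged emulation, not FluidX3D's OpenCL kernel: Python threads stand in for workgroups, a lock stands in for the hardware atomic float add, and `WORKGROUP_SIZE = 64` is a hypothetical choice.

```python
import threading

WORKGROUP_SIZE = 64  # hypothetical; must be a power of two for the tree reduction


class AtomicFloat:
    """Lock-protected float standing in for a hardware atomic add in VRAM."""

    def __init__(self):
        self.value = 0.0
        self.adds = 0  # counts global atomic operations
        self._lock = threading.Lock()

    def add(self, x):
        with self._lock:
            self.value += x
            self.adds += 1


def workgroup(values, group_id, result):
    start = group_id * WORKGROUP_SIZE
    # Copy this group's slice into "local memory", zero-padding the last group
    local = list(values[start:start + WORKGROUP_SIZE])
    local += [0.0] * (WORKGROUP_SIZE - len(local))
    # Tree reduction; on a GPU each step would need barrier(CLK_LOCAL_MEM_FENCE)
    stride = WORKGROUP_SIZE // 2
    while stride > 0:
        for lid in range(stride):
            local[lid] += local[lid + stride]
        stride //= 2
    # Only work-item 0 writes to global memory: one atomic add per workgroup
    result.add(local[0])


def gpu_style_sum(values):
    result = AtomicFloat()
    groups = (len(values) + WORKGROUP_SIZE - 1) // WORKGROUP_SIZE
    threads = [threading.Thread(target=workgroup, args=(values, g, result))
               for g in range(groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.value, result.adds


forces = [float(i % 7) for i in range(1000)]
total, atomic_adds = gpu_style_sum(forces)
print(total, atomic_adds)  # 2997.0 16
```

Global atomic traffic drops from one add per element to one add per workgroup (here 16 instead of 1000), which is why contention stays low. On real hardware the order of the atomic adds varies between runs, so the last bits of a floating-point sum can differ; the integer-valued inputs here keep the result exact.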

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Mar 3

Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s, and a combined VRAM bandwidth of 23 TB/s.
The #RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a fascinating inflection point in #GPGPU: #CUDA is no longer the performance leader.
You need a cross-vendor language like #OpenCL to leverage the MI300X's power.

FluidX3D on #GitHub: https://github.com/ProjectPhysX/FluidX3D

FluidX3D benchmarks: the 8x AMD MI300X system leaves every other benchmarked computer behind in the dust.

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Feb 23

I'm doing a podcast about #FluidX3D today with Improbable Matter, going live in 30 minutes!
https://youtu.be/csGLVZqr0SE

YouTube: The FluidX3D code with Moritz Lehmann, by Improbable Matter

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Feb 23

The 4x #Nvidia #H100 SXM5 server in the new Festus cluster at Uni Bayreuth is the fastest system I've ever tested in #FluidX3D #CFD, achieving 78 GLUPs/s #LBM performance at ~1650W #GPU power draw.
https://github.com/ProjectPhysX/FluidX3D?tab=readme-ov-file#multi-gpu-benchmarks
https://www.hpc.uni-bayreuth.de/clusters/festus/#__tabbed_1_3

FluidX3D benchmark running on 4x H100 SXM5 GPUs

GPU load during FluidX3D benchmark shown in nvidia-smi

CPU/GPU load during FluidX3D benchmark shown in my own monitoring application
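The H100 post gives both throughput and power, so an energy-efficiency figure follows directly. A minimal sketch using only the post's numbers; the per-GPU values assume the load and power are split evenly across the 4 GPUs, which the post does not state.

```python
def efficiency(glups, watts, gpus):
    """Return (MLUPs/s per W, GLUPs/s per GPU, W per GPU), assuming an even split."""
    return glups * 1000.0 / watts, glups / gpus, watts / gpus


# Numbers from the 4x H100 SXM5 post; ~1650 W is an approximate GPU power reading
per_watt, per_gpu, watts_per_gpu = efficiency(78.0, 1650.0, 4)
print(f"{per_watt:.1f} MLUPs/s per W, {per_gpu:.1f} GLUPs/s and "
      f"{watts_per_gpu:.1f} W per GPU")
# 47.3 MLUPs/s per W, 19.5 GLUPs/s and 412.5 W per GPU
```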

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Feb 8

#FluidX3D #CFD v3.1 is out! I have updated the #OpenCL headers for better device specs detection via device ID and #Nvidia compute capability, fixed broken voxelization on some #GPUs, and added a workaround for a CPU compiler bug that corrupted rendering. Also, #AMD GPUs will now show up with their correct name (no idea why AMD can't report it as CL_DEVICE_NAME like every other sane vendor and instead needs the CL_DEVICE_BOARD_NAME_AMD extension...)
Have fun!
https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1

GitHub: Release FluidX3D v3.1 (more bug fixes) · ProjectPhysX/FluidX3D. Thank you for using FluidX3D! Update v3.1 brings two critical bug fixes/workarounds and various small improvements under the hood: Improvements: faster enqueueReadBuffer() on modern CPUs with 64-B...
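The vendor-specific detection described in the v3.1 post can be sketched as: use the AMD board name when the `cl_amd_device_attribute_query` extension is present, and derive Nvidia core counts from the compute capability exposed by `cl_nv_device_attribute_query`. This is not FluidX3D's code: `query` is a hypothetical stand-in for `clGetDeviceInfo`, keyed by parameter names as strings, and the sample device values are made up for illustration.

```python
# Cores per SM by (major, minor) compute capability, following the
# _ConvertSMVer2Cores table in the CUDA samples' helper_cuda.h
CORES_PER_SM = {
    (5, 0): 128, (5, 2): 128, (5, 3): 128,
    (6, 0): 64, (6, 1): 128, (6, 2): 128,
    (7, 0): 64, (7, 2): 64, (7, 5): 64,
    (8, 0): 64, (8, 6): 128, (8, 7): 128, (8, 9): 128,
    (9, 0): 128,
}


def has_extension(query, name):
    return name in (query("CL_DEVICE_EXTENSIONS") or "").split()


def device_name(query):
    """Prefer AMD's board (marketing) name over CL_DEVICE_NAME if available."""
    if has_extension(query, "cl_amd_device_attribute_query"):
        board = query("CL_DEVICE_BOARD_NAME_AMD")
        if board:
            return board.strip()
    return query("CL_DEVICE_NAME").strip()


def nvidia_fp32_cores(query):
    """Estimate FP32 cores from compute capability; None if not applicable."""
    if not has_extension(query, "cl_nv_device_attribute_query"):
        return None
    cc = (query("CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV"),
          query("CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV"))
    per_sm = CORES_PER_SM.get(cc)
    if per_sm is None:
        return None
    # On Nvidia, CL_DEVICE_MAX_COMPUTE_UNITS reports the number of SMs
    return per_sm * query("CL_DEVICE_MAX_COMPUTE_UNITS")


# Made-up sample values standing in for real clGetDeviceInfo results
amd = {"CL_DEVICE_NAME": "gfx906",
       "CL_DEVICE_EXTENSIONS": "cl_khr_fp64 cl_amd_device_attribute_query",
       "CL_DEVICE_BOARD_NAME_AMD": "AMD Instinct MI50"}
nv = {"CL_DEVICE_NAME": "NVIDIA A100-PCIE-40GB",
      "CL_DEVICE_EXTENSIONS": "cl_khr_fp64 cl_nv_device_attribute_query",
      "CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV": 8,
      "CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV": 0,
      "CL_DEVICE_MAX_COMPUTE_UNITS": 108}

print(device_name(amd.get), "|", device_name(nv.get), "|", nvidia_fp32_cores(nv.get))
```

Passing `dict.get` as `query` keeps the sketch self-contained; a real host program would wrap `clGetDeviceInfo` and return `None` for unsupported parameters.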
