#cuda


💻 FreeBSD CUDA drm-61-kmod 💻

"Just going to test the current pkg driver, this will only take a second...", the old refrain goes. Surely, it will not punt away an hour or so of messing about in loader.conf on this EPYC system...

- Here are some notes for backing out of a botched/crashing driver kernel-panic situation.
- Standard stuff, nothing new over the years here with the loader prompt.
- A few directives are specific to this system, though they may provide a useful general reference.
- The server has an integrated GPU in addition to the NVIDIA PCIe card, so a module blacklist for the "amdgpu" driver is necessary (EPYC 4564P).

Step 1: during boot-up, "exit to loader prompt"
Step 2: set/unset the values as needed at the loader prompt

unset nvidia_load
unset nvidia_modeset_load
unset hw.nvidiadrm.modeset
set module_blacklist=amdgpu,nvidia,nvidia_modeset
set machdep.hyperthreading_intr_allowed=0
set verbose_loading=YES
set boot_verbose=YES
set acpi_dsdt_load=YES
set audit_event_load=YES
set kern.consmsgbuf_size=1048576
set loader_menu_title=waffenschwester
boot

Step 3: log in to a standard tty shell
Step 4: edit /boot/loader.conf (and maybe .local); see the sketch below
Step 5: edit /etc/rc.conf (and maybe .local)
Step 6: debug the vast output from the enlarged kernel message buffer (kern.consmsgbuf_size)
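
For steps 4 and 5, here's a minimal sketch of what the persistent config might look like once the driver behaves again. The exact values are assumptions (they depend on how the debugging shakes out), but the syntax is plain loader.conf(5)/rc.conf(5):

# /boot/loader.conf (assumed working state, adjust as needed)
module_blacklist="amdgpu"        # keep the iGPU out of the way on this EPYC 4564P
hw.nvidiadrm.modeset="1"         # drm-61-kmod modesetting hookup
kern.consmsgbuf_size="1048576"   # keep the big message buffer while debugging

# /etc/rc.conf (assumed: load the NVIDIA modules via rc instead of the loader)
kld_list="nvidia-modeset"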

Been fighting all day trying to get ROCm to play nice with the 780M and PyTorch. Using the latest #rocm, my laptop just freezes with gfx1103, whether I set the HSA override to 11.0.0 or to 10.3.0 :blobcatknife:

#amd really needs to fix this crap for their GPUs. Using Docker and their provided ROCm images. I know, 780M is not supported. But c’mon, ALL Nvidia cards can run #CUDA just fine. #rant
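
For reference, the shape of the invocation here is roughly this, a sketch assuming AMD's stock rocm/pytorch image and the usual device-passthrough flags from the ROCm container docs (on an unsupported iGPU like gfx1103, the override can still freeze the machine, exactly as above):

docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --security-opt seccomp=unconfined \
    -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
    rocm/pytorch:latest \
    python3 -c 'import torch; print(torch.cuda.is_available())'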

Just got my RSS reader YOShInOn building with uv and running under WSL2 with the CUDA libraries, despite a slight version mismatch... All I gotta do is switch it from ArangoDB (terrible license) to Postgres, and it might have a future... With sentence_transformers running under WSL2 I might even be able to deduplicate the million images in my Fraxinus image sorter.


Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <doi.org/10.1016/j.jcp.2022.111>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.
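
To make that concrete, here's a sketch (not GPUSPH's actual code) of the kind of single-purpose BLAS-style routine a solver like BiCGSTAB leans on; an axpy in plain CUDA is small enough to own outright:

#include <cstdio>
#include <cuda_runtime.h>

// y = a*x + y: the kind of one-off BLAS routine a semi-implicit
// solver needs a handful of, each just a few lines of CUDA.
__global__ void axpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    axpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %g\n", y[0]); // expect 4
    cudaFree(x);
    cudaFree(y);
    return 0;
}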

6/

AMD YOLO: because why not base your entire #business #strategy on a meme? 🚀🎉 Thanks to AMD's cultural enlightenment, they're now #shipping #boxes faster than philosophical musings on singularity! 🤯 Who knew rewriting a stack could be as easy as beating #NVIDIA at their own game? Just don't tell CUDA—it might get jealous! 😜
geohot.github.io//blog/jekyll/ #AMD #YOLO #meme #CUDA #competition #HackerNews #ngated

the singularity is nearer · AMD YOLO: AMD is sending us the two MI300X boxes we asked for. They are in the mail.

Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s, and a combined VRAM bandwidth of 23 TB/s. 🖖🤯
The #RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a very fascinating inflection point in #GPGPU: #CUDA is not the performance leader anymore. 🖖😛
You need a cross-vendor language like #OpenCL to leverage its power.

FluidX3D on #GitHub: github.com/ProjectPhysX/FluidX

The recording of the February 20th, 2025 #bhyve Production User Call is up:

youtu.be/Kb1muRQvsrs

We discussed two lab successes, bhyve.org, LibVirt updates, the VirtManager update, hypervisor "anti-detection", an old VirtIO bug, FreeBSD 14.3 goals and wishlist items, #CUDA ON FREEBSD, NuttX, more GPU Pass-Through, and more!

"Don't forget to slam those Like and Subscribe buttons."