#rocm


#AMD splits #ROCm toolkit into two parts – ROCm #AMDGPU drivers get their own branch under Instinct #datacenter #GPU moniker
The new #datacenter Instinct driver is a renamed version of the #Linux AMDGPU driver packages that are already distributed and documented with ROCm. Previously, everything related to ROCm (including the amdgpu driver) existed as part of the ROCm software stack.
tomshardware.com/pc-components

Continued thread

Then your Docker Compose service should have:

```
image-name:
  build:
    context: .
  devices:
    - /dev/dri
    - /dev/kfd
  group_add:
    - video
  shm_size: 4G
  environment:
    - PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
    - PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
    - HSA_OVERRIDE_GFX_VERSION=11.0.0
```
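
Once the container is up, a quick way to confirm the override took and PyTorch can actually see the iGPU (a minimal check of my own, not from the original thread):

```
import torch

# ROCm builds of PyTorch expose HIP devices through the torch.cuda API.
print(torch.cuda.is_available())      # True if /dev/kfd is visible and the override worked
print(torch.version.hip)              # HIP runtime version string; None on CUDA builds
print(torch.cuda.get_device_name(0))  # should report the gfx1103 iGPU
```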

#rocm #gfx1103 #780M

So, good news. ROCm 6.3.4 and PyTorch 2.4.0 seem stable enough with gfx1103 if I use the HSA override for 11.0.0, with the latest firmware blobs and kernel 6.13.10 on Fedora 41.

In your Dockerfile, build your AI app from:
```
FROM rocm/pytorch:rocm6.3.4_ubuntu24.04_py3.12_pytorch_release_2.4.0
```
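
If you're on bare metal instead of Docker, the same override has to be in the environment before the HIP runtime initializes. A minimal sketch under that assumption (the override value comes from the post; everything else is illustrative):

```
import os

# Must be set before torch loads the HIP/HSA runtime:
# the override makes gfx1103 (780M) masquerade as the supported gfx1100.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch

x = torch.randn(1024, 1024, device="cuda")  # "cuda" targets HIP on ROCm builds
print((x @ x).sum().item())                 # small matmul as a freeze/stability smoke test
```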

Been fighting the whole day trying to get ROCm to play nice with the 780M and PyTorch. Using the latest #rocm, my laptop just freezes with gfx1103, whether I set the HSA override to 11.0.0 or to 10.3.0 :blobcatknife:

#amd really needs to fix this crap for their GPUs. Using Docker and their provided ROCm images. I know, 780M is not supported. But c’mon, ALL Nvidia cards can run #CUDA just fine. #rant

The B-17 bomber was amazing and helped win WWII. I flew on one in 2002 as a tourist - I have family members who were ball turret gunners - a bad place to be.

This video was shot on Hi-8, and thankfully I digitized it (at 720x480) back in the day. Now I've upscaled it with local AI (to 1408x954) and the improvement is astounding.

Sadly, this actual B-17 crashed in 2019: en.wikipedia.org/wiki/2019_Boe

#localai
#stablediffusion
#rocm
#amd
#b17
#flyingfortress

Continued thread

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <doi.org/10.1016/j.jcp.2022.111>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and is tightly integrated with the rest of it (including its support for multi-GPU), and refactoring it into a standalone library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread: it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.
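
To make the roll-your-own point concrete, here's a textbook unpreconditioned BiCGSTAB in plain Python/NumPy. This is my illustrative sketch of the standard algorithm, not GPUSPH's improved multi-GPU variant:

```
import numpy as np

def bicgstab(A, b, x0=None, tol=1e-8, max_iter=1000):
    """Textbook unpreconditioned BiCGSTAB for solving A @ x = b."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    r_hat = r.copy()          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros(n)
    p = np.zeros(n)
    for _ in range(max_iter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v     # intermediate residual
        t = A @ s
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

# Tiny usage example on a diagonally dominant system:
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100.0 * np.eye(100)
b = rng.standard_normal(100)
x = bicgstab(A, b)
print(np.linalg.norm(A @ x - b))  # residual should be tiny
```

Even the naive version is only ~30 lines; as the thread notes, the hard part in GPUSPH's case was the tight multi-GPU integration, not the algorithm itself.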

6/