Better GPU memory monitoring, and works with more GPUs #1

Open: wants to merge 2 commits into master

troelsarvin

These changes have been running in several setups for a number of weeks, on several different NVIDIA cards. One of them is a very basic card that can only report GPU temperature and GPU clock frequency.
The pull request does not contain an updated MKP; I can add one if needed.

Troels Arvin added 2 commits November 22, 2021 15:25
The code now makes fewer assumptions about the data that nvidia_smi
can deliver. On some GPUs (probably some cheap models), many
parameters may have a value of "N/A"; those values are now ignored.
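The following is a minimal sketch of the "ignore N/A" idea. It is not the code in this PR. It assumes the nvidia_smi output has already been flattened into a dict of key to raw string. The key names used below, other than the general `gpu_..._...` pattern, are hypothetical. Anything that is "N/A" or is not a number (with an optional unit such as "C" or "MHz") is skipped rather than treated as an error.

```python
import re

# A number, optionally followed by a unit such as "C", "MHz", "MiB" or "%".
_NUMBER_WITH_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z%/]*)\s*$")


def parse_gpu_values(raw):
    """Convert raw nvidia_smi strings to floats, dropping unusable values.

    Hypothetical helper: `raw` maps flattened keys to strings as reported
    by nvidia_smi. Keys whose value is "N/A" (common on cheap GPUs) or is
    not numeric (e.g. a product name) are left out of the result.
    """
    parsed = {}
    for key, text in raw.items():
        if text is None or text.strip() == "N/A":
            continue  # the GPU cannot report this parameter
        match = _NUMBER_WITH_UNIT.match(text)
        if match is None:
            continue  # not a numeric reading
        parsed[key] = float(match.group(1))
    return parsed
```

With a card that only reports temperature and clock frequency, the result simply contains fewer keys. Downstream checks can then create metrics only for the keys that are present.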

Also, GPU memory usage is better read from the gpu_fb_memory_usage_used
value, at least on the GPUs I've tested, so the set of
memory-related performance data has been expanded. Maybe the memory_util
monitoring point should be removed, unless we know for sure that
it is really useful with some GPU/driver combinations?
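The following sketch shows how framebuffer memory metrics could be derived from `gpu_fb_memory_usage_used`. It is not the code in this PR, and it rests on three assumptions:

* There is a matching `gpu_fb_memory_usage_total` key. This key name is hypothetical.
* Both values have already been parsed to numbers.
* Both values are in MiB, which is the unit nvidia-smi reports for framebuffer memory.

Each metric is emitted only when its inputs are available. This keeps the behaviour consistent with ignoring "N/A" values.

```python
MIB = 1024 * 1024  # bytes per MiB


def memory_metrics(parsed):
    """Derive framebuffer memory metrics from parsed nvidia_smi values.

    Hypothetical helper: `parsed` maps flattened keys to floats in MiB.
    Only metrics whose inputs are present are returned.
    """
    used = parsed.get("gpu_fb_memory_usage_used")
    total = parsed.get("gpu_fb_memory_usage_total")  # assumed key name
    metrics = {}
    if used is not None:
        metrics["fb_mem_used_bytes"] = used * MIB
    if total is not None:
        metrics["fb_mem_total_bytes"] = total * MIB
    if used is not None and total:  # avoid division by zero / missing total
        metrics["fb_mem_used_percent"] = 100.0 * used / total
    return metrics
```

Reporting used bytes, total bytes and a percentage gives a usable memory signal even where the separate memory_util reading is missing or unreliable.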