Running Binary nVidia Drivers under Xen Host

Submitted by jbreland on Sun, 06/22/2008 - 01:50

Simon (not verified)

Sat, 07/12/2008 - 17:37

Hi,

A question:
Why do I need to install the non-Xen kernel? Is it only so that the nvidia driver can be installed properly with its setup script?
I'm using openSUSE 10 x64 with a fairly recent kernel (2.6.25.4), currently without Xen.

According to your write-up, the nvidia driver and Xen support each other (and the driver compiles fine under Xen). The last I knew, this setup was only possible with an old, patched nvidia driver (with several performance and stability problems).

Thanks ahead!

PS: sorry for my bad English

- Simon

There are two parts to the binary driver package:

  • the driver itself (the kernel module - nvidia.ko)
  • the various libraries needed to make things work

While the kernel module will indeed build against the Xen kernel (provided the appropriate CLI options are used, as discussed in the article), I was unable to get the necessary libraries installed using the Xen kernel. It might be possible to do this, but I don't know how. For me, it was easier to let my package manager (Portage, for Gentoo) install the package, which would only install when I was using the non-Xen kernel. After that was installed, I could switch back to the Xen kernel and manually build and install the kernel module.

Of course, as I mentioned in the article, this was done on a Gentoo system. Other distributions behave differently, and I'm not sure what's involved in getting the binary drivers set up correctly on them. If you have any luck, though, please consider posting your results here for the benefit of others.

Good luck.

--
http://www.legroom.net/

I have it working on CentOS 5.2 with a Xen kernel as well; thanks to this, I have TwinView available again:

[root@mythtv ~]# dmesg | grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 100.14.19 Wed Sep 12 14:08:38 PDT 2007
NVRM: builtin PAT support disabled, falling back to MTRRs.
NVRM: bad caching on address 0xffff880053898000: actual 0x77 != expected 0x73
NVRM: please see the README section on Cache Aliasing for more information
NVRM: bad caching on address 0xffff880053899000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389a000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389b000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389c000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389d000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389e000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff88005389f000: actual 0x77 != expected 0x73
NVRM: bad caching on address 0xffff8800472f4000: actual 0x67 != expected 0x63
NVRM: bad caching on address 0xffff880045125000: actual 0x67 != expected 0x63
[root@mythtv ~]# uname -r
2.6.18-92.1.13.el5xen
[root@mythtv ~]#

Now see if I can fix the bad caching errors... and see if I can run a dom host.

Thanks heaps!

Can you explain how you got that to work? I'm still getting an error at the modprobe step.

[root@localhost ~]# modprobe nvidia
nvidia: disagrees about version of symbol struct_module
FATAL: Error inserting nvidia (/lib/modules/2.6.18-92.el5xen/kernel/drivers/video/nvidia.ko): Invalid module format

Any ideas anyone?
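The "disagrees about version of symbol struct_module" message typically means that the nvidia.ko being loaded was built against a different kernel (for example, the non-Xen sources) than the one that's running. One quick check is to compare the kernel release recorded in the module's vermagic string, as printed by modinfo -F vermagic, against uname -r. This is a minimal sketch; the function name vermagic_matches is hypothetical, and a matching release doesn't guarantee the module will load, since the kernel configuration can still differ.

```shell
# Hypothetical helper: succeed if the kernel release recorded in a module's
# vermagic string matches the given kernel release.
#   $1 - vermagic string, e.g. the output of: modinfo -F vermagic nvidia.ko
#   $2 - kernel release, e.g. the output of: uname -r
vermagic_matches() {
  # The release is the first space-separated field of the vermagic string.
  [ "${1%% *}" = "$2" ]
}
```

For example, from the directory holding the freshly built module: vermagic_matches "$(modinfo -F vermagic ./nvidia.ko)" "$(uname -r)" || echo "nvidia.ko was built for a different kernel"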

I have the following kernel-related packages installed, and I'm compiling some older drivers (100.14.19), since they work for my card with non-Xen kernels as well:

[root@mythtv ~]# rpm -qa kernel* | grep $(uname -r | sed -e 's/xen//') | sort
kernel-2.6.18-92.1.18.el5
kernel-devel-2.6.18-92.1.18.el5
kernel-headers-2.6.18-92.1.18.el5
kernel-xen-2.6.18-92.1.18.el5
kernel-xen-devel-2.6.18-92.1.18.el5
[root@mythtv ~]#
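The sed -e 's/xen//' in that rpm query strips "xen" from the running release (2.6.18-92.1.18.el5xen becomes 2.6.18-92.1.18.el5), so the grep matches both the regular and the -xen kernel packages of the same version. Here is a minimal sketch of the same transformation as a shell function (base_kernel_release is a hypothetical name). Note that it removes only a trailing "xen", whereas the sed removes the first "xen" anywhere in the string.

```shell
# Hypothetical helper: print a kernel release with a trailing "xen" removed,
# e.g. 2.6.18-92.1.18.el5xen -> 2.6.18-92.1.18.el5.
# Releases without that suffix are printed unchanged.
base_kernel_release() {
  printf '%s\n' "${1%xen}"
}
```

A usage example is rpm -qa 'kernel*' | grep -F "$(base_kernel_release "$(uname -r)")" | sort. Quoting 'kernel*' keeps the shell from expanding it against file names in the current directory, and grep -F treats the version as a fixed string rather than a regex in which each "." would match any character.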

I am booted into the xen kernel:

[root@mythtv ~]# uname -r
2.6.18-92.1.18.el5xen
[root@mythtv ~]#

I've already extracted the source as explained in the article and changed into it. Inside the ./usr/src/nv folder of the source tree, I issue the following command (also from the article), which starts the compilation:

[root@mythtv nv]# IGNORE_XEN_PRESENCE=y make SYSSRC=/lib/modules/`uname -r`/build module

Once compilation finishes, I copy the driver to my lib tree:

[root@mythtv nv]# mkdir -p /lib/modules/`uname -r`/kernel/drivers/video/nvidia/
[root@mythtv nv]# cp -i nvidia.ko /lib/modules/`uname -r`/kernel/drivers/video/nvidia/

Then to load the driver:
[root@mythtv ~]# depmod -a
[root@mythtv ~]# modprobe nvidia

To see whether it loaded, I issue this command:
[root@mythtv ~]# dmesg | grep NVIDIA

which in my case outputs this:
[root@mythtv ~]# dmesg |grep NVIDIA
nvidia: module license 'NVIDIA' taints kernel.
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 100.14.19 Wed Sep 12 14:08:38 PDT 2007
[root@mythtv ~]#

I don't worry about the kernel being tainted, as everything seems to work pretty well for me; the same goes for this message:

[root@mythtv nv]# dmesg |grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 100.14.19 Wed Sep 12 14:08:38 PDT 2007
NVRM: builtin PAT support disabled, falling back to MTRRs.
[root@mythtv nv]#

Anonymous (not verified)

Sat, 07/12/2008 - 19:18

A really interesting article you created here, if there weren't (I hope) a typo that ruins everything:

The last paragraph reads:
"Assuming all went well, you should not have a fully functional ..."

The word "not" bothers me, and I hope it should be "now", as that would make sense given all your efforts.
Can you please comment on this issue?

Thanks

Jamesttgrays (not verified)

Thu, 10/23/2008 - 22:17

Hm... strange. I wasn't able to get this to work with the newest version of the Nvidia drivers. It says something along the lines of "will not install to Xen-enabled kernel." Darned Nvidia, serves me right... I ought've gotten me an ATI card!

kdvr (not verified)

Wed, 11/05/2008 - 05:54

openSUSE 11.0 with linux-2.6.27 (it's a fresh install and I don't remember the exact kernel version; I'm in Windows right now), Leadtek WinFast 9600GT

For me it doesn't work. It won't build with: IGNORE_XEN_PRESENCE=y make SYSSRC=/lib/modules/`uname -r`/build module. It reports something to the effect that this kernel is not supported (the same error the setup gives), though nothing about Xen.

I need Xen for studying purposes on my desktop PC, and running it without the drivers is not an option, as the cooler then runs at full speed.

For openSUSE 11, I got it working by doing this:
cd /usr/src/linux
make oldconfig && make scripts && make prepare

# Extract the source code from nvidia installer
sh NVIDIA-Linux-whateverversion-pkg2.run -a -x

cd NVIDIA-Linux-whateverversion-pkg2/usr/src/nv/
#build
IGNORE_XEN_PRESENCE=y make SYSSRC=/usr/src/linux module
#should have built a kernel module
cp nvidia.ko /lib/modules/`uname -r`/kernel/drivers/video/
cd /lib/modules/`uname -r`/kernel/drivers/video/
depmod -a
modprobe nvidia

glxinfo shows "direct rendering: yes", so it seems to be working.
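That glxinfo check can be scripted. A minimal sketch, assuming glxinfo prints a line of the form "direct rendering: Yes" (the function name has_direct_rendering is hypothetical):

```shell
# Hypothetical helper: read glxinfo output on stdin and succeed only if it
# reports direct rendering as enabled (matched case-insensitively).
has_direct_rendering() {
  grep -qi '^direct rendering: *yes'
}
```

Run it from inside the X session, since glxinfo needs a display: glxinfo | has_direct_rendering && echo "direct rendering is on"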

Andy (not verified)

Tue, 12/23/2008 - 00:42

I tried different combinations of Red Hat kernels (2.6.18-92.1.22.el5xen-x86_64, 2.6.18-120.el5-xen-x86_64, 2.6.18-92.el5-xen-x86_64) and NVIDIA drivers (177.82, 173.08), but I couldn't get it to run. I succeeded in compiling the kernel module, but once I start the X server (either with startx or init 5), the screen just turns black and a hard reset is needed.

/var/log/messages contains the lines:


Dec 23 14:23:56 jt8qm1sa kernel: BUG: soft lockup - CPU#0 stuck for 10s! [X:8177]
Dec 23 14:23:56 jt8qm1sa kernel: CPU 0:
Dec 23 14:23:56 jt8qm1sa kernel: Modules linked in: nvidia(PU) ...

I'm giving up now. Any hints, anyone?

Andy

Juche jbreland,
First of all, thank you for your article; it gave me confidence that this will work some day. That day is still to come, though.
I did everything as you said (except that I unmerged the old nvidia driver at the beginning), and every time I try to start X I get this error:


(II) Module already built-in
NVIDIA: could not open the device file /dev/nvidia0 (Input/output error).
(EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device PCI:1:0:0.
(EE) NVIDIA(0): Please see the COMMON PROBLEMS section in the README for
(EE) NVIDIA(0): additional information.
(EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device!
(EE) Screen(s) found, but none have a usable configuration.

I am using different versions than you did:

* sys-kernel/xen-sources: 2.6.21
* app-emulation/xen: 3.3.0
* x11-drivers/nvidia-drivers: 177.82
* sys-kernel/gentoo-sources: 2.6.27-r7

Everything else looks fine: lsmod lists the nvidia module, the file /dev/nvidia0 exists and I can access it without problems, and I get the same error when I try to start X as root.

Do you have any idea?

Hannes

PS: Could you please post your now-obsolete xorg.conf for the open source driver? That would help me too.

Off the top of my head, no. You installed the new nvidia-drivers package first, right? All of those libraries are needed when loading the module. Did you get any build errors or errors when inserting the new module? Did you try rebooting just to be certain that an older version of the nvidia or nv module was not already loaded or somehow lingering in memory?

As for the xorg.conf file using the nv driver, you can grab it from the link below, but keep in mind that it's not fully functional. It provided me with a basic, unaccelerated, single-monitor desktop that was usable, but rather miserable.
xorg.conf.nv

--
http://www.legroom.net/

Thanks for your reply.
I'm still trying. Here are my new findings:
I found out that when I compile the nvidia driver the regular way, it uses this make command:

make -j3 HOSTCC=i686-pc-linux-gnu-gcc CROSS_COMPILE=i686-pc-linux-gnu- LDFLAGS= IGNORE_CC_MISMATCH=yes V=1 SYSSRC=/usr/src/linux SYSOUT=/lib/modules/2.6.21-xen/build HOST_CC=i686-pc-linux-gnu-gcc clean module

This is very different from your suggestion of just running

IGNORE_XEN_PRESENCE=y make SYSSRC=/lib/modules/`uname -r`/build module

So I tried my long make command with the IGNORE_XEN_PRESENCE=y prefix, but this led to a build error (a different one from the "This is a XEN Kernel" error; see below). Then I tried this (why didn't you take this approach?):

IGNORE_XEN_PRESENCE=y emerge -av nvidia-drivers

which was an easier way to produce the same error:

include/asm/fixmap.h:110: error: expected declaration specifiers or '...' before 'maddr_t'

Very strange: if I just run your short command, compilation runs without any problem.
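For what it's worth, the IGNORE_XEN_PRESENCE=y prefix used in all of these commands is ordinary shell syntax: it places the variable in the environment of that single command (make or emerge here), and make then sees it like any other variable from its environment. The calling shell is left untouched. Whether the variable reaches the driver's own build when it goes through emerge depends on how Portage handles the environment, which this sketch doesn't check. A small demonstration, assuming IGNORE_XEN_PRESENCE isn't already set in your shell (show_prefix_scope is a hypothetical name):

```shell
# Demonstrates that a VAR=value prefix only affects the one command it precedes.
show_prefix_scope() {
  # The child shell sees the variable...
  IGNORE_XEN_PRESENCE=y sh -c 'echo "inside: ${IGNORE_XEN_PRESENCE:-unset}"'
  # ...but the calling shell does not.
  echo "after: ${IGNORE_XEN_PRESENCE:-unset}"
}
```

With the variable unset beforehand, show_prefix_scope prints "inside: y" followed by "after: unset".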

Another thing I found is that the dmesg log is different when loading the nvidia module under Xen:

nvidia: module license 'NVIDIA' taints kernel.
NVRM: loading NVIDIA UNIX x86 Kernel Module 177.82 Tue Nov 4 13:35:57 PST 2008

versus under a regular kernel:

nvidia: module license 'NVIDIA' taints kernel.
nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nvidia 0000:01:00.0: setting latency timer to 64
NVRM: loading NVIDIA UNIX x86 Kernel Module 177.82 Tue Nov 4 13:35:57 PST 2008

Two lines are missing under Xen. Maybe they're important; maybe they're the reason why the /dev/nvidia0 device was not created.

I'll keep trying; if I find a solution or learn anything new, I'll keep you and your fan club informed. Thanks for your comments,

Hannes

In my last post I mentioned that I recently had a hardware failure that took down my server. I needed to get it back up and running again ASAP, but due to a large number of complications I was unable to get the original hardware working again, nor could I get any of the three other systems at my disposal to work properly. Seriously, it was like Murphy himself had taken up residence here. In the end, rather desperate and out of options, I turned to Xen (for those unfamiliar with it, it's similar to VMware or VirtualBox, but highly geared towards servers). I'd recently had quite a bit of experience getting Xen running on another system, so I felt it'd be a workable, albeit temporary, solution to my problem.

Unfortunately, the only working system I had suitable for this was my desktop, and while installing Xen and migrating the server to a Xen guest was successful (this site is currently running on that Xen instance), it was not without its drawbacks. For one thing, there's an obvious performance hit on my desktop while running under Xen concurrently with my server guest, though fortunately my desktop is powerful enough that this mostly isn't an issue (except when the guest accesses my external USB drive to back up files; for some reason that consumes all available CPU for about 2 minutes and kills performance on the host). There were a few other minor issues, but by far the biggest problem was that the binary nVidia drivers would not install under Xen. Yes, the open source 'nv' driver would work, but it had a number of problems and limitations:

  1. dramatically reduced video performance, both in video playback and normal 2D desktop usage
  2. no 3D acceleration whatsoever (remember, this is my desktop system, so I sometimes use it for gaming)
  3. no (working) support for multiple monitors
  4. significantly different xorg.conf configuration

In fairness, issues 1 and 2 are a direct result of nVidia not providing adequate specifications for proper driver development. Nonetheless, I want my hardware to actually work, so that performance was not acceptable. Issue 3 was a major problem as well, as I have two monitors and use both heavily while working. I can only assume that this is due to a bug in the nv driver for the video card I'm using (a GeForce 8800 GTS), as dual monitors should be supported by this driver. It simply wouldn't work, though. Issue 4 wasn't that significant, but reworking the configuration did take quite a bit of time, which was ultimately pointless anyway due to issue 3.

So, with all that said, I began my quest to get the binary nVidia drivers working under Xen. Some basic searches showed that this was possible, but in every case the referenced material was written for much older versions of Xen, the Linux kernel, and/or the nVidia driver. I tried several different suggestions and patches, but none would work. I actually gave up, but a few days later I got so fed up with the performance that I started looking into it again and trying various combinations of suggestions. It took a while, but I finally managed to hit on the special sequence of commands necessary to get the driver to compile AND load AND run under X. Sadly, the end result is actually quite easy to do once you know what needs to be done, but figuring it out sure was a bitch. So, I wanted to post the details here to hopefully save other people a lot of time and pain should they be in a similar situation.

This guide was written with the following system specs in mind:

  • Xen 3.2.1
  • Gentoo dom0 host using xen-sources-2.6.21 kernel package
    • a non-Xen kernel must also be installed, such as gentoo-sources-2.6.24-r8
  • GeForce 5xxx series or newer video card using nvidia-drivers-173.14.09 driver package

Version differences shouldn't be too much of an issue; however, a lot of this is Gentoo-specific. If you're running a different distribution, you may be able to adapt this technique to suit your needs, but I haven't tested it myself (if you do try and have any success, please leave a comment to let others know what you did). The non-Xen kernel will typically be left over from before you installed Xen on your host; if you don't have one installed, however, a simple emerge gentoo-sources will install it. You don't need to run it, just build against it.

Once everything is in place and you're running the Xen-enabled (xen-sources) kernel, I suggest uninstalling any existing binary nVidia drivers with emerge -C nvidia-drivers. At one point I had a version conflict when trying to start X because some old libraries weren't properly updated, so this is just to make sure the system is in a clean state. Also, while you can do most of this from within X using the nv driver, I suggest logging out of X entirely before the modprobe step.

Here's the step-by-step guide:

  1. Run uname -r to verify the version of your currently running Xen-enabled kernel; e.g., mine is 2.6.21-xen
  2. verify that you have both Xen and non-Xen kernels installed: cd /usr/src/ && ls -l
    • e.g., I have both linux-2.6.21-xen and linux-2.6.24-gentoo-r8
  3. create a symlink to the non-Xen kernel: ln -sfn linux-2.6.24-gentoo-r8 linux
  4. install the nVidia-drivers package, which includes the necessary X libraries: emerge -av nvidia-drivers
    • this will also install the actual driver, but it'll be built and installed for the non-Xen kernel, not your current Xen-enabled kernel
  5. determine the specific name and version of the nVidia driver package that was just installed; this can be found by examining the output of emerge -f nvidia-drivers (look for the NVIDIA-Linux-* line)
  6. extract the contents of the nVidia driver package: bash /usr/portage/distfiles/NVIDIA-Linux-x86_64-173.14.09-pkg2.run -a -x
  7. change to the driver source code directory: cd NVIDIA-Linux-x86_64-173.14.09-pkg2/usr/src/nv/
  8. build the driver for the currently-running Xen-enabled kernel: IGNORE_XEN_PRESENCE=y make SYSSRC=/lib/modules/`uname -r`/build module
  9. assuming there are no build errors (nvidia.ko should exist), install the driver:
    • mkdir -p /lib/modules/`uname -r`/video
    • cp -i nvidia.ko /lib/modules/`uname -r`/video/
    • depmod -a
  10. if necessary, log out of X, then load the driver: modprobe nvidia
  11. if necessary, reconfigure xorg.conf to use the nvidia binary driver rather than the nv driver
  12. test that X will now load properly with startx
  13. if appropriate, start (or restart) the display manager with /etc/init.d/xdm start
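Steps 6 through 9 can be collected into a small script. This is only a sketch. It assumes the package is already in /usr/portage/distfiles (steps 1 through 5 are done), that it's run as root from a scratch directory while the Xen kernel is booted, and that the package layout matches the 173.14.09 package used above. The function names are hypothetical. Unlike step 9, it uses mkdir -p and a plain cp, so an existing directory isn't an error and an older nvidia.ko there is overwritten.

```shell
# Hypothetical helpers automating steps 6-9 of the guide.

# Print the directory that "bash <package>.run -a -x" extracts into:
# the package file name without its directory and ".run" suffix.
nv_extract_dir() {
  basename "$1" .run
}

# Copy a built nvidia.ko into <root>/lib/modules/<release>/video/.
#   $1 - path to nvidia.ko
#   $2 - kernel release (normally $(uname -r))
#   $3 - optional root prefix; empty means the live filesystem
install_nvidia_module() {
  nv_module=$1
  nv_release=$2
  nv_dest="${3:-}/lib/modules/$nv_release/video"
  if [ ! -f "$nv_module" ]; then
    echo "module not found: $nv_module" >&2
    return 1
  fi
  mkdir -p "$nv_dest" && cp "$nv_module" "$nv_dest/"
}

# Extract the package, build nvidia.ko for the running kernel, install it,
# and refresh module dependencies (steps 6-9).
#   $1 - path to the NVIDIA-Linux-*.run package
build_and_install() {
  nv_pkg=$1
  nv_release=$(uname -r)
  bash "$nv_pkg" -a -x || return 1
  (
    # Subshell so the cd doesn't change the caller's working directory.
    cd "$(nv_extract_dir "$nv_pkg")/usr/src/nv" || exit 1
    IGNORE_XEN_PRESENCE=y make SYSSRC="/lib/modules/$nv_release/build" module || exit 1
    install_nvidia_module nvidia.ko "$nv_release"
  ) && depmod -a
}
```

For example, after saving it as nvidia-xen.sh (a hypothetical file name): . ./nvidia-xen.sh && build_and_install /usr/portage/distfiles/NVIDIA-Linux-x86_64-173.14.09-pkg2.run, then continue with step 10.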

Assuming all went well, you should now have a fully functional and accelerated desktop environment, even under a Xen dom0 host. W00t. If not, feel free to post a comment and I'll try to help if I can. You should also hit up the Gentoo Forums, where you can get help from people far smarter than I.

I really hope this helps some people out. It was a royal pain in the rear to get this working, but believe me, it makes a world of difference when using the system.