[Repoze-dev] Fwd: DRAFT: Invitation to a dance
Chris McDonough
chrism at plope.com
Sun Sep 30 13:02:14 UTC 2007
Begin forwarded message:
> From: "Graham Dumpleton" <graham.dumpleton at gmail.com>
> Date: September 30, 2007 6:44:11 AM EDT
> To: "Chris McDonough" <chrism at plope.com>
> Cc: "Philipp von Weitershausen" <philipp at weitershausen.de>,
> "Christian Theune" <ct at gocept.com>, "Martijn Faassen"
> <faassen at startifact.com>, "Jim Fulton" <jim at zope.com>, "Ian
> Bicking" <ianb at colorstudy.com>, "Phillip J. Eby"
> <pje at telecommunity.com>, "Rob Miller" <ra at burningman.com>, "Joel
> Burton" <joel at joelburton.com>, "Martin Aspeli" <optilude at gmx.net>
> Subject: Re: DRAFT: Invitation to a dance
>
> On 30/09/2007, Chris McDonough <chrism at plope.com> wrote:
>>>> as well as which user the process should be
>>>> started as (which is very handy). For
>>>> http://www.repoze.org/tmp/plone, the configuration is:
>>>>
>>>> WSGIPythonExecutable /home/repoze/tmp/site/bin/python
>>>
>>> You are misunderstanding what the WSGIPythonExecutable directive in
>>> mod_wsgi is for. In general this directive should never be used.
>>> Apache/mod_wsgi will always use the Python runtime library that it
>>> was compiled against (even in daemon mode). This is because the
>>> Python library is linked into the Apache mod_wsgi module.
>>>
>>> What the WSGIPythonExecutable directive is for is to work around a
>>> shortcoming of Python in how it determines where the installed
>>> Python lib directory is.
>>>
>>> What happens is that the Python runtime, when initialised, tries to
>>> work out where the Python library directory is by looking for where
>>> the Python executable is in the PATH of the process, even though the
>>> python program isn't executed in this case. The problem is that if
>>> the Apache process uses a PATH that would result in the Python
>>> runtime finding a different python executable than the one mod_wsgi
>>> was compiled against (i.e., multiple instances of Python installed
>>> in different root directories), then the wrong Python lib directory
>>> will be used and thus the wrong common modules and site-packages
>>> directory.
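
The PATH lookup described above can be illustrated with a simplified
sketch. This is not Python's actual start-up code, which also checks for
landmark files under the candidate prefix and honours PYTHONHOME. The
helper names (find_on_path, implied_prefix, make_fake_python) are made up
for illustration, and the assumed layout is <prefix>/bin/python.

```python
import os
import tempfile


def find_on_path(name, path):
    """Return the first executable called `name` on `path`, or None."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def implied_prefix(executable):
    """Prefix implied by an executable at <prefix>/bin/<name> (assumed layout)."""
    return os.path.dirname(os.path.dirname(os.path.abspath(executable)))


def make_fake_python(root):
    """Create an empty executable <root>/bin/python, for illustration only."""
    bin_dir = os.path.join(root, "bin")
    os.makedirs(bin_dir)
    exe = os.path.join(bin_dir, "python")
    with open(exe, "w") as handle:
        handle.write("")
    os.chmod(exe, 0o755)
    return bin_dir


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    first = make_fake_python(os.path.join(base, "first"))
    second = make_fake_python(os.path.join(base, "second"))
    # Whichever bin directory comes first on PATH decides the prefix.
    found = find_on_path("python", os.pathsep.join([first, second]))
    print(implied_prefix(found))
```

Swapping the order of the two directories in the PATH string changes the
implied prefix, which is the situation in which the wrong lib directory
ends up being used.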
>>
>> I think we're actually using it correctly, because what we're after
>> is a form of that use case. The Python we're trying to point to is a
>> "virtual" Python (see
>> http://peak.telecommunity.com/dist/virtual-python.py). The Python
>> that mod_wsgi is compiled against is also the Python which is the
>> source of the "virtual" Python. Essentially two things may differ
>> between the "virtual" Python and the Python used to create it: the
>> packages which are in site-packages may differ, and the
>> distutils.cfg may differ. Its version cannot differ. Really it's as
>> if I had installed exactly the same Python version, compiled with
>> the same toolchain and libraries, in a different location (e.g. one
>> that doesn't happen to be on the Apache process' PATH), but it's
>> just done through symlink hackery instead of files.
>>
>> Being able to specify the interpreter on a per-VirtualHost basis
>> would prevent us from needing to do any sys.path munging in the wsgi
>> loader.
>
> Except that in mod_wsgi there is nothing to stop multiple VirtualHosts
> being handled within the same process, be it embedded or daemon mode.
> Because WSGIPythonExecutable actually affects the whole of Python,
> and not just a specific interpreter, you can't restrict it to just
> one VirtualHost (or sub interpreter). This isn't because of mod_wsgi,
> but because of Python itself and how it does Python initialisation.
>
>> In particular, it would let us reuse the Python 'site' module
>> behavior whereby .pth files put into
>> "sys.prefix + '/lib/pythonX.X/site-packages'" are consulted for
>> extra info that extends sys.path (used heavily by eggs). This is
>> consistent with the idea that each of our applications will use the
>> same Python version; they just must have different sys.path settings
>> and each is represented within the context of a separate virtual
>> Python. See
>> http://bob.pythonmac.org/archives/2005/02/06/using-pth-files-for-python-development/
>> for more info about .pth files and site directories.
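
As a small illustration of the .pth behaviour mentioned above, here is a
sketch using the standard library's site.addsitedir(), which adds a
directory to sys.path and processes any .pth files found in it. The
temporary directories and the file name example.pth are assumptions made
up for the demonstration.

```python
import os
import site
import sys
import tempfile


def add_site_dir_with_pth():
    """Create a throwaway site directory holding one .pth file and activate it."""
    site_dir = tempfile.mkdtemp()
    extra_dir = tempfile.mkdtemp()
    # Each line of a .pth file that names an existing directory is
    # appended to sys.path when the site directory is processed.
    with open(os.path.join(site_dir, "example.pth"), "w") as handle:
        handle.write(extra_dir + "\n")
    site.addsitedir(site_dir)
    return site_dir, extra_dir


if __name__ == "__main__":
    site_dir, extra_dir = add_site_dir_with_pth()
    print(site_dir in sys.path, extra_dir in sys.path)
```

Both the site directory itself and the directory named in the .pth file
end up on sys.path.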
>
> I actually did look at one point at embedding support for the
> workingenv concept within the implementation of mod_wsgi, so that an
> option to WSGIDaemonProcess would just point at a workingenv
> directory. The issue was whether, in implementing it, the bits in
> Python it relied on would change, as I wasn't sure whether these were
> in the core of Python or in setuptools or something else. Then Ian
> effectively put workingenv behind him and moved on to his next idea.
>
> Now although Ian's new idea works better, it only really does so
> where a Python program is being used directly with a single Python
> interpreter instance. It isn't really suited to embedded Python,
> especially where there are multiple Python sub interpreters and you
> want a slightly different environment for each sub interpreter,
> although that has its caveats.
>
> I guess this is complicated a bit because mod_wsgi initialises
> Python once in the parent Apache process before doing any forks. This
> means that initialisation doesn't have to be done for each Apache
> child process, but it also means the initialisation can't be done
> separately for each daemon process with a different Python library
> directory. Thus, doing sys.path manipulation is the only real way of
> doing things given the way Python is embedded with mod_wsgi.
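
A minimal sketch of what such sys.path manipulation might look like in a
WSGI script file follows. The site-packages path, the Python version in
it, and the helper name activate() are assumptions for illustration. The
trivial application() stands in for the real application, and none of
this is taken from the actual zope2.wsgi.

```python
import os
import site
import sys

# Hypothetical site-packages directory of the "virtual" Python.
VIRTUAL_SITE_PACKAGES = "/home/repoze/tmp/site/lib/python2.4/site-packages"


def activate(site_packages):
    """Add a site directory (and its .pth entries) to the front of sys.path."""
    before = list(sys.path)
    site.addsitedir(site_packages)
    added = [entry for entry in sys.path if entry not in before]
    # Move the new entries to the front so they win over the entries of
    # the Python that mod_wsgi was compiled against.
    sys.path[:] = added + [entry for entry in sys.path if entry not in added]
    return added


if os.path.isdir(VIRTUAL_SITE_PACKAGES):
    activate(VIRTUAL_SITE_PACKAGES)


def application(environ, start_response):
    """Placeholder WSGI application; a real script would import its app here."""
    body = b"sys.path configured\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

Using site.addsitedir() rather than a plain sys.path.insert() means that
.pth files in the directory are honoured as well, which matters for eggs.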
>
> Overall I have no problem pursuing the idea of supporting something
> equivalent to workingenv in mod_wsgi which will do all the sys.path
> fixups transparently, but I just need to be sure that what it depends
> on isn't going to change. I also want it to be self contained and not
> depend on some third party package. I just may need some guidance in
> implementing such a thing, as I have not yet gone to the trouble of
> working out properly how all these egg things work.
>
>> I understand that adding something like this would lead to some
>> people misunderstanding its purpose and believing that you could
>> actually use multiple Python versions simultaneously, but it sure
>> would make life a lot easier for people who wanted to use it this
>> way.
>>
>>>> WSGIDaemonProcess tmp threads=1 processes=4 maximum-requests=10000
>>>>
>>>> <Directory /home/repoze/tmp/site/etc>
>>>> Order deny,allow
>>>> Allow from all
>>>> </Directory>
>>>>
>>>> <VirtualHost *:80>
>>>> DocumentRoot /home/repoze/www/www.repoze.org
>>>> ServerName www.repoze.org
>>>> ScriptAlias /viewcvs "/usr/lib/cgi-bin/viewcvs.cgi"
>>>> ServerAdmin repoze-dev at repoze.org
>>>> WSGIScriptAlias /tmp /home/repoze/tmp/site/etc/zope2.wsgi
>>>> WSGIProcessGroup tmp
>>>> WSGIPassAuthorization On
>>>> SetEnv HTTP_X_VHM_HOST http://www.repoze.org/tmp
>>>> </VirtualHost>
>>>>
>>> The intent with mod_wsgi is that the WSGI script file is where you
>>> specify everything. Thus, sys.path needs to be modified in it. If
>>> you want to have different working environments for different
>>> applications then you can use workingenv. See:
>>>
>>> http://docs.pythonweb.org/pages/viewpage.action?pageId=5439610
>>
>> Workingenv is no longer maintained: Ian is on to "virtualenv"
>> (http://pypi.python.org/pypi/virtualenv) which works almost exactly
>> like virtual_python.py except it works on Windows and provides some
>> additional tools for customization. Workingenv is basically just a
>> fancy way to set sys.path from what I understood of it, and I believe
>> Ian abandoned it because it's more convenient and predictable to not
>> have to write additional code that munges sys.path at all.
>
> But virtualenv is still effectively doing sys.path munging. The only
> difference is that you aren't replicating it and instead are just
> letting Python itself do the sys.path munging.
>
>>> I have looked at a directive which allows you to preload a WSGI
>>> script file, but it gets a bit complicated as you need to
>>> distinguish which Python sub interpreter you need to load it into,
>>> as well as whether it should be loaded into the main Apache child
>>> processes (embedded mode), or a specific set of daemon processes.
>>
>> Perhaps making the configuration a bit more byzantine on a per-mode
>> basis but more explicit might help. For example, if you want to work
>> in daemon mode where there are multiple processes, and each process
>> has exactly one interpreter instance, I suspect the preloading
>> becomes pretty trivial. I guess the difficulty then comes in
>> supporting the other modes where preloading isn't as well defined.
>
> Maybe I described it the wrong way. Implementing the preload isn't
> that complicated; it is more the slight confusion that may arise
> when people define it, as they may not realise what they need to
> define as the application group and process group. This can be solved
> by some good documentation. Thus:
>
> WSGIPreloadScript /some/path.wsgi process-group=tmp application-group=%{GLOBAL}
>
> ...
>
> <VirtualHost *:80>
> ...
> WSGIProcessGroup tmp
> WSGIApplicationGroup %{GLOBAL}
> ...
> </VirtualHost>
>
> Graham
>
_______________________________________________
Repoze-dev mailing list
Repoze-dev at lists.repoze.org
http://lists.repoze.org/mailman/listinfo/repoze-dev