[Repoze-dev] Fwd: DRAFT: Invitation to a dance

Chris McDonough chrism at plope.com
Sun Sep 30 13:02:14 UTC 2007



Begin forwarded message:

> From: "Graham Dumpleton" <graham.dumpleton at gmail.com>
> Date: September 30, 2007 6:44:11 AM EDT
> To: "Chris McDonough" <chrism at plope.com>
> Cc: "Philipp von Weitershausen" <philipp at weitershausen.de>,  
> "Christian Theune" <ct at gocept.com>, "Martijn Faassen"  
> <faassen at startifact.com>, "Jim Fulton" <jim at zope.com>, "Ian  
> Bicking" <ianb at colorstudy.com>, "Phillip J. Eby"  
> <pje at telecommunity.com>, "Rob Miller" <ra at burningman.com>, "Joel  
> Burton" <joel at joelburton.com>, "Martin Aspeli" <optilude at gmx.net>
> Subject: Re: DRAFT: Invitation to a dance
>
> On 30/09/2007, Chris McDonough <chrism at plope.com> wrote:
>>>> as well as which user the process should be
>>>> started as (which is very handy).  For
>>>> http://www.repoze.org/tmp/plone, the configuration is:
>>>>
>>>> WSGIPythonExecutable /home/repoze/tmp/site/bin/python
>>>
>>> You are misunderstanding what the WSGIPythonExecutable directive in
>>> mod_wsgi is for. In general this directive should never be used.
>>> Apache/mod_wsgi will always use the Python runtime library that it
>>> was compiled against (even in daemon mode). This is done through the
>>> Python library being linked to the Apache mod_wsgi module.
>>>
>>> What the WSGIPythonExecutable directive is for is to work around a
>>> shortcoming of Python in how it determines where the installed
>>> Python lib directory is.
>>>
>>> What happens is that the Python runtime, when initialised, tries to
>>> work out where the Python library directory is by looking for where
>>> the Python executable is in the PATH of the process, even though the
>>> python program isn't executed in this case. The problem is that if
>>> the Apache process uses a PATH that would result in the Python
>>> runtime finding a different python executable than what mod_wsgi was
>>> compiled against (i.e., multiple instances of Python installed in
>>> different root directories), then the wrong Python lib directory
>>> will be used and thus the wrong common modules and site-packages
>>> directory.
>>
>> I think we're actually using it correctly, because what we're after
>> is a form of that use case.  The Python we're trying to point to is a
>> "virtual" Python (see
>> http://peak.telecommunity.com/dist/virtual-python.py).  The Python
>> that mod_wsgi is compiled against is also the Python which is the
>> source of the "virtual" Python.  Essentially two things may differ
>> between the "virtual" Python and the Python used to create it: the
>> packages which are in site-packages may differ, and the
>> distutils.cfg may differ.  Its version cannot differ.  Really it's as
>> if I had installed exactly the same Python version, compiled with
>> the same toolchain and libraries, in a different location (e.g. one
>> that doesn't happen to be on the Apache process' PATH), but it's
>> just done through symlink hackery instead of files.
>>
>> Being able to specify the interpreter on a per-VirtualHost basis
>> would prevent us from needing to do any sys.path munging in the wsgi
>> loader.
>
> Except that in mod_wsgi there is nothing to stop multiple VirtualHosts
> being handled within the same process, be it embedded or daemon mode.
> Because WSGIPythonExecutable is actually affecting the whole of
> Python, and not just a specific interpreter, you can't restrict it
> to just one VirtualHost (or sub interpreter). This isn't because of
> mod_wsgi, but because of Python itself and how it does Python
> initialisation.
>
>> In particular, it would let us reuse the Python 'site' module
>> behavior whereby .pth files put into
>> "sys.prefix + '/lib/pythonX.X/site-packages'" are consulted for extra
>> info that extends sys.path (used heavily by eggs).  This is consistent
>> with the idea that each of our applications will use the same Python
>> version; they just must have different sys.path settings and each is
>> represented within the context of a separate virtual Python.  See
>> http://bob.pythonmac.org/archives/2005/02/06/using-pth-files-for-python-development/
>> for more info about .pth files and site directories.
>
> I actually did look at one point at embedding support for the
> workingenv concept within the implementation of mod_wsgi, so that an
> option to WSGIDaemonProcess would just point at a workingenv
> directory. The issue was whether, in implementing it, the bits in
> Python it relied on would change, as I wasn't sure whether these were
> in the core of Python or in setuptools or something else. Then Ian
> effectively put workingenv behind him and moved on to his next idea.
>
> Now although Ian's new idea works better, it only really does so
> where the Python program is being used directly with a single Python
> interpreter instance. It isn't really suited to embedded Python,
> especially where there are multiple Python sub interpreters and you
> want a slightly different environment for each sub interpreter,
> although that has its caveats.
>
> I guess what complicates this a bit is that mod_wsgi initialises
> Python once in the parent Apache process before doing any forks. This
> means that initialisation doesn't have to be done for each Apache
> child process, but it also means the initialisation can't be done
> separately for each daemon process with a different Python library
> directory. Thus, doing sys.path manipulation is the only real way of
> doing things given the way Python is embedded with mod_wsgi.
>
> Overall I have no problem pursuing the idea of supporting something
> equivalent to workingenv in mod_wsgi which will do all the sys.path
> fixups transparently, but I just need to be sure that what it depends
> on isn't going to change. I also want it to be self contained and not
> depend on some third party package. I just may need some guidance in
> implementing such a thing, as I have not yet gone to the trouble of
> working out properly how all these egg things work.
>
>> I understand that adding something like this would lead to some
>> people misunderstanding its purpose and believing that you could
>> actually use multiple Python versions simultaneously, but it sure
>> would make life a lot easier for people who wanted to use it this
>> way.
>>
>>>> WSGIDaemonProcess tmp threads=1 processes=4 maximum-requests=10000
>>>>
>>>> <Directory /home/repoze/tmp/site/etc>
>>>>    Order deny,allow
>>>>    Allow from all
>>>> </Directory>
>>>>
>>>> <VirtualHost *:80>
>>>>    DocumentRoot /home/repoze/www/www.repoze.org
>>>>    ServerName www.repoze.org
>>>>    ScriptAlias /viewcvs "/usr/lib/cgi-bin/viewcvs.cgi"
>>>>    ServerAdmin repoze-dev at repoze.org
>>>>    WSGIScriptAlias /tmp /home/repoze/tmp/site/etc/zope2.wsgi
>>>>    WSGIProcessGroup tmp
>>>>    WSGIPassAuthorization On
>>>>    SetEnv HTTP_X_VHM_HOST http://www.repoze.org/tmp
>>>> </VirtualHost>
>>>>
>>> The intent with mod_wsgi is that the WSGI script file is where you
>>> specify everything. Thus, sys.path needs to be modified in it. If
>>> you want to have different working environments for different
>>> applications then you can use workingenv. See:
>>>
>>>   http://docs.pythonweb.org/pages/viewpage.action?pageId=5439610
>>
>> Workingenv is no longer maintained: Ian is on to "virtualenv"
>> (http://pypi.python.org/pypi/virtualenv) which works almost exactly
>> like virtual-python.py except it works on Windows and provides some
>> additional tools for customization.   Workingenv is basically just a
>> fancy way to set sys.path from what I understood of it, and I believe
>> Ian abandoned it because it's more convenient and predictable to not
>> have to write additional code that munges sys.path at all.
>
> But virtualenv is still effectively doing sys.path munging. The only
> difference is that you aren't replicating it and instead are just
> letting Python itself do the sys.path munging.
>
>>> I have looked at a directive which allows you to preload a WSGI
>>> script file, but it gets a bit complicated as you need to
>>> distinguish which Python sub interpreter you need to load it into,
>>> as well as whether it should be loaded into the main Apache child
>>> processes (embedded mode), or a specific set of daemon processes.
>>
>> Perhaps making the configuration a bit more byzantine on a per-mode
>> basis but more explicit might help.  For example, if you want to work
>> in daemon mode where there are multiple processes, and each process
>> has exactly one interpreter instance, I suspect the preloading becomes
>> pretty trivial.  I guess the difficulty then comes in supporting the
>> other modes where preloading isn't as well defined.
>
> Maybe I described it the wrong way. Implementing the preload isn't
> that complicated; it is more the slight confusion that may arise
> when people define it, as they may not realise what they need to
> define as the application group and process group. This is solved
> though by some good documentation. Thus:
>
>   WSGIPreloadScript /some/path.wsgi process-group=tmp application-group=%{GLOBAL}
>
>   ...
>
>   <VirtualHost *:80>
>   ...
>   WSGIProcessGroup tmp
>   WSGIApplicationGroup %{GLOBAL}
>   ...
>   </VirtualHost>
>
> Graham
>
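
The sys.path handling discussed in the thread, where the WSGI script file
itself adds a virtual Python's site-packages directory so that its .pth
files are honoured, might be sketched roughly as follows. This is only a
minimal illustration under stated assumptions, not what repoze or mod_wsgi
actually ships: the helper name add_site_dir and the placeholder
application are hypothetical, and only the standard library's
site.addsitedir() is relied upon.

```python
import site
import sys


def add_site_dir(path):
    """Add `path` to sys.path the way the 'site' module would.

    site.addsitedir() appends the directory and processes any .pth files
    found in it (the mechanism eggs rely on). Entries that are new to
    sys.path are then moved to the front so that they take precedence
    over the packages of the Python mod_wsgi was compiled against.
    Hypothetical helper name; not part of mod_wsgi.
    """
    before = list(sys.path)
    site.addsitedir(path)
    added = [p for p in sys.path if p not in before]
    sys.path[:] = added + [p for p in sys.path if p not in added]
    return added


def application(environ, start_response):
    """Placeholder WSGI application; a real script would load its app here."""
    body = b"sys.path[0] = " + sys.path[0].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

In a WSGI script file such as the zope2.wsgi referenced in the
configuration, the call would go at the top, before the application is
imported, e.g. add_site_dir('/home/repoze/tmp/site/lib/pythonX.X/site-packages'),
where the pythonX.X component is a placeholder for whatever version the
virtual Python was created from.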

_______________________________________________
Repoze-dev mailing list
Repoze-dev at lists.repoze.org
http://lists.repoze.org/mailman/listinfo/repoze-dev


