Chandler Performance
Startup Times
OSX
Using a combination of DYLD_PRINT_LIBRARIES, DYLD_TRACE, and ktrace, Robin and I discovered that dyld was taking a long time resolving symbols. Robin summarizes:
"Morgen and I spent a bunch of time today trying to figure out what is taking all the time when the wxPython extension module is being imported with
MacPython?-OSX. We've done some ktrace/kdumping and just like in a thread on Pythonmac-SIG from February[1] we are seeing about a six second delay just before HIToolbox is open()ed, although it was not after libcrypto in our case. (HIToolbox was also loaded a couple seconds before the delay with load_shared_file()...)
We then set DYLD_TRACE and I wrote a script to decode the function calls it adds to a ktrace listing. It seems that there are many thousands of symbols that are being handled by dyld during that 6 second block of time (now expanded out to about 30 seconds because of all the logging.) Dyld is doing something (probably relocating/rebinding) with all those symbols at that time as we had supposed, but the surprise was that the wxWindows symbols are a very small fraction of the time, only about 7%. The bulk of the rest of the time is spent on what looks like Carbon symbols (at least I recognise a lot of them as Carbon
APIs, the rest look like they might be internal stuff that the Carbon
APIs themselves use...)"
The players:
| Python (executable) | dynamically linked to Python.framework, CoreServices?, Foundation |
| Python.framework | dynamically linked to libSystem, CoreServices?, Foundation |
| libwx_mac (wx library) | dynamically linked to Carbon, libSystem, libz |
| wxc.so (Python extension) | dynamically linked to libwx_mac, Python.framework, libSystem |
Using a test script which only contains the line "from wxPython.wx import *", the following execution times were seen (the tests were run 4 times, and the last time is shown):
| Python linked to libwx_mac | Python with prebind | Python linked to Carbon | Seconds |
| no | no | no | 8.5 |
| no | yes | no | 26 |
| yes | yes | no | 23 |
| yes | no | no | 5.5 |
| yes | yes | yes | 7 |
| yes | no | yes | 4.5 |
The first row represents how the build exists today in CVS.
Using the python-linked-to-libwx_mac-and-carbon-but-no-prebind binaries:
| script: | seconds with DYLD_PRINT_LIBRARIES set | seconds without DYLD_PRINT_LIBRARIES set |
| running an empty script | 1.1 | 0.6 |
| import wxc | 3 | 2.1 |
| from wxPython.wx import * | 4.5 | 4.0 |
10/17 status: Before, it took 30 seconds to get to the main loop. 4 seconds saved by linking Python with Carbon and libwx_mac; another 4 seconds saved by delaying a "from wx import py" until needed by the
PyCrust? debugger. Now down to 22 seconds. Still a ways to go...
Here is a breakdown of where time is spent from launch to mainloop( ):
- "from application.Application import wxApplication" = 11 seconds total, which includes:
- "from wxPython.wx import *" = 5 seconds
- "import wxc" = 3 seconds
- performing all the imports within wx.py = 2 seconds
- "from wxPython.xrc import *" = .5 seconds
- "from model.schema.DomainSchemaLoader import DomainSchemaLoader?" = 2.2 seconds
- (includes loading 11 .so files: _socket.so, _ssl.so, math.so, _random.so, fcntl.so, cStringIO.so, binascii.so, md5.so, sha.so UUIDext.so, mxDateTime.so)
- "from application.Preferences import Preferences" = 0.2 seconds
- (includes loading _persistence.so)
- "from application.SplashScreen import SplashScreen?" = 1 second
- (includes loading icglue.so, _Res.so, _File.so, MacOS.so, htmlc.so)
- "from application.URLTree import URLTree" = 0.1 seconds
- "from application.agents.Notifications.NotificationManager import NotificationManager?" = 0.15 seconds
- "from application.agents.AgentManager import AgentManager?" = 0.3 seconds
- "from persistence import Persistent" = 0.01 seconds
- "from persistence.list import PersistentList?" = 0.001 seconds
- "import application.ChandlerWindow" = 0.13 seconds
- "import PreferencesDialog?" = 0.03 seconds
- "import ChandlerJabber?" = 0.4 seconds
- (includes loading cPickle.so, select.so, pyexpat.so)
- "import PresencePanel?" = 0.08 seconds
- "from zodb import db" = 0.5 seconds
- (includes loading _timestamp.so, itertools.so, _weakref.so)
- "from zodb.storage.file import FileStorage?" = 0.4 seconds
- (includes loading _bsddb.so, libdb-4.1.dylib, _fsBTree.so)
- "from application.repository.Repository import Repository" = 0.02 seconds
- "from application.ImportExport import ImportExport?" = 0.24 seconds
- "application = wxApplication(sys.argv)" = 17 seconds total, which includes:
- opening zodb FileStorage? = 0.6 seconds
- loading XMLRepository class = 1.2 seconds
- (includes loading datetime.so, _dbxml.so, libxerces-c.23.dylib)
- opening XMLRepository = 1.1 seconds
- load schema pack = 0.67 seconds
- loading zaobao data model = 1.2 seconds
- (includes loading of zlib.so)
- loading parcels = 6 seconds
- calendar parcel = 5 seconds
- (includes loading calendarc.so, gizmosc.so, operator.so)
- synchronize view = 1.2 seconds
- (includes loading libCSync.A.dylib, libRIP.A.dylib, libCGATS.A.cylib)
- creating AgentManager? = 3.8 seconds
- OpenStartingURL? = 2.2 seconds
Relevant links:
--
MorgenSagen - 16 Oct 2003
--
MichaelToy - 17 Oct 2003
I asked spencer what the aol people did, and got some stories, might be worth hooking up and getting more details. Here's comments from the iChat log:
ok. we played with link options for libwx_mac
turns out, that with the best optimization, we found that our static initializers didn't all get initialized, so watch out for that
the use of bind_at_load is really slow. don't do that
you can use -single_module, but i think there was some req'mnt on the Nov 2002 dev tools for that
we're doing a two-stage link to create a single module
share-ld-sh in the wxwin mac distrib does that
--
MorgenSagen - 22 Oct 2003
I compiled into the Python interpreter some of the .so extensions that Chandler makes use of (such as _socket.so, math.so, fcntl.so, cStringIO.so, binascii.so, md5.so, sha.so), and that decreased the time to execute "from model.schema.DomainSchemaLoader import
DomainSchemaLoader?" from 2.2 to 1.3 seconds.