intern() redundant strings to reduce memory footprint #2181

@bdbaddog

This issue was originally created at: 2008-08-16 13:42:01.
This issue was reported by: pankrat.
pankrat said at 2008-08-16 13:42:01

The idea is to use intern() for identical strings which are held by multiple
objects. For example, the suffix string '.o' can be shared by many Nodes in a
typical build environment. Other candidates which have been considered are
filenames and paths. For example, 'abspath' and 'labspath' are equal on
POSIX platforms, so both can refer to a single string object.

This patch "interns" filenames/paths, suffixes, and the implicit dependencies
reported by the C Scanner. The total memory saved is between 1% and 2%.
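
To illustrate the idea, here is a minimal sketch and not the code from the attached patch. The Node class below is a hypothetical stand-in, not the SCons Node class. It uses sys.intern(), which is where Python 3 moved the intern() builtin of Python 2.

```python
import os.path
import sys


class Node(object):
    """Hypothetical stand-in for a build node (not the SCons class)."""

    def __init__(self, path):
        # Interning returns one canonical string object per distinct
        # value, so equal paths and suffixes are stored only once.
        self.path = sys.intern(path)
        self.suffix = sys.intern(os.path.splitext(path)[1])


# A thousand nodes with different paths but the same '.o' suffix.
nodes = [Node("src/file%d.o" % i) for i in range(1000)]

# Every node refers to the very same suffix object.
shared = all(n.suffix is nodes[0].suffix for n in nodes)
print(shared)
```

How much this saves depends on how many duplicate strings the build actually holds, so the effect should be measured on a real build.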

A word of warning: Once a string is interned, it is immortal in Python 2.2 and
before.

Thanks to Jean for pointing me to intern().

pankrat said at 2008-08-16 13:43:35

Created an attachment (id=477)
Reduce memory footprint with intern()

gregnoel said at 2008-08-19 12:39:53

Bug party triage.

gregnoel said at 2008-09-09 16:06:56

Bug party triage. Go for it.

pankrat said at 2008-10-09 09:33:56

Some tests fail if the patch is applied. I need to investigate why this is
happening. Anyway, it won't be ready for 1.1 so I set the target milestone to
'research'.

bdbaddog said at 2009-01-21 15:23:19

I've applied your patch and tried it on my large memory footprint build. The
memory footprint is <= 1/2 what it was before. The CPU does get much more
loaded by scons, and the build fails.
I'll also take a look at a regression run with this patch, as it looks like it
could be very helpful for my build, for which scons is >500MB now.

pankrat said at 2009-01-22 07:17:56

The scanner logic does not work in the general case. Please try the new patch
which I will attach in a minute.

Thanks, Ludwig

pankrat said at 2009-01-22 07:19:20

Created an attachment (id=571)
Fix scanner intern

bdbaddog said at 2009-01-24 14:50:55

Initial indications on new patch are good. Build still ongoing, but mem
footprint is down from 598M to 310M.

I'll gather some more statistics and I need to check for correctness of course.

bdbaddog said at 2009-01-24 14:51:41

Spoke too soon. Mem usage just ballooned to 572M.

pankrat said at 2009-02-09 13:19:17

Committed in r3991.

More information about this issue is at http://docs.python.org/lib/non-essential-built-in-funcs.html.

pankrat attached intern_strings.patch at 2008-08-16 13:43:35.

Reduce memory footprint with intern()

pankrat attached intern_strings3.patch at 2009-01-22 07:19:20.

Fix scanner intern
