Which class should one use for very large sets (>40000)

VA Smalltalk is a "100% VisualAge compatible" IDE that includes the original VisualAge technology and the popular VA Assist and WidgetKit add-ons.

Postby marten » Sat Jun 16, 2007 2:26 am

Normally I use instances of class Set to maintain a set of domain objects. They perform well with up to about 30000 entries, but beyond that performance degrades noticeably.

Another solution I have found is the class EsIdentitySet, which compares elements by identity rather than equality and is clearly faster than Set: with 90000 entries it takes 69 seconds against 235 seconds (under a third of the time), and with 40000 entries 11 seconds against 19 seconds.

Any other ideas?

Re: Which class should one use for very large sets

Postby koschate » Sat Jun 16, 2007 1:59 pm

Look at AbtHighCapacityDictionary and AbtHighCapacityLookupTable. They both take advantage of #abtHash32. The performance drop you see occurs because the standard #hash wraps at 32767, so a large collection ends up with many hash collisions and lookups degrade into a linear search. The #abtHash32 method generates a 32-bit hash, which is much better suited to large collections.
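To make the collision argument concrete, here is a small simulation, written in Python rather than Smalltalk, of an open-addressed set with linear probing. It assumes hash values are uniformly random integers clipped either to 15 bits (0 to 32767, mirroring the wrap described above) or to 32 bits, and reports how many slots a successful lookup inspects on average. The function name `average_probes`, the load factor of 0.5 and the uniform-hash assumption are illustrative choices, not details of the VA Smalltalk implementation. To keep the 15-bit case fast, it jumps to the next empty slot with a small union-find structure, which puts every key in exactly the slot that stepping one slot at a time would choose.

```python
import random


def average_probes(n_items: int, hash_bits: int, seed: int = 1) -> float:
    """Simulate inserting n_items keys into a linear-probing table whose
    hash values are random numbers clipped to hash_bits bits. Returns the
    mean number of slots a later successful lookup inspects (with no
    deletions, this equals the probe count at insertion time)."""
    if n_items == 0:
        return 0.0
    rng = random.Random(seed)
    capacity = 1
    while capacity < 2 * n_items:      # keep the load factor at or below 0.5
        capacity *= 2
    mask = (1 << hash_bits) - 1
    # next_free[i] == i means slot i is empty; an occupied slot points
    # further along the probe sequence toward the next empty slot.
    next_free = list(range(capacity))

    def find_free(i: int) -> int:
        while next_free[i] != i:
            next_free[i] = next_free[next_free[i]]   # path halving
            i = next_free[i]
        return i

    total = 0
    for _ in range(n_items):
        home = (rng.getrandbits(32) & mask) % capacity   # clipped hash
        slot = find_free(home)          # first empty slot at or after home
        next_free[slot] = (slot + 1) % capacity
        total += (slot - home) % capacity + 1            # slots inspected
    return total / n_items
```

With these assumptions, `average_probes(40000, 32)` stays close to 1, while `average_probes(40000, 15)` is orders of magnitude larger: 40000 keys must share 32768 possible hash values, so they pile up into one long cluster that lookups have to walk linearly.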

That said, I don't believe there is a high-capacity set, but how hard could it be to implement one?
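As a rough answer to that question, here is a minimal sketch in Python (not VA Smalltalk) of an open-addressed set that hashes with a 32-bit function and grows before it becomes half full. The thread does not say which algorithm #abtHash32 uses, so the sketch substitutes 32-bit FNV-1a as a stand-in; the names `HighCapacitySet` and `fnv1a_32`, and the restriction to string elements, are hypothetical choices for this example only.

```python
def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of text (stand-in for #abtHash32)."""
    h = 0x811C9DC5                       # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
    return h


class HighCapacitySet:
    """Open-addressed set of strings using a 32-bit hash and linear probing."""

    def __init__(self, capacity: int = 32) -> None:
        size = 8
        while size < capacity:           # table size is always a power of two
            size *= 2
        self._slots: list = [None] * size
        self._tally = 0

    def _index_of(self, element: str) -> int:
        """Return the slot holding element, or the empty slot where it belongs."""
        mask = len(self._slots) - 1
        index = fnv1a_32(element) & mask   # all 32 hash bits can reach a slot
        while self._slots[index] is not None and self._slots[index] != element:
            index = (index + 1) & mask     # linear probing on collision
        return index

    def add(self, element: str) -> None:
        index = self._index_of(element)
        if self._slots[index] is None:
            self._slots[index] = element
            self._tally += 1
            if 2 * self._tally > len(self._slots):   # keep load factor <= 0.5
                self._grow()

    def _grow(self) -> None:
        old = self._slots
        self._slots = [None] * (2 * len(old))
        for element in old:
            if element is not None:
                self._slots[self._index_of(element)] = element

    def __contains__(self, element: str) -> bool:
        return self._slots[self._index_of(element)] is not None

    def __len__(self) -> int:
        return self._tally
```

Because the table size is a power of two, masking the 32-bit hash lets every slot serve as a starting point. A hash that never exceeds 32767 would leave every slot above 32767 reachable only through probing, which is the effect described above.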

Re: Which class should one use for very large sets

Postby John Clapperton » Mon Oct 15, 2007 6:47 am

koschate wrote: Look at AbtHighCapacityDictionary and AbtHighCapacityLookupTable. ...


I just tried

AbtHighCapacityDictionary new at: 'hello' put: 'world'.

and got a walkback "...due to OS error126" in this method:

abtPrimHashBytes: aByteArrayOrString seed: aSeedInt

<primitive: 'Abt_Primitives':bytesHash32>

^self primitiveFailed <---

Ideas anyone?
John

Postby nmongeau » Mon Oct 15, 2007 7:40 am

That primitive is in abtprcXX.dll. Is that file present?

Normand

Postby John Clapperton » Mon Oct 15, 2007 10:34 am

Thanks, but it is there:

C:\Program Files\vast75\bin\abtprc75.dll

John

As you were. I've just noticed that I was using:
"C:\Program Files\vast70\bin\abt.exe"
starting in
C:\PROGRA~1\vast75\image750

Changed that to:
"C:\Program Files\vast75\bin\abt.exe"

and it works OK now.
Thanks again,
John

Re: Which class should one use for very large sets

Postby a3aan » Thu Oct 18, 2007 12:23 pm

koschate wrote: Look at AbtHighCapacityDictionary and AbtHighCapacityLookupTable. They both take advantage of #abtHash32.


Based on AbtHighCapacityLookupTable, I made a high-capacity identity lookup table and would like to make two points.

1. It would be nice if such a dictionary were in the base image.
2. There is quite a lot of code duplication in the various dictionary classes, which could be reduced by extracting the hash and compare operations.
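The second point can be illustrated outside Smalltalk. In this Python sketch, a single table class receives its hash and key-comparison operations as parameters, so the equality and identity variants become two constructor calls instead of two subclasses with copied lookup code. All names here (`HashedLookupTable`, `at_put`, `at`, `equality_table`, `identity_table`) are hypothetical, and the table uses separate chaining for brevity; the AbtHighCapacity classes may well be organized differently.

```python
import operator
from typing import Any, Callable, List, Tuple

Bucket = List[Tuple[Any, Any]]


class HashedLookupTable:
    """Key/value table whose hash and key-comparison operations are passed
    in by the caller instead of being hard-coded in each subclass."""

    def __init__(self, hash_fn: Callable[[Any], int],
                 same_key: Callable[[Any, Any], bool]) -> None:
        self._hash = hash_fn
        self._same = same_key
        self._buckets: List[Bucket] = [[] for _ in range(16)]
        self._tally = 0

    def _bucket_for(self, key: Any) -> Bucket:
        return self._buckets[self._hash(key) % len(self._buckets)]

    def at_put(self, key: Any, value: Any) -> None:
        bucket = self._bucket_for(key)
        for i, (existing, _) in enumerate(bucket):
            if self._same(existing, key):
                bucket[i] = (existing, value)      # replace the value only
                return
        bucket.append((key, value))
        self._tally += 1
        if self._tally > 2 * len(self._buckets):   # keep chains short
            self._rehash(4 * len(self._buckets))

    def at(self, key: Any, default: Any = None) -> Any:
        for existing, value in self._bucket_for(key):
            if self._same(existing, key):
                return value
        return default

    def count(self) -> int:
        return self._tally

    def _rehash(self, new_size: int) -> None:
        old = self._buckets
        self._buckets = [[] for _ in range(new_size)]
        for bucket in old:
            for key, value in bucket:
                self._bucket_for(key).append((key, value))


def equality_table() -> HashedLookupTable:
    """Equality semantics: hash() to locate, == to compare."""
    return HashedLookupTable(hash, operator.eq)


def identity_table() -> HashedLookupTable:
    """Identity semantics: id() to locate, 'is' to compare."""
    return HashedLookupTable(id, operator.is_)
```

A key that is equal to, but not the same object as, a stored key is found by `equality_table()` tables and missed by `identity_table()` tables; nothing else about the two differs.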

Just my 2 eurocents.
Adriaan.
Adriaan van Os

Postby wembley » Fri Oct 19, 2007 6:20 am

My 2 cents worth (worth a lot less than 2 eurocents these days :cry: ):

This is an example of how the class hierarchy of VA Smalltalk has grown in unanticipated, and not necessarily good, ways over the years. The original Abt<collection class name> classes came about because of a division of labor between two development groups (one inside IBM and one outside). The AbtHighCapacity<collection class name> classes came about in the same way: they were developed as part of the ObjectExtender work by IBM Consulting and moved wholesale into the base.

So, what can or should be done at this point? It's time for a refactoring of at least the collection class hierarchy, to pick out the essential differences between the classes and move those differences into their own methods so that common code really is common. This might even give us the opportunity to parameterize algorithms; after all, you don't need a different subclass of SortedCollection for each sort algorithm.
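Parameterizing the algorithm can be sketched in a few lines of Python (not VA Smalltalk code). The collection below takes both its sort block and its sorting algorithm as constructor arguments, so switching algorithms means passing a different function rather than defining a subclass. The class `SortedCollection` here only borrows the Smalltalk name; it, `add_all`, `as_list`, `insertion_sort` and `merge_sort` are hypothetical names for this example, and both algorithms are plain textbook versions.

```python
from typing import Any, Callable, Iterable, List

SortBlock = Callable[[Any, Any], bool]   # answers "a may come before b"


def insertion_sort(items: List[Any], before: SortBlock) -> List[Any]:
    """Stable insertion sort driven by the sort block."""
    result: List[Any] = []
    for item in items:
        i = len(result)
        while i > 0 and not before(result[i - 1], item):
            i -= 1
        result.insert(i, item)
    return result


def merge_sort(items: List[Any], before: SortBlock) -> List[Any]:
    """Stable top-down merge sort driven by the sort block."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], before)
    right = merge_sort(items[mid:], before)
    merged: List[Any] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if before(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


class SortedCollection:
    """Sorted collection whose ordering and sorting algorithm are both
    parameters, instead of one subclass per algorithm."""

    def __init__(self, sort_block: SortBlock,
                 algorithm: Callable[[List[Any], SortBlock], List[Any]] = merge_sort) -> None:
        self._sort_block = sort_block
        self._algorithm = algorithm
        self._items: List[Any] = []

    def add_all(self, items: Iterable[Any]) -> None:
        self._items = self._algorithm(self._items + list(items), self._sort_block)

    def as_list(self) -> List[Any]:
        return list(self._items)
```

For example, `SortedCollection(lambda a, b: a <= b, insertion_sort)` and `SortedCollection(lambda a, b: a <= b, merge_sort)` hold the same elements in the same order; only the algorithm that maintains the order differs.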

Thoughts?
John O'Keefe, Principal Smalltalk Architect, Instantiations Inc.