GemStone/S 32-bit is limited to about one billion (2^30 – 1) objects, and one of the features of the 64-bit product is that it raises this limit to about one trillion (2^40 – 1). As databases grow, an obvious question is, “What types of objects are in my database?” For example, imagine you knew that you had the following objects:
2522819 #String
823195 #AuditEventSet
502189 #UpdateEvent
445152 #Association
433474 #OrderedCollection
398370 #SymbolKeyValueDictionary
386246 #GsfsIdentitySet
385162 #Array
160303 #DateTime
148078 #CollisionBucket
131604 #LhsTimestamp
129642 #AuditEventArray
128050 #GsMethod
124073 #FdbDrugDrugInteractionSet
108768 #MedicationSet
100789 #KeyValueDictionary
I recently made a tool named 'ScanBackup' that reads a normal GemStone/S 32-bit backup (made with Repository>>#'fullBackupTo:') and, when run from a live version of the database from which the backup was made, reports the class names and instance counts of the objects in the backup. The code follows in Topaz file-in format:
doit
Object subclass: 'ScanBackup'
	instVarNames: #( file position bytes
		offset short0 short1 long0
		long1 long2 long3 classes)
	classVars: #()
	classInstVars: #( classes)
	poolDictionaries: #[]
	inDictionary: UserGlobals
	constraints: #[ ]
	instancesInvariant: false
	isModifiable: false
%
! Remove existing behavior from ScanBackup
doit
ScanBackup removeAllMethods.
ScanBackup class removeAllMethods.
%
! ------------------- Class methods for ScanBackup
category: 'other'
classmethod: ScanBackup
classes
	classes notNil ifTrue: [^classes].
	classes := IntegerKeyValueDictionary new.
	AllUsers do: [:eachUser |
		(System canRead: eachUser) ifTrue: [
			(System canRead: eachUser symbolList) ifTrue: [
				eachUser symbolList do: [:eachSymbolDictionary |
					(System canRead: eachSymbolDictionary) ifTrue: [
						eachSymbolDictionary do: [:eachGlobal |
							(System canRead: eachGlobal) ifTrue: [
								eachGlobal isBehavior ifTrue: [
									(System canRead: eachGlobal classHistory) ifTrue: [
										eachGlobal classHistory do: [:eachClass |
											(System canRead: eachClass) ifTrue: [
												classes at: eachClass asOop put: eachClass.
											].
										].
									].
								].
							].
						].
					].
				].
			].
		].
	].
	^classes.
%
category: 'other'
classmethod: ScanBackup
scanBackupAtServerPath: aString
"
	ScanBackup scanBackupAtServerPath: '...'.
"
	^self new
		path: aString;
		report.
%
! ------------------- Instance methods for ScanBackup
category: 'other'
method: ScanBackup
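byteAt: anInteger
	^bytes at: offset + anInteger.
%
category: 'other'
method: ScanBackup
longAt: anInteger
	"Binary messages evaluate strictly left to right, so this accumulates
	 the four bytes from most significant (long0) to least significant (long3)."
	^(bytes at: offset + anInteger + long0)
		* 256 + (bytes at: offset + anInteger + long1)
		* 256 + (bytes at: offset + anInteger + long2)
		* 256 + (bytes at: offset + anInteger + long3).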
%
category: 'other'
method: ScanBackup
path: aString
	file := GsFile
		open: aString
		mode: 'rb'
		onClient: false.
	position := 0.
	bytes := ByteArray new: 1024 * 1024.
	offset := 0.
	classes := IntegerKeyValueDictionary new.
	[
		self readFile.
	] ensure: [
		file close.
	].
%
category: 'other'
method: ScanBackup
physicalRecordHeaderSize
	^28.
%
category: 'other'
method: ScanBackup
read: anInteger
	file next: anInteger into: bytes.
	position := file position.
	offset := 0.
%
category: 'other'
method: ScanBackup
readDataRecord
	| numObjs |
	numObjs := self shortAt: 3.
	offset := offset + 8.
	1 to: numObjs do: [:i | self readObject].
%
category: 'other'
method: ScanBackup
readFile
	self readFirstPhysicalRecord.
	[
		file atEnd not.
	] whileTrue: [
		self readPhysicalRecord.
		System abortTransaction. "Avoid holding a commit record"
	].
%
category: 'other'
method: ScanBackup
readFirstPhysicalRecord
	| localHeaderSize physicalRecordKind logicalRecordKind
	  logicalRecordSize numLogicalRecords physicalRecordSize |
	localHeaderSize := 28.
	self read: self physicalRecordHeaderSize + localHeaderSize.
	offset := 0.
	(physicalRecordKind := self byteAt: 1) = 18
		ifFalse: [self error: 'Unrecognized record type!'].
	self setSwizzle.
	(logicalRecordKind := self shortAt: 39) = 12 "LOG_BACKUP_ROOT_RECORD"
		ifFalse: [self error: 'Unexpected record!'].
	logicalRecordSize := self shortAt: 41.
	numLogicalRecords := self shortAt: 7.
	physicalRecordSize := (self shortAt: 5) * 1024.
	self read: logicalRecordSize - localHeaderSize.
	self read: physicalRecordSize - logicalRecordSize - self physicalRecordHeaderSize.
	offset := 0.
	2 to: numLogicalRecords do: [:i |
		self readLogicalRecord.
	].
%
category: 'other'
method: ScanBackup
readLogicalRecord
	| recordKind recordSize oldOffset |
	recordKind := self shortAt: 11.
	recordKind = 12 "LOG_BACKUP_ROOT_RECORD"
		ifTrue: [self error: 'Should have already processed this record!'].
	recordSize := self shortAt: 13.
	oldOffset := offset.
	offset := offset + 16. "header size"
	recordKind = 03 "LOG_DATA_RECORD" ifTrue: [self readDataRecord] ifFalse: [
	recordKind = 13 "LOG_BACKUP_EOF_RECORD" ifTrue: [] ifFalse: [
	recordKind = 14 "LOG_BACKUP_START_CHECKPOINT" ifTrue: [] ifFalse: [
	recordKind = 15 "LOG_BACKUP_END_CHECKPOINT" ifTrue: [] ifFalse: [
		'What kind of record is this?' halt.
	]]]].
	offset := oldOffset + recordSize.
%
category: 'other'
method: ScanBackup
readObject
	| objID objClass physSize count |
	objID := self longAt: 1.
	objClass := self longAt: 5.
	physSize := self shortAt: 19.
	offset := offset + physSize.
	count := classes
		at: objClass
		ifAbsentPut: [0].
	classes
		at: objClass
		put: count + 1.
%
category: 'other'
method: ScanBackup
readPhysicalRecord
	| pageKind numLogicalRecords recordSize |
	self read: self physicalRecordHeaderSize.
	(pageKind := self byteAt: 1) = 18
		ifFalse: [self error: 'Unrecognized record type!'].
	numLogicalRecords := self shortAt: 7.
	recordSize := (self shortAt: 5) * 1024.
	self read: recordSize - self physicalRecordHeaderSize.
	1 to: numLogicalRecords do: [:i |
		self readLogicalRecord.
	].
%
category: 'other'
method: ScanBackup
report
	| stream list |
	stream := WriteStream on: String new.
	list := OrderedCollection new.
	classes keysAndValuesDo: [:key :value | list add: key -> value].
	list := list asSortedCollection: [:a :b | a value > b value].
	"Report at most the 200 most common classes; the list may be shorter."
	1 to: (200 min: list size) do: [:i |
		| classOop count theClass |
		classOop := (list at: i) key.
		count := (list at: i) value.
		theClass := self class classes at: classOop ifAbsent: [nil].
		theClass notNil ifTrue: [
			i printOn: stream.
			stream tab.
			count printOn: stream.
			stream tab.
			theClass name printOn: stream.
			stream cr.
		].
	].
	^stream contents.
%
category: 'other'
method: ScanBackup
setSwizzle
	| swizzle |
	swizzle := (bytes copyFrom: 49 to: 54) asArray.
	swizzle = #(0 1 2 3 0 1) ifTrue: [
		short0 := 1.
		short1 := 0.
		long0 := 3.
		long1 := 2.
		long2 := 1.
		long3 := 0.
		^self.
	].
	swizzle = #(3 2 1 0 1 0) ifTrue: [
		short0 := 0.
		short1 := 1.
		long0 := 0.
		long1 := 1.
		long2 := 2.
		long3 := 3.
		^self.
	].
	self error: 'Unexpected file type!'.
%
category: 'other'
method: ScanBackup
shortAt: anInteger
	^(bytes at: offset + anInteger + short0)
		* 256 + (bytes at: offset + anInteger + short1).
%
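
The byte-order handling in setSwizzle, shortAt:, and longAt: is the least obvious part of the listing, so here is a small sanity check that can be pasted into Topaz. It is only a sketch and not part of the tool: the four byte values are invented for illustration, the temporaries stand in for the instance variables of the same names, and it assumes the Topaz `run` command is used to display the result. It uses the long0..long3 values that setSwizzle chooses for the #(0 1 2 3 0 1) marker, where the least significant byte is stored first.

```smalltalk
run
| bytes offset long0 long1 long2 long3 |
"Hypothetical sample data: the value 16r12345678 stored least significant byte first"
bytes := ByteArray new: 4.
bytes at: 1 put: 16r78.
bytes at: 2 put: 16r56.
bytes at: 3 put: 16r34.
bytes at: 4 put: 16r12.
offset := 0.
"Values assigned by setSwizzle for the #(0 1 2 3 0 1) marker"
long0 := 3. long1 := 2. long2 := 1. long3 := 0.
"Same expression as longAt: 1; binary messages evaluate left to right"
^(bytes at: offset + 1 + long0)
	* 256 + (bytes at: offset + 1 + long1)
	* 256 + (bytes at: offset + 1 + long2)
	* 256 + (bytes at: offset + 1 + long3).
%
```

Because Smalltalk has no operator precedence among binary messages, the expression is evaluated as (((b4 * 256 + b3) * 256 + b2) * 256 + b1), so the snippet should answer 305419896, which is 16r12345678. Swapping in the other set of values (long0 := 0 through long3 := 3) reads the same bytes most significant first instead.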

3 comments
May 14, 2009 at 9:16 pm
Dale Henrichs
James,
Do you plan on publishing a version that works against 64 bit backup files?
May 14, 2009 at 9:39 pm
James Foster
Yes, I’d like to do a tool for 64-bit as well. The immediate need was for the 32-bit format since that is where running out of objects is a real problem (and also that was where we had a customer paying for the tool). Admittedly the 32-bit tool is not so useful for the GLASS crowd!
May 29, 2009 at 10:17 am
Ian Gilchrist
Hi James,
I’ve been successful with an approach that “walks” the OT with multiple gem sessions in parallel. This gets a result back in a reasonable amount of time and scales well for very large repositories that can afford a large SPC.
Walking the backup file is handy as well, given you know the internal mappings of the backup file format.
regards,
iang