free hosting   image hosting   hosting reseller   online album   e-shop   famous people 
Free Website Templates
Free Installer

 

 

            Internal Documentation



Presented by:     Team 20
                         Don Han, g9dhan@cdf
                         Jeff Xing, g0xinzhi@cdf
                         Carol Cheung, g9carolc@cdf
Presented to:     Professor Dave Warman
                         CSC 408
                         Software Engineering
                         Department Of Computer Science
                         University of Toronto

Presented Date:  Feb 27, 2001



 
 

TABLE OF CONTENTS


Introduction                         ......................................................................................................
Class BigStringdhan             ......................................................................................................
Class BTreeString                ......................................................................................................
Class LeafNode                   ......................................................................................................
Class ParNode                     ......................................................................................................
Summary                              ......................................................................................................
Brief Description on Testing   ......................................................................................................
Appendix                              ......................................................................................................



 
 
 

INTRODUCTION

        The implementation of phaseB contains four classes: BigString, BTreeString, LeafNode and Parnode. It has also several test drivers, but is not included in documentation part (see test part for details of the test drivers).
        Class BigString will manipulate the following operations specfified by the BigStringinterface: constructors, length(), toString(), charAt(), concat(), print(). It works by communicating with class BTreeString.
        Class BTreeString is the core of the implementation. It supports all the operations requested by class BigString. It uses a B+ tree data structure, and stores the long string only in the leaves. It works by communicating with classes LeafNode and ParNode.
        Class LeafNode supports the those operations requested by class BTreeString that are related to the leaves. A leaf in Class LeafNode holds a key and a segment of long string.
        Class ParNode supports the those operations requested by class BTreeString that are related to the parent nodes. A parent node in Class ParNode holds three arrays: keys and pointers to next level (either parent nodes or leaves).
 

Class: BIGSTRING

        The Structure of a BigString is as follows: a BigString contains a reference to a B+ tree which stores the first part of the string, and a vector to store other BigStrings which resulted from possible concatenations.

Features of a BigString:

Instance variables:

Constructors:

Operations:

Algorithm:

    BigString b;    // the BigString to return
    if(bigString has an empty vector)
        set b's instance variables;
        set b's tree to be this BigString's tree;
        copy the contents of this BigString's vector into b's vector;
        add bigString to b's vector;
    else
        set b's instance variables;
        set b's tree to be this BigString's tree;
        copy the contents of this BigString's vector into b's vector;
        create a new BigString temp which has an empty vector and whose tree references bigString's tree;
        add temp to b's vector
        copy contents of bigString's vector into b's vector
    end if
    return b;
 

Algorithm:

    if(start is non-zero) set the new start to ns;

    if(the character we are looking for is within the tree) return the character in the tree with that index;
    else    // we have to look at the contents of the vector
        loop
            if(the length of the next BigString in the vector < ns)
                decrement the length of that BigString from ns;
                go to the next element in the vector;
            else break
        end loop
        return the character in the particular element in the vector that is located at ns;

Decrementing is needed: for example if you want to locate character 250 in the following BigString:
tree:
    starting index = 0, treeLimit = 99
Vector:
    element 0 - starting index = 0, length = 100
    element 1 - starting index = 0, length = 100
then the character we want is the character at index 50 (250 - 100 - 100) in element 1.

Operations for phase C:

Algorithm:

    BigString b;
    if(begin indexes a location in this BigString's tree)
        set start of b to begin;
        set b's length to be this BigString's length - begin;
        set b's treeLimit to be this BigString's treeLimit;
        set b's tree to reference this BigString's tree;
        copy the contents of this BigString into b's vector;
        return b;
    else    // look at the contents of the vector
        locate the element in this BigString's vector that the substring should start at;
        let that be element k;
        set b's instance variables;
        set b's tree to be element k's tree;
        copy the contents of this BigString's vector into b's vector;
        return b;
    end
 

Algorithm:

    BigString b;
    set b's length to end - begin
     if(begin indexes a location in this BigString's tree)
        set start of b to begin;
        set b's tree to reference this BigString's tree
        if(end  - 1 <= this BigString's treeLimit)
            set b's treeLimit = end;
            return b;
        else
            set b's treeLimit to be this BigString's treeLimit ;
            let k be the last element in this BigString's vector that the substring needs;
            copy all elements in this BigString's vector up to k into b's vector;
            return b;
        end if
    else
        let k be the element in this BigString's vector that contains the start of the substring;
        let j be the element in this BigString's vector that contains the end of the substring;
        set b's tree to element k's tree;
        set the instance variables of b;
        copy all elements in this BigString from elements k+1 to element j, inclusive;
        return b;
    end
 

Algorithm:

    get the string value of this BigString;
    create a new tree using that string value;
    empty the vector;
    update instance variables;
 

Helper methods (other than the ones specified in handout):

Algorithm:

    String s = "";
    if(the my B+ tree contains all the characters from index a to index b)
        return the substring from the B+ tree in the  range [a,b)
    else if(the substring to return contains some tail end of the tree)
        concatenate that end of the tree to s;
        keep track of how many characters are in s so far.

    if(the number of characters in s so far < (b-a))
        go through the vector, starting at element 0,
        looking at the BigStrings and concatenating enough substrings to s so that s is of length (b-a)
 

Algorithm:

    if(the my B+ tree contains all the characters from index a to index b)
        output those characters in that range from the B+ tree onto the specified OutputStream
        return
    else if(some of the characters  are contained in the tail end of the tree)
        output those characters onto specified OutputStream
        keep track of how many characters are outputed so far.

    if(the number of characters in s so far < (b-a))
        go through the vector, starting at element 0,
        looking at the BigStrings and output putting characters from the BigStrings so that we have outputed (b-a)
        characters
 

Private helper methods:

Class: BTREESTRING

        The fundamental data structure of the BigString library is the B+ tree. It is implemented in class BTreeString. The leaves of the B+ tree stores substrings (held in character arrays) of the original string. That is, the original string is broken up into substrings. Each piece is stored in a leaf. The maximum possible length of a substring in a leaf is fixed in SMALL_STRING_LENGTH (SSL). The leaves of the tree are also held together in a linked list. This linked structure will increase the performance significantly when we want retrieve (or partially retrieve) the tree (e.g., the toString() or print() operation) the tree, because we can simply find the beginning leaf, and follow the linked leaves until the ending leaf.
        All leaves in the linked list will contain strings with length SSL, except the last whose length is less than or equal SSL. The internal nodes of the B+ tree contain an array of pointers to children nodes. The maximum number of children an internal has is also fixed. This is referred to as the ORDER. The height of the tree is dependent on the number of characters (n), the number of children (ORDER) and the length of a substring (SSL). That is

If the height of the tree is 1 (there is only one leaf node), then the root of the B+ tree is a LeafNode. Otherwise, the root of the B+ tree is a ParNode.
The tree will always be built so that there is as little space wasted as possible. And the tree in this design is not as complicated as typical B+ trees because, from the specs of BigString interface, only append is involved during the building of the tree and there are no insertions or deletions after the tree is built.
        The tree is built as follows: if there is room in the existing tree, just append the new leaf to the end of the linked list of leaves and make appropriate updates in all the ancestors so that the new leaf can be accessed. That is, the immediate parent of the leaf will have to have its pointers to children updated, as well as the keys keeping track of the leaves. All other ancestors will have to have their keys updated only. Otherwise, the existing tree is full, so the existing tree will become the leftmost subtree of a new root. The right subtree of the new root is the tree of ancestors to the new leaf. Also, the new leaf will be appended to the end of the linked list of leaves. The building of string to a B+ tree is shown in Figure 1. 

        Trees (and BigStrings) will be stored totally in memory. But the OS of UNIX can handle this even if the long string does not fit the main memory, because current implementation (linked structure) does not use contiguous memory.

Constructor:

Instance variables:

Public Methods:

Private Methods:

Algorithms:

BTreeString(string)
    feeds the first characters of string to a char array A with length min{SSL, stringLength)
    create a new leaf with (A, 0) and let the leaf be the root
    loop until the end of string
        feeds the string to a charArray A and get a key k
        append(A, k);

Append(A, k)
    create a new leaf L (calls LeafNode(A, k))
    find the path (a vector) from root to the last leaf M (call LeafNode(A, k))
    let M point to L (calls M.setNext(L))
    updateTree(path, null, L, height-1, k)

updateTree(path, P, L, level, k)
    parameters check, ParNode P and leaf L can not be both null or both not-null
    if current root is a leaf
    // height = 1
        create a new ParNode and let it to be the new root R (calls ParNode())
        let R point to both the old root and leaf L (calls R.setLeafNode(leaf, key) twice)
        set the leafRoot be null, increment the height and return
    else if the current level is one level below the root
    // level = 1
        if R still has at least one more slot
            if L is not null
                then let R point to L (calls R.setLeafNode(k, L))
            else let R point to par (call R.setParNode(k, par))
        return
        else   // we need a new root
            create a new root R? and a new Parnode Q
            if L is not null
                then let R point to L (calls Q.setLeafNode(k, L))
            else let Q point to P (call Q.setParNode(k, P))
            let R? point to R and Q, set R? to be the root
            increment the height and return
    else if L is not null and L?s parent Par is not the root
    // level = height-1 AND level > 1
        if Par still has at least one more slot
            then let Par point to L
        updates only the last key of Par?s parent (calls updateKey(path, level-1, key),
                                                                                this method works recursively)
        else
            creates a new ParNode Q and let Q point to L
            recursively calls updateTree(path, null, Q, level-1, k)
    else
    // P is not null and P?s parent Par is not the root
        if Par still has at least one more slot
            then let Par point to P
            updates only the last key of Par?s parent (calls updateKey(path, level-1, key))
        else
        creates a new ParNode Q and let Q point to P
    recursively calls updateTree(path, null, Q, level-1, k)
 
 

Class: LEAFNODE

        The leaves of the B+ tree.
        Leaves contain a character array, which contains a small segment of the string represented by the BigString. Each leaf contains a pointer, so that it may reference another LeafNode. This is because the leaves of a B+ tree are linked together in a linked list (singly linked, in our case). The pointer to the LeafNode from this leaf is a reference to this leaf's successor in this list. As well, a leaf has a unique key (unique compared to the keys of other leaves in the same B+ tree), which indicate how far from the start of the linked list this leaf is. Keys are ordered starting from 0 and increasing by 1 each time.
        The constructor takes a reference to a character array and a key, which the leaf's instance variables will be set to.

Constructor:

Instance variables:

Methods:

Class: PARNODE

        The internal node, or root,of the B+ tree
        The ParNode merely contains information and pointers about it's children nodes. Its children may be leaves, or other internal nodes. A ParNode contains two array of pointers: an array of pointers to leaves, and an array of pointers to other internal nodes. If a ParNode points to at least one leaf, then it only points to leaves, it does not point to other internal nodes. Likewise, if a ParNode has at least one child that is another internal node, then all its children are internal nodes; it does not have any leaf children. Before adding any children to a ParNode, one should check what kind of children it potentially has. Also, a ParNode contains an array of keys:  if the ith element in this array of keys contains the value K, then it is the ancestor of leaves with keys <= K. All keys >= 0.

        The constructor takes no parameters, and initializes children-pointer-array elements to null and key-array elements to -1, as it is unknown at the time of initialization what kind of children a ParNode will have.
 

Constructor:

Instance variables:

Methods:



        Figure 2 gives a snapshot that how a big string is being stored in a B+ tree in our implementation.


 

TESTING

        Testing in BigString project is divided into four phases: Unit testing, Integration testing, Validation testing, and System testing (see test part for details). Unit testing tests moduls separately, Intergration testing tests how good modules fit together, Validation testing tests how well software meets requirements and System testing tests overall system's functionality in real-life situations. In all testing cases, our implementation eith worked correctly or throwed exceptions where applicapable.
 

SUMMARY

        Our implementation satisfies the specs on completion, and satisfies the design requirements on information hiding, low coupling and high cohesion.

APPENDIX

        There are several changes from the original design (Phase A):         The updated class diagram is shown in Figure 3.