blog
HOME · CREATIVE · WEB · TECH · BLOG

Friday, October 10th, 2008

Turbocharging 4D v11 SQL

[These are notes from 4D Summit 2008]

Presenters: Olivier Dechanels, Thibaud Arguillere

Olivier Deschanels giving a presentation at 4D SummitHow the datafile is organized...

Each block in 4D's data file is 128 bytes.

Each block can only hold one item. An item may take more than one block and if so, they must be contiguous.

In the future they plan on having special data segments to store large items. This will let you put certain items on different disks.

A record is made up of a Header, Values and a Micro-Structure. The micro-structure is the description of the record at the time the record was saved (in case it subsequently changes).

Header 32 bytes
Values: Boolean: 1, Integer: 2, Longint: 4, Longint64/Date/Time: 8, Float: 7 + X Bytes, Alpha: 4+(2x nb chars), Blob/Picture/Text: 4
Micro-structure: 8/field

Nulls don't take space in the data file other than in the micro-structure.

The micro-structure is really useful because you can change the structure of a table without having to worry about existing data. 4D will automatically translate and deal with data in a table that was stored in a different format.

While table numbers are never reused, field numbers are reused. This means that you do need to update customer data if you delete a field and add another that takes it's place. Otherwise you'll get the old data in the new field.

When data is read from the hard drive it reads the next bits of data into a buffer, but if the next thing it needs isn't in the buffer then it has to go looking for it. Hence, try to avoid fragmentation. The best way to do this is to avoid variable sized fields, and store text and blobs outside the record. The more you can have a fixed size record the less fragmentation, and the faster your app can read from disk. [Realize that if you need to access the text a lot this can work against you.]

They don't recommend using JPR's "airbag" strategy that adds variable amount of data to make all records the same size to avoid fragmentation. Better to just store variable size data outside the record. They're working on implementing an auto airbag to force records to take a fixed amount of space to avoid future fragmentation.

Realize that you may have free space in records. For example, a many to many joining table with just two longints. You can add denormalized data in there that can make operations faster without increasing the size of your data file.

They save the length of the record in the address table. If that doesn't match the actual length of the record they alert you there's a problem with your data file.

BLOBs, Pictures, Text fields - are stored outside the record, though v11.3 lets you set a size threshold for storing inside the record. In the record the ID of the external location is stored. On disk they have the same status as a record with address tables to help locate them. This will be optimized further in the future.

Thibaud Arguillere giving a presentation at 4D SummitIn the future they're thinking of having indexes on formulas.

Indexes are somewhat costly. For a long integer index you're looking at 12 bytes per value plus the overhead of things like the header for the index and empty space in the index.

B-Tree indexes are structured on levels. Root = 128 keys, Level 1 = 128x128 = 16,384 keys, Level 2 = 16384 x 128 = 2,097,152 keys, etc. The number of access to find the record in the index depends on the number of levels - one access per level.

For ORDER BYs with multiple values have a composite index to speed them up.

In v2004 it would use the index if the selection had more than 9% of the records in the table, otherwise it would drop back to sequential for the QUERY or the ORDER BY. v11 is more intelligent and figures it out based on the number of disk accesses that are needed. This means a number of SET DATABASE PARAMETERs are now ignored.

Just because you create an index doesn't mean it will be used. Unused indexes slow down the database because it takes time to manage them and space to store them.

With cluster indexes each value has either a bit table or a long int array to remember the records that match that value. 4D decides which is more sufficient.

They're coming out with a 4DPop component (requires v11.3) that analyzes your data file and shows the storage details.

If you copy/paste pictures into 4D, v11 takes all the info you paste which may be considerably more than the picture itself. It's a good idea to CONVERT PICTURE($pict;"jpeg") to reduce the size of the images.

Tags: ,
Categories: 4D

Previous Post: « 4D Summit Q&A Wrap-Up

2 Comments

  1. Christian Sakowski Says:

    > While table numbers are never reused, field numbers are reused.

    That is not true! Table AND field numbers are reused!!!

  2. Thibaud Arguillere Says:

    Yes, I made a mystake on this during the session : table numbers are reused, sorry for this.

    *But* the data belonging to a cleared table will never go into a new one, wihile, as we explained during the session, this may happen for fields (that’s why one should ALL RECORDS/APPLY TO SELECTION([aTable];[aTable]aField:=[aTable]aField), on a non indexed field to run faster.

    Example: we have T1, T2, T3 and T4. The user creates records in T2. Then, we delete table 2 and create a new one => it will have n°2 but a different internal UID. So when the new structure is sent to the user, the records he created previously are not given to the new table.

    Thibaud

Leave a Reply

HOME · CREATIVE · WEB · TECH · BLOG