On Sets



By Jean-Yves Fock-Hoon

Quality Assurance Manager, 4D, Inc.

Technical Note 01-13

Technical Notes for 01-03 March 2001

Introduction


Sets allow you to manipulate selections easily with less memory. If you want to handle selections quickly, sets can really help you.

What is a Selection?


4D uses the notion of a selection. Each table has its own selection, called the current selection. This selection can be empty, or it can contain some or all of the records in the table. When you work with a table, you are working on its current selection. This is the list of records that you see in the User environment or in the Custom Menus environment after executing a command such as ALL RECORDS or QUERY, followed by DISPLAY SELECTION or MODIFY SELECTION.

In fact, the current selection is a list of pointers to the records that are members of the selection. So when you sort a selection, you are just sorting a list of pointers. There is only one current selection per table and per process, yet working in a table can require several different selections. The COPY NAMED SELECTION and CUT NAMED SELECTION commands allow you to work with many selections. After building a selection, you may need another one without wanting to lose the current one. In this case, you can copy or cut the current selection into a temporary selection, then build a new selection or reload a previous temporary one.

So, why not use only selections?

A selection is a list of pointers that you can sort, and reloading a temporary selection preserves its order. However, selections also have disadvantages:

Selections can only be in memory. They cannot be saved.

Two or more selections cannot be manipulated together.

They require a lot of memory, four bytes per record.

If the order in a selection is not important, you should consider using sets:

Sets can be saved and loaded. A set document can be stored in your data file, and reloading a set is faster than rebuilding a selection.

You can get the union, difference, and intersection of two sets.

You can easily check whether a record is in a set.

Sets require less memory: 1 bit per record.

How does it really work? A set uses one bit per record, and that bit simply says whether or not the record is a member of the set. That is also why sets cannot be sorted, and why manipulating selections with sets can be faster than manipulating temporary selections.

Note: Sets and temporary selections must always be kept up to date. If you delete a record that is a member of a temporary selection or of a set without removing it from that selection or set, the temporary selection or the set becomes obsolete.
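To make the memory difference concrete, here is a small Python sketch. It is a model for illustration, not 4D code. It assumes the figures quoted above (four bytes per record in a selection, one bit per record in a set), takes a hypothetical table of 50,000 records whose records are all selected, and uses function names invented for this example.

```python
def selection_bytes(records_in_selection: int) -> int:
    """Memory for a selection, assuming one 4-byte pointer per selected record."""
    return 4 * records_in_selection


def set_bytes(records_in_table: int) -> int:
    """Memory for a set, assuming one bit per record, rounded up to whole bytes."""
    return (records_in_table + 7) // 8


table_size = 50_000  # hypothetical table size
print(selection_bytes(table_size))  # 200000
print(set_bytes(table_size))        # 6250
```

Under these assumptions, the set is 32 times smaller than a selection holding every record of the table.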

How can I use a set?


Who can remember this figure?

Sets in 4D use the same scheme. Once a selection has been defined, you can save it into a set. This reduces the amount of memory used by the selection, lets you combine it with other sets to create new ones (UNION, INTERSECTION, DIFFERENCE), and lets you perform these tasks quickly.

4D allows you to use three types of sets:

Inter-process sets: These sets start with a diamond sign (<>) and can be used by all processes of the current application.

Process sets: These sets do not start with any specific character and can be used only by the process that created them.

Local sets: These sets start with a $ sign and exist only in the process of the 4D Client that created them.

It is up to you to decide whether to use inter-process, process, or local sets. Each type has its advantages and disadvantages. In a 4D Server configuration, inter-process and process sets are maintained on the 4D Server machine, and a copy of these sets is made on the Client side. Local sets, however, are maintained on the 4D Client machine. Operations on them are performed on the Client machine, but for specific commands the local set may be copied to the Server side. Each 4D command that manages sets is optimized for each specific case, which requires having all sets on the same machine.

You should consider using inter-process or process sets to simplify your code. However, local sets are not necessarily to be avoided. They have an advantage: they reduce the number of requests on the network, since almost all of the operations on inter-process or process sets are performed on 4D Server, while local sets are managed only on 4D Client. This is explained in the 4D Server reference manual. See the chapter "4D Server and sets".

A set name can contain at most 80 characters, including the "$" or "<>" prefix.

Note: 4D can create two sets if needed, "Lockedset" and "Userset". When you use the ARRAY TO SELECTION, APPLY TO SELECTION, or DELETE SELECTION commands, other processes might have locked some of the records. In this case, 4D creates a set named "Lockedset" that contains these locked records. If no records were locked, this set is not created. When a selection is displayed, you can highlight one or more records in it. These highlighted records are added to a set named "Userset". You can then use this set to, for example, reduce your current selection to the highlighted records or remove the highlighted records from your current selection. These sets are local to the 4D Client application.

Speed! I want more speed

Once a selection has been built, it can be saved into a set. A set can be loaded quickly, and working with sets is fast. So, what would be a good example of using sets?

What about a custom selection? A user performs a complex query that returns a huge selection to work on. Time runs out before the work is finished. It would be nice to save the job and reload it tomorrow. You can do it: just copy the selection into a set and save the set. The next day, you only have to reload the set.

This is not the only way you can use sets. Let's take the example database.

This database has been designed for a single-user environment. It has not been designed for Client/Server mode with more than one 4D Client: when a set is updated on one 4D Client, the database does not propagate the update to the other 4D Clients.

In the Custom Menus environment, the "Set" menu contains an item named "How does it work".

This item will perform a DISPLAY SELECTION of all records of table [People].

On this screen, we can see a bunch of records: RPG players. The "Set F", "Set G" and "Set H" buttons allow you to create sets named "SetF", "SetG" and "SetH". These sets are created from the highlighted records, i.e., from the "Userset" set. To create a set, highlight some records, then click the Set F, Set G, or Set H button. The button uses the COPY SET command with the set "Userset".

The following illustration shows the results after the three sets have been defined.

The "Use set F", "Use set G" and "Use set H" buttons allow you to replace your current selection with the selection defined by that set. Once these sets are defined, other sets ("FinterG", "FinterH", "GinterH", "FunionG", "FunionH", "GunionH", "FinterGinterH" and "FunionGunionH") can also be created using the INTERSECTION and UNION commands.

The following illustration shows the results after doing a Use Set F on the data shown above.

The Reverse Selection button replaces the current selection with all the records of the [People] table that are not in it. If only 1000 records are displayed, 4D replaces this selection with the other 49000 records, since there are 50000 records in the [People] table.
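The reversal described above can be sketched with one bit per record. The following Python model is an illustration under assumptions, not the example database's code: each set is stored as a Python integer used as a bit field, record numbers run from 0 to 49999, and the displayed selection is assumed to be the first 1000 records.

```python
TABLE_SIZE = 50_000

# One bit per record: bit n is 1 when record n is a member of the set.
all_records = (1 << TABLE_SIZE) - 1  # every record of the table
current = (1 << 1_000) - 1           # records 0..999 (assumed selection)

# Reversing the selection is the difference "all records minus current".
reversed_selection = all_records & ~current

print(bin(reversed_selection).count("1"))  # 49000
```

The reversal touches bits only and never reads a record, which is consistent with the speed the note attributes to set operations.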

The Query button allows you to define your own query from the 4D Query dialog.

The "All records" button performs ALL RECORDS in the [People] table.

The Done button closes this window.

For each record displayed, a bullet is drawn when the record is a member of one of these sets.

By putting your own records into these sets, you will see how easily these bullets can be displayed using just the "Is in set" command.

Building the same example without sets would be a challenge. With temporary selections, it would be nearly impossible, or at best slow. Arrays could be an acceptable solution, though a little slower and requiring more memory.
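As a rough illustration of why the membership test is cheap, here is a Python model of the bullets. It is a sketch, not the form's actual code: the sets are integers used as bit fields, the record numbers placed in each set are made up, and `make_set`, `is_in_set`, and `bullets` are hypothetical helper names.

```python
def make_set(record_numbers):
    """Build a bit field with one bit turned on per member record."""
    bits = 0
    for n in record_numbers:
        bits |= 1 << n
    return bits


def is_in_set(bits, record_number):
    """Membership is a single bit test, whatever the size of the set."""
    return bool((bits >> record_number) & 1)


# Made-up contents for the three sets of the example.
set_f = make_set([0, 1, 2])
set_g = make_set([1, 3])
set_h = make_set([2, 3, 4])


def bullets(record_number):
    """One character per set: '*' if the record is a member, '.' otherwise."""
    return "".join(
        "*" if is_in_set(s, record_number) else "."
        for s in (set_f, set_g, set_h)
    )


for n in range(5):
    print(n, bullets(n))
```

With these made-up sets, record 1 shows bullets for F and G only, and record 4 shows a bullet for H only.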

What about using sets to search records?

This is one of the most powerful uses of sets. Once you have defined sets that each match a query, you can combine them to simulate a query with multiple conditions.

From this database, in Custom Menus, select the Query Example item from the Set menu. This will perform a MODIFY SELECTION on the [People] table. From this window, you can add, edit, or delete records. Adding, updating, and deleting records will also update all the sets that the database has defined.

The All records button executes ALL RECORDS for the [People] table.

The Done button closes the window.

The Query button displays the following dialog.

With this dialog, you can create and benchmark various search conditions. The time needed to do the search via sets can be compared with the time needed via QUERY. The number of ticks and the number of records returned by the query are reported above the search criterion area.

All fields are indexed except for the Class field, which is a non-indexed Long Integer field. Queries in 4D are fast and optimized, especially on indexed fields. Even so, a query still takes a few ticks to perform. Comparing searches on a single indexed field is not useful: both methods take a similar time, even if the query by index is sometimes faster than sets.

And what about performing a complex query based on indexed fields only?

For the first query, 4D needs to load the index tables into the cache. Once the indexes are loaded, the search starts, and 4D completes the request after a few seconds. If the search is run again, it takes about 2 ticks or less.

What about the query with sets? It always takes about 2 ticks, which is better than the few seconds needed for the first query.

What about queries with non-indexed fields? In the example given in the screenshot, a query on the fields always takes about 118 or 152 ticks.

The same query via sets will take between 0 and 3 ticks.

In a Client/Server configuration, and even in a single-user configuration, complex queries involving only AND conditions take a few seconds the first time and a few ticks afterwards. However, queries involving OR conditions always take the same time, i.e. about 20 ticks or a few seconds depending on the request. A query with sets always returns in a similar time, i.e. a few ticks.

How is the query by sets simulated?

Take A and B, two requests that each match a set, and C, a set that will contain the result.

= A: The set A already matches this request. Only a USE SET (A) will be needed.

# A: The set A is the reverse of this request. DIFFERENCE ("All Records"; A; C) is enough.

A & B: This means: INTERSECTION (A; B; C)

A | B: This means: UNION (A; B; C)

A EXCEPT B: This means: DIFFERENCE (A; B; C)
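The five cases above can be checked on a toy table. The Python sketch below is a model only: it uses Python's built-in set type in place of 4D sets, the records and the two conditions (class = 1 for A, level = 12 for B) are invented for the illustration, and the operators `-`, `&`, and `|` stand in for DIFFERENCE, INTERSECTION, and UNION.

```python
# Hypothetical records: record number -> (class, level).
people = {0: (1, 12), 1: (2, 7), 2: (1, 3), 3: (3, 12), 4: (2, 12)}

# Sets built once, ahead of time, one per elementary condition.
all_records = set(people)
A = {n for n, (cls, _) in people.items() if cls == 1}       # class = 1
B = {n for n, (_, level) in people.items() if level == 12}  # level = 12

# Each compound request becomes a single set operation.
results = {
    "= A": A,                  # the set itself
    "# A": all_records - A,    # difference with "all records"
    "A & B": A & B,            # intersection
    "A | B": A | B,            # union
    "A EXCEPT B": A - B,       # difference
}

for label, members in results.items():
    print(label, sorted(members))
```

None of the compound requests reads the records again; only the two precomputed sets are combined, which is why the cost stays the same whether or not the underlying fields are indexed.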

What about 4D Server?


With 4D Server, regular queries are performed on the Server side, while a Query by Formula is executed on the Client side. This requires at least one exchange of requests between these two applications. Consider a complex structure file with a lot of indexes and records and many 4D Client connections. If you perform a lot of queries with indexes on different tables, this could take a few seconds. A request is faster if the index tables are already in the 4D Server cache. However, if activity is intensive, 4D may have to load index tables, query, and then unload them in order to load other index tables. In this case, it is better to use local sets with 4D Client. There is no need to load index tables, and operations such as INTERSECTION or Is in set are done only on the 4D Client side. They are a little faster, because they do not overload the 4D Server machine and the network is not involved.

However, using sets is not so easy in a Client/Server configuration. Keeping the sets synchronized with the records requires a little work. That's why the form method of the Input form calls a method (ROBOT_Rec_has_been_Created, ROBOT_Rec_Has_Been_Modified, or ROBOT_Rec_Has_Been_Deleted) when a record is created, modified, or deleted. The main problem is updating all sets for all clients at the same time.

This solution can be really difficult to apply to a table that is constantly modified by many processes from many 4D Client applications. But it is still a perfect solution for a table that is mostly queried and rarely modified, such as a Product table or a database on a CD-ROM, or for simulating keyword indexes for long text documents.
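The kind of bookkeeping this implies can be sketched as follows. This Python model is an assumption about the general idea, not the code of the ROBOT_ methods, whose bodies are not shown in this note. The class name, the set names, and the conditions are hypothetical, and the sketch covers a single client only: it does not address propagating changes to other 4D Clients.

```python
class SetRegistry:
    """Keeps named sets in step with record changes (hypothetical model)."""

    def __init__(self, conditions):
        # conditions maps a set name to a predicate taking a record dict.
        self.conditions = conditions
        self.sets = {name: set() for name in conditions}

    def record_created(self, number, record):
        # A new record is classified exactly like a modified one.
        self.record_modified(number, record)

    def record_modified(self, number, record):
        # Re-evaluate every condition and add or remove the record.
        for name, matches in self.conditions.items():
            if matches(record):
                self.sets[name].add(number)
            else:
                self.sets[name].discard(number)

    def record_deleted(self, number):
        # A deleted record must leave every set, or the sets become obsolete.
        for members in self.sets.values():
            members.discard(number)


registry = SetRegistry({
    "class_is_1": lambda r: r["class"] == 1,
    "level_over_10": lambda r: r["level"] > 10,
})
registry.record_created(0, {"class": 1, "level": 12})
registry.record_created(1, {"class": 2, "level": 15})
registry.record_modified(0, {"class": 1, "level": 4})
registry.record_deleted(1)
print(registry.sets)
```

Every create, modify, or delete costs one pass over the defined conditions, which is cheap for a table that is mostly queried and becomes the bottleneck only when the table is modified constantly.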

Summary


This tech note compares sets to selections and explains when and how to use sets instead of selections to manipulate records and perform complex searches.

