A Question related to function behaviour

ksgokul

New member
Joined
Mar 21, 2010
Messages
2
Hi,
I don't know, whether i am fine to use this forum. Please apologize, if i am not supposed to be posting here. I am a software developer involved in open source development. I will try to give a background of my question.
a) In a database, the way to go about indexing the data is usually with B+ Trees. So whenever we insert a data into a table, we make an index entry, so that people who search for data will get access to the inserted data.
b) There is a concept called function based indexes, wherein instead of indexing on a particular value(say x), we index on f(x), where f is a function. Sometimes, we need to reach the index entry from the table entry. For example, when we want to delete the data in the table, we need to mark the data in the index also as deleted. So when i accept a function, i need to know whether f(x) will always be the same for a given x. For example a random number generator cannot give the same value for two calls to the same function. Is there a way to determine whether a function will return the same output for same input, irrespective of the number of times it is called? We term these kind of functions as volatile functions
c) Similarly, we allow users to define a data type and an ordering for that data type. For example complex numbers don't have ordering. So the user should write a function, which compares two complex numbers and say which is greater. This is necessary for insertion into B+ Tree. Here we need to find those functions which behave like this.
say there are three values a,b,c and the function returns a > b, b > c and c > a(actually it should have been a > c). We call these type of data types as broken data types. Is there a mathematical way to find out such broken data type comparison functions?

Thanks,
Gokul.
 
b) Are you able to look at the function source code? If so, if the function has no static data, then it will be deterministic. That is, it will always return the same output given the same input. For example, the following code is deterministic:
Code:
void f(int x)
{
  return x*x;
}
while this is not:
Code:
void f(int x)
{
  static int a=0;
  a = a+1;
  return x*x*a;
}
Because it uses data that was not either constant or passed to the function.

If you cannot look at the code, there is no way to know for sure that the function is deterministic, but maybe you can run it 10000000 times with the same data value to see. (or, if you're really brave, you can disassemble the binary and look at the assembly code to determine how the function works).

However, you need a "base index function" (more generally called a hash function) that minimizes collisions. That is:
if x1 != x2, f(x1) != f(x2)
When hashing from a large space to a smaller space, this is impossible. Therefore, when you look up the index in the btree based on the key of the hash (here, x1 is the key, and f(x1) is the hash), you use the hash to order it in the tree and when you find the node in the tree, you implement it as a linked list of the actual keys so you can find the exact entry you want.
c) You don't need an ordering for the orginal data type: you need an ordering for the hashed result. For example, say our hash is:
Code:
double f(complex_t *my_value)
{
  return my_value->real_part*my_value->real_part + my_value->complex_part*my_value->complex_part;
}
Now, you have an indexable hash. Take this as the hash in the tree. When you find the node in the tree you search for the correct answer as:
int is_equal(complex_t *my_value, complex_t *value_in_tree_list)
{
if ((my_value->real_part == value_in_tree_list->real_part) && (my_value->imaginary_part == value_in_tree_list->imaginary_part))
return 1;
else return 0;
}
Just iterate through the list until you find the key that matches. Now you have your pointer into the database structure.
 
Hi Subhotosh Khan,
Thanks a lot for the detailed reply.
a) Actually, i am trying to take part in the development of Opensource database called Postgresql. So when we create the function based index infrastructure, we don't know, what kind of functions people might write using the infrastructure. Currently there is no functionality to go back from table to index in the source code. So the community will accept only mathematical solutions. I was actually wondering, whether there might be some tests, which if the function undergoes, we might be able to classify it as deterministic.

b) we cannot apply hashing on a given data, as it will spoil the user defined ordering. Hashing is useful for equality checks, but it will get rid of a useful index range scan path. Say for example, some person might want all db records inserted in the last five days. If i hash the date, then it will spoil the ordering of the column in the index and it won't be able to answer such queries. So we cannot apply hashing functions. Here again, i am expecting some kind of definite mathematical tests, that would help us identify whether the comparison function defined by the function is broken/not.

I think, i should have been more clear with my question. But my question is, Is there a field of mathematics, where i should concentrate to get the answer for my question?

Once again, thanks a lot for taking your time to answer my question.

Thanks,
Gokul.
 
Top