Photometric redshifts and large scale structure
To understand the cosmological laws that govern our Universe, we need a detailed map of the galaxy distribution. Large scale astronomical surveys like the Sloan Digital Sky Survey (SDSS) scanned the sky for years and collected data on hundred millions of galaxies. One simple and also crucial parameter, the distance of an extragalactic object from us, is especially to hard to measure. According to Hubble's law there is a linear relation between distance of a galaxy and the redshift of its spectrum. SDSS took picture of more than 150 million galaxies, but doe to observational limitations it was able to get redshift less then 1% of them. With my collaborators and students I am developing methods, to estimate photometric redshift and other physical parameters for the other 99%.
Spatial indexing of large databases
Large observational and measurement surveys in physics, astronomy, meteorology, biology and many other disciplines collect huge amount of complex data. Scientists create models and simulations to understand the underlying laws of nature. To efficiently handle the avalanche of observational and simulated data, and to be able to compare theory with models we need to use the current state of art of information technology. We are developing various algorithms and tools to handle multi-dimensional complex databases.
The relation between people, neurons or biochemical reactions can be modelled as complex networks of interactions. But not only nature can produce such complicated structures communication networks like the Internet is equally large and complex and we have to use the same mathematical tools and technology to analyse it that we use for studying natural systems. We have created a Europe-wide precision traffic observatory (ETOMIC), collect traffic measurements and analyse the data.
The quickly evolving technology of DNA/RNA microchips and new generation sequenators open up uncharted territories in genomics and other areas of microbiology. Even small institutes or research groups can afford to sequence whole genomes or measure the levels of the expression of all genes. These methods easily produce terabytes of data for a single sample. Handling this large amount of information is beyond not just the capacities of a standard biostatistics toolbox but often exceed the limitations of many modern datamining algorithms. We are working on the improvement of the computational framework and algorithms and collaborate with biologists and doctors to harness the information hidden in the data.