
Astronomers have applied various data mining algorithms in most areas of astronomy. Data mining experts, in turn, have carried out long-term research and several mining projects using astronomical data, because astronomy, like other areas such as medicine and high-energy physics, has produced numerous excellent datasets that are amenable to this approach. Examples of such projects include SKICAT (Sky Image Cataloging and Analysis System), used for catalog production and analysis from digitized sky surveys, particularly the scans of the second Palomar Observatory Sky Survey; JARtool (Jet Propulsion Laboratory Adaptive Recognition Tool), used to recognize volcanoes in over 30,000 images of Venus returned by the Magellan mission; and the subsequent, more general Diamond, along with the work of the Lawrence Livermore National Laboratory Sapphire project.

Object classification

Classification is a crucial preliminary step in the scientific method, as it provides a way of arranging information that can be used to form hypotheses and to compare easily with models. The two most useful concepts in object classification are completeness and efficiency, also known as recall and precision. They are defined in terms of true and false positives (TP and FP) and true and false negatives (TN and FN). The completeness is the fraction of objects truly of a given type that are classified as that type, TP/(TP + FN); the efficiency is the fraction of objects classified as a given type that are truly of that type, TP/(TP + FP). These two quantities are interesting astrophysically because, while one wants both high completeness and high efficiency, there is generally a tradeoff between them. The importance of each often depends on the application: for instance, an investigation of rare objects generally requires high completeness while allowing some contamination (lower efficiency), whereas statistical clustering of cosmological objects requires high efficiency, even at the cost of completeness.
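
As a minimal sketch in plain Python, both quantities can be computed from paired true and predicted classes for one target type. The helper names and the quasar/star labels are hypothetical, invented only to illustrate the definitions above.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for a single target class."""
    tp: int  # truly of the type and classified as that type
    fp: int  # classified as the type but truly something else
    fn: int  # truly of the type but classified as something else
    tn: int  # neither truly of the type nor classified as it


def counts_from_labels(true_labels, predicted_labels, target):
    """Tally TP/FP/FN/TN for `target` from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == target and truth == target:
            tp += 1
        elif pred == target:
            fp += 1
        elif truth == target:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def completeness(c: ConfusionCounts) -> float:
    """Recall: TP / (TP + FN); NaN if the class has no true members."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else float("nan")


def efficiency(c: ConfusionCounts) -> float:
    """Precision: TP / (TP + FP); NaN if nothing was classified as the class."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else float("nan")


# Hypothetical example: a permissive quasar selection that recovers two of
# three quasars but also lets in two stars.
truth = ["qso", "star", "qso", "star", "qso", "star"]
preds = ["qso", "qso", "star", "qso", "qso", "star"]
counts = counts_from_labels(truth, preds, "qso")
print(counts)                          # ConfusionCounts(tp=2, fp=2, fn=1, tn=1)
print(round(completeness(counts), 3))  # 0.667
print(efficiency(counts))              # 0.5
```

Here the selection favours completeness over efficiency, the kind of choice suited to a rare-object search rather than a clustering study.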

Star-Galaxy Separation

Because of their physical size in comparison to their distance from us, almost all stars are unresolved in photometric datasets and therefore appear as point sources. Galaxies, despite being further away, generally subtend a larger angle and appear as extended sources. However, other astrophysical objects, such as quasars and supernovae, are also seen as point sources. Thus, the separation of a photometric catalog into stars and galaxies, or more generally into stars, galaxies, and other objects, is an important problem. The number of galaxies and stars in typical surveys (of order 10^8 or more) requires that this separation be automated. The problem is well studied, and automated approaches were employed before current data mining algorithms became popular, for instance during the digitization of photographic plates in projects such as the APM and DPOSS. Several data mining algorithms have been applied, including ANNs, DTs, mixture modelling, and SOMs, with most achieving efficiencies of around 95% or more. Typically, the separation is performed using a set of morphological parameters measured from the survey photometry, perhaps supplemented by colors or other information such as the seeing. The advantage of the data mining approach is that all such information about each object is easily incorporated.
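
To illustrate the simplest decision-tree-style separator, the sketch below learns a single cut on one morphological parameter (a decision stump, i.e. a one-level DT). The parameter `extent` and its training values are hypothetical: think of any measure that is near zero for point sources and larger for extended ones. Real applications combine many parameters, colors, and seeing information, as noted above.

```python
def fit_stump(values, labels, positive="galaxy"):
    """Learn a threshold t for the rule 'value > t -> positive class'.

    Tries a cut below all values and the midpoints between consecutive
    distinct sorted values, keeping the cut with the best training accuracy.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        # Count objects whose predicted class (value > t) matches the true class.
        correct = sum((v > t) == (lab == positive) for v, lab in pairs)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct / len(pairs)


def classify(value, threshold, positive="galaxy", negative="star"):
    """Apply the learned cut to a new object."""
    return positive if value > threshold else negative


# Hypothetical training set: `extent` is ~0 for unresolved stars, larger for galaxies.
extent = [0.01, 0.03, 0.02, 0.05, 0.30, 0.45, 0.25, 0.60, 0.04, 0.20]
label = ["star", "star", "star", "star", "galaxy",
         "galaxy", "galaxy", "galaxy", "star", "galaxy"]
threshold, accuracy = fit_stump(extent, label)
print(threshold, accuracy)        # 0.125 1.0
print(classify(0.08, threshold))  # star
```

A full DT or ANN generalizes this idea to many inputs at once, which is what lets the data mining approach fold in colors and seeing alongside morphology.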

Galaxy Morphology

Galaxies come in a wide range of sizes and shapes, or, more collectively, morphologies. The best-known system for the morphological classification of galaxies is the Hubble sequence of elliptical, spiral, barred spiral, and irregular galaxies, along with various subclasses. This system correlates with many physical properties known to be important in the formation and evolution of galaxies. Because galaxy morphology is a complex phenomenon that correlates with the underlying physics but is not unique to any one given process, the Hubble sequence has endured, despite being rather subjective and based on visible-light morphology originally derived from blue-biased photographic plates. The Hubble sequence has been extended in various ways, and for data mining purposes the T system has been used extensively. This system maps the categorical Hubble types E, S0, Sa, Sb, Sc, Sd, and Irr onto the numerical values -5 to 10. One can therefore train a supervised algorithm to assign T types to images for which measured parameters are available. Such parameters can be purely morphological or can include other information such as color. A series of papers by Lahav and collaborators does exactly this, applying ANNs to predict the T types of galaxies at low redshift and finding accuracy comparable to that of human experts. ANNs have also been applied to higher-redshift data to distinguish between normal and peculiar galaxies, and the fundamentally topological, unsupervised SOM ANN has been used to classify galaxies from Hubble Space Telescope images, where the initial distribution of classes is unknown. Likewise, ANNs have been used to obtain morphological types from galaxy spectra.
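
The value of the T system for data mining is that an ordered numerical target lets a supervised regressor (an ANN, for example) output a continuous T, which can then be snapped back to the nearest class. The sketch below assumes the commonly quoted values E = -5, S0 = -2, Sa = 1, Sb = 3, Sc = 5, Sd = 7, Irr = 10 within the -5 to 10 range given above; intermediate types (Sab, Sbc, and so on) are omitted, and the function names are illustrative.

```python
# Assumed T values for the coarse Hubble classes named in the text
# (de Vaucouleurs-style convention; intermediate types are omitted).
HUBBLE_TO_T = {"E": -5, "S0": -2, "Sa": 1, "Sb": 3, "Sc": 5, "Sd": 7, "Irr": 10}


def hubble_to_t(hubble_type: str) -> int:
    """Encode a categorical Hubble type as a numerical T (a training target)."""
    return HUBBLE_TO_T[hubble_type]


def t_to_hubble(t: float) -> str:
    """Decode a continuous T (e.g. a regression output) to the nearest coarse class."""
    return min(HUBBLE_TO_T, key=lambda name: abs(HUBBLE_TO_T[name] - t))


# Hypothetical regression outputs for three galaxies.
predicted_t = [2.6, -4.1, 8.9]
print([t_to_hubble(t) for t in predicted_t])  # ['Sb', 'E', 'Irr']
```

Working in T rather than in discrete labels also means a prediction of Sa for a true Sb counts as a smaller error than a prediction of E, which matches how disagreements between human classifiers are usually judged.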

Photometric redshifts

An area of astrophysics that has greatly increased in popularity in the last few years is the estimation of redshifts from photometric data (photo-zs). This is because, although the distances are less accurate than those obtained from spectra, the sheer number of objects with photometric measurements can often make up for the reduction in individual accuracy by suppressing the statistical noise of an ensemble calculation. The two common approaches to photo-zs are the template method and the empirical training-set method. The template approach has many difficult issues, including calibration, zero-points, priors, multi-wavelength performance (e.g., poor in the mid-infrared), and difficulty handling missing or incomplete training data. We focus in this review on the empirical approach, as it is an implementation of supervised learning.
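
The empirical training-set method can be sketched with k-nearest neighbours (one of the supervised methods this section mentions): each galaxy's photo-z is the mean spectroscopic redshift of its k nearest training galaxies in color space, and the result is summarized by the root-mean-square deviation from spectroscopic redshifts. The two-color training set, its linear color-redshift relation, and the choice k = 3 are all made up for illustration.

```python
import math


def knn_photoz(train_colors, train_z, query, k=3):
    """Photo-z for one object: mean spectroscopic z of its k nearest
    training neighbours in color space (Euclidean distance)."""
    by_distance = sorted(
        (math.dist(c, query), z) for c, z in zip(train_colors, train_z)
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / len(nearest)


def rms_deviation(z_phot, z_spec):
    """Root-mean-square of (z_phot - z_spec) over a test sample."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(z_phot, z_spec)) / len(z_spec))


# Hypothetical training set: two colors that redden smoothly with redshift.
train_z = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
train_colors = [(0.5 + 2.0 * z, 0.2 + 1.0 * z) for z in train_z]

# A galaxy whose colors match z = 0.15 gets a photo-z close to 0.15.
print(round(knn_photoz(train_colors, train_z, (0.8, 0.35)), 3))  # 0.15
# A galaxy redder than every training object cannot be extrapolated:
# its estimate is capped by the highest-z neighbours.
print(round(knn_photoz(train_colors, train_z, (1.5, 0.7)), 3))   # 0.25
print(round(rms_deviation([0.12, 0.18], [0.10, 0.20]), 3))       # 0.02
```

The second query is a small-scale version of the extrapolation problem for faint, high-redshift galaxies: a purely empirical method can only return redshifts spanned by its training set.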

Galaxies

At low redshifts, the calculation of photometric redshifts for normal galaxies is quite straightforward because of the break in the typical galaxy spectrum at 4000 Å. As a galaxy is redshifted with increasing distance, its color (measured as a difference in magnitudes) therefore changes relatively smoothly. As a result, both template and empirical photo-z approaches obtain similar results, a root-mean-square deviation of ~0.02 in redshift, which is close to the best possible result given the intrinsic spread in galaxy properties. This has been shown with ANNs, SVMs, DTs, kNN, empirical polynomial relations, numerous template-based studies, and several other procedures. At higher redshifts, achieving accurate results becomes more difficult because the 4000 Å break is shifted redward of the optical, galaxies are fainter and thus spectroscopic data are sparser, and galaxies intrinsically evolve over time. While supervised learning has been used successfully, beyond the spectroscopic regime an obvious limitation arises: reaching the limiting magnitude of the photometric portions of surveys would require extrapolation. In this regime, or where only small training sets are available, template-based results can be used, but without spectroscopic information the templates themselves are being extrapolated. However, the extrapolation of the templates is done in a more physically motivated manner. It is likely that the more general hybrid approach of using empirical data to iteratively improve the templates, or a semi-supervised procedure, will ultimately provide a more elegant solution. Another issue at higher redshift is that the available numbers of objects can become
quite small (in the hundreds or fewer), thus reintroducing the curse of dimensionality through a simple lack of objects relative to the number of measured wavebands. Dimension-reduction methods can help to mitigate this effect.

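Why the 4000 Å break helps at low redshift but leaves the optical at higher redshift follows from the definition of redshift. As a rough worked example, taking the red limit of optical photometry to be roughly 9000 to 10000 Å (an approximate boundary assumed here, not a figure from the text):

```latex
\begin{aligned}
\lambda_{\mathrm{obs}} &= (1+z)\,\lambda_{\mathrm{rest}}, \qquad \lambda_{\mathrm{rest}} \approx 4000~\text{\AA} \\
z = 0.1 &\;\Rightarrow\; \lambda_{\mathrm{obs}} \approx 4400~\text{\AA} \\
z = 1 &\;\Rightarrow\; \lambda_{\mathrm{obs}} \approx 8000~\text{\AA} \\
(1+z)\,4000~\text{\AA} > 9000\text{--}10000~\text{\AA} &\;\iff\; z > 1.25\text{--}1.5
\end{aligned}
```

On these assumptions, the break sweeps smoothly through the optical bands over the low-redshift range, which is why colors track redshift well there, and passes beyond the optical somewhere around z of 1.25 to 1.5.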