Methods
On this page you will find an overview of the data collection and analysis methods in which I have extensive expertise.
Please note that some methods go by different names in different disciplines, while other names are better understood as umbrella terms for an entire family of methods.
If the method you are looking for is not listed, it may simply appear on this page under a different name. Please feel free to contact me about it.
Overview of analysis methods
Methods for analyzing distributions, relationships and trends
- Uni- and bivariate analyses: This includes the analysis of the central tendency (e.g. mean and median), dispersion (e.g. variance and standard deviation) and shape (e.g. skewness and kurtosis) of individual variables, as well as the relationship between two variables. The latter covers measures of the strength of an association (e.g. Pearson’s r, Spearman’s rho) and tests of associations or group differences (e.g. Chi² test, t-test and ANOVA).
- Exploratory and confirmatory factor analyses (EFA & CFA): Factor analysis is (among other possible applications) a procedure for evaluating the quality of a test or survey instrument. If you want to measure a “latent” phenomenon that cannot be directly observed (e.g. “health awareness”) and would like to use several individual questions in a questionnaire (e.g. “Have you had medical check-ups in the past year?”) to do so, then factor analysis can be used to analyze the extent to which each individual question contributes to measuring the latent phenomenon. This yields weighting coefficients (factor loadings) for the individual questions and an evaluation of the overall data quality.
- Cluster analysis: This is a family of procedures that can be used to identify groups of people based on common characteristics. This can be used, for example, to create and differentiate customer segments.
- Residual diagnostics and outlier identification: The quality of a regression analysis can be evaluated using residual diagnostics. Part of this is outlier identification, which can be used, for example, to identify individuals within a data set who behave “very unusually”. This forms the basis, for example, of fraud detection procedures such as those used by banks to detect credit card fraud.
- Trend analysis: Especially for time series data (e.g. stock price trends or temperature over time), it is often necessary to identify whether values are currently increasing, decreasing or moving sideways. This enables forecasts of future developments.
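As a minimal illustration of the uni- and bivariate analyses and the trend analysis listed above, the following Python sketch uses only the standard library. The function names (`describe`, `pearson_r`, `trend_direction`), the `tolerance` threshold and the example data are assumptions made for illustration; a real project would typically rely on a dedicated statistics package.

```python
import statistics
from math import sqrt


def describe(values):
    """Central tendency and dispersion of a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def trend_direction(values, tolerance=0.01):
    """Classify a time series by the sign of its least-squares slope.

    `tolerance` is an arbitrary, illustrative threshold below which
    the series is treated as moving sideways.
    """
    time = list(range(len(values)))
    mean_t, mean_v = statistics.mean(time), statistics.mean(values)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(time, values))
    slope /= sum((t - mean_t) ** 2 for t in time)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "sideways"


# Hypothetical example data
hours = [1, 2, 3, 4, 5]                       # hours of training
scores = [52, 55, 61, 64, 70]                 # test scores
temperatures = [14.1, 14.8, 15.2, 15.9, 16.5]  # measured over time

summary = describe(scores)
r = pearson_r(hours, scores)
trend = trend_direction(temperatures)
```

Here `summary` holds the descriptive statistics of the scores, `r` the strength of the linear association between training hours and scores, and `trend` the direction of the temperature series.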
Classic methods for developing causal and predictive models
- Regression analyses: This includes all common regression techniques such as the “standard” linear regression (OLS regression), logistic regression, non-linear regression, generalized linear models (GLM), non-parametric regression or Bayesian regression. One of the goals of regression analysis is to develop predictive models.
- Fixed and random effects analyses: Fixed and random effects analyses are used as alternatives to classic regression analyses when panel data (data on the same units collected at multiple points in time) are available. The aim is to account for factors that do not change over time, including unobserved ones, and thus to increase the quality of the estimate compared to classic regression analyses.
- Multilevel analyses: These are special procedures within the family of regression analyses that are used when so-called hierarchical data is available (e.g. analysis of students, where both characteristics of the individual students and characteristics of the class/school are analyzed). If hierarchical data is available, multilevel analyses increase the quality of the estimate compared to classic regression analyses.
- Analysis of paradata: The analysis of paradata (e.g. reaction times, user behavior in an app or on a website) makes it possible to study phenomena that would appear differently under direct observation. Participants in studies behave differently (less naturally) when they are aware that they are being observed. Because paradata are recorded as a by-product of the process, their analysis largely avoids this problem.
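To make the idea behind fixed effects analyses more concrete, the sketch below compares a pooled OLS slope with a fixed effects ("within") slope on a tiny, made-up panel. The helper names `ols_slope` and `demean_by_group` and the data are hypothetical. In the example, person A has a constant baseline that is 10 points higher than person B's, while the true effect of `x` on `y` is 2 for both.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den


def demean_by_group(groups, values):
    """Subtract each group's own mean (the 'within' transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical panel: two persons, three measurement occasions each.
# y = baseline + 2 * x, with baseline 10 for person A and 0 for person B.
person = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]

# Pooled OLS ignores the person-specific baselines.
pooled_slope = ols_slope(x, y)

# The within estimator removes everything that is constant per person.
within_slope = ols_slope(demean_by_group(person, x),
                         demean_by_group(person, y))
```

In this constructed example the pooled slope even has the wrong sign, because the stable difference between the two persons is mixed into the estimate, whereas the within slope recovers the effect of 2. Real panel analyses additionally require appropriate standard errors and a reasoned choice between fixed and random effects.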
Advanced methods for developing causal and predictive models
- Structural equation modeling (SEM): SEM is considered a combination of regression analysis and CFA and represents an extremely flexible framework for modeling and estimating a wide variety of causal models. For example, fixed and random effects models as well as multilevel models can be estimated using SEM. Different estimation algorithms (e.g. ML, WLSMV, GLS) enable both parametric and non-parametric estimates, so that variables with a wide range of scaling (e.g. binary, ordered-categorical or continuous variables) can be used within the model.
- Moderator and mediator analyses, non-linear models: With the help of regression/SEM it is possible to analyze complex path dependencies, which arise, for example, from the presence of moderators and mediators. This enables an improved approximation to reality compared to classic, linear methods.
- Machine learning methods (e.g. random forest analyses): Such methods generally enable a variety of possible applications, but are probably most often used for the development of prediction models. For example, from a variety of factors, those that lead to a specific event can be identified (e.g. which combinations of existing factors lead to a customer making a purchase).
- (Latent) Growth curve models: Growth curve models enable the analysis of factors that influence whether a certain trend is currently increasing, decreasing or moving sideways.
- (Latent) Autoregressive and cross-lagged models: These models allow conclusions to be drawn about the causal order of phenomena. For example: do depressed people become unemployed more often, does unemployment make people depressed, or is the relationship reciprocal?
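The mediator analyses mentioned above can be sketched with two ordinary regressions: one of the mediator on the predictor (path a) and one of the outcome on predictor and mediator (path b and the direct effect). The following Python sketch is a simplified illustration under stated assumptions: all variables are observed (not latent), all relationships are linear, and the function name `mediation_effects` and the data are hypothetical. Significance testing (e.g. bootstrapped confidence intervals for the indirect effect) is deliberately left out.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross_product(a, b):
    """Sum of element-wise products of two centered sequences."""
    return sum(p * q for p, q in zip(a, b))


def mediation_effects(x, m, y):
    """Decompose the total effect of x on y into direct and indirect parts.

    a      : slope of m regressed on x
    b      : slope of m in the regression of y on x and m
    direct : slope of x in the regression of y on x and m
    total  : slope of y regressed on x alone
    """
    xc, mc, yc = centered(x), centered(m), centered(y)
    sxx, smm = cross_product(xc, xc), cross_product(mc, mc)
    sxm = cross_product(xc, mc)
    sxy, smy = cross_product(xc, yc), cross_product(mc, yc)

    # Closed-form solution of the two-predictor normal equations
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    total = sxy / sxx
    return {"a": a, "b": b, "indirect": a * b,
            "direct": direct, "total": total}


# Hypothetical data: training hours (x), knowledge (m), performance (y)
x = [1, 2, 3, 4, 5, 6]
m = [2, 3, 5, 4, 6, 8]
y = [3, 5, 6, 7, 9, 12]

effects = mediation_effects(x, m, y)
```

For linear OLS models the total effect equals the direct effect plus the indirect effect (`a * b`), which serves as a quick sanity check on the output. Within an SEM framework the same paths would be estimated simultaneously rather than in separate regressions.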
Overview of data collection methods
Questionnaire surveys
- Questionnaire development: One of the standard instruments for collecting data is the questionnaire. However, when developing a questionnaire and collecting data with it, a number of “errors” can occur that reduce the quality of the data collected. Taken together, these are called the “total survey error”. I will support you in developing your questionnaire in order to avoid these errors.
- Planning, programming and conducting online surveys: One of the fastest ways to collect data is to conduct online surveys. I would be happy to support you in digitizing your questionnaire and making it accessible online.
- Planning CATI and CASI surveys: Computer-assisted telephone interviews (CATI) and computer-assisted self-interviews (CASI) are nowadays considered classic methods compared to online surveys, but are still very important. If you would like to carry out such a survey, I will support you with the organization and implementation.
- Factorial surveys: Factorial surveys are questionnaire surveys in which the questionnaires (the stimuli presented) can be different for each respondent. This makes it possible to analyze how participants react to different stimuli, which, among other things, allows conclusions to be drawn about possible optimization potential.
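As a rough sketch of how the stimuli of a factorial survey can be assembled, the following Python example builds all combinations of a few dimensions and draws a random set of vignettes for each respondent. The dimensions, their levels, the deck size and the respondent IDs are purely hypothetical, and simple random sampling is only one of several possible designs.

```python
import itertools
import random

# Hypothetical vignette dimensions for a product evaluation
DIMENSIONS = {
    "price": ["low", "medium", "high"],
    "delivery_time": ["2 days", "7 days"],
    "warranty": ["1 year", "3 years"],
}


def build_vignette_universe(dimensions):
    """All possible combinations of the dimension levels."""
    names = list(dimensions)
    return [dict(zip(names, combo))
            for combo in itertools.product(*dimensions.values())]


def draw_deck(universe, deck_size, rng):
    """Random set of distinct vignettes for one respondent."""
    return rng.sample(universe, deck_size)


universe = build_vignette_universe(DIMENSIONS)

# A fixed seed keeps the assignment reproducible.
rng = random.Random(42)
decks = {respondent: draw_deck(universe, 4, rng)
         for respondent in ["r1", "r2", "r3"]}
```

Each respondent then rates only the vignettes in their own deck. Because the levels vary independently across vignettes, the ratings can later be regressed on the dimensions to see which characteristics drive the responses.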
Further methods
- Experimental and quasi-experimental designs: Experiments are often considered the “gold standard” when a causal effect (e.g. the effectiveness of medication, training or interventions) is to be clearly established. I will support you in the planning and implementation of your experiment.
- Conducting interviews: Data can be collected not only using standardized questionnaires, but also using partially standardized or unstandardized interviews. Compared to standardized questionnaires, this enables deeper insights into certain phenomena, simply because interviewers can ask questions at relevant points. Such interviews are often carried out during usability evaluations or within focus groups, for example. In this regard, I can support you in planning such an interview, recruiting or training interviewers, or by conducting the interviews myself.
- Export and analysis of data from databases: Data does not necessarily have to be collected anew for each analysis and may instead already be available in certain databases (e.g. eCommerce data from an online shop system). I can export and process such data for you so that useful analyses can be carried out.
- Mixed mode surveys: In mixed mode surveys, data from different sources are linked to one another (e.g. user data from an app with questionnaire data from an online survey). The particular focus here is on an implementation that complies with data protection requirements, which is often a significant challenge. I would be happy to advise and support you in this regard.