Programming Courses for Economists and Social Scientists
The IDSC offers courses on practice-oriented programming skills in the area of economic research methods for national and international researchers. The courses below are now part of the IDSC repertory. They can be given separately or mixed in time frames which range from a few days to a semester and are suited for Graduate School students as well as Faculty. Contact us at firstname.lastname@example.org if you are interested in a course at your institution.
As in the rest of the labor market, a labor economist will eventually need a second language besides the predominant Stata. The same goes for other economists and social scientists. Python, originally a general-purpose scripting language, is now a prime language for statistics, sporting a rich collection of modules that cover natural language processing, regressions, machine learning, deep learning, all kinds of statistics, excellent graphing, agent-based simulations, etc. According to the TIOBE Index (https://www.tiobe.com/tiobe-index), Python was the most popular programming language as of May 2022. In comparison, Stata ranks somewhere between 50 and 100. According to the World Economic Forum, Python ranks among the top skills that the world's tech giants require of both engineers and data scientists.
- 1. The Internet as a Data Source for Social Science with Python
- 2. Text as Data with Python
- 3. Machine Learning with Python
- 4. Working with Stata and Python
Motivation: As more and more markets (marriage market, transport market, labor market, etc.) move online or are born exclusively online, our ability to study markets and understand socioeconomic phenomena will depend on being able to leverage the internet as a data source. This means text mining will be an important skill for social scientists. In recognition of this fact, the European Parliament is working on excluding data and text mining from future digital copyright legislation. The course covers the basics of Python selectively, depending on which language elements are necessary for the examples.
The core aim is to study:
- The limits of working with Stata’s built-in rudimentary web browser and regular expressions.
- The basics of how to install and manage a Python installation and its modules.
- How to construct and brand a web browser in Python.
- How to use Python to download pages from the web and store them.
- How to use regular expressions (module: re) to harvest data out of HTML documents.
- The data types Python provides for storing data (module: pandas).
- Some graphing, basic regressions with Python etc.
The lectures will be written in Jupyter notebooks which run in a web browser so that participants can play with the code as we go along. Example highlights include downloading data from Google Trends, RePEc, Twitter, Wahlrecht.de, LinkedIn, Yahoo Finance, etc.
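The download-and-harvest steps listed above can be sketched roughly as follows. This is only a minimal illustration using the standard library, not the course material itself: the User-Agent string, the URL, the HTML fragment, and the names `build_request`, `harvest_rows`, and `ROW_PATTERN` are all made up for the example, and the request is prepared but never sent.

```python
import re
import urllib.request

# Hypothetical identification string: "branding" the client here simply means
# telling the server who is asking, via the User-Agent header.
USER_AGENT = "ExampleResearchBot/0.1 (contact: researcher@example.org)"


def build_request(url):
    """Prepare (but do not send) an HTTP request carrying our User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# A made-up HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td class="party">Party A</td><td class="share">31.5 %</td></tr>
  <tr><td class="party">Party B</td><td class="share">24.0 %</td></tr>
  <tr><td class="party">Party C</td><td class="share">9.5 %</td></tr>
</table>
"""

# One named group per field we want to harvest from each table row.
ROW_PATTERN = re.compile(
    r'<td class="party">(?P<party>[^<]+)</td>'
    r'<td class="share">(?P<share>[\d.]+)\s*%</td>'
)


def harvest_rows(html):
    """Return one dict per table row matched by ROW_PATTERN."""
    return [
        {"party": m.group("party").strip(), "share": float(m.group("share"))}
        for m in ROW_PATTERN.finditer(html)
    ]


rows = harvest_rows(SAMPLE_HTML)
request = build_request("https://example.org/polls.html")  # hypothetical URL
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)` for further analysis. Regular expressions work well on small, regular fragments like this one; pages with irregular markup usually call for a proper HTML parser.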
Motivation: A large part of human socio-economic interaction occurs by means of written text. The ability to convert such text into data can open new research avenues. Turning text into data is a growing research area in social science. This course teaches fundamental NLP (natural language processing) techniques implemented in various Python modules (NLTK, Gensim, sklearn, etc.).
The core aim is to study:
- Basic statistical analysis and visualization of texts
- Building Corpora of documents
- Vectorizing documents (feature extraction)
- Stemming, cleaning etc.
- Building a feature space for your corpus.
- Preparing your corpus for machine learning.
The lectures will be written in Jupyter notebooks which run in a web browser so that participants can play with the code as we go along. The course will be hands-on, with many examples from the literature or other data sources.
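To give a flavor of what vectorizing documents means, here is a minimal bag-of-words sketch in plain Python. It is a stand-in for what modules such as NLTK or sklearn do far more thoroughly; the corpus, the stop-word list, and the function names `tokenize` and `vectorize` are invented for illustration.

```python
import re
from collections import Counter

# Tiny made-up corpus; a real one would be read from files or scraped pages.
CORPUS = [
    "The labor market is moving online.",
    "Online markets generate text, and text is data.",
    "Labor economists study the labor market.",
]

# Hypothetical stop-word list, kept very short for illustration.
STOP_WORDS = {"the", "is", "and"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def vectorize(corpus):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = vectorize(CORPUS)
```

Each row of `matrix` is one document expressed in the feature space spanned by `vocabulary`. Note that "market" and "markets" end up as separate features here, which is the kind of problem stemming is meant to address.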
Motivation: There are many cases in which your data lives on a multidimensional manifold (X-rays, CT scans, photos, vectorized text, demographics, etc.). In such cases it is helpful to have a machine discover what’s going on and give you a direction for your analysis or to fit a model to your data. This course covers the basics of supervised and unsupervised machine learning as implemented in Python.
The core aim is to study:
- Supervised and unsupervised learning techniques (e.g. k-means, k-nearest neighbors, etc.).
- The mathematical underpinnings of Machine Learning (why does it work?).
- What ML is and isn’t.
- The theoretical algorithmic underpinnings of neural networks.
- Artificial Neural Networks in Python (e.g. sklearn).
- Deep learning in Python (with Tensorflow and Keras).
- Getting access to GPUs and TPUs with Google Colaboratory.
The course is a hands-on introduction; during the course we will train and test various models on several kinds of data, including data from text.
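As an illustration of what an unsupervised technique such as k-means does under the hood, the following is a from-scratch sketch of Lloyd's algorithm in plain Python. It is not the sklearn implementation; the data points, the deterministic choice of starting centroids, and the function name `kmeans` are assumptions made for this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on numeric tuples, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Made-up data: two visibly separated groups of (x, y) observations.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Deterministic start from the first and last point, so the result is reproducible.
labels, centroids = kmeans(POINTS, centroids=[POINTS[0], POINTS[-1]])
```

No labels are supplied to the algorithm; the two groups are discovered from the distances alone. In practice one would typically use a library implementation, which also handles random initialization and repeated restarts.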
Python is a great programming language that can be written and, very importantly, read easily. Since Stata 16, Python has been integrated into Stata, the dominant statistical software in labor economics, and Stata 17 introduced a Python module named pystata, which allows one to call Stata from Python. This course teaches you how to work seamlessly with both languages and call each from the other, using concrete and common examples and tasks.
The core aim is two-fold:
- Understanding and using the Stata Python API (Stata Function Interface, SFI), which makes it possible to embed Python code in Stata code and pass data, variable values, or locals between Stata and Python. For example, a Stata script embeds a web-scraping Python script and places the result into Stata memory.
- Understanding and using the Python module PyStata, which makes the reverse possible: embedding Stata code inside a Python script.
Requirements:
- Some familiarity with programming concepts (e.g. Stata).
- Windows, Linux or MacOS laptop or PC.
- Stata 17 or later (depending on the course).
- Internet connection.
- NextCloud Client for accessing course material on https://cloud.iza.org (https://nextcloud.com/install/).
- Anaconda-Navigator (https://www.anaconda.com/download).
- Latest version of Python 3 (comes with Anaconda).