What about danstat?

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • Is there an easier way to access Statistics Denmark

Objectives
  • Use a package to do the API-calls to Statistics Denmark

  • Connect to Statistics Denmark, and extract data

  • Create a list of lists to control the variables to be extracted

  • Using the danstat package

Please note: These pages are autogenerated. Some of the API-calls may fail during that process. We are figuring out what to do about it, but please excuse us for any red errors on the pages for the time being.

Is there an easier way?

Many larger online services provide packages for easier access to their APIs.

Popular services might not have to do this, because enthusiasts write packages themselves.

A package called danstat is available, and makes it easier to extract data from Statistics Denmark.

The danstat package/library

Previously we retrieved at table with demographic data from Statistics Denmark.

How can we get that table using the danstat package?

Before using the library, we will need to install it:

install.packages("danstat")

Some installations of R may have problems installing it. In that case, try this:

install.packages("remotes")
library(remotes)
remotes:install_github("cran/danstat")

After installation, we load the library using the library function. And then we can access the functions included in the library:

The danstat package contain four functions, equivalent to the four endpoints we discussed earlier.

The get_subjects() function sends a request to the Statistics Denmark API, asking for a list of the subjects. The information is returned to our script, and the get_subjects() function presents us with a dataframe containing the information.

library(danstat)
subjects <- get_subjects()
subjects
   id            description active hasSubjects subjects
1   1                 People   TRUE        TRUE     NULL
2   2      Labour and income   TRUE        TRUE     NULL
3   3                Economy   TRUE        TRUE     NULL
4   4      Social conditions   TRUE        TRUE     NULL
5   5 Education and research   TRUE        TRUE     NULL
6   6               Business   TRUE        TRUE     NULL
7   7              Transport   TRUE        TRUE     NULL
8   8    Culture and leisure   TRUE        TRUE     NULL
9   9 Environment and energy   TRUE        TRUE     NULL
10 19                  Other   TRUE        TRUE     NULL

We get the 13 major subjects from Statistics Denmark we have seen before. As before, each of them have sub-subjects.

If we want to take a closer look at the subdivisions of a given subject, we use the get_subjects() function again, this time specifying which subject we are interested in:

Let us try to get the sub-subjects from the subject 1 - containing information about populations and elections:

sub_subjects <- get_subjects(subjects = 1)
sub_subjects
  id description active hasSubjects
1  1      People   TRUE        TRUE
                                                                                                                                                                                                                                                      subjects
1 3401, 3407, 3410, 3415, 3412, 3411, 3428, 3409, Population, Households, families and children, Migration, Housing, Health, Democracy, National church, Names, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE

The result is a bit complicated. The column “subjects” in the resulting dataframe contains another dataframe. We access it like we normally would access a column in a dataframe:

sub_subjects$subjects
[[1]]
    id                       description active hasSubjects subjects
1 3401                        Population   TRUE        TRUE     NULL
2 3407 Households, families and children   TRUE        TRUE     NULL
3 3410                         Migration   TRUE        TRUE     NULL
4 3415                           Housing   TRUE        TRUE     NULL
5 3412                            Health   TRUE        TRUE     NULL
6 3411                         Democracy   TRUE        TRUE     NULL
7 3428                   National church   TRUE        TRUE     NULL
8 3409                             Names   TRUE        TRUE     NULL

Those sub-subjects have their own subjects! Lets get to the bottom of this, and use 2401, Population and population projections as an example:

sub_sub_subjects <- get_subjects("3401")
sub_sub_subjects$subjects
[[1]]
     id                      description active hasSubjects subjects
1 20021               Population figures   TRUE       FALSE     NULL
2 20024 Immigrants and their descendants   TRUE       FALSE     NULL
3 20022           Population projections   TRUE       FALSE     NULL
4 20019                        Adoptions  FALSE       FALSE     NULL
5 20017                           Births   TRUE       FALSE     NULL
6 20018                        Fertility   TRUE       FALSE     NULL
7 20014                           Deaths   TRUE       FALSE     NULL
8 20015                  Life expectancy   TRUE       FALSE     NULL

Now we are at the bottom. We can see in the column “hasSubjects” that there are no sub_sub_sub_subjects.

The hierarchy is: 1 Population and elections | 3401 Population | 20021 Population figures

The final sub_sub_subject contains a number of tables, that actually contains the data we are looking for.

get_subjects is able to retrieve all the sub, sub-sub and sub-sub-sub-jects in one go. The result is a bit confusing and difficult to navigate.

Remember that the initial result was a dataframe containing another dataframe. If we go all the way to the bottom, we will get a dataframe, containing several dataframes, each of those containing several dataframes.

We recommend that you do not try it, but this is how it is done:

lots_of_subjects <- get_subjects(1, recursive = T, include_tables = T)

The “recursive = T” parameter means that get_subjects will retrieve the subjects of the subjects, and then the subjects of those subjects.

Which datatables exists?

But we ended up with a sub_sub_subject,

20021 Population figures

How do we find out which tables exists in this subject?

The get_tables() function returns a dataframe with information about the tables available for a given subject.

tables <- get_tables(subjects="20021")
tables
         id
1    FOLK1A
2   FOLK1AM
3   BEFOLK1
4   BEFOLK2
5     FOLK3
6  FOLK3FOD
7      BEF5
8        FT
9     HISB3
10      BY1
11      BY2
12      BY3
13     BEF4
14  POSTNR1
15  POSTNR2
16      KM1
17    SOGN1
18   LABY02
19   LABY03
20   LABY05
21    BEF5F
22    BEF5G
23    BEV22
24   LABY01
25   BEV107
26 KMSTA003
27   GALDER
28 KMGALDER
                                                                         text
1                                  Population at the first day of the quarter
2                                    Population at the first day of the month
3                                                       Population 1. January
4                                                       Population 1. January
5                                                       Population 1. January
6                                                       Population 1. January
7                                                       Population 1. January
8                                        Population figures from the censuses
9                                                    Summary vital statistics
10                                                      Population 1. January
11                                                      Population 1. January
12                                                      Population 1. January
13                                                      Population 1. January
14                                                      Population 1. January
15                                                      Population 1. January
16                                 Population at the first day of the quarter
17                                                      Population 1. January
18              Population i percentage of all in the same municipality group
19                             Population i percentage of all in the same age
20 Persons, who lived in the same municipality group as they did 20 years ago
21              People born in Faroe Islands and living in Denmark 1. January
22                  People born in Greenland and living in Denmark 1. January
23                                Summary vital statistics (provisional data)
24                                       Population increase per 1,000 capita
25                                                   Summary vital statistics
26                                                   Summary vital statistics
27                                                                Average age
28                                                                Average age
               unit             updated firstPeriod latestPeriod active
1            Number 2023-08-11T08:00:00      2008Q1       2023Q3   TRUE
2            Number 2023-10-10T08:00:00     2021M10      2023M09   TRUE
3            Number 2023-03-01T08:00:00        1971         2023   TRUE
4            Number 2023-03-01T08:00:00        1901         2023   TRUE
5            Number 2023-02-10T08:00:00        2008         2023   TRUE
6            Number 2023-02-10T08:00:00        2008         2023   TRUE
7            Number 2023-02-10T08:00:00        1990         2023   TRUE
8            Number 2023-02-10T08:00:00        1769         2023   TRUE
9            Number 2023-06-02T08:00:00        1901         2023   TRUE
10           Number 2023-06-02T08:00:00        2010         2023   TRUE
11           Number 2023-06-02T08:00:00        2010         2023   TRUE
12                - 2023-06-30T08:00:00        2017         2023   TRUE
13           Number 2023-04-14T08:00:00        1901         2023   TRUE
14           Number 2023-03-01T08:00:00        2010         2023   TRUE
15           Number 2023-03-01T08:00:00        2010         2023   TRUE
16           Number 2023-08-11T08:00:00      2007Q1       2023Q3   TRUE
17           Number 2023-03-03T08:00:00        2010         2023   TRUE
18         Per cent 2023-06-22T08:00:00        2008         2023   TRUE
19         Per cent 2023-06-22T08:00:00        2008         2023   TRUE
20         Per cent 2023-05-23T08:00:00        2007         2023   TRUE
21           Number 2023-02-10T08:00:00        2008         2023   TRUE
22           Number 2023-02-10T08:00:00        2008         2023   TRUE
23           Number 2023-08-11T08:00:00      2007Q2       2023Q2   TRUE
24 Per 1,000 capita 2023-06-13T08:00:00        2007         2022   TRUE
25           Number 2023-02-10T08:00:00        2006         2022   TRUE
26           Number 2023-03-03T08:00:00        2015         2022   TRUE
27             Avg. 2023-02-10T08:00:00        2005         2023   TRUE
28             Avg. 2023-03-03T08:00:00        2007         2023   TRUE
                                                             variables
1                               region, sex, age, marital status, time
2                                               region, sex, age, time
3                                       sex, age, marital status, time
4                                                       sex, age, time
5                       day of birth, birth month, year of birth, time
6                    day of birth, birth month, country of birth, time
7                                     sex, age, country of birth, time
8                                                  national part, time
9                                               type of movement, time
10                               urban and rural areas, age, sex, time
11                             municipality, city size, age, sex, time
12 urban and rural areas, population area and population density, time
13                                                       islands, time
14                                         postal code, sex, age, time
15                                         postal code, sex, age, time
16                         parish, member of the National Church, time
17                                              parish, sex, age, time
18                                      municipality groups, age, time
19                                      municipality groups, age, time
20                                      municipality groups, age, time
21                              sex, age, parents place of birth, time
22                              sex, age, parents place of birth, time
23                                 region, type of movement, sex, time
24                         municipality groups, type of movement, time
25                                 region, type of movement, sex, time
26                                             parish, movements, time
27                                             municipality, sex, time
28                                                   parish, sex, time

We get at lot of information here. The id identifies the table, text gives a description of the table that humans can understand. When the table was last updated and the first and last period that the table contains data for.

In the variables column, we get information on what kind of data is stored in the table.

Before we pull out the data, we need to know which variables are available in the table. We do this with this function:

metadata <- get_table_metadata("FOLK1A", variables_only = T)
metadata
          id           text elimination  time                     map
1     OMRÅDE         region        TRUE FALSE denmark_municipality_07
2        KØN            sex        TRUE FALSE                    <NA>
3      ALDER            age        TRUE FALSE                    <NA>
4 CIVILSTAND marital status        TRUE FALSE                    <NA>
5        Tid           time       FALSE  TRUE                    <NA>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               values
1                                                                                                                                                                                           000, 084, 101, 147, 155, 185, 165, 151, 153, 157, 159, 161, 163, 167, 169, 183, 173, 175, 187, 201, 240, 210, 250, 190, 270, 260, 217, 219, 223, 230, 400, 411, 085, 253, 259, 350, 265, 269, 320, 376, 316, 326, 360, 370, 306, 329, 330, 340, 336, 390, 083, 420, 430, 440, 482, 410, 480, 450, 461, 479, 492, 530, 561, 563, 607, 510, 621, 540, 550, 573, 575, 630, 580, 082, 710, 766, 615, 707, 727, 730, 741, 740, 746, 706, 751, 657, 661, 756, 665, 760, 779, 671, 791, 081, 810, 813, 860, 849, 825, 846, 773, 840, 787, 820, 851, All Denmark, Region Hovedstaden, Copenhagen, Frederiksberg, Dragør, Tårnby, Albertslund, Ballerup, Brøndby, Gentofte, Gladsaxe, Glostrup, Herlev, Hvidovre, Høje-Taastrup, Ishøj, Lyngby-Taarbæk, Rødovre, Vallensbæk, Allerød, Egedal, Fredensborg, Frederikssund, Furesø, Gribskov, Halsnæs, Helsingør, Hillerød, Hørsholm, Rudersdal, Bornholm, Christiansø, Region Sjælland, Greve, Køge, Lejre, Roskilde, Solrød, Faxe, Guldborgsund, Holbæk, Kalundborg, Lolland, Næstved, Odsherred, Ringsted, Slagelse, Sorø, Stevns, Vordingborg, Region Syddanmark, Assens, Faaborg-Midtfyn, Kerteminde, Langeland, Middelfart, Nordfyns, Nyborg, Odense, Svendborg, Ærø, Billund, Esbjerg, Fanø, Fredericia, Haderslev, Kolding, Sønderborg, Tønder, Varde, Vejen, Vejle, Aabenraa, Region Midtjylland, Favrskov, Hedensted, Horsens, Norddjurs, Odder, Randers, Samsø, Silkeborg, Skanderborg, Syddjurs, Aarhus, Herning, Holstebro, Ikast-Brande, Lemvig, Ringkøbing-Skjern, Skive, Struer, Viborg, Region Nordjylland, Brønderslev, Frederikshavn, Hjørring, Jammerbugt, Læsø, Mariagerfjord, Morsø, Rebild, Thisted, Vesthimmerlands, Aalborg
2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        TOT, 1, 2, Total, Men, Women
3 IALT, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, Age, total, 0 years, 1 year, 2 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years, 11 years, 12 years, 13 years, 14 years, 15 years, 16 years, 17 years, 18 years, 19 years, 20 years, 21 years, 22 years, 23 years, 24 years, 25 years, 26 years, 27 years, 28 years, 29 years, 30 years, 31 years, 32 years, 33 years, 34 years, 35 years, 36 years, 37 years, 38 years, 39 years, 40 years, 41 years, 42 years, 43 years, 44 years, 45 years, 46 years, 47 years, 48 years, 49 years, 50 years, 51 years, 52 years, 53 years, 54 years, 55 years, 56 years, 57 years, 58 years, 59 years, 60 years, 61 years, 62 years, 63 years, 64 years, 65 years, 66 years, 67 years, 68 years, 69 years, 70 years, 71 years, 72 years, 73 years, 74 years, 75 years, 76 years, 77 years, 78 years, 79 years, 80 years, 81 years, 82 years, 83 years, 84 years, 85 years, 86 years, 87 years, 88 years, 89 years, 90 years, 91 years, 92 years, 93 years, 94 years, 95 years, 96 years, 97 years, 98 years, 99 years, 100 years, 101 years, 102 years, 103 years, 104 years, 105 years, 106 years, 107 years, 108 years, 109 years, 110 years, 111 years, 112 years, 113 years, 114 years, 115 years, 116 years, 117 years, 118 years, 119 years, 120 years, 121 years, 122 years, 123 years, 124 years, 125 years
4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         TOT, U, G, E, F, Total, Never married, Married/separated, Widowed, Divorced
5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      2008K1, 2008K2, 2008K3, 2008K4, 2009K1, 2009K2, 2009K3, 2009K4, 2010K1, 2010K2, 2010K3, 2010K4, 2011K1, 2011K2, 2011K3, 2011K4, 2012K1, 2012K2, 2012K3, 2012K4, 2013K1, 2013K2, 2013K3, 2013K4, 2014K1, 2014K2, 2014K3, 2014K4, 2015K1, 2015K2, 2015K3, 2015K4, 2016K1, 2016K2, 2016K3, 2016K4, 2017K1, 2017K2, 2017K3, 2017K4, 2018K1, 2018K2, 2018K3, 2018K4, 2019K1, 2019K2, 2019K3, 2019K4, 2020K1, 2020K2, 2020K3, 2020K4, 2021K1, 2021K2, 2021K3, 2021K4, 2022K1, 2022K2, 2022K3, 2022K4, 2023K1, 2023K2, 2023K3, 2008Q1, 2008Q2, 2008Q3, 2008Q4, 2009Q1, 2009Q2, 2009Q3, 2009Q4, 2010Q1, 2010Q2, 2010Q3, 2010Q4, 2011Q1, 2011Q2, 2011Q3, 2011Q4, 2012Q1, 2012Q2, 2012Q3, 2012Q4, 2013Q1, 2013Q2, 2013Q3, 2013Q4, 2014Q1, 2014Q2, 2014Q3, 2014Q4, 2015Q1, 2015Q2, 2015Q3, 2015Q4, 2016Q1, 2016Q2, 2016Q3, 2016Q4, 2017Q1, 2017Q2, 2017Q3, 2017Q4, 2018Q1, 2018Q2, 2018Q3, 2018Q4, 2019Q1, 2019Q2, 2019Q3, 2019Q4, 2020Q1, 2020Q2, 2020Q3, 2020Q4, 2021Q1, 2021Q2, 2021Q3, 2021Q4, 2022Q1, 2022Q2, 2022Q3, 2022Q4, 2023Q1, 2023Q2, 2023Q3

There is a lot of other metadata in the tables, including the phone number to the staffmember at Statistics Denmark that is responsible for maintaining the table. We are only interested in the variables, which is why we add the parameter “variables_only = T”.

What kind of values can the individual datapoints take?

metadata %>% 
  slice(4) %>% 
  pull(values)
[[1]]
   id              text
1 TOT             Total
2   U     Never married
3   G Married/separated
4   E           Widowed
5   F          Divorced

We use the slice function from tidyverse to pull out the fourth row of the dataframe, and the pull-function to pull out the values in the values column.

The same trick can be done for the other fields in the table:

metadata %>% 
  slice(1) %>% 
  pull(values) %>% 
  .[[1]] %>% 
  head
   id               text
1 000        All Denmark
2 084 Region Hovedstaden
3 101         Copenhagen
4 147      Frederiksberg
5 155             Dragør
6 185             Tårnby

Here we see the individual municipalities in Denmark.

Now we are almost ready to pull out the actual data!

But first!

Which variables do we want?

We need to specify which variables we want in our answer. Do we want the total population for all municipalities in Denmark? Or just a few? Do we want the total population, or do we want it broken down by sex.

These variables, and the values of them, need to be specified when we pull the data from Statistics Denmark.

We also need to provide that information in a specific way.

If we want data for all municipalites, we want to pull the variable “OMRÅDE” from the list of variables.

Therefore we need to give the function an argument containing both the information that we want the population data broken down by “OMRÅDE”, and that we want all values of “OMRÅDE”.

As before, we need to specify what we want using a list. Let us make our first list:

list(code = "OMRÅDE", values = NA)
$code
[1] "OMRÅDE"

$values
[1] NA

This list have to components. One called “code”, and one called “values”. Code have the content “OMRÅDE”, specifying that we want the variable in the data from Statistics Denmark calld “OMRÅDE”.

“values” has the content “NA”. We use “NA”, when we want to specify that we want all the “OMRÅDE”. If we only wanted a specific municipality, we could instead specify it instead of writing “NA”.

Let us assume that we also want to break down the data based on marriage status.

That information is stored in the variable “CIVILSTAND”.

And above, we saw that we had the following values in that variable:

metadata %>% 
  slice(4) %>% 
  pull(values)
[[1]]
   id              text
1 TOT             Total
2   U     Never married
3   G Married/separated
4   E           Widowed
5   F          Divorced

A value for the total population is probably not that interesting, if we pull all the individual values for “Never married” etc.

We can now make another list:

  list(code = "CIVILSTAND", values = c("U", "G", "E", "F"))
$code
[1] "CIVILSTAND"

$values
[1] "U" "G" "E" "F"

Here the “values” part is a vector containing the values we want to pull out for that variable.

It might be interesting to take a look at how the population changes over time.

In that case we need to pull out data from the “Tid” variable.

That would look like this:

list(code = "Tid", values = NA)
$code
[1] "Tid"

$values
[1] NA

If we want to pull data broken down by all three variables, we need to provide a list, containing three lists.

We do that using this code:

variables <- list(list(code = "OMRÅDE", values = NA),
                  list(code = "CIVILSTAND", values = c("U", "G", "E", "F")),
                  list(code = "Tid", values = NA)
              )
variables
[[1]]
[[1]]$code
[1] "OMRÅDE"

[[1]]$values
[1] NA


[[2]]
[[2]]$code
[1] "CIVILSTAND"

[[2]]$values
[1] "U" "G" "E" "F"


[[3]]
[[3]]$code
[1] "Tid"

[[3]]$values
[1] NA

And now, finally, we are ready to get the data!

data <- get_data(table_id = "FOLK1A", variables = variables)
Rows: 26460 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (3): OMRÅDE, CIVILSTAND, TID
dbl (1): INDHOLD

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

It takes a short moment. But now we have a dataframe containing the data we requested:

head(data)
# A tibble: 6 × 4
  OMRÅDE      CIVILSTAND    TID    INDHOLD
  <chr>       <chr>         <chr>    <dbl>
1 All Denmark Never married 2008Q1 2552700
2 All Denmark Never married 2008Q2 2563134
3 All Denmark Never married 2008Q3 2564705
4 All Denmark Never married 2008Q4 2568255
5 All Denmark Never married 2009Q1 2575185
6 All Denmark Never married 2009Q2 2584993

This procedure will work for all the tables from Statistics Denmark!

The data is nicely formatted and ready to use. Almost.

Before we do anything else, let us save the data.

write_csv2(data, "data/SD_data.csv")

Key Points

  • Larger services often provide packages to make it easier to use their API