Project supported by the National Science Foundation
1) What is a DMU?
DMU stands for ``Decision Making Unit’’. A DMU is the ``unit’’ portion of a unit of analysis. For example, a common unit of analysis is the country-year (e.g. US-1950). In this case, the ``country’’ is the decision making unit.
We added this terminology to NewGene so that users are not restricted to using COW country-based units of analysis (as was the case with EuGene). For example, users can now create a dataset with a leader-year unit of analysis (where the ``leader’’ is the DMU).
See also FAQ 14.
2) What does it mean to "Limit DMUs"?
3) Where do I specify my output unit of analysis?
Your output ``unit of analysis’’ is determined by the variable group you select and the number of columns you choose (see FAQ 4). Note that each Variable Group’s main heading description ends with the unit of analysis.
4) What do I do with the ``#COW Country cols’’ window in the upper middle of the ``Select Variables’’ tab?
This allows you to specify the number of countries to include in a unit of analysis. For example, if the box reads ``#COW Country cols: 1’’, the unit of analysis will be country-year. Using the up or down arrow next to the box, you can change it to ``#COW Country cols: 2’’ to create a dyad-year unit of analysis, or to ``#COW Country cols: 3’’ to create a triad-year unit of analysis. In each case, countries are the Decision Making Unit (see FAQ 1).
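As a rough illustration of how the column count changes the unit of analysis, the following Python sketch builds country-year, dyad-year, and triad-year units from a small list of countries. It is not NewGene code: the function name `k_ad_units` and the country labels are hypothetical, and it assumes unordered (non-directed) combinations, which may differ from how NewGene builds its output.

```python
from itertools import combinations

def k_ad_units(countries, years, k):
    """Return one (country_1, ..., country_k, year) tuple per unordered k-ad and year.

    Assumption: k-ads are unordered combinations; this is an illustration only.
    """
    return [combo + (year,)
            for year in years
            for combo in combinations(countries, k)]

countries = ["USA", "CAN", "MEX"]  # illustrative labels, not COW codes
years = [1950]

country_year = k_ad_units(countries, years, 1)  # like "#COW Country cols: 1"
dyad_year = k_ad_units(countries, years, 2)     # like "#COW Country cols: 2"
triad_year = k_ad_units(countries, years, 3)    # like "#COW Country cols: 3"

print(country_year)
print(dyad_year)
print(triad_year)
```

With three countries and one year, this gives three country-year units, three dyad-year units, and one triad-year unit.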
You may also find it helpful to read the ``Creating K-adic Data'' Guide.
The three exceptions are the three EuGene-based variable groups (variable groups 01a, 01b, and 01c). Since these are intended to allow the user to reproduce datasets as found in the original EuGene software, they, like the datasets in EuGene, are ``hardwired’’ to produce dyad-year (variable groups 01b and 01c) or country-year (variable group 01a) datasets.
5) I liked EuGene. How can I create the same dyadic datasets that I used to create with EuGene?
Please read our ``quick start’’ document, which shows you how to use NewGene to create a dyadic dataset as you would have created using EuGene.
6) How do I create a directed dyad year dataset (for country dyads)?
If you want to create a dyadic dataset as you would have using the older EuGene software, see FAQ 5.
8) How do I create a leader-year data set?
You can create a leader-year dataset if you are ONLY using the Archigos or LEAD variable groups (variable groups 10a, 10b, 10c, or 10d). Otherwise, you have to work with a country DMU in order to ``speak to'' the other datasets (as they are all country-based). See FAQ 14.
9) How do I create a triad-year data set?
10) How do I create a dataset with a different time value in the unit of analysis (e.g. a leader-month dataset)?
11) COW updated the country data. How do I update my countries to do a dyad-year set?
12) NewGene does not include a particular dataset. How do I add a new data set of independent variables?
Given the proliferation of datasets available for scholars conducting quantitative analysis in international relations and political science, it is simply not possible to preload NewGene with all available datasets.
Therefore, easing the ability for users to add their own preferred dataset was one of the major motivations for creating NewGene as a replacement for EuGene. To see how to insert your own preferred dataset, please see the ``Installing User Data’’ video on our YouTube channel.
13) When I generate an output dataset, what is the ``.debugsql.txt’’ file?
This text document is automatically created by NewGene. It provides a printable summary of the procedures followed by NewGene to generate your requested output dataset.
14) Why are countries the main DMU?
15) With leader data, how are the leader-year observations determined?
Both the ``10a. ARCHIGOS Leader Data (Country-Year)’’ variable group and the ``10b. LEAD Leader Data (Country-Year)’’ variable group have no repeated country-year observations (e.g. US-1963 appears just one time). The leader specified for each country-year is the leader in office for the majority of that year (e.g. for the US-1963 observation, the leader is Kennedy since he was in office until November 1963).
In contrast, both the ``10c. ARCHIGOS Leader Data (Leader-Year)’’ variable group and the ``10d. LEAD Leader Data (Leader-Year)’’ variable group can have multiple country-year observations (e.g. there are two US-1963 observations, one for Kennedy and one for Johnson).
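The difference between the two kinds of leader variable groups can be sketched in Python as follows. This is an illustration, not NewGene's actual procedure: the function names are hypothetical, and counting days in office is an assumed way of deciding which leader held office for the majority of a year.

```python
from datetime import date

# (leader, country, start of tenure, end of tenure); the end date is exclusive
TENURES = [
    ("Kennedy", "US", date(1961, 1, 20), date(1963, 11, 22)),
    ("Johnson", "US", date(1963, 11, 22), date(1969, 1, 20)),
]

def days_in_year(start, end, year):
    """Number of days of the tenure [start, end) that fall inside the given year."""
    lo = max(start, date(year, 1, 1))
    hi = min(end, date(year + 1, 1, 1))
    return max((hi - lo).days, 0)

def leader_year_rows(tenures, country, year):
    """Leader-Year style: one row per leader in office at any point in the year."""
    return [name for name, c, start, end in tenures
            if c == country and days_in_year(start, end, year) > 0]

def country_year_row(tenures, country, year):
    """Country-Year style: the single leader with the most days in office that year."""
    candidates = [(days_in_year(start, end, year), name)
                  for name, c, start, end in tenures if c == country]
    if not candidates:
        return None
    days, name = max(candidates)
    return name if days > 0 else None

print(country_year_row(TENURES, "US", 1963))
print(leader_year_rows(TENURES, "US", 1963))
```

For US-1963, the country-year function returns only Kennedy, while the leader-year function returns both Kennedy and Johnson.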
16) How do/Should I use the random sampling function?
17) How Do I Create a dataset with ALL combinations of dyads and triads?
Please read the ``Creating K-adic Data'' Guide.
19) What does the following error mean: ``'This version of NewGene does not support k-j-ads, where k>1 AND j>1 (e.g. ``#COW Country Cols'' is set to 2 and ``# ATOP Alliance Cols'' is also set to 2); please select columns to generate a k-ad only (e.g. the ``#COW Country Cols'' is set to 2 and the ``#ATOP Alliance Cols'' is set to 1).'''?
NewGene does not currently support k-j-ads, k-j-n-ads, and so on. It supports only k-ads.
Suppose you are working with the ``08a. ATOP Alliance (Country-ATOPID-Year)'' variable group. The user IS allowed to set the number of ATOPID columns to any number - but, if so, ATOPID becomes the column associated with k. At that point, the CCODE column must be set to 1. This allows the output to be a k-ad.
From NewGene's perspective, all datasets are arbitrary, generic data; the columns can represent anything. This increases NewGene's flexibility to work with a variety of different data formats, and it also means the spin controls can be set to any value the user wants, but only one column selector at a time can have k exceed the minimum allowed. All spin controls are therefore enabled and available for the user to set the number of columns, but NewGene then enforces that only k-ads are possible with the current version (not yet k-j-ads, k-j-n-ads, etc.).
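The rule described above can be expressed as a small check. The Python sketch below is only an illustration of that rule, not NewGene's implementation: the function name `check_k_ad` is hypothetical, and it assumes the minimum allowed value for every column selector is 1.

```python
def check_k_ad(column_counts):
    """Return the name of the single selector set above 1, or None if none is.

    Raises ValueError when more than one selector exceeds 1 (a k-j-ad request).
    Assumption: the minimum allowed value for each selector is 1.
    """
    expanded = [name for name, count in column_counts.items() if count > 1]
    if len(expanded) > 1:
        raise ValueError(
            "Only k-ads are supported; more than one selector exceeds 1: "
            + ", ".join(expanded)
        )
    return expanded[0] if expanded else None

# Allowed: CCODE carries k, ATOPID stays at 1
print(check_k_ad({"CCODE": 2, "ATOPID": 1}))

# Allowed: ATOPID carries k, CCODE stays at 1
print(check_k_ad({"CCODE": 1, "ATOPID": 3}))

# Not allowed: both selectors exceed 1
try:
    check_k_ad({"CCODE": 2, "ATOPID": 2})
except ValueError as err:
    print(err)
```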
7) How do I create a country-year data set?
See FAQ 4. Also, if you simply wish to create a country-year dataset using variables that were available in the EuGene software, you can do so by selecting variables from the ``01a. EuGene Country-Year Data’’ variable group.
18) What if I want to create k-j-ad, not just a k-ad?
Please see FAQ 19.
20) Why are there two Militarized Compellent Threat Variable Groups (06d and 06e)?
The base MCT data are unique. The dataset is a directed dyadic dataset, but it does not contain all dyadic combinations. This is because Country A in the original MCT dataset is the challenger state and Country B in the original MCT dataset is the target during a crisis. Additionally, some crises have multiple challengers.
The first version of the MCT dataset in NewGene (variable group 06d) uses a Country-Crisis-Year unit of analysis. This dataset is useful if one wishes, for example, to create a k-adic dataset with all crises that have 1 challenger, 2 challengers, and 3 challengers. This can be done by setting the ``# COW Country Cols'' to 4 and then deleting from the output dataset all crises where the num_challengers variable is larger than 3.
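The deletion step can be done in any statistics package or spreadsheet. As one possible sketch, the Python snippet below filters rows of a CSV on the num_challengers variable. The sample data, and every column name other than num_challengers, are made up for illustration and do not reflect the actual NewGene output layout.

```python
import csv
import io

# Made-up sample standing in for a NewGene output file;
# only the num_challengers column name comes from the FAQ text.
SAMPLE = """crisis_id,year,num_challengers
101,1950,1
102,1951,3
103,1952,4
"""

def keep_up_to_three_challengers(rows):
    """Drop every row whose num_challengers value is larger than 3."""
    return [row for row in rows if int(row["num_challengers"]) <= 3]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = keep_up_to_three_challengers(rows)

for row in kept:
    print(row["crisis_id"], row["year"], row["num_challengers"])
```

To use this on a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="")`, where `path` points to your output dataset.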
The second version of the MCT dataset in NewGene (variable group 06e) uses a Country-Country-Crisis-Year unit of analysis. This is basically the original MCT dataset. As such, it is recommended that the user only use a dyadic unit of analysis with this dataset (i.e. set the ``# COW Country Cols'' spinner to 2).
21) Why are the data for the Classic EuGene variable groups not available past the late 1990s/early 2000s?
These variable groups allow the user to generate datasets as they would have appeared in EuGene. Hence, some data in EuGene are not available past a certain year. For example, Polity III data are not available past 1994, and the MID v 3.0 data are not available after 2002. If the user wishes to have more up-to-date data, then the user will want to use one of the other variable groups (e.g. 03a. Polity).
22) What does the ``Simple Mode'' Button Do?
When the box next to ``Simple Mode'' is checked, the ``Input'' Menu, ``Output'' Menu, ``Output dataset'' tab, and ``Input dataset'' tab are hidden. Most users will not need to access these menus, so the default setting is to have Simple Mode on (to keep the interface as simple as possible).
23) What do the variable names mean?
In addition to the descriptions next to the variable names in the ``Variable Groups'' window, users may wish to consult the NewGene Variable Codebook sheet (in Excel format). This codebook contains a list of the variables, along with descriptions (including the different descriptions that the same variable can have, depending on the Variable Group selected by the user).
24) What if I do not want NewGene to be installed on the C: Drive?
If you wish to run NewGene from a different location to save disk space, first install it to the default location and then simply move the entire installation folder from the "C:\Program Files (x86)" folder (it should be obvious which folder to move) to any other location you want (e.g., the D: drive). You can then double-click the NewGene application from inside this folder to run it.
25) The font on NewGene is really, really ridiculously small. What should I do?
Use the magnify feature on your computer (e.g. on a PC, simultaneously press the control key and the "+" key).
26) Does NewGene work with iCloud?
iCloud backup is not intended for huge data files such as the ~2GB NewGene data files. Moreover, technically complex work with large data files (which NewGene performs) is not something iCloud sync is currently intended to support.
27) What version of OS X supports NewGene?
We urge users to update to the latest version of macOS (Sierra).
28) I am having difficulty creating a dataset that includes a distance measure and an ATOP alliance measure.
You may have received an error that said:
"Failed: There are unrelated DMUs in the units of analysis for the variable groups you've selected (CCODE and ATOP)! There must be a full relationship between the variable groups you've selected. This means that at least one variable group must contain the full set of all DMUs, taken as a whole, for all variable groups you've selected."
This is because the inherently dyadic nature of distance (see video) makes it tricky to work with variable groups that do not use a country-year or explicitly dyadic structure. This can be overcome by installing a country-country-year version of the ATOP data (available here and see guide on installing data here).
29) I am working on a Mac and prepared a dataset to install to NewGene. However, after following the install steps described in the quickstart guide and YouTube video, I am receiving an error indicating that a number of lines failed to read. It appears that everything is correctly formatted (e.g. the columns have the appropriate ``data type'' label). What is going on?
This is caused by a quirk in how Excel handles "line endings": Macs do not handle them well when saving CSV files (for more details, see https://nicercode.github.io/blog/2013-04-30-excel-and-line-endings/).
What to do? There are two approaches:
Approach 1: The file must have been initially created in Excel for PC, so return to that PC and prepare the file to send to a Mac user.
From a PC with Excel installed on it:
File -> Save As
In the second dropdown, select (CSV - Macintosh)
Approach 2: If Approach 1 doesn't work, there are numerous other programs that will perform the conversion. Unfortunately, you do need to download another program to do this. We recommend downloading UltraEdit (www.ultraedit.com). Open the CSV file in UE, and then choose the "Save As" option, and the following step depends on whether you're running UE on the Mac, or on Windows.
Mac: in the "Line Terminator" dropdown, select "Unix (LF)"
Windows: to the very right of the "Save" button is a little drop-down arrow. Click it and choose "Save (Unix line ends - LF)"
That should do it!
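If you have Python available and prefer not to install an editor, the same conversion to Unix (LF) line endings can be sketched in a few lines. This is an alternative we have not tested with NewGene itself; the function names are hypothetical, and the file paths in the usage note are placeholders.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old-Mac (CR) line endings to Unix (LF).

    Note: a bare CR inside a quoted CSV field would also be converted.
    """
    # Replace CRLF first so the second pass only sees leftover bare CRs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def convert_file(src_path, dst_path):
    """Read src_path as raw bytes and write an LF-only copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_line_endings(data))

# Small in-memory demonstration with CR-only line endings
print(to_unix_line_endings(b"ccode,year\r2,1950\r"))
```

To convert a real file, call `convert_file("input.csv", "output.csv")` with your own file names, then install the converted copy in NewGene.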