Aggregated Gender Estimates by Scientific Field Since 1950
We extracted all author profiles from research-type articles published in journals and conference proceedings indexed in Scopus since 1950. Books are excluded because their publication dates are unreliable. From these publications, we collect every first name and surname an author has used, along with the country of the first-listed org-type affiliation.
Gender is estimated from first name and country. The country is taken from the first available org-type affiliation in the author's first available publication.
If a scientist appears with multiple first names, we use the one with the most matches in the gender database. Gender estimates are taken from gender-api.com using both name and country, falling back to name only when needed.
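The name selection and fallback described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the two lookup tables are hypothetical stand-ins for gender-api.com responses, their contents are invented, and "most matches" is assumed to mean the largest sample count returned for a name.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical lookup tables standing in for gender-api.com responses.
# Each value is (estimated gender, number of matching samples); all invented.
BY_NAME_COUNTRY = {
    ("andrea", "ita"): ("male", 900),
    ("andrea", "deu"): ("female", 1200),
}
BY_NAME = {
    "andrea": ("female", 5000),
    "kim": ("female", 3000),
}


def lookup(name: str, country: Optional[str]) -> Optional[Tuple[str, int]]:
    """Try name plus country first, then fall back to the name alone."""
    key = name.lower()
    if country is not None:
        hit = BY_NAME_COUNTRY.get((key, country.lower()))
        if hit is not None:
            return hit
    return BY_NAME.get(key)


def estimate_gender(first_names: Iterable[str], country: Optional[str]) -> str:
    """Use the first name with the most matches; 'unknown' if none is found."""
    best: Optional[Tuple[str, int]] = None
    for name in first_names:
        hit = lookup(name, country)
        if hit is not None and (best is None or hit[1] > best[1]):
            best = hit
    return best[0] if best is not None else "unknown"
```

With these toy tables, `estimate_gender(["Andrea"], "ita")` uses the country-specific entry, while `estimate_gender(["Andrea"], "fra")` falls back to the name-only entry.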
For researchers from East Slavic countries (rus, blr, ukr, kaz, tkm, uzb, kgz, geo), we infer gender from the surname: -aja and -va indicate female; -v indicates male.
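The surname rule can be sketched as a simple suffix check. This is an illustrative sketch only: the function name is hypothetical, surnames are assumed to be transliterated to Latin script, and how this rule interacts with the first-name estimate is not specified in this description, so the function returns `None` whenever the rule does not apply.

```python
from typing import Optional

# Country codes listed in the description above.
SURNAME_RULE_COUNTRIES = {"rus", "blr", "ukr", "kaz", "tkm", "uzb", "kgz", "geo"}


def gender_from_surname(surname: str, country: str) -> Optional[str]:
    """Return 'female', 'male', or None if the suffix rule does not apply."""
    if country.lower() not in SURNAME_RULE_COUNTRIES:
        return None
    s = surname.lower()
    # Female suffixes: -aja and -va
    if s.endswith(("aja", "va")):
        return "female"
    # Male suffix: -v
    if s.endswith("v"):
        return "male"
    return None
```

For example, `gender_from_surname("Ivanova", "rus")` yields `"female"` and `gender_from_surname("Ivanov", "rus")` yields `"male"`, while a surname without a listed suffix is left undecided.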
A substantial number of researchers lack gender estimates, typically because no full first name was available.
Fields are defined at the ASJC-2 level and inferred from the source (journal or conference proceeding) in which the scientist publishes. If a source is assigned to multiple fields—as is often the case—the scientist is considered active in each of them.
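As a rough sketch of the field assignment, a scientist's fields are the union of the fields of every source they publish in. The source names and field labels below are placeholders, not real Scopus sources or ASJC codes, and the mapping is a hypothetical in-memory table.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical source-to-field mapping; names and labels are placeholders.
SOURCE_FIELDS: Dict[str, List[str]] = {
    "Journal A": ["F1", "F2"],   # one source assigned to two fields
    "Proceedings B": ["F3"],
}


def fields_of(sources: Iterable[str]) -> Set[str]:
    """A scientist is active in every field of every source they publish in."""
    fields: Set[str] = set()
    for source in sources:
        fields.update(SOURCE_FIELDS.get(source, []))
    return fields
```

A scientist who publishes only in "Journal A" is therefore counted in both F1 and F2, which means the per-field counts in this dataset do not sum to the number of distinct scientists.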
Variables
- year: Year of the publication
- field: ASJC-2 field
- female: Number of authors estimated to be female who published in that year and field
- male: Number of authors estimated to be male who published in that year and field
- unknown: Number of authors without a gender estimate who published in that year and field
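To illustrate how rows with these variables could be built and used, the sketch below aggregates toy author-level records into per-year, per-field counts and derives a female share. The records are invented, the function names are hypothetical, and it is an assumption that each author is counted once per year and field regardless of how many papers they published.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Toy records: (author_id, year, field, gender). All values are invented.
RECORDS = [
    ("a1", 2000, "F1", "female"),
    ("a1", 2000, "F1", "female"),   # second paper, same year and field
    ("a2", 2000, "F1", "male"),
    ("a3", 2000, "F1", "unknown"),
    ("a1", 2000, "F2", "female"),   # same author, second field
]


def aggregate(records: Iterable[Tuple[str, int, str, str]]) -> Dict[Tuple[int, str], Dict[str, int]]:
    """Count distinct authors per (year, field), split by estimated gender."""
    seen = set()
    counts = defaultdict(lambda: {"female": 0, "male": 0, "unknown": 0})
    for author, year, field, gender in records:
        if (author, year, field) in seen:
            continue  # assumption: one count per author, year, and field
        seen.add((author, year, field))
        counts[(year, field)][gender] += 1
    return dict(counts)


def female_share(row: Dict[str, int]) -> Optional[float]:
    """Share of women among authors with a gender estimate, or None if there are none."""
    known = row["female"] + row["male"]
    return row["female"] / known if known else None
```

Excluding `unknown` from the denominator is one possible choice; because many researchers lack an estimate, any share computed this way describes only authors with a known estimate.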