September 18, 2025

Harmony Thrive

Superior Health, Meaningful Life

Geographical analysis of malignant tumor incidence and treatment in China

Malignant tumors and hospitals

Identification of malignant tumors and hospitals

The research data are derived from the publicly disclosed information of one of China’s most representative malignant tumor mutual assistance platforms. The platform has been operating since October 16, 2018, with an average membership of over 80 million per session [31]. By the end of March 2022, the platform had disclosed a total of 202,092 cases. This study used an automated case query tool to capture screenshots of the disclosed materials and text recognition software to extract the textual information.

The types of malignant tumors covered by the mutual assistance platform are the same as those commonly covered by malignant tumor disease insurance. Tumors are classified by the body site where they first occur. This study selects the six most common types: respiratory system, digestive system, breast, reproductive system, urinary system, and endocrine system malignant tumors. These six types account for 17.45%, 13.97%, 12.13%, 6.50%, 6.38%, and 4.43% of all cases in the sample, respectively, or 60.86% in total, so they are highly representative.

The location of malignant tumor occurrence is defined as the patient’s usual place of residence. The study area consists of prefecture-level (and above) regions, excluding Hong Kong, Macao, and Taiwan as well as regions with small populations and few cases. Analyses of proportion-based indicators include only the 308 regions with more than 30 cases (Fig. 1). Hospital names in the selected data were standardized, yielding 6,432 hospitals that issued diagnostic certificates to patients in the sample. The prefecture-level (and above) regions in which these hospitals are located were identified as the treatment locations, 348 regions in total [32].

Fig. 1. Proportions of six types of malignant tumors in prefecture-level regions, October 2018 to March 2022. Note: The base map is the standard map with approval number GS(2020)4630 issued by the Ministry of Natural Resources, and the map boundary was not modified. To better reflect the spatial variation in the proportion of each tumor type, each map in Fig. 1 uses its own classification intervals (cutoffs). This makes intra-type spatial clustering clearly visible even for tumor types with lower national average proportions (e.g., urinary and endocrine systems). The color scales are therefore not uniform across maps: they are optimized for each tumor type’s data distribution and show spatial patterns within a type rather than magnitudes that can be compared across types. This is a common approach in thematic mapping.

Tumor classification scheme

In accordance with the International Classification of Diseases, 10th Revision (ICD-10) and the GLOBOCAN classification system used by the International Agency for Research on Cancer (IARC), we mapped the six major categories of malignant tumors to specific anatomical sites as follows (Table 1): Respiratory system cancers (ICD-10: C30–C39, together with C10 and C11): including nasal cavity (C30), nasopharynx (C11), oropharynx (C10), larynx (C32), trachea (C33), bronchi and lungs (C34); Digestive system cancers (ICD-10: C15–C26): including esophagus (C15), stomach (C16), small intestine (C17), colon (C18), rectum (C20), liver (C22), gallbladder (C23), pancreas (C25); Breast cancers (ICD-10: C50): female and male breast malignancies; Reproductive system cancers (ICD-10: C51–C58 for females; C60–C63 for males): including cervical cancer (C53), ovarian cancer (C56), endometrial cancer (C54), prostate cancer (C61), testicular cancer (C62); Urinary system cancers (ICD-10: C64–C68): including kidney (C64), renal pelvis (C65), ureter (C66), bladder (C67); Endocrine system cancers (ICD-10: C73–C75): including thyroid gland (C73), adrenal gland (C74), other endocrine glands (C75). Each case was assigned to one of these broad categories based on the diagnostic information disclosed on the mutual aid platform. ICD codes were approximated from tumor anatomical site names and expert mapping.
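The site-to-category mapping listed above can be held in a small lookup table. The following is a minimal sketch, not the authors' tooling: the category keys, the site labels, and the `classify` helper are illustrative names, and only the sites explicitly named in the text are included rather than every code in each range.

```python
# Category -> {anatomical site: ICD-10 code}, transcribed from the list above.
# Only explicitly named sites are included; the dictionary keys are illustrative.
TUMOR_CATEGORIES = {
    "respiratory": {"nasal cavity": "C30", "nasopharynx": "C11", "oropharynx": "C10",
                    "larynx": "C32", "trachea": "C33", "bronchi and lungs": "C34"},
    "digestive": {"esophagus": "C15", "stomach": "C16", "small intestine": "C17",
                  "colon": "C18", "rectum": "C20", "liver": "C22",
                  "gallbladder": "C23", "pancreas": "C25"},
    "breast": {"breast": "C50"},
    "reproductive": {"cervix": "C53", "ovary": "C56", "endometrium": "C54",
                     "prostate": "C61", "testis": "C62"},
    "urinary": {"kidney": "C64", "renal pelvis": "C65", "ureter": "C66",
                "bladder": "C67"},
    "endocrine": {"thyroid gland": "C73", "adrenal gland": "C74",
                  "other endocrine glands": "C75"},
}

# Inverted index: three-character ICD-10 code -> category.
CODE_TO_CATEGORY = {
    code: category
    for category, sites in TUMOR_CATEGORIES.items()
    for code in sites.values()
}


def classify(icd10_code):
    """Return the category for a code such as 'C34' or 'C34.1', or None if unlisted."""
    return CODE_TO_CATEGORY.get(icd10_code.strip().upper()[:3])
```

Codes that are not in the table return `None`, which mirrors the exclusion of cases whose site cannot be mapped.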

Table 1 Composition of six major tumor categories by ICD-10 code and sample size.

Inclusion and exclusion criteria

The dataset used in this study was obtained from publicly disclosed diagnostic cases on a national malignant tumor mutual aid platform. The inclusion criteria were as follows:

  1. Patients with a clearly indicated diagnosis of malignant tumor, as determined by a physician and supported by a diagnostic certificate;

  2. Tumors classified as malignant based on standard clinical definitions (ICD-10 codes), regardless of histological subtype;

  3. Patients with available data on residence location and hospital of diagnosis;

  4. Cases diagnosed between October 2018 and March 2022.

The exclusion criteria were:

  1. Patients with missing or ambiguous tumor site data that could not be mapped to ICD-10 codes;

  2. Duplicate entries from the same patient;

  3. Benign, borderline, or in-situ tumors;

  4. Pediatric cases under 18 years old (due to different pathophysiological characteristics and treatment patterns).

Importantly, due to the aggregated and anonymized nature of the data disclosed by the platform, histological subtypes (e.g., adenocarcinoma vs. squamous cell carcinoma in lung cancer) were not available for analysis. Therefore, the classification of cancers in this study is based on anatomical site only, as is common in population-based cancer surveillance systems (e.g., GLOBOCAN, National Cancer Registry).

Hospital classification and grouping

In our study, a total of 6,432 hospitals were identified based on diagnostic certificates issued to patients. We classified these institutions into three broad categories based on publicly available hospital registration and naming information:

  1. Specialised cancer hospitals (n = 183): institutions exclusively dedicated to oncology treatment and research;

  2. General hospitals with dedicated oncology departments (n = 2346): large tertiary or secondary hospitals that operate oncology departments within a multidisciplinary setting;

  3. General or local hospitals without clear oncology departments (n = 3903): hospitals where cancer treatment is provided, but oncology is not clearly distinguished as a separate unit.

Due to the nature of the data obtained from the mutual aid platform, some classifications are inferred from hospital names (e.g., “Tumor Hospital,” “Oncology Department,” or known tertiary-level classification) and may contain minor uncertainties. However, all included institutions have officially issued tumor-related diagnostic certificates and therefore meet the criterion of providing oncology care. The classification is shown in Supplementary Table S1.

Malignant tumor occurrence intensity by region

The intensity of a given type of malignant tumor in a region is usually measured by its proportion among all malignant tumors in that region. This paper defines the proportion of malignant tumor k in region i as the number of cases of malignant tumor k in region i divided by the number of cases of all malignant tumors in region i. Gender and age are two basic variables that affect disease [33]. Therefore, to isolate the explanatory power of geographical factors more robustly, gender and age effects are removed before the analysis. The intensity of each of the 6 types of malignant tumors in each region, after removing gender and age effects, is calculated as follows:

$$\begin{aligned} & y\_Remove\,gender\,and\,age\,factors_{i}^{k} = y_{i}^{k} - y\_Baseline_{i}^{k} \\ & \quad = y_{i}^{k} - \sum\limits_{Gender,Age} \left( w_{i,Gender,Age} \times y_{Gender,Age}^{k} \right) \end{aligned}$$

(1)

To control for differences in demographic structure and better isolate the effect of geographical factors, we applied a method analogous to direct standardization, a commonly used approach in spatial epidemiology and health geography. Specifically, we reweighted the sex- and age-specific tumor incidence in each region according to the national population distribution across four age groups and both sexes. This allowed us to calculate the adjusted tumor proportion by region while removing demographic bias [34]. A detailed step-by-step example of this adjustment is provided in Supplementary Table S2 for clarity and reproducibility.

In the formula: \(y_{i}^{k}\) is the proportion of malignant tumor k in region i; \(y\_Baseline_{i}^{k}\) is the proportion of malignant tumor k that region i would have if each of its gender and age groups followed the national occurrence pattern, and it is jointly determined by \(w_{i,Gender,Age}\) and \(y_{Gender,Age}^{k}\); \(w_{i,Gender,Age}\) is the share of all cases in region i that fall in a given gender and age group; \(y_{Gender,Age}^{k}\) is the corresponding national gender- and age-specific proportion of malignant tumor k. Gender is divided into male and female, and age is divided into 4 groups: < 40 years old, 40–50 years old, 50–60 years old and ≥ 60 years old.
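Equation (1) can be illustrated with a short sketch that follows the formula as written: regional gender-age case shares are multiplied by national gender-age-specific proportions to form the baseline, which is then subtracted from the observed proportion. The record layout `(region, gender, age_group, tumor_type)` and the function name are assumptions made for illustration; the worked example in Supplementary Table S2 remains the authoritative description.

```python
from collections import Counter, defaultdict


def adjusted_intensity(cases, tumor):
    """Return {region: y_i^k - y_Baseline_i^k} for one tumor type k.

    cases: iterable of (region, gender, age_group, tumor_type) tuples
           (a hypothetical layout, one tuple per disclosed case).
    """
    stratum_total = Counter()   # national cases per (gender, age) stratum
    stratum_tumor = Counter()   # national cases of tumor k per stratum
    region_total = Counter()    # all cases per region
    region_tumor = Counter()    # cases of tumor k per region
    region_stratum = defaultdict(Counter)  # cases per region and stratum

    for region, gender, age_group, tumor_type in cases:
        stratum = (gender, age_group)
        stratum_total[stratum] += 1
        region_total[region] += 1
        region_stratum[region][stratum] += 1
        if tumor_type == tumor:
            stratum_tumor[stratum] += 1
            region_tumor[region] += 1

    # y^k_{Gender,Age}: national proportion of tumor k within each stratum.
    national = {s: stratum_tumor[s] / n for s, n in stratum_total.items()}

    adjusted = {}
    for region, n_cases in region_total.items():
        observed = region_tumor[region] / n_cases            # y_i^k
        baseline = sum(                                       # sum of w * y^k
            (count / n_cases) * national[stratum]
            for stratum, count in region_stratum[region].items()
        )
        adjusted[region] = observed - baseline
    return adjusted
```

A positive value means the region has more cases of the tumor type than its gender and age composition alone would predict, and a negative value means fewer.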

Geographical factors

This study analyzes the impact of 8 natural geographical factors and 6 human geographical factors on the occurrence of malignant tumors, based on existing theories and literature, as well as the availability of data at the prefecture level.

Natural geographical factors

Average Elevation and Terrain Relief. Altitude directly determines air pressure, influences climate conditions, and affects lung function and overall health. Altitude data are obtained from the DEM (Digital Elevation Model) data of terrain surface morphology released by the Ministry of Natural Resources’ Data Service Center and processed using ArcGIS 10.8 (Esri). Terrain relief affects the comfort of the natural environment, population density, and related conditions, which in turn influence disease occurrence [35]. The terrain relief data follow the work of Feng Zhiming et al. Average elevation and terrain relief are based on measurements from January 2020.

Meteorological Factors: (1) January average temperature: affects the respiratory system, lung function, thyroid hormone levels, etc.; (2) July average temperature: affects the respiratory system, lung function, thyroid hormone levels, etc.; (3) Annual precipitation: affects the respiratory and digestive systems, among others; (4) Relative humidity: affects the respiratory system, etc.; (5) Annual sunshine hours: affects the respiratory system [17]; (6) Average wind speed: affects lung function, respiratory diseases, etc. Meteorological data are obtained from the surface weather station observations released by the National Meteorological Information Center through the China Meteorological Data Network (https://data.cma.cn). For each sample region, the data come from the weather station nearest to the region’s geometric center. Considering the birth years of patients in the sample, the meteorological variables are averages over 1970 to 2021.
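The station-matching step can be sketched as a nearest-neighbour search from each region's geometric center. The text does not state which distance measure was used, so the great-circle (haversine) distance below is an assumption, and the function names and the station dictionary layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(centroid, stations):
    """Return the name of the station closest to a region's geometric center.

    centroid: (lat, lon) of the region's geometric center.
    stations: {station_name: (lat, lon)} (hypothetical layout).
    """
    lat, lon = centroid
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))
```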

Human geographical factors

(1) Income Level: Income level reflects the overall economic capacity of a region and may influence the development of healthcare infrastructure, the availability of medical services, and regional investment in early detection and public health initiatives. In this study, regional income level is measured by per capita Gross Domestic Product (GDP), collected from the China City Statistical Yearbook (various years) published by the National Bureau of Statistics.

(2) Urbanization Rate: Urbanization affects working and resting conditions, population density, and the accessibility of early intervention measures, which in turn affect health. Studies have found significant differences in health and disease status between urban and rural residents [36]. The urbanization rate of a region is measured by the proportion of the urban population in the total population.

(3) Education Level: Education affects individuals’ thinking and behaviors. Education level has a significant relationship with diseases of the digestive system, endocrine system, reproductive system, etc. [37]. Given the relatively low level of health literacy among Chinese residents and the large regional disparities, education level is more likely to influence health behaviors and health status. The education level of a region is measured by the number of college students per 10,000 people, sourced from the Ministry of Education’s national higher education database and city-level yearbooks.

(4) Public Health Resources: Diseases can be prevented and controlled, and an important aspect of public health is the control of chronic diseases and malignant tumors. The significant impact of public health resources on residents’ health status has been supported by several studies [38]. Public health resources in a region are measured by the number of hospital and health center beds per 10,000 people, obtained from the China Health Statistical Yearbook.

(5) Water Pollution: Water pollution affects water quality and soil, and in turn influences foodborne diseases, digestive system diseases, reproductive system diseases, and overall public health. Water pollution in a region is measured by per capita industrial wastewater discharge, collected from the China Statistical Yearbook on Environment published by the Ministry of Ecology and Environment of China.

(6) Air Pollution: Air pollution has been a topic of significant concern in recent years and has a notable impact on respiratory system diseases, digestive system diseases, reproductive system diseases, and population health [39]. Air pollution in a region is measured by per capita industrial sulfur dioxide (SO2) emissions, collected from the China Statistical Yearbook on Environment published by the Ministry of Ecology and Environment of China.

Research methods

Spatial autocorrelation test

A rigorous method to determine the spatial correlation of variables is to perform a spatial autocorrelation test [40]. The global Moran’s I index is calculated for the occurrence proportion of each type of malignant tumor, and its formula is as follows:

$$I = \frac{\sum\nolimits_{i = 1}^{N} \sum\nolimits_{i^{\prime} = 1}^{N} w_{ii^{\prime}} \left( y_{i} - \overline{y} \right)\left( y_{i^{\prime}} - \overline{y} \right)}{\left( \sum\nolimits_{i = 1}^{N} \sum\nolimits_{i^{\prime} = 1}^{N} w_{ii^{\prime}} \right) \sum\nolimits_{i = 1}^{N} \left( y_{i} - \overline{y} \right)^{2} / N},\quad \overline{y} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} y_{i},\quad i \ne i^{\prime}$$

(2)

In the formula: \(w_{{ii^{\prime } }}\) indicates whether regions \(i\) and \(i^{\prime }\) are geographically adjacent (0 or 1). The \(w_{{ii^{\prime } }}\) values for all pairs of regions form the spatial weight matrix; \(y_{i}\) and \(y_{{i^{\prime } }}\) represent the proportions of malignant tumor cases of a particular type in regions \(i\) and \(i^{\prime }\) among all malignant tumor cases in those regions, respectively; \(\overline{y}\) is the average of \(y_{i}\) across all regions; N is the number of regions in the sample. The value of Moran’s I index ranges from -1 to 1. A higher value indicates a stronger positive spatial correlation of the variable at the national level.
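As an illustration of equation (2), the sketch below computes the global Moran’s I from a list of regional proportions and a binary adjacency matrix. It is a plain-Python rendering of the formula for small inputs, not the software used in the study; the function name and the nested-list matrix layout are assumptions.

```python
def morans_i(y, w):
    """Global Moran's I for values y and a binary adjacency matrix w.

    y: list of regional proportions, length N.
    w: N x N nested list with w[i][j] = 1 if regions i and j are adjacent, else 0.
    Pairs with i == j are skipped, matching the condition i != i' in the formula.
    """
    n = len(y)
    mean = sum(y) / n
    dev = [value - mean for value in y]

    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cross_product = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    weight_sum = sum(w[i][j] for i, j in pairs)
    variance = sum(d * d for d in dev) / n

    return cross_product / (weight_sum * variance)
```

For four regions arranged in a chain, steadily increasing values give a positive index, while values that alternate between neighbours give a negative one.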

Geographic detector

This study uses the factor detection method from geographic detectors [41] to analyze the explanatory power of individual geographical factors on the occurrence of malignant tumors. The detection power is denoted as q, and its calculation formula is as follows:

$$q = 1 - \sum\limits_{h = 1}^{L} \frac{N_{h} \sigma_{h}^{2}}{N\sigma^{2}} = 1 - \frac{SSW}{SST}$$

(3)

In the formula: h represents a specific stratum of the factor, with each factor in this study divided into L = 10 strata, so h = 1,2,…,10; \(N_{h}\) and \(\sigma_{h}^{2}\) represent the sample size (number of regions) and variance within stratum h, respectively; N and \(\sigma^{2}\) represent the total sample size (total number of regions) and overall variance, respectively; SSW is the within-stratum sum of squares, \(\sum\nolimits_{h = 1}^{L} N_{h} \sigma_{h}^{2}\); SST is the total sum of squares, \(N\sigma^{2}\). The value of q lies in [0, 1]. The higher the value, the stronger the association between the factor and the occurrence of malignant tumors.
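Equation (3) can be sketched as follows. The stratum label of each region is taken as an input because the text does not state how each factor was cut into 10 strata; the function name is illustrative. Population variances are used, so that \(N_{h} \sigma_{h}^{2}\) equals the sum of squared deviations within stratum h.

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Factor detector q = 1 - SSW / SST.

    y:      list of tumor occurrence intensities, one per region.
    strata: list of stratum labels for one factor, aligned with y.
    """
    n = len(y)
    overall_mean = sum(y) / n
    sst = sum((value - overall_mean) ** 2 for value in y)  # N * sigma^2

    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)

    ssw = 0.0
    for values in groups.values():
        stratum_mean = sum(values) / len(values)
        ssw += sum((value - stratum_mean) ** 2 for value in values)  # N_h * sigma_h^2

    return 1 - ssw / sst
```

When the strata separate the values perfectly, q equals 1; when every stratum has the same mean as the whole sample, q equals 0.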

Multiple regression analysis

To overcome the multicollinearity issue and identify the most significant factors, this study uses the stepwise regression method [42] in its backward form. Initially, all influencing factors are included as independent variables in the regression model. The factor with the highest p value is removed and the model is re-estimated, and the process continues until all remaining factors have p values smaller than 0.1. The regression model is as follows:

$$y_{i}^{k} = \alpha^{k} + X1\_std_{i}^{{{\prime } }} \beta^{k} + X2\_std_{i}^{{{\prime } }} \gamma^{k} + \varepsilon_{i}^{k}$$

(4)

In the formula: \(y_{i}^{k}\) represents the occurrence intensity of malignant tumor type k in region i; \(X1\_std_{i}^{\prime }\) represents the natural geographical factors and \(X2\_std_{i}^{\prime }\) represents the human geographical factors. Both sets of factors have been standardized so that the magnitudes of the coefficients of different variables can be compared directly; \(\beta^{k}\) and \(\gamma^{k}\) are the coefficients to be estimated for the natural and human geographical factors, respectively; \(\alpha^{k}\) is the constant term; \(\varepsilon_{i}^{k}\) is the random error term.

To further evaluate potential multicollinearity among the geographic variables, we calculated the Variance Inflation Factor (VIF) for all independent variables included in the regression models. As shown in Supplementary Table S3, all VIF values were below 5, indicating no serious multicollinearity problems.
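The backward elimination procedure described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, ordinary least squares is solved through the normal equations, and p values use a normal approximation to the t distribution, which is reasonable only for samples of a few hundred regions. A statistical package would normally be used instead.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(y, cols):
    """Fit y on an intercept plus cols; return {name: (coefficient, p_value)}.

    Assumes the fit is not exact (a zero residual variance would divide by zero).
    """
    names = list(cols)
    n, p = len(y), len(names) + 1
    x = [[1.0] + [cols[name][i] for name in names] for i in range(n)]
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(x[i][a] * beta[a] for a in range(p)) for i in range(n)]
    s2 = sum(r * r for r in resid) / (n - p)  # residual variance
    fit = {}
    for j, name in enumerate(names, start=1):
        z = beta[j] / math.sqrt(s2 * inv[j][j])
        # Two-sided p value under the normal approximation (an assumption).
        fit[name] = (beta[j], math.erfc(abs(z) / math.sqrt(2)))
    return fit


def backward_eliminate(y, cols, threshold=0.1):
    """Drop the factor with the highest p value until all p values are below threshold."""
    cols = dict(cols)
    while cols:
        fit = ols(y, cols)
        worst = max(fit, key=lambda name: fit[name][1])
        if fit[worst][1] < threshold:
            return fit
        del cols[worst]
    return {}
```

The returned dictionary holds the retained factors with their coefficients and p values; an empty dictionary means no factor met the threshold.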

Theil index and its regional decomposition

The Theil index is used to measure the geographical inequality of the “treatment-to-incidence ratio” of malignant tumors in China. The calculation formula is as follows [43]:

$$Theil = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\frac{{Z_{i} }}{{\overline{Z} }}} \ln \frac{{Z_{i} }}{{\overline{Z} }}$$

(5)

In the formula: \(Z_{i}\) represents the “treatment-to-incidence ratio” of malignant tumors in region i; \(\overline{Z}\) is the average “treatment-to-incidence ratio” of malignant tumors across all regions; N is the number of regions. The larger the value of the Theil index, the higher the degree of inequality. China’s geographical divisions are quite distinct, and geographical inequality at the national level includes both within-region and between-region disparities [44]. The Theil index at the national level is therefore decomposed into two parts, with the calculation formula as follows:

$$\begin{aligned} Theil &= Theil_{Within-region} + Theil_{Between-region} \\ &= \sum\limits_{j = 1}^{J} \left( \frac{N_{j}}{N} \times \frac{\overline{Z}_{j}}{\overline{Z}} \right) Theil_{j} + \sum\limits_{j = 1}^{J} \left( \frac{N_{j}}{N} \times \frac{\overline{Z}_{j}}{\overline{Z}} \right) \ln \frac{\overline{Z}_{j}}{\overline{Z}} \end{aligned}$$

(6)

In the formula: j indexes the broad geographical regions into which the prefecture-level regions are grouped; J represents the number of these broad regions; \(N_{j}\) represents the number of prefecture-level regions within broad region j; N represents the total number of prefecture-level regions in the country; \(\overline{Z}_{j}\) represents the average “treatment-to-incidence ratio” of malignant tumors in broad region j; \(\overline{Z}\) represents the average “treatment-to-incidence ratio” of malignant tumors across all regions in the country. The within-region part additionally uses \(Theil_{j}\), the Theil index of the “treatment-to-incidence ratio” of malignant tumors computed over the prefecture-level regions within broad region j, which reflects the differences among those regions. The between-region part depends only on the region means \(\overline{Z}_{j}\) relative to \(\overline{Z}\).
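Equations (5) and (6) can be checked numerically with a short sketch. The function names and the group labels are illustrative, and the convention that a zero ratio contributes zero to the sum is an assumption introduced here to avoid taking the logarithm of zero.

```python
import math
from collections import defaultdict


def theil(z):
    """Theil index of a list of non-negative ratios, as in equation (5)."""
    mean = sum(z) / len(z)
    # A zero ratio contributes 0 (limit of x * ln x as x -> 0); assumption.
    return sum((v / mean) * math.log(v / mean) for v in z if v > 0) / len(z)


def theil_decomposition(z, groups):
    """Return (within, between) components of the Theil index, as in equation (6).

    z:      treatment-to-incidence ratios, one per prefecture-level region.
    groups: broad-region label for each entry of z (illustrative labels).
    """
    n = len(z)
    overall_mean = sum(z) / n
    by_group = defaultdict(list)
    for value, label in zip(z, groups):
        by_group[label].append(value)

    within = between = 0.0
    for values in by_group.values():
        group_mean = sum(values) / len(values)
        if group_mean == 0:
            continue  # the weight of such a group is zero
        weight = (len(values) / n) * (group_mean / overall_mean)
        within += weight * theil(values)
        between += weight * math.log(group_mean / overall_mean)
    return within, between
```

The two components sum to the overall index returned by `theil`, which gives a simple consistency check on any grouping.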
