J Han / N Uberoi (@1.57) vs Y Zhang / Y Zhao C (@2.25)
10-09-2019

Our Prediction:

J Han / N Uberoi will win

J Han / N Uberoi – Y Zhang / Y Zhao C Match Prediction | 10-09-2019 01:00

These sequences were validated against the additional sequences of the GRCh38 reference genome and previously published human genome assemblies [2, 26, 27, 28, 29, 30]. Most of the sequences (30.07 Mb) could be fully or partially aligned to these genomes at 90% identity (Fig. 3f and Table 2). Overall, only 646.23 Kb could not be aligned to any of these genomes, which indicates that the vast majority of the fully unaligned sequences are valid human DNA sequences. In particular, 24.10 Mb (78.46%) could be aligned to the YH genome [26], the first assembled genome of an Asian individual, and 24.37 Mb (79.35%) could be aligned to the KOREF genome [29], which is from a South Korean individual. To our surprise, the percentage of sequences that could be aligned to the HX1 genome [2], which is from a Chinese individual (48.0%), is lower than that for the HuRef genome (55.3%) [28]. Only 8.64 Mb (28.11%) could be aligned to GM12878 [30], which is of European origin, indicating that a significant portion of these 30.72 Mb of sequences may be Chinese-specific or East Asian-specific.

DNA contamination from other organisms may lead to imprecise outcomes and should be considered in any sequencing project [32]. This is particularly important when we focus on the non-reference sequences. There are several possible sources of contaminants, such as the biological source and DNA present in reagents or instruments [33]. To obtain high-confidence non-reference sequences derived from the human genome rather than from contamination, we proposed a strict filtering step to drop as many potential contaminant sequences as possible. We used a local alignment method to classify and exclude the sequences labeled as microorganisms or non-primate eukaryotes. The major source of non-human sequences was microorganisms, and the majority of the remaining sequences were labeled as human.

Disclosure of conflict of interest

Lung cancer is one of the most frequently diagnosed cancers and the leading cause of cancer death worldwide [1]. It is divided into two classes: non-small cell lung cancer (NSCLC) and small cell lung cancer. NSCLC includes adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and other cell types. Lung adenocarcinoma is the most common type of lung cancer and has been increasing in recent years. Smoking is the most common cause of lung cancer overall, but lung adenocarcinoma is the most frequently occurring cell type in nonsmokers, and its pathogenesis remains unclear. Although many treatments are available, the prognosis of lung cancer is still poor. The 5-year survival of all lung cancer patients is only approximately 16% [2].

The cell membrane is involved in many biological functions, including small-molecule transport, cell-cell and cell-substrate recognition and interaction, and cell signal transduction and communication [10,11]. Membrane proteins account for approximately 30% of the whole cell proteome and are known to be involved in cell proliferation, cell adhesion, and tumor cell invasion. They are also pivotal to the development, growth, angiogenesis, and metastasis of tumors [12-14]. Analyzing membrane proteomes may therefore help us understand carcinogenic mechanisms and promote the discovery of new potential tumor biomarkers and therapeutic targets.

Building pan-genome sequences from individual assemblies is another challenging task. We adopted a strategy based on a well-assembled and well-annotated reference genome. In the HUPAN pipeline, we focused on two types of non-reference sequences: fully unaligned sequences and partially unaligned sequences. Fully unaligned sequences are defined as contigs with no alignment to the reference sequence, while partially unaligned sequences are defined as contigs with at least one alignment and at least one unaligned fragment longer than a defined threshold (default, 500 bp). To obtain non-reference sequences from individual genomes, contigs that could not be aligned to the GRCh38 primary assembly sequence (with an identity cutoff of 90%) were collected for each individual. Due to the large size of the human genome, running this process directly with QUAST [35] is time-consuming and requires a huge amount of memory (Table 1). To speed up this step, we developed a two-step strategy: discarding the contigs highly similar to the reference genome, followed by extracting non-reference sequences (Additional file 1: Supplementary methods). After obtaining individual non-reference sequences, we merged them and removed redundant sequences with CD-HIT [36] at an identity cutoff of 90%. We discarded those sequences whose best matches were microorganisms (bacteria, fungi, archaea, and viruses) or non-primate eukaryotes (all plants and non-primate animals), which could reflect possible contamination (Additional file 1: Supplementary methods).
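The distinction between fully and partially unaligned contigs can be illustrated with a minimal sketch. It assumes that each contig's alignments to the reference are already available as half-open (start, end) intervals in contig coordinates. The function names and the "aligned" label for all remaining contigs are hypothetical and are not taken from the HUPAN code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in contig coordinates


def unaligned_fragments(contig_len: int, alignments: List[Interval]) -> List[Interval]:
    """Return the stretches of the contig not covered by any alignment."""
    gaps = []
    pos = 0  # end of the region covered so far
    for start, end in sorted(alignments):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < contig_len:
        gaps.append((pos, contig_len))
    return gaps


def classify_contig(contig_len: int, alignments: List[Interval],
                    min_unaligned: int = 500) -> str:
    """Label a contig following the two definitions in the text.

    min_unaligned mirrors the stated default threshold of 500 bp; a fragment
    must be strictly longer than it to count.
    """
    if not alignments:
        return "fully_unaligned"
    gaps = unaligned_fragments(contig_len, alignments)
    longest_gap = max((end - start for start, end in gaps), default=0)
    if longest_gap > min_unaligned:
        return "partially_unaligned"
    return "aligned"  # hypothetical label for everything else
```

For example, a 2000 bp contig with a single alignment covering positions 0 to 1200 leaves an 800 bp unaligned fragment and would be labeled partially unaligned under this sketch.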

Methods

Verification of membrane protein purification using Western blotting analysis. T-M and T-C indicate the solution of membrane proteins and the solution of cytoplasm proteins in tumor tissue, respectively. N-M and N-C indicate the solution of membrane proteins and the solution of cytoplasm proteins in normal lung tissue, respectively. Blots were probed with anti-Na+/K+-ATPase for the plasma membrane and anti-prohibitin for mitochondria.

We first optimized the assembly parameters based on simulation data (Additional file 1: Supplementary methods and Table S2). Then, we conducted de novo assembly for the 185 newly sequenced Han Chinese genomes using all reads (see the Methods section). We selected SGA instead of SOAPdenovo2 [25] due to its high assembly quality and distinctly low memory consumption. As a result, the average size of the 185 assembled genomes was 2,720,566,559 ± 7,126,135 bp and the average contig N50 was 8042 ± 387 bp (Additional file 1: Figure S1).

S100 proteins form a subfamily of the Ca2+-binding EF-hand superfamily and appear to be involved in the regulation of many cellular processes (e.g., cell cycle progression, differentiation, cell-cell communication, intracellular signaling, energy metabolism) [35,36]. The study by Lukanidin [37] showed that the S100 family is pivotal in cell migration, invasion, and cancer metastasis. S100A14 is a novel member of the S100 protein family [35].

Additional file

The gene/transcript annotation of the GRCh38 primary assembly sequences was based on GENCODE [37] (Release 26). The annotations of the GRCh38 primary assembly sequences and of the non-reference sequences were independent. If a gene had multiple transcripts, only the transcript with the longest open reading frame (ORF) was selected as the representative. Since all genes located on chromosome Y were absent in all female individuals, we excluded 63 genes on chromosome Y. In total, there are 19,817 protein-coding genes in the annotation database.
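The representative-transcript rule described here can be sketched as follows. This is an illustration only: the Transcript record, the function name, the "chrY" label, and the tie-breaking rule (the first transcript seen wins when ORF lengths are equal) are assumptions, and real input would come from parsing the annotation file.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal record for one annotated transcript."""
    transcript_id: str
    gene_id: str
    chrom: str
    orf_length: int  # ORF length in base pairs


def representative_transcripts(
    transcripts: Iterable[Transcript],
    excluded_chroms: Tuple[str, ...] = ("chrY",),
) -> Dict[str, Transcript]:
    """Keep one transcript per gene: the one with the longest ORF.

    Genes on excluded chromosomes (chromosome Y by default) are skipped.
    """
    best: Dict[str, Transcript] = {}
    for t in transcripts:
        if t.chrom in excluded_chroms:
            continue
        current = best.get(t.gene_id)
        # Strict '>' means the first transcript seen wins a tie (an assumption).
        if current is None or t.orf_length > current.orf_length:
            best[t.gene_id] = t
    return best
```

Keeping a single dictionary keyed by gene makes the selection one pass over the transcripts, which matters little for a toy list but is convenient for a genome-wide annotation.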

Expression levels of S100A14 were determined using the standard SP immunohistochemical technique. Immunohistochemistry (IHC) was performed using the SP9001 Rabbit kit (Zhongshan Jinqiao Biotech Company, Beijing, China) according to the manufacturer's instructions. Paraffin-embedded specimens (each 4 μm) were dewaxed, rehydrated in a series of ethanol solutions, and treated with an antigen retrieval solution in a microwave. Endogenous peroxidase activity was blocked using 3% H2O2 for 10 min. Unspecific staining was blocked for 15 min using normal goat serum. The sections were then incubated with anti-S100A14 antibody (1:200) overnight at 4°C. The primary antibody was replaced with PBS as the negative control. The immunoreaction was visualized using 3,3'-diaminobenzidine (DAB) staining. Sections were counterstained with hematoxylin, dehydrated, and then mounted with coverslips.

All sections were examined microscopically and blindly evaluated by two independent pathologists according to a scoring method described previously by Zhang [23]. At least 5 high-power fields were selected randomly, with >200 cells counted per field. Each specimen was assessed with reference to staining intensity and positively stained area. Staining intensity was graded on the following scale: 0, no staining; 1, light yellow; 2, yellowish brown; 3, brown. The positively stained area was evaluated as follows: 0, no staining; 1, 80% stained positive. The combined staining score (staining intensity times staining area) was then graded as 0, negative immunoreactivity; 1-4, low immunoreactivity; and >4, high immunoreactivity.
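The combined score and its grading can be written as a short sketch. The function names are hypothetical. Because the area scale is incomplete in the text above, the area score is taken here as an already assigned non-negative integer and is not derived from a percentage.

```python
def combined_staining_score(intensity: int, area: int) -> int:
    """Multiply the intensity grade (0-3) by the stained-area score."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    if area < 0:
        raise ValueError("area score must be non-negative")
    return intensity * area


def grade_immunoreactivity(score: int) -> str:
    """Map a combined score to the three grades given in the text."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score == 0:
        return "negative"
    if score <= 4:
        return "low"  # scores 1-4
    return "high"     # scores above 4
```

Under these assumptions, a specimen with intensity 2 and area score 2 has a combined score of 4 and falls in the low immunoreactivity grade, while intensity 3 with area score 2 gives 6 and is graded high.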

To further validate the differential expression of S100A14 identified by iTRAQ labeling and LC-MS/MS, we examined the expression levels of S100A14 in 10 lung adenocarcinoma tissue samples and paired normal lung tissues using Western blotting. As shown in Figure 5, S100A14 was significantly upregulated in lung adenocarcinoma compared with matched normal lung tissue, which confirmed the LC-MS/MS result.