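Outstanding Academic Leaders
Leading the development of AI technology and building the core engine
Shaodian Zhang
Founder / CEO
Ph.D., Senior Engineer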
 

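Shanghai "Thousand Talents Program"; Shanghai Pujiang Talent Program
2017 Forbes Asia 30 Under 30 (healthcare technology category)
2017 Hurun China 30 Under 30 Entrepreneurial Leaders
Distinguished Research Fellow, APEX Data and Knowledge Management Lab, Department of Computer Science, Shanghai Jiao Tong University
Member, Theory and Education Committee, China Medical Informatics Association (CMIA)
Member, Health Informatics Education Committee, Chinese Society of Health Information and Healthcare Big Data
Advisor, Pediatrics Committee, Chinese Society of Health Information and Healthcare Big Data
Expert Member, Informatization Committee, China National Health Association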

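Since 2011 he has worked on research in medical informatics. During his Ph.D. he took part in several projects funded by the U.S. National Science Foundation, the U.S. National Cancer Institute, and other agencies, and contributed to clinical data modeling and natural language processing projects at NewYork-Presbyterian Hospital. He has published more than ten papers in leading international medical informatics journals and conferences, including JAMIA, JBI, and AMIA. He was a Best Student Paper finalist at the American Medical Informatics Association (AMIA) Annual Symposium in both 2014 and 2016, and in 2016 he won the best student paper award in the CPHI subfield of the AMIA symposium. His papers have been cited more than 600 times on Google Scholar (as of October 2018). During his Ph.D. he regularly served as a reviewer for leading journals such as JAMIA, JBI, and JMIR and for leading conferences such as AMIA and ACL. In 2010 and 2012 he was a full-time intern at Microsoft Research Asia and at Microsoft Research in Redmond, working on data mining and natural language processing.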

Fei Wang
Chief Scientist
Ph.D.
 

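Associate Professor, Weill Cornell Medical College, Cornell University
Previously at the University of Connecticut and the IBM T. J. Watson Research Center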

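Dr. Fei Wang received his Ph.D. from the Department of Automation at Tsinghua University in 2008. His doctoral dissertation, "Research on Semi-Supervised Learning Algorithms on Graphs", won the National Excellent Doctoral Dissertation Award in 2011, and research related to it also received a First Prize of the Ministry of Education Natural Science Award and a Second Prize of the State Natural Science Award. His main research interests are data mining and the application of machine learning to medical informatics. He has published more than 200 papers in top international conferences and journals in these areas, with more than 9,000 citations and an h-index of 50. Papers by him or by students he advised were a Best Paper nominee at ICDM 2016, the Best Student Paper at ICDM 2015, a Best Research Paper nominee at ICDM 2010, a Best Research Paper candidate at SDM 2011, and a candidate for the Marco Ramoni best paper award at the AMIA 2014 Summit on Translational Bioinformatics. Dr. Wang also won the Parkinson's disease subtyping data challenge organized by the Michael J. Fox Foundation and received a winner's prize in the NIPS 2017 challenge on classifying genetic variants. He is a recipient of the inaugural research achievement award of the International Conference on Healthcare Informatics (ICHI) and of the U.S. National Science Foundation award for outstanding young scholars. He chairs the AMIA Knowledge Discovery and Data Mining (KDDM) Working Group. He serves as Associate Editor of Artificial Intelligence in Medicine, as Action Editor of Data Mining and Knowledge Discovery, and on the editorial boards of Journal of Health Informatics Research, Smart Health, Pattern Recognition, and IEEE Transactions on Neural Networks and Learning Systems. He has filed more than 40 related patent applications in the United States, 15 of which have been granted.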

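Strong Technical R&D Capability
Supplying the ammunition for AI and laying a solid foundation
300+
A company team of 300+ people
1
No. 1 in the industry by number of published Chinese medical NLP papers
30%
30% of staff hold a master's or doctoral degree
40+
The R&D team has published more than 40 SCI-indexed papers
100+
100+ staff with a medical background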

Selected publications:

 

Shaodian Zhang, Erin O'Carroll Bantum, Jason Owen, Suzanne Bakken, and Noemie Elhadad, Online cancer communities as informatics intervention for social support: Conceptualization, Characterization, and Impact, Journal of the American Medical Informatics Association (JAMIA).

 

Junjie Xing, Kenny Q. Zhu, and Shaodian Zhang, Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text, COLING 2018.

 

Zhenghui Wang, Yanru Qu, Liheng Chen, Jian Shen, Weinan Zhang, Shaodian Zhang, Yimei Gao, Gen Gu, Ken Chen, and Yong Yu, Label-aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition, NAACL 2018.

 

Zhe Jian, Xusheng Guo, Shijian Liu, Handong Ma, Shaodian Zhang, Rui Zhang, Jianbo Lei, A Cascaded Approach for Chinese Clinical Text De-Identification with Less Annotation Effort, Journal of Biomedical Informatics (JBI).


Shaodian Zhang, Tian Kang, Lin Qiu, Weinan Zhang, Yong Yu, and Noemie Elhadad, Cataloguing treatments discussed and used in online autism communities, 2017 International World Wide Web Conference (WWW) (acceptance rate: 17%).

 

Shaodian Zhang, Lin Qiu, Frank Chen, Weinan Zhang, Yong Yu, and Noemie Elhadad, "We Make Choices We Think are Going to Save Us": Debate and Stance Identification for Online Breast Cancer CAM Discussions, 2017 International World Wide Web Conference (WWW).


Erin O'Carroll Bantum, Noemie Elhadad, Jason E. Owen, Shaodian Zhang, Mitch Golant, Joanne Buzaglo, Joanne Stephen, and Janine Giese-Davis, Machine Learning for Identifying Emotional Expression in Text: Improving the Accuracy of Established Methods, Journal of Technology in Behavioral Science.

 

Tian Kang, Shaodian Zhang, Youlan Tang, Gregory W. Hruby, Alexander Rusanov, Noemie Elhadad, and Chunhua Weng, EliIE: An Open-Source Information Extraction System for Clinical Trial Eligibility Criteria, Journal of the American Medical Informatics Association (JAMIA).


Shaodian Zhang, Edouard Grave, Elizabeth Sklar, and Noemie Elhadad, Longitudinal Analysis of Discussion Topics in an Online Breast Cancer Community using Convolutional Neural Networks, Journal of Biomedical Informatics (JBI).

 

Tian Kang, Shaodian Zhang, Xingting Zhang, Dong Wen, and Jianbo Lei, Detecting Negation and Scope in Chinese Clinical Notes using Character and Word Embedding, Computer Methods and Programs in Biomedicine.


Shaodian Zhang and Noemie Elhadad, Factors Contributing to Dropping-out in an Online Health Community: Static and Longitudinal Analyses, AMIA 2016. Best student paper finalist.

 

Shaodian Zhang, Tian Kang, Xingting Zhang, Dong Wen, Noemie Elhadad, and Jianbo Lei, Speculation Detection for Chinese Clinical Notes: Impacts of Word Segmentation and Embedding Models, Journal of Biomedical Informatics (JBI).


Shaodian Zhang, Erin Bantum, Jason Owen, and Noemie Elhadad, Does Sustained Participation in an Online Health Community Affect Sentiment?, AMIA 2014. Best student paper finalist.

 

Noemie Elhadad, Shaodian Zhang, Patricia Driscoll, and Samuel Brody, Characterizing the Sublanguage of Online Breast Cancer Forums for Medications, Symptoms, and Emotions, AMIA 2014.


Xiaohua Liu, Furu Wei, Shaodian Zhang, and Ming Zhou, Named Entity Recognition for Tweets, ACM Transactions on Intelligent Systems and Technology (TIST).

 

Shaodian Zhang and Noemie Elhadad, Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts, Journal of Biomedical Informatics (JBI).


Xiaohua Liu, Shaodian Zhang, Furu Wei, and Ming Zhou, Recognizing Named Entities in Tweets, ACL 2011.

 

Shaodian Zhang, Hai Zhao, Guodong Zhou, and Bao-liang Lu, Hedge Detection and Scope Finding by Sequence Labeling with Normalized Feature Selection, CoNLL 2010.

 

Yanru Qu, Zhenghui Wang, Lin Qiu, Ken Chen, Shaodian Zhang, Yong Yu. Sampled in Pairs and Driven by Text: A New Graph Embedding Framework. WWW'19. 2019.

 


 


 

Xu Min, Bin Yu, Fei Wang. Predictive Modeling of the Hospital Readmission Risk from Patients’ Claims Data Using Machine Learning: A Case Study on COPD. Nature Scientific Reports. To Appear. 2019.

 

Xi Zhang, Jingyuan Chou, Jian Liang, Cao Xiao, Yize Zhao, Harini Sarva, Claire Henchcliffe, Fei Wang. Data-Driven Subtyping of Parkinson’s Disease Using Longitudinal Clinical Records: A Cohort Study. Nature Scientific Reports. 9(1), 797. 2019.

 

Fei Wang, Lawrence Casalino, Dhruv Khullar. Deep Learning in Medicine: Promise, Progress and Challenges. JAMA Internal Medicine. 2018.


Cao Xiao, Ying Li, Inci Baytas, Jiayu Zhou and Fei Wang∗. An MCEM Framework for Drug Safety Signal Detection and Combination from Heterogeneous Real World Evidence. Nature Scientific Reports, 8(1), p.1806. 2018.

 

Fei Wang, Ping Zhang, Nan Cao, Jianying Hu, Robert Sorrentino. Exploring the Associations between Drug Side-Effects and Therapeutic Indications. Journal of Biomedical Informatics (JBI). October, Volume 51, 15-23. 2014.

 

Fei Wang, Lawrence Casalino, Dhruv Khullar. Artificial Intelligence Algorithms for Medical Prediction Should Be Nonproprietary and Readily Available—Reply. JAMA Internal Medicine. To Appear. 2019. (Impact factor 19.989)

 


 

Chang Su, Jie Tong, Yongjun Zhu, Peng Cui, Fei Wang. Network Embedding in Biomedical Data Science. Briefings in Bioinformatics. To Appear. 2018. (Impact factor 6.302)

 


 

Bekhet, Laila R., Yonghui Wu, Ningtao Wang, Xin Geng, Wenjin Jim Zheng, Fei Wang, Hulin Wu, Hua Xu, and Degui Zhi. "A study of Generalizability of Recurrent Neural Network-Based Predictive Models for Heart Failure Onset Risk using a Large and Heterogeneous EHR Data set." Journal of Biomedical Informatics (JBI) (2018).

 

Jian Liang, Kun Chen, Ming Lin, Changshui Zhang, Fei Wang. Robust Finite Mixture Regression for Heterogeneous Targets. Data Mining and Knowledge Discovery (DMKD). 1-52. 2018.

 

Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou. Predictive Modeling in Urgent Care: A Comparative Study of Machine Learning Approaches. JAMIA Open. 2018.

 

Cao Xiao, Tengfei Ma, Adji B. Dieng, David M. Blei, Fei Wang∗. Readmission Prediction via Deep Contextual Embedding of Clinical Concepts. PLoS ONE. 13(4), p.e0195024. 2018.

 

Yongjun Zhu, Olivier Elemento, Jyotishman Pathak, Fei Wang∗. Drug knowledge bases and their applications in biomedical informatics research. Briefings in Bioinformatics. Jan 3. doi: 10.1093/bib/bbx16. 2018. (Impact factor 8.399)

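World-Leading, Mature Chinese Medical Natural Language Processing Technology
Enabling AI to read medical records, distill information, and acquire knowledge the way a physician does
Top-tier commercial performance
A complete technology stack
Leading academic capability
In a human-versus-machine experiment on medical record understanding and information extraction in a real-world setting, the natural language processing engine of Synyi AI (森亿智能), running on its own, improved data entry efficiency several hundredfold and raised accuracy by 9%. Used together with manual abstraction, it helped physicians parse records and capture information 34% faster and 12% more accurately.
*Reference: Improving the efficacy of data entry process for clinical research with an NLP-driven medical information extraction system: a quantitative field research, Shijian Liu, Jiang Han, Lei Fang, Shaodian Zhang, Fei Wang, Handong Ma, Ken Chen, to appear, Journal of Medical Internet Research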

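Parsing algorithms for medical records from 25 clinical departments and for more than a hundred types of examination reports.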

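Recognizes 110 major categories of clinical information and more than 50 types of semantic relations. The knowledge graph contains more than 510,000 concepts and 27 million relations, supports mapping to more than 10 mainstream international medical terminology standards such as SNOMED CT, ICD, MedDRA, ATC, and LOINC, and supports interoperability between domestic and global data.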

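Includes the full NLP suite: word segmentation, named entity recognition, chunking, conjunction analysis, syntactic parsing, semantic parsing, and entity normalization. Each individual component performs above 95%, and end-to-end performance is above 90%.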
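As a rough illustration of how staged NLP components compose, the sketch below chains three of the stages named above (word segmentation, named entity recognition, entity normalization) over a toy lexicon. It is not the engine described on this page: the `Doc` container, the stage functions, the `LEXICON` contents, and the dictionary-based methods are all assumptions made for illustration, and chunking, conjunction analysis, and parsing are left out. I10 (essential hypertension) and E11 (type 2 diabetes mellitus) are real ICD-10 categories.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> (entity type, ICD-10 category).
LEXICON: Dict[str, Tuple[str, str]] = {
    "高血压": ("Diagnosis", "I10"),
    "2型糖尿病": ("Diagnosis", "E11"),
}


@dataclass
class Doc:
    """Carries the text and the annotations added by each stage."""
    text: str
    tokens: List[str] = field(default_factory=list)
    entities: List[dict] = field(default_factory=list)


def segment(doc: Doc) -> Doc:
    """Forward maximum matching; characters not in the lexicon become single tokens."""
    max_len = max(len(word) for word in LEXICON)
    i = 0
    while i < len(doc.text):
        for size in range(min(max_len, len(doc.text) - i), 0, -1):
            piece = doc.text[i:i + size]
            if size == 1 or piece in LEXICON:
                doc.tokens.append(piece)
                i += size
                break
    return doc


def recognize(doc: Doc) -> Doc:
    """Dictionary-lookup NER: tag every token found in the lexicon."""
    for token in doc.tokens:
        if token in LEXICON:
            doc.entities.append({"text": token, "type": LEXICON[token][0]})
    return doc


def normalize(doc: Doc) -> Doc:
    """Entity normalization: attach the standard code to each entity."""
    for entity in doc.entities:
        entity["icd10"] = LEXICON[entity["text"]][1]
    return doc


# Stages run in order; each one reads what earlier stages produced.
PIPELINE: List[Callable[[Doc], Doc]] = [segment, recognize, normalize]


def run(text: str) -> Doc:
    doc = Doc(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc
```

Calling `run("患者有高血压和2型糖尿病")` yields two `Diagnosis` entities carrying the codes `I10` and `E11`. Because every stage consumes the output of the previous one, an early segmentation mistake propagates downstream, which is why component-level and end-to-end figures are reported separately.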

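The transfer learning framework reduces the amount of annotation required by 97% and improves annotation efficiency by more than 30%.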

The Synyi AI team has published dozens of papers in top conferences and journals in artificial intelligence and medical informatics, including JAMA Internal Medicine, Bioinformatics, Briefings in Bioinformatics, Journal of Biomedical Informatics, Journal of the American Medical Informatics Association, Journal of Medical Internet Research, AAAI, KDD, IJCAI, ACL, NAACL, COLING, NIPS, and ICML.
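Outstanding Hospital Information and Medical Data Integration and Governance Capability
A strong backbone for hospitals' digital and intelligent transformation
3871+
More than 3,871 metadata items
140+
Team experience integrating data at 140+ large Grade-A tertiary hospitals
765+
More than 765 rules for data integration, completion, and error correction
5
Data integration 5 times as efficient as traditional methods
10 billion+
More than 10 billion records governed and integrated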
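Leading, Complete Clinical Machine Learning Modeling Technology
Enabling AI to gain insight into medical data and characterize clinical needs
3
Three-in-one automated system
An integrated system for producing, deploying, and monitoring configurable clinical risk assessment models
10+
Support for more than 10 types of machine learning models
For mainstream machine learning models such as Lasso, Naive Bayes, LDA, SVM, RandomForest, DeepForest, XGBoost, LightBoost, CatBoost, and Deep Neural Network, a bio-inspired swarm intelligence search algorithm automatically tunes parameters and selects high-performing models or model ensembles, providing the best machine learning solution for the scenario at hand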
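The page does not say which swarm intelligence algorithm is used. As one common member of that family, the sketch below shows a basic particle swarm search over a box of hyperparameters. Everything in it is an assumption for illustration: the function names, the default coefficients, and the `toy_validation_loss` objective, which stands in for the validation loss of a real model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def particle_swarm_search(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # best point seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best point seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_loss(params: Sequence[float]) -> float:
    """Hypothetical stand-in for a validation loss; its minimum is at (1.0, 6.0)."""
    log_c, depth = params
    return (log_c - 1.0) ** 2 + (depth - 6.0) ** 2
```

A call such as `particle_swarm_search(toy_validation_loss, [(-3.0, 3.0), (1.0, 12.0)])` returns the best parameter vector found and its loss. In a real setting the objective would train and validate a model for each candidate, and the same search could be repeated per model family to compare the best configuration of each.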
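10000+
Screening of more than 10,000 combinations of patient risk factors
Connects directly to thousands of predictors in the electronic medical record system, and analyzes and identifies interactions and nonlinear relationships among them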
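To make "interactions among predictors" concrete, the sketch below expands each patient record with one product term per pair of predictors, a simple and common way to expose pairwise interactions to a model. The predictor names and the helper functions are hypothetical, and the page does not state that this is the method actually used.

```python
from itertools import combinations
from math import comb
from typing import Dict, List


def n_pairwise_terms(n_predictors: int) -> int:
    """Number of distinct predictor pairs."""
    return comb(n_predictors, 2)


def add_pairwise_interactions(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the rows with one product feature per pair of predictors."""
    expanded_rows = []
    for row in rows:
        expanded = dict(row)
        for a, b in combinations(sorted(row), 2):
            expanded[f"{a}*{b}"] = row[a] * row[b]
        expanded_rows.append(expanded)
    return expanded_rows


# Hypothetical predictors for a single patient.
patients = [{"age": 70.0, "creatinine": 1.5, "on_diuretic": 1.0}]
expanded = add_pairwise_interactions(patients)
```

For scale, the number of pairs grows quadratically: `n_pairwise_terms(142)` is 10,011, so even a modest set of predictors produces more than 10,000 candidate pairs, which is why such combinations are screened rather than all kept.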