7 results
  • Abstract: Spatial applications will grow more complex as the volume of spatial data increases rapidly. A suitable data processing and computing infrastructure for spatial applications needs to be established. Over the past decade, the grid has become a powerful computing environment for data-intensive and compute-intensive applications. Integrating grid computing with spatial data processing technology, the authors designed a spatial data processing grid (called SDPG) to address the related problems. This paper examines the requirements of spatial applications and describes the architecture of SDPG, with emphasis on the key technologies for implementing it.

  • Tags: SDPG, grid computing, spatial data processing, GIS, application software
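  • Abstract: As stream data is collected and analyzed more and more frequently, stream processing systems face more design challenges. One challenge is performing continuous window aggregation, which involves intensive computation. When there are many aggregation queries, the system may suffer from scalability problems. The queries are usually similar and differ only in their window specifications. In this paper, we propose collaborative aggregation, which supports sharing aggregates among windows so that repeated aggregate operations can be avoided. Unlike previous approaches, in which aggregate sharing is restricted by the window slide, we generalize aggregation as a series of reductions over multiple values, so the results produced by each reduction step can be shared. The sharing process is formalized in the feed semantics, and we present the compose-and-declare framework, which determines the data sharing logic at very low cost. Experimental results show that our approach delivers an order-of-magnitude performance improvement over state-of-the-art results and has a small memory footprint.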

  • Tags: data streams, stream aggregation, query sharing, continuous queries
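
The idea of sharing partial aggregates among similar window queries can be illustrated with a small sketch. Note that this shows the classic pane-based sharing scheme, not the collaborative aggregation or compose-and-declare framework described in the abstract; the function name `shared_window_sums` and the tick-based window model are assumptions made for illustration only.

```python
from functools import reduce
from math import gcd


def shared_window_sums(values, windows):
    """Answer several sliding-window SUM queries from shared partial sums.

    values:  one number per time tick (hypothetical input model).
    windows: list of (range, slide) pairs, both measured in ticks.
    Returns a dict mapping each (range, slide) to its list of window sums.
    """
    # Pane width: the largest width that evenly divides every range and slide.
    pane = reduce(gcd, [x for w in windows for x in w])

    # Each pane is summed once; trailing ticks that do not fill a pane are ignored.
    partials = [
        sum(values[i:i + pane])
        for i in range(0, len(values) - pane + 1, pane)
    ]

    results = {}
    for rng, slide in windows:
        panes_per_window = rng // pane
        panes_per_slide = slide // pane
        # Every query reuses the same pane sums instead of rescanning raw values.
        results[(rng, slide)] = [
            sum(partials[s:s + panes_per_window])
            for s in range(0, len(partials) - panes_per_window + 1, panes_per_slide)
        ]
    return results


if __name__ == "__main__":
    ticks = list(range(1, 13))
    print(shared_window_sums(ticks, [(4, 2), (6, 2)]))
```

In this scheme the amount of sharing depends on the window ranges and slides having a large common divisor, which is the kind of restriction the abstract says its approach removes.
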
  • Abstract: Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions and the semantic web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is crucially required. For example, subgraph and supergraph queries are important types of graph queries which have many applications in practice. A primary challenge in computing the answers of graph queries is that pair-wise comparisons of graphs are usually hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to be able to efficiently host different types of data such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components which make use of physical properties that are specific to the relational model, such as sortedness, proper join ordering and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph features knowledge which captures metadata and summary features of the underlying graph database. We describe different querying mechanisms which make use of the layer of graph features knowledge to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and the scalability of our techniques.

  • Tags: graph databases, relational queries, processing techniques, relational database management systems, RDBMS, query mechanisms
  • Abstract: Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the data warehouse environment. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly, so it improves the performance of the queries. We also provide the algorithm of the OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.

  • Tags: OLAP, data processing, decision support systems
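  • Abstract: Big data processing is becoming a prominent part of data center computation. However, recent research has shown that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing stems from the enormous number of cache misses and from stalls on dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with micro-benchmarks and with the real-world benchmark HiBench. The micro-benchmark results clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the HiBench results show a 1.21X average speedup at the application level. Both results illustrate that careful hardware/software co-design will improve the memory efficiency of big data processing. Our work has already been integrated into the Intel Distribution for Apache Hadoop.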

  • Tags: data processing, memory systems, direct memory access, benchmarking, Apache, cache
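
The general shape of a slice-and-merge sort can be sketched as follows: sort small slices independently, then merge the sorted runs. This is a minimal illustration of the idea, not the authors' implementation; the function name and the `slice_size` parameter are hypothetical, and Python will not reproduce the cache behavior that the abstract measures with hardware event counts.

```python
import heapq


def slice_and_merge_sort(records, slice_size):
    """Sort `records` by sorting fixed-size slices, then merging them.

    slice_size is a hypothetical tuning knob; in a native implementation it
    would be chosen so that one slice fits in the processor cache.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")

    # Phase 1: sort each slice on its own, touching only a small working set.
    slices = [
        sorted(records[i:i + slice_size])
        for i in range(0, len(records), slice_size)
    ]

    # Phase 2: k-way merge of the sorted slices, reading each one sequentially.
    return list(heapq.merge(*slices))


if __name__ == "__main__":
    print(slice_and_merge_sort([5, 3, 9, 1, 7, 2, 8], slice_size=3))
```

The design choice is to trade one large sort with scattered memory accesses for many small sorts plus a sequential merge pass.
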
  • Abstract: Stream processing applications continuously process large amounts of online streaming data in real time or near real time. They have strict latency constraints. However, the continuous processing makes them vulnerable to any failures, and the recoveries may slow down the entire processing pipeline and break latency constraints. The upstream backup scheme is one of the most widely applied fault-tolerant schemes for stream processing systems. It introduces complex backup dependencies to tasks, which increases the difficulty of controlling recovery latencies. Moreover, when dependent tasks are located on the same processor, they fail at the same time in processor-level failures, bringing extra recovery latencies that increase the impacts of failures. This paper studies the relationship between the task allocation and the recovery latency of a stream processing application. We present a correlated failure effect model to describe the recovery latency of a stream topology in processor-level failures under a task allocation plan. We introduce a recovery-latency aware task allocation problem (RTAP) that seeks task allocation plans for stream topologies that will achieve guaranteed recovery latencies. We discuss the difference between RTAP and classic task allocation problems and present a heuristic algorithm with a computational complexity of O(n log2 n) to solve the problem. Extensive experiments were conducted to verify the correctness and effectiveness of our approach. It improves the resource usage by 15%-20% on average.

  • Tags: stream processing, task allocation, fault-tolerance, upstream
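  • Abstract: Topic modeling is a mainstream and effective technique for processing text data, with wide applications in text analysis, natural language, personalized recommendation, computer vision, and so on. Among all known topic models, supervised latent Dirichlet allocation (sLDA) is acknowledged as a popular and competitive supervised topic model. However, the gradual increase in the scale of datasets makes sLDA increasingly inefficient and time-consuming, and limits its applications to a very narrow range. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make the training process faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework is proposed to support cloud computing and big data processing. The online training capability of PO-sLDA expands the application scope of this approach, making it helpful for real applications with high real-time demands. Validation on two datasets of different sizes shows that the proposed approach has accuracy comparable to sLDA and can efficiently accelerate the training process. Moreover, its good convergence and online training capability make it attractive for analyzing and processing large-scale text data.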

  • Tags: text processing, supervised, parallel, inference, stochastic, model