CTAccel's product launched on Intel Solutions Marketplace (2024-01-08)
CTAccel's solution pairs with the Agilex™ 7 FPGA F-Series to accelerate data center cloud computing (2023-08-17)
CTAccel Image Processing (CIP) acceleration integrates with Intel® Agilex™ FPGA (2023-06-28)
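36Kr has learned that CTAccel Limited (联捷科技), a provider of FPGA acceleration technology and solutions, has announced the completion of its Series A+ funding round. The investor is Junsheng Investment (君盛投资), with Qingtong Capital (青桐资本) acting as financial advisor for the round. The proceeds will mainly fund further product development and market expansion.
Founded in 2016, CTAccel focuses on FPGA-based heterogeneous computing solutions for data center multimedia workloads such as image and video processing. The technology redefines the computing model for data center image processing, offers an end-to-end solution, and improves performance and efficiency by an order of magnitude; it has been granted patents in the United States and China. CTAccel's high-throughput, low-latency FPGA image processing acceleration solutions are now widely deployed in market segments including O2O services, smartphone cloud applications, cloud storage, and online video sites.
With the rapid development of new technologies and applications such as AI, 5G, and IoT, demand for computing power has risen sharply, and the use of dedicated accelerators, chiefly GPUs, FPGAs, and ASICs, keeps growing. Beyond image and video processing, CTAccel's design methodology also applies to other data center workloads such as AI computing, financial computing, and gene sequencing. “Our core capability is identifying, analyzing, and restructuring software algorithms; it is not tied to computation in any single domain. But landing a B2B business, especially an innovative product at the infrastructure layer of the data center, is very difficult and takes a long time, and it requires more than a technical advantage. So succeeding first in one niche is the pragmatic move, and blind expansion would be unwise.”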
From JPEG and WebP to HEIF: an FPGA architecture for real-time image transcoding (2020-02-19)
Text / Harry Yu (俞海乐)
Edited by / LiveVideoStack
1. Introduction

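Hello everyone, I am Harry Yu (俞海乐), founder and CEO of CTAccel Limited. First, many thanks to Xilinx for the invitation to the LiveVideoStackCon audio and video technology conference. Our company is CTAccel (联捷计算科技), currently based in Shenzhen. We mainly do accelerated computing for images and video, especially in the cloud.
1.1 Why we do FPGA acceleration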

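2. Acceleration solutions

CTAccel works on TCO reduction, enhanced throughput, and latency reduction. You can treat the first two as the same thing: TCO falls because compute density per node rises, so when we talk about higher per-node density we are usually talking about TCO. The other route to commercial value is latency reduction. Take an example. Say I work on the first kind of acceleration: a mine truck used to carry 5 tons and make one round trip per hour. Now I invent a 500-ton haul truck that also makes the trip in an hour, carrying 500 tons, so throughput goes up. But some customers have a latency requirement instead: move 1 kilogram of cargo and be back in 1 minute. That calls for a Ferrari. These are two different ways of monetizing acceleration, and anyone who has done accelerated computing knows both. The scenarios for these applications are mainly on the internet: cloud storage, social networks, e-commerce, short video, and so on.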
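The truck-versus-Ferrari comparison can be put into numbers. The following is a minimal sketch in Python that uses only the figures from the analogy (5 tons per hour, 500 tons per hour, 1 kilogram per minute); the `Vehicle` class and the two gain functions are illustrative names, not part of any CTAccel product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    name: str
    payload_kg: float      # cargo moved per round trip
    round_trip_min: float  # minutes per round trip (the "latency")

    @property
    def throughput_kg_per_hour(self) -> float:
        # Cargo delivered per hour if the vehicle shuttles continuously.
        return self.payload_kg * 60.0 / self.round_trip_min


def throughput_gain(new: Vehicle, old: Vehicle) -> float:
    """How many times more cargo per hour the new vehicle moves."""
    return new.throughput_kg_per_hour / old.throughput_kg_per_hour


def latency_gain(new: Vehicle, old: Vehicle) -> float:
    """How many times faster a single delivery comes back."""
    return old.round_trip_min / new.round_trip_min


small_truck = Vehicle("5-ton truck", 5_000, 60)
haul_truck = Vehicle("500-ton truck", 500_000, 60)
sports_car = Vehicle("sports car", 1, 1)

for v in (haul_truck, sports_car):
    print(f"{v.name}: throughput x{throughput_gain(v, small_truck):g}, "
          f"latency x{latency_gain(v, small_truck):g}")
```

The haul truck improves throughput without touching latency, while the sports car improves latency at far lower throughput, which is why the two are sold against different requirements.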
2.1 System Stack

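The system stack is somewhat more technical. Our customers are decoupled from us at the application's acceleration interface layer; the application software includes the common image and video processing frameworks. Below that sit the Service & Driver layer, and below those the acceleration engine, the AFU (Accelerator Function Unit), which is typically what we implement in the FPGA. Whether for images or video, transcoding essentially cannot avoid three steps: decode, pixel-level processing, and encode.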
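The three transcoding steps can be sketched as a simple software pipeline. This is a toy illustration in Python, assuming a raw grayscale byte layout in place of a real codec; `decode`, `downscale_2x`, `encode`, and `transcode` are hypothetical stand-ins and do not reflect the actual AFU interface.

```python
from typing import Callable, List, Sequence

Pixels = List[List[int]]  # toy grayscale image: rows of 0..255 values


def decode(data: bytes, width: int) -> Pixels:
    """Stand-in decoder: interpret the bytes as raw rows of `width` pixels."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


def downscale_2x(img: Pixels) -> Pixels:
    """Pixel-level step: keep every second pixel in both directions."""
    return [row[::2] for row in img[::2]]


def encode(img: Pixels) -> bytes:
    """Stand-in encoder: flatten the rows back into raw bytes."""
    return bytes(p for row in img for p in row)


def transcode(data: bytes, width: int,
              pixel_ops: Sequence[Callable[[Pixels], Pixels]]) -> bytes:
    """Decode, apply each pixel-level operation in order, then encode."""
    img = decode(data, width)
    for op in pixel_ops:
        img = op(img)
    return encode(img)


thumbnail = transcode(bytes(range(16)), width=4, pixel_ops=[downscale_2x])
print(list(thumbnail))
```

In an accelerated deployment the same three stages would run inside the FPGA, with the application calling only the top-level entry point.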
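3. Image acceleration solutions

CTAccel's current product matrix covers the two most mainstream formats, JPEG and WebP. We also support HEIF, which Apple recently announced. We hear that Android phones will also support HEIF as the default storage format next year. A photo that used to take 4 MB now defaults to 2 MB, but that puts pressure on internet sites. They used to decode only JPEG and now have to decode HEIF, and because each HEIF file is smaller, more files arrive at a single server over the same fixed bandwidth. The compute density required of the server is therefore much higher than it was for JPEG decoding. Finally there is Lepton, a rather unusual format invented by Dropbox and mainly used for lossless compression of JPEG files. For now it is mainly Dropbox that uses it; other cloud storage vendors have shown some interest and are all at the evaluation stage.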
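The bandwidth argument is simple arithmetic. Here is a short Python sketch using the 4 MB and 2 MB file sizes from the talk; the 1 Gbit/s ingress link (125 MB/s) is an assumed figure chosen only for illustration.

```python
def files_per_second(link_mb_per_s: float, file_mb: float) -> float:
    """Upper bound on files a saturated link delivers to one server per second."""
    if file_mb <= 0:
        raise ValueError("file size must be positive")
    return link_mb_per_s / file_mb


LINK_MB_PER_S = 125.0  # assumption: 1 Gbit/s of ingress bandwidth per server

jpeg_rate = files_per_second(LINK_MB_PER_S, 4.0)  # 4 MB JPEG photos
heif_rate = files_per_second(LINK_MB_PER_S, 2.0)  # 2 MB HEIF photos

print(f"JPEG: {jpeg_rate} files/s, HEIF: {heif_rate} files/s, "
      f"ratio {heif_rate / jpeg_rate}")
```

Halving the file size doubles the number of decode jobs that reach the server over the same link, before even accounting for the difference in per-file decoding cost between the two formats.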
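3.1 Customers' existing CPU-based solution for JPEG thumbnails

3.2 Challenge: shortening image load time

3.3 JPEG thumbnail performance benchmark

3.4 Using WebP as the thumbnail format in image search

3.5 JPEG-to-WebP performance benchmark

3.6 Apple announces adoption of HEIF

3.7 JPEG-to-HEIF transcoding performance benchmark

3.8 Lossless compression

3.9 JPEG-to-Lepton transcoding performance benchmark

4. Video acceleration solutions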

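We also have an H.264 encoder as a soft-core solution on the U200. No cloud platform currently offers MPSoC-based compute; they are basically all pure FPGA, and development on the U200 is largely channel-driven. Many customers really do need video workloads, but the cloud platforms mostly offer the U200, so we developed a U200-based encoding solution. The customers we have in mind will quite possibly split their heterogeneous computing into a steady-state part and a dynamic part. What do steady-state and dynamic heterogeneous computing mean? Every internet company has peaks and troughs on top of a relatively stable base load, a guaranteed minimum usage. For that part, using an MPSoC, or even an ASIC, is no problem. The FPGA is best suited to the part that bursts above the base: I might use it only a few hours a day, during peak hours, and release the resources when I am done. Apart from AI, cloud platforms rarely deploy ASIC accelerators precisely because they are inflexible. When our users provision resources, Alibaba Cloud's North China region has several availability zones; what if the zone they need has none? The resource may be sold out, or it cannot be reallocated in time, and ASICs are hard to reallocate within a few hours. We think the baseline capacity may well run on ASICs, but the burst capacity still needs programmable resources. Like the CPU, the FPGA is the compute resource that best fits the idea of a resource pool, and of a fully programmable resource pool; that is why the FPGA endures. Then why does the CPU still have such a large market? The CPU's peak performance is the weakest, but its usability and programmability are the best. This offers a hint for the next generation of heterogeneous computing: I believe the CPU will become the platform where functions are created.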
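The split between steady-state and dynamic capacity can be illustrated with a short Python sketch. The hourly demand figures below are made up for the example, and `split_capacity` is a hypothetical helper; taking the daily minimum as the baseline is one simple choice among several.

```python
from typing import List, Tuple


def split_capacity(hourly_demand: List[int]) -> Tuple[int, List[int]]:
    """Split demand into a fixed baseline and an hourly burst on top of it.

    The baseline is the minimum demand over the period (the guaranteed
    usage that fixed hardware could serve); the burst is whatever exceeds
    it in each hour (the part that elastic, programmable capacity serves).
    """
    baseline = min(hourly_demand)
    burst = [d - baseline for d in hourly_demand]
    return baseline, burst


# Made-up demand profile in requests per second, one value per hour.
demand = [100, 100, 120, 300, 500, 300, 120, 100]

baseline, burst = split_capacity(demand)
peak_burst = max(burst)                       # elastic capacity needed at peak
burst_hours = sum(1 for b in burst if b > 0)  # hours in which it is rented

print(f"baseline={baseline}, peak burst={peak_burst}, burst hours={burst_hours}")
```

With this profile, most of the peak capacity is needed for only a few hours, which is the part the talk argues should come from a reprogrammable pool rather than fixed-function hardware.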




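Here are some cases, such as the cloud photo album of a well-known mobile internet company. TCO savings, high performance, high throughput, and low latency all come down to a better user experience. Accelerated computing really is two things: lowering cost and improving experience. Lowering cost means per-node density; improving experience means lowering latency. And not only lowering latency: latency should also be deterministic, with very little fluctuation. A 10 ms latency always stays between 9.5 ms and 10.5 ms and never swings widely; this is the essence of QPS. That said, load latency here is not load latency in the full, strict sense. For pictures it matters less: if one fails to load, the user simply skips it. Deterministic timing is extremely important in finance, because many financial strategies are tightly coupled to latency and timing. Say my strategy is to place an order 100 seconds after detecting a signal; if the timing is off, the whole strategy is off. This, of course, is something customers discovered. Deterministic timing has an even bigger application in another area of FPGA accelerated computing: finance.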
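The difference between low average latency and deterministic latency can be shown with a small Python sketch. The 10 ms target and the 9.5 ms to 10.5 ms band come from the talk; both sample lists are invented for illustration, and the helper names are hypothetical.

```python
import statistics
from typing import Sequence


def peak_to_peak_jitter(samples_ms: Sequence[float]) -> float:
    """Spread between the slowest and fastest observed latency."""
    return max(samples_ms) - min(samples_ms)


def within_band(samples_ms: Sequence[float], target_ms: float,
                tolerance_ms: float) -> bool:
    """True if every sample stays within target +/- tolerance."""
    return all(abs(s - target_ms) <= tolerance_ms for s in samples_ms)


# Invented samples: both sets average 10 ms, only one is deterministic.
deterministic = [9.5, 10.0, 10.5, 10.0, 9.5, 10.5]
bursty = [2.0, 2.0, 2.0, 2.0, 2.0, 50.0]

for name, samples in (("deterministic", deterministic), ("bursty", bursty)):
    print(f"{name}: mean={statistics.mean(samples)} ms, "
          f"jitter={peak_to_peak_jitter(samples)} ms, "
          f"in band={within_band(samples, 10.0, 0.5)}")
```

Both sets report the same mean, so an average alone would hide the outlier that breaks a timing-sensitive strategy.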

CTAccel Receives Series A Funding Led by Intel Capital (2018-11-28)
Hong Kong, November 28, 2018 – CTAccel Limited, a leading company in FPGA-based acceleration technology and solutions, today announced it has received investment in a Series A funding round led by Intel Capital with participation from Ironfire Ventures.
CTAccel will use the investment proceeds to broaden its product portfolio for better and more effective image processing and image analytics solutions development, and to strengthen the company’s global market penetration in North America, Europe and Asia Pacific.
Since 2013, CTAccel’s FPGA experts have been involved in the development of heterogeneous computing solutions for datacenters. The company’s products redefine image processing in datacenters by using patented technology that enables end-to-end solutions that improve performance and energy efficiency by an order of magnitude.
CTAccel solutions are deployed in a wide spectrum of market segments including online-to-offline (“O2O”) services, smartphone cloud applications, cloud storage and online video sites. The company’s FPGA-based accelerator brings high-throughput, low-latency image processing that delivers proven performance improvements while simultaneously reducing costs for customers.
“Our existing and in-development product offerings have positioned CTAccel to be a market leader in FPGA-accelerated solutions,” said Dr. Harry Yu, founder and CEO of CTAccel. “This investment, led by Intel Capital, will speed up our new product development and extend our solutions across FPGA-accelerated datacenter computation while supporting our global expansion.”
Anthony Lin, Vice President and Managing Director of Intel Capital International, said: “Our investment in CTAccel supports Intel’s strategy to accelerate the development of the rapidly growing data economy. We look forward to continuing to work with CTAccel to increase the adoption of FPGA-based datacenter solutions in China and global markets.”
The Series A funding round extends the relationship between CTAccel and Intel and aims to leverage the extensive experience of CTAccel in hardware-software co-design, heterogeneous computing and software engineering to deliver high-performance, high-value solutions to customers.
About CTAccel Limited
CTAccel was founded in March 2016 by a team of FPGA experts from ClusterTech Limited. The team has been involved in the development of FPGA-based heterogeneous computing solutions for the datacenter since 2013. The company’s patented CTAccel Image Processing (CIP) accelerator improves the performance and efficiency of image processing in datacenters by offloading computation from CPU to FPGA. Learn more about CTAccel by visiting www.ct-accel.com.
About Intel Capital
Intel Capital invests in innovative startups targeting artificial intelligence, autonomous vehicles, datacenter and cloud, 5G, next-generation compute and a wide range of other disruptive technologies. Since 1991, Intel Capital has invested US $12.3 billion in 1,544 companies worldwide, and more than 660 portfolio companies have gone public or participated in a merger. Intel Capital curates thousands of business development introductions each year between its portfolio companies and the Global 2000. For more information on what makes Intel Capital one of the world’s most powerful venture capital firms, visit www.intelcapital.com or follow @Intelcapital.
CTAccel To Deliver NGCodec Video Encoding Solutions as Exclusive Agent in China (2018-07-06)
NGCodec, a pioneer in cloud video processing, is partnering with Chinese FPGA acceleration specialist CTAccel to deliver FPGA-based video encoding solutions in China. CTAccel will be the exclusive agent for NGCodec products in both mainland China and Hong Kong.
CTAccel will offer the next generation NGCodec video encoder, the RealityCodec™ H.265/HEVC encoder, using FPGA hardware acceleration for low latency while maximizing video quality, meeting the highest broadcasting standards.
“Opportunities for cloud video encoding are expanding, but traditional software approaches need massive, expensive CPU resources and cannot deliver the video quality or latency required by emerging applications,” said Oliver Gunasekara, Chief Executive Officer and Founder, NGCodec. “CTAccel is experienced and respected in FPGA acceleration and we look forward to offering state of the art video encoding solutions to customers in China through CTAccel.”
CTAccel has offered image processing solutions employing a high-throughput, low-latency FPGA-based accelerator with proven value among customers in China.
“CTAccel is very experienced in FPGA-based acceleration. By becoming NGCodec’s exclusive agent in China, we are making both parties’ solutions more complete. This partnership allows both CTAccel and NGCodec to explore new ways to offer our products to a wide range of customers around the world,” said Harry Yu, CEO and Co-Founder of CTAccel.
NGCodec and CTAccel have already collaborated on several projects and are committed to providing the best FPGA-based solutions to customers.
About NGCodec
NGCodec® has been in passionate pursuit of next generation video compression since 2012. With the support of investors including Xilinx, NGCodec’s agile startup team has created RealityCodec™, a compressor-decompressor technology optimized for ultra-low latency, high-quality applications. Headquartered in Sunnyvale, California, NGCodec leverages FPGA acceleration in the Cloud to lower encoding costs by 10x over traditional CPU encoders. Learn more about NGCodec: https://ngcodec.com/
About CTAccel
CTAccel Ltd. was founded in March 2016 by a team of FPGA experts from ClusterTech Ltd. The company has been involved in the development of FPGA-based heterogeneous computing solutions for the datacenter since 2013. The company’s patented CTAccel Image Processor (CIP) improves the performance and efficiency of image processing in datacenters. Learn more about CTAccel: http://www.ct-accel.com/home-2/
CTAccel Provides High Performance FPGA-based Image Processing Accelerator on AWS F1 (2018-05-18)

CTAccel Image Processor for AWS Cloud (hereinafter CIP for AWS Cloud) is available as an Amazon Machine Image among the Amazon Web Services Community AMIs. CIP for AWS Cloud is an FPGA-based image processing acceleration solution that can greatly improve image processing performance by offloading the computational workload from CPU to FPGA.

Application Scenarios
CTAccel provides a range of solutions for customers with image processing needs. They can be used in many application scenarios, such as JPEG thumbnail, sharpen, main color, watermark, and brightness-contrast.

CIP for AWS Cloud can benefit you by increasing image processing throughput, reducing computational latency and reducing TCO.
➢ Improve the Throughput by 10x
➢ Reduce Latency by 10x
➢ Reduce TCO by 3x

CTAccel Joins Accelize Ecosystem to Make FPGA-Based Image Transcoding Acceleration Available on AccelStore (2018-05-17)
Image Transcoding Accelerator from CTAccel Will Be Available on As-Needed Basis for Cloud and Enterprise Applications
Cloud Expo Asia, Hong Kong — May 16, 2018 — CTAccel announced today that it is partnering with Accelize® to make its FPGA-based CTAccel Image Processor (CIP) available on an as-needed basis on the new AccelStore™ marketplace. CIP is a high-performance image processing accelerator that improves server throughput and latency.
“By joining the Accelize ecosystem, we are making our image transcoding expertise available to a broader audience of cloud and enterprise application developers on all the cloud and enterprise platforms supported by the Accelize framework starting with OVH and AWS,” said Ivan Wong, senior product director of CTAccel. “This partnership with Accelize allows us to explore new ways to offer our products to a wider range of customers whether they are Cloud Application developers or looking for on-premise acceleration solutions in a fast, easy and cost-effective way.”
Similar to other accelerators on AccelStore, CIP will leverage the Accelize RESTful API to enable fast and easy evaluation and deployment in just minutes on any supported platform, starting with Amazon Web Services (AWS) and OVH. The accelerator will also be available on multiple Enterprise FPGA cards for on-premise usage and purchasable under a variety of business models.
“CTAccel has proven the value of its solutions on multiple Cloud Service Providers in Asia, and we are excited to work with them to offer image transcoding acceleration to AccelStore users,” said Stephane Monboisset, vice president of marketing and partnerships for Accelize.
AccelStore is a new, online marketplace of ready-to-use accelerator functions, provided by a growing ecosystem of 3rd party developer companies, that can be seamlessly deployed in high-speed data centers including Amazon Web Services (AWS) and OVH to start, and more to come. Accelize gives IP developers the right framework and support to deploy their FPGA solutions to the Cloud and manages all aspects of distributing and licensing for them. AccelStore makes the online library of FPGA-accelerated functions available to the broad audience of cloud applications developers with easy evaluation, licensing and usage terms.
About Accelize
Accelize, a spinoff of PLDA Group, is a leading provider of Acceleration-as-a-Service, bringing the benefits of FPGA acceleration to cloud and enterprise users. Accelize operates AccelStore, a marketplace of ready-to-use accelerator functions running on FPGA platforms provided by a broad ecosystem of IP providers, design houses and ISVs. Accelize also develops and maintains unique technologies that ease the development of FPGA accelerator functions and their monetization to benefit the entire FPGA supply chain. Its accelerator functions operate on multiple FPGA platforms in Public Cloud, Private Cloud and on premise. For more information, visit www.accelize.com.
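CTAccel CIP is now live on the three major BAT cloud platforms (2017-11-24)
The annual Hangzhou Yunqi Conference (2017) was held on October 11-14 in Yunqi Town, Xihu District. Leading cloud computing, big data, and AI companies from around the world, including Intel, Xilinx, Nvidia, and AMD, gathered there to launch a new generation of heterogeneous computing acceleration cloud platform together with Alibaba Cloud, focusing on the most advanced technologies in high-performance cloud computing, big data applications, and AI innovation. CTAccel (联捷计算科技) has long been committed to research and development in FPGA-based accelerated computing for image processing and analytics, and its core technology has been granted a US patent. At Alibaba Cloud's invitation, CTAccel attended as one of the ecosystem partners of Alibaba Cloud's heterogeneous computing acceleration cloud platform.

(Third from left: Dr. Harry Yu, CEO of CTAccel)
At the conference, CTAccel CEO Dr. Harry Yu expressed strong approval of Alibaba's FPGA cloud service platform. He outlined how an FPGA cloud service can cut the time needed to set up a development environment for FPGA acceleration solutions and remove the difficulties and resistance such solutions face in pre-sales and deployment, such as hardware compliance and operations admission standards. This lets acceleration teams focus on polishing their products and developing core algorithms, which is highly significant.
CTAccel has extensive experience in FPGA-accelerated processing. Its engineers hold master's degrees in science and engineering from well-known universities in China and abroad and have rich R&D experience. After three years of exploration, the team's image acceleration technology delivers seven times the computing performance of a conventional CPU and has been granted a US patent. In image processing workloads, CTAccel's image acceleration product, CIP, can reduce latency by a factor of three, raise concurrency by a factor of three to seven, and cut TCO by a factor of three. CIP offers the strongest image processing capability in the world today, redefines the computing model for data center image processing, and provides the most efficient solution for internet image computing.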
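SC16: the FPGA computing era arrives, and CTAccel's acceleration technology is a highlight on the show floor (2016-11-28)
SC16, the international supercomputing conference held in Salt Lake City, Utah, concluded successfully on November 18. Intel and Xilinx, the two giants of the FPGA industry, both showed their latest FPGA acceleration technologies at the conference, aimed at boosting performance in high-performance computing and internet data centers. The Arria 10 FPGA that Intel showed has floating-point capability and reaches 1000 FPS when running AlexNet, a convolutional neural network (CNN). Meanwhile, Xilinx's Kintex UltraScale FPGA reaches 1800 FPS on fixed-point AlexNet inference. Notably, both used OpenCL to design the neural networks, which is a shot in the arm for the market: it means that programming FPGAs in a high-level language has become reality. From now on FPGA developers will shorten their R&D cycles, and the much-criticized long development cycle will become a thing of the past. Xilinx's OpenCL tool, SDAccel, will be officially released at the end of this year. CTAccel is among the first Xilinx-certified SDAccel design service providers in China.

Figure 1: An old friend of CTAccel, Xilinx SDAccel product director Vinay, in a long conversation with CTAccel's technical director

Figure 2: An IBM Power architecture expert discusses CTAccel's platform technology in depth with CTAccel's technical director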
