Problem
A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:
For a 1-byte character, the first bit is 0, followed by its Unicode code.
For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.
This is how the UTF-8 encoding would work:
Char. number range (hexadecimal) | UTF-8 octet sequence (binary) |
---|---|
0000 0000-0000 007F | 0xxxxxxx |
0000 0080-0000 07FF | 110xxxxx 10xxxxxx |
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
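The table can be read as an encoding recipe: pick the row by code point range, then spread the code point's bits over the x positions. Below is a minimal sketch of that reading in Java. The class name `Utf8EncodeSketch` and the method `encode` are hypothetical names chosen for illustration, and the sketch does not reject the surrogate range (U+D800 to U+DFFF), which the table itself does not mention.

```java
final class Utf8EncodeSketch {
    /**
     * Encodes one code point (0..0x10FFFF) into 1 to 4 byte values,
     * following the four rows of the table. Hypothetical helper.
     */
    static int[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range: " + codePoint);
        }
        if (codePoint <= 0x7F) {
            // 0xxxxxxx
            return new int[] { codePoint };
        }
        if (codePoint <= 0x7FF) {
            // 110xxxxx 10xxxxxx
            return new int[] {
                0xC0 | (codePoint >> 6),
                0x80 | (codePoint & 0x3F)
            };
        }
        if (codePoint <= 0xFFFF) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new int[] {
                0xE0 | (codePoint >> 12),
                0x80 | ((codePoint >> 6) & 0x3F),
                0x80 | (codePoint & 0x3F)
            };
        }
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new int[] {
            0xF0 | (codePoint >> 18),
            0x80 | ((codePoint >> 12) & 0x3F),
            0x80 | ((codePoint >> 6) & 0x3F),
            0x80 | (codePoint & 0x3F)
        };
    }
}
```

For instance, `encode(0x142)` yields the two values 197 and 130, which is the 2-byte character that opens Example 1.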
Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.
Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
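One way to read this note is that any higher bits of an integer are simply ignored. A small sketch of that interpretation follows; the class `LowByteSketch` and the method `lowByte` are hypothetical names, and whether out-of-range inputs should be masked or rejected is an assumption the problem statement leaves open.

```java
final class LowByteSketch {
    /** Keeps only the least significant 8 bits of the value (hypothetical helper). */
    static int lowByte(int value) {
        return value & 0xFF;
    }
}
```

Under this reading, 453 (binary 1 11000101) and 197 (binary 11000101) stand for the same byte.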
Example 1:
data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true. It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.
Example 2:
data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100. Return false. The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character. The next byte is a continuation byte that starts with 10, which is correct. But the second continuation byte does not start with 10, so the sequence is invalid.
Solution
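class Solution {
    public boolean validUtf8(int[] data) {
        if (data == null || data.length == 0) return false;
        for (int i = 0; i < data.length; i++) {
            // Values above 255 do not fit in one byte.
            if (data[i] > 255) return false;
            // Length in bytes of the character that starts at index i.
            int count;
            if (data[i] < 128) {
                count = 1;                                  // 0xxxxxxx
            } else if (data[i] >= 192 && data[i] < 224) {
                count = 2;                                  // 110xxxxx
            } else if (data[i] >= 224 && data[i] < 240) {
                count = 3;                                  // 1110xxxx
            } else if (data[i] >= 240 && data[i] < 248) {
                count = 4;                                  // 11110xxx
            } else {
                // 128..191 is a continuation byte (10xxxxxx) and 248..255 is
                // 11111xxx; neither can start a character.
                return false;
            }
            // The next count-1 bytes must all be continuation bytes (10xxxxxx).
            for (int j = 1; j < count; j++) {
                if (i + j >= data.length) return false;
                if (data[i + j] < 128 || data[i + j] >= 192) return false;
            }
            // Jump to the last byte of this character; the loop's i++ moves past it.
            i = i + count - 1;
        }
        return true;
    }
}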
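
The same checks can also be written with bit masks instead of numeric ranges, which makes the link to the bit patterns in the table more direct. The following is an alternative sketch, not the code above: the class name `Utf8MaskSketch` is hypothetical, it masks each integer to its low 8 bits rather than rejecting large values, it assumes a non-null array, and it treats an empty array as valid, so it differs from the solution in those edge cases.

```java
final class Utf8MaskSketch {
    static boolean validUtf8(int[] data) {
        int pending = 0; // continuation bytes still expected for the current character
        for (int value : data) {
            int b = value & 0xFF; // only the least significant 8 bits carry data
            if (pending > 0) {
                // Inside a multi-byte character: the byte must look like 10xxxxxx.
                if ((b & 0xC0) != 0x80) return false;
                pending--;
            } else if ((b & 0x80) == 0x00) {
                pending = 0; // 0xxxxxxx: 1-byte character
            } else if ((b & 0xE0) == 0xC0) {
                pending = 1; // 110xxxxx: 2-byte character
            } else if ((b & 0xF0) == 0xE0) {
                pending = 2; // 1110xxxx: 3-byte character
            } else if ((b & 0xF8) == 0xF0) {
                pending = 3; // 11110xxx: 4-byte character
            } else {
                return false; // 10xxxxxx or 11111xxx cannot start a character
            }
        }
        // A character that was started must also be finished.
        return pending == 0;
    }
}
```

With the two examples from the problem, `validUtf8(new int[] {197, 130, 1})` returns true and `validUtf8(new int[] {235, 140, 4})` returns false. A sequence that begins with a continuation byte, such as {145, 130, 130}, is rejected as well.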