Application Octet Stream Encoding

8/10/2019

Many Content-Types which could usefully be transported via email are represented, in their 'natural' format, as 8-bit character or binary data. Such data cannot be transmitted over some transport protocols. For example, RFC 821 restricts mail messages to 7-bit US-ASCII data with 1000 character lines.

It is necessary, therefore, to define a standard mechanism for re-encoding such data into a 7-bit short-line format. This document specifies that such encodings will be indicated by a new 'Content-Transfer-Encoding' header field. The Content-Transfer-Encoding field is used to indicate the type of transformation that has been used in order to represent the body in an acceptable manner for transport.

Unlike Content-Types, a proliferation of Content-Transfer-Encoding values is undesirable and unnecessary. However, establishing only a single Content-Transfer-Encoding mechanism does not seem possible. There is a tradeoff between the desire for a compact and efficient encoding of largely-binary data and the desire for a readable encoding of data that is mostly, but not entirely, 7-bit data. For this reason, at least two encoding mechanisms are necessary: a 'readable' encoding and a 'dense' encoding.

The Content-Transfer-Encoding field is designed to specify an invertible mapping between the 'native' representation of a type of data and a representation that can be readily exchanged using 7 bit mail transport protocols, such as those defined by RFC 821 (SMTP). This field has not been defined by any previous standard. The field's value is a single token specifying the type of encoding: 'base64', 'quoted-printable', '8bit', '7bit', 'binary', or an x-token. These values are not case sensitive. That is, Base64 and BASE64 and bAsE64 are all equivalent. An encoding type of 7BIT requires that the body is already in a seven-bit mail-ready representation. This is the default value -- that is, 'Content-Transfer-Encoding: 7BIT' is assumed if the Content-Transfer-Encoding header field is not present.

The values '8bit', '7bit', and 'binary' all imply that NO encoding has been performed. However, they are potentially useful as indications of the kind of data contained in the object, and therefore of the kind of encoding that might need to be performed for transmission in a given transport system. '7bit' means that the data is all represented as short lines of US-ASCII data. '8bit' means that the lines are short, but there may be non-ASCII characters (octets with the high-order bit set). 'Binary' means that not only may non-ASCII characters be present, but also that the lines are not necessarily short enough for SMTP transport.

The difference between '8bit' (or any other conceivable bit-width token) and the 'binary' token is that 'binary' does not require adherence to any limits on line length or to the SMTP CRLF semantics, while the bit-width tokens do require such adherence. If the body contains data in any bit-width other than 7-bit, the appropriate bit-width Content-Transfer-Encoding token must be used (e.g., '8bit' for unencoded 8 bit wide data). If the body contains binary data, the 'binary' Content-Transfer-Encoding token must be used.
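The three unencoded labels can be illustrated with a minimal Python sketch. It is not part of the specification: the function name `classify_body` is hypothetical, the 998-octet limit is an assumption (the 1000-character line limit minus the CRLF), and treating any bare CR or LF as a violation of CRLF semantics is one reading of the text above.

```python
MAX_LINE_OCTETS = 998  # assumption: 1000-character limit minus the trailing CRLF


def classify_body(body: bytes) -> str:
    """Return '7bit', '8bit', or 'binary' for a body that has not been encoded."""
    # A CR or LF that is not part of a CRLF pair breaks SMTP line semantics.
    without_crlf = body.replace(b"\r\n", b"")
    if b"\r" in without_crlf or b"\n" in without_crlf:
        return "binary"
    # Lines that are too long for SMTP transport also force 'binary'.
    if any(len(line) > MAX_LINE_OCTETS for line in body.split(b"\r\n")):
        return "binary"
    # Short lines with any high-order bit set are '8bit'.
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"
```

For instance, `classify_body(b"hello\r\nworld\r\n")` yields '7bit', while the same text with a Latin-1 accented character yields '8bit'.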

NOTE:

The distinction between the Content-Transfer-Encoding values of 'binary,' '8bit,' etc. may seem unimportant, in that all of them really mean 'none' -- that is, there has been no encoding of the data for transport. However, clear labeling will be of enormous value to gateways between future mail transport systems with differing capabilities in transporting data that do not meet the restrictions of RFC 821 transport.

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the '8bit' or 'binary' Content-Transfer-Encoding is actually legal on the Internet. However, in the event that 8-bit or binary mail transport becomes a reality in Internet mail, or when this document is used in conjunction with any other 8-bit or binary-capable transport mechanism, 8-bit or binary bodies should be labeled as such using this mechanism.

NOTE:

The five values defined for the Content-Transfer-Encoding field imply nothing about the Content-Type other than the algorithm by which it was encoded or the transport system requirements if unencoded.

Implementors may, if necessary, define new Content-Transfer-Encoding values, but must use an x-token, which is a name prefixed by 'X-' to indicate its non-standard status, e.g., 'Content-Transfer-Encoding: x-my-new-encoding'. However, unlike Content-Types and subtypes, the creation of new Content-Transfer-Encoding values is explicitly and strongly discouraged, as it seems likely to hinder interoperability with little potential benefit. Their use is allowed only as the result of an agreement between cooperating user agents.
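The case-insensitivity rule, the 7BIT default, and the x-token convention can be combined in a short Python sketch. The helper name `normalize_cte` is hypothetical, and rejecting unknown tokens with a `ValueError` is a design choice made for illustration, not something the text mandates.

```python
STANDARD_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def normalize_cte(value=None):
    """Return the lowercased encoding token, applying the '7bit' default."""
    if value is None:
        # The header field is absent, so 7BIT is assumed.
        return "7bit"
    token = value.strip().lower()  # values are not case sensitive
    if token in STANDARD_ENCODINGS:
        return token
    # Non-standard values must be x-tokens: a name prefixed by 'X-'.
    if token.startswith("x-") and len(token) > 2:
        return token
    raise ValueError("unrecognized Content-Transfer-Encoding: %r" % value)
```

With this sketch, `normalize_cte("bAsE64")` and `normalize_cte("BASE64")` both return 'base64'.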

If a Content-Transfer-Encoding header field appears as part of a message header, it applies to the entire body of that message. If a Content-Transfer-Encoding header field appears as part of a body part's headers, it applies only to the body of that body part. If an entity is of type 'multipart' or 'message', the Content-Transfer-Encoding is not permitted to have any value other than a bit width (e.g., '7bit', '8bit', etc.) or 'binary'.

It should be noted that email is character-oriented, so that the mechanisms described here are mechanisms for encoding arbitrary byte streams, not bit streams. If a bit stream is to be encoded via one of these mechanisms, it must first be converted to an 8-bit byte stream using the network standard bit order ('big-endian'), in which the earlier bits in a stream become the higher-order bits in a byte. A bit stream not ending at an 8-bit boundary must be padded with zeroes. This document provides a mechanism for noting the addition of such padding in the case of the application Content-Type, which has a 'padding' parameter.
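The bit-to-byte conversion described above can be sketched in Python as follows. The function name `pack_bits` and the choice to return the padding count alongside the bytes are assumptions made for illustration.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, earliest bit highest-order.

    Returns (packed_bytes, padding), where padding is the number of zero
    bits appended to reach an 8-bit boundary.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits += [0] * padding  # pad with zeroes on the right
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            # Shifting left first makes earlier bits the higher-order bits.
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out), padding
```

For example, the three-bit stream 1, 0, 1 becomes the single byte 0xA0 with five bits of padding.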

The encoding mechanisms defined here explicitly encode all data in ASCII. Thus, for example, suppose an entity has header fields that declare a Content-Type with charset ISO-8859-1 and a Content-Transfer-Encoding of base64. This should be interpreted to mean that the body is a base64 ASCII encoding of data that was originally in ISO-8859-1, and will be in that character set again after decoding.
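The two-step interpretation (undo the transfer encoding, then apply the character set) can be sketched with Python's standard `email` package. The sample text and the choice of `text/plain` as the Content-Type are assumptions made for this illustration.

```python
import base64
from email import message_from_string

# Hypothetical sample text containing one non-ASCII ISO-8859-1 character.
original = "caf\u00e9 au lait"
encoded = base64.b64encode(original.encode("iso-8859-1")).decode("ascii")

raw_message = (
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded + "\n"
)

msg = message_from_string(raw_message)
octets = msg.get_payload(decode=True)  # step 1: undo base64, giving raw octets
charset = msg.get_content_charset()    # step 2: read the declared character set
text = octets.decode(charset)          # the octets are ISO-8859-1 again
```

The body on the wire is pure ASCII, and only after decoding do the ISO-8859-1 octets reappear.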

The following sections will define the two standard encoding mechanisms. The definition of new content-transfer-encodings is explicitly discouraged and should only occur when absolutely necessary. All content-transfer-encoding namespace except that beginning with 'X-' is explicitly reserved to the IANA for future use. Private agreements about content-transfer-encodings are also explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on certain Content-Types. In particular, it is expressly forbidden to use any encodings other than '7bit', '8bit', or 'binary' with any Content-Type that recursively includes other Content-Type fields, notably the 'multipart' and 'message' Content-Types. All encodings that are desired for bodies of type multipart or message must be done at the innermost level, by encoding the actual body that needs to be encoded.

NOTE ON ENCODING RESTRICTIONS:

Though the prohibition against using content-transfer-encodings on data of type multipart or message may seem overly restrictive, it is necessary to prevent nested encodings, in which data are passed through an encoding algorithm multiple times, and must be decoded multiple times in order to be properly viewed. Nested encodings add considerable complexity to user agents: aside from the obvious efficiency problems with such multiple encodings, they can obscure the basic structure of a message. In particular, they can imply that several decoding operations are necessary simply to find out what types of objects a message contains. Banning nested encodings may complicate the job of certain mail gateways, but this seems less of a problem than the effect of nested encodings on user agents.
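The restriction on multipart and message entities amounts to a small check, sketched here in Python. The function name `encoding_allowed` is hypothetical, and the sketch only inspects the top-level type, which is a simplification.

```python
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def encoding_allowed(content_type: str, encoding: str) -> bool:
    """Return True if the encoding may be used with the given Content-Type."""
    main_type = content_type.split("/", 1)[0].strip().lower()
    if main_type in ("multipart", "message"):
        # Composite types may only carry labels that imply no encoding.
        return encoding.strip().lower() in IDENTITY_ENCODINGS
    return True
```

A user agent could apply such a check before attempting to decode, since a base64-labelled multipart entity would imply a nested encoding.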

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-ENCODING

It may seem that the Content-Transfer-Encoding could be inferred from the characteristics of the Content-Type that is to be encoded, or, at the very least, that certain Content-Transfer-Encodings could be mandated for use with specific Content-Types. There are several reasons why this is not the case. First, given the varying types of transports used for mail, some encodings may be appropriate for some Content-Type/transport combinations and not for others. (For example, in an 8-bit transport, no encoding would be required for text in certain character sets, while such encodings are clearly required for 7-bit SMTP.) Second, certain Content-Types may require different types of transfer encoding under different circumstances. For example, many PostScript bodies might consist entirely of short lines of 7-bit data and hence require little or no encoding. Other PostScript bodies (especially those using Level 2 PostScript's binary encoding mechanism) may only be reasonably represented using a binary transport encoding. Finally, since Content-Type is intended to be an open-ended specification mechanism, strict specification of an association between Content-Types and encodings effectively couples the specification of an application protocol with a specific lower-level transport. This is not desirable since the developers of a Content-Type should not have to be aware of all the transports in use and what their limitations are.

NOTE ON TRANSLATING ENCODINGS

The quoted-printable and base64 encodings are designed so that conversion between them is possible. The only issue that arises in such a conversion is the handling of line breaks. When converting from quoted-printable to base64 a line break must be converted into a CRLF sequence. Similarly, a CRLF sequence in base64 data should be converted to a quoted-printable line break, but ONLY when converting text data.
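A conversion from quoted-printable to base64 can be sketched with Python's standard `quopri` and `base64` modules. The function name `qp_to_base64` is hypothetical, and the sketch assumes the body is text, so every line break in the decoded data is rewritten as CRLF before base64 encoding.

```python
import base64
import quopri


def qp_to_base64(qp_body: bytes) -> bytes:
    """Re-encode a quoted-printable text body as base64."""
    raw = quopri.decodestring(qp_body)  # soft line breaks vanish here
    # Assumption: text data, so normalize LF and CRLF line breaks to CRLF.
    raw = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return base64.encodebytes(raw)      # output wrapped at 76 characters
```

For binary data the line-break rewriting step would be wrong and should be skipped.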

NOTE ON CANONICAL ENCODING MODEL:

There was some confusion, in earlier drafts of this memo, regarding the model for when email data was to be converted to canonical form and encoded, and in particular how this process would affect the treatment of CRLFs, given that the representation of newlines varies greatly from system to system. For this reason, a canonical model for encoding is presented as Appendix H.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly ASCII text, the encoded form of the data remains largely recognizable by humans. A body which is entirely ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined by the following rules:

Rule #1: (General 8-bit representation)

Any octet, except those indicating a line break according to the newline convention of the canonical form of the data being encoded, may be represented by an '=' followed by a two digit hexadecimal representation of the octet's value. The digits of the hexadecimal alphabet, for this purpose, are '0123456789ABCDEF'. Uppercase letters must be used when sending hexadecimal data, though a robust implementation may choose to recognize lowercase letters on receipt. Thus, for example, the value 12 (ASCII form feed) can be represented by '=0C', and the value 61 (ASCII EQUAL SIGN) can be represented by '=3D'. Except when the following rules allow an alternative encoding, this rule is mandatory.

Rule #2: (Literal representation)

Octets with decimal values of 33 through 60 inclusive, and 62 through 126, inclusive, MAY be represented as the ASCII characters which correspond to those octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN through TILDE, respectively).

Rule #3: (White Space)

Octets with values of 9 and 32 MAY be represented as ASCII TAB (HT) and SPACE characters, respectively, but MUST NOT be so represented at the end of an encoded line. Any TAB (HT) or SPACE characters on an encoded line MUST thus be followed on that line by a printable character. In particular, an '=' at the end of an encoded line, indicating a soft line break (see rule #5) may follow one or more TAB (HT) or SPACE characters. It follows that an octet with value 9 or 32 appearing at the end of an encoded line must be represented according to Rule #1. This rule is necessary because some MTAs (Message Transport Agents, programs which transport messages from one user to another, or perform a part of such transfers) are known to pad lines of text with SPACEs, and others are known to remove 'white space' characters from the end of a line. Therefore, when decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.

Rule #4 (Line Breaks)

A line break in a text body part, independent of what its representation is following the canonical representation of the data being encoded, must be represented by a (RFC 822) line break, which is a CRLF sequence, in the Quoted-Printable encoding. If isolated CRs and LFs, or LF CR and CR LF sequences are allowed to appear in binary data according to the canonical form, they must be represented using the '=0D', '=0A', '=0A=0D' and '=0D=0A' notations respectively.

Note that many implementations may elect to encode the local representation of various content types directly. In particular, this may apply to plain text material on systems that use newline conventions other than CRLF delimiters. Such an implementation is permissible, but the generation of line breaks must be generalized to account for the case where alternate representations of newline sequences are used.

Rule #5 (Soft Line Breaks)

The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, 'soft' line breaks must be used. An equal sign as the last character on an encoded line indicates such a non-significant ('soft') line break in the encoded text. Thus a single long unencoded 'raw' line can be represented, in the Quoted-Printable encoding, as several shorter lines, each of which except the last ends in an equal sign. This provides a mechanism with which long lines are encoded in such a way as to be restored by the user agent. The 76 character limit does not count the trailing CRLF, but counts all other characters, including any equal signs.
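Rules #1 through #5 can be combined into a compact encoder, sketched below in Python. This is an illustration under stated assumptions rather than a reference implementation: the names `qp_encode` and `_encode_octet` are hypothetical, the input is assumed to be text already in canonical form with CRLF line breaks, and a soft break is inserted whenever the next token would leave no room for the trailing '='.

```python
MAX_LINE = 76  # Rule #5 limit, not counting the trailing CRLF


def _encode_octet(octet: int) -> str:
    # Rule #2 (literal printable characters) and Rule #3 (TAB and SPACE).
    if 33 <= octet <= 60 or 62 <= octet <= 126 or octet in (9, 32):
        return chr(octet)
    # Rule #1: '=' followed by two uppercase hexadecimal digits.
    return "=%02X" % octet


def qp_encode(data: bytes) -> str:
    """Encode canonical-form text (CRLF line breaks) as Quoted-Printable."""
    out_lines = []
    for line in data.split(b"\r\n"):  # Rule #4: CRLF stays a line break
        tokens = [_encode_octet(octet) for octet in line]
        # Rule #3: white space at the end of a line must use Rule #1.
        if tokens and tokens[-1] in (" ", "\t"):
            tokens[-1] = "=%02X" % ord(tokens[-1])
        current = ""
        for token in tokens:
            # Rule #5: keep one column free for the soft-break '='.
            if len(current) + len(token) > MAX_LINE - 1:
                out_lines.append(current + "=")
                current = ""
            current += token
        out_lines.append(current)
    return "\r\n".join(out_lines)
```

Because tokens are never split, an '=XX' sequence always stays on one encoded line. Isolated CR or LF octets inside a line fall through to Rule #1 and come out as '=0D' or '=0A'.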

Since the hyphen character ('-') is represented as itself in the Quoted-Printable encoding, care must be taken, when encapsulating a quoted-printable encoded body in a multipart entity, to ensure that the encapsulation boundary does not appear anywhere in the encoded body. (A good strategy is to choose a boundary that includes a character sequence such as '=_' which can never appear in a quoted-printable body. See the definition of multipart messages later in this document.)

NOTE: The quoted-printable encoding represents something of a compromise between readability and reliability in transport. Bodies encoded with the quoted-printable encoding will work reliably over most mail gateways, but may not work perfectly over a few gateways, notably those involving translation into EBCDIC. (In theory, an EBCDIC gateway could decode a quoted-printable body and re-encode it using base64, but such gateways do not yet exist.) A higher level of confidence is offered by the base64 Content-Transfer-Encoding. A way to get reasonably reliable transport through EBCDIC gateways is to also quote the ASCII characters according to rule #1. See Appendix B for more information.

Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that the breaks between the lines of quoted printable data may be altered in transport, in the same manner that plain text mail has always been altered in Internet mail when passing between systems with differing newline conventions. If such alterations are likely to constitute a corruption of the data, it is probably more sensible to use the base64 encoding rather than the quoted-printable encoding.

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that is not humanly readable. The encoding and decoding algorithms are simple, but the encoded data are consistently only about 33 percent larger than the unencoded data. This encoding is based on the one used in Privacy Enhanced Mail applications, as defined in RFC 1113. The base64 encoding is adapted from RFC 1113, with one change: base64 eliminates the '*' mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, '=', is used to signify a special processing function.)

NOTE: This subset has the important property that it is represented identically in all versions of ISO 646, including US ASCII, and all characters in the subset are also represented identically in all versions of EBCDIC. Other popular encodings, such as the encoding used by the UUENCODE utility and the base85 encoding specified as part of Level 2 PostScript, do not share these properties, and thus do not fulfill the portability requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single digit in the base64 alphabet. When encoding a bit stream via the base64 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first byte, and the eighth bit will be the low-order bit in the first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 1, below, are selected so as to be universally representable, and the set excludes characters with particular significance to SMTP (e.g., '.', 'CR', 'LF') and to the encapsulation boundaries defined in this document (e.g., '-').

Table 1: The Base64 Alphabet

The output stream (encoded bytes) must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Output character positions which are not required to represent actual input data are set to the character '='. Since all base64 input is an integral number of octets, only the following cases can arise: (1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no '=' padding, (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two '=' padding characters, or (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one '=' padding character.
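The grouping and padding rules can be sketched as a small Python encoder. The function name `b64_encode` is hypothetical, and the alphabet string below is the standard base64 ordering (A-Z, a-z, 0-9, '+', '/'), supplied here because the table itself is not reproduced in this text. Output lines are joined with CRLF, which is an assumption about the surrounding mail format.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode octets as base64 in lines of at most 76 characters."""
    chars = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)
        # Zero bits are added on the right to complete the 24-bit group.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split into four 6-bit indexes, most significant first.
        quad = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Positions that carry no input data are set to '='.
            quad[-missing:] = "=" * missing
        chars.extend(quad)
    encoded = "".join(chars)
    return "\r\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))
```

The three cases listed above correspond to inputs such as b"Man" (no padding), b"M" (two '=' characters), and b"Ma" (one '=' character).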

Care must be taken to use the proper octets for line breaks if base64 encoding is applied directly to text material that has not been converted to canonical form. In particular, text line breaks should be converted into CRLF sequences prior to base64 encoding. The important thing to note is that this may be done directly by the encoder rather than in a prior canonicalization step in some implementations.

NOTE: There is no need to worry about quoting apparent encapsulation boundaries within base64-encoded parts of multipart entities because no hyphen characters are used in the base64 encoding.

commented Jan 5, 2016

Hi, I got this error when I try to resize an uploaded picture.
I printed out the file object I try to resize.

When I read avatarFile, I got [Error: Unsupported MIME type: application/octet-stream]. Do you have any idea about this issue? Thanks.

commented Jan 14, 2016

Hello,
Not the same but something similar here: .write() will throw this error when the folder path doesn't exist. It also happens when I try to write a filename without an extension. I'm just making sure that the folder path is created. Have you tried fs.exists() on that path? Maybe it's a permission problem.
Greetings

commented Jan 21, 2016

@yangshenhuai sorry for the delay getting back to you. I'm mega busy IRL for the next month and a half.

Did you verify what file is (or is not) at avatarFile.path? What the error is telling you is that the file it reads has the application/octet-stream MIME type, and is not a JPEG, PNG or BMP.
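Why a missing or unexpected file ends up reported as application/octet-stream can be illustrated with a content-sniffing sketch in Python. This is not Jimp's detection code: the function name `sniff_mime` and the signature table are assumptions, limited to the three formats named above and their well-known leading bytes.

```python
# Well-known leading bytes ("magic numbers") for the three image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"BM", "image/bmp"),
]


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from leading bytes, with a generic fallback."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Nothing matched: the data is just an opaque stream of octets.
    return "application/octet-stream"
```

Under this kind of detection, an empty file, an HTML error page, or a truncated upload all fall through to the generic type, which would match the symptoms reported in this thread.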

commented Jun 15, 2016

I had the same problem. I managed to solve it by simply rechecking the path and making sure that the directory exists. Make sure that all directories mentioned in the path exist: /var/folders/sd/l6z16nv17txfs7n_x0v2r8r40000gn/T/

Cheers

commented Jul 27, 2016

Error handling could be improved, I guess.

commented Nov 4, 2017

This problem is relevant when you have a buffer that you want to save, and you know the mimetype, but don't have the file name. For example, when you receive form data in an HTTP endpoint. I used multer for that, and it provides me with the buffer and mimetype, so I could set it manually, but there is no config or method for that.

commented Mar 27, 2018

I have the same problem, fb changed some urls today and I am not able to do this:

Jimp.read('https://lookaside.facebook.com/platform/profilepic/?psid=1587355968026286&height=1024')

Error: Unsupported MIME type: application/octet-stream
at Jimp.throwError (/var/app/dvel-api-ocean/service_bot/node_modules/jimp/index.js:61:44)
at Jimp.parseBitmap (/var/app/dvel-api-ocean/service_bot/node_modules/jimp/index.js:410:31)
at /var/app/dvel-api-ocean/service_bot/node_modules/jimp/index.js:259:29
at Request._callback (/var/app/dvel-api-ocean/service_bot/node_modules/jimp/index.js:88:24)

Any idea how to fix this?

commented Mar 27, 2018

@phips28 You could try receiving the data from the URL as a Buffer and then passing to Jimp. For example, using got:

You can also double check that the MIME is appropriate before passing to Jimp by checking data.headers['content-type'] (will not be octet-stream but rather whatever the true type is) but that might be overkill since Jimp will verify when reading the Buffer.

I think that in the future, Jimp should be made able to handle this case naturally without having to go outside of it.

commented Mar 27, 2018 (edited)

@czycha I found the issue.

Facebook returned an HTML page when no headers.user-agent was sent.
data.headers['content-type'] = text/html

After adding the user agent in the defaults, it works with Jimp.

But I found no solution to configure the user agent for Jimp, without forking. Do you know something?

commented Mar 27, 2018

dvel-Inc@fe639c6
this is for the 0.2.28 version.

commented Mar 27, 2018

@phips28 Unfortunately, I don't believe there's a way to modify the default headers for Jimp without forking. As it stands, you can either use your fork or run the request using proper headers and pass the resulting buffer to Jimp like in my example.

commented Apr 13, 2018

I also have a similar problem with the read function.

i am trying to read a buffer
let photo = await jimp.read(file.buffer)

the buffer contains:

but the error says:
Unsupported MIME type: application/octet-stream

it used to work and i didn't change any code.
Any thoughts?

commented Apr 13, 2018

I switched to jimp2 and that seems to work

commented Apr 13, 2018 (edited)

@bpikaar didnt know there is a jimp2

I have to check that out. thx.

edit: jimp2 is 2 years old?

commented Aug 1, 2018

@phips28 @bpikaar can I get a URL that works? All the ones in this thread don't resolve to anything @czycha

added bug and enhancement labels and removed bug-issue label Aug 26, 2018

commented Sep 1, 2018

closing in favor of #267

I have a servlet that sends a file to the browser.

I send these headers in the servlet.

Then I send the file to the browser, but I'm having trouble with the file encoding. The content of the file is UTF-8 but I don't know how to send a header for this.

Does anyone have an idea how I can do this?

Ignacio

2 Answers

There is no need to tell the browser that the file is UTF-8 encoded. By setting the content type to application/octet-stream, you specify that the file must not be interpreted, and may not be plain text at all.

If you absolutely want to declare an encoding, stop declaring the file as application/octet-stream, and declare it as 'text/plain; charset=utf-8' instead.

Martin v. Löwis

Gursel Koca

Tools that I'm using for this:

Chrome, Notepad++, Sublime Text 3, Fiddler, WinMerge, Adobe Acrobat Reader X

Synopsis

I have downloaded a pdf twice: once through Chrome as an experimental control, and once again through a raw GET request via Fiddler, which returns me an octet-stream. To this point, I can save the octet-stream as a pdf and I can get the proper page count and some of the page headers and numbers, but very little of the body content is loading. When I open my file in Adobe Reader X, I get this error:

Cannot extract the embedded font 'LFIDTH+ArialMT'. Some characters may not display or print correctly.

I cannot work out why the font can be extracted from the 'true' pdf but not from the one I am saving.

Details

As for my manual pull of the file, I have provided

Accept: application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf

The server sent me back an application/octet-stream with an attachment Disposition.

So to recap:

Valid Foo.pdf sitting on my hard drive
HTTP Response with an octet-stream version of same file, in UTF-8 encoding (I assume)

Here is what I know:

I pulled the Message Body of the response from the server and dropped it to file. I then ran a WinMerge comparison of it against the contents of the pdf and every line mismatched on line endings. I re-encoded the EOLs for Unix and the diff shrank to ~1k lines out of 160k. A close inspection of the mismatch indicates that the valid pdf maintains what looks like a NUL 00 character in places where my octet-stream contains literal spaces. Also, the 'true' pdf is reporting EOL: LF 1252 Mixed through WinMerge. My 'raw' pdf is reporting 1252 Unix. When I homogenize the 'true' pdf to 1252 Unix, I get the same issue as I explained in the 'raw' one.

Is there anything I can do to get this mess of an octet-stream straightened out?

Note that the pdf that was downloaded through Chrome is historic. I have it on my machine, but I downloaded it 'sometime in the past' and the request headers used when processing that GET are no longer available. Attempting to download through the browser 'now' results in an error, but an explicit GET request against the resource through Fiddler is returning the pdf as an octet-stream.

Well now..

In Fiddler Session,

Right click HTTP Response with the application/octet-stream body | Save | Response | Response Body

If Content-Disposition: attachment;filename has been set on the response, the File Save Dialog will be prepopulated with filename

Easy after you know it's there.
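One plausible reason the hand-saved copy was damaged, not confirmed by the post itself, is that the response body passed through a text decoding and line-ending conversion before being written. The Python sketch below shows how such a round trip alters binary data, and how writing the raw bytes avoids it. The function names, the sample bytes, and the file name are all hypothetical.

```python
import os
import tempfile


def text_round_trip(data: bytes) -> bytes:
    """Simulate saving binary data through a text editor or text-mode copy."""
    text = data.decode("utf-8", errors="replace")  # invalid bytes are replaced
    text = text.replace("\r\n", "\n")              # EOLs get "normalized"
    return text.encode("utf-8")


def save_binary(data: bytes, path: str) -> None:
    """Write the octets exactly as received, with no text interpretation."""
    with open(path, "wb") as handle:
        handle.write(data)


# Hypothetical stand-in for the start of a PDF response body.
sample = b"%PDF-1.4\r\n\xe2\xe3\xcf\xd3\r\n\x00\x01"
damaged = text_round_trip(sample)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Foo.pdf")
    save_binary(sample, path)
    with open(path, "rb") as handle:
        restored = handle.read()
```

Here `damaged` no longer matches `sample`, while `restored` is byte-for-byte identical, which is the property the Fiddler 'Save Response Body' action provides.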

set background image to a specified page with @page selector

php,html,pdf,selector,mpdf

According to the documentation, mPDF supports named @page selectors, so you could do this: <style> @page second { background: url('./mpdf60/bg1.jpg') 50% 0 no-repeat; } </style> and then: div.second { page: second; } and then your second page should be within a div with a second class. Look at the example..

PDF Parsing - Extract single page

pdf,pdf-generation

The first thing you need is a copy of the PDF specification. You can download this for free from the Adobe web site here: http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf In that document, look at section 7.7.3 which explains how the 'Page Tree' works. Basically, a PDF file contains a tree (Adobe suggests it should..

JavaMail MimeMessage.getContent unsupported encoding

email,encoding,javamail

This JavaMail FAQ entry should help: Why do I get the UnsupportedEncodingException when I invoke getContent() on a bodypart that contains text data? ..

What encoding does [BouncyCastle] PKCS10CertificationRequest.getEncoded() return?

java,encoding,cryptography,bouncycastle

Well, despite the fact that someone has seen fit to down-vote the question, I'll post the answer here for posterity. At least for v1.52, org.bouncycastle.pkcs.PKCS10CertificationRequest#getEncoded() is implemented as: public byte[] getEncoded() throws IOException { return certificationRequest.getEncoded(); } This calls org.bouncycastle.asn1.pkcs.CertificationRequest#getEncoded(), which results in the inherited method org.bouncycastle.asn1.ASN1Object#getEncoded(). This method..

Enforce PDF package vignette with knitr

r,pdf,knitr,vignette

When I asked the same question on the knitr google group, Yihui Xie (author of knitr) replied: Use the vignette engine knitr::rmarkdown instead of knitr::knitr. I'm not entirely sure I understand why, but it works. Here is a link to discussion on the knitr google group..

Different output when run on Azure than on local build

c#,asp.net-mvc,azure,encoding

I suspect there is a difference in the local and remote configuration. I would check the environment variables for your web app, and compare them with your local app to see if there's any differences (i.e. CLR version, IIS version). You can check environment variables with the SCM site, which..

create pdf from current page ASP.NET

c#,asp.net,pdf,render

You have to create a string containing all the HTML that you want to see in the pdf, but CSS will not be applied through classes. To apply CSS you have to write inline CSS for all DOM elements. You can do so as below string = 'Pdf Contents in html format'; StringReader sr..

pdf printer that can be controlled by .Net

c#,pdf,printing

BullZip PDFprinter may work without prompt on a single file. See BullZip

Does HTML Encoding have any cons?

asp.net-mvc,razor,encoding,utf-8,xss

I found the solution as using the AntiXSS library for Razor encoderType. This answer describes it well. Special characters in html output The default Razor encoder encodes accented chars whereas the AntiXSS library does not encode them. So, accented chars are rendered as they are..

JSON in Python: encoding issue on OS X, no issue on Windows

python,json,windows,osx,encoding

I get your OSX failure on Windows, and it should fail because writing a Unicode string to a file requires an encoding. When you write Unicode strings to a file Python 2 will implicitly convert it to a byte string using the default ascii codec and fails for non-ASCII characters..

Sending Image From Android to C# webservice

c#,android,web-services,encoding,android-image

I figured it out: I hadn't drawn the view onto a Canvas before sending it to the server. Use this too: Canvas canvas = new Canvas(mBitmap); v.draw(canvas); public void save(View v) { mBitmap = Bitmap.createBitmap(v.getWidth(), v.getHeight(), Bitmap.Config.RGB_565); ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); mBitmap.compress(Bitmap.CompressFormat.JPEG,40, outputStream); byte[] imgByte = outputStream.toByteArray(); String base64Str = Base64.encodeToString(imgByte, Base64.DEFAULT); Canvas canvas..

Is it possible to save an adobe pdf file using selenium web driver and one click build Jenkins

java,pdf,selenium,jenkins

This works with Firefox: Change the Firefox profile used by Selenium (better to create a dedicated profile as described here) via Tools -> Settings -> Applications and change action of file type PDF to 'Save file'. In that case the window asking to open file or save will not show..

char encoding doing mysql insert with callablestatement or jdbctemplate

java,mysql,encoding,jdbctemplate,callable-statement

Try including the encoding in the connection string as well, like this: jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8

Watermark in PDF file is hiding behind images

c#,pdf,itextsharp

For questions like this, please consult The Best iText Questions on StackOverflow. This book bundles hundreds of questions previously posted and answered on StackOverflow, including some answers from our closed issue tracker. This is such an answer that wasn't published on StackOverflow before: If you have opaque shapes in your..

Python Encoding: Open/Read Image File, Decode Image, RE-Encode Image

python,image,encoding,character-encoding

The UnicodeEncodeError is popping up because a jpeg is a binary file and ASCII encoding is for plain text in plain text files. Plain text files can be created with generic text editors like notepad for Windows or nano for Linux. Most will either use ASCII or Unicode encoding. When..

ctrl+G in erl doesn't work

unicode,encoding,utf-8,erlang,docker

Fixed the problem, needed export TERM=linux.

How to display PDF in JSF, with content from ServletResponse

jsf,pdf,jsf-2,richfaces

I don't see anything wrong with your current setup. Most probably the problem lies in your XHTML page and something is causing the event not to fire. Please refer to this post for further details; it should be of some help to you.

Extracting Double Byte Characters/substring from a UTF-8 formatted String

java,string,encoding,utf-8

Thanks to John Kugelman for the help. The solution looks like this now: for(int codePoint : codePoints(string)) { char[] chars = Character.toChars(codePoint); System.out.println(codePoint + " : " + String.copyValueOf(chars)); } With the codePoints(String string)-method looking like this: private static Iterable<Integer> codePoints(final String string) { return new Iterable<Integer>() { public Iterator<Integer>..

How to work around PHP notice raised incorrectly when posting a string that begins with an @?

php,post,curl,encoding

If you receive that message it means you are using PHP 5.5 (this is the version when the CURLFile class was introduced). On the same version they introduced the curl option CURLOPT_SAFE_UPLOAD (which has the default value FALSE on PHP 5.5). All you have to do is to add: curl_setopt(CURLOPT_SAFE_UPLOAD,..

PDF file path is incorrect [duplicate]

java,android,pdf,android-studio,filepath

Replace this line: file = new File(Environment.getExternalStorageDirectory() + "/raw/" + "tirepressuremonitoringsystem3.pdf"); with this line: file = new File("android.resource://com.cpt.sample/raw/tirepressuremonitoringsystem3.pdf"); ..

Prevent Caching of PDFs in ASP Classic

pdf,caching,asp-classic

The code below generates a link with the current time converted to a Double so it produces a random link each time the page is loaded to trick the browser into thinking it is a new pdf. <a href='yourpdffile.pdf?<%= CStr(CDbl(Now)) %>'>Link to the PDF</a> Now is the current time CDbl(Now)..

Adobe LiveCycle Flowed Form page margins

pdf,adobe,livecycle

You will need to configure the pagination of your Subforms using the Object > Pagination palette. You can also use the Keep With Next/Previous option to control the grouping of the objects on page break. Also make sure that your top level subform is set to Flowed.

Add Header and Footer Image in PDF using docraptor c#

c#,pdf,header,footer

At least part of the problem is the use of @@ as opposed to the singular @ when you are handling page and page bottom styles. I changed those to singular and got back a more sane looking PDF from DocRaptor. Here is a minimal example that has images in..

iOS - Differentiate between background text (watermark) and real text in PDF

ios,pdf,cgpdfscanner

In general you have no chance to reliably differentiate between 'background' and 'real' text. Text is drawn somewhere on the page in some order, and what is foreground, background, normal text, .., is a matter of human perception and may not at all be reflected in the structure of the..

Arabic words in invoice pdf print Magento

magento,pdf,tcpdf,arabic

1) Don't edit core files. 2) There are multiple invoice generators in magento: invoice from the backend, invoice on success page etc. So make sure that you are editing and testing the right one. The example below works for invoices in the backend. 3) There may be custom modules that..

Why PDF exported from Report Viewer opens in different zoom size in Adobe Reader

c#,pdf,reporting-services,reportviewer

You can't change the zoom level of generated PDFs in Reporting Services. The zoom is managed by the viewer application, which is Adobe Acrobat Reader in your case. There are no details about it in the documentation because this feature does not exist, but here is a link to a..

Objective C - bold and change string text size for drawing text onto pdf

objective-c,xcode,pdf,size,bold

Solved it by making a separate method as below (I used + since I have this inside an NSObject and it is a class method rather than in a UIViewController): +(void)addText:(NSString*)text withFrame:(CGRect)frame withFont:(UIFont*)font { [text drawInRect:frame withFont:font]; } Outside the method, declaring inputs and calling it: UIFont *font = [UIFont fontWithName:@"Helvetica-Bold"..

Force UTF-8 encoding in inline CSS

css,encoding,utf-8

.close::before{ content:'\u00D6'; } The CSS syntax for Unicode characters does not include the u prefix. The correct syntax is: content:'\00D6'; or simply content:'\D6';. This character notation always refers to a Unicode code point regardless of the character encoding of the (HTML) document that contains it. However, the character '\00D6' refers..

How to display special chars in html [duplicate]

php,html5,encoding,character-encoding

I think your problem might be the encoding of the file. Open it in Notepad++ and change the encoding to UTF-8 (without BOM).

Convert unicode URL to ASCII

php,unicode,encoding,ascii

The following can be used for this transformation: function convertpath ($path) { $path1 = ''; $len = strlen ($path); for ($i = 0; $i < $len; $i++) { if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) { $path1 .= $path[$i]; } else { $path1 .= urlencode ($path[$i]); } } return $path1; } ..

Cyrillic characters in SVG font

javascript,svg,unicode,encoding,fonts

&#x43F; would be п. To get the code point of a Unicode character in JavaScript you can use the String.prototype.codePointAt method; in your case just type this into the developer console: 'п'.codePointAt(0) // 1087 To convert the other way around: String.fromCodePoint(1087) // 'п' The format in your example, &#x.. is a number..

link coming twice while exporting to pdf using itextsharp

pdf,gridview,itext

Your initial question didn't get an answer because it is rather misleading. You claim the link is coming twice, but that's not true. From the point of view, the link is shown as HTML syntax: <a href='http://stackoverflow.com'>http://stackoverflow.com</a> This is the HTML definition of a single link that is stored in the cellText..

How to handle the File hand-off from windows in a python program

python,windows,pdf,file-io

So, yes, Windows passes the file name into the script as one of the sys.argvs. It is (so far as I can tell from printing the value) a file name without the path, that I used to open the file, so that tells me that Windows starts my program with..

PDF Filter used to encode data in iTextSharp

pdf,pdf-generation,itextsharp

iTextSharp supports the filters that are defined in the PDF specification. That means that content streams (e.g. for pages) use /FlateDecode, which is what every other PDF producer will use by default, because that's the standard compression for PDF. Image streams use other filters when applicable, for instance: JPEG images..

Problems with filename encoding on Linux when created through Java

java,linux,encoding

Well, it appears this is a long standing bug in the older File API. I've solved all my problems - with no additional configs of any kind - by switching to the newer java.nio package.

Remove all characters which can't be decoded in Python

python,encoding

I would like to know how to remove all bytes which can't be decoded. Is there a solution? This is simple: with open('filename', 'r', encoding='utf8', errors='ignore') as f: .. The errors='ignore' tells Python to drop unrecognized characters. It can also be passed to bytes.decode() and most other places which..
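As a small illustration of the errors argument mentioned above, here is a sketch using bytes.decode() on a hypothetical byte string that contains two bytes which are not valid UTF-8. The variable names and sample bytes are assumptions made for the example.

```python
# Hypothetical input: valid UTF-8 for "café", then two invalid bytes, then "ok".
raw = b"caf\xc3\xa9 \xff\xfe ok"

# errors="ignore" silently drops bytes that cannot be decoded.
dropped = raw.decode("utf-8", errors="ignore")

# errors="replace" keeps a visible marker (U+FFFD) where decoding failed,
# which can be easier to debug than silently losing data.
replaced = raw.decode("utf-8", errors="replace")

# The default, errors="strict", raises UnicodeDecodeError instead.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Whether dropping or replacing is the better choice depends on whether the lost bytes matter downstream; "ignore" hides the problem, "replace" leaves a trace of it.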

Efficient way of exporting a bunch of PDF's from a user?

php,pdf,cron,wkhtmltopdf

Generate the invoices the moment they are saved in the database and store the filesystem-URL with the database entry. If the user requests a package of invoices of a certain time period, collect the already generated PDFs into a zip-file and provide a download link. Requirement is enough space for..

Will iTextSharp version 5.5.2 run with C#, .net version 3.5 code?

c#,pdf,itextsharp

From what I see in the readme and on Stack Overflow, you should have no problem with 5.5.2. At our company we still use 5.1 or 5.2 and I guarantee that works with .NET 3.5.

Adding an imported PDF to a table cell in iTextSharp

pdf,itextsharp

Please download chapter 6 of my book. It contains two variations on what you are trying to do: ImportingPages1, with as result time_table_imported1.pdf ImportingPages2, with as result time_table_imported2.pdf This is a code snippet: // step 1 Document document = new Document(); // step 2 PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));..

Rails 3 email a wicked pdf issue

ruby-on-rails,pdf,wicked-pdf

See this question. Basically you need to change it to something like: attachments['Report.pdf'] = WickedPdf.new.pdf_from_string( render_to_string(:pdf => 'report',:template => 'costprojects/viewproject') )..

Chinese characters in properties files are not properly decoded [duplicate]

java,spring,jsp,encoding

The problem is that Java properties files are expected to be encoded in ISO-8859-1 (Latin-1) by default. That's a Java requirement. To overcome this you can go two ways: escape the non-Latin-1 characters with \uXXXX Unicode escapes in the properties file: back=Zur\u00FCck (the German word 'Zurück', with its umlaut escaped) or..

Encoding problems in Python - 'ascii' codec can't encode character '\xe3' when using UTF-8

python,encoding,utf-8

Thanks a lot, guys. Here is how I solved the encoding problem with Python 3.4.1: First I inserted this line in the code to check the output encoding: print(sys.stdout.encoding) And I saw that the output encoding was ANSI_X3.4-1968, which stands for ASCII and doesn't support characters like..

PDF as mail attachment - Laravel

laravel,pdf,sendmail

You can only send serializable entities to the queue. This includes Eloquent models etc., but not the PDF view instance. So you will probably need to do the following: Mail::queue('emails.factuur', array('factuur' => $factuur), function($message) use ($factuur) { $pdf = PDF::loadView('layouts.factuur', array('factuur' => $factuur)); $message->to(Input::get('email'), Input::get('naam'))->subject('Onderwerp'); $message->attach($pdf->output()); }); ..

Variable substitution within HEREDOC literal

php,html,pdf,tcpdf

You don't need to close/open quotes for the variable. Use this code instead: $html = <<<EOD <div> <br/> <img src='./tcpdf/pdffirst.png' width='500' height='800' alt=''/> <img src='./charts/$filename-most.png' width='500' height='250' alt=''/> <br/> </div> EOD; ..

Getting hebrew json to php file

php,json,encoding,hebrew

The file content you're getting is in UTF-16 charset. You have to convert it: $content = file_get_contents('http://www.oref.org.il/WarningMessages/alerts.json'); $content=iconv('UTF-16', 'UTF-8', $content); $json = json_decode($content,true); print_r($json); ..
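The same re-encoding step can be sketched in Python (used here only for illustration; the answer itself is PHP). To stay self-contained, the sketch builds a hypothetical UTF-16 payload in memory instead of fetching the URL above, so the sample data and variable names are assumptions.

```python
import json

# Hypothetical payload standing in for the downloaded file content.
original = {"title": "שלום"}

# Simulate the server response: JSON text encoded as UTF-16 (with a BOM).
utf16_bytes = json.dumps(original, ensure_ascii=False).encode("utf-16")

# Step 1: decode the bytes using the charset they are actually in.
# The "utf-16" codec reads the BOM to pick the byte order and strips it.
text = utf16_bytes.decode("utf-16")

# Step 2: re-encode as UTF-8, the equivalent of iconv('UTF-16', 'UTF-8', ...).
utf8_bytes = text.encode("utf-8")

# Step 3: parse the JSON from the UTF-8 data.
parsed = json.loads(utf8_bytes.decode("utf-8"))
```

The key point is the same as in the PHP version: the bytes must first be interpreted in their real charset before they are converted or parsed.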

Django UnicodeDecodeError in model object

django,encoding

filter isn't doing what you want it to here: nivel_obj = Nivel.objects.filter(id=nivel_id) filter returns a queryset, not a single object. You can't use it as the value of the ForeignKey field. I don't yet see why that would raise the exception you're reporting, maybe something not stringifying correctly while it's..

Downloaded octet-stream then encoding as pdf; can't get line endings worked out

pdf,encoding,httpresponse,fiddler

Well now.. In Fiddler Session, Right click HTTP Response with the application/octet-stream body | Save | Response | Response Body If Content-Disposition: attachment;filename has been set on the response, the File Save Dialog will be prepopulated with filename Easy after you know it's there..

How to reorder the pages of a PDF file?

java,pdf,itext

Your formula is wrong. You have: sourcePDFReader.selectPages(String.format("%d-%d, 2-%d", tocStartsPage, totalNoPages-1, tocStartsPage -2)); But that puts your TOC at page 1. That is not what you want according to your description. You want something like this: PdfReader reader = new PdfReader(baos.toByteArray()); int startToc = 13; int n = reader.getNumberOfPages(); reader.selectPages(String.format("1,%s-%s, 2-%s,..

PHP / MySQL: Certain characters not being encoded properly and appearing as question marks

php,mysql,encoding,utf-8,character-encoding

Please specify the character set by adding $conn->set_charset('utf8') ..

I would like to read data from an application/octet-stream charset=binary file with fread on Linux and convert it to UTF-8 encoding. I tried with iconv, but it doesn't support the binary charset. I haven't found any solution yet. Can anyone help me with it?

Thanks.

zuubs

1 Answer

According to the MIME that you've given, you're reading data that's in non-textual binary format. You cannot convert it with iconv or similar, because it's meant for converting text from one (textual) encoding to another. If your data is not textual, then a conversion to any character encoding is meaningless and will just corrupt the data, but not make it any more readable.

The typical way to present binary data as readable text for inspection is a hex dump. There's an existing answer for implementing it in C++: https://stackoverflow.com/a/16804835/2079303
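To show what such a hex dump looks like, here is a minimal sketch in Python (chosen for brevity; it is not the C++ implementation from the linked answer). The function name `hex_dump`, the 16-byte row width, and the sample bytes are assumptions for illustration.

```python
def hex_dump(data, width=16):
    """Return a hex dump of a bytes object, `width` bytes per row."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte as two lowercase hex digits, separated by spaces.
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append("{:08x}  {:<{w}}  |{}|".format(
            offset, hex_part, ascii_part, w=width * 3 - 1))
    return "\n".join(lines)


# Hypothetical binary data: two control bytes, some text, one high byte.
sample = b"\x00\x01Hello\xff"
print(hex_dump(sample))
```

Note that the data is read and displayed as raw bytes throughout; no character decoding is attempted, which is exactly why this works for non-textual content where iconv does not.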

eerorika
