CommonMark.c 0.28.2

cmark is the reference implementation of CommonMark (derived from Markdown) in C. (A JavaScript implementation is also available.) It parses CommonMark documents and renders them as HTML or XML. The intermediate AST representation of a document can also be modified before rendering. Its performance is on par with sundown. The package includes libcmark, a C99-compliant library, and the cmark command-line tool.

Tags c commonmark markdown html-rendering syntax-tree
License BSDL-2
State stable

Recent Releases

0.28.2 17 Oct 2017 05:25 minor feature: Fixed regression in install destination for the static library. Due to a mistake, 0.28.1 installed libcmark.a into include/.
0.28.1 12 Oct 2017 09:45 minor feature: --smart: an open quote can never occur right after ] or ). Fixed quadratic behavior in finalize (Vicent Marti). Don't use CMAKE_INSTALL_LIBDIR to create libcmark.pc. This wasn't getting set in processing libcmark.pc.in, and we were getting the wrong entry in libcmark.pc. The new approach sets an internal libdir variable to lib${LIB_SUFFIX}. This variable is used both to set the install destination and in the libcmark.pc.in template. Update README.md, replace make astyle with make format (Nguyễn Thái Ngọc Duy).
0.28.0 03 Aug 2017 08:45 minor feature: Update spec. Use unsigned integer when shifting (Phil Turnbull). Avoids a UBSAN warning which can be triggered when handling a long sequence of backticks. Avoid memcpy'ing NULL pointers (Phil Turnbull). Avoids a UBSAN warning when link title is empty string. The length of the memcpy is zero so the NULL pointer is not dereferenced, but it is still undefined behaviour. DeMorgan simplification of some tests in emphasis parser. This also brings the code into closer alignment with the wording of the spec (see jgm/CommonMark#467). Fixed undefined shift in commonmark writer. Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/4686992824598528. latex writer: fixed memory overflow. We got an array overflow in enumerated lists nested more than 10 deep with start number =/= 1. This commit also ensures that we don't try to set enum_ counters that aren't defined by LaTeX (generally up to enumv). Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/5546760854306816. Check for NULL pointer in get_link_type (Phil Turnbull). echo ' (xx:)' | ./build/src/cmark -t latex gave a segfault. Move fuzzing dictionary into single file (Phil Turnbull). This allows AFL and libFuzzer to use the same dictionary. Reset bytes after UTF8 proc (Yuki Izumi, #206). Don't scan past an EOL (Yuki Izumi). The existing negated character classes ([^…]) are careful to always include \x00 in the characters excluded, but these . catch-alls can scan right past the terminating NUL placed at the end of the buffer by _scan_at. As such, buffer overruns can occur. Also, don't scan past a newline in HTML block end scanners. Document cases where get_ functions return NULL, e.g. cmark_node_get_url on a non-link or image. Properly handle backslashes in link destinations. Only ASCII punctuation characters are escapable, per the spec. Fixed cmark_node_get_list_start to return 0 for bullet lists, as documented. Use CMARK_NO_DELIM for bullet lists.
code for freeing delimit
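
The two undefined-behaviour fixes described for 0.28.0 follow a common C pattern. The sketch below is illustrative only: bit_at and copy_bytes are hypothetical helpers, not functions from libcmark. They show a shift performed on an unsigned operand and a memcpy that is skipped when the length is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libcmark.
 * Shifting a signed 1 into the sign bit is undefined behaviour; using an
 * unsigned operand keeps the shift defined for any n below the type width. */
static uint32_t bit_at(unsigned n) {
  assert(n < 32);
  return UINT32_C(1) << n;
}

/* Hypothetical helper, not part of libcmark.
 * Passing NULL to memcpy is undefined even when len is 0, so the
 * zero-length case returns before memcpy is reached. */
static void *copy_bytes(void *dst, const void *src, size_t len) {
  if (len == 0) {
    return dst; /* nothing to copy; src may legitimately be NULL here */
  }
  assert(dst != NULL && src != NULL);
  return memcpy(dst, src, len);
}
```

With these helpers, copy_bytes(buf, NULL, 0) is a no-op instead of a UBSAN report, and bit_at(31) yields the top bit without touching a signed type.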
0.27.1 22 Nov 2016 12:25 minor feature: Set policy for CMP0063 to avoid a warning. Put set_policy under cmake version test; otherwise we get errors in older versions of cmake. Use VERSION_GREATER to clean up cmake version test. Improve afl target. Use afl-clang by default. Set default for path.
0.27.0 19 Nov 2016 03:15 minor feature: Update spec to 0.27. Fixed warnings building with MSVC on Windows (#165, Hugh Bellamy). Fixed CMAKE_C_VISIBILITY_PRESET for cmake versions greater than 1.8 (e.g. 3.6.2) (#162, Hugh Bellamy). This lets us build swift-cmark on Windows, using clang-cl. Fix for non-matching entities (#161, Yuki Izumi). Modified print_delimiters (commented out) so it compiles again. make format: don't change order of includes. Changed logic for null/eol checks: only check once for "not at end of line", and check for null before we check for newline characters (the previous patch would fail for NULL + CR). Fix by not advancing past both \0 and \n (Yuki Izumi). Add test for NUL-LF sequence (Yuki Izumi). Fixed memory leak in list parsing (Yuki Izumi). Use cmark_mem to free where used to alloc (Yuki Izumi). Allow a shortcut link before a (. Allow tabs after setext header line (jgm/commonmark.js#109). Don't let URI schemes start with spaces. Fixed h2..h6 HTML blocks. Added regression test. Autolink scheme can contain digits (Gábor Csárdi). Fixed nullary function declarations in cmark.h (Nick Wellnhofer). Fixed strict prototypes warnings. COPYING: Update file name and remove duplicate section and (Peter Eisentraut). Fixed typo (Pavlo Kapyshin).
0.26.1 20 Jul 2016 03:15 minor feature: Removed unnecessary typedef that caused build failure on some platforms. Use $(MAKE) in Makefile instead of hardcoded make (#146, Tobias Kortkamp).
0.26.0 16 Jul 2016 03:15 minor feature: Implement spec changes for list items: empty list items cannot interrupt paragraphs; ordered lists cannot interrupt paragraphs unless they start with 1; removed the "two blank lines break out of a list" feature. Fixed sourcepos for blockquotes. Fixed sourcepos for atx headers. Fixed ATX headers and thematic breaks to allow tabs as well as spaces. Fixed chunk_set_cstr with suffix of current string (#139, Nick Wellnhofer). It's possible that cmark_chunk_set_cstr is called with a substring (suffix) of the current string. Delay freeing of the chunk content to handle this case correctly. Export targets on installation (Jonathan Müller). This allows using them in other cmake projects. Fixed cmake warning about CMP0048 (Jonathan Müller). commonmark renderer: Ensure we don't have a blank line before a code block when it's the first thing in a list item. Change parsing of strong/emph in response to spec changes. process_emphasis now gets better results in corner cases. The change is this: when considering matches between an interior delimiter run (one that can open and can close) and another delimiter run, we require that the sum of the lengths of the two delimiter runs mod 3 is not 0. Ported Robin Stocker's changes to link parsing in jgm/CommonMark#101. This uses a separate stack for brackets, instead of putting them on the delimiter stack. This avoids the need for looking through the delimiter stack for the next bracket. cmark_reference_lookup: Return NULL if reference is null string. Fixed character type detection in commonmark.c (Nick Wellnhofer). Fixes test failures on Windows and undefined behavior: implement cmark_isalpha; check for ASCII character before implicit cast to char; use internal ctype functions in commonmark.c.
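
The delimiter-run rule introduced in 0.26.0 can be written as a small predicate. This is a minimal sketch of the rule exactly as worded in the release note, not the code in cmark's process_emphasis; the delim_run struct and runs_may_match are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a run of * or _ characters. */
typedef struct {
  int length;     /* number of delimiter characters in the run */
  bool can_open;  /* run may open emphasis */
  bool can_close; /* run may close emphasis */
} delim_run;

/* Returns false when the pairing is ruled out by the "mod 3" rule:
 * if either run is interior (can both open and close), the sum of the
 * two lengths must not be a multiple of 3. */
static bool runs_may_match(const delim_run *opener, const delim_run *closer) {
  assert(opener != NULL && closer != NULL);
  bool interior = (opener->can_open && opener->can_close) ||
                  (closer->can_open && closer->can_close);
  if (interior && (opener->length + closer->length) % 3 == 0) {
    return false;
  }
  return true;
}
```

For example, an interior run of length 1 cannot pair with a run of length 2 (1 + 2 = 3), while two runs of length 1 still can.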
0.25.2 30 Mar 2016 09:25 minor feature: Open files in binary mode (#113, Nick Wellnhofer). Now that cmark supports different line endings, files must be opened in binary mode on Windows. Reset partially_consumed_tab on every new line (#114, Nick Wellnhofer). Handle buffer split across a CRLF line ending. Adds an internal field to the parser struct to keep track of last_buffer_ended_with_cr. Added test.
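
The CRLF case matters when input arrives in chunks and a chunk boundary falls between the CR and the LF. The following sketch assumes a simplified line counter rather than cmark's parser; line_counter and line_counter_feed are hypothetical, and only the flag name last_buffer_ended_with_cr is taken from the release note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for counting line endings across several chunks. */
typedef struct {
  size_t line_endings;            /* CR, LF and CRLF each count once */
  bool last_buffer_ended_with_cr; /* previous chunk ended in a bare CR */
} line_counter;

static void line_counter_feed(line_counter *lc, const char *buf, size_t len) {
  size_t i = 0;
  assert(lc != NULL && (buf != NULL || len == 0));
  if (len == 0) {
    return; /* keep the flag: the next non-empty chunk still matters */
  }
  /* A leading LF completes a CRLF whose CR was already counted. */
  if (lc->last_buffer_ended_with_cr && buf[0] == '\n') {
    i = 1;
  }
  lc->last_buffer_ended_with_cr = false;
  for (; i < len; i++) {
    if (buf[i] == '\r') {
      lc->line_endings++;
      if (i + 1 < len && buf[i + 1] == '\n') {
        i++; /* CRLF inside one chunk: skip the LF */
      } else if (i + 1 == len) {
        lc->last_buffer_ended_with_cr = true;
      }
    } else if (buf[i] == '\n') {
      lc->line_endings++;
    }
  }
}
```

Feeding "a\r" and then "\nb\n" counts two line endings, the same as feeding "a\r\nb\n" in one call.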
0.25.1 26 Mar 2016 08:25 minor feature: Release with no code changes. cmark version was mistakenly set to 0.25.1 in the 0.25.0 release, so this release just ensures that this will cause no confusion later.
0.24.1 19 Jan 2016 09:45 minor feature: Commonmark renderer: Use HTML comment, not two blank lines, to separate a list item from a following code block or list. This makes the output more portable, since the "two blank lines" rule is unique to CommonMark. Also, it allows us to break out of a sublist without breaking out of all levels of nesting. is_autolink: handle case where link has no children, which previously caused a segfault. Use 4-space indent for bullet lists, for increased portability. Use 2-space + newline for line break for increased portability. Improved punctuation escaping. Previously all ) and . characters after digits were escaped; now they are only escaped if they are genuinely in a position where they'd cause a list item. This is achieved by changes in render.c: (a) renderer->begin_content is only set to false after a string of digits at the beginning of the line, and (b) we never break a line before a digit. Also, begin_content is properly initialized to true. Handle NULL root in consolidate_text_nodes.
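
The narrower escaping rule in 0.24.1 amounts to asking whether a delimiter directly follows a run of digits that starts the line. The sketch below is a simplified stand-in for the renderer's begin_content bookkeeping; delimiter_needs_escape is a hypothetical function, and it ignores indentation and the spec's other list-marker conditions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does the '.' or ')' at line[pos] follow a run of
 * digits that begins at the start of the line? Only then could it be
 * read as an ordered-list marker and need a backslash escape. */
static bool delimiter_needs_escape(const char *line, size_t pos) {
  size_t i;
  assert(line != NULL && pos < strlen(line));
  if (line[pos] != '.' && line[pos] != ')') {
    return false;
  }
  if (pos == 0) {
    return false; /* no digits before the delimiter */
  }
  for (i = 0; i < pos; i++) {
    if (!isdigit((unsigned char)line[i])) {
      return false; /* something other than digits precedes it */
    }
  }
  return true;
}
```

Under this rule the period in "1. foo" is escaped, while the one in "foo 1. bar" is left alone.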
0.24.0 14 Jan 2016 17:05 minor feature: API change: Added cmark_node_replace(oldnode, newnode). Updated spec.txt to 0.24. Fixed edge case with escaped parens in link destination. This was also checked against the #82 case with asan. Removed unnecessary check for fenced in cmark_render_html. It's sufficient to check that the info string is empty. Indeed, those who use the API may well create a code block with an info string without explicitly setting fenced. Updated format of test/smart_punct.txt. Updated test/spec.txt, test/smart_punct.txt, and spec_tests.py to new format. Fixed get_containing_block logic in src/commonmark.c. This did not allow for the possibility that a node might have no containing block, causing the commonmark renderer to segfault if passed an inline node with no block parent. Fixed string representations of CUSTOM_BLOCK, CUSTOM_INLINE. The old versions raw_inline and raw_block were being used, and this led to incorrect xml output. Use default opts in python sample wrapper. Allow multiline setext header content, as per spec. Don't allow spaces in link destinations, even with pointy brackets. Conforms to latest change in spec. Updated scheme scanner according to spec change. We no longer use a whitelist of valid schemes. Allow any kind of nodes as children of CUSTOM_BLOCK. cmark.h: moved typedefs for iterator into iterator section. This just moves some code around so it makes more sense to read, and in the man page. Fixed make_man_page.py so it includes typedefs again.
0.23.0 30 Dec 2015 13:05 minor feature: API change: Added CUSTOM_BLOCK and CUSTOM_INLINE node types. They are never generated by the parser, and do not correspond to CommonMark elements. They are designed to be inserted by filters that postprocess the AST. For example, a filter might convert specially marked code blocks to svg diagrams in HTML and tikz diagrams in LaTeX, passing these through to the renderer as a CUSTOM_BLOCK. These nodes can have children, but they also have literal text to be printed by the renderer "on enter" and "on exit." Added cmark_node_get_on_enter, cmark_node_set_on_enter, cmark_node_get_on_exit, cmark_node_set_on_exit to API. API change: Rename NODE_HTML -> NODE_HTML_BLOCK, NODE_INLINE_HTML -> NODE_HTML_INLINE. Define aliases so the old names still work, for backwards compatibility. API change: Rename CMARK_NODE_HEADER -> CMARK_NODE_HEADING. Note that for backwards compatibility, we have defined aliases: CMARK_NODE_HEADER = CMARK_NODE_HEADING, cmark_node_get_header_level = cmark_node_get_heading_level, and cmark_node_set_header_level = cmark_node_set_heading_level. API change: Rename CMARK_NODE_HRULE -> CMARK_NODE_THEMATIC_BREAK. Defined the former as the latter for backwards compatibility. Don't allow space between link text and link label in a reference link (spec change). Separate parsing and rendering opts in cmark.h. This change also changes some of these constants' numerical values, but nothing should change in the API if you use the constants themselves. It should now be clear in the man page which options affect parsing and which affect rendering. xml renderer: Added xmlns attribute to document node. Commonmark renderer: ensure html blocks surrounded by blanks. Otherwise we get failures of roundtrip tests. Commonmark renderer: ensure that literal characters get escaped when they're at the beginning of a block, e.g. - foo. LaTeX renderer: better handling of internal links. Now we render [foo](#bar) as \protect\hyperlink{bar}{foo}.
Check for NULL pointe
0.22.0 25 Aug 2015 04:25 minor feature: Removed pre from blocktags scanner. pre is handled separately in rule 1 and needn't be handled in rule 6. Added iframe to list of blocktags, as per spec change. Fixed bug with HRULE after blank line. This previously caused cmark to break out of a list, thinking it had two consecutive blanks. Check for empty string before trying to look at line ending. Make sure every line fed to S_process_line ends with \n, so S_process_line sees only unix style line endings. Ultimately we probably want a better solution, allowing the line ending style of the input file to be preserved. This solution forces output with newlines. Improved cmark_strbuf_normalize_whitespace. Now all characters that satisfy cmark_isspace are recognized as whitespace. Previously \r and \t (and others) weren't included. Treat line ending with EOF as ending with newline. Fixed --hardbreaks with \r\n line breaks. Disallow list item starting with multiple blank lines. Allow tabs before closing #s in ATX header. Removed cmark_strbuf_printf and cmark_strbuf_vprintf. These are no longer needed, and cause complications for MSVC. Also removed HAVE_VA_COPY and HAVE_C99_SNPRINTF feature tests. Added option to disable tests (Kevin Wojniak). Added CMARK_INLINE macro. Removed need to disable MSVC warnings 4267, 4244, 4800 (Kevin Wojniak). Fixed MSVC inline errors when cmark is included in sources that don't have the same set of disabled warnings (Kevin Wojniak). Fixed FileNotFoundError errors on tests when cmark is built from another project via add_subdirectory() (Kevin Wojniak). Prefix utf8proc functions to avoid conflict with existing library (Kevin Wojniak). Avoid name clash between Windows .pdb files (Nick Wellnhofer). Improved smart_punct.txt (see jgm/commonmark.js#61). Set POSITION_INDEPENDENT_CODE ON for static library. make bench: allow overriding BENCHFILE. Previously if you did this, it would clobber BENCHFILE with the default bench file. make bench: Use -10 priority with renice. Improved make_autolink.
Ensures that title is chunk
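
Whitespace normalization of the kind described for cmark_strbuf_normalize_whitespace collapses every run of whitespace into one space. The sketch below works on a plain char buffer and uses the standard isspace as a stand-in for cmark_isspace, so it is an approximation under those assumptions, not the library function; normalize_whitespace is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-place normalizer: each run of whitespace (space, \t,
 * \r, \n, ...) becomes a single space. Returns the new length; the
 * buffer is not NUL-terminated by this function. */
static size_t normalize_whitespace(char *s, size_t len) {
  size_t r, w = 0;
  bool last_was_space = false;
  assert(s != NULL || len == 0);
  for (r = 0; r < len; r++) {
    if (isspace((unsigned char)s[r])) {
      if (!last_was_space) {
        s[w++] = ' ';
        last_was_space = true;
      }
    } else {
      s[w++] = s[r];
      last_was_space = false;
    }
  }
  return w;
}
```

Treating \r and \t as whitespace is what lets an input such as "a\t\r\n b" reduce to "a b".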
0.21.0 15 Jul 2015 06:05 minor feature: Updated to version 0.21 of spec. Added latex renderer. New exported function in API: cmark_render_latex. New source file: src/latex.c. Updates for new HTML block spec. Removed old html_block_tag scanner. Added new html_block_start and html_block_start_7, as well as html_block_end_n for n = 1-5. Rewrote block parser for new HTML block spec. We no longer preprocess tabs to spaces before parsing. Instead, we keep track of both the byte offset and the (virtual) column as we parse block starts. This allows us to handle tabs without converting to spaces first. Tabs are left as tabs in the output, as per the revised spec. Removed utf8 validation by default. We now replace null characters in the line splitting code. Added CMARK_OPT_VALIDATE_UTF8 option and command-line option --validate-utf8. This option causes cmark to check for valid UTF-8, replacing invalid sequences with the replacement character, U+FFFD. Previously this was done by default in connection with tab expansion, but we no longer do it by default with the new tab treatment. (Many applications will know that the input is valid UTF-8, so validation will not be necessary.) Added CMARK_OPT_SAFE option and --safe command-line flag. CMARK_OPT_SAFE disables rendering of raw HTML and potentially dangerous links. Updated cmark.3 man page. Added scan_dangerous_url to scanners. In HTML, suppress rendering of raw HTML and potentially dangerous links if CMARK_OPT_SAFE is set. Dangerous URLs are those that begin with javascript:, vbscript:, file:, or data: (except for image/png, image/gif, image/jpeg, or image/webp mime types). Added api_test for OPT_CMARK_SAFE. Rewrote README.md on security.
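
The dangerous-URL rule listed for 0.21.0 can be approximated with plain prefix checks. This is a sketch under stated assumptions: cmark's scan_dangerous_url is a generated scanner, whereas is_dangerous_url and has_prefix_ci below are hypothetical helpers, and case-insensitive matching is an assumption made here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive "s starts with prefix" (hypothetical helper). */
static bool has_prefix_ci(const char *s, const char *prefix) {
  for (; *prefix != '\0'; s++, prefix++) {
    if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
      return false;
    }
  }
  return true;
}

/* Hypothetical check following the release note: javascript:, vbscript:
 * and file: are dangerous; data: is dangerous unless it carries one of
 * the four allowed image MIME types. */
static bool is_dangerous_url(const char *url) {
  static const char *const allowed_data[] = {
      "data:image/png", "data:image/gif", "data:image/jpeg",
      "data:image/webp"};
  size_t i;
  assert(url != NULL);
  if (has_prefix_ci(url, "javascript:") || has_prefix_ci(url, "vbscript:") ||
      has_prefix_ci(url, "file:")) {
    return true;
  }
  if (has_prefix_ci(url, "data:")) {
    for (i = 0; i < sizeof allowed_data / sizeof allowed_data[0]; i++) {
      if (has_prefix_ci(url, allowed_data[i])) {
        return false;
      }
    }
    return true;
  }
  return false;
}
```

A renderer honouring a "safe" option would call such a check on each link destination and omit the URL when it returns true.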
0.20.0 09 Jun 2015 06:25 minor feature: Fixed bug in list item parsing when items indented >= 4 spaces. Don't allow link labels with no non-whitespace characters (jgm/CommonMark#322). Fixed multiple issues with numeric entities (#33, Nick Wellnhofer). Support CR and CRLF line endings (Ben Trask). Added test for different line endings to api_test. Allow NULL value in string setters (Nick Wellnhofer). (NULL produces a 0-length string value.) Internally, URL and title are now stored as cmark_chunk rather than char *. Fixed memory leak in cmark_consolidate_text_nodes. Fixed is_autolink in the CommonMark renderer. Previously any link with an absolute URL was treated as an autolink. Cope with broken snprintf on Windows (Nick Wellnhofer). On Windows, snprintf returns -1 if the output was truncated. Fall back to Windows-specific _scprintf. Switched length parameter on cmark_markdown_to_html, cmark_parser_feed, and cmark_parse_document from int to size_t (#53, Nick Wellnhofer). Use a custom type bufsize_t for all string sizes and indices. This allows switching to 64-bit string buffers by changing a single typedef and a macro definition (Nick Wellnhofer). Hardened the strbuf code, checking for integer overflows and adding range checks (Nick Wellnhofer). Removed unused function cmark_strbuf_attach (Nick Wellnhofer). Fixed all implicit 64-bit to 32-bit conversions that -Wshorten-64-to-32 warns about (Nick Wellnhofer). Added helper function cmark_strbuf_safe_strlen that converts from size_t to bufsize_t and throws an error in case of an overflow (Nick Wellnhofer). Abort on strbuf out of memory errors (Nick Wellnhofer). Previously such errors were not being trapped. This involves some internal changes to the buffer library that do not affect the API. Factored out S_find_first_nonspace in S_process_line. Added fields offset, first_nonspace, indent, and blank to cmark_parser struct. This just removes some repetition. Added Racket (5.3+) wrapper (Eli Barzilay).
Removed -pg from Debug build flags. Added Ubsan
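
The bufsize_t change in 0.20.0 relies on checking a size_t before narrowing it. The sketch below assumes a 32-bit signed bufsize_t and reports overflow through a return value; the library's cmark_strbuf_safe_strlen is described as raising an error instead, so checked_strlen and BUFSIZE_LIMIT are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: buffer sizes are 32-bit signed integers. */
typedef int32_t bufsize_t;
#define BUFSIZE_LIMIT INT32_MAX

/* Hypothetical helper: store strlen(s) in *out only if it fits in
 * bufsize_t. Returns false, leaving *out untouched, on overflow. */
static bool checked_strlen(const char *s, bufsize_t *out) {
  size_t len;
  assert(s != NULL && out != NULL);
  len = strlen(s);
  if (len > (size_t)BUFSIZE_LIMIT) {
    return false;
  }
  *out = (bufsize_t)len;
  return true;
}
```

Keeping the typedef and the limit in one place is what makes a later move to 64-bit sizes a two-line change.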
0.19.0 23 May 2015 09:25 minor feature: Fixed _ emphasis parsing to conform to spec. Updated spec.txt. Compile static library with -DCMARK_STATIC_DEFINE. Suppress warnings about Windows runtime library files (Nick Wellnhofer). Visual Studio Express editions do not include the redistributable files. Set CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_NO_WARNINGS to suppress warnings. Added appveyor: Windows continuous integration (appveyor.yml). Use os.path.join in test/cmark.py for proper cross-platform paths. Fixed Makefile.nmake. Improved make afl: added test/afl_dictionary, increased timeout for hangs. Improved README with a description of the library's strengths. Pass-through Unicode non-characters (Nick Wellnhofer). Despite their name, Unicode non-characters are valid code points. They should be passed through by a library like libcmark. Check return status of utf8proc_iterate.
0.18.3 04 Apr 2015 03:16 minor feature: Include patch level in soname (Nick Wellnhofer). Minor version is tied to spec version, so this allows breaking the ABI between spec releases. Install compiler-provided system runtime libraries (Changjiang Yang). Use strbuf_printf instead of snprintf. snprintf is not available on some platforms (Visual Studio 2013 and earlier). Fixed memory access bug: "invalid read of size 1" on input link ().
0.18.2 31 Mar 2015 20:05 minor feature: Added commonmark renderer: cmark_render_commonmark. In addition to options, this takes a width parameter. A value of 0 disables wrapping; a positive value wraps the document to the specified width. Note that width is automatically set to 0 if the CMARK_OPT_HARDBREAKS option is set. The cmark executable now allows -t commonmark for output as CommonMark. A --width option has been added to specify wrapping width. Added roundtrip_test Makefile target. This runs all the spec through the commonmark renderer, and then through the commonmark parser, and compares normalized HTML to the test. All tests pass with the current parser and renderer, giving us some confidence that the commonmark renderer is sufficiently robust. Eventually this should be pythonized and put in the cmake test routine. Removed an unnecessary check in blocks.c. By the time we check for a list start, we've already checked for a horizontal rule, so we don't need to repeat that check here. Thanks to Robin Stocker for pointing out a similar redundancy in commonmark.js. Fixed bug in cmark_strbuf_unescape (buffer.c). The old function gave incorrect results on input like \\*, since the next backslash would be treated as escaping the * instead of being escaped itself. scanners.re: added _scan_scheme, scan_scheme, used in the commonmark renderer. Check for CMAKE_C_COMPILER (not CC_COMPILER) when setting C flags. Update code examples in documentation, adding new parser option argument, and using CMARK_OPT_DEFAULT (Nick Wellnhofer). Added options parameter to cmark_markdown_to_html. Removed obsolete reference to CMARK_NODE_LINK_LABEL. make leakcheck now checks all output formats. test/cmark.py: set default options for markdown_to_html. Warn about buggy re2c versions (Nick Wellnhofer).
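
The unescape bug fixed in 0.18.2 comes down to consuming an escaped character together with its backslash, so that an escaped backslash cannot go on to escape whatever follows it. A minimal sketch, assuming a plain char buffer and the standard ispunct in place of cmark's own character test; unescape_in_place is a hypothetical name, not the library function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-place unescape: a backslash followed by a punctuation
 * character is dropped and the punctuation character is kept. Both are
 * consumed in the same step, so in the three-character input
 * backslash, backslash, star the second backslash is treated as escaped
 * and the star is copied unchanged. Returns the new length. */
static size_t unescape_in_place(char *s, size_t len) {
  size_t r = 0, w = 0;
  assert(s != NULL || len == 0);
  while (r < len) {
    if (s[r] == '\\' && r + 1 < len && ispunct((unsigned char)s[r + 1])) {
      r++; /* skip the escaping backslash */
    }
    s[w++] = s[r++]; /* copy the (possibly escaped) character */
  }
  return w;
}
```

On the three-character input above, the result is a backslash followed by a star, rather than a lone star.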
0.18.1 11 Mar 2015 03:25 minor feature: Build static version of library in default build (#11). cmark.h: Add missing argument to cmark_parser_new (#12).
0.18 04 Mar 2015 19:05 license cleanup: Switch to 2-clause BSD license, with agreement of contributors. Added Profile build type, make prof target. Fixed autolink scanner to conform to the spec. Backslash escapes not allowed in autolinks. Don't rely on strnlen being available. Updated scanners for new whitespace definition. Added CMARK_OPT_SMART and --smart option, smart.c, smart.h. Added test for --smart option. Fixed segfault with --normalize. Moved normalization step from XML renderer to cmark_parser_finish. Added options parameter to cmark_parse_document, cmark_parse_file. Fixed man renderer's escaping for unicode characters. Don't require python3 to make cmark.3 man page. Use ASCII escapes for punctuation characters for portability. Made options an int rather than a long, for consistency. Packed cmark_node struct to fit into 128 bytes. This gives a small performance boost and lowers memory usage. Repacked delimiter struct to avoid hole. Fixed use-after-free bug, which arose when a paragraph containing only reference links and blank space was finalized (#9). Avoid using parser->current in the loop that creates new blocks, since finalize in add_child may have removed the current parser (if it contains only reference definitions). This isn't a great solution; in the long run we need to rewrite to make the logic clearer and to make it harder to make mistakes like this one. Added 'Asan' build type. make asan will link against ASan; the resulting executable will do checks for memory access issues. Add Makefile target to fuzz with AFL. The variable AFL_PATH must point to the directory containing the AFL binaries. It can be set as an environment variable or passed to make on the command line.