bug fix: truncate wide columns in print.data.table #7718 #7758

Closed
venom1204 wants to merge 24 commits into master from issue77188

@venom1204

Contributor

closes #7718

hi @joshhwuu

I incorporated the changes as proposed. Below is a short summary of the changes:

  1. print.data.table.R - I updated char.trunc to default to getOption("width") - 5L, and also added a condition so that it applies to lists. I also improved safety for strings with missing width values.
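
For readers without the diff at hand, the fallback described in this item can be approximated as below. This is only a minimal ASCII-only sketch: `char_trunc_sketch` is a hypothetical name, and the real helper in R/print.data.table.R also handles full-width characters and list columns, which are left out here.

```r
# Simplified stand-in for the truncation helper; NOT data.table's actual code.
# Assumption: strings are plain ASCII, so nchar() counts display columns.
char_trunc_sketch = function(x, trunc.char = getOption("datatable.prettyprint.char")) {
  # Fall back to the console width minus a fixed buffer when the option is unset.
  if (is.null(trunc.char)) trunc.char = getOption("width") - 5L
  # NA strings are never flagged: FALSE & NA evaluates to FALSE.
  too_long = !is.na(x) & nchar(x) > trunc.char
  x[too_long] = paste0(strtrim(x[too_long], trunc.char), "...")
  x
}

old = options(width = 25L, datatable.prettyprint.char = NULL)
res = char_trunc_sketch(c("short", NA, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
options(old)  # restore the previous settings
print(res)
```

With width set to 25, the long string is cut to 20 characters plus "...", while the short string and the NA pass through unchanged.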

Kindly review this when you have some time, and please let me know if you have any suggestions for improvement.
thank you.

@venom1204 venom1204 requested a review from MichaelChirico as a code owner May 25, 2026 18:29
@codecov

codecov Bot commented May 25, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (d4974e9) to head (1af6897).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7758      +/-   ##
==========================================
- Coverage   99.04%   99.04%   -0.01%     
==========================================
  Files          87       87              
  Lines       17064    17063       -1     
==========================================
- Hits        16901    16900       -1     
  Misses        163      163              


@joshhwuu joshhwuu left a comment

Member

thanks, left a few comments! in the future, you can tag me as a reviewer instead of pinging me in the description. also please try to summarize changes in the description instead of referring to the issue so others can get context quicker 🙂

Comment thread R/print.data.table.R
Comment thread R/print.data.table.R Outdated
# Current implementation may have issues when dealing with strings that have combinations of full-width and half-width characters,
# if this becomes a problem in the future, we could consider string traversal instead.
char.trunc = function(x, trunc.char = getOption("datatable.prettyprint.char")) {
if (is.null(trunc.char)) trunc.char = getOption("width") - 5L

Member

just curious, why 5L? and this looks like a magic number, could we use a constant instead?

Contributor Author

My main reason for using 5L was that Toby asked in the issue: "would options(datatable.prettyprint.char=getOption("width")-5) be possible?"

Member

would 5L still make sense if we had a large table and the row label prefix consists of many more characters? I would argue against a magic number or even reading the row label prefix if we can here

Contributor Author

You are right, the 5L buffer fails once the row labels grow wider on a narrow console.
To fix this we can either

  • increase the constant buffer instead of using 5L
  • or pass the width dynamically for a precise fit.
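
To make the trade-off between the two options concrete, here is a rough sketch of how wide a rendered line gets under the fixed 5L buffer. It assumes a simplified layout of row label, colon, space, truncated data and "..."; `rendered_width` is a hypothetical helper, not part of data.table.

```r
# Hypothetical helper: width of one printed line when a single character column
# is truncated to `width - 5L` characters and suffixed with "...".
# Assumed layout: "<row label>: <truncated data>..."
rendered_width = function(n_rows, width) {
  prefix = nchar(sprintf("%d: ", n_rows))  # e.g. "1000000: " is 9 characters
  trunc.char = width - 5L                  # the fixed buffer from this PR
  prefix + trunc.char + 3L                 # 3L accounts for the "..." suffix
}

widths = vapply(c(9L, 1000L, 1000000L), rendered_width, integer(1), width = 25L)
overflow = widths - 25L  # how far each line exceeds the console width
print(rbind(widths, overflow))
```

Under these assumptions the overflow grows with the number of digits in the row label, which is why a larger constant only postpones the problem while a dynamically computed width would not.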

Member

Can you please provide a couple of examples of when 5L works and does not work?
Does it work for the examples I put in the original issue?

Contributor Author

It's working fine for your example. I was concerned about the final width, because the 5L buffer does not account for all layout components (row labels, spacing, '...').
For example:

options(width=25, datatable.prettyprint.char=NULL)
dt = data.table(x = rep("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 1000000))
print(dt, topn=1)
>1000000: ABCDEFGHIJKLMNOPQRST...

If you count only the truncated data, it is behaving correctly; however, if the fully rendered line is considered, it exceeds the limit of 25.

Member

thanks. options(width) is documented on ?options as a limit on the whole line (the maximum number of columns on a line used in printing vectors, matrices and arrays, and when filling by cat), so in this example the output should be 25 characters:

1000000: ABCDEFGHIJKLM...
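
One way to arrive at the 25-character line above is to subtract the row label prefix from the width before truncating. The following is only a sketch under simplifying assumptions (one ASCII column, a "<row label>: " prefix); `fit_line` is a hypothetical helper and not how print.data.table is implemented.

```r
# Hypothetical helper: truncate `x` so that prefix + data fits within `width`.
# Assumption: ASCII-only input, so nchar() equals the display width.
fit_line = function(x, n_rows, width = getOption("width")) {
  prefix = sprintf("%d: ", n_rows)
  available = width - nchar(prefix)     # columns left for the data itself
  if (nchar(x) > available) {
    keep = max(available - 3L, 0L)      # reserve 3 columns for "..."
    x = paste0(substr(x, 1L, keep), "...")
  }
  paste0(prefix, x)
}

line = fit_line("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 1000000L, width = 25L)
print(line)
print(nchar(line))
```

Here the prefix "1000000: " takes 9 columns, leaving 16 for the data, so 13 letters are kept before the "..." suffix.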

Contributor Author

Can you guide me on what the next step could be?

Member

I’m not sure, I have not thought it through.

Comment thread R/print.data.table.R Outdated
  nchar_chars = nchar(x, 'char', allowNA=TRUE)
  is_full_width = nchar_width > nchar_chars
- idx = !is.na(x) & pmin(nchar_width, nchar_chars) > trunc.char
+ idx = !is.na(x) & !is.na(nchar_width) & pmin(nchar_width, nchar_chars) > trunc.char

Member

could we use na.rm in pmin instead of a guard?

@venom1204 venom1204 May 26, 2026

Contributor Author

I tried using na.rm = TRUE in pmin, but found that it causes NA values to be incorrectly flagged for truncation, which leads to NA becoming "NA..." (breaking test 2253.2) and to invalid 'width' argument crashes in strtrim.
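
The difference between the two approaches can be seen in isolation with made-up width vectors. The values below are purely illustrative and do not come from the test suite; only base R's pmin and is.na are used.

```r
# Illustrative inputs: the second element has a missing display width.
nchar_width = c(30L, NA, 4L)
nchar_chars = c(30L, 30L, 4L)
trunc.char = 20L

# Without any handling, the missing width propagates into the index.
unguarded = pmin(nchar_width, nchar_chars) > trunc.char

# na.rm=TRUE silently falls back to the non-missing value (30), so the
# element with the missing width is flagged for truncation.
with_na_rm = pmin(nchar_width, nchar_chars, na.rm = TRUE) > trunc.char

# An explicit guard keeps that element out of the truncation index.
guarded = !is.na(nchar_width) & pmin(nchar_width, nchar_chars) > trunc.char

print(rbind(unguarded, with_na_rm, guarded))
```

Only the guarded form yields FALSE for the element with the missing width, which matches the behaviour described in the comment above.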

@github-actions

  • HEAD=issue77188 slower P<0.001 for DT[by] max regression fixed in #7480
    Comparison Plot

Generated via commit 1af6897

Download link for the artifact containing the test results: ↓ atime-results.zip

Task | Duration
--- | ---
R setup and installing dependencies | 6 minutes and 22 seconds
Installing different package versions | 12 minutes and 5 seconds
Running and plotting the test cases | 5 minutes and 23 seconds

@venom1204 venom1204 requested a review from joshhwuu May 27, 2026 21:08
Comment thread inst/tests/tests.Rraw
Comment on lines +21626 to +21627
options(width=10, datatable.prettyprint.char=NULL)
test(2374.1, capture.output(print(data.table(x="12345678901234567890"))), output="12345...")

Member

please use test(options=) instead of options(…), here and below

Suggested change
- options(width=10, datatable.prettyprint.char=NULL)
- test(2374.1, capture.output(print(data.table(x="12345678901234567890"))), output="12345...")
+ test(2374.1, capture.output(print(data.table(x="12345678901234567890"))), output="12345...", options=list(width=10, datatable.prettyprint.char=NULL))
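
The likely motivation for this suggestion is that a bare options() call changes global state that later tests then inherit. As a rough base-R analogue of scoped options (an assumption about the intent, not a description of how data.table's test() handles its options argument), a hypothetical `with_options` helper could look like this:

```r
# Hypothetical helper: evaluate `expr` with temporary options, then restore.
with_options = function(opts, expr) {
  old = options(opts)              # options() returns the previous values
  on.exit(options(old), add = TRUE)
  force(expr)                      # lazily evaluated here, under the new options
}

before = getOption("width")
inside = with_options(list(width = 10L), getOption("width"))
after = getOption("width")
print(c(before = before, inside = inside, after = after))
```

The width is 10 only while the expression is evaluated, and the earlier value is back afterwards, so a test written this way cannot leak its settings into the tests that follow.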

tdhock and others added 21 commits June 10, 2026 19:44
Co-authored-by: Toby Dylan Hocking <toby.dylan.hocking@usherbrooke.ca>
* prohibit duplicate key columns

* update test setup

* adjust test setups

* add Mikes found cases

* use CJ(, sorted=FALSE) and sort afterwards

* cache grpnames
* add setallocrow

* fix copyAsPlain args

* change allocrow(dt, n) n parameter to specify number of nrows and not number of additional rows

* remove repetition in docs, note address change
* `between` supports Date/IDate with missing bounds

* Restore NEWS formatting

* rm unnecessary lines from tests.Rraw
* Data desegregation + minor formatting improvements + constness improvements

---------

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
update documentation to clarify zero-length RHS evaluations with/without by
* docs: add datatable.prettyprint.char to datatable-intro vignette

Closes #7714

Added explanation and example of datatable.prettyprint.char option
in the 'Note that' section, near the datatable.print.nrows bullet.
* don't save to var as suggested by Toby

Co-authored-by: Toby Dylan Hocking <tdhock5@gmail.com>

---------

Co-authored-by: Toby Dylan Hocking <tdhock5@gmail.com>
Since R-2.15, R automatically replaces dots with underscores in
R_init_<DLL name>(...), so renaming is no longer necessary. This also
has the side effect of working around the behaviour change from R r90101
and producing correct source tarballs on Windows again: R CMD build
cleans up data.table.dll, but misses data_table.dll.

* Accommodate new DLL name in atime tests
@venom1204 venom1204 closed this Jun 10, 2026
@venom1204 venom1204 deleted the issue77188 branch June 10, 2026 19:46
@venom1204 venom1204 restored the issue77188 branch June 10, 2026 19:48
@venom1204 venom1204 deleted the issue77188 branch June 10, 2026 19:51

Development

Successfully merging this pull request may close these issues.

print should abbreviate columns larger than options(width)?

10 participants