this patch fixes a bug discovered in a real network where the DL CQI of a
user degraded repidly in very short time. A relativly big RLC PDU that
was still sent with the good CQI in a big grant now needs to be split
across many tiny segments because the CQI degraded so much.
The retx couting for each transmitted segment caused the retx counter to
reach maxRetx quickly.
With this patch we do not increment the retx counter for each transmitted
PDU or segment of a PDU but instead only increment the counter when
a given SN is added to the retx queue. This can happen either:
a) if the SN is negativly acknowledged and was not already on the retx queue,
b) no new data is available for tx and a SN is selected for retx.
This is in accordance with TS 36.322 which handles retx counting in section
5.2.1 according to the above description.
With soapy 0.8.0, GCC 11.1.0 warns of mismatched array bounds
in some functions.
This commit aligns the bound and adds proper wrappers to
fix subsequent warnings.
the code hasn't been maintained for a while an likely needs to be
adapted for a real-world scenarios.
in order to avoid having to maintain two MAC/PHY interfaces we
remove the code from now.
move implementation to cc file to avoid
[build] /bin/ld: CMakeFiles/rrc_nr_asn1_test.dir/rrc_nr_test.cc.o: in function `asn1::rrc_nr::setup_release_c<asn1::rrc_nr::pdcch_serving_cell_cfg_s>::set_setup()':
[build] /home/anpu/src/srsLTE/lib/include/srsran/asn1/rrc_nr.h:2276: undefined reference to `asn1::rrc_nr::setup_release_c<asn1::rrc_nr::pdcch_serving_cell_cfg_s>::set(asn1::rrc_nr::setup_release_c<asn1::rrc_nr::pdcch_serving_cell_cfg_s>::types_opts::options)'
[build] clang: error: linker command failed with exit code 1 (use -v to see invocation)
the do_status is queried from the Tx code frequently. To reduce
chances to delay the execution because the RLC Rx side is currently
holding the mutex we can use an atomic.
the patch uses try-lock whenever a status PDU is tried
to be built. This makes sure that when the lock is currently
hold (e.g. by a thread processing rx PDUs) the generation
of the status PDUs is not taking too long and blocking the calling
thread. Instead the status PDU generation is deferred to the next
Tx opportunity.
It's a probabilistic approach that assumes that at some stage the
lock can in fact be acquired.
on highly loaded systems it can happen that the get_metrics() is called
twice within a few houndred milliseconds. Logging a warning in this
case isn't needed, so reduce to info.
on the other hand, 100ms might be to convervative. Patch also
lowers the smallest interval to 10ms
* Protect PHY SR signal management in a class
* Protect intra_freq_meas vector
* Protect cell and srate shared variables in thread-safe classes
* srsue,srsenb: include TSAN options header
* Protect ue_rnti_t and rnti scheduling windows behind thread-safe classes
* Protect access to state variable in sync_state
* Protect access to metrics configuration
* Protect access to is_pending_sr
* Protect access to UE prach worker
* Protect UE mux
* Avoid unlocking mutex twice
* Fix data races in RF/ZMQ
* Fix data races in intra_measure and PHY
* Fix minor data races in MAC
* Make TSAN default behaviour to not halt on error
* Fix blocking in intra cell measurement
* Address comments
Co-authored-by: Andre Puschmann <andre@softwareradiosystems.com>
Introduce a new macro to catch UHD exceptions and log them directly instead of storing an error string, similar to what errno does.
Remove usrp logging helpers that depend on the now removed member since all calls potentially log the error directly.
RFCI has detected this assert failing in the log_backend_test. I have not been able to reproduce this locally but my theory is the following one:
one of the unit tests does the following:
backend.start();
backend.stop();
the internal running_flag member could be set to true and then to false by the main thread before the worker thread calls do_work(). If this happens
the assert will be triggered, which is wrong and too conservative, so remove the assert.
this is a rather large commit that is hard to split because
it touches quite a few components.
It's a preparation patch for adding NR split bearers in the next
step.
We realized that managing RLC and PDCP bearers for both NR and LTE
in the same entity doesn't work. This is because we use the LCID
as a key for all accesses. With NR dual connectivity however we
can have the same LCID active at the same time for both LTE and NR
carriers.
The patch solves that by creating a dedicated NR instance for RLC/PDCP
in the stack. But then the question arises for UL traffic on, e.g. LCID 4
what PDCP instance the GW should use for pushing SDUs. It doesnt' know
that. And in fact it doesn't need to. It just needs to know EPS
bearer IDs. So the next change was to remove the knowledge of what
LCIDs are from the GW. Make is agnostic and only work on EPS bearer IDs.
The handling and mapping between EPS bearer IDs and LCIDs for LTE
or NR (mainly PDCP for pushing data) is done in the Stack because
it has access to both.
The NAS also has a EPS bearer map but only knows about default and
dedicated bearers. It doesn't know on which logical channels they
are transmitted.
this patch mainly modernizes the bearer creation to use smart pointers.
that allows to simplify the error handling.
ue_stack is changed to match new interface. This commit compiles
but doesn't work.
when a lost PDU is detected a warning will be logged. In theory
this could be info as well but a warning may help to detect issues
in tests. The same event causes multiple other warnings to be logged,
which is very spammy. The patch reduces the log level for
those messages to info.
the patch is a re-implementation of the customer-specific optimization
we did in order to reduce the time the RLC holds the Tx mutex when
processing an incoming status PDU.
The patch makes sure to never operate on a raw mutex but instead
uses the deadlock-avoiding RAII lock.
before processing incoming status PDUs we should be checking
if the ACK_SN falls within our current Tx window. If not the PDU
will be dropped.
Without the check we were incorrectly processing the status PDU
and because the sequence number wrap around wasn't working
correctly if ACK_SN is smaller than vt_a we were corrupting
our Tx window.
the test verifies that the ACK_SN of a status PDU falls inside the
rx_window of the receiver. If not, than the RLC state has been
corrupted and the status PDU is likely invalid.
we had it returning int but had a bug in using the return value properly,
i.e. handling when -1 was returned in RLC TM.
Thinking about it more, it doesn't make sense to have a negative return
value here anyway. Either the RLC can return a PDU or not. If it can't the
returned lenght is zero.
when a small grant is provided it might not be possible to fit a full status
PDU. This is currently detected while packing the PDU.
In order to avoid sending potentiall contradicting status info to the sending
entity, the fix makes sure to only transmit a small PDU acking what really
has been received so far.
This might not be optimal in terms for retx but will not corrupt any
state.
it turned out that a certain order of events can lead to
a RLC transmitter stalling because even though unacknowledged PDUs
are queued, none of them was actually considered for retx.
This can happen if a pollRetxTimer expires for a SN that, meanwhile,
has already been acknowledged. The positive lead to the deletion of
the SN from the Tx window.
The fix makes sure that when a retx for a unexisting SN is requested,
the sender will consider the next unacknowledged SN instead.
TSAN doesn't work well then threads are created with attributes
thar require root rights but the process is run as normal user.
this patch avoid the thread attributes in this case. TSAN isn't going
to be used for production builds.
although the manual test with Amarisoft eNB worked fine it seems
the delay is still needed in the default case. Over 50% of the
tests failed in the nightly with:
[zmq] Error: tx time is 0.067 ms in the past (138240 < 139776)
[zmq] Error: tx time is 1.100 ms in the past (184320 < 209664)
While this usleep() should increase the pass likelihood it
still doesn't guarantee error-free runs, so we might need
to revisit it again as some stage.
the thread workers need access to their current state to exit properly
when they are set to state STOP. However, since the state is kept in
a std::vector for all workers, it seems more appropiate to add a per-thread
running variable rather then mutexing the entire vector.
it seems that different SNs should be retransmitted depending
on the actual situation. In case of pollRetx timer expiration,
vt_s - 1 should actually be resent.
This patch prepares the function to accept different SNs but
leaves it to send vt_a by default. The RLC AM test would need
to changed as well to not fail.
- Replace shared_variable members in log_channel class in favor of atomics.
- Remove the small string optimization in srslog now that we dont allocate anymore.
- Trim some critical sections in srslog.
also make sure we don't assign LCIDs beyond the possible
number.
possible fix for https://github.com/srsran/srsRAN/issues/658
Co-authored-by: herlesupreeth <herlesupreeth@gmail.com>
Co-authored-by: Francisco <francisco.paisana@softwareradiosystems.com>
if RLC PDUs with certain PDCP SNs were already in flight
when their discard timer at PDCP expires, they weren't
remove from the undelivered_sdu_info_queue causing
indesired log entries like:
08:08:52.455280 [RLC ] [W] PDCP_SN=2103 already marked as undelivered
when the SN was sent again (after wrap around).
The patch makes sure to always try to delete the PDCP SN
from the queue. It should fix (at least partly) #2560
previously the warning was printed when a Lime was connected to the PC.
Now all connected devices are printed but the warning is only
shown if the selected device is the Lime.
- use const char* instead of std::string in enumerated<>::to_string() to avoid mallocs.
- Remove the use of "typedef", and use "using" keyword instead.
- Fix rrc_nr::setup_release_c<>::to_string() broken linkage.
this commit adds a complete DL HARQ entity to the MAC of the UE.
It also refactors demux into an own class and adapts the PHY-MAC
interface to use the new MAC capabilities.
avoiding any allocations.
Each segment stores its own PDCP SN and RLC SN and has two pointers,
one for the next segment of the same RLC PDU, and another for the next segment
of the same PDCP PDU.
Previously since a char* can have any length, this was managed by FMT by converting it into a std::string.
Now we store it into a configurable size node that can store a fixed size string, otherwise it falls back to std::string.
the current value of 256 limits the number of PDCP SDUs that can be
notified from RLC. The limit is quickly hit when too many SDUs
are in flight. This can cause unwanted log entries and weird PDCP
behaviour.
the patch increases the value to 1024, which still can be too few if
many smaller SDUs are traveling.
The patch also set the log level to warning to quicker spot
misconfigs in logs.
Fixes#2616
the created RLC PDU was too large to fit inside the MAC grant
because only the header room for the short L field was used.
The patch determines the correct size before passing the opportunity to RLC.
It also improves logging in error case by using the MAC logger instead of
stderr/stdout when error occurs.
building on the previous refactor this patch now adds support
for peridoic BSR reporting (using short BSR). It furthermore does
the following changes:
* add BSR packing
* add proc_bsr_nr unit test
* move mac_nr test code into test folder under src (needs to be done with other test code too)
previously we were only counting retx if we retx the start of a segment.
this could lead to unwanted behaviour, i.e., not counting retx
correctly and thus not triggering the maxretx attempt, if the receive
always sends NACKs with a SO_start.
The RLC spec is not clear on how this should be handled correctly but
IMHO using an integer number of retx is reasonable, even for segments
that might be retransmitted more often.
The alternative of using a fractional retx counter that may be increamented
proportional to the segment size that is retx is another alternative
but considered too complex to implement (and test correctly).
- use of inheritance to simplify testing
- removal of global network manager
- pass of custon socket manager to s1ap and gtpu ctors
- overhauled the registration of socket fd,callback in socket manager
- Implement a common event "log_rrc" for all RRC events and discriminate by procedure using an enum.
- Log events for connection, reestablishment, reconfig, reject and release.
- Log the corresponding ASN1 message used by each procedure.
- Redefine the JSON object for this event to match the new structure.
Fixed a compilation error detected by the static analyzer in gcc9.3 where bounded_vector::data() was using taking the address of the internal buffer which confused it, prefer to use the data method of std::array.
This was done so it would work when circular buffer holds other things
that are not unique_pointers. Queue and pop_func had to be made public
to be able to call the pop_func when an SDU is discarded.
* Added ability to discard to dyn_block_queue
* Change way of keeping track of SDUs
* Check nullptr in poping callback. Starting to check for nullptr in RLC read_pdu.
* Adding RLC discard tests
* Clearing PDCP info when RLC discard happens
* Read SDUs until they are no longer nullptr
* Changed discard_if to use template argument
we've seen a heap-buffer overflow in fmt because printf wasn't using
the right formtter for size_t, which should be %zu
this patch fixes it for the PDCP LTE entity but we might have it elsewhere too
[1m[31m==7595==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000e6f1fc at pc 0x562273a45289 bp 0x7f35567641f0 sp 0x7f35567641e0
[1m[0m[1m[34mREAD of size 4 at 0x629000e6f1fc thread T12 (STACK)[1m[0m
0 0x562273a45288 in fmt::v7::basic_format_arg<fmt::v7::basic_printf_context<std::back_insert_iterator<fmt::v7::detail::buffer<char> >, char> > fmt::v7::detail::make_arg<fmt::v7::basic_printf_context<std::back_insert_iterator<fmt::v7::detail::buffer<char> >, char>, unsigned int>(unsigned int const&) (/osmo-gsm-tester-srsue/srslte/bin/srsue+0x9dc288)
1 0x562273a3aa86 in void fmt::v7::dynamic_format_arg_store<fmt::v7::basic_printf_context<std::back_insert_iterator<fmt::v7::detail::buffer<char> >, char> >::emplace_arg<unsigned int>(unsigned int const&) (/osmo-gsm-tester-srsue/srslte/bin/srsue+0x9d1a86)
2 0x562273a308e7 in void fmt::v7::dynamic_format_arg_store<fmt::v7::basic_printf_context<std::back_insert_iterator<fmt::v7::detail::buffer<char> >, char> >::push_back<unsigned int>(unsigned int const&) /mnt/data/jenkins/workspace/srslte_ogt_trial_builder_x86-ubuntu1804-asan/srsLTE/lib/include/srslte/srslog/bundled/fmt/core.h:1548
3 0x562274361541 in void srslog::log_channel::operator()<unsigned int&, unsigned int&, unsigned long>(char const*, unsigned int&, unsigned int&, unsigned long&&) /mnt/data/jenkins/workspace/srslte_ogt_trial_builder_x86-ubuntu1804-asan/srsLTE/lib/include/srslte/srslog/log_channel.h:101
4 0x56227430d9e7 in srslte::pdcp_entity_lte::update_rx_counts_queue(unsigned int) /mnt/data/jenkins/workspace/srslte_ogt_trial_builder_x86-ubuntu1804-asan/srsLTE/lib/src/upper/pdcp_entity_lte.cc:451
the patch refactor the logging when a new PDCP SDU is enqueued for
transmission at RLC.
If the SN is already present, only a warning is logged. From the RLC
perspective operation continues and the SDU will be transmitted.
The patch also changes the order of logs. When the SN cannot be inserted
inside the queue of undelivered SDUs, only one message is logged.
undeliverd_sdus_queue. Added a queue to store information about
rx_counts received.
Added unit test for when the SNs wrap-around in status report genaration
this patch replaces the std::vector type used in the interface between
PDCP and RLC to signal delivery status between both layers. The new
data type is a configurable but fixed-size vector.
The RLC AM doesn't need to dynamically allocate the vector for every SN but
uses the tx_window for storage.
this patch fixes the actions/handling after RLC detected
maxRetx reached for a given SN.
According to the TS, RLC should only inform upper layers and
not try to recover from the event itself.
As a consequence, we won't manipulate the Tx or Rx window.
As a result of this, we might retransmit a SN more than
the specified amount of times.
It's the task of RRC to reestablish the bearer to recover
from that.
* optimize processing of status PDU (SN is removed from window immediately)
* fix maxRetx signaling for segments
* make tx_window_t a template class, rename and use for rx_window as well
this fixes an issue when the RLC bearer isn't reset from RRC.
In this case, the RLC would retransmit the same PDU over and over
again despite the max retx counter being reached.
inspired by accepted (but not yet merged) PR to include the
(unique_)byte_buffer_t for MAC/PHY interfacing, this patch adds
a few more useful bits to that. Buffer management for UL data is now
done in MAC and only a pointer to the data is passed in the UL action.
* Move Tx softbuffer to MAC (until UL HARQ class is ready)
* Remove temparal data member in cc_worker
* Remove memcpy after packing MAC PDU
* Added interfaces for the RLC to notify the PDCP of failure to transmit
SDU
* Limit discard timer to 1500ms, to avoid issues of lingering SDUs in the undeliverd_sdus_queue.
* Fix bug in early exit of notify_delivery and notify_failure
* fix compilation issue in rlc-pdcp notification
Co-authored-by: Francisco <francisco.paisana@softwareradiosystems.com>
* reuse vector capacity for pdcp sn notification
* use an extra lookup data structure to find PDCP SNs that an RLC SN contains
* fix rlc sn->pdcp sn lookup datastructure in rlc
* fix rlc failing test
this patch adds two new config flags to the ZMQ driver that allows to:
* configure the default ZMQ trx timeout in ms
* turn on error logging if the timeout occurs
Use with, e.g.:
device_args = log_trx_timeout=true,trx_timeout_ms=3333
Removing unecessary \n for logs and changed log of PDCP info queue
capacity to debug to avoid log spam.
Changed log level for unhandled S1AP messages from error to warning
in EPC to avoid failing tests because of error message.
Changed usage of allocate_unique_buffer to make_unique_buffer()
This includes:
- Adding a queue (implemented with std::map) for undelivered PDUs.
This queue uses the SN used for TX as the key.
- Added discard timer that is started upon reception of the SDU. Upon
expiry of the timeout a discard callback removes undelivered PDUs
from the queue.
- Added the mechanisms to the notify_delivery to remove PDUs from the
undelivered queue when the PDU is ACK'ed.
- Added test case for both timer expiry and acknowledgment.
- Fix up the getter for buffered SDUs to return the undelivered SDUs
- Changed default PDCP discard timer, so AM has a discard timer by
default.
test. The race condition was being cause by write_sdu being called
simultanously to read_pdu, which could cause the read_pdu to try to get
the SDU info before it had been written by the write_sdu.
Changes in this commit include:
- Make sure PDCP sn in included in RLC AM stress test.
- Stop handling control PDU when TX is not enabled in RLC AM.
- Fixed issue with length of the PDCP SN in rlc_stress_test.
- Moved the place were sdu info erase was called to avoid double calls to erase
- Tentative fix for race condition in rlc_am_stress test.
- Added function to print information about undelivered_sdu_info_queue
for debugging.